January 14, 2020 By Scott Fitzpatrick

Understanding the Apache Access Log: View, Locate and Analyze

As any developer or system administrator will tell you, log files are an extremely useful tool for debugging issues within a web application. In fact, log files are often the primary source of information when a website malfunctions.

One specific log file that can be used in debugging applications (or simply gaining insight into visitor activity) is the access log produced by an Apache HTTP server. Below, I will get into the particulars of these logs: what gets recorded in the Apache access logs, where they can be found, and how to make sense of the data they contain. Since the real value of log data comes from analysis, I will also discuss the benefits of working with a log management and analytics platform (such as Sumo Logic) to derive valuable insights from access log data.

What are Apache Access Logs?

As mentioned above, the Apache access log is one of several log files produced by an Apache HTTP server. This particular log file is responsible for recording data for all requests processed by the Apache server. So if an individual visits a webpage on your site, the access log file will contain details regarding this event.

This information is valuable in a variety of situations. For example, if requests for a particular page consistently fail with a 404 status, a link may be pointing to a page that no longer exists. If a page takes longer to load than it should, access log entries (when the log format is configured to record how long each request took, for instance with the %D directive) can pinpoint the slow requests, which can then be traced to causes such as inefficient SQL queries in the application. And if one page on the site is very popular, aggregating access log data can reveal the most commonly requested resources, helping businesses decide where providing more related content would pay off.
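To make the first and last of these scenarios concrete, here is a minimal Python sketch that tallies how often each path is requested and which paths return 404. It assumes entries in the Common or Combined Log Format described later in this article; the sample lines and the summarize function are hypothetical, and a production parser would need to handle malformed or escaped request lines more carefully.

```python
import re
from collections import Counter

# Captures the request path and status code from a Common or Combined Log
# Format entry. Simplified: assumes a "METHOD PATH PROTOCOL" request line
# with no escaped quotes inside it.
REQUEST_AND_STATUS = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def summarize(lines):
    """Return (requests per path, 404 responses per path) for access log lines."""
    requests = Counter()
    not_found = Counter()
    for line in lines:
        match = REQUEST_AND_STATUS.search(line)
        if match is None:
            continue  # skip entries this simplified pattern cannot read
        requests[match["path"]] += 1
        if match["status"] == "404":
            not_found[match["path"]] += 1
    return requests, not_found


# Hypothetical sample entries.
sample = [
    '127.0.0.1 - - [10/Dec/2019:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 512',
    '203.0.113.9 - - [10/Dec/2019:13:55:40 -0700] "GET /old-page HTTP/1.1" 404 209',
    '203.0.113.9 - - [10/Dec/2019:13:56:02 -0700] "GET /index.html HTTP/1.1" 200 512',
]
requests, not_found = summarize(sample)
print(requests.most_common(3))
print(dict(not_found))
```

In practice you would pass an open log file instead of the sample list, since summarize accepts any iterable of lines.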

Where can I find Apache Access Logs?

The location of the Apache access logs is dependent upon the system on which the Apache HTTP server is running. The majority of Apache HTTP server instances run on Linux distributions. So, for the purposes of this article, we will stick to detailing where the Apache access logs can be found on a Linux machine.

On the Ubuntu Linux distribution, for example, access log records will be written to the following location by default:

/var/log/apache2/access.log

The default location varies slightly on other Linux distributions; on Red Hat-based systems such as CentOS and Fedora, for example, it is /var/log/httpd/access_log. Ultimately, the location and format (more on this later) of the access logs are defined by a CustomLog directive, which can be viewed and modified within your Apache HTTP server configuration.
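If you are unsure which file your server writes to, you can search the configuration for CustomLog directives. The Python sketch below is a rough, hypothetical helper (find_custom_logs is not part of any Apache tooling) that pulls the target and format out of each CustomLog line in a configuration file's text. It deliberately ignores Include files, virtual host scoping, and line continuations, so treat its output as a starting point.

```python
import shlex


def find_custom_logs(config_text):
    """Return (target, format) pairs for every CustomLog directive in config_text.

    Simplified sketch: it ignores Include files, <VirtualHost> scoping and
    backslash line continuations.
    """
    results = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Apache directive names are case-insensitive.
        if line.split(None, 1)[0].lower() != "customlog":
            continue
        # shlex honours the double quotes around piped targets or format strings.
        parts = shlex.split(line)
        if len(parts) >= 3:
            results.append((parts[1], parts[2]))
    return results


sample_config = """
LogFormat "%h %l %u %t \\"%r\\" %>s %b" common
# CustomLog /var/log/old_access.log common
CustomLog /var/log/apache2/access.log combined
"""
print(find_custom_logs(sample_config))
```

Paths may also contain variables. On Ubuntu, for instance, the stock virtual host configuration writes the path as ${APACHE_LOG_DIR}/access.log, and that variable is set in /etc/apache2/envvars, so it still has to be resolved by hand.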

Interpreting the Apache Access Logs

Now that you know what Apache access logs are and where they can be found, we can explain how to interpret the entries so that your development team and other IT personnel can make good use of them.

Reading Apache Access Logs

Making sense of the Apache access logs requires that the analyst understand the format in which the access logs are being recorded. As mentioned above, the format for the access logs is defined in the CustomLog directive along with the location. Below, we will look at two popular formats often used for Apache access logs.

Common Log Format

The Common Log Format (CLF) is a standardized text format used by various web servers when generating server log files. With an Apache HTTP server, CLF produces access logs that are straightforward for developers and administrators to read. And because it is a standard shared by multiple web servers, CLF-formatted log files are supported by many log analysis platforms.

An access log record written in the Common Log Format will look something like this:

127.0.0.1 - Scott [10/Dec/2019:13:55:36 -0700] "GET /server-status HTTP/1.1" 200 2326

The fields in the above sample record represent the following:

  • 127.0.0.1 - IP address of the client that made the request;
  • The hyphen in the second field - identity of the client as determined by identd (RFC 1413). A hyphen means this information is not available, which is usually the case, and Apache's HTTP server documentation recommends not relying on this field except on tightly controlled internal networks;
  • Scott - userid of the person requesting the resource, as determined by HTTP authentication;
  • [10/Dec/2019:13:55:36 -0700] - date and time the request was received (the final number is the time zone offset from GMT);
  • "GET /server-status HTTP/1.1" - the request line: HTTP method, requested resource, and protocol version;
  • 200 - HTTP response status code;
  • 2326 - size of the object returned to the client, in bytes, not including the response headers.
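Because every field sits in a fixed position, a CLF record can be split with a single regular expression. The following Python sketch is one way to do it (the pattern and the parse_clf name are illustrative, not an official parser); it also converts the timestamp, status, and size into typed values, and it assumes the request line contains no escaped double quotes.

```python
import re
from datetime import datetime

# One named group per CLF field, in order. Simplified: assumes the quoted
# request line contains no escaped double quotes.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format record into a dict of typed fields."""
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format record: {line!r}")
    fields = match.groupdict()
    # e.g. 10/Dec/2019:13:55:36 -0700; strptime's %b matches month
    # abbreviations in the current locale (English in the default C locale).
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # CLF writes "-" instead of 0 when no body bytes were sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


record = parse_clf(
    '127.0.0.1 - Scott [10/Dec/2019:13:55:36 -0700] "GET /server-status HTTP/1.1" 200 2326'
)
print(record["user"], record["request"], record["status"], record["size"])
```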

Combined Log Format

Another format often used with Apache access logs is the Combined Log Format. It is very similar to the Common Log Format but contains a few extra fields that provide more information for analysis and debugging. A record written in the Combined Log Format looks something like this:

127.0.0.1 - Scott [10/Dec/2019:13:55:36 -0700] "GET /server-status HTTP/1.1" 200 2326 "http://localhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"

As you can see, the first seven fields are identical to those in Common Log Format. The remaining fields represent two additional properties:

  • "http://localhost/" - This is the HTTP referer, the address of the page that the client reports having linked it to the requested resource.
  • "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" - This is the User Agent, which identifies the browser (or other client software) that the client reports using to access the resource.
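The Combined Log Format only appends two quoted fields, so the same regular-expression approach extends naturally. In this hypothetical Python sketch, parse_combined returns None for lines that lack the referer and User-Agent fields (such as plain CLF records) rather than raising an error.

```python
import re

# CLF fields followed by the quoted Referer and User-Agent fields.
# Simplified: assumes no escaped double quotes inside any quoted field.
COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_combined(line):
    """Return the fields of a Combined Log Format record as a dict, or None."""
    match = COMBINED_PATTERN.match(line)
    return match.groupdict() if match else None


line = (
    '127.0.0.1 - Scott [10/Dec/2019:13:55:36 -0700] "GET /server-status HTTP/1.1" '
    '200 2326 "http://localhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"'
)
record = parse_combined(line)
print(record["referer"])
print(record["user_agent"])
```

Returning None keeps the caller in control: a script reading a mixed log can simply skip or count records that are not in the combined format.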

The “CustomLog” Directive

Earlier, I mentioned that the configuration for Apache access logs is done via the CustomLog directive within an Apache HTTP server configuration file. Let’s take a look at a sample access log configuration to show the flexibility provided by the CustomLog directive:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

CustomLog /var/log/apache2/access.log combined

Here, we defined the combined log format with the LogFormat directive and then used the CustomLog directive to set the location and format (combined) of the access log. Note that this definition uses %O (bytes sent, including headers, which requires mod_logio), whereas the example in the Apache documentation uses %b (response size excluding headers), so the size field may not match the Common Log Format sample above exactly. As you can see, changing the location or format of the access log is straightforward. The CustomLog directive also provides several other capabilities, described below.
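Each %-prefixed token in a LogFormat string is a directive that Apache replaces with a value from the request or response. The Python sketch below splits a format string into those tokens and attaches a short description. The descriptions are paraphrased from the mod_log_config documentation, the token pattern is simplified (it ignores status-code conditions and %%), and the explain helper is hypothetical.

```python
import re

# Short descriptions of the directives used in the "combined" LogFormat above,
# paraphrased from the mod_log_config documentation (not an exhaustive list).
DIRECTIVES = {
    "%h": "remote host (client IP address unless HostnameLookups is on)",
    "%l": "remote logname from identd (RFC 1413), usually '-'",
    "%u": "remote user from HTTP authentication",
    "%t": "time the request was received",
    "%r": "first line of the request",
    "%>s": "final HTTP status code",
    "%O": "bytes sent, including headers (requires mod_logio)",
    "%b": "response size in bytes, excluding headers ('-' if none)",
}

# Simplified token pattern: '%', an optional '<' or '>', an optional
# {parameter}, then a single letter. Status-code conditions and '%%' are ignored.
TOKEN = re.compile(r"%[<>]?(?:\{([^}]*)\})?[A-Za-z]")


def explain(log_format):
    """Return (token, description) pairs for each directive in log_format."""
    pairs = []
    for match in TOKEN.finditer(log_format):
        token, parameter = match.group(0), match.group(1)
        if parameter and token.endswith("i"):
            pairs.append((token, f"request header {parameter}"))
        else:
            pairs.append((token, DIRECTIVES.get(token, "see the mod_log_config docs")))
    return pairs


combined = r'%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'
for token, meaning in explain(combined):
    print(f"{token:16} {meaning}")
```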

Multiple Access Logs

There is no rule against configuring multiple access logs for your Apache HTTP server, and the process is easy: add more CustomLog directives, each with its own file and format. In the following configuration, a second log records only the User-Agent of each request:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

LogFormat "%{User-agent}i" agent

CustomLog /var/log/apache2/access.log combined

CustomLog /var/log/apache2/agent_access.log agent
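One practical benefit of a dedicated log like agent_access.log is that it needs almost no parsing: each line is a single User-Agent string. As a minimal sketch (the top_agents helper is hypothetical), counting the most common clients takes only a few lines of Python:

```python
from collections import Counter


def top_agents(lines, n=5):
    """Return the n most common User-Agent strings.

    Expects one User-Agent per line, as written by the "agent" format
    (LogFormat "%{User-agent}i" agent); blank lines are skipped.
    """
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return counts.most_common(n)


# Usage against the log configured above (adjust the path for your system):
# with open("/var/log/apache2/agent_access.log", encoding="utf-8", errors="replace") as f:
#     for agent, hits in top_agents(f):
#         print(hits, agent)
```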

Conditional Logs

It's also possible to write to access logs conditionally. This can be useful for a variety of reasons, such as excluding records associated with particular clients. Typically, this is done by setting environment variables (for example, with the SetEnvIf directive) and referencing them via the env= clause of the CustomLog directive. Visit the official documentation on the CustomLog directive for more information.
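The Apache documentation illustrates conditional logging with mod_setenvif, for example SetEnvIf Request_URI "^/robots\.txt$" dontlog together with CustomLog logs/access_log common env=!dontlog. The Python sketch below only models that decision so the logic is easy to see; it is not how Apache implements it, and the rule list and should_log function are hypothetical.

```python
import re

# Hypothetical rules modeled on the mod_setenvif example in the Apache docs:
# a request matching any rule would get the "dontlog" variable set.
DONTLOG_RULES = [
    ("Remote_Addr", re.compile(r"127\.0\.0\.1")),
    ("Request_URI", re.compile(r"^/robots\.txt$")),
]


def should_log(request):
    """Model `CustomLog ... env=!dontlog`: log unless a rule set dontlog.

    `request` maps attribute names (as used by SetEnvIf) to their values.
    The regexes are searched rather than fully matched, so they are
    unanchored unless they use ^ and $.
    """
    dontlog = any(
        pattern.search(request.get(attribute, ""))
        for attribute, pattern in DONTLOG_RULES
    )
    return not dontlog


print(should_log({"Remote_Addr": "203.0.113.5", "Request_URI": "/index.html"}))
print(should_log({"Remote_Addr": "127.0.0.1", "Request_URI": "/index.html"}))
```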

Log Rotation & Piped Logs

Like anything else on a server, log files take up space, and on a busy Apache server, access logs can grow quickly. It's therefore important to have processes in place for regularly rotating old log files by moving, compressing, or deleting them. Apache HTTP server supports this through graceful restarts and piped logs.

A graceful restart lets Apache restart without losing open or pending client connections. A common approach is to rename the current log files and then perform a graceful restart, after which Apache opens new log files and writes to them without interrupting clients. Because requests still in progress may keep writing to the old files for a while, wait before compressing or deleting the renamed files to save space.

Piped logs, on the other hand, allow log rotation without a server restart. Instead of writing directly to a file, Apache writes access log entries through a pipe to another program. Apache HTTP server includes one such program, rotatelogs, which can rotate logs based on time or file size.
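rotatelogs takes a file name and a rotation interval in seconds; when the file name contains strftime-style % codes, it is used as a template for each new file's name. Assuming rotatelogs' default behavior (intervals aligned to GMT, no -l flag or offset argument), the Python sketch below approximates which file a given moment would be written to. rotated_log_name is a hypothetical helper for reasoning about the resulting file names, not part of Apache.

```python
from datetime import datetime, timezone


def rotated_log_name(template, rotation_seconds, now):
    """Approximate the file rotatelogs would be writing to at time `now`.

    Assumes time-based rotation with rotatelogs' defaults (GMT-based intervals,
    no -l flag, no UTC offset argument): each file starts at a multiple of
    rotation_seconds since the Unix epoch.
    """
    epoch_seconds = int(now.timestamp())
    start = epoch_seconds - epoch_seconds % rotation_seconds
    if "%" in template:
        # Templates containing % codes are expanded with strftime.
        return datetime.fromtimestamp(start, tz=timezone.utc).strftime(template)
    # Otherwise rotatelogs appends the interval's start time in seconds.
    return f"{template}.{start:010d}"


# 13:55:36 -0700 from the sample records, expressed in UTC.
moment = datetime(2019, 12, 10, 20, 55, 36, tzinfo=timezone.utc)
print(rotated_log_name("/var/log/apache2/access.%Y-%m-%d.log", 86400, moment))
print(rotated_log_name("/var/log/apache2/access.log", 3600, moment))
```

A matching configuration would look something like CustomLog "|bin/rotatelogs /var/log/apache2/access.%Y-%m-%d.log 86400" combined, with the path to rotatelogs adjusted for your installation.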

Analyzing Apache Access Logs with Sumo Logic

Collecting massive amounts of log data is only useful if the data can be managed effectively and analyzed easily. Done properly, that analysis produces valuable insights you can use to identify opportunities for improvement within your web server configuration or application. When working with Apache access logs, integrating with Sumo Logic to collect your Apache log files makes producing valuable visualizations less painful than ever.

Getting started is relatively easy: by configuring a Sumo Logic collector and a Local File Source for the Apache access log, you can be up and running in a basic sense in a matter of minutes. Check out Sumo Logic today to see how it can improve your processes for log management and data analysis.

Scott Fitzpatrick

Scott Fitzpatrick is a Fixate IO Contributor and has nearly 8 years of experience in software development. He has worked with many languages and frameworks, including Java, ColdFusion, HTML/CSS, JavaScript and SQL.
