How to Analyze Apache Error Logs | Sumo Logic

# Apache Error Log Analysis

Gain Deep Insight into Your Apache Server Environment


Your Apache access and error logs contain a wealth of actionable insights about potential server configuration and web application issues. The problem is, this information is hidden within millions of log messages. The goal of Apache log analytics is to efficiently extract these insights so you can respond to problems before they impact your users.

Apache log analysis revolves around two activities: monitoring and troubleshooting.

First, you need to track key performance indicators in real-time dashboards so you can identify abnormal behavior as it’s happening. Then, when these dashboards indicate that something has gone wrong, you need a powerful query language to dig deeper into relevant log messages.

Together, the monitoring and troubleshooting features of an Apache log analyzer result in faster root cause analysis, increased uptime, and fewer headaches.


To follow along with the example queries, you can sign up for a free Sumo Logic account. Sumo Logic provides all the search, aggregation, and visualization tools you need to quickly identify the root cause of your website’s Apache errors.

## Apache System-Critical Error Log Analysis

Apache error log analysis makes it easier to monitor problems in real time and troubleshoot critical issues when they occur. Your server’s error logs contain all the information you need to do these things, but extracting useful insights from millions of log entries can be tricky without a dedicated tool.

## Isolating Apache System-Critical Error Logs

Depending on your LogLevel directive, Apache error logs can contain verbose details about the inner workings of your servers. A good place to start your error log analysis is to strip away this noise by isolating serious errors.

In Sumo Logic, you can extract emergency-, alert-, and critical-level error messages with the following query:

_sourceCategory=Apache/Error
| parse regex "\[.*:(?<log_level>[a-z]+)\]"
| where log_level in ("emerg", "alert", "crit")


Sumo Logic is designed to record all of your log data, which is why we need to select Apache error logs with the _sourceCategory metadata field. Also note the regular expression that parses the log_level assumes the default Apache 2.4 error log format.

Running this query will list all of the matching error log entries in the Messages tab, as shown above. This gives you a lot of debugging info, but it’s nothing you couldn’t find with a text editor. The real power of Apache log analytics is the ability to aggregate and visualize these error logs.
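To see what the log-level filter is doing, the parsing step can be sketched outside Sumo Logic. The Python snippet below is only an illustration: it applies the same pattern, written with Python's named-group syntax, to two made-up lines in the default Apache 2.4 error log format. The timestamps, process IDs, and messages are assumptions, not real server output.

```python
import re

# Same pattern as the Sumo Logic query, using Python's (?P<name>...) syntax.
LOG_LEVEL_RE = re.compile(r"\[.*:(?P<log_level>[a-z]+)\]")
CRITICAL_LEVELS = {"emerg", "alert", "crit"}


def is_system_critical(line: str) -> bool:
    """Return True if the line's log level is emerg, alert, or crit."""
    match = LOG_LEVEL_RE.search(line)
    return bool(match) and match.group("log_level") in CRITICAL_LEVELS


# Hypothetical sample lines (not taken from a real server).
sample_lines = [
    "[Mon Feb 01 10:15:02.123456 2016] [core:crit] [pid 2112] Example critical failure",
    "[Mon Feb 01 10:15:03.654321 2016] [core:error] [pid 2112] "
    "[client 203.0.113.7:51234] File does not exist: /var/www/favicon.ico",
]

critical = [line for line in sample_lines if is_system_critical(line)]
print(len(critical), "system-critical line(s) found")
```

Only the first sample line survives the filter, because the second one is logged at the error level.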

## Monitoring System-Critical Errors in Apache

To stay on top of system-critical errors, we can set up a live panel that displays the number of errors in real time. First, we need to group logs into 5-minute intervals with the timeslice operator. This lets us count the total logs in each group with the count operator:

_sourceCategory=Apache/Error
| parse regex "\[.*:(?<log_level>[a-z]+)\]"
| where log_level in ("emerg", "alert", "crit")
| timeslice 5m
| count by _timeslice


Visualizing the results as an area chart gives us a clear picture of how many errors our Apache system is generating. We can then save this chart as a panel by adding it to a dashboard. Sumo Logic periodically re-executes the underlying query and updates the panel automatically.
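The grouping step is easy to reason about with a small sketch. The Python snippet below is a rough illustration of what timeslicing and counting amount to, not a description of how Sumo Logic implements them. The `timeslice` helper and the timestamps are hypothetical, and the rounding shown only works for slice sizes that divide an hour evenly.

```python
from collections import Counter
from datetime import datetime, timedelta


def timeslice(ts: datetime, minutes: int = 5) -> datetime:
    """Round a timestamp down to the start of its N-minute bucket."""
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


# Hypothetical timestamps of system-critical errors.
error_times = [
    datetime(2016, 2, 1, 10, 1, 30),
    datetime(2016, 2, 1, 10, 4, 59),
    datetime(2016, 2, 1, 10, 5, 0),
    datetime(2016, 2, 1, 10, 12, 45),
]

# Equivalent in spirit to "timeslice 5m | count by _timeslice".
counts = Counter(timeslice(ts) for ts in error_times)
for bucket in sorted(counts):
    print(bucket.strftime("%H:%M"), counts[bucket])
```

Each printed row corresponds to one point on the area chart: the start of a 5-minute slice and the number of errors that fell inside it.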

This kind of real-time window into your Apache servers is the perfect complement to continuous integration environments. If an update to your web application causes serious problems, this panel will let you know immediately. You can then roll back the update and fix those issues before they affect too many of your visitors.

## Monitoring Apache Error Reasons

Knowing how many errors are occurring is a great first step towards making sense of our error logs, but it’s also useful to know what kinds of errors are occurring. Using the exact same process, we can create another panel that displays the most common error reasons. First, we need to form a query that extracts the information we’re after:

_sourceCategory=Apache/Error
| parse regex "\[.*:(?<log_level>\w+)\] .*\] (?<reason>.*)$"
| where log_level in ("emerg", "alert", "crit")
| count reason
| sort _count
| limit 10

Then we can save the resulting table as a live panel. The idea is to build up dashboards that contain all the metrics you’ll need when your system crashes and you have to switch into troubleshooting mode.

## Identifying Malicious Client IPs in Apache Logs

Real-time monitoring lets you know that errors are occurring, but you also need to understand why they’re occurring. After your dashboards tell you that something has gone wrong, the next step is to look for more specific information with custom queries. This is the troubleshooting aspect of Apache log analytics.

We already know the most common error reasons from our panel in the previous section. Now, it’s time to ask deeper questions like who is causing system-critical errors:

_sourceCategory=Apache/Error
| parse regex "(?<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| parse regex "\[.*:(?<log_level>[a-z]+)\]"
| where client_ip !="" AND log_level in ("emerg", "alert", "crit")
| count client_ip
| top 10 client_ip by _count

Any users causing a disproportionate number of errors will be immediately apparent after visualizing the results as a pie chart. If you do happen to find that a particular user or IP range is crashing your system, you can block those addresses with a deny from directive in your .htaccess file:

order allow,deny
deny from 142.181.34.9
allow from all

## Identifying Apache Server Starts/Stops

Troubleshooting often involves looking for specific kinds of errors. For example, the data in your panels might suggest that a server is rebooting too often. You can perform a custom query to get a clearer picture of server start and stop events:

_sourceCategory=Apache/Error
| parse regex ".*\] (?<reason>.*)$"
| if(reason matches "*caught SIGTERM, shutting down", 1, 0) as server_stop
| if(reason matches "*-- resuming normal operations", 1, 0) as server_start
| timeslice by 5m
| sum(server_stop) as server_stops, sum(server_start) as server_starts by _timeslice


This inspects each log entry and looks for the specific messages that Apache generates every time it starts or stops. Any abnormal behavior is easy to see after graphing the results as a stacked column chart:

But this is only the beginning of the troubleshooting process. To find the root cause of the restart events, you’ll need to perform more custom queries around the time frames indicated by the above results.
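The start/stop detection boils down to wildcard matching on the reason text. The Python sketch below imitates it with `fnmatchcase` from the standard library. The sample reasons are illustrative rather than captured from a real server, and both patterns begin with a wildcard so that a message-code prefix such as `AH00169:` does not prevent a match; that choice is an assumption of this sketch.

```python
from fnmatch import fnmatchcase

# Wildcard patterns for the messages Apache writes on shutdown and startup.
STOP_PATTERN = "*caught SIGTERM, shutting down"
START_PATTERN = "*-- resuming normal operations"


def count_restarts(reasons):
    """Return (server_stops, server_starts) for a list of reason strings."""
    stops = sum(fnmatchcase(reason, STOP_PATTERN) for reason in reasons)
    starts = sum(fnmatchcase(reason, START_PATTERN) for reason in reasons)
    return stops, starts


# Hypothetical reason strings, as extracted by the parse step.
reasons = [
    "AH00169: caught SIGTERM, shutting down",
    "AH00163: Apache/2.4.18 (Ubuntu) configured -- resuming normal operations",
    "AH00094: Command line: '/usr/sbin/apache2'",
    "AH00169: caught SIGTERM, shutting down",
]

server_stops, server_starts = count_restarts(reasons)
print("stops:", server_stops, "starts:", server_starts)
```

Grouping the same tallies by timeslice, as the query does, would show whether restarts cluster around particular times.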

The actual queries involved in Apache log analytics aren’t generally all that complicated. The hard part is figuring out which questions to ask and how to find the answer in your log data. As we saw in this section, effective analysis requires intimate knowledge of the log messages produced by your server.

A good way to approach error log analysis is peacetime preparation followed by wartime troubleshooting. During peacetime, you’re getting ready for when things go wrong by configuring panels that contain all the metrics you’re interested in. During wartime, these panels guide your debugging efforts and help you write custom queries that identify the root cause of the problem.

## Analyzing Apache Status Code Response Errors

Unlike system-critical errors, Apache 400- and 500-level status codes usually relate more to content and linking issues than to problems with your server configuration. In this article, we’ll learn how a dedicated Apache access log analyzer can make it much easier to monitor and troubleshoot status code errors.

## Isolating Apache Status Code Errors

To isolate access logs that contain 400- and 500-level status codes, we need to extract the status code from each log using the parse operator. Then, it’s easy to constrain the query to find status code errors with a where clause:

_sourceCategory=Apache/Access
| parse "HTTP/1.1\" * " as status_code
| where num(status_code) >= 400


_sourceCategory is a metadata field that Sumo Logic attaches to each log message as it’s collected, and Apache/Access is the canonical label for Apache access logs. If you used a different value when setting up your source, be sure to change your query accordingly.

Even for moderately busy websites, Apache servers produce millions of access logs. The first step towards identifying useful trends in all this data is to get rid of logs that we’re not interested in. This allows us to perform calculations with relevant log entries and visualize the results. In turn, this makes it much easier to monitor potential problems than sifting through Apache logs with grep.
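As a rough illustration of the isolation step, the Python sketch below pulls the status code out of a few access log lines and keeps only the 400- and 500-level ones. The sample lines are invented, and the regular expression is an approximation of the parse anchor used in the query. Like that anchor, it only recognizes requests logged as HTTP/1.1.

```python
import re

# Approximates: parse "HTTP/1.1\" * " as status_code
STATUS_RE = re.compile(r'HTTP/1\.1" (?P<status_code>\S+) ')


def status_error(line):
    """Return the status code as an int if it is 400 or above, else None."""
    match = STATUS_RE.search(line)
    if match and match.group("status_code").isdigit():
        code = int(match.group("status_code"))
        if code >= 400:
            return code
    return None


# Hypothetical access log lines in the combined log format.
PREFIX = "203.0.113.7 - - [01/Feb/2016:10:15:02 +0000] "
lines = [
    PREFIX + '"GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    PREFIX + '"GET /about-us HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    PREFIX + '"POST /login.php HTTP/1.1" 500 310 "-" "Mozilla/5.0"',
]

errors = [code for code in map(status_error, lines) if code is not None]
print(errors)
```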

## Monitoring Apache Status Code Errors

For example, we can graph our status code errors over time with the following query:

_sourceCategory=Apache/Access
| parse "HTTP/1.1\" * " as status_code
| where num(status_code) >= 400
| timeslice 5m
| count as count by _timeslice, status_code
| transpose row _timeslice column status_code as *


After adding the timeslice and count operators, Sumo Logic automatically enables its graphing capabilities. All it takes is a few clicks to display these results as a stacked column chart. This gives us an at-a-glance view of every status code error in our Apache system.
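The count and transpose steps reshape the data so that each status code becomes its own series in the chart. A minimal Python sketch of that reshaping, using made-up `(timeslice, status_code)` pairs, might look like this. It illustrates the idea only and makes no claim about how Sumo Logic performs the operation internally.

```python
from collections import Counter, defaultdict

# Hypothetical (timeslice, status_code) pairs left after the where clause.
events = [
    ("10:00", 404), ("10:00", 404), ("10:00", 500),
    ("10:05", 404), ("10:05", 503),
]

# count by _timeslice, status_code
counts = Counter(events)

# transpose: one row per timeslice, one column per status code
columns = sorted({status for _, status in events})
table = defaultdict(dict)
for (slice_label, status), n in counts.items():
    table[slice_label][status] = n

rows = {
    slice_label: [table[slice_label].get(status, 0) for status in columns]
    for slice_label in sorted(table)
}

print("timeslice", columns)
for slice_label, values in rows.items():
    print(slice_label, values)
```

Each row holds one stacked column of the chart, with a zero wherever a status code did not occur in that slice.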

But the monitoring capabilities of Sumo Logic revolve around live dashboards, not custom searches. Dashboards consist of multiple panels that track different key performance indicators (KPIs) in real time. The idea is to save our chart as a panel so we always have a transparent window into our Apache web server’s operations.

We now have a lot of status code information at our fingertips. If a PHP script starts to hang, we’ll see a spike in 500 errors. If a referring site contains broken links, we’ll see 404 errors go up. Even obscure errors like 503s caused by an overloaded server will be readily apparent.

## Monitoring Apache 404 URLs

Configuring live dashboards is all about preparing for when your server breaks. To this end, we should probably include another panel that displays which URLs are generating 404 errors.

_sourceCategory=Apache/Access
| parse regex  "[A-Z]+ (?<url>.+) HTTP/1\.1\" (?<status_code>\d+) "
| where num(status_code)=404
| count as count by url
| sort count 
| limit 10


Just like our other panel, we can save this query in a dashboard so the information is readily accessible.
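For readers who want to check the logic of this panel against a handful of log lines, here is a small Python sketch of the same tally. It reuses the query's regular expression with Python's named-group syntax. The `top_404_urls` function and the sample lines are hypothetical.

```python
import re
from collections import Counter

REQUEST_RE = re.compile(r'[A-Z]+ (?P<url>.+) HTTP/1\.1" (?P<status_code>\d+) ')


def top_404_urls(lines, limit=10):
    """Count 404 responses per URL and return the most frequent ones."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and int(match.group("status_code")) == 404:
            counts[match.group("url")] += 1
    return counts.most_common(limit)


# Hypothetical access log lines in the combined log format.
PREFIX = "203.0.113.7 - - [01/Feb/2016:10:15:02 +0000] "
lines = [
    PREFIX + '"GET /about-us HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    PREFIX + '"GET /about-us HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    PREFIX + '"GET /old-page HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    PREFIX + '"GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

top_urls = top_404_urls(lines)
print(top_urls)
```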

Of course, you’ll likely have more sophisticated dashboards set up for production monitoring, but even these two panels give us a realistic glimpse into the utility of Apache access log analytics. A common scenario might be:

• You see that 404s are spiking in our first panel.
• So, you look at our second panel to see which URLs are causing 404s.
• It turns out one particular URL is causing most of them, which means that it’s time to dig deeper to find the root cause of the 404s.

This is where you switch into troubleshooting mode and start running custom queries that investigate the data in your pre-configured panels.

## Identifying 404 Referrers

Odds are, these 404 errors are coming from a broken link. To figure out where this broken link is, we need to find all the referrers that are pointing people to the missing page. If the URL is /about-us, the following query will do just that:

_sourceCategory=Apache/Access
| parse regex  "[A-Z]+ (?<url>.+) HTTP/1\.1\" (?<status_code>\d+)\s\S+\s\"(?<referrer>\S+)\""
| where num(status_code)=404
| where url matches "*about-us*"
| count as count by referrer
| sort count 
| limit 10


The matches operator recognizes asterisk wildcards, making it easier to search for slugs in a URL. This query then tallies up how many times each referrer sent someone to the missing resource.

If you find external websites in this list, it probably means you changed a URL and forgot to add a redirect. Alternatively, you may find pages from your own site in the results, which could indicate broken internal links or missing media resources.
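The referrer tally can be sketched in Python as well. The snippet below is an illustration under stated assumptions: the log lines and referrer URLs are invented, `referrers_for_404` is a hypothetical helper, and `fnmatchcase` stands in for the asterisk-wildcard behavior of the matches operator.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

REFERRER_RE = re.compile(
    r'[A-Z]+ (?P<url>.+) HTTP/1\.1" (?P<status_code>\d+)\s\S+\s"(?P<referrer>\S+)"'
)


def referrers_for_404(lines, url_pattern, limit=10):
    """Tally referrers of 404 responses whose URL matches a wildcard pattern."""
    counts = Counter()
    for line in lines:
        match = REFERRER_RE.search(line)
        if (
            match
            and match.group("status_code") == "404"
            and fnmatchcase(match.group("url"), url_pattern)
        ):
            counts[match.group("referrer")] += 1
    return counts.most_common(limit)


# Hypothetical access log lines in the combined log format.
PREFIX = "203.0.113.7 - - [01/Feb/2016:10:15:02 +0000] "
lines = [
    PREFIX + '"GET /about-us HTTP/1.1" 404 512 "http://blog.example.org/post" "Mozilla/5.0"',
    PREFIX + '"GET /about-us HTTP/1.1" 404 512 "http://blog.example.org/post" "Mozilla/5.0"',
    PREFIX + '"GET /about-us HTTP/1.1" 404 512 "http://www.example.com/team" "Mozilla/5.0"',
    PREFIX + '"GET /old-page HTTP/1.1" 404 512 "http://www.example.com/team" "Mozilla/5.0"',
]

top_referrers = referrers_for_404(lines, "*about-us*")
print(top_referrers)
```

In this made-up data, an external blog post accounts for most of the broken requests, which is the kind of result that points to a missing redirect.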

This query is also a good demonstration of the separation of concerns involved in Apache log analytics: monitoring vs. troubleshooting. You wouldn’t want to save this query as a live panel, because it’s much too specific to be of use as a monitoring metric.

## Identifying Unusual Behavior with Outliers

A certain number of status code errors is expected based on your traffic volume. It’s important to keep this in mind when searching for atypical behavior because it means we need to replace questions like “Have there been more than a hundred 404 errors?” with “Have 404 errors fallen outside the expected range?”

One way to represent that “expected range” is as a multiple of the standard deviation around a rolling average. This is precisely what the outlier operator was designed to do:

_sourceCategory=Apache/Access
| parse "HTTP/1.1\" * " as status_code
| where num(status_code)=404
| timeslice 5m
| count _timeslice
| outlier _count window=6, threshold=2.5


This calculates the moving average of 404 errors using 6 data points, then detects when the number of 404 errors is beyond 2.5 standard deviations of that average. Graphing the results as a line chart shows both the range and any outliers that were detected:

The outlier operator can be useful in both panels and troubleshooting queries, but it really shines when used in real-time alerts (requires Sumo Logic Professional). It avoids setting static thresholds for the alerts, which often results in false positives when your traffic is volatile or cyclical.
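The idea of a threshold that moves with a rolling average can be shown with a few lines of Python. This is a simplified sketch of the general technique, not Sumo Logic's actual implementation. It compares each value with the mean and population standard deviation of the preceding `window` values, and the 404 counts are made up.

```python
from statistics import mean, pstdev


def find_outliers(counts, window=6, threshold=2.5):
    """Return indices of values more than `threshold` standard deviations
    away from the mean of the preceding `window` values."""
    outliers = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        avg, dev = mean(history), pstdev(history)
        if abs(counts[i] - avg) > threshold * dev:
            outliers.append(i)
    return outliers


# Hypothetical 404 counts per 5-minute timeslice.
counts_404 = [10, 12, 11, 9, 10, 12, 11, 40, 10, 11]

outlier_indices = find_outliers(counts_404)
print(outlier_indices)
```

Only the spike to 40 is flagged. The ordinary fluctuation between 9 and 12 stays inside the expected range, which is the behavior a static threshold struggles to provide when traffic varies.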

## Identify Apache Errors with Sumo Logic

While most status code errors are relatively straightforward to fix, identifying them with real-time visualizations is much more reliable and convenient than manually clicking through every link in your site or inspecting the raw text of your Apache log files.


As a data structure, Apache logs are pretty simple. But as you add more servers for load balancing, high availability, or new development environments, making sense of your log files becomes increasingly difficult. When you have a hundred servers generating millions of log messages, getting to the root cause of an issue is time-consuming and error-prone.

What is needed is a dedicated Apache log analyzer tool to centralize your logs, monitor errors and provide the ability to troubleshoot issues as they occur in real time.

## Apache Error Log Analysis and Sumo Logic

Sumo Logic has built an app to specifically analyze and visualize errors logged by Apache servers. With the Sumo Logic App for Apache, you can:

• Monitor 404 errors
• Identify 404 URLs and referrers
• Set dynamic thresholds to alert on “abnormal” levels of 500-level errors
• Optimize web resources
• Identify misbehaving bots
• Speed up Apache response times

Apache log analytics doesn’t exist in isolation. A tool like Sumo Logic is meant to integrate tightly with the rest of your web development workflow. You’re not just watching 500 errors occur; you’re figuring out why they’re occurring with troubleshooting queries, getting your developers to implement a solution, and verifying that it worked back in your live dashboards.
