
May 11, 2019
Monitoring the health of a large Apache system is hard. As you add more servers to your infrastructure—whether it be for load balancing, high-availability, or simply separate development/production servers—making sense of your log files becomes increasingly difficult.
A dedicated Apache log analyzer solves this problem by providing a central location for managing logs, as well as built-in monitoring and troubleshooting tools. Instead of relying on custom scripts, it automatically collects logs from all of your servers. And, instead of manually searching the raw text of those logs with grep, you can quickly extract relevant log messages with an intuitive query language.
You can sign up for a free Sumo Logic account to follow along with the example queries and begin centralizing your own Apache log data.
Sumo Logic is designed to collect both access logs and error logs from all of your Apache servers. This makes it possible to analyze your entire Apache infrastructure from a single interface, but it also means you need to understand how to isolate individual servers.
_sourceHost=www.example.com _sourceCategory=Apache/Error | parse regex "\[.*:(?<log_level>\w+)\] .*\] (?<reason>.*)$" | count by reason
The _sourceHost and _sourceCategory fields are metadata that Sumo Logic attaches to each log message as it’s collected. The former lets you identify individual Apache servers, while the latter lets you inspect access logs independently of error logs. You can customize the values for both fields while configuring your sources.
After running the above query, you’ll find all of the error logs from your www.example.com source host in the Messages tab.
The ability to explore logs from a single server is essential for IT operations of any size. If you’re running a small website, it lets you view either development errors or production errors in isolation. For larger organizations, it lets you track the performance of load balancing clusters or diagnose outages in a high-availability cluster.
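To make the parse and count steps of the single-server query more concrete, here is a minimal Python sketch of roughly what they do to each message. It is an illustration only, not how Sumo Logic works internally: the sample lines, the IP addresses, and the `count_by_reason` helper are hypothetical, and the regular expression is the query's pattern rewritten in Python's `(?P<name>...)` named-group syntax.

```python
import re
from collections import Counter

# The query's parse regex, using Python's (?P<name>...) named-group syntax.
ERROR_PATTERN = re.compile(r"\[.*:(?P<log_level>\w+)\] .*\] (?P<reason>.*)$")

# Hypothetical sample lines in the Apache 2.4 error log layout.
SAMPLE_LINES = [
    "[Sat May 11 10:42:29.902022 2019] [core:error] [pid 35708:tid 4328636416] "
    "[client 192.0.2.15] File does not exist: /var/www/html/favicon.ico",
    "[Sat May 11 10:43:01.120000 2019] [core:error] [pid 35708:tid 4328636416] "
    "[client 192.0.2.16] File does not exist: /var/www/html/favicon.ico",
    "[Sat May 11 10:44:10.500000 2019] [authz_core:error] [pid 35709:tid 4328636417] "
    "[client 192.0.2.15] AH01630: client denied by server configuration: /var/www/html/private",
]


def count_by_reason(lines):
    """Extract the trailing message from each line and count identical messages."""
    counts = Counter()
    for line in lines:
        match = ERROR_PATTERN.search(line)
        if match:  # lines that do not fit the pattern are skipped
            counts[match.group("reason")] += 1
    return counts


reason_counts = count_by_reason(SAMPLE_LINES)
for reason, count in reason_counts.most_common():
    print(count, reason)
```

In this sketch the host filter is implicit, because every sample line is assumed to come from one server. In Sumo Logic that filtering is done by the _sourceHost metadata rather than by anything in the log text.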
The _sourceHost field not only lets you isolate logs from individual servers, but also enables metric comparisons across servers. For example, the following query counts the number of system-critical Apache errors in each server over time:
_sourceCategory=Apache/Error | parse regex "\[.*:(?<log_level>[a-z]+)\]" | where log_level in ("emerg", "alert", "crit") | timeslice 5m | count as count by _timeslice, _sourceHost | transpose row _timeslice column _sourceHost
This is similar to the query we ran in Analyzing System-Critical Apache Errors, except it tracks errors on a per-server basis. Visualizing the results as a line chart gives us a real-time, at-a-glance window into our entire Apache infrastructure.
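The shape of the data this query produces can be sketched in plain Python. The example below is a rough approximation under stated assumptions: the records, host names, and the helper functions `timeslice`, `critical_errors_by_host`, and `transpose` are hypothetical stand-ins, and each record carries its own timestamp and source host, the way Sumo Logic's metadata would.

```python
import re
from collections import Counter
from datetime import datetime

LEVEL_PATTERN = re.compile(r"\[.*:(?P<log_level>[a-z]+)\]")
CRITICAL_LEVELS = {"emerg", "alert", "crit"}

# Hypothetical (source host, timestamp, raw message) records.
RECORDS = [
    ("east.example.com", datetime(2019, 5, 11, 10, 1, 0),
     "[Sat May 11 10:01:00.000000 2019] [core:crit] [pid 100:tid 1] sample critical message"),
    ("east.example.com", datetime(2019, 5, 11, 10, 3, 30),
     "[Sat May 11 10:03:30.000000 2019] [core:emerg] [pid 100:tid 1] sample emergency message"),
    ("west.example.com", datetime(2019, 5, 11, 10, 4, 10),
     "[Sat May 11 10:04:10.000000 2019] [core:error] [pid 200:tid 2] sample ordinary error"),
    ("west.example.com", datetime(2019, 5, 11, 10, 7, 45),
     "[Sat May 11 10:07:45.000000 2019] [mpm_event:alert] [pid 200:tid 2] sample alert message"),
]


def timeslice(ts):
    """Round a timestamp down to the start of its 5-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def critical_errors_by_host(records):
    """Count emerg/alert/crit messages per (5-minute slice, host) pair."""
    counts = Counter()
    for host, ts, message in records:
        match = LEVEL_PATTERN.search(message)
        if match and match.group("log_level") in CRITICAL_LEVELS:
            counts[(timeslice(ts), host)] += 1
    return counts


def transpose(counts):
    """Pivot the counts into one row per slice and one column per host."""
    hosts = sorted({host for _, host in counts})
    table = {}
    for (slice_start, host), n in counts.items():
        row = table.setdefault(slice_start, dict.fromkeys(hosts, 0))
        row[host] = n
    return table


table = transpose(critical_errors_by_host(RECORDS))
for slice_start in sorted(table):
    print(slice_start.isoformat(), table[slice_start])
```

The pivoted table, with one row per time slice and one column per server, is the layout a multi-series line chart needs, which is why the query ends with a transpose step.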
In Sumo Logic, the recommended workflow is to set up a real-time alert (requires Sumo Logic Professional) to let you know when a system-critical error occurs. When you receive the alert, you can pull up this panel and immediately determine which server needs attention. No queries, no grep, and no SSH’ing into any of your servers is required.
Altering the previous query to display 500-level status codes from the access log instead of system-critical errors is trivial:
_sourceCategory=Apache/Access | parse regex "[A-Z]+ (?<url>.+) HTTP/1\.1\"\s+(?<status_code>\d+)\s" | where num(status_code) >= 500 | timeslice 5m | count as count by _timeslice, _sourceHost | transpose row _timeslice column _sourceHost
Between these two panels, you can see every error in every server as they’re occurring—regardless of whether you have one Apache server or a thousand. This kind of transparency is simply not feasible when you’re manually sifting through your log files.
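As a rough illustration of the access-log variant, the following Python sketch applies the same pattern to a few hypothetical access log lines and counts 500-level responses per server. The records, host names, and the `server_errors_by_host` helper are assumptions made for the example. The 5-minute bucketing is left out to keep it short.

```python
import re
from collections import Counter

# The query's parse regex in Python's named-group syntax.
ACCESS_PATTERN = re.compile(
    r'[A-Z]+ (?P<url>.+) HTTP/1\.1"\s+(?P<status_code>\d+)\s'
)

# Hypothetical (source host, raw access log line) records.
ACCESS_RECORDS = [
    ("east.example.com",
     '192.0.2.10 - - [11/May/2019:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 5120'),
    ("east.example.com",
     '192.0.2.11 - - [11/May/2019:10:00:05 +0000] "POST /checkout HTTP/1.1" 500 312'),
    ("west.example.com",
     '198.51.100.7 - - [11/May/2019:10:01:12 +0000] "GET /api/items HTTP/1.1" 503 0'),
    ("west.example.com",
     '198.51.100.8 - - [11/May/2019:10:01:40 +0000] "GET /missing HTTP/1.1" 404 209'),
    ("east.example.com",
     '192.0.2.12 - - [11/May/2019:10:02:30 +0000] "GET /report HTTP/1.1" 502 0'),
]


def server_errors_by_host(records):
    """Count responses with a status code of 500 or above for each host."""
    counts = Counter()
    for host, line in records:
        match = ACCESS_PATTERN.search(line)
        # The status code is captured as text, so convert it before comparing.
        if match and int(match.group("status_code")) >= 500:
            counts[host] += 1
    return counts


error_counts = server_errors_by_host(ACCESS_RECORDS)
print(dict(error_counts))
```

One detail worth noting: as written, the pattern only matches requests logged as HTTP/1.1, so requests using other protocol versions would not be parsed by it.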
Sumo Logic includes built-in geolocation support, which can provide unique insights in a multi-server environment. For example, if you have a production server dedicated to East coast users and another one for West coast users, you can get immediate feedback about whether their routing is configured correctly:
_sourceHost=east.example.com _sourceCategory=Apache/Access | parse regex "(?<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})" | lookup latitude, longitude from geo://default on ip = client_ip | count by latitude, longitude | sort _count
The lookup operator converts the client IP address to latitude and longitude coordinates, and the resulting latitude and longitude fields automatically enable Sumo Logic’s map visualization. This generates an interactive map showing visitor locations for the specified _sourceHost (east.example.com).
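For comparison, here is a minimal sketch of the same extract, look up, and count pipeline done by hand in Python. Everything in it is an assumption for illustration: `GEO_TABLE` is a tiny hypothetical stand-in for a real geolocation database, its coordinates are placeholders rather than real geolocation data, and the access log lines are invented.

```python
import re
from collections import Counter

# The query's parse regex in Python's named-group syntax.
IP_PATTERN = re.compile(r"(?P<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

# Hypothetical stand-in for a geolocation database (placeholder coordinates).
GEO_TABLE = {
    "192.0.2.10": (40.7, -74.0),
    "192.0.2.11": (40.7, -74.0),
    "198.51.100.7": (34.1, -118.2),
}

# Hypothetical access log lines collected from a single source host.
ACCESS_LINES = [
    '192.0.2.10 - - [11/May/2019:10:00:01 +0000] "GET / HTTP/1.1" 200 5120',
    '192.0.2.11 - - [11/May/2019:10:00:05 +0000] "GET /about HTTP/1.1" 200 2048',
    '192.0.2.10 - - [11/May/2019:10:00:09 +0000] "GET /contact HTTP/1.1" 200 1024',
    '198.51.100.7 - - [11/May/2019:10:01:12 +0000] "GET / HTTP/1.1" 200 5120',
    '203.0.113.5 - - [11/May/2019:10:01:40 +0000] "GET / HTTP/1.1" 200 5120',
]


def count_by_location(lines, geo_table):
    """Map the first IP address on each line to coordinates and count visits."""
    counts = Counter()
    for line in lines:
        match = IP_PATTERN.search(line)
        if not match:
            continue
        location = geo_table.get(match.group("client_ip"))
        if location is not None:  # addresses missing from the table are skipped
            counts[location] += 1
    return counts


location_counts = count_by_location(ACCESS_LINES, GEO_TABLE)
# Highest counts first, similar in spirit to the query's final sort step.
for (latitude, longitude), visits in location_counts.most_common():
    print(latitude, longitude, visits)
```

Even this simplified version needs a log parser, a geolocation data source, and an aggregation step, and it still stops short of drawing a map.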
The number of separate components required to manually extract this kind of information from Apache access logs can be overwhelming. Sumo Logic makes it possible to do all of this without writing a single line of code.
Analyzing Apache access and error logs can tell you precisely what went wrong in your web server infrastructure. The ability to do this in real time vastly reduces the mean time to resolution for server configuration and web application issues.
The value of a centralized Apache log analytics solution compounds when applied to a multi-server environment. Simply collecting logs from dozens of servers can be a burden, and extracting useful information from them often requires a great deal of technical skill.
The result for many companies is that they simply aren’t data mining their Apache logs. A tool like Sumo Logic ensures you aren’t ignoring the valuable insights in your log data by providing a transparent window into your web server operations.