Back to blog results

November 3, 2015 By Michael Churchman

Log Analysis in a DevOps Environment

Log AnalysisLog analysis is a first-rate debugging tool for DevOps. But if all you’re using it for is finding and preventing trouble, you may be missing some of the major benefits of log analysis. What else can it offer you? Let’s talk about growth.

First of all, not all trouble shows up in the form of bugs or error messages; an “error-free” system can still be operating far below optimal efficiency by a variety of important standards. What is the actual response time from the user’s point of view? Is the program eating up clock cycles with unnecessary operations? Log analysis can help you identify bottlenecks, even when they aren’t yet apparent in day-to-day operations.

Use Cases for Log Analysis

Consider, for example, something as basic as database access. As the number of records grows, access time can slow down, sometimes significantly; there’s nothing new about that. But if the complexity and the number of tables in the database are also increasing, those factors can also slow down retrieval.

If the code that deals with the database is designed for maximum efficiency in all situations, it should handle the increased complexity with a minimum of trouble. The tricky part of that last sentence, however, is the phrase “in all situations”. In practice, most code is designed to be efficient under any conditions which seem reasonable at the time, rather than in perpetuity. A routine that performs an optional check on database records may not present any problem when the number of records is low, or when it only runs occasionally, but it may slow the system down if the number of affected records is too high, or if it is done too frequently. As conditions change, hidden inefficiencies in existing code are likely to make themselves known, particularly if the changes put greater demands on the system.

As inefficiencies of this kind emerge (but before they present obvious problems in performance) they are likely to show up in the system’s logs. As an example, a gradual increase in the time required to open or close a group of records might appear, which gives you a chance to anticipate and prevent any slowdowns that they might cause.

Log analysis can find other kinds of potential bottlenecks as well. For example, intermittent delays in response from a process or an external program can be hard to detect simply by watching overall performance, but they will probably show up in the log files. A single process with significant delays in response time can slow down the whole system. If two process are dependent on each other, and they each have intermittent delays, they can reduce the system’s speed to a crawl or even bring it to a halt. Log analysis should allow you to recognize these delays, as well as the dependencies which can amplify them.

Log Data Analytics – Beyond Ops

Software operation isn’t the only thing that can be made more efficient by log analysis. Consider the amount of time that is spent in meetings simply trying to get everybody on the same page when it comes to discussing technical issues. It’s far too easy to have a prolonged discussion of performance problems and potential solutions without the participants having a clear idea of the current state of the system. One of the easiest ways to bring such a meeting into focus and shorten discussion time is to provide everybody involved with a digest of key items from the logs, showing the current state of the system and highlighting problem areas.

Log analysis can also be a major aid to overall planning by providing detailed picture of how the system actually performs. It can help you map out the parts of the system are the most sensitive to changes in the performance in other areas, allowing you to avoid making alterations which are likely to degrade performance. It can also reveal unanticipated dependencies, as well as suggesting potential shortcuts in the flow of data.

Understanding Scalability via Log Analysis

One of the most important things that log analysis can do in terms of growth is to help you understand how the system is likely to perform as it scales up. When you know the time required to perform a particular operation on 100,000 records, you can roughly calculate the time required to do the same operation with 10,000,000 records. This in turn allows you to consider whether the code that performs the operation will be adequate at a larger scale, or whether you will need to look at a new strategy for producing the same results.

Observability and Baseline Metrics

A log analysis system that lets you establish a baseline and observe changes to metrics in relation to that baseline is of course extremely valuable for troubleshooting, but it can also be a major aid to growth. Rapid notification of changes in metrics gives you a real-time window into the way that the system responds to new conditions, and it allows you to detect potential sensitivities which might otherwise go unnoticed. In a similar vein, a system with superior anomaly detection features will make it much easier to pinpoint potential bottlenecks and delayed-response cascades by alerting you to the kinds of unusual events which are often signatures of such problems.

All of these things — detecting bottlenecks and intermittent delays, as well as other anomalies which may signal future trouble, anticipating changes in performance as a result of changes in scale, recognizing inefficiencies — will help you turn your software (and your organization) into the kind of lean, clean system which is so often necessary for growth. And all of these things can, surprisingly enough, come from something as simple as good, intelligent, thoughtful log analysis.

Complete visibility for DevSecOps

Reduce downtime and move from reactive to proactive monitoring.

Michael Churchman

Michael Churchman

Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the ‘90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues. He is a regular Fixate.io contributor.

More posts by Michael Churchman.

People who read this also enjoyed