Log analysis involves large volumes of data. Startups can generate a couple of gigabytes of data per day, and the largest web companies, like Facebook, log many terabytes every day. (Back in 2009, Facebook was already logging a full 25TB daily.)
How can you make sense of all that data? If you try to analyze it manually, without any tools, good luck. Today, the only way to stay on top of the massive amounts of data that your organization produces, whether you’re a small startup or a Fortune 500 company, is to leverage machine learning to help interpret log data in an automated fashion.
Manual log analysis is a futile effort
Most logs are not interesting, and don’t have much value. Reading through every line of log data manually is not feasible. You’re typically searching for a needle in a haystack. You search for keywords like error, failed, or abort. If you’re lucky, you may find an occasional issue this way, but most likely you’ll miss important events that don’t contain any of the keywords you searched for.
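To make the keyword problem concrete, here is a minimal sketch in Python. The log lines are invented for illustration, and `keyword_hits` is a hypothetical helper, not part of any real tool. It behaves roughly like a case-insensitive grep for the three keywords mentioned above.

```python
KEYWORDS = ("error", "failed", "abort")

def keyword_hits(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword, ignoring case."""
    return [ln for ln in lines if any(k in ln.lower() for k in keywords)]

# Hypothetical log lines, invented for this example.
sample_logs = [
    "2024-01-01 10:00:01 INFO user login ok",
    "2024-01-01 10:00:02 ERROR payment failed for order 1182",
    "2024-01-01 10:00:03 INFO checkout latency 9500ms",
    "2024-01-01 10:00:04 WARN disk usage at 97%",
]

hits = keyword_hits(sample_logs)
# Everything the keyword search did not surface.
missed = [ln for ln in sample_logs if ln not in hits]

for ln in missed:
    print("not matched:", ln)
```

The search surfaces the payment failure, but the very slow checkout and the nearly full disk pass through unnoticed because neither line contains one of the chosen keywords.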
Manual log analysis depends on the expertise of the person doing the analysis. If they have a deep understanding of the system, know what’s been changed most recently, and know how every part of the system behaves normally, they may make some headway reviewing logs manually. However, this is a serious limitation for a DevOps team. It puts the team at the mercy of one superhero, and if for some reason that person isn’t available, or isn’t able to resolve the issue, the entire operation is in jeopardy.
Manual log analysis may be feasible in a development or QA environment. In production, however, it is unmanageable. In production, access to sensitive log data is restricted, and admins can’t jump from server to server within a large environment in order to grep log data.
Machine learning is the answer
The solution is not to train humans to read logs, but to train algorithms to do so. Computers have consistently proven able to crunch large volumes of data much faster than humans, and with higher accuracy.
Computers have proven able to beat humans at numerous games. This ability to analyze data accurately at great speed is making machines capable of driving cars, recognizing images, operating entire factories, and detecting cyber threats. With such varied uses, it’s no surprise that log analysis is also being redefined by machine learning.
Types of machine learning algorithms
There are two main types of machine learning algorithms: supervised and unsupervised. Supervised algorithms are presented with input data that’s labeled. Their job is to imitate the labeling when fed with similar new data. Unsupervised algorithms are fed with data that’s not labeled, and are expected to group the data into clusters. This way, they can identify which types of data are within the normal range, and which don’t fall into any existing cluster and are therefore anomalies.
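As a rough illustration of the supervised case, the sketch below trains a tiny Naive Bayes classifier on labeled log lines and then labels new ones. The training lines, the labels "normal" and "anomaly", and the function names are all assumptions made up for this example. A real tool would train on far more data and likely use a more capable model.

```python
import math
from collections import Counter, defaultdict

def train(labeled_lines):
    """Count tokens per label from (line, label) pairs."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    for line, label in labeled_lines:
        label_counts[label] += 1
        token_counts[label].update(line.lower().split())
    return token_counts, label_counts

def classify(line, token_counts, label_counts):
    """Return the label with the highest Laplace-smoothed log-probability."""
    vocab = {tok for counts in token_counts.values() for tok in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        counts = token_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n / total)  # prior for this label
        for tok in line.lower().split():
            score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled training data, invented for this example.
training = [
    ("user login ok", "normal"),
    ("request served in 20ms", "normal"),
    ("cache refreshed ok", "normal"),
    ("connection refused by database", "anomaly"),
    ("request timed out after retry", "anomaly"),
    ("database connection lost", "anomaly"),
]

token_counts, label_counts = train(training)
print(classify("database connection refused", token_counts, label_counts))
print(classify("user login ok", token_counts, label_counts))
```

The classifier only imitates the labels it was given, which is the defining trait of the supervised approach: its quality depends entirely on how well the labeled examples cover what the system actually logs.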
Log analysis uses a variety of machine learning techniques. It uses supervised techniques to classify data. The input data is the raw logs, and the output is a decision on whether the log data is in the normal range or contains an anomaly. It also uses unsupervised techniques: algorithms that perform log analysis should be able to take in large quantities of unstructured data and cluster it into meaningful groups. Any data points that fall outside the regular clusters are considered suspicious.
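The grouping idea can be sketched without any labels at all. The example below is a deliberately simple stand-in for a real clustering algorithm: it masks numbers so that lines differing only in their variable parts share a template, then flags lines whose template is rare. The log lines, the `<N>` placeholder, and the `max_count` cutoff are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

def template_of(line):
    """Mask runs of digits so lines that differ only in numbers match."""
    return re.sub(r"\d+", "<N>", line)

def group_by_template(lines):
    """Group raw lines under their masked template."""
    groups = defaultdict(list)
    for line in lines:
        groups[template_of(line)].append(line)
    return groups

def rare_lines(lines, max_count=1):
    """Return lines whose template occurs at most max_count times."""
    groups = group_by_template(lines)
    return [ln for members in groups.values()
            if len(members) <= max_count
            for ln in members]

# Hypothetical unlabeled log lines, invented for this example.
sample_logs = [
    "GET /cart 200 in 12ms",
    "GET /cart 200 in 15ms",
    "worker 7 heartbeat",
    "GET /cart 200 in 11ms",
    "worker 3 heartbeat",
    "kernel: out of memory, killed process 4411",
]

suspicious = rare_lines(sample_logs)
print(suspicious)
```

The request and heartbeat lines collapse into two well-populated groups, while the out-of-memory line sits alone in its own group and is reported as suspicious. Production tools use far more sophisticated grouping, but the principle of flagging whatever falls outside the regular groups is the same.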
Statistical analysis looks at changes in the system and assigns a likelihood of any particular change being abnormal. This way, normal changes are ignored, and sudden unexpected changes are immediately identified and reported.
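One common way to put a number on "how abnormal is this change" is a z-score against recent history. The sketch below assumes the metric is an error count per minute, that the history is roughly normally distributed, and that a threshold of 3 standard deviations is a reasonable cutoff. All three are illustrative assumptions, not a recommendation from any particular tool.

```python
import math
from statistics import mean, stdev

def z_score(history, value):
    """Distance of value from the historical mean, in standard deviations."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historical points
    if sigma == 0:
        return 0.0 if value == mu else math.inf
    return (value - mu) / sigma

def tail_probability(history, value):
    """Two-sided probability of a deviation this large, assuming normality."""
    return math.erfc(abs(z_score(history, value)) / math.sqrt(2))

def is_abnormal(history, value, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(z_score(history, value)) > threshold

# Hypothetical errors-per-minute counts, invented for this example.
history = [4, 5, 6, 5, 4, 6, 5, 5]

for observed in (6, 30):
    print(observed, is_abnormal(history, observed))
```

A count of 6 is within the usual fluctuation and is ignored, while a jump to 30 is many standard deviations away from the mean and is flagged.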
Another powerful use case for machine learning algorithms is to predict the possible outcome of an attack or an incident. For example, if a cluster of servers fails, the algorithm could analyze the probability of related services being affected, and give you time to find a backup for those services. This would put you ahead of the curve, and is something humans or traditional log analysis tools aren’t capable of doing.
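A very simple version of this kind of estimate can be built from incident history alone. The sketch below counts how often each service was affected in past incidents involving a given component and turns that into a frequency-based probability. The component names, service names, and incident records are hypothetical, and a plain frequency count is a much cruder technique than the predictive models a real tool would use.

```python
from collections import Counter

def impact_probabilities(incidents, failed_component):
    """Estimate P(service affected | component failed) from past incidents."""
    relevant = [affected for failed, affected in incidents
                if failed == failed_component]
    if not relevant:
        return {}  # no history for this component
    counts = Counter(svc for affected in relevant for svc in set(affected))
    return {svc: n / len(relevant) for svc, n in counts.items()}

# Hypothetical incident history: (failed component, services affected).
incidents = [
    ("db-cluster", ["checkout", "search"]),
    ("db-cluster", ["checkout"]),
    ("cache", ["search"]),
    ("db-cluster", ["checkout", "reports"]),
    ("db-cluster", ["checkout", "search"]),
]

probs = impact_probabilities(incidents, "db-cluster")
for service, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{service}: {p:.2f}")
```

With this history, a failure of the database cluster has always affected checkout, affected search half of the time, and affected reports a quarter of the time, which suggests where to line up a backup first.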
Machine learning in log analysis
Many log analytics tools today train machine learning algorithms to analyze logs. A log analysis service has a big advantage over any organization doing this in-house because it possesses more data.
Along with data, these algorithms need a robust stack made up of leading big data analytics tools like Hadoop and Spark, databases like Cassandra, a programming language like Scala, and heavy-duty hardware that provides the compute and memory the task requires.
Just as with an algorithm that learns to play chess or drive a car, the more data you feed it, the smarter it becomes. But along with data, it also needs the right mix of machine learning algorithms, supporting technologies, and powerful infrastructure. That’s what today’s breed of log analytics tools is enabling.