July 31, 2017 By Twain Taylor

Machine Learning and Log Analysis

Log analysis involves large volumes of data. A startup can generate a couple of gigabytes of log data per day, while the largest web companies, like Facebook, log many terabytes daily. (Back in 2009, Facebook was already logging a full 25TB per day.)

How can you make sense of all that data? If you try to analyze it manually, without any tools, good luck. Today, the only way to stay on top of the massive amounts of data your organization produces, whether you’re a small startup or a Fortune 500 company, is to leverage machine learning to interpret log data in an automated fashion.

Manual log analysis is a futile effort

Most log lines are not interesting and don’t carry much value. Reading through every line of log data manually is not feasible; you’re searching for a needle in a haystack. Typically, you search for keywords like error, failed, or abort. If you’re lucky, you may find an occasional issue this way, but you will most likely miss important events that don’t match the keywords you searched for.
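Keyword-based triage is easy to sketch, which is exactly the problem. A hypothetical example like this (the log lines and keyword list are invented for illustration) flags lines containing known failure words and silently misses everything else, such as a slow request with no error text:

```python
# Naive keyword-based log triage: flags only lines containing known
# failure keywords, missing any anomaly phrased differently.
KEYWORDS = ("error", "failed", "abort")

def keyword_scan(lines):
    """Return (line_number, line) pairs that match any keyword."""
    return [
        (i, line)
        for i, line in enumerate(lines, start=1)
        if any(kw in line.lower() for kw in KEYWORDS)
    ]

logs = [
    "2017-07-31 12:00:01 INFO request served in 20ms",
    "2017-07-31 12:00:02 ERROR connection refused",
    "2017-07-31 12:00:03 INFO request served in 9000ms",  # slow, but no keyword: missed
]
hits = keyword_scan(logs)
```

Only the second line is flagged; the pathologically slow request sails through, which is precisely the blind spot described above.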

Manual log analysis also depends on the expertise of the person doing it. Someone with a deep understanding of the system, who knows what has changed most recently and how every part of the system normally behaves, may make some headway reviewing logs manually. However, this is a serious limitation for a DevOps team: it leaves the team at the mercy of one superhero, and if for some reason that person isn’t available, or can’t resolve the issue, the entire operation is in jeopardy.

Manual log analysis may be feasible in a development or QA environment. In production, however, it is unmanageable. In production, access to sensitive log data is restricted, and admins can’t jump from server to server within a large environment in order to grep log data.

Machine learning is the answer

The solution is not to train humans to read logs, but to train algorithms to do so. Computers have long proven better than humans at crunching large volumes of data, doing it both faster and with higher accuracy.

Computers have beaten humans at numerous games. The same ability to analyze data accurately and at great speed is making machines capable of driving cars, recognizing images, operating entire factories, and detecting cyber threats. With such varied uses, it’s no surprise that log analysis is also being redefined by machine learning.

Types of machine learning algorithms

There are two types of machine learning algorithms—supervised and unsupervised. Supervised algorithms are presented with input data that’s labeled. Their job is to imitate the labeling when fed with similar new data. Unsupervised algorithms are fed with data that’s not labeled, and are expected to group the data into clusters. This way, they can identify which types of data are within the normal range, and which don’t fall into any existing cluster and are an anomaly.
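As a rough sketch of the distinction (all latency values, labels, and thresholds here are invented for illustration), a supervised model learns a decision boundary from labeled examples and imitates the labeling on new data, while an unsupervised one flags points that sit far from the bulk of unlabeled data:

```python
from statistics import mean, median

# Supervised: labels are supplied up front; the model imitates the labeling.
labeled = [(20, "normal"), (25, "normal"), (900, "anomaly"), (950, "anomaly")]
boundary = mean([x for x, _ in labeled])  # crude learned threshold

def classify(latency_ms):
    """Label new data the way the training labels would."""
    return "anomaly" if latency_ms > boundary else "normal"

# Unsupervised: no labels; points far from the bulk of the data are anomalies.
unlabeled = [18, 22, 24, 21, 870]
center = median(unlabeled)  # robust estimate of the main cluster's center
CUTOFF = 100  # invented distance threshold for illustration
outliers = [x for x in unlabeled if abs(x - center) > CUTOFF]
```

Real tools use far more sophisticated models, but the shape of the problem is the same: supervised learning reproduces known labels, unsupervised learning discovers which points don’t belong to any cluster.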

Log analysis uses both kinds of techniques. Supervised techniques classify data: the input is the raw logs, and the output is a decision about whether the log data is in the normal range or is an anomaly. Unsupervised techniques, meanwhile, take in large quantities of unstructured data and cluster it into meaningful groups; any data points that fall outside the regular clusters are considered suspicious.
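One simple way to cluster unstructured log lines is to reduce each line to a template by masking volatile fields, then group by template; templates that occur rarely fall outside the regular clusters. This sketch (the masking rule and the one-occurrence cutoff are simplifications of what production tools do) illustrates the idea:

```python
import re
from collections import Counter

def template(line):
    """Mask volatile fields (numbers) so similar lines share one template."""
    return re.sub(r"\d+", "<N>", line)

logs = [
    "served request 101 in 20 ms",
    "served request 102 in 23 ms",
    "served request 103 in 19 ms",
    "segfault in worker 7",
]
clusters = Counter(template(line) for line in logs)

# Templates seen only once don't belong to any regular cluster: suspicious.
suspicious = [t for t, count in clusters.items() if count == 1]
```

The three request lines collapse into one large cluster, while the lone segfault line stands out as the anomaly worth a human’s attention.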

Statistical analysis looks at changes in the system and assigns each change a likelihood of being abnormal. This way, normal changes are ignored, while sudden, unexpected changes are immediately identified and reported.

Another powerful use case for machine learning algorithms is predicting the possible outcome of an attack or an incident. For example, if a cluster of servers fails, the algorithm can estimate the probability of related services being affected, giving you time to line up backups for those services. This puts you ahead of the curve, and it’s something neither humans nor traditional log analysis tools are capable of.
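A toy sketch of the idea (the service names, dependency map, and probabilities are entirely hypothetical): given which services have already failed, propagate estimated failure probabilities through a dependency map to see what is likely to be affected next:

```python
# Hypothetical dependency map: service -> probability each dependent service
# degrades when it fails. Values are illustrative, not measured.
IMPACT = {
    "db-cluster": {"auth-service": 0.9, "billing": 0.7},
    "auth-service": {"web-frontend": 0.8},
}

def predict_impact(failed, threshold=0.5):
    """Return services whose estimated failure probability exceeds threshold."""
    prob = {s: 1.0 for s in failed}
    changed = True
    while changed:  # propagate until probability estimates stabilize
        changed = False
        for svc, deps in IMPACT.items():
            for dep, p in deps.items():
                est = prob.get(svc, 0.0) * p
                if est > prob.get(dep, 0.0):
                    prob[dep] = est
                    changed = True
    return sorted(s for s, p in prob.items() if p >= threshold and s not in failed)

at_risk = predict_impact({"db-cluster"})
```

With the database cluster down, the sketch predicts the auth service, billing, and (transitively) the web frontend are at risk, which is the window of advance warning the paragraph above describes.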

Machine learning in log analysis

Many log analytics tools today train machine learning algorithms to analyze logs. A log analysis service has a big advantage over any organization doing this in-house: it simply has far more data to learn from.

Along with data, these algorithms need a robust infrastructure: big data processing frameworks like Hadoop and Spark, databases like Cassandra, programming languages like Scala, and heavy-duty hardware that provides the compute and memory the task requires.

Conclusion

Just as with an algorithm that learns to play chess or drive a car, the more data you feed it, the smarter it becomes. But along with data, it also needs the right mix of machine learning algorithms, supporting technologies, and powerful infrastructure. That’s what today’s breed of log analytics tools is enabling.

Twain Taylor

Twain Taylor is a member of the Sumo Logic Community. Twain began his career at Google, where, among other things, he was involved in technical support for the AdWords team. His work involved reviewing stack traces, resolving issues affecting both customers and the Support team, and handling escalations. Later, he built branded social media applications and automation scripts to help startups better manage their marketing operations. Today, as a technology journalist, he helps IT magazines and startups change the way teams build and ship applications.
