Log management is the process of collecting, transmitting, analyzing, storing, and archiving large sets of log data.
What are logs?
Logs are continuous digital records of events generated by every component of your software stack, including (but by no means limited to):
- Cloud infrastructure
- Application infrastructure
- Message Bus (Kafka)
- Load balancers
In other words, logs are everywhere. Every supporting service or component of a modern cloud application logs the actions and events that take place within it. An example of log data from a client machine is shown below. Each line includes a timestamp followed by the hostname, process type, log type, application, action, and TCP socket status.
Dec 10 12:58:39 Client-mac socketfilterfw : openvpn-service: Allow TCP LISTEN (in:0 out:2)
Dec 10 13:01:09 Client-mac socketfilterfw : Dropbox: Allow TCP LISTEN (in:0 out:2)
Dec 11 18:40:03 Client-mac socketfilterfw : Office365Service: Allow TCP CONNECT (in:1 out:0)
Dec 12 17:17:25 Client-mac socketfilterfw : Dropbox: Allow TCP LISTEN (in:0 out:4)
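To make the field layout concrete, here is a minimal Python sketch that splits lines like the four samples above into named fields. The regular expression and the field names (`process`, `socket_state`, `inbound`, `outbound`) are assumptions fitted to these samples only; real firewall logs may carry extra fields that this pattern does not handle.

```python
import re
from typing import Optional

# Pattern fitted to the four sample lines only (an assumption, not a spec).
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<process>\S+) ?: "
    r"(?P<application>[^:]+): "
    r"(?P<action>\w+) TCP (?P<socket_state>\w+) "
    r"\(in:(?P<inbound>\d+) out:(?P<outbound>\d+)\)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Connection counters are more useful as integers than as strings.
    fields["inbound"] = int(fields["inbound"])
    fields["outbound"] = int(fields["outbound"])
    return fields


sample = "Dec 10 12:58:39 Client-mac socketfilterfw : openvpn-service: Allow TCP LISTEN (in:0 out:2)"
print(parse_line(sample))
```

Lines that do not fit the pattern return `None` rather than raising, so a collector built on this sketch could keep the raw text and move on.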
What is log management in DevOps?
Log management in DevOps and site reliability engineering is the process of gathering and analyzing log data from different application systems to monitor and improve performance, identify issues and bugs, and improve security. When the focus is on security, this process is also known as security log management.
Logs are like an x-ray report of the body: they provide visibility into the health of the application and infrastructure stack. Without logs, the internal operations of most IT components would be inscrutable. This lack of visibility creates further operational challenges when modern applications rely on the cloud, on infrastructure they don’t own, and on microservices architectures, where a three-tier architecture is transformed into an n-tier architecture with many-to-many communication between services.
Without that visibility, in the event of a security incident or operational outage, your DevOps, ITOps, and SecOps teams lack the insight needed to resolve the issue quickly. This often leads to higher application latency and more system outages, which translate into poor customer experience and customer churn.
Knowing where to look to pinpoint the problems that cause customer dissatisfaction, application slowdowns, system-wide outages, or security threats is the primary reason logs exist. Each individual log contains a stream of events and includes a wealth of data about software and related infrastructure performance, availability, user access, and behavior. Happily, in today’s tech-driven world, logs are pretty much ubiquitous. By analyzing these logs, one can proactively detect and resolve issues that impact the business.
Log event examples
Some examples of such log events are:
Step 1: Instrument and collect
Install a collector to gather data from any part of your stack. You can collect logs from operating systems, containers, network devices, AWS infrastructure, application access logs, and custom events. Collection can be done using Syslog or by having applications write their logs directly to the centralized log management system over HTTP. Schema on read saves a lot of time because it eliminates the need to pre-parse a log before ingesting it into a log management system.
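As a rough illustration of an application writing logs directly over HTTP, the Python sketch below wraps one raw log line in a POST request using only the standard library. The endpoint URL, the JSON envelope, and the function names are hypothetical; a real log management service defines its own endpoint and payload format.

```python
import json
import urllib.request

# Hypothetical collector endpoint; substitute the URL your log platform issues.
COLLECTOR_URL = "https://collector.example.com/logs"


def build_log_request(raw_line: str, source: str,
                      url: str = COLLECTOR_URL) -> urllib.request.Request:
    """Wrap one raw, unparsed log line in an HTTP POST request."""
    payload = json.dumps({"source": source, "message": raw_line}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_log(raw_line: str, source: str, url: str = COLLECTOR_URL) -> int:
    """Send the line and return the HTTP status code (performs network I/O)."""
    request = build_log_request(raw_line, source, url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Building the request does not touch the network, so it is safe to inspect.
request = build_log_request(
    "Dec 10 13:01:09 Client-mac socketfilterfw : Dropbox: Allow TCP LISTEN (in:0 out:2)",
    source="client-mac",
)
print(request.get_method(), request.full_url)
```

The line is shipped unparsed, in keeping with the schema-on-read approach: any field extraction is deferred until someone queries the data.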
Step 2: Centralize and Index
Centralize the logs for easy access and visibility into relevant modules. With centralized logs, users never have to hop from one server to another and manually “grep” logs of interest on multiple systems, for example to search for a particular string or pattern of text within a log. Indexing allows ITOps and SecOps to quickly search for any term within a log, similar to a Google search.
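One common way to make term search fast is an inverted index, which maps each term to the lines that contain it. The Python sketch below is a toy version of that idea, not a description of how any particular product indexes data; the `LogIndex` class and its methods are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(line: str) -> set:
    """Split a line into lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", line.lower()))


class LogIndex:
    """Toy inverted index over log lines gathered from many hosts."""

    def __init__(self):
        self.lines = []                   # (host, raw line) in arrival order
        self.postings = defaultdict(set)  # term -> ids of lines containing it

    def add(self, host: str, line: str) -> None:
        line_id = len(self.lines)
        self.lines.append((host, line))
        for term in tokenize(line):
            self.postings[term].add(line_id)

    def search(self, *terms: str) -> list:
        """Return the lines that contain every given single-word term."""
        ids = None
        for term in terms:
            matches = self.postings.get(term.lower(), set())
            ids = matches if ids is None else ids & matches
        return [self.lines[i] for i in sorted(ids or [])]


index = LogIndex()
index.add("client-mac", "socketfilterfw : Dropbox: Allow TCP LISTEN (in:0 out:2)")
index.add("client-mac", "socketfilterfw : Office365Service: Allow TCP CONNECT (in:1 out:0)")
print(index.search("dropbox", "listen"))
```

A search touches only the posting sets for the requested terms, so its cost depends on how many lines match rather than on how many lines were ingested, which is what makes it faster than running grep on every server.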
Step 3: Search and Analyze
Once indexing is done, ITOps can search and analyze the data, applying a schema on read as they go. Analysis can be done manually, or one can use native machine learning for advanced analytics to identify and compare patterns or spot outliers.
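Schema on read means the raw lines are stored untouched and fields are extracted only when a query runs. The Python sketch below illustrates the idea with the sample firewall lines from earlier in this article; the `count_by` helper and the two patterns are assumptions made for this example.

```python
import re
from collections import Counter

# Raw lines are stored exactly as received; no schema is applied at ingest time.
RAW_LOGS = [
    "Dec 10 12:58:39 Client-mac socketfilterfw : openvpn-service: Allow TCP LISTEN (in:0 out:2)",
    "Dec 10 13:01:09 Client-mac socketfilterfw : Dropbox: Allow TCP LISTEN (in:0 out:2)",
    "Dec 11 18:40:03 Client-mac socketfilterfw : Office365Service: Allow TCP CONNECT (in:1 out:0)",
    "Dec 12 17:17:25 Client-mac socketfilterfw : Dropbox: Allow TCP LISTEN (in:0 out:4)",
]


def count_by(raw_lines, pattern: str, field: str) -> Counter:
    """Extract one named group at query time and count its values."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in raw_lines:
        match = regex.search(line)
        if match:
            counts[match.group(field)] += 1
    return counts


# Two different "schemas" applied to the same stored data.
print(count_by(RAW_LOGS, r": (?P<app>[^:]+): \w+ TCP", "app"))
print(count_by(RAW_LOGS, r"TCP (?P<state>\w+) \(", "state"))
```

Because the extraction pattern is part of the query, a new question only needs a new pattern; nothing has to be re-ingested.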
Step 4: Monitor and Alert
Monitoring and alerting is the next phase. A log management system should integrate with commonly used collaboration software such as HipChat, Slack, and PagerDuty to alert users. Continuous monitoring of large volumes of data and logs is inevitable, but alerting users in time requires dynamic thresholds and advanced analytics powered by machine learning.
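A dynamic threshold adapts to recent behavior instead of relying on a fixed limit. The Python sketch below uses one simple statistical variant: it flags a value when it exceeds the mean of the preceding window by more than `k` standard deviations. This is a stand-in for illustration, not the machine learning approach the text refers to, and the window size, the multiplier `k`, and the sample counts are all assumptions.

```python
from statistics import mean, stdev


def dynamic_threshold_alerts(counts, window: int = 5, k: float = 3.0) -> list:
    """Return the indexes whose value exceeds mean + k * stdev of the prior window."""
    alerts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = mean(history) + k * stdev(history)
        if counts[i] > threshold:
            alerts.append(i)
    return alerts


# Invented error counts per minute, with one obvious spike at index 6.
errors_per_minute = [4, 5, 3, 6, 4, 5, 40, 5, 4]
print(dynamic_threshold_alerts(errors_per_minute))  # [6]
```

In a real deployment, each flagged index would be handed to whatever notification integration the team uses; that step is deliberately left out here.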
Step 5: Log Report and Dashboard
It is important to share reports and dashboards so that the entire team has access to the same data. An added benefit is that you can create these reports and dashboards once and then reuse them many times, without other users having to recreate them. Relatedly, the use of RBAC (role-based access control) is mandatory to provide need-to-know access to the team.
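The need-to-know idea behind RBAC can be sketched as a mapping from roles to permitted actions. The role names and actions in this Python example are hypothetical and chosen only to illustrate the check; a real product defines its own roles and capabilities.

```python
# Hypothetical roles and actions, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"view_dashboard"},
    "analyst": {"view_dashboard", "run_search"},
    "admin": {"view_dashboard", "run_search", "edit_dashboard", "manage_users"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True only if the role explicitly grants the action."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("viewer", "view_dashboard"))  # True
print(is_allowed("viewer", "edit_dashboard"))  # False
```

Unknown roles fall through to an empty permission set, so access is denied by default.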
If logs are so valuable, why can’t we just grep them to find what we are looking for? It turns out, that’s not quite so simple, for a few reasons.
- First, the volume of logs that need to be searched is massive and grows continuously.
- Second, a single log file provides just one point of view, such as the application's or the server's. Without the ability to correlate views across multiple logs from different components, it is very difficult to get a full picture of the problems that are occurring.
- Third, some compliance regulations require companies to retain all logs and to bar developers from accessing production machines. In this scenario, a real-time, centralized log management solution addresses both of these requirements.
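To show what correlating views across logs can look like in its simplest form, the Python sketch below merges several time-ordered logs into one timeline. The sample events, the source names, and the `ISO-timestamp message` line format are invented for this example; real logs would need their own timestamp parsing.

```python
import heapq
from datetime import datetime


def parse_events(source: str, lines):
    """Yield (timestamp, source, message) tuples from 'ISO-timestamp message' lines."""
    for line in lines:
        stamp, _, message = line.partition(" ")
        yield (datetime.fromisoformat(stamp), source, message)


def correlate(streams: dict) -> list:
    """Merge several logs, each already in time order, into a single timeline."""
    return list(heapq.merge(*(parse_events(name, lines)
                              for name, lines in streams.items())))


# Invented sample events, for illustration only.
streams = {
    "load-balancer": ["2024-01-01T10:00:03 502 returned for /checkout"],
    "application": [
        "2024-01-01T10:00:01 database query started",
        "2024-01-01T10:00:02 database connection timed out",
    ],
    "database": ["2024-01-01T10:00:00 connection pool exhausted"],
}
for stamp, source, message in correlate(streams):
    print(stamp.isoformat(), source, message)
```

Read as one timeline, the invented events tell a story that no single file shows on its own: the database problem precedes the application timeout, which precedes the load balancer error.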
Ideally, you need a log management solution that centralizes all of these logs, correlates them, and analyzes them to provide meaningful insights that help IT solve SLA, performance, and availability problems.
Example Sumo Logic search query:
_sourceCategory=config Notification ConfigurationItemChangeNotification
| json "Message", "Type" as single_message, type
| where type == "Notification"
| json field=single_message "configurationItem" as single_message
| json field=single_message "resourceType", "configurationItemStatus", "awsRegion"
| where configurationItemStatus = "ResourceDeleted"
| count by resourceType
In the above example, the query counts the total number of deleted resources, grouped by resource type, as reported by AWS Config.
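For readers less familiar with the query language, the same filter-then-count logic can be approximated in Python. The sketch assumes each record is a JSON document with a `Type` field and a `Message` field holding the configuration item, as the field names in the query suggest. The sample records are invented, and the keyword and source category filters from the query's first line are omitted.

```python
import json
from collections import Counter


def count_deleted_resources(raw_messages) -> Counter:
    """Count deleted resources by resource type, mirroring the query's steps."""
    counts = Counter()
    for raw in raw_messages:
        outer = json.loads(raw)
        if outer.get("Type") != "Notification":
            continue
        message = outer.get("Message")
        if isinstance(message, str):  # the inner document may be JSON-encoded text
            message = json.loads(message)
        item = (message or {}).get("configurationItem", {})
        if item.get("configurationItemStatus") == "ResourceDeleted":
            counts[item.get("resourceType")] += 1
    return counts


def make_record(resource_type: str, status: str, msg_type: str = "Notification") -> str:
    """Build an invented sample record, for illustration only."""
    inner = {"configurationItem": {
        "resourceType": resource_type,
        "configurationItemStatus": status,
        "awsRegion": "us-east-1",
    }}
    return json.dumps({"Type": msg_type, "Message": json.dumps(inner)})


records = [
    make_record("AWS::EC2::Instance", "ResourceDeleted"),
    make_record("AWS::EC2::Instance", "OK"),
    make_record("AWS::S3::Bucket", "ResourceDeleted"),
    make_record("AWS::EC2::Instance", "ResourceDeleted"),
]
print(count_deleted_resources(records))
```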
In order to draw meaningful insights from these logs that are everywhere and ever-growing, you need a scalable platform that centralizes all these logs, provides a simple search interface for users to look for common exceptions, applies machine learning to detect patterns in behaviors, and helps users with insightful information to not only reactively fix the issues but also to prevent them from recurring.
Common use cases of log management solutions for developers, IT operators, and security professionals include:
- Troubleshooting and root cause analysis of modern applications: Get alerted to potential operational problems in modern application stacks or any underlying infrastructure component, such as performance degradation, outages, or exceptions that cause the user experience to suffer.
- Improve code quality: Troubleshoot development and deployment issues before rolling code into production.
- Security analytics and compliance: Ensure security and regulatory compliance through cloud audits, detection of potential threats or malicious access, and retaining logs based on the compliance mandate.
- Business insight: Identify behavior patterns such as feature usage and gain insight into user actions rather than user opinions.
Centralizing your log data means you'll need to consolidate and process information from multiple platforms. Below is information for how to manage log files across different systems, platforms, and use cases.
Sumo Logic is a cloud-native, secure, centralized log analytics service that provides insights into logs through pre-built applications and identifies patterns to surface outliers in the behavior of applications and systems. IT teams can then act on these outliers immediately, get to the root cause, and prevent any future impact to the business.
Sumo Logic can collect logs from almost any system in nearly any format, and our centralized log management service analyzes over 100 PB of data on an average day!
Sumo Logic provides everything you need to conduct real-time forensics and log management for all of your IT data without performing complex installations or upgrades, and without the need to manage and scale any hardware or storage. With fully elastic scalability, Sumo Logic is a fit for any size deployment.
The following table lists data types and some of the more common log sources that Sumo Logic can collect from. This list is only a sample meant to give a general idea of the possible sources of log data; it is by no means complete. For more information on how Sumo Logic can help you manage log data, please visit the Sumo Logic application page.
|Technology|Popular Log Sources|
|Server / OS| |
|IaaS / PaaS| |
|Integration with Custom library| |