
Dan Reichert

Dan Reichert is a Sales Engineer at Sumo Logic with over a decade of experience in technology in the US Army, IBM, and various startups. He is a graduate of the iSchool at Syracuse University with a master's degree in Information Management and of the University of Central Florida with a bachelor's degree in Information Systems Technology. He is an AWS Certified Solutions Architect - Associate.

Posts by Dan Reichert

Blog

Logs and Metrics: What are they, and how do they help me?

Blog

Use Sumo Logic to Collect Raspberry Pi Logs

June 18, 2017

Blog

Best Practices for Creating Custom Logs - Part I

Overview

When logging information about your operating system, services, network, or anything else, there are usually predefined log structures put in place by the vendor. There are times, though, when software ships without predefined logs, or when you have custom application logs from your own software. Without properly planning the log syntax you'll use, things can get messy, and your data may lose the integrity it needs to tell the full story. These best practices for creating custom logs can be applied to most logging solutions.

The 5 W's

There are five critical components of a good log structure*:

1. When did it happen (timestamp)
2. What happened (e.g., error codes, impact level)
3. Where did it happen (e.g., hostnames, gateways)
4. Who was involved (e.g., usernames)
5. Where he, she, or it came from (e.g., source IP)

Additionally, your custom logs should have a standard syntax that is easy to parse, using distinct delimiters, key-value pairs, or a combination of both. An example of a good custom log is as follows:

2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flight/ - credit.payments.io - Success - 2 - 241.98

This log message shows when the event was performed, what was performed, where it happened in your system, who performed it, and where that user came from. It is also structured cleanly, with space-dash-space as the delimiter between fields. Optionally, you can use key-value pairs to assist with parsing:

timestamp: 2017-04-10 09:50:32 -0700 - username: dan12345 - source_ip: 10.0.24.123 - method: GET - resource: /checkout/flight/ - gateway: credit.payments.io - audit: Success - flights_purchased: 2 - value: 241.98

Once you have settled on your log syntax and what will go into the logs, be sure to document it somewhere. One way is to add a comment at the top of each log file. Without documentation, you may forget, or someone else may not know, what a value like "2" or "241.98" represents (in this example, 2 flights in the checkout at a value of $241.98). You can document your log syntax as such:

Timestamp - username - user_ip - method - resource - gateway - audit - flights_purchased - value

In the second part of this three-part series, we'll go into deeper detail around timestamps and log content. In the final part, we'll go even deeper into log syntax and documentation.

*Source: Chuvakin, A., Phillips, C., & Schmidt, K. J. (2013). Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management.
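As a sketch of the practices above, here is how a small Python helper might emit an entry in this space-dash-space format. The field names and values mirror the flight-checkout example from this post; the helper name and fixed UTC-7 offset are illustrative assumptions, not part of any particular logging library.

```python
from datetime import datetime, timedelta, timezone

def format_log_entry(username, source_ip, method, resource, gateway,
                     audit, flights_purchased, value):
    """Build a space-dash-space delimited entry covering the 5 W's."""
    # When: timestamp with an explicit UTC offset (here a fixed -0700
    # for illustration; use the host's real zone in practice).
    timestamp = datetime.now(timezone(timedelta(hours=-7))).strftime(
        "%Y-%m-%d %H:%M:%S %z")
    # Who, where from, what, where: the remaining W's as ordered fields.
    fields = [timestamp, username, source_ip, method, resource,
              gateway, audit, str(flights_purchased), str(value)]
    return " - ".join(fields)

entry = format_log_entry("dan12345", "10.0.24.123", "GET",
                         "/checkout/flight/", "credit.payments.io",
                         "Success", 2, 241.98)
print(entry)
```

Because the delimiter is space-dash-space, the dashes inside the timestamp and date are never mistaken for field separators.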

April 20, 2017

Blog

Best Practices for Creating Custom Logs - Part II

Diving Deeper

Now that you have an overview of custom logs and what is involved in creating a good logging practice from Part I of the series, it's time to look further into what you should log in your system and why. This part covers timestamps and log content; the final part will cover syntax and documentation.

Timestamp

The first and most critical component of just about any log syntax is the timestamp - the "when." A timestamp is important because it tells you exactly when an event took place in the system and was logged. Without this component, you'll be relying on your log analysis solution to stamp the entry based on when it arrived. Adding a timestamp at the exact point an entry is logged ensures you consistently and accurately place the entry at the right point in time.

RFC 3339 defines the standard date and time format on the internet. Your timestamp should include year, month, day, hour, minute, second, and time zone. Optionally, include sub-second precision, depending on how precise your logs need to be for analysis. For Sumo Logic, you can read about the different supported timestamp formats here - Timestamps, Time Zones, Time Ranges, and Date Formats.

Log Content

To capture what happened, log data such as the severity of the event (e.g., low, medium, high; or 1 through 5), success or failure, status codes, the resource URI, or anything else that will help you or your organization know exactly what happened. You should be able to take a single log message out of a log file and know most or all of the critical information without depending on the log's file name, storage location, or automatic metadata tagging from your tool. Your logs should tell a story. If they're complex, they should also be documented, as discussed later on.
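To illustrate the timestamp guidance, Python's standard datetime module can produce a timezone-aware, RFC 3339-style timestamp. This is a minimal sketch using a fixed example date from this series; real code would call datetime.now() with the appropriate zone.

```python
from datetime import datetime, timezone

# A timezone-aware datetime; naive datetimes lose the "when" the moment
# logs from different hosts or zones are mixed together.
now = datetime(2017, 4, 10, 9, 50, 32, tzinfo=timezone.utc)

# isoformat() yields an RFC 3339-compliant string.
rfc3339 = now.isoformat()
print(rfc3339)

# Or a space-separated variant matching the log examples in this series.
spaced = now.strftime("%Y-%m-%d %H:%M:%S %z")
print(spaced)
```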
Bad Logs

As a bad example, you may have a log entry such as:

2017-04-10 09:50:32 -0700 Success

While you know that on April 10, 2017 at 9:50 AM (UTC-7) an event happened and it was a success, you don't really know anything else. If you know your system inside and out, you may know exactly what succeeded; however, if you handed these logs over to a peer for analysis, they may be completely clueless!

Good Logs

Once you add some more detail, the picture starts coming together:

2017-04-10 09:50:32 -0700 GET /checkout/flights/ Success

From these changes, you know that on April 10th a GET method was successfully performed on the resource /checkout/flights/. Next, you may need to know who was involved and where. While the previous log example can technically provide a decent amount of information, especially in a tiny environment, it's always good to provide as much detail as possible, since you don't know what you may need to know in the future. For example, usernames and user IPs are good to log:

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ Success

Telling the Story

Now you have even more details about what happened. A username or IP may individually be enough, but sometimes (especially for security) you'll want to learn as much as you can about the user, since user accounts can be compromised and/or accessed from other IPs. You have just about enough at this point to really tell a story. To make sure you know whatever you can about the event, you also want to know where things were logged. Again, while your logging tool may do this for you automatically, many factors can affect the integrity of that metadata, and it's best to have your raw messages tell as much as possible. To complete the picture, let's add the gateway that logged the entry:

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success

Now you know that this was performed on a gateway named credit.payments.io.
If you had multiple gateways or containers, you may need to identify which one to fix. Omitting this data from your log may result in a headache trying to track down exactly where an event occurred.

This was just one example of the basics of a log. You can add as much detail to the entry as you need for any insight now or in the future. For example, you may want to know other information about this event. How many flights were purchased?

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2

Here, 2 is the number of flights. What was the total value of the flights purchased?

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2 241.98

Here, 2 is the number of flights, and they totaled $241.98.

Now that you know what to put into your custom logs, you should also consider deciding on a standard syntax throughout your logs. This will be covered in the last part of this series on best practices for creating custom logs.
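The incremental build-up above can be sketched as a small helper: the core fields answer the 5 W's, while optional trailing fields carry domain detail such as flight count and total value. The function name is hypothetical, and the space delimiter matches the examples in this post.

```python
def build_entry(timestamp, username, source_ip, method, resource,
                gateway, audit, *extras):
    """Assemble a space-delimited entry: core fields answer the 5 W's,
    extras add optional domain detail (e.g., flight count, total value)."""
    fields = [timestamp, username, source_ip, method, resource,
              gateway, audit]
    fields += [str(extra) for extra in extras]
    return " ".join(fields)

entry = build_entry("2017-04-10 09:50:32 -0700", "dan12345",
                    "10.0.24.123", "GET", "/checkout/flights/",
                    "credit.payments.io", "Success", 2, 241.98)
print(entry)
```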

April 20, 2017

Blog

Best Practices for Creating Custom Logs - Part III

Diving Even Deeper

Part I gave a general overview of custom logs, and Part II discussed timestamps and log content. At this point, you have a log that contains plenty of important data to help you gather useful information about your systems. In this final part of the series, you'll learn how to organize the data in your logs and how to document it properly.

Log Syntax

You may have the most descriptive and helpful data in your logs, but analysis can be very difficult if you don't have a defined, structured syntax. There are generally two ways to structure your logs.

Key-Value

When it comes to log analysis and parsing, key-value pairs may be the simplest and most readable format. The previous example is not the most human-readable, and it can be difficult to find anchors to parse against. You can change the message to be easier for humans to read and easier to parse in a tool like Sumo Logic:

timestamp: 2017-04-10 09:50:32 -0700, username: dan12345, source_ip: 10.0.24.123, method: GET, resource: /checkout/flights/, gateway: credit.payments.io, audit: Success, flights_purchased: 2, value: 241.98

You can take it a step further and structure your logs as JSON:

{"timestamp": "2017-04-10 09:50:32 -0700", "username": "dan12345", "source_ip": "10.0.24.123", "method": "GET", "resource": "/checkout/flights/", "gateway": "credit.payments.io", "audit": "Success", "flights_purchased": 2, "value": 241.98}

In Sumo Logic, you have various ways to parse this type of structure, including the basic Parse operator on predictable patterns or Parse JSON. While some sort of key-value pairing is ideal, it is not always the most efficient, as you're potentially doubling the size of every entry that gets sent and ingested.
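Both key-value styles above can be emitted from one dictionary using Python's standard json module. This is a minimal sketch; the event fields mirror the flight-checkout example, and the variable names are illustrative.

```python
import json

event = {
    "timestamp": "2017-04-10 09:50:32 -0700",
    "username": "dan12345",
    "source_ip": "10.0.24.123",
    "method": "GET",
    "resource": "/checkout/flights/",
    "gateway": "credit.payments.io",
    "audit": "Success",
    "flights_purchased": 2,
    "value": 241.98,
}

# Flat key-value line, comma-delimited.
kv_line = ", ".join(f"{key}: {value}" for key, value in event.items())
print(kv_line)

# One JSON object per line: slightly larger, but trivially machine-parseable.
json_line = json.dumps(event)
print(json_line)
```

The size cost is visible here: every entry repeats every key name, which is what can make key-value logging expensive at high volume.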
With low log volume this isn't an issue; however, if you are generating logs at a high rate, entries of that size can become very costly. This brings us to the other format: delimited logs.

Delimited

Delimited logs are essentially the type of log built in the previous examples: a set log structure in which the content is separated by some sort of delimiter.

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2 241.98

In this example, spaces are the delimiters. To an extent, this is perfectly reasonable, and it may be the smallest, most efficient form this log can take. The problem when parsing is figuring out where fields start and end, as you can see with the timestamp. If you stick with this format, you'll probably be stuck using regular expressions to parse your logs. That isn't a problem for some, but for others regular expressions can understandably be a challenge.

To reduce the need for regular expressions, use a unique delimiter. A space can sometimes work, but it may force you to do extra parsing around the timestamp. Prefer a delimiter such as a dash, semicolon, comma, or another character (or character pattern) that you can guarantee will never appear in the data of your fields.

2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Success - 2 - 241.98

A syntax like this lets you parse the entire message with space-dash-space ( - ) as the field delimiter; the surrounding spaces ensure the dashes inside the timestamp are not counted as delimiters. Finally, to make sure an entry can never be improperly parsed, always put some sort of filler in place of any field that has no data.
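The parsing difference between the two delimiters can be shown directly: the space-dash-space entry splits cleanly, while the purely space-delimited entry needs a regular expression to bound the timestamp. A minimal sketch, using the entries from this post:

```python
import re

entry = ("2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - "
         "/checkout/flights/ - credit.payments.io - Success - 2 - 241.98")

# Space-dash-space delimiter: a plain split is enough. The dashes inside
# the date and offset survive because they lack surrounding spaces.
fields = entry.split(" - ")

space_entry = ("2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET "
               "/checkout/flights/ credit.payments.io Success 2 241.98")

# Space delimiter: the timestamp itself contains spaces, so a regex must
# anchor its exact shape before the remaining fields can be pulled out.
m = re.match(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) (\S+) (\S+)",
    space_entry)
```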
For example:

2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Failure - x - x

From this example, you know that the event was a failure. Because it failed, there are no flight totals or values. To avoid needing additional parsers for the missing fields, you can simply replace them with something like an 'x'. Note that if you're running aggregates or math against a field that is typically a number, you may need to add some extra logic to your search queries.

Documentation

You may have the greatest log structure possible, but without proper documentation it's easy to forget why something was part of your logging structure or what certain fields represented. You should always document what your log syntax represents. Referring back to the previous log example:

2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Success - 2 - 241.98

You can document your log syntax as such:

Timestamp - username - user_ip - method - resource - gateway - audit - flights_purchased - value

This syntax line can be placed once at the very start of the log file for future reference, if necessary.

Conclusion

At Sumo Logic, we regularly work with people who are new to logging and have many questions about how to get the most out of their logs. While you can start ingesting your logs and getting insights almost immediately, the information the tool provides is only as good as the data it receives. Most vendors do a good job of sticking to standard log structures with rich data to drive these insights, but it's up to you to standardize a custom-created log. In this series, I set out to help you create logs that have the relevant data to know as much as you can about your custom applications.
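The extra logic mentioned for aggregating numeric fields that may carry the 'x' filler can be as simple as a coercion helper. A sketch (the function name and default are assumptions, not a Sumo Logic feature):

```python
def to_number(field, default=0.0):
    """Coerce a log field to a number, treating the 'x' filler
    (or any non-numeric value) as missing data."""
    try:
        return float(field)
    except ValueError:
        return default

# Values pulled from the 'value' position of several entries,
# one of which was a Failure and carries the filler.
values = ["241.98", "x", "103.50"]
total = sum(to_number(v) for v in values)
print(total)  # fillers contribute the default, so only real values sum
```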
As long as you stick to the "5 W's," structure your logs with a standard syntax, and document it, you'll be on the right track to getting the most out of Sumo Logic. Be sure to sign up for a free trial of Sumo Logic to see what you can do with your logs!