
October 16, 2014 By Derek Hall

Become Friends with Metadata to Maximize Efficiency in Sumo Logic

While onboarding our Sumo Free users, I’ve learned that the process users most consistently want to eliminate before adopting our service is investigating incidents across siloed data sources. The process is arduous because customers have to zero in on individual servers and appliances and manually decipher each one’s logs. Enough said!

Centralizing your logs and making Sumo Logic the source of truth for your data allows for rapid, real-time coordination between Dev, Ops, and Sec teams to remediate problems, patch vulnerabilities, improve the end-user experience, and more.

But now that you have your Apache Access and Error logs, your Cisco ASA logs, email logs, Linux and Windows OS logs, VMware, and AWS ELB logs all living under the same roof, how do you make them play nicely together? How do you search your Apache error logs without accidentally inviting Linux and Windows error messages to the party? Meet your “bouncer”: metadata!

Leveraging metadata to build searches and dashboards is the foundation for both organizing your diverse logs and optimizing performance. Starting your search with a * can be computationally expensive and cause avoidable lag time for your results. So it’s a best practice to constrain your search to a subset of your data by placing a metadata field before your first pipe (|), using the syntax _metadatafield=foo. Or simply click into the search bar and select one of the metadata fields offered there.

Now you don’t have to remember a specific set of keywords or strings to pull up the data set you want.
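As a rough illustration of the scoping rule above, here is a minimal Python sketch that assembles a search string with the metadata constraint placed before the first pipe. The helper name scoped_query and the keyword "timeout" are hypothetical, chosen only for this example. The sketch just formats text and does not call any Sumo Logic API.

```python
def scoped_query(field, value, keywords="", *operators):
    """Build a search string that begins with a metadata scope.

    Hypothetical helper for illustration only: it formats text and never
    contacts Sumo Logic.
    """
    # The metadata fields discussed in this post all start with an underscore.
    if not field.startswith("_"):
        raise ValueError("expected a metadata field such as _sourceCategory")
    scope = f"{field}={value}"
    # Scope and optional keywords go before the first pipe.
    head = f"{scope} {keywords}".strip()
    # Each operator follows its own pipe.
    return " | ".join([head, *operators])


if __name__ == "__main__":
    # Prints: _sourceCategory=Prod/Apache/Error timeout | count by _sourceHost
    print(scoped_query("_sourceCategory", "Prod/Apache/Error",
                       "timeout", "count by _sourceHost"))
```

The point of the design is simply that the scope always comes first, so the search never starts from * by accident.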

Standardize your Metadata Convention

The search bar lists the primary metadata fields that you can customize and that get attached to your messages after ingestion. These include:

  • Collector – The name of the Collector entered at activation time

  • Source – The name of the Source entered when the Source is created

  • Source Category – Open tag, completely customizable. This metadata is also typically used for mapping data streams to our apps

  • Source Host – For Remote and Syslog Sources, this is a fixed value determined by the hostname you enter in the “Hostname” field (your actual system values for hosts). For a Local File Source, you can overwrite the host system value with a new value of your choice

  • Source Name – A fixed value determined by the path you enter in the “File” field when configuring a Source. This metadata tag cannot be changed

_sourceCategory is your best friend

Source Category is your best friend because it’s completely open and customizable, allowing you to “categorize” your logs in a way that makes the most sense for your team. You can also provide structure by using multiple tags separated by an _ or a /. For example, if you wanted to separate your Apache logs by environment, you could categorize them hierarchically to make it easy to search them individually or together using wildcards. You might try this:

Prod/Apache/Access

Prod/Apache/Error

QA/Apache/Access

QA/Apache/Error

To search individually:

_sourcecategory=Prod/Apache/Access

_sourcecategory=QA/Apache/Access

Together:

_sourcecategory=*Apache/Access

_sourcecategory=*Apache/Error

or

_sourcecategory=Prod/Apache*

_sourcecategory=QA/Apache*

Or, if you have multiple security data sources such as a Cisco firewall, Snort IDS, and Linux OS security logs, you might try:

Sec_Firewall_Cisco

Sec_IDS_Snort

OS_Linux_Sec

And we can tie all of them together with a simple: _sourcecategory=*Sec*
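To check which categories a wildcard would pick up before committing to a naming convention, you can approximate the matching offline. The sketch below uses Python's standard fnmatchcase as a stand-in for the service's wildcard handling, which is an assumption: it only mirrors the * behavior used in the examples above and is not Sumo Logic's actual matcher. The function name matching_categories is hypothetical.

```python
from fnmatch import fnmatchcase

# The example source categories from this post.
CATEGORIES = [
    "Prod/Apache/Access",
    "Prod/Apache/Error",
    "QA/Apache/Access",
    "QA/Apache/Error",
    "Sec_Firewall_Cisco",
    "Sec_IDS_Snort",
    "OS_Linux_Sec",
]


def matching_categories(pattern, categories=CATEGORIES):
    """Return the categories that a *-wildcard pattern would select.

    Assumption: fnmatchcase approximates the service's wildcard matching.
    """
    return [c for c in categories if fnmatchcase(c, pattern)]


if __name__ == "__main__":
    for pattern in ("*Apache/Access", "Prod/Apache*", "*Sec*"):
        print(pattern, "->", matching_categories(pattern))
```

Running it against a draft list of category names is a quick way to spot a tag that a pattern would unintentionally include or miss.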

_sourceCategory for Apps

Additionally, we use _sourcecategory to map your data to our numerous pre-built applications. Just like in search, you can use wildcards to funnel multiple different data sources into the same app. For example, you may have multiple Linux or Windows OS logs that you have categorized differently based on location. You’re not required to use the same source category for them; simply make sure the word “Linux” or “Windows” appears somewhere in the metadata field, and use wildcards to create a custom data source that funnels all of it to the app.
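The funneling idea can be sketched the same way. In the Python example below, the location-based category names and the APP_PATTERNS mapping are hypothetical, and fnmatchcase is again only an approximation of the service's wildcard matching. It shows how differently named categories end up under one app as long as the shared word appears somewhere in the tag.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping of app name to the wildcard used as its data source.
APP_PATTERNS = {"Linux": "*Linux*", "Windows": "*Windows*"}


def route_to_apps(categories, app_patterns=APP_PATTERNS):
    """Group source categories under every app whose pattern they match."""
    routed = {app: [] for app in app_patterns}
    for category in categories:
        for app, pattern in app_patterns.items():
            if fnmatchcase(category, pattern):
                routed[app].append(category)
    return routed


if __name__ == "__main__":
    # Hypothetical categories that differ by location and separator style.
    sample = ["NYC/Linux/OS", "London/Linux/OS", "NYC_Windows_Events",
              "Prod/Apache/Access"]
    print(route_to_apps(sample))
```

Categories that match no pattern, such as Prod/Apache/Access here, are simply left out of every app.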

Here are some additional metadata fields that are not customizable but are still attached to all of your messages. You can use them to refine your queries:

  • _messageCount – A sequence number (per Source) added by the Collector when the message was received

  • _messageTime – The timestamp of the message. If the message doesn’t have a timestamp, _messageTime uses the _receiptTime

  • _raw – The raw log message

  • _receiptTime – The time the Collector received the message

  • _size – The size of the log message
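The _messageTime fallback described above can be pictured with a few lines of Python. This is only a sketch of the stated rule, not the Collector's implementation; the function and parameter names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def message_time(parsed_timestamp: Optional[datetime],
                 receipt_time: datetime) -> datetime:
    """Pick the message time: the log's own timestamp when one was found,
    otherwise the time the message was received."""
    return receipt_time if parsed_timestamp is None else parsed_timestamp


if __name__ == "__main__":
    received = datetime(2014, 10, 16, 12, 0, 5, tzinfo=timezone.utc)
    stamped = datetime(2014, 10, 16, 12, 0, 0, tzinfo=timezone.utc)
    print(message_time(stamped, received))  # the log's own timestamp
    print(message_time(None, received))     # falls back to the receipt time
```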

For additional tips on leveraging your metadata, check out help.sumologic.com. And if you’re not familiar with Sumo, please check out our Sumo Logic Free service and enjoy!
