Field extraction rules compress queries into short phrases, filter out unwanted fields and speed up queries. Up to fifty can be stored at a time in what Sumo Logic calls a “parser library.”
These rules are a must once you move from simple collection to correlation and dashboarding. Because they extract fields as data is ingested, before any search runs, queries never have to parse unwanted fields, which can drastically speed up query times. Correlations and dashboards require many queries to load simultaneously, so the speed impact can be significant.
Setting Up Field Extraction Rules
The Sumo Logic team has written some templates to help you get started with common logs like IIS and Apache. While you will need to edit them, they take a lot of the pain out of writing regex parsers from scratch (phew). And if you write your own reusable parsers, save them as a template so you can help yourself to them later.
To get started, find a frequently used query snippet. The best candidates are queries that (1) are used frequently and (2) take a while to load. These might pull from dense sources (like IIS) or simply crawl back over long periods of time. You can also look at the queries that already see heavy use because they are saved in dashboards, alerts and pinned searches.
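One rough way to shortlist candidates is to score each saved query by how often it runs and how long it takes. The sketch below is only an illustration in Python: the query names, run counts and timings are made up, and the "runs per day times average seconds" score is an assumption, not something Sumo Logic reports for you.

```python
from dataclasses import dataclass


@dataclass
class SavedQuery:
    name: str            # hypothetical label for a saved search
    runs_per_day: int    # how often dashboards, alerts or users run it
    avg_seconds: float   # typical time to return results

    @property
    def daily_cost(self) -> float:
        # Assumed score: total seconds spent on this query per day.
        return self.runs_per_day * self.avg_seconds


def rank_candidates(queries):
    """Return queries ordered from most to least time spent per day."""
    return sorted(queries, key=lambda q: q.daily_cost, reverse=True)


# Hypothetical numbers, for illustration only.
QUERIES = [
    SavedQuery("iis_errors_dashboard", runs_per_day=200, avg_seconds=12.0),
    SavedQuery("usb_file_writes_alert", runs_per_day=96, avg_seconds=30.0),
    SavedQuery("adhoc_login_audit", runs_per_day=2, avg_seconds=45.0),
]

for query in rank_candidates(QUERIES):
    print(f"{query.name}: {query.daily_cost:.0f} seconds per day")
```

The queries at the top of the list are the ones where a field extraction rule is most likely to pay off.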
Once you have the query, first take a look at what the source pulls without any filters. This is important both to ensure that you collect what’s needed, and that you don’t include anything that will throw off the rules. Since rules are “all or nothing,” only include fields that appear in every message. In the example below, I am pulling from a Safend collector. Here’s the output the collector produces for a USB device:
2014-10-09T15:12:33.912408-04:00 safend.host.com [Safend Data Protection] File Logging Alert details: User: firstname.lastname@example.org, Computer: computer.host.com, Operating System: Windows 7, Client GMT: 10/9/2014 7:12:33 PM, Client Local Time: 10/9/2014 3:12:33 PM, Server Time: 10/9/2014 7:12:33 PM, Group: , Policy: Safend for Cuomer Default Policy, Device Description: Disk drive, Device Info: SanDisk Cruzer Pattern USB Device, Port: USB, Device Type: Removable Storage Devices, Vendor: 0781, Model: 550A, Distinct ID: 3485320307908660, Details: , File Name: F:\SOME_FILE_NAME, File Type: PDF, File Size: 35607, Created: 10/9/2014 7:12:33 PM, Modified: 10/9/2014 7:12:34 PM, Action: Write
There are certainly reasons to collect all of this (and note that the rule won’t limit collection on the source collector) but I only want to analyze a few parameters.
To get it just right, filter it in the Field Extraction panel:
Below is the simple Parse Expression I used. Other parse operators are supported, and between them they can extract nearly anything a regular query can. In this case, though, I just used parse and nodrop.
Nodrop tells the query to pass a result along even if a parse expression matches nothing in it. In this case, it acts like an OR function that joins the first three parse functions with the last one. So if ‘parse regex “Action…”’ returns nothing, nodrop tells the query to “not drop” the result, return a blank for that field and continue to the next function.
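To make the nodrop behavior concrete, here is a small Python emulation. It is not Sumo Logic query syntax and not how Sumo Logic implements parsing; the parse() helper, the regex patterns and the field names are assumptions for illustration. The first message is a trimmed version of the Safend line above, and the second is a hypothetical message with no Action field.

```python
import re

MESSAGES = [
    # Trimmed from the Safend sample above.
    r"File Name: F:\SOME_FILE_NAME, File Type: PDF, File Size: 35607, Action: Write",
    # Hypothetical message that has no Action field.
    r"File Name: F:\OTHER_FILE, File Type: PDF, File Size: 120",
]

# Assumed patterns; each named group becomes an extracted field.
PATTERNS = [
    r"File Type: (?P<file_type>[^,]+)",
    r"File Size: (?P<file_size>\d+)",
    r"Action: (?P<action>\w+)",
]


def parse(rows, pattern, nodrop=False):
    """Emulate one parse step over (raw_message, fields) pairs."""
    compiled = re.compile(pattern)
    kept = []
    for raw, fields in rows:
        match = compiled.search(raw)
        if match:
            kept.append((raw, {**fields, **match.groupdict()}))
        elif nodrop:
            # nodrop: keep the message and leave the new fields blank.
            blanks = {name: "" for name in compiled.groupindex}
            kept.append((raw, {**fields, **blanks}))
        # Without nodrop, a message that does not match is dropped.
    return kept


def run(messages, nodrop):
    """Chain every pattern, the way parse steps are piped in a query."""
    rows = [(message, {}) for message in messages]
    for pattern in PATTERNS:
        rows = parse(rows, pattern, nodrop=nodrop)
    return [fields for _, fields in rows]


print(run(MESSAGES, nodrop=False))  # only the message with an Action survives
print(run(MESSAGES, nodrop=True))   # both survive; the second has a blank action
```

Without nodrop, the message that lacks an Action disappears from the results entirely. With nodrop, it stays and simply carries an empty action field.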
Remember that Field Extraction Rules are “all or nothing” with respect to fields. If you add a field that doesn’t exist, nodrop will not help since it only works within existing fields.
Use Field Extraction Rules to Speed Up Dashboard Load Time
The above example would be a good underlying rule for a larger profiling dashboard. It returns file information only—Action on the File, File ID, File Size, and Type. Another extraction rule might return only User and User Activities, while yet another might include only host server actions.
These rules can then be surfaced as dashboard panes, combined into profiles and easily edited. They load only the fields extracted, significantly improving load time, and the modularity of the rules provides a built-in library that makes editing and sharing useful snippets much simpler.
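The modular idea can be sketched outside Sumo Logic as well. The Python below is a toy model, not a Sumo Logic API: the rule names, field names and regex patterns are assumptions, and the log line is a trimmed version of the Safend sample. Each "rule" extracts only its own fields, and a profile is just a combination of rules.

```python
import re

# Trimmed version of the Safend sample line shown earlier.
LOG = (
    r"User: firstname.lastname@example.org, Computer: computer.host.com, "
    r"Port: USB, File Name: F:\SOME_FILE_NAME, File Type: PDF, "
    r"File Size: 35607, Action: Write"
)

# Hypothetical rule library: rule name -> {field name: pattern}.
RULES = {
    "file_activity": {
        "file_name": r"File Name: ([^,]+)",
        "file_type": r"File Type: ([^,]+)",
        "file_size": r"File Size: (\d+)",
        "action": r"Action: (\w+)",
    },
    "user_activity": {
        "user": r"User: ([^,]+)",
        "computer": r"Computer: ([^,]+)",
    },
}


def apply_rule(rule_name, message):
    """Extract one rule's fields; return None if any field is missing."""
    fields = {}
    for field, pattern in RULES[rule_name].items():
        match = re.search(pattern, message)
        if match is None:
            return None  # modeled as "all or nothing"
        fields[field] = match.group(1)
    return fields


def build_profile(rule_names, message):
    """Combine several rules into one profile, keyed by rule name."""
    profile = {}
    for name in rule_names:
        extracted = apply_rule(name, message)
        if extracted is not None:
            profile[name] = extracted
    return profile


print(apply_rule("file_activity", LOG))
print(build_profile(["file_activity", "user_activity"], LOG))
```

Because each rule is small and self-contained, a panel built on "file_activity" never touches the user fields, and changing one rule does not disturb the others.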
Working With Field Extraction Rules in Sumo Logic is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
About the Author
Alex Entrekin served on the executive staff of Cloudshare where he was primarily responsible for advanced analytics and monitoring systems. His work extending Splunk into actionable user profiling was featured at VMworld: “How a Cloud Computing Provider Reached the Holy Grail of Visibility.” Alex is currently an attorney, researcher and writer based in Santa Barbara, CA. He holds a J.D. from the UCLA School of Law.