The Sumo query language can be a source of joy and pain at times. Achieving mastery is no easy path and all who set on this path may suffer greatly until they see the light. The Log Operators Cheat Sheet is a valuable resource to learn syntax and semantics of the individual operators, but the bigger questions become “how can we tie them together” and “how can we write query language that matters?”
Fortunately, there are people who have walked this path before — and succeeded. In this post, we consolidate their queries (we’ve analyzed a 615 MB sample of query string for this) into a set of query design patterns, so that everyone can learn from their wisdom and become a Sumo Master.
Anatomy of a Query
A Sumo query decomposes in SearchExpression and TargetExpression. SearchExpression identifies the relevant log lines through search keywords and TargetExpression specifies how to slice and dice the data to provide meaning. A TargetExpression can have one or more clauses. Clauses combine operators and their arguments.
Typically, a SearchExpression contains search keywords like ‘error’ that select a relevant set of log lines from the log stream. The TargetExpression is then used to slice and dice the data to extract insights such as an error code. A TargetExpression is specified by a sequence for clauses. Clauses are separated through the pipe ‘|’ symbol. Each clause contains one operator that specifies its function and some arguments that are specific to the log lines under consideration. Finally, a TargetExpression may or may not end with an aggregation operator such as ‘count’ to produce a condensed view.
There are myriads of sequences of operators to retrieve information from the logs. It can be challenging to find an effective sequence. There are, however, some canonical operator sequences that are used more often than others. To find these, we’ve sampled a representative set of user queries, derived the operator sequences and counted their occurrence. This led us to the five design patterns that we’ll out line below.
Common Operator Sequences
This table shows the probability of different operators to occur. The first table shows shows the occurrence probability of individual operators. The second and the third table show the occurrence of operator tuples and triples, respectively. In the top 10 operator list `parse` takes the lead, followed by `where` and `count.` These three already form a powerful trifecta. Learn all 10 operators and you are ready for most of the action. Moreover, we use this table to distill some common patterns.
Pattern #1: Parse Where it Counts
The Parse Where it Counts pattern leverages the parse-where-count power triple. Variants of this are the most frequent queries that we process. A search expression might outline some kind of data. Parsing filters and extracts specific values such as numbers, status codes, timestamps from the search result. The `where` clause enforces some condition such as a threshold or matches a specific string value. Finally, counting aggregates the remaining data. It can target a single number or a histogram if used with a `by` parameter.