Back to blog results

February 17, 2015By Stefan Zier

Optimizing Selectivity in Search Queries

While we at Sumo Logic constantly work to improve search performance automatically, there are some improvements that can only be made by humans who understand what question a search query answers and what the data looks like.

In a Sumo Logic query, operators are chained together with the pipe symbol ( | ) to calculate the result. Each operator sends its output to the next operator in the chain. Certain operators, such as where and parse (except with nodrop) drop certain messages. Dropped messages do not need to be processed by any subsequent operators in the chain. This is called the selectivity of an operator. The fewer messages that “make it” through the operator, the more selective the operator is.

Operator Ordering

For optimal query performance, move the most selective operators to the earliest positions in your query. This reduces the amount of data subsequent operators need to process, and thereby speeds up your query.

Query 1

error
| parse “ip=*, errorcode=*“ as ip, errorcode
| lookup ip from /my/whitelisted_ips on ip=ip
| where errorcode=”failed_login”

In the example above, Query 1 performs a lookup on all log lines, just to discard all lines where errorcode isn’t “failed_login”. Query 2 below is much more optimal, since it only performs the lookup on log lines that match the overall selectivity criteria of the query.

Query 2

error
| parse “ip=*, errorcode=*“ as ip, errorcode
| where errorcode=”failed_login”
| lookup ip from /my/whitelisted_ips on ip=ip

Data Knowledge/Result Predictions

To optimize queries and predict results, you can use knowledge of your data. Consider the following example.

Query 3

error failed_login
| parse “ip=*, errorcode=*“ as ip, errorcode
| where errorcode=”failed_login”
| lookup ip from /my/whitelisted_ips on ip=ip
| if( isNull(ip), "unsafe", "safe") as ip_status
| where ip_status="unsafe"
| count by ip
| top 10 newip, ip by _count

You may know that your top 10 values are all measured in the thousands or tens of thousands. Based on that knowledge, you can optimize this query to not evaluate any IP addresses that occur less frequently than what you expect:

Query 4

error failed_login
| parse “ip=*, errorcode=*“ as ip, errorcode
| where errorcode=”failed_login”
| count by ip
| where _count > 1000
| lookup ip from /my/whitelisted_ips on ip=ip
| if( isNull(ip), "unsafe", "safe") as ip_status
| where ip_status="unsafe"
| top 10 newip, ip by _count

Complete visibility for DevSecOps

Reduce downtime and move from reactive to proactive monitoring.

Stefan Zier

Stefan was Sumo’s first engineer and Chief Architect. He enjoys working on cloud plumbing and is plotting to automate his job fully, so he can spend all his time skiing in Tahoe.

More posts by Stefan Zier.