Blog › Authors › Yan Qiao

Yan Qiao, Software Engineer

Using the transpose operator

02.19.2013 | Posted by Yan Qiao, Software Engineer

Sumo Logic lets you access your logs through a powerful query language.  In addition to searching for individual log messages, you may extract, transform, filter and aggregate data from them using a sequence of operators.  There are currently about two dozen operators available and we are constantly adding new ones.  In this post I want to introduce you to a recent addition to the toolbox, the transpose operator.

Let’s say you work for an online brokerage firm, and your trading server logs lines that look like the following, among other things:

2013-02-14 01:41:36 10.20.11.102 GET /Trade/StockTrade.aspx action=buy&symbol=s:131 80 Cole 219.142.249.227 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_7_3)+AppleWebKit/536.5+(KHTML,+like+Gecko)+Chrome/19.0.1084.54+Safari/536.5 200 0 0 449

There is a wealth of information in this log line, but to keep it simple, let’s focus on the last number, in this case 449, which is the server response time in milliseconds.   We are interested in finding out the distribution of this number so as to know how quickly individual trades are processed.  One way to do that is to build a histogram of the response time using the following query:

stocktrade |  extract “(?<response_time>\d+$)” | toInt(ceil(response_time/100) * 100) as response_time | count by response_time

Here we start with a search for “stocktrade” to get only the lines we are interested in, extract the response time using a regular expression, round it up to the next 100 millisecond, and count the occurrence of each number.  The result looks like: 

Now, it would also be interesting to see how the distribution changes over time.   That is easy with the timeslice operator:

stocktrade | timeslice 1m | extract “(?<response_time>\d+$)” | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time

and the result looks like the following:

This gets the data we want, but it is not presented in a format that is easy to digest.  For example, in the table above, the first five rows give us the distribution of response time at 8:00, the next five rows at 8:01, etc.  Wouldn’t it be nice if we could rearrange the data into the following table?

That is exactly what transpose does:

stocktrade | timeslice 1m | extract “(?<response_time>\d+$)” | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time | transpose row _timeslice column response_time

Here we tell the query engine to rearrange the table using time slice values as row labels, and response time as column labels.

This is especially useful when the data is visualized.  The “stacking” option allows you to draw bar charts with values from different columns stacked onto each other, as shown below:

The length of bars represents number of trading requests per minute, and the colored segments represent the distribution of response time.

That’s it!  To find out other interesting ways to analyze your log data, sign up for Sumo Logic Free and try for yourself!

Twitter