Yan Qiao, Software Engineer

Using the transpose operator

02.19.2013 | Posted by Yan Qiao, Software Engineer

Sumo Logic lets you access your logs through a powerful query language.  In addition to searching for individual log messages, you may extract, transform, filter and aggregate data from them using a sequence of operators.  There are currently about two dozen operators available and we are constantly adding new ones.  In this post I want to introduce you to a recent addition to the toolbox, the transpose operator.

Let’s say you work for an online brokerage firm, and your trading server logs lines that look like the following, among other things:

2013-02-14 01:41:36 GET /Trade/StockTrade.aspx action=buy&symbol=s:131 80 Cole Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_7_3)+AppleWebKit/536.5+(KHTML,+like+Gecko)+Chrome/19.0.1084.54+Safari/536.5 200 0 0 449

There is a wealth of information in this log line, but to keep it simple, let’s focus on the last number, in this case 449, which is the server response time in milliseconds.   We are interested in finding out the distribution of this number so as to know how quickly individual trades are processed.  One way to do that is to build a histogram of the response time using the following query:

stocktrade | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by response_time

Here we start with a search for “stocktrade” to get only the lines we are interested in, extract the response time using a regular expression, round it up to the next multiple of 100 milliseconds, and count the occurrences of each value.  The result looks like:
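For readers who want to see the same steps outside the query language, here is a minimal Python sketch of the filter, extract, round-up and count sequence. The sample lines are hypothetical and shortened, the keyword match is assumed to be case-insensitive, and the names `bucket` and `histogram` are made up for illustration; this is not how Sumo Logic implements the operators.

```python
import math
import re
from collections import Counter

# Hypothetical, shortened sample lines; only the trailing number
# (the response time in milliseconds) matters for this sketch.
SAMPLE_LINES = [
    "2013-02-14 01:41:36 GET /Trade/StockTrade.aspx action=buy 200 0 0 449",
    "2013-02-14 01:41:37 GET /Trade/StockTrade.aspx action=sell 200 0 0 87",
    "2013-02-14 01:41:38 GET /Trade/StockTrade.aspx action=buy 200 0 0 412",
    "2013-02-14 01:41:39 GET /Trade/StockTrade.aspx action=buy 200 0 0 300",
]

# Same idea as the extract step: capture the digits at the end of the line.
RESPONSE_TIME = re.compile(r"(?P<response_time>\d+)$")


def bucket(ms, width=100):
    """Round ms up to the next multiple of width, like toInt(ceil(x/100) * 100)."""
    return math.ceil(ms / width) * width


def histogram(lines):
    """Count lines per response-time bucket, like 'count by response_time'."""
    counts = Counter()
    for line in lines:
        # Keyword filter; assumed case-insensitive for this sketch.
        if "stocktrade" not in line.lower():
            continue
        match = RESPONSE_TIME.search(line)
        if match:
            counts[bucket(int(match.group("response_time")))] += 1
    return dict(sorted(counts.items()))


print(histogram(SAMPLE_LINES))  # {100: 1, 300: 1, 500: 2}
```

Note that a value already on a boundary, such as 300, stays in its own bucket, while 449 and 412 both land in the 500 bucket.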

Now, it would also be interesting to see how the distribution changes over time.   That is easy with the timeslice operator:

stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time

and the result looks like the following:

This gets the data we want, but it is not presented in a format that is easy to digest.  For example, in the table above, the first five rows give us the distribution of response time at 8:00, the next five rows at 8:01, etc.  Wouldn’t it be nice if we could rearrange the data into the following table?

That is exactly what transpose does:

stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time | transpose row _timeslice column response_time

Here we tell the query engine to rearrange the table using time slice values as row labels, and response time as column labels.
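Conceptually, this rearrangement is a pivot from a "long" table to a "wide" one. The Python sketch below illustrates the idea on hypothetical counts. The function name `transpose`, the sample numbers, and the choice to fill missing combinations with 0 are assumptions for illustration only and may not match what the operator does internally.

```python
from collections import defaultdict

# Hypothetical output of "count by _timeslice, response_time":
# one (timeslice, response-time bucket, count) tuple per row.
ROWS = [
    ("08:00", 100, 12),
    ("08:00", 200, 7),
    ("08:01", 100, 9),
    ("08:01", 300, 2),
]


def transpose(rows, fill=0):
    """Pivot (row label, column label, value) tuples into a wide table.

    Returns the sorted column labels and a dict that maps each row label
    to a list of values, one per column. Combinations that never occurred
    are filled with `fill` (an assumption made for this sketch).
    """
    columns = sorted({col for _, col, _ in rows})
    cells_by_row = defaultdict(dict)
    for row_label, col, value in rows:
        cells_by_row[row_label][col] = value
    table = {
        row_label: [cells.get(col, fill) for col in columns]
        for row_label, cells in sorted(cells_by_row.items())
    }
    return columns, table


columns, table = transpose(ROWS)
print("_timeslice", *columns)
for label, values in table.items():
    print(label, *values)
```

Each time slice now occupies a single row, and each response-time bucket has its own column, which is the shape a stacked chart needs.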

This is especially useful when the data is visualized.  The “stacking” option allows you to draw bar charts with values from different columns stacked onto each other, as shown below:

The length of each bar represents the number of trading requests per minute, and the colored segments represent the distribution of response times.
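To make the relationship between the transposed table and a stacked bar concrete, here is a small Python sketch. The table values are hypothetical and the helper name `stack` is made up for illustration. It computes where each segment would start and end when the columns of one row are stacked, so the end of the last segment equals the total number of requests in that minute.

```python
from itertools import accumulate

# Hypothetical transposed table: one row per minute,
# one column per response-time bucket (in milliseconds).
COLUMNS = [100, 200, 300]
TABLE = {
    "08:00": [12, 7, 0],
    "08:01": [9, 0, 2],
}


def stack(values):
    """Return the (start, end) offset of each segment when values are stacked."""
    ends = list(accumulate(values))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


for label, values in TABLE.items():
    segments = dict(zip(COLUMNS, stack(values)))
    print(label, "total:", sum(values), "segments:", segments)
```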

That’s it!  To find out other interesting ways to analyze your log data, sign up for Sumo Logic Free and try it for yourself!

Praveen Rangnath, Former Head of Product Marketing

A Few Good Logs

02.15.2013 | Posted by Praveen Rangnath, Former Head of Product Marketing


“I Want The Logs!”

In the midst of this week’s back and forth between Tesla, the New York Times, and various other media outlets and bloggers, Greylock Data Scientist in Residence (and Sumo Logic Advisory Board Member) DJ Patil posted a tweet that caught my eye: “Love that everyone is using data to have a conversation.  It’s about getting to the right answer.”

DJ is 100% correct, and throughout this Tesla/NY Times debate, we at Sumo Logic are thrilled to see the public recognition of the importance of log data — as a source of the truth.  

Yes, log data needs to be properly analyzed and understood (as the debate makes evident), but what clearly emerged from the debate is the truism that log data holds the absolute and authoritative record of all the events that occurred.  It’s evident; just see how the discussion revolves entirely around understanding the logs.

The Bigger Picture

There is a bigger picture to this debate, which is that log data is generated everywhere, whether it be from the car you drive, the energy meter beside your home, the device you’re using to read this blog, the server delivering this content, the network delivering this content, the device I’m using to write this post… I could go on and on.  And in the same way log files generated by a car hold the answer to whether it ran out of power or met range estimates, log files generated by applications, servers, network and virtualization infrastructure hold the answer to whether revenue generating applications are up and adequately performing, if customers are utilizing a newly developed feature, or if any part of an IT infrastructure is slow or experiencing downtime.  

It is important to remember — these are all business critical questions.  And just like Tesla needed to analyze their logs to defend their business, every enterprise, large or small, needs to be able to easily analyze and visualize their log data to ensure the health of their business.

Cars, Enterprises, and Terabytes

Before moving on, let’s not forget, enterprises are not cars, and data generated from enterprises is different from data generated by cars, particularly along three dimensions:  volume, variety, and velocity.  You got it… the 3 Vs of Big Data.  Cars do not (or at least do not yet!) generate up to terabytes of unstructured data per day.  Enterprises with large distributed IT environments do.

This is where Sumo Logic comes in.  Sumo Logic is based on the recognition that enterprises need to be able to easily analyze and visualize the massive amounts of data generated by their infrastructure and business, and that current on-premise tools just can’t scale.  Today, enterprises generate as much data in 10 minutes as they did in the entire year in 2003.  It is therefore not surprising that legacy on-premise solutions just can’t keep up.

Sumo Logic makes it possible for enterprises of all sizes to find the truth from their data.  And we do so without adding any operational overhead for our customers; Sumo Logic is a 100% cloud-based service.   Large enterprises like Netflix and Land O’Lakes use Sumo Logic.  Fast growing enterprises like PagerDuty and Okta do as well.  

You want some answers?  You have some logs?  We can handle the logs.  

Contact us here, or try it out for yourself by signing up for Sumo Logic Free.


Sanjay Sarathy, CMO

A Visit To the Other Coast

02.12.2013 | Posted by Sanjay Sarathy, CMO

Vance and I spent a week on the East Coast talking with a variety of analysts about the Sumo Logic story.  Apart from the usual questions (“where did you come up with that name?”), there were a number of interesting observations from our first ‘tour’.  

  • Different aspects of our story appeal to different analysts, depending on particular research areas. Some people latched onto our “Analytics for IT” story and were interested in a deep understanding of how we plan to take LogReduce and its associated capabilities to the proverbial next level. Others were interested in understanding just how elastic our cloud-based architecture is to support the potential bursting and scaling needs of a number of different clients.  Still others focused on the ROI potential of our solution across a variety of different use cases.  
  • Once we actually showed a live example of how LogReduce works (hello, Sumo on Sumo) everyone instinctively understood the huge operational and business value that LogReduce brings by distilling hundreds of thousands of log messages into a set of twenty to thirty real patterns. Thank goodness for ubiquitous WiFi.  
  • My most interesting meeting was with about 20 people from a particular banking outfit, with whom I spent the first ten minutes explaining what log files are and why analyzing them can uncover real insights for a business.  Getting back to first principles was illuminating because without explaining the business reason for looking at log files, your so-called features are almost irrelevant.

We have our sales kickoff this week.  There’s a ton of energy across every group, not just because of the success we’ve had but also from the enormous opportunity to help small and large businesses generate value from their machine data.  We’d love to get your feedback on our service – try Sumo Logic Free and tell us what you think.