Work Here – Get a free MacBook Pro!

07.31.2012 | Posted by Stefan Zier, Cloud Infrastructure Architect

In the last few years, many developers have come to favor Mac OS X as a development platform. When you come to work for Sumo Logic, we give you a top-of-the-line MacBook Pro, and you get to keep it, for good. This post describes the terms of this offer and the rationale behind it.

… Continue Reading

Scala at Sumo: grab bag of tricks

07.25.2012 | Posted by David Andrzejewski, Data Sciences Engineer

As mentioned previously on this blog, at Sumo Logic we primarily develop our backend software using the Scala programming language. In this post we present some (very) miscellaneous Scala tidbits and snippets that showcase some interesting features we have taken advantage of in our own code.

… Continue Reading

3 Tips for Writing Performant Scala

07.23.2012 | Posted by Russell

Here at Sumo Logic we write a lot of Scala code. We also have a lot of data, so some of our code has to go really, really fast. While Scala allows us to write correct, clear code quickly, it can be challenging to ensure that you are getting the best performance possible.

Two expressions which seem to be equivalent in terms of performance can behave radically differently. Only with an in-depth understanding of the implementation of the language and the standard library can one predict which will be faster.

… Continue Reading

Connect the dots with the new Trace operator

07.18.2012 | Posted by David Andrzejewski, Data Sciences Engineer

The trace operator is a new “beta” feature in Sumo Logic that allows the user to identify and follow chains of entities across different log messages, which themselves may be distributed across different assemblies, machines, or even datacenters.  Its origins lie in our culture of “dogfooding” and a recent hackathon where engineers had the opportunity to work on cool or itch-scratching projects of their own choosing.

Since the Sumo Logic service itself is a cloud-based distributed system, we often found ourselves investigating behaviors across multiple components of the system.  Following our own logging advice, we use unique IDs to track these events and to make them easily identifiable within our logs.  However, unless the “originating ID” followed activity across every single system component, it was still necessary to perform multiple searches to follow event chains all the way to the end.  To show how trace automates this procedure and makes our lives easier, we’ll walk through a simplified session tracking example.

Session Tracking Example

Say that your product uses a variety of session IDs to track requests as they flow throughout your system.  For example, different components might use a series of 4-digit hexadecimal IDs to process a customer order as shown below.

Now imagine that an error is encountered within the system while processing the accountID, causing an internal error log to be generated that contains the webID: “PROCESSING FAILED: webID=7F92”.

Manually connecting the dots

Starting from this information, we could perform a series of searches and manual investigations to uncover the root cause from this set of logs:

  1. User action webID=7F92
  2. Initiating requestID=082A for webID=7F92 …
  3.  … orderID=34C8 received for requestID=082A …
  4. Retrieving userID=11D2 for requestID=082A …
  5. … accountID=1234 access, userID=11D2 …
  6. ERROR accountID=1234 not found! 
    (this error percolates back until the original webID fails)
  7. PROCESSING FAILED: webID=7F92

Note that to arrive at this conclusion we are essentially following a “chain” of these hex IDs across different components of our system.

Session tracking with trace

The idea of the trace operator is to automate this process, allowing us to jump almost directly from the observed webID (log #1) to the original failure deep within the system (log #6) via the following query:

* | trace "ID=([0-9a-fA-F]{4})" "7F92" | where _raw matches "*ERROR*"

Let’s deconstruct what’s happening here. First, assume that our * keyword search query runs over the time window of interest, capturing all relevant logs and plenty of irrelevant ones as well.  Next we have the trace operator:

  • The regular expression (with exactly one capturing group) “ID=([0-9a-fA-F]{4})” tells trace how to identify the individual pieces of the chain we are trying to build, in this case 4-digit hex strings following “ID=”.
  • The final value gives trace the starting point to build a chain from, which for us is the original webID 7F92.
  • trace then scans incoming logs to build the underlying chain based on IDs occurring together in the same log, starting from the user supplied initial value (here 7F92).  

For example, when trace observes this log

Initiating requestID=082A for webID=7F92 …

it uses the regex to identify two IDs: 082A and 7F92.  Since 7F92 is the starting point it is already part of the chain, and since 082A has just co-occurred with 7F92 we add it to the chain as well.  As trace works its way through the logs, any log containing any ID which is part of the chain is passed through, and any other log is simply ignored. For example the following log would not be added, because none of these IDs are connected to the chain we build starting from the webID 7F92:

Initiating requestID=8182 for webID=8384 …

This is how the trace operator filters logs by “connecting the dots” across different log messages.
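
The chain-building behavior described above can be sketched in a few lines of Scala. This is only an illustration, not how the trace operator is actually implemented: the object name TraceSketch and its helper functions are made up for this example, and the sketch assumes the logs fit in memory and re-scans them until the chain stops growing, so that the order of the logs does not matter.

```scala
object TraceSketch {
  // Same pattern as in the query: exactly one capturing group.
  private val IdPattern = "ID=([0-9a-fA-F]{4})".r

  /** Extracts every ID matched by the capturing group in one log line. */
  def idsIn(log: String): Set[String] =
    IdPattern.findAllMatchIn(log).map(_.group(1)).toSet

  /** Grows the chain from `start` until a full pass over the logs adds nothing new. */
  def buildChain(logs: Seq[String], start: String): Set[String] = {
    var chain = Set(start)
    var grew = true
    while (grew) {
      val before = chain.size
      for (log <- logs) {
        val ids = idsIn(log)
        // IDs that co-occur with a known chain member join the chain.
        if (ids.exists(chain.contains)) chain ++= ids
      }
      grew = chain.size > before
    }
    chain
  }

  /** Passes through only the logs that mention at least one ID in the chain. */
  def trace(logs: Seq[String], start: String): Seq[String] = {
    val chain = buildChain(logs, start)
    logs.filter(log => idsIn(log).exists(chain.contains))
  }
}
```

Calling TraceSketch.trace(logs, "7F92") on the example logs would keep the messages linked to webID 7F92 and drop the unrelated requestID=8182 message; a plain substring filter on the result then plays the role of the where clause.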

The smoking gun

Finally, once we’ve used trace to filter down to logs containing IDs which we know to be connected to the failing webID 7F92, we do string matching to filter down to logs containing the substring “ERROR” and discover a failure associated with the accountID.  Note that if we had simply done an “ERROR” keyword search we might be faced with a deluge of other errors not directly connected to the specific issue we were trying to investigate.  Furthermore, without constructing our chain of IDs, there would be no obvious connection between accountID 1234 and our failing webID 7F92.  Hopefully this example has given you a taste of what you can do with trace – there are certainly many other possible applications.

Pragmatic AWS: 3 Tips to enhance the AWS SDK with Scala

07.12.2012 | Posted by Stefan Zier, Cloud Infrastructure Architect

At Sumo Logic, most backend code is written in Scala. Scala is a newer JVM (Java Virtual Machine) language whose design began in 2001 under Martin Odersky, who also co-founded our Greylock sister company, Typesafe. Over the past two years at Sumo Logic, we’ve found Scala to be a great way to use the AWS SDK for Java. In this post, I’ll explain some use cases.

1. Tags as fields on AWS model objects

Accessing AWS resource tags can be tedious in Java. For example, to get the value of the “Cluster” tag on a given instance, something like this is usually needed: 

   String deployment = null;
   for (Tag tag : instance.getTags()) {
     if (tag.getKey().equals("Cluster")) {
       deployment = tag.getValue();
     }
   }

While this isn’t horrible, it certainly doesn’t make code easy to read. Of course, one could turn this into a utility method to improve readability. The set of tags used by an application is usually known and small in number. For this reason, we found it useful to expose tags with an implicit wrapper around the EC2 SDK’s Instance, Volume, etc. classes. With a little Scala magic, the above code can now be written as:

val deployment = instance.cluster

Here is what it takes to make this magic work:

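import com.amazonaws.services.ec2.model.Instance
import scala.collection.JavaConversions._

object RichAmazonEC2 {
 // Implicitly wraps any EC2 Instance so the extra fields appear on it.
 implicit def wrapInstance(i: Instance) = new RichEC2Instance(i)
}

class RichEC2Instance(instance: Instance) {
 // Returns the value of the given tag, or null if the instance lacks it.
 private def getTagValue(tag: String): String =
   instance.getTags.find(_.getKey == tag).map(_.getValue).orNull

 def cluster = getTagValue("Cluster")
}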

Whenever this functionality is desired, one just has to import RichAmazonEC2._
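
For readers who want to try the pattern without the AWS SDK on the classpath, here is a minimal self-contained sketch. Tag and Instance below are hypothetical stand-ins for the SDK model classes, not the real ones, and the accessor returns an Option rather than null, which is a variation on the code above rather than what we use.

```scala
import scala.language.implicitConversions

object TagSketch {
  // Stand-ins for the SDK model classes (hypothetical, for illustration only).
  case class Tag(key: String, value: String)
  case class Instance(id: String, tags: List[Tag])

  class RichInstance(instance: Instance) {
    /** Looks up a tag by key; None when the tag is absent. */
    def tag(key: String): Option[String] =
      instance.tags.find(_.key == key).map(_.value)

    /** Exposes the "Cluster" tag as if it were a field on Instance. */
    def cluster: Option[String] = tag("Cluster")
  }

  // Instance has no `cluster` member, so the compiler applies this conversion.
  implicit def wrapInstance(i: Instance): RichInstance = new RichInstance(i)
}
```

After import TagSketch._, an expression such as instance.cluster compiles even though the stand-in Instance class defines no such member, because the compiler inserts the call to wrapInstance.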

2. Work with lists of resources

Scala 2.8.0 included a very powerful new set of collections libraries, which are very useful when manipulating lists of AWS resources. Since the AWS SDK uses Java collections, to make this work, one needs to import scala.collection.JavaConversions._, which transparently “converts” (wraps implicitly) the Java collections. Here are a few examples to showcase why this is powerful:

Printing a sorted list of instances, by name:
ec2.describeInstances(). // Get list of instances.
 getReservations.                  
 map(_.getInstances).
 flatten.                          // Translate reservations to instances.
 sortBy(_.sortName).               // Sort the list.
 map(i => "%-25s (%s)".format(i.name, i.getInstanceId)). // Create String.
 foreach(println(_))               // Print the string.

Grouping a list of instances in a deployment by cluster (returns a Map from cluster name to list of instances in the cluster):
ec2.describeInstances().            // Get list of instances.
 getReservations.
 flatMap(_.getInstances).          // Translate reservations to instances.
 filter(_.deployment == "prod").   // Filter the list to prod deployment.
 groupBy(_.cluster)                // Group by the cluster.

You get the idea – this makes it trivial to build very rich interactions with EC2 resources.
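
The same flatten, filter, sort and group steps can be tried on plain Scala data. The sketch below is only an illustration: Reservation and Instance are hypothetical case classes standing in for the SDK classes and the tag wrapper, and byCluster is a name invented for this example.

```scala
object CollectionsSketch {
  // Stand-ins for SDK reservations and wrapped instances (hypothetical).
  case class Instance(name: String, deployment: String, cluster: String)
  case class Reservation(instances: List[Instance])

  /** Flattens reservations, keeps one deployment, sorts by name, groups by cluster. */
  def byCluster(reservations: Seq[Reservation],
                deployment: String): Map[String, Seq[Instance]] =
    reservations
      .flatMap(_.instances)               // Translate reservations to instances.
      .filter(_.deployment == deployment) // Keep a single deployment.
      .sortBy(_.name)                     // Sort before grouping.
      .groupBy(_.cluster)                 // Map from cluster name to its instances.
}
```

Because groupBy keeps elements in traversal order within each group, sorting first leaves every cluster's instances sorted by name.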

3. Add pagination logic to the AWS SDK

When we first started using AWS, we had a utility class to provide some commonly repeated functionality, such as pagination for S3 buckets and retry logic for calls. Instead of embedding functionality in a separate utility class, implicits allow you to pretend that the functionality you want exists in the AWS SDK. Here is an example that extends the AmazonS3 class to allow listing all objects in a bucket: 

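import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.model.{ListObjectsRequest, ObjectListing, S3ObjectSummary}
import scala.collection.JavaConversions._

object RichAmazonS3 {
 implicit def wrapAmazonS3(s3: AmazonS3) = new RichAmazonS3(s3)
}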

class RichAmazonS3(s3: AmazonS3) {
 def listAllObjects(bucket: String, cadence: Int = 100): Seq[S3ObjectSummary] = {

   var result = List[S3ObjectSummary]()

   def addObjects(objects: ObjectListing) = result ++= objects.getObjectSummaries

   var objects = s3.listObjects(new ListObjectsRequest().withMaxKeys(cadence).withBucketName(bucket))
   addObjects(objects)

   while (objects.isTruncated) {
     objects = s3.listNextBatchOfObjects(objects)
     addObjects(objects)
   }

   result
 }
}

To use this:

val objects = s3.listAllObjects("mybucket")

There is, of course, a risk of running out of memory, given a large enough number of object summaries, but in many use cases, this is not a big concern.
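
Where memory is a concern, one possible alternative is to expose the pages lazily as an Iterator, so that only one page is held at a time. The sketch below is a generic illustration rather than code from our codebase: Page, allItems and the fetch function are hypothetical names, with fetch standing in for whatever SDK call returns one page of results plus a marker for the next page.

```scala
object LazyPaging {
  /** One page of results plus an optional token pointing at the next page. */
  case class Page[A](items: Seq[A], next: Option[String])

  /** Walks all pages lazily; `fetch` is called only when more items are needed. */
  def allItems[A](fetch: Option[String] => Page[A]): Iterator[A] = {
    val pages = new Iterator[Page[A]] {
      private var token: Option[String] = None // None requests the first page.
      private var done = false

      def hasNext: Boolean = !done

      def next(): Page[A] = {
        if (done) throw new NoSuchElementException("no more pages")
        val page = fetch(token)
        token = page.next
        done = token.isEmpty // The last page carries no token.
        page
      }
    }
    pages.flatMap(_.items)
  }
}
```

The trade-off is that the caller has to consume the iterator while the underlying client is still usable, and each traversal repeats the remote calls.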

Summary

Scala enables programmers to implement expressive, rich interactions with AWS and greatly improves readability and developer productivity when using the AWS SDK. It’s been an essential tool to help us succeed with AWS.
