Russell Cohen

Posts by Russell Cohen

Blog

A Better Way to Analyze Log Files on the Command Line

Sumo Logic makes it easy to aggregate and search terabytes of log data. But you don't always have terabytes of data on thousands of servers. Sometimes you have just a few log files on a single server. We're open sourcing Sumoshell, a set of tools recently created at a hackathon, to help fill that gap.

Getting real value from your logs requires more than finding log lines that match a few keywords and paging through (à la tail/grep/less): you need parsing, transforming, aggregating, graphing, clustering, and more. All of these things are easy to do in Sumo Logic, but they're hard to do with the standard set of Unix command line utilities people usually use to analyze logs. Sumoshell is a set of command line utilities for analyzing logs. Its goal is to bring Sumo Logic's log analysis power to the command line.

Here's an example of Sumoshell parsing tcpdump's output to show the IP addresses that my laptop is sending data to, and the total amount of data sent to each host. The tcpdump output looks like this:

    23:25:17.237834 IP 6.97.a86c.com.http > 10.0.0.6.53036: Flags [P.], seq 33007:33409, ack 24989, win 126, options [TS], length 2
    23:25:17.237881 IP 10.0.0.6.53036 > 6.97.a86c.com.http: Flags [.], ack 2, win 4096, options [nop], length 0
    23:25:17.237959 IP 10.0.0.6.53036 > 6.97.a86c.http: Flags [P.] options [nop,nop,TS val 1255619794 ecr 249923103], length 6

The Sumoshell command is:

    sudo tcpdump 2>/dev/null | sumo search | sumo parse "IP * > *:" as src, dest | sumo parse "length *" as length | sumo sum length by dest | render

The Sumoshell query language supports an adapted subset of the Sumo Logic query language, using Unix pipes to shuttle data between operators. The rendered output is the total amount of data sent to each destination host.

Some other helpful features of Sumoshell:

- Sumoshell understands that multi-line log messages are one semantic unit, so if you search for Exception, you get the entire stack trace.
- Sumoshell lets you parse out pieces of your logs, either to print just the bits you care about or to use later in aggregations and transformations. Once you've parsed out fields like status_code or response_time_ms, you can count by status_code or average response_time_ms by status_code. If you wanted to do this for your web logs, you could run something like:

    tail -f /var/log/webserver/http.log | sumo search "GET" | sumo parse "[status=*][response_time=*]" as stat, rt | sumo average rt by stat | render

- Once you've parsed fields, or aggregated the results with sum, count, or average, Sumoshell comes with intelligent pretty-printers to clearly display the aggregate data on the command line. They know how wide your terminal is, so text won't wrap and become hard to read. They figure out how many characters individual fields have, so the columns line up. They even let you see live-updating graphs of your data, all in your terminal.

You can learn more about Sumoshell at the GitHub repository, where you can also download binaries, see the source, and contribute your own operators. If Sumoshell helps you analyze logs on one server, consider trying out Sumo Logic to use even more powerful tools on your entire fleet.

October 30, 2015

Blog

Regular Expressions - No Magic, Part 3

Blog

Regular Expressions - No Magic, Part 2

Blog

Regular Expressions - No Magic

Blog

Why You Should Never Catch Throwable In Scala

Scala is a subtle beast and you should heed its warnings. Most Scala and Java programmers have heard that catching Throwable, a superclass of all exceptions, is evil, and that patterns like the following should be avoided:

    try {
      aDangerousFunction()
    } catch {
      case ex: Throwable => println(ex)
      // Or even worse
      case ex => println(ex)
    }

This pattern is absurdly dangerous. Here's why:

The Problem

In Java, catching all throwables can do nasty things like preventing the JVM from properly responding to a StackOverflowError or an OutOfMemoryError. Certainly not ideal, but not catastrophic. In Scala, it is much more heinous. Scala uses exceptions to return from nested closures. Consider code like the following:

    def inlineMeAgain[T](f: => T): T = {
      f
    }

    def inlineme(f: => Int): Int = {
      try {
        inlineMeAgain {
          return f
        }
      } catch {
        case ex: Throwable => 5
      }
    }

    def doStuff {
      val res = inlineme {
        10
      }
      println("we got: " + res + ". should be 10")
    }
    doStuff

We use a return statement from within two nested closures. This seems like a bit of an obscure edge case, but it's certainly possible in practice. In order to handle this, the Scala compiler throws a NonLocalReturnControl exception. Unfortunately, it is a Throwable, and you'll catch it. Whoops. That code prints 5, not 10. Certainly not what was expected.

The Solution

While we can say "don't catch Throwables" until we're blue in the face, sometimes you really want to make sure that absolutely no exceptions get through. You could enumerate the dangerous exception types everywhere you want to catch Throwable, but that's cumbersome and error prone. Fortunately, this is actually quite easy to handle, thanks to Scala's focus on implementing much of the language without magic: the "catch" part of a try-catch is just some sugar over a partial function, and we can define our own partial functions!

    import scala.util.control.ControlThrowable

    def safely[T](handler: PartialFunction[Throwable, T]): PartialFunction[Throwable, T] = {
      case ex: ControlThrowable => throw ex
      // case ex: OutOfMemoryError (assorted other nasty exceptions you don't want to catch)

      // If it's an exception they handle, pass it on
      case ex: Throwable if handler.isDefinedAt(ex) => handler(ex)

      // If they didn't handle it, rethrow. This line isn't necessary, just for clarity
      case ex: Throwable => throw ex
    }

    // Usage:
    /*
    def doSomething: Unit = {
      try {
        somethingDangerous
      } catch safely {
        case ex: Throwable => println("AHHH")
      }
    }
    */

This defines a function "safely", which takes a partial function and yields another partial function. Now, by simply writing catch safely { /* catch block */ } we're free to catch Throwables (or anything else) safely and restrict the list of all the evil exception types to one place in the code. Glorious.
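For completeness, here is a minimal, self-contained sketch (an addition to this post, not part of the original) that applies safely to the earlier inlineme example. It assumes Scala 2.x, where catch accepts a partial-function expression, as the post relies on. Because the ControlThrowable is rethrown, the non-local return completes and the program prints 10 instead of 5:

    import scala.util.control.ControlThrowable

    object SafelyDemo {
      def safely[T](handler: PartialFunction[Throwable, T]): PartialFunction[Throwable, T] = {
        case ex: ControlThrowable => throw ex
        case ex: Throwable if handler.isDefinedAt(ex) => handler(ex)
        case ex: Throwable => throw ex
      }

      def inlineMeAgain[T](f: => T): T = f

      def inlineme(f: => Int): Int = {
        try {
          inlineMeAgain {
            return f
          }
        } catch safely {
          // The handler no longer swallows NonLocalReturnControl;
          // safely rethrows it, so the return reaches inlineme's boundary.
          case ex: Throwable => 5
        }
      }

      def main(args: Array[String]): Unit = {
        println("we got: " + inlineme(10) + ". should be 10") // prints 10
      }
    }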

May 5, 2014

Blog

Fuzzing For Correctness

Handling the Edge-Case Search-Space Explosion: A Case for Randomized Testing

For much of software testing, the traditional quiver of unit, integration, and system testing suffices. Your inputs typically have a main case and a small set of well understood edge cases you can cover. But periodically we come upon software problems where the range of acceptable inputs and the number of potential code paths is simply too large to have an acceptable degree of confidence in the correctness of a software unit (be it a function, module, or entire system).

Enter The Fuzzer

When we hit this case at Sumo, we've recently started applying a technique well known in the security community but less commonly used in traditional development: fuzzing. Fuzzing in our context refers to randomized software testing on valid and invalid inputs. In the software development world, fuzzing is commonly used to test compilers, the classic case of an exploding search space. It has also gained traction recently in the Haskell community with QuickCheck, a module that can automagically build test cases for your code and test it against a given invariant. ScalaCheck aims to do the same for Scala, the language we use primarily at Sumo Logic. The long and short of it is this: scala.util.Random coupled with a basic understanding of the range of inputs is better at thinking of edge cases than I am.

At Sumo, our product centers around a deep query language rife with potential edge cases to handle. We recently started replacing portions of the backend system with faster, more optimized alternatives. This presented us with a great opportunity to fuzz both code paths against hundreds of thousands of randomly generated test queries and verify the equivalence of the results. Fuzzers are great because they have no qualms about writing queries like "??*?r." No human will ever write that query. I didn't think about testing that query. That doesn't mean that no human will ever be impacted by the underlying bug (*'s allowed the query parts to overlap on a document. Whoops.) Of course, I probably should have caught that bug in unit testing. But there is a limit to the edge cases we can conceive of, especially when we get tunnel vision around what we perceive to be the weak spots in the code. Your fuzzer doesn't care what you think the weak spots are, and given enough time it will explore all the meaningful areas of the search space. A fuzzer is only constrained by your ability to define the search space and the time you allow it to run.

Fuzzing is especially useful if you already have a piece of code that is trusted to produce correct results. Even in the new-code case, however, fuzzing can still be invaluable for finding inputs that throw you into infinite loops or cause general crashes. In light of this, here at Sumo we've incorporated another test category into our hierarchy: fuzzing tests, which sit between unit tests and integration tests.

Handling Unpredictability

There are issues associated with incorporating random testing into your workflow. One might rightfully be concerned that such tests will be inherently flaky. Tests whose executions are unpredictable carry a stigma in the current testing landscape, which rightfully strives for reproducibility. To address these concerns, we've established best practices for the randomized testing we do at Sumo:

- Each run of a randomized test should use a single random number generator throughout.
- The random number generator should be predictably seeded (System.currentTimeMillis() is popular), and that seed should be logged along with the test run.
- The test should be designed to be rerunnable with a specific seed.
- The test should output specific errors that can trivially (or even automatically) be pulled into a deterministic test suite. All errors caught by a randomized test should be incorporated into a deterministic test to prevent regressions.

Following these guidelines allows us to create reproducible, actionable, and robust randomized tests, and goes a long way towards finding tricky corner cases before they can ever manifest in production. A minimal sketch of a test along these lines appears below. (To see the end result of all of our programming efforts, please check out Sumo Logic Free.)
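Here is a minimal sketch, not from the original post, of what a randomized test following these guidelines could look like in Scala. legacyQuery and optimizedQuery are hypothetical stand-ins for the trusted and optimized implementations being compared, and the query alphabet is invented for illustration:

    import scala.util.Random

    object QueryFuzzer {
      // Hypothetical stand-ins for the trusted and the optimized implementation.
      def legacyQuery(q: String): Int = q.count(_ == '*')
      def optimizedQuery(q: String): Int = q.count(_ == '*')

      // Build a short random query from a small alphabet that includes wildcards,
      // so queries like "??*?r" can come up.
      def randomQuery(rnd: Random): String = {
        val alphabet = "abcr?*"
        val len = 1 + rnd.nextInt(8)
        (1 to len).map(_ => alphabet(rnd.nextInt(alphabet.length))).mkString
      }

      def main(args: Array[String]): Unit = {
        // Predictably seeded and logged, so any failure can be replayed with the same seed.
        val seed = args.headOption.map(_.toLong).getOrElse(System.currentTimeMillis())
        println(s"fuzz seed = $seed")
        val rnd = new Random(seed)

        for (_ <- 1 to 100000) {
          val query = randomQuery(rnd)
          val expected = legacyQuery(query)
          val actual = optimizedQuery(query)
          if (expected != actual) {
            // Specific, replayable output that can be copied into a deterministic regression test.
            println(s"MISMATCH seed=$seed query=$query expected=$expected actual=$actual")
          }
        }
      }
    }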

August 14, 2012

Blog

3 Tips for Writing Performant Scala

Here at Sumo Logic we write a lot of Scala code. We also have a lot of data, so some of our code has to go really, really fast. While Scala allows us to write correct, clear code quickly, it can be challenging to ensure that you are getting the best performance possible. Two expressions that seem equivalent in terms of performance can behave radically differently, and only with an in-depth understanding of the implementation of the language and the standard library can one predict which will be faster. For a great explanation of the implementation and details of the Scala language, I recommend reading Programming in Scala (2nd edition) by Odersky, Spoon and Venners cover to cover. It's worth every page. Short of reading the 800+ pages of Programming in Scala, here are 3 pieces of low-hanging fruit to help improve the performance of your Scala code.

1. Understand the Collections!

Users of Java are used to ArrayList, with constant time lookup and amortized constant time append. In Scala, the object you get when you ask for List(1, 2, 3) is a linked list. It can be prepended to in constant time with the "cons" (::) operator, but many other operations, such as index-based lookup, length, and append, run in linear time(!). If you want random access, you want an IndexedSeq. If you want constant time append, use a ListBuffer. Read the collections chapter of Programming in Scala for all the details.

2. Be Lazy!

Scala's collection libraries allow us to write nearly any collection operation as a short chain of functions. For example, let's say we had a bunch of log entries. For each of them we wanted to extract the first word, pull the words into groups of 8, and then count the number of groups that contain the word "ERROR." We would probably write that as a chain of map, grouped, and count (the snippet originally embedded here is no longer available; a hedged reconstruction appears at the end of this tip). The problem is that logs.map(_.takeWhile(_ != ' ')) creates an intermediate collection that we never use directly. If the size of logs was near our memory limit, that auxiliary collection could run us out of memory.

To avoid generating the intermediate collections, we can run the operations on the list in a "lazy" manner. When we call the .view method on a Scala collection, it returns a view into the collection that provides lazy evaluation through a series of closures. If f(x) = x + 5 and g(x) = x * 2, then mapping f and then g over a view is really just the functional composition g(f(x)), so there is no reason to create the intermediate collection: a view runs transformations as functional composition instead of as a series of intermediate collections. Going back to our initial example, the operation stays the same except that we insert .view after logs; the final call to count forces the results of the computation to be evaluated. If your chain produces a collection on the other side (e.g. just returning a subset of the logs), use .force to make it strict and return a concrete collection.

Using lazy collections must be taken with a grain of salt: while lazy collections often improve performance, they can also make it worse. In one microbenchmark (whose snippet is also no longer available), the lazy version ran 1.5x faster than the strict version; however, for smaller values of n, the strict version will run faster, because lazy evaluation requires the creation of an additional closure. A hedged sketch of the strict and lazy versions of the log example follows.
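The snippets originally embedded as gists in this tip are no longer available. Below is a minimal sketch of what the strict and view-based versions plausibly looked like; logs and its contents are assumptions introduced for illustration, not the original code:

    // Hedged reconstruction of the log-grouping example; `logs` is a stand-in.
    val logs: Seq[String] = Seq("ERROR out of memory", "INFO started", "WARN slow request")

    // Strict version: map materializes an intermediate collection of first words.
    val strictCount = logs
      .map(_.takeWhile(_ != ' '))        // extract the first word of each entry
      .grouped(8)                        // pull the words into groups of 8
      .count(_.exists(_ == "ERROR"))     // count groups containing "ERROR"

    // Lazy version: .view composes the transformations as closures, so the
    // intermediate collection from map is never materialized; count forces evaluation.
    val lazyCount = logs.view
      .map(_.takeWhile(_ != ' '))
      .grouped(8)
      .count(_.exists(_ == "ERROR"))

    // Views compose functions instead of building collections: if f(x) = x + 5
    // and g(x) = x * 2, mapping both over a view evaluates g(f(x)) per element
    // only when the result is forced (here by sum).
    val composed = (1 to 10).view.map(_ + 5).map(_ * 2).sum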
If creating those extra closures takes longer than building the intermediate collections, the lazy version will run slower. Profile and understand your bottlenecks before optimizing!

3. Don't be Lazy!

If you really need a piece of code to go fast, given the current state of the Scala libraries and compiler, you're going to need to write more of it. Sometimes (unfortunately) to write truly performant code in Scala, you need to write it as if it were C or Java. This means eschewing a lot of the things you've come to love about Scala:

- Use while loops instead of for loops. For loops create closures that can add significant overhead. To give some context: in my benchmark, the while-loop version of a simple counting loop ran in 0.557 ms on average, while the idiomatic Scala version ran in 9.584 ms. That is a 17x improvement! (The original while-loop and Scala snippets are no longer available; hedged reconstructions appear at the end of this post.) The exact reason is beyond the scope of this post, but in a nutshell, in the Scala version the loop body { x += 1 } is compiled into an anonymous function that is invoked on every iteration. For what it's worth, this is issue 1338 on the Scala issue tracker, and there is a compiler plugin to perform a lot of these optimizations automatically.

- Replace convenience methods like exists, count, etc. with hand-coded variants. In my benchmark, the hand-coded variant (Version 2) got a 3x speedup over the version using the convenience method (Version 1). (Reconstructions of both versions also appear at the end of this post.)

- Avoid objects when possible; use primitives instead. Whenever possible, the Scala compiler will use JVM primitives for things like Ints, Booleans, Doubles and Longs. However, if you prevent this (by using an implicit conversion, etc.) and the compiler is forced to box your values, you will pay a significant performance cost. [Look for value classes to address this in future versions of Scala.] You could also specialize containers for primitive types, but that is beyond the scope of this post.

I really hate suggesting that you write Scala like it's C to get better performance. I really do. And I really enjoy programming in Scala. I hope that in the future the standard library evolves in a way that makes it faster than hand-coding the C equivalent.

The Bottom Line

The first two suggestions get followed at Sumo Logic, and really boil down to a solid understanding of Scala's standard library. The third suggestion gets followed very rarely, if at all. This seems surprising: shouldn't we be trying to write the fastest code possible? The answer, of course, is no. If we wanted to write the fastest code possible, we would essentially write Scala as if it were C. But then, why not just use C? There are multiple factors to optimize for here. By necessity, our services at Sumo Logic are designed to scale horizontally: if we need better performance, we can spin up more nodes. That costs money. Developer time also costs money. Writing idiomatic Scala that fully utilizes the type safety and functional properties of the language can and will produce code that runs slower than writing everything with while loops in C style. But slower code is OK. The critical trade-off for us is that writing clean Scala is faster, less error prone, and easier to maintain than the alternative.
Scala's performance shortcomings have garnered some criticism on the internet recently (and for good reason). This isn't necessarily a reason not to use Scala. I suspect Scala performance will improve with time as the Scala compiler produces more optimized bytecode and the JVM gains native support for some of the functional features of the Scala language. The critical thing to recognize is that you must consider both developer and code performance in your optimizations.

[Benchmarking Notes] Code was benchmarked with a script that first executes the function in question 10,000 times, then runs it 1,000 more times and computes the average. This ensures that the HotSpot JVM will JIT compile the code in question.
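For reference, the loop-comparison snippets from tip 3 were embedded as gists that are no longer available. Here is a minimal sketch of what they plausibly looked like; the method names and loop bodies are assumptions, and the timings quoted above refer to the original code, not to this sketch:

    // Hedged reconstruction of the while-loop vs. for-loop comparison from tip 3.

    // "C style" while loop: no closure is created.
    def countWithWhile(n: Int): Int = {
      var x = 0
      var i = 0
      while (i < n) {
        x += 1
        i += 1
      }
      x
    }

    // Idiomatic Scala: the for loop desugars to Range.foreach, and the body
    // { x += 1 } becomes an anonymous function invoked on every iteration.
    def countWithFor(n: Int): Int = {
      var x = 0
      for (_ <- 0 until n) {
        x += 1
      }
      x
    }

    // Hedged reconstruction of Version 1 vs. Version 2 for the convenience-method tip.

    // Version 1: the convenience method.
    def anyNegative(xs: Array[Int]): Boolean =
      xs.exists(_ < 0)

    // Version 2: the hand-coded variant.
    def anyNegativeHandCoded(xs: Array[Int]): Boolean = {
      var i = 0
      while (i < xs.length) {
        if (xs(i) < 0) return true
        i += 1
      }
      false
    }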