David Carlton

Posts by David Carlton


Improving Your Performance via Method Objects

When Sumo Logic receives metrics data, we put those metrics datapoints into a Kafka queue for processing. To help us distribute the load, that Kafka queue is broken up into multiple Kafka Topic Partitions; we therefore have to decide which partition is appropriate for a given metrics datapoint. Our logic for doing that has evolved over the last year in a way that spread the decision logic out over a few different classes; I thought it was time to put it all in one place. My initial version had an interface like this:

    def partitionFor(metricDefinition: MetricDefinition): TopicPartition

As I started filling out the implementation, though, I began to feel a little bit uncomfortable. The first twinge was when calculating which branch to go down in one of the methods: normally, when writing code, I try to focus on clarity, but when you’re working at the volumes of data that Sumo Logic has to process, you have to keep efficiency in mind when writing code that is evaluated on every single data point. And I couldn’t convince myself that one particular calculation was quite fast enough for me to want to perform it on every data point, given that the inputs for that calculation didn’t actually depend on the specific data point.

So I switched over to a batch interface, pulling that potentially expensive branch calculation out to the batch level:

    class KafkaPartitionSelector {
      def partitionForBatch(metricDefinitions: Seq[MetricDefinition]): Seq[TopicPartition] = {
        val perMetric = calculateWhetherToPartitionPerMetric()
        metricDefinitions.map { metric => partitionFor(metric, perMetric) }
      }

      private def partitionFor(metricDefinition: MetricDefinition, perMetric: Boolean): TopicPartition = {
        if (perMetric) {
          ...
        } else {
          ...
        }
      }
    }

That reduced the calculation in question from once per data point to once per batch, getting me past that first problem.
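
To make the once-per-batch claim concrete, here is a minimal runnable sketch of the same shape. Everything beyond the names in the snippet above is an assumption made for illustration: the stand-in case classes, the call counter, the hash-based partition choice, the "metrics" topic name, and the partitionCount constructor parameter. The real selection logic is elided in this post and is not reproduced here.

```scala
// Hypothetical stand-ins for the real types; only their shape matters here.
final case class MetricDefinition(name: String)
final case class TopicPartition(topic: String, partition: Int)

class CountingPartitionSelector(partitionCount: Int) {
  // Counts how often the "expensive" calculation runs (illustration only).
  var expensiveCalls: Int = 0

  // Stand-in for a calculation whose inputs do not depend on the data point.
  private def calculateWhetherToPartitionPerMetric(): Boolean = {
    expensiveCalls += 1
    true
  }

  def partitionForBatch(metricDefinitions: Seq[MetricDefinition]): Seq[TopicPartition] = {
    // Evaluated once per batch, then passed down to every per-point call.
    val perMetric = calculateWhetherToPartitionPerMetric()
    metricDefinitions.map(metric => partitionFor(metric, perMetric))
  }

  private def partitionFor(metric: MetricDefinition, perMetric: Boolean): TopicPartition = {
    // Assumed placeholder logic: hash the metric name, or use partition 0.
    val partition =
      if (perMetric) java.lang.Math.floorMod(metric.name.hashCode, partitionCount)
      else 0
    TopicPartition("metrics", partition)
  }
}
```

Calling partitionForBatch on a batch of any size increments expensiveCalls exactly once, which is the whole point of moving the calculation up a level.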

But then I ran into a second such calculation that I needed, and a little after that I saw a call that could potentially translate into a network call; I didn’t want to do either of those on every data point, either! (The results of the network call are cached most of the time, but still.) I thought about adding them as arguments to partitionFor() and to methods that partitionFor() calls, but passing around three separate arguments would make the code pretty messy.

To solve this, I reached a little further into my bag of tricks: this calls for a Method Object. Method Object is a design pattern that you can use when you have a method that calls a bunch of other methods and needs to pass the same values over and over down the method chain: instead of passing the values as arguments, you create a separate object whose member variables are the values that are needed in lots of places and whose methods are the original methods you want. That way, you can break your implementation up into methods with small, clean signatures, because the values that are needed everywhere are accessed transparently as member variables.

In this specific instance, the object I extracted had a slightly different flavor, so I’ll call it a “Batch Method Object”: if you’re performing a calculation over a batch, if every evaluation needs the same data, and if evaluating that data is expensive, then create an object whose member variables are the data that’s shared by every evaluation in the batch. With that, the implementation became:

    class KafkaPartitionSelector {
      def partitionForBatch(metricDefinitions: Seq[MetricDefinition]): Seq[TopicPartition] = {
        val batchPartitionSelector = new BatchPartitionSelector
        metricDefinitions.map(batchPartitionSelector.partitionFor)
      }

      private class BatchPartitionSelector {
        private val perMetric = calculateWhetherToPartitionPerMetric()
        private val nextExpensiveCalculation = ...
        ...

        def partitionFor(metricDefinition: MetricDefinition): TopicPartition = {
          if (perMetric) {
            ...
          } else {
            ...
          }
        }

        ...
      }
    }

One question that came up while doing this transformation was whether every single member variable in BatchPartitionSelector was going to be needed in every batch, no matter what the feature flag settings were. (Which was a potential concern, because they would all be initialized at BatchPartitionSelector creation time, every time this code processes a batch.) I looked at the paths and checked that most were used no matter the feature flag settings, but there was one that only mattered in some of the paths.

This gave me a tradeoff: should I wastefully evaluate all of them anyways, or should I mark that last one as lazy? I decided to go the route of evaluating all of them, because lazy variables are a little conceptually messy and they introduce locking behind the scenes, which has its own efficiency cost: those downsides seemed to me to outweigh the cost of doing the evaluation in question once per batch. If the potentially-unneeded evaluation had been more expensive (e.g. if it had involved a network call), however, then I would have made it lazy instead.

The moral is: keep Method Object (and this Batch Method Object variant) in mind: it’s pretty rare that you need it, but in the right circumstances, it really can make your code a lot cleaner.

Or, alternatively: don’t keep it in mind. Because you can actually deduce Method Object from more basic, more fundamental OO principles. Let’s do a thought experiment where I’ve gone down the route of performing shared calculations once at the batch level and then passing them down through various methods in the implementation: what would that look like? The code would have a bunch of methods that share the same three or four parameters (and there would, of course, be additional parameters specific to the individual methods).
But whenever you see the same few pieces of data referenced or passed around together, that’s a smell that suggests that you want to introduce an object that has those pieces of data as member variables. If we follow that route, we’d apply Introduce Parameter Object to create a new class that you pass around, called something like BatchParameters. That helps, because instead of passing the same three arguments everywhere, we’re only passing one argument everywhere. (Incidentally, if you’re looking for rules of thumb: in really well factored code, methods generally take at most two arguments. It’s not a universal rule, but if you find yourself writing methods with lots of arguments, ask yourself what you could do to shrink the argument lists.)

But then that raises another smell: we’re passing the same argument everywhere! And when you have a bunch of methods called in close proximity that all take exactly the same object as one of their parameters (not just an object of the same type, but literally the same object), frequently that’s a sign that the methods in question should actually be methods on the object that’s a parameter. (Another way to think of this: you should still be passing around that same object as a parameter, but the parameter should be called this and should be hidden from you by the compiler!) And if you do that (I guess Move Method is the relevant term here?), moving the methods in question to BatchParameters, then BatchParameters becomes exactly the BatchPartitionSelector class from my example.

So yeah, Method Object is great. But more fundamental principles like “group data used together into an object” and “turn repeated function calls with a shared parameter into methods on that shared parameter” are even better. And what’s even better than that is to remember Kent Beck’s four rules of simple design: those latter two principles are both themselves instances of Beck’s “No Duplication” rule. You just have to train your eyes to see duplication in its many forms.
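
The two refactoring steps from the thought experiment above can be sketched side by side. This is only an illustration under stated assumptions: the fields perMetric, partitionCount, and topic, the index helper, and the hash-based partition choice are hypothetical stand-ins, and in the version shown earlier the shared values are computed inside the class body rather than passed to a constructor.

```scala
// Hypothetical stand-ins for the real types.
final case class MetricDefinition(name: String)
final case class TopicPartition(topic: String, partition: Int)

// Step 1, Introduce Parameter Object: the shared values travel together
// as one argument instead of three separate ones.
final case class BatchParameters(perMetric: Boolean, partitionCount: Int, topic: String)

object WithParameterObject {
  def partitionFor(metric: MetricDefinition, params: BatchParameters): TopicPartition =
    TopicPartition(params.topic, index(metric, params))

  // Every helper still has to accept and forward the same `params` object.
  private def index(metric: MetricDefinition, params: BatchParameters): Int =
    if (params.perMetric) java.lang.Math.floorMod(metric.name.hashCode, params.partitionCount)
    else 0
}

// Step 2, Move Method: the methods move onto the object they were all
// receiving, so the shared parameter becomes the implicit `this`.
final class BatchPartitionSelector(perMetric: Boolean, partitionCount: Int, topic: String) {
  def partitionFor(metric: MetricDefinition): TopicPartition =
    TopicPartition(topic, index(metric))

  private def index(metric: MetricDefinition): Int =
    if (perMetric) java.lang.Math.floorMod(metric.name.hashCode, partitionCount)
    else 0
}
```

Both versions compute the same result for the same inputs; the second simply removes the repeated params argument from every signature, which is the duplication the “No Duplication” rule is pointing at.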

May 9, 2017


A Year at Sumo Logic

Launching a company is the equivalent of getting flung into the eye of a tornado. Today, we are reprinting a blog post from our very own David Carlton, who on his personal blog has summarized the Sumo Logic launch experience from his personal perspective, reflecting on the state of the company as it is growing quickly and in the eye of the public. -Christian Beedgen, Co-Founder and CTO

The startup that I’ve been working at for the last year, Sumo Logic, has now launched its product! Our product is a service for gathering, searching, and analyzing logs: if you have software that’s generating log files, you point our collector at those files and it will upload them to our service, at which point you can slice and dice them however you want. You can do that with logs from one program or one machine, but you can also do that for logs from hundreds or thousands of machines: we’ll happily accept whatever you throw at us.

I’ve been working on distributed systems for a while; in particular, StreamStar was a distributed system of heterogeneous software running on heterogeneous machines. And when you’re working with a distributed system, surprises are going to happen: I love my unit tests, but when you’re pushing large amounts of data while trying to meet tight performance limits, every once in a while a piece of data won’t be where you expect it to be. And, when that happens, you need to piece together a timeline to understand and learn from the event; getting logs from all your different components and putting together a story from all of them is the way to do that.

But you won’t be able to put together a story if you don’t have a lot of logs; the flip side, though, is that you need to be able to track a single event across those logs from different machines without being overwhelmed by all the other events that are in them. So you need to deal with a lot of data, searching within it to focus on a single event and popping back out when the need arises to gather more information and test a hypothesis. We tried to do that on StreamStar, but it was hard, and the log volume was overwhelming; Sumo Logic is also a heterogeneous distributed system and hence is vulnerable to the same problem, but the difference is that we can use our own product to analyze what’s going on within it! Which is awesome.

After working on StreamStar, I joined Playdom, working on their business intelligence team. There, we had to deal with logs for a different reason: instead of understanding what the different components of our own software were doing, we needed to understand what our players were doing. We needed to understand what drew players into our games, how long they stayed, what they spent money on. We had a very good set of homegrown tools written by some extremely talented engineers. The problem was, though, that people on game teams would ask us quite natural questions that we couldn’t answer, because the homegrown tools had to be focused on doing specific types of analysis on a handful of prebaked log types to be able to perform well.

As I moved out of Playdom’s business intelligence team, they’d just begun overhauling their log infrastructure to be able to do a wider range of analysis (though, I think, still with prebaked log types?); Sumo Logic’s tools, however, will accept whatever lines of text you throw at them, and let you search, parse, and analyze them. No need to spend years of engineering effort to get that benefit (years that a startup can’t afford to spend!): just stick log lines into your software, install a collector, and start querying away.

That’s my background; the Sumo Logic founders and some of the other early employees come from a different space, however, namely security. And that means that we’re quite happy to accept and analyze log files generated by third-party software (firewalls, routers, web servers) instead of logs from software that you wrote yourself. In that context, you’re trying to figure out how your systems are being used, and whether and how they’re being misused. That sort of analysis sometimes looks like the distributed system analysis that I mentioned in my StreamStar example: if there’s a specific security breach that you’re trying to track across systems, it can look a lot like tracking down an anomaly in a distributed system. But there’s also a different sort of analysis, where you’re trying to detect a statistical signal of malicious behavior out of a sea of normal behavior.

The tools that we’re developing for that are rather fascinating. The first one to be released is the “summarize” operator: after using search to pick out a general class of logs, pipe the result through summarize, and Sumo Logic will cluster them for you. You can drill into clusters, teach the system which clusters are interesting and which clusters are expected, and in general work on teasing a signal out of what seems like noise. That’s useful for unexpected security events, but it’s also useful to just run a scheduled search every hour or every day where you throw all your warning and error logs at summarize to learn how your system’s behavior changes from day to day. (And, believe me, running in AWS, your system’s behavior will change from day to day…)

I mentioned my coworkers above: the thing that sealed the deal with me to join Sumo Logic was meeting all the employees when I interviewed (I ended up becoming the tenth employee) and realizing that I would actively enjoy working with every one of them. Normally, when interviewing even with a good company, there are some people whom I’m looking forward to working with, some I’m indifferent about, some I’m a little unsure about, and then there are all the people whom the people in charge of hiring don’t even trust to throw in front of job candidates. Not so with Sumo Logic: I learn something from Christian and Kumar, the cofounders, every time I talk to them; the other early hires are extremely sharp as well; and we’ve kept a quite high caliber as we’ve (slowly!) expanded since then.

So: if you’re writing a service, want to add logs to understand your software’s behavior and/or your users’ behavior, but don’t want to manage those logs, take a look at us! Or if you’re running lots of servers and want to keep track of what they’re doing, we can help with that, too! You can get a free demo account if you want to play around with the product on canned data, or sign up for a free trial if you want to feed in your own data. Or, if you’re a programmer who likes to work with large amounts of data or distributed systems or is curious about Scala, we’re hiring! (We’re hiring for non-engineering positions, too.)

I had my one-year anniversary a couple of weeks ago; it’s been a great year, and I’m looking forward to many more great years.

Originally published on May 8, 2012.