

Changing Representation

09.18.2014 | Posted by Robert Sloan

I don’t deal in veiled motives — I really like information theory. A lot. It’s been an invaluable conceptual tool for almost every area of my work; and I’m going to try to convince you of its usefulness for engineering problems. Let’s look at a timestamp parsing algorithm in the Sumo Logic codebase.

The basic idea is that each thread gets some stream of input lines (these are from my local /var/log/appfirewall.log), and we want to parse the timestamps at the start of each line into another numeric field:

 

Jul 25 08:33:02 vorta.local socketfilterfw[86] <Info>: java: Allow TCP CONNECT (in:5 out:0)

Jul 25 08:39:54 vorta.local socketfilterfw[86] <Info>: Stealth Mode connection attempt to UDP 1 time

Jul 25 08:42:40 vorta.local socketfilterfw[86] <Info>: Stealth Mode connection attempt to UDP 1 time

Jul 25 08:43:01 vorta.local socketfilterfw[86] <Info>: java: Allow TCP LISTEN  (in:0 out:1)

Jul 25 08:44:17 vorta.local socketfilterfw[86] <Info>: Stealth Mode connection attempt to UDP 6 time

 

Because Sumo Logic is a giant distributed system, we receive logs with hundreds of different timestamp formats, interleaved in the input stream. CPU time on the frontend is dedicated to parsing raw log lines, so if we can derive timestamps more quickly, we can reduce our AWS costs. Let's assume that exactly one timestamp parser will match; we'll leave ambiguities for another day.

How can we implement this? The naive approach is to try all of the parsers in an arbitrary sequence each time and see which one works, but all of them are computationally expensive to evaluate. Maybe we try to cache them or parallelize in some creative way? Caching would be optimal if the logs were all in the same format, and linear search would be optimal if the formats were chosen at random.
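For concreteness, here's a minimal sketch of that naive linear scan (the names and types are illustrative, not Sumo Logic's actual code):

object NaiveTimestampParsing {
  // A timestamp parser either extracts an epoch value from a line or fails.
  type TimestampParser = String => Option[Long]

  // Naive approach: try every parser in a fixed order until one matches.
  // The view keeps evaluation lazy, so we stop at the first success.
  def parseTimestamp(line: String, parsers: Seq[TimestampParser]): Option[Long] =
    parsers.view.flatMap(parser => parser(line)).headOption
}

The cost of this scan is proportional to the position of the matching parser in the sequence, which is exactly the quantity the rest of this post tries to minimize.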

In any case, the most efficient way to do this isn’t clear, so let’s do some more analysis: take the sequence of correct timestamp formats and label them:

 

Timestamp                       Format                          Label
Jul 25 08:52:10                 MMM dd HH:mm:ss                 Format 1
Fri Jul 25 09:06:49 PDT 2014    EEE MMM dd HH:mm:ss ZZZ yyyy    Format 2
1406304462                      EpochSeconds                    Format 3
[Jul 25 08:52:10]               MMM dd HH:mm:ss                 Format 1

 

How can we turn this into a normal, solvable optimization problem? Well, if we try our parsers in a fixed order, the index label is actually just the number of parsing attempts before hitting the correct parser. Let’s keep the parsers in the original order and add another function that reorders them, and then we’ll try them in that order:

 

Format                          Parser Label    Parser Index
MMM dd HH:mm:ss                 Format 1        2
EEE MMM dd HH:mm:ss ZZZ yyyy    Format 2        1
EpochSeconds                    Format 3        3

 

This is clearly better, and we can change this function on every time step. Having the optimal parser choice be a low number is always better, because we’re trying to minimize the time delay of the parsing process:

 

(Time Delay) ∝ (# Tries)

 

 But can we really just optimize over that? It’s not at all clear to me how that translates into an algorithm. While it’s a nice first-order formulation, we’re going to have to change representations to connect it to anything more substantial.

 

Parser Index    Parser Index (Binary)    Parser Index (Unary)
2               10                       11
1               1                        1
3               11                       111

 

This makes it clear that making the parser index small is equivalent to making its decimal/binary/unary representation small. In other words, we want to minimize the information content of the index sequence over our choice of parsers.

In mathematical terms, the information (denoted H) is just the sum of -p log p over each event, where p is the event's probability. As an analogy, think of -log p as the length of the unary sequence (as above) and p as the probability of that sequence; we'll use the experimental probability distribution over the parser indices that actually occur.
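To make that concrete, here is a small self-contained sketch (my own illustration) that computes the empirical entropy, in bits, of a parser-index sequence:

def entropy(indices: Seq[Int]): Double = {
  val n = indices.size.toDouble
  indices.groupBy(identity).values
    .map(_.size / n)                          // empirical probability p of each index
    .map(p => -p * math.log(p) / math.log(2)) // -p log2 p
    .sum
}

entropy(Seq(1, 1, 1, 1))       // 0.0 bits: one format dominates
entropy(Seq(1, 2, 3, 1, 2, 3)) // ~1.58 bits: three formats, uniformly mixed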

As long as the probability of needing more tries is strictly decreasing, minimizing the information also minimizes the expected time, because the information content is strictly increasing in the number of tries.

 

arg min { Time Delay } = arg min { (Sequence Length) * (Probability of Sequence) }
                       = arg min { -p(# Tries) * log p(# Tries) }
                       = arg min { H(# Tries) }

 

That strongly suggests that what we want to use as the parser-order-choosing function is actually a compression function, whose entire goal in life is to minimize the information content (and therefore size) of byte sequences. Let's see if we can make use of one: in the general case, these algorithms look like Seq(Int) → Seq(Int), making the second sequence shorter.

 

Parser Index Sequence (Length 13):   12,43,32,64,111,33,12,43,32,64,111,33,12
LZW Compressed (Length 10):          12,43,32,64,111,33,256,258,260,12
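To see where codes like 256 and 258 come from, here is a minimal LZW encoder sketch (my own illustration): codes below 256 are raw symbols, and each new code names a previously seen run.

def lzwEncode(input: Seq[Int]): Seq[Int] = {
  import scala.collection.mutable
  val dict = mutable.Map[List[Int], Int]()  // multi-symbol dictionary entries
  var nextCode = 256                        // codes 0-255 are the raw symbols
  def known(s: List[Int]): Boolean = s.size == 1 || dict.contains(s)
  def code(s: List[Int]): Int = if (s.size == 1) s.head else dict(s)

  val out = mutable.Buffer[Int]()
  var current = List.empty[Int]
  for (sym <- input) {
    val extended = current :+ sym
    if (known(extended)) current = extended
    else {
      out += code(current)       // emit the longest known run
      dict(extended) = nextCode  // remember the extended run under a new code
      nextCode += 1
      current = List(sym)
    }
  }
  if (current.nonEmpty) out += code(current)
  out.toSeq
}

Running it on the 13-element sequence above reproduces the 10-element compressed form: lzwEncode(Seq(12,43,32,64,111,33,12,43,32,64,111,33,12)) yields 12,43,32,64,111,33,256,258,260,12.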

 

Let's say that we have some past sequence — call it P — and we're trying to find the next parser-index mapping. I admit that it's not immediately clear how to do this with a compression algorithm a priori, but if we just perturb the algorithm, we can compare the candidates for the next parser as:

 

newInfo(parser label) = H(compress(P + [parser label])) - H(compress(P))

 

Any online compression algorithm will let you hold state so that you don't have to repeat computations when determining this. Then we can just choose the parser with the least newInfo; and if the compressor does a good job of minimizing information content (which modern compressors are pretty good at), then our algorithm will minimize the required work. If you'd like a deeper explanation of compression, ITILA [1] is a good reference.
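Putting the pieces together, parser selection might look like the following sketch (my own illustration; compressedBits stands in for H(compress(...)) from whatever online compressor you plug in):

def nextParserOrder(history: Seq[Int],
                    parserLabels: Seq[Int],
                    compressedBits: Seq[Int] => Double): Seq[Int] =
  parserLabels.sortBy { label =>
    // newInfo(label) = H(compress(P + [label])) - H(compress(P))
    compressedBits(history :+ label) - compressedBits(history)
  }

Trying the parsers in this order means the historically common formats are attempted first, while rare formats still get reached eventually.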

With a fairly small, reasonable change of representation, we now have a well-defined, implementable, fast metric for making online decisions about parser choice. Note that this system will work regardless of the input stream: there is no worst case beyond that of the compression algorithm itself. In this sense, the formulation is adaptive.

 Certainly, the reason that we can draw a precise analogy to a solved problem is because analogous situations show up in many fields, which at least include Compression/Coding, Machine Learning [2], and Controls [3]. Information theory is the core conceptual framework here, and if I’ve succeeded in convincing you, Bayesian Theory [4] is my favorite treatment.

 

References:

  1. Information Theory, Inference, and Learning Algorithms by David MacKay
  2. Prediction, Learning, and Games by Nicolo Cesa-Bianchi and Gabor Lugosi
  3. Notes on Dynamic Programming and Optimal Control by Dimitri Bertsekas
  4. Bayesian Theory by Jose Bernardo and Adrian Smith


No Magic: Regular Expressions, Part 3

09.11.2014 | Posted by Russell

Evaluating the NFA

In part 1, we parsed the regular expression into an abstract syntax tree. In part 2, we converted that syntax tree into an NFA. Now it's time to evaluate that NFA against a potential string.

NFAs, DFAs and Regular Expressions

Recall from part 2 that there are two types of finite automata: deterministic and non-deterministic. They have one key difference: a non-deterministic finite automaton can have multiple paths out of the same node for the same token, as well as paths that can be pursued without consuming input. In expressiveness (often referred to as "power"), NFAs, DFAs and regular expressions are all equivalent. This means that if you can express a rule or pattern (e.g. strings of even length) with an NFA, you can also express it with a DFA or a regular expression. Let's first consider the regular expression abc* expressed as a DFA:

[Diagram: DFA for the regular expression abc*]

Evaluating a DFA is straightforward: simply move through the states by consuming the input string. If you finish consuming input in the match state, match; otherwise, don't.
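As a sketch of just how simple that is, here's an illustrative DFA evaluator (my own example, not code from this series; states are plain Ints and the transition table is an assumption):

// Follow the single valid transition for each character; a missing
// transition means the DFA rejects the string.
def dfaMatches(transitions: Map[(Int, Char), Int],
               matchStates: Set[Int],
               input: String): Boolean = {
  val finalState = input.foldLeft(Option(0)) { (state, c) =>
    state.flatMap(s => transitions.get((s, c)))
  }
  finalState.exists(matchStates.contains)
}

Our state machine, on the other hand, is an NFA. The NFA our code generates for this regular expression is: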

[Diagram: NFA our code generates for abc*]

Note that there are multiple unlabeled edges that we can follow without consuming a character. How can we track that efficiently? The answer is surprisingly simple: instead of tracking only one possible state, keep a list of states that the engine is currently in. When you encounter a fork, take both paths (turning one state into two). When a state lacks a valid transition for the current input, remove it from the list.

There are two subtleties we have to consider: avoiding infinite loops in the graph, and handling no-input transitions properly. When we are evaluating a given state, we first advance all the states to enumerate all the possible states reachable from our current state if we don't consume any more input. This is the phase that requires care to maintain a "visited set" to avoid infinitely looping in our graph. Once we have enumerated those states, we consume the next token of input, either advancing those states or removing them from our set.

import scala.collection.mutable

object NFAEvaluator {
  def evaluate(nfa: State, input: String): Boolean =
    evaluate(Set(nfa), input)

  def evaluate(nfas: Set[State], input: String): Boolean = {
    input match {
      case "" =>
        evaluateStates(nfas, None).exists(_ == Match())
      case string =>
        evaluate(
          evaluateStates(nfas, input.headOption),
          string.tail
        )
    }
  }

  // Advance every current state for one token of input (or for no input).
  def evaluateStates(nfas: Set[State],
                     input: Option[Char]): Set[State] = {
    val visitedStates = mutable.Set[State]()
    nfas.flatMap { state =>
      evaluateState(state, input, visitedStates)
    }
  }

  def evaluateState(currentState: State, input: Option[Char],
                    visitedStates: mutable.Set[State]): Set[State] = {
    if (visitedStates contains currentState) {
      Set()
    } else {
      visitedStates.add(currentState)
      currentState match {
        case placeholder: Placeholder =>
          evaluateState(placeholder.pointingTo, input, visitedStates)
        case consume: Consume =>
          if (Some(consume.c) == input || consume.c == '.') {
            Set(consume.out)
          } else {
            Set()
          }
        case s: Split =>
          evaluateState(s.out1, input, visitedStates) ++
            evaluateState(s.out2, input, visitedStates)
        case m: Match =>
          if (input.isDefined) Set() else Set(Match())
      }
    }
  }
}

And that's it!

Put a bow on it

We’ve finished all the important code, but the API isn’t as clean as we’d like. Now, we need to create a single-call user interface to call our regular expression engine. We’ll also add the ability to match your pattern anywhere in the string with a bit of syntactic sugar.

object Regex {
  def fullMatch(input: String, pattern: String) = {
    val parsed = RegexParser(pattern).getOrElse(
      throw new RuntimeException("Failed to parse regex")
    )
    val nfa = NFA.regexToNFA(parsed)
    NFAEvaluator.evaluate(nfa, input)
  }

  def matchAnywhere(input: String, pattern: String) =
    fullMatch(input, ".*" + pattern + ".*")
}

To use it:

Regex.fullMatch("aaaaab", "a*b")     // true
Regex.fullMatch("aaaabc", "a*b")     // false
Regex.matchAnywhere("abcde", "cde")  // true

That's all there is to it. A semi-functional regex implementation in just 106 lines. There are a number of things that could be added, but I decided they added complexity without enough value:

  1. Character classes
  2. Value extraction
  3. The ? operator (zero or one)
  4. Escape characters
  5. And many more.

I hope this simple implementation helps you understand what’s going on under the hood! It’s worth mentioning that the performance of this evaluator is heinous. Truly terrible. Perhaps in a future post I’ll look into why and talk about ways to optimize it…


Debugging to Customer Hugging – Becoming an SE

09.10.2014 | Posted by Ozan Unlu

“I know app developers, and that’s not you!” It was a statement that I couldn’t really argue with, and it was coming from one of my closest friends. It didn’t matter that I was employed in a career as an app developer at one of the top software companies in the world. It didn’t matter that I was performing well and the tools and applications I coded were being used by hundreds of internal developers. It didn’t even matter that the friend making the conclusion had never written a single line of code in his life, nor had he any idea of my technical ability. The funny thing was, he meant it as a compliment, and so began the biggest career transition of my life.

Coding and logic puzzles were always very intuitive to me, so I always enjoyed solving a variety of technical challenges. Yet articulation, interpersonal communication and cross-team collaboration were other strong suits that I felt weren't being used in my professional life. My ambition to be the biggest success possible, combined with my desire to fulfill my potential, always had me wondering if there was a role better suited for me, one where I could leverage both diverse skill sets. Over the years I had many mentors, and through all the various conversations and constructive criticism, the same trend was always prevalent: they all thought I could be more successful in a Program Manager or Technical Lead role, as it would allow me to take advantage of strengths that were under-used in a purely development-focused role. So I made those career moves, but decided to stay within the company. After all, I didn't want to cast away the experience and knowledge I had gained there, and I believed it would propel me in my new roles, as they were in a related field. It did; I continued to be successful, and it was certainly a step in the right direction, but it needed to be taken further. I had tunnel vision, and when I looked at my career, all my choices seemed a little too safe. It was time to take a risk.

I was told about the Sales Engineering role and that it could be the perfect position for me to stretch my wings and use my full potential. The more I looked into it, the better it seemed. I would be a technical expert with deep knowledge of the product, while at the same time selling the value of the solution to potential clients. I would be listening to customers' needs and educating them on whether or not our product would be the best fit. After spending so much time on research and development teams creating software with the same handful of peers every day, the prospect of working with a mixture of clients who were among the top engineering minds in the world, across a plethora of different technologies, was enticing. Just the ability to work with these industry leaders on a variety of different challenges would let me solve more technical problems than I ever could as a developer working on only a handful of projects over the course of a year. I had warmed up to the idea, and it was time to commit to something new.

There is one area of the world that people consistently consider the “Mecca of Tech,” and that is the San Francisco / Silicon Valley Bay Area. That was settled. If I was going to go into sales, I had promised myself I would never sell a product in which I didn’t have full confidence, so I needed to find a company with a product I really believed in. Enter Sumo Logic: a fully cloud based data analytics and machine learning solution.

Curious, I created a free account and played around with the product. In a very short time, I could see the impressive power and versatile functionality, the value it could provide to nearly any tech company. Also growing at a tremendous rate, supported by the top investors and sporting a unique combination of relatively low risk and high upside, I couldn’t craft an argument to deter myself from joining the company. I interviewed, and when offered, accepted the job. After committing, what awaited me next felt like a breath of fresh air.

Joining a startup from a large company, and transitioning into the sales field from development, I didn't know what type of culture to expect. What awaited me was a company culture where team members are genuinely and actively supportive, and it was awesome. In the first couple of months I learned more about various technologies in the market than I ever knew existed. I work with customers and drastically improve their systems, their processes and, consequently, their careers. I did not expect to be able to contribute to our internal product development process, yet I have our best engineers coming to ask which direction we should take our features. Being able to work with customers and feel like you're truly helping them, while continuing to design and engineer a product on the cutting edge, is the best of both worlds; the sizable increase in compensation isn't a bad side effect either. I have no regrets about making the biggest career transition of my life. I'm happier than I've ever been, and I'm not looking back.

If you want to join Ozan and work as a Sales Engineer at Sumo, click here!


No Magic: Regular Expressions, Part 2

09.05.2014 | Posted by Russell

The code for this post, as well as the post itself, are on github.

Converting the Parse Tree to an NFA

In the last post, we transformed the flat string representation of a regular expression to the hierarchical parse tree form. In this post we’ll transform the parse tree to a state machine. The state machine linearizes the regex components into a graph, producing an “a follows b follows c” relationship. The graph representation will make it easier to evaluate against a potential string.

Why make yet another transformation just to match a regex?

It would certainly be possible to translate directly from a string to a state machine. You could even attempt to evaluate the regex directly from the parse tree or the string. Doing this, however, would likely involve significantly more complicated code. By slowly lowering the level of abstraction, we can ensure that the code at each stage is easy to understand and reason about. This is especially important when building something like a regular expression evaluator with literally infinite edge cases.

NFAs, DFAs, and You

We'll be creating a state machine called an NFA, or nondeterministic finite automaton. It has two types of components: states and transitions. When we diagram it, states will be represented as circles and transitions will be arrows. The match state will be a double circle. If a transition is labeled, it means that we must consume that character from the input to make that transition. Transitions can also be unlabeled; that means that we can make the transition without consuming input. Aside: in other literature, you may see this represented as ε. This state machine represents the simple regex "ab":

[Diagram: NFA for the regex "ab"]

Our nodes can have multiple valid subsequent states for a given input, as in the following diagram where there are two paths consuming the same input leaving a node:

[Diagram: NFA with two paths consuming the same input out of one node]

Contrast this with deterministic finite automata, where, as the name suggests, a given input can only result in one state change. This relaxation in constraints will make evaluating it a bit more tricky, but it makes it much easier to generate them from our tree, as we’ll see shortly. Fundamentally, NFAs and DFAs are equivalent in the state machines they can represent.

Transforming, in Theory

Let's lay out a strategy for transforming our parse tree into an NFA. While it may seem daunting, we'll see that by breaking the process into composable pieces, it can be straightforward. Recall the syntax elements that we'll need to transform:

1. Character literals: Literal(c: Char)

2. *: Repeat(r: RegexExpr)

3. +: Plus(r: RegexExpr)

4. Concatenation: Concat(first: RegexExpr, second: RegexExpr)

5. |: Or(a: RegexExpr, b: RegexExpr)

The transformations that follow were initially outlined by Thompson in his 1968 paper. In the transformation diagrams, In will refer to the entry point of the state machine and Out will refer to the exit. In practice those can be the "Match" state, the "Start" state, or other regex components. The In / Out abstraction allows us to compose and combine state machines, a key insight. For example, if we have a state machine that matches "abc" and another that matches "cde", connecting them sequentially will yield a state machine that matches "abccde". We'll apply this principle more generally to transform each element in our syntax tree into a composable state machine.

Let’s start with character literals. A character literal is a transition from one state to another that consumes input. Consuming the literal ‘a’ would look like this:

[Diagram: transition consuming the literal 'a']

Next let's examine concatenation. To concatenate two components of the parse tree, we just need to link them together with an empty arrow. Note that in this example, just as a Concat can contain the concatenation of two arbitrary regular expressions, A and B in the diagram can both be state machines, not just individual states. Something a little bit weird happens if A and B are both literals: how do we connect two arrows together with no intervening states? The answer is that, when necessary, literals can have phantom states at each end to keep our state machine sound. Concat(A, B) would transform to:

[Diagram: Concat(A, B) as two machines joined by an empty arrow]

To represent Or(A, B) we need to branch from the start state into the two separate states. Once those machines terminate, they must both point to the subsequent (Out) state:

[Diagram: Or(A, B) branching from In and rejoining at Out]

Let's consider *. Star can be 0 or more repetitions of a pattern. To do that, we want one wire pointing directly to the next state, and one looping back to our current state via A. A* would look like:

[Diagram: A* with a loop back through A and an empty arrow to Out]

For +, we'll use a little trick. a+ is just aa*. Generalizing, Plus(A) can be rewritten to Concat(A, Repeat(A)), so we'll rewrite it to that rather than design a special pattern for it. I've included + in the language for a specific reason: to demonstrate how other, more complicated regular expression elements that I've omitted can often be expressed in terms of our language.

Transforming, in Practice

Now that we've made a theoretical plan, let's dive into how we'll actually code it. We'll be creating a mutable graph to store our tree. While I'd prefer immutability, making immutable graphs is annoyingly difficult, and I'm lazy.

If we were to try to boil the above diagrams into core components, we’d end up with three types of components: arrows that consume input, the match state, and one state splitting into two states. I know this seems a little odd, and potentially incomplete. You’ll have to trust me that this choice leads to the cleanest code. Here are our 3 NFA component classes:

abstract class State

class Consume(val c: Char, val out: State) extends State
class Split(val out1: State, val out2: State) extends State
case class Match() extends State

Aside: I made Match a case class instead of a regular class. Case classes in Scala bring certain sensible defaults to your class. I used it because it gives value-based equality. This makes all Match objects equal, a useful property. For the other types of NFA components, we want reference equality.
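A quick illustration of the difference, given the classes above (hypothetical REPL session):

Match() == Match()                                      // true: case classes compare by value
new Consume('a', Match()) == new Consume('a', Match())  // false: regular classes compare by reference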

Our code will recursively traverse our syntax tree, keeping an andThen object as a parameter. andThen is what we will attach to the free-hanging outputs of our expression. This is required because, at an arbitrary branch in your syntax tree, you lack the context of what comes next; andThen lets us pass that context down as we recurse. It also gives us an easy way to append the Match state.

When it comes time to handle Repeat we’ll run into a problem. The andThen for Repeat is the split operator itself. To deal with this issue, we’ll introduce a placeholder to let us bind it later. We’ll represent the placeholder with this class:

class Placeholder(var pointingTo: State) extends State

The var in Placeholder means that pointingTo is mutable. It is the isolated bit of mutability that allows us to easily create a cyclic graph. All the other members are immutable.

To start things off, andThen is Match(): this means that we'll create a state machine matching our regex, which can then transfer to the Match state without consuming any more input. The code is short but dense:

object NFA {
  def regexToNFA(regex: RegexExpr): State =
    regexToNFA(regex, Match())

  private def regexToNFA(regex: RegexExpr,
                         andThen: State): State = {
    regex match {
      case Literal(c) => new Consume(c, andThen)
      case Concat(first, second) =>
        // Convert first to an NFA. The "out" of that is
        // the result of converting second to an NFA.
        regexToNFA(first, regexToNFA(second, andThen))
      case Or(l, r) => new Split(
        regexToNFA(l, andThen),
        regexToNFA(r, andThen)
      )
      case Repeat(r) =>
        val placeholder = new Placeholder(null)
        val split = new Split(
          // One path goes to the placeholder,
          // the other path goes back to r
          regexToNFA(r, placeholder),
          andThen
        )
        placeholder.pointingTo = split
        placeholder
      case Plus(r) =>
        regexToNFA(Concat(r, Repeat(r)), andThen)
    }
  }
}

And we're done! The word-to-line-of-code ratio in this post is high -- each line encodes a lot of information, but it boils down to the transformations we discussed in the previous section. I should clarify that I didn't just sit down and write the code like this -- getting it this short and to the point was the result of several iterations on the data structures and algorithm. Clean code is hard.

In the debugging process, I wrote a quick script to generate dot files from NFAs so I could look at the NFAs I was actually generating and debug issues. Note that the NFAs this code generates contain lots of unnecessary transitions and states; when writing this, I've tried to keep the code as simple as possible, at the cost of performance. Here are some examples for simple regular expressions:

(..)*

[Diagram: NFA for (..)*]

ab

[Diagram: NFA for ab]

a|b

[Diagram: NFA for a|b]

Now that we have an NFA (the hard part) all we have to do is evaluate it (the easier part). Come back next week for the wrapup post on writing the evaluator, or just read the code for yourself on github.


How To Add Sumo Logic To A Dockerized App

08.27.2014 | Posted by Caleb Sotelo

This is a guest post from Caleb Sotelo, a software engineer at OpenX, and is reprinted here with his permission. You can view the original here.

[Image: Kuniyoshi Utagawa, The Sumo Wrestler]

Sumo Logic is a nifty service that makes it easy to watch and analyze your apps' logs. You install a collector local to the app in question, point it at some files to watch, and your logs are sent to the cloud, where they can be sliced and diced from a nice web UI.

At a high level, Sumo Logic wants to transform big data logs into operational information. But at a practical level, it’s nice not to have to SSH to a production machine and tail or grep enormous files to take your app’s pulse. Not to mention the idea is totally in line with one of the Twelve Factors: treating logs as event streams enables better understanding of an app’s behavior over time.

I've had success using Sumo Logic at OpenX, so I wanted to give their free tier a shot for a personal project. The only limitations are a 500MB-per-day data cap and 7 days of retention. I was surprised not to find anything on the web about installing Sumo Logic alongside a Dockerized app, and I had a couple of Docker-based candidates. So without further ado, here's how to add Sumo Logic to a Dockerized app:

1. Sign up for Sumo Logic Free

Head over to sumologic.com/signup to sign up. The only catch here is that you'll need a company email address. For the project I'm going to use Sumo Logic for, I own and manage my own domain, so it wasn't too much trouble to create an email address using my registrar's mail service. Since I host the domain separately (for example, with DigitalOcean), I did have to add an MX record to my zone file to point to the registrar's mail server.

2. Download a Collector

Once you confirm your email address and log in, you'll be stepped through a process for downloading and installing a collector. I chose the Installed Collector and downloaded sumocollector_19.91-2_amd64.deb, because my Docker image is based on Ubuntu.

[Screenshot: Sumo Logic collector selection]

After downloading the collector, the setup wizard proceeds to a screen that spins until it detects a newly installed collector. I didn’t yet know how I was going to install it, and I got logged out of Sumo Logic anyway due to inactivity, so I abandoned the wizard at that point. The Sumo Logic UI changed itself as soon as it detected that my first collector had been installed.

  • As I plan to install the Sumo Logic collector during the docker build process, I uploaded the .deb file to a Dropbox and grabbed the public link to use later.

3. Create Access Keys

When a collector client is installed, it has to have some way of authenticating to the Sumo Logic server. The docs for creating a sumo.conf file (we'll get there soon) offer two choices: (1) provide your Sumo Logic email and password, or (2) provide access keys generated from the UI. The latter is recommended, if only to avoid storing a username/password in plaintext. Keys can be generated from Manage → Collectors → Access Keys → Create.

4. Augment your Docker Container

Here’s the Docker-specific part of installing Sumo Logic. We’ll add some lines to our app’s Dockerfile and author two files that are ADDed to the container during a docker build. I assume working knowledge of Docker, but here is the list of Dockerfile commands for good measure.

4.1 Create sumo.conf

First create a sumo.conf file like the following:

name={collector_name}  
accessid={your_access_id}  
accesskey={your_access_key}  

 

where name is an arbitrary name for this collector, and accessid and accesskey are those generated in step 3. There are many more conf options specified here but the important ones, namely sources, can actually be configured through the UI later on.

By convention I put Docker-specific files into .docker/{resource}, so this one goes to .docker/sumo/sumo.conf. It’ll be referenced in our Dockerfile shortly.

4.2 Modify your Dockerfile

Add a block like the following to your Dockerfile (assumed to live in the root of your app’s code), preferably before your actual app is added:

# install sumologic
RUN apt-get -qq update  
RUN apt-get install -y wget  
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb  
RUN dpkg -i sumocollector_19.91-2_amd64.deb  
RUN rm sumocollector_19.91-2_amd64.deb  
ADD .docker/sumo/sumo.conf /etc/sumo.conf  
ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo  

 

Let’s break this down:

RUN apt-get -qq update  

Update sources. This may not be necessary, but I like to put this before each dependency installed by my Dockerfile to avoid issues with image caching.

RUN apt-get install -y wget  
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb  

We’ll use wget to grab the collector file we uploaded in step 2. You may opt to ADD the file locally, but this option avoids having to check the resource into your app’s source code, while housing it in a consistent location. Better practice would be to store it in some kind of artifact repository and version it.

RUN dpkg -i sumocollector_19.91-2_amd64.deb  
RUN rm sumocollector_19.91-2_amd64.deb  

Install the debian package and clean up.

ADD .docker/sumo/sumo.conf /etc/sumo.conf  

Copy the newly created sumo.conf file to the place where the collector expects to find it.

Before we get to the last line, let’s pause. If you were able to catch the output from installing the collector, you saw something like:

Preparing to unpack sumocollector_19.91-2_amd64.deb ...
Unpacking sumocollector (1:19.91-2) ...
Setting up sumocollector (1:19.91-2) ...
configuring collector....
configuring collector to run as root
Detected Ubuntu:
Installing the SumoLogic Collector daemon using init.d..
 Adding system startup for /etc/init.d/collector ...
   /etc/rc0.d/K20collector -> ../init.d/collector
   /etc/rc1.d/K20collector -> ../init.d/collector
   /etc/rc6.d/K20collector -> ../init.d/collector
   /etc/rc2.d/S20collector -> ../init.d/collector
   /etc/rc3.d/S20collector -> ../init.d/collector
   /etc/rc4.d/S20collector -> ../init.d/collector
   /etc/rc5.d/S20collector -> ../init.d/collector
Collector has been successfully installed. Please provide account credential in /etc/sumo.conf and start it up via service or init.d script!

 

It was only after sifting through my docker output that I saw this and learned about the existence of a sumo.conf file. Before that, nothing was happening in the Sumo Logic UI because no collector had been correctly installed and started, even when I started the container. Anyway, we got /etc/sumo.conf out of the way, so what about starting it up “via service or init.d script”?

My solution was to include a simple bash script that starts the collector service on startup. But my Dockerfile extends phusion/baseimage-docker, which uses a custom init system. So the last Dockerfile command,

ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo  

adds a file called start_sumo like:

#!/bin/bash
service collector start  

into /etc/my_init.d. Make sure it’s executable with chmod +x. Like the conf file, this is saved into .docker/sumo/start_sumo of the app code repository.

I am very open to more elegant ways for getting the Sumo Logic collector to start. I’d also like to see how non-baseimage users deal with init requirements. I would have done this as a runit script as recommended by the baseimage-docker README, but the collector script appears to automatically daemonize itself, which breaks runit.

5. Build and Deploy!

I ran docker build and docker run as usual, and voilà!, the newly installed collector popped up in ManageCollectors.

6. Configure Sources

Before we start seeing logs, we have to tell Sumo what a log file is. I clicked Manage → Collectors → Add → Add Source and added a Local File entry with the absolute path to a log file I was interested in. One of the Sumo Logic videos I watched noted that specifying /path/to/log/dir/** will pick up all log files in a directory.

[Screenshot: Manage Collectors view]

I waited a couple of minutes, and log messages started coming into the UI. Sweet! Keep in mind that multiple sources can be added for a single collector.


So far, I've learned that I can get a bird's-eye view of all my logs from Manage → Status, and look at actual log messages from Search. I haven't spent time really getting to know the various queries yet, but if they're worth writing about, expect another post.

Possible Improvement: The above example installs Sumo Logic inside the app container. An alternate approach might have Sumo installed on the host (or in its own Docker container), reading log files from a shared data volume. This has the benefits of (1) requiring only a single Sumo Logic install for potentially more than one app container, and (2) architectural separation of app from log consumption.

That’s it! This turned out to be surprisingly simple. Kudos to Sumo Logic for offering an easy to use service + free tier that’s totally feasible for smallish apps.


No Magic: Regular Expressions

08.18.2014 | Posted by Russell

The code for this post, as well as the post itself, are on github.

Until recently, regular expressions seemed magical to me. I never understood how you could determine if a string matched a given regular expression. Now I know! Here’s how I implemented a basic regular expression engine in under 200 lines of code.

The Spec

Implementing full regular expressions is rather cumbersome, and worse, doesn’t teach you much. The version we’ll implement is just enough to learn without being tedious. Our regular expression language will support:

  • . : Match any character
  • | : Match abc or cde (alternation)
  • + : Match one or more of the previous pattern
  • * : Match 0 or more of the previous pattern
  • ( and ) for grouping

While this is a small set of options, we'll still be able to make some cute regexes, like m (t|n| ) | b to match Star Wars subtitles without matching Star Trek ones, or (..)* to match the set of all even-length strings.

The Plan of Attack

We'll evaluate regular expressions in 3 phases:

  1. Parse the regular expression into a syntax tree
  2. Convert the syntax tree into a state machine
  3. Evaluate the state machine against our string

We’ll use a state machine called an NFA to evaluate regular expressions (more on that later). At a high level, the NFA will represent our regex. As we consume inputs, we’ll move from state to state in the NFA. If we get to a point where we can’t follow an allowed transition, the regular expression doesn’t match the string.

This approach was originally demonstrated by Ken Thompson, one of the original authors of Unix. In his 1968 CACM paper he outlines the implementation of a text editor and includes this as a regular expression evaluator. The only difference is that his article is written in 7094 machine code. Things used to be way more hardcore.

This algorithm is a simplification of how non-backtracking engines like RE2 evaluate regular expressions in provably linear time. It's notably different from the regex engines found in Python and Java, which use backtracking: when given certain inputs, they'll run virtually forever on small strings. Ours will run in O(length(input) * length(expression)).

My approach roughly follows the strategy Russ Cox outlines in his excellent blog post.

Representing a Regular Expression

Let's step back and think about how to represent a regular expression. Before we can hope to evaluate a regular expression, we need to convert it into a data structure the computer can operate on. While strings have a linear structure, regular expressions have a natural hierarchy.

Let's consider the string abc|(c|(de)). If you were to leave it as a string, you'd have to backtrack and jump around as you tried to keep track of the different sets of parentheses while evaluating the expression. One solution is converting it to a tree, which a computer can easily traverse. For example, b+a would become:

[Diagram: syntax tree for b+a]

To represent the tree, we'll want to create a hierarchy of classes. For example, our Or class will need two subtrees to represent its two sides. From the spec, there are 4 different regular expression components we'll need to recognize: +, *, |, and character literals like ., a and b. In addition, we'll also need to be able to represent when one expression follows another. Here are our classes:

abstract class RegexExpr
 
// ., a, b
case class Literal(c: Char) extends RegexExpr
 
// a|b
case class Or(expr1: RegexExpr, expr2: RegexExpr) extends RegexExpr
 
// ab -> Concat(a,b); abc -> Concat(a, Concat(b, c))
case class Concat(first: RegexExpr, second: RegexExpr) extends RegexExpr
 
// a*
case class Repeat(expr: RegexExpr) extends RegexExpr
 
// a+
case class Plus(expr: RegexExpr) extends RegexExpr

Parsing a Regular Expression

To get from a string to a tree representation, we'll use a conversion process known as "parsing." I'm not going to talk about building parsers in a high level of detail. Rather, I'll give just enough to point you in the right direction if you wanted to write your own. Here, I'll give an overview of how I parsed regular expressions with Scala's parser combinator library.

Scala’s parser library will let us write a parser by just writing the set of rules that describe our language. It uses a lot of obtuse symbols unfortunately, but I hope you’ll be able to look through the noise to see the gist of what’s happening.

When implementing a parser we need to consider order of operations. Just as "PEMDAS" applies to arithmetic, a different set of rules apply to regular expressions. We can express this more formally with the idea of an operator "binding" to the characters near it. Different operators "bind" with different strengths; as an analogy, * binds more tightly than + in expressions like 5+6*4. In regular expressions, * binds more tightly than |. If we were to represent parsing this as a tree, the weakest operators end up on top.

[Diagram: parse tree with the weakest operator on top]

It follows that we should parse the weakest operators first, followed by the stronger operators. When parsing, you can imagine it as extracting an operator, adding it to the tree, then recursing on the remaining 2 parts of the string.

In regular expressions, the order of binding strength is:

  1. Character literals & parentheses
  2. + and *
  3. "Concatenation": a follows b
  4. |

Since we have 4 levels of binding strength, we need 4 different types of expressions. We named them (somewhat arbitrarily): lit, lowExpr (+ and *), midExpr (concatenation), and highExpr (|). Let's jump into the code. First we'll make a parser for the most basic level, a single character:

import scala.util.parsing.combinator._

object RegexParser extends RegexParsers {
  def charLit: Parser[RegexExpr] = ("""\w""".r | ".") ^^ {
    char => Literal(char.head)
  }

Let's take a moment to explain the syntax. This defines a parser that will build a RegexExpr. The right-hand side says: "Find something that matches \w (any word character) or a period. If you do, turn it into a Literal."

Parentheses must be defined at the lowest level of the parser since they are the strongest binding. However, you need to be able to put anything in parentheses. We can accomplish this with the following code:

def parenExpr: Parser[RegexExpr] = "(" ~> highExpr <~ ")"
def lit: Parser[RegexExpr] = charLit | parenExpr

Here, we’ll define * and +:

def repeat: Parser[RegexExpr] = lit <~ "*" ^^ { case l => Repeat(l) }
def plus: Parser[RegexExpr] = lit <~ "+" ^^ { case p => Plus(p) }
def lowExpr: Parser[RegexExpr] = repeat | plus | lit

Next, we’ll define concatenation, the next level up:

def concat: Parser[RegexExpr] = rep(lowExpr) ^^ { case list => listToConcat(list) }
def midExpr: Parser[RegexExpr] = concat | lowExpr

Finally, we’ll define or:

def or: Parser[RegexExpr] = midExpr ~ "|" ~ midExpr ^^ { case l ~ "|" ~ r => Or(l, r) }

Lastly, we'll define highExpr. highExpr is an or, the weakest binding operator, or, if there isn't an or, a midExpr.

def highExpr: Parser[RegexExpr] = or | midExpr

Finally, a touch of helper code to finish it off:

def listToConcat(list: List[RegexExpr]): RegexExpr = list match {
  case head :: Nil => head
  case head :: rest => Concat(head, listToConcat(rest))
}

def apply(input: String): Option[RegexExpr] = {
  parseAll(highExpr, input) match {
    case Success(result, _) => Some(result)
    case failure: NoSuccess => None
  }
}
}

That’s it! If you take this Scala code you’ll be able to generate parse trees for any regular expression that meets the spec. The resulting data structures will be trees.
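For instance, assuming the parser above is in scope, a few hypothetical invocations would produce trees like these:

RegexParser("ab")  // Some(Concat(Literal(a),Literal(b)))
RegexParser("a|b") // Some(Or(Literal(a),Literal(b)))
RegexParser("b+a") // Some(Concat(Plus(Literal(b)),Literal(a)))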

Now that we can convert our regular expressions into syntax trees, we’re well underway to being able to evaluate them. I wrote about the next steps in part 2. As always, the code is on Github.


Sequoia Joins The Team and Eight Lessons of a First Time CEO

05.20.2014 | Posted by Vance Loiselle, CEO

I originally envisioned this blog as a way to discuss our recent $30 million funding, led by our latest investor, Sequoia Capital, with full participation from Greylock, Sutter Hill and Accel. I’ve been incredibly impressed with the whole Sequoia team and look forward to our partnership with Pat. Yet despite 300 enterprise customers (9 of the Global 500), lots of recent success against our large competitor, Splunk, and other interesting momentum metrics, I’d rather talk about the ride and lessons learned from my first two years as a CEO.

  1. It’s Lonely. Accept It and Move On.  My mentor, former boss and CEO of my previous company told me this, years ago. But at the time, it applied to him and not me (in hindsight I realize I did not offer much help). But, like being a first time parent, you really can’t fathom it until you face it yourself. I’m sure there’s some psychology about how certain people deal with it and others don’t. I’m constantly thinking about the implications of tactical and strategic decisions. I’ve learned that if you’re too comfortable, you’re not pushing hard enough. The best advice I can give is to find a Board member you can trust, and use him or her as a sounding board early and often.
  2. Trust Your Gut.  There have been many occasions when I have been given good advice on key decisions. One problem with good advice is that you can get too much of it, and it isn't always aligned. The best leader I ever met, and another long-time mentor, would always ask, 'what is your gut telling you?' More often than not, your gut is right. The nice thing about following your instincts is that the only person to blame if it goes awry is yourself.
  3. Act Like It’s Your Money. I grew up in Maine, where $100,000 can still buy a pretty nice house. When I first moved to California from Boston it took me some time to get accustomed to the labor costs and other expenses. The mentality in most of the top startups in Silicon Valley is “don’t worry, you can always raise OPM (other people’s money)”. Though I understand the need to invest ahead of the curve, especially in a SaaS-based business like ours, I also believe too much funding can cause a lack of discipline. People just expect they can hire or spend their way around a problem.
  4. Don’t Be Arrogant. Just saying it almost disqualifies you. Trust me, I have come across all kinds. Backed by arguably the four best Venture Capital firms in the business, I have had plenty of opportunities to meet other CEOs, founders and execs.  Some are incredible people and leaders. Some, however, act like they and their company are way too valuable and important to treat everyone with respect. Life is too short not to believe in karma.
  5. Listen Carefully. If a sales rep is having trouble closing deals, put yourself in his shoes and figure out what help he needs. If the engineering team is not meeting objectives fast enough, find out if they really understand the customer requirements. Often the smallest tweaks in communication or expectations can drastically change the results. Lastly, listen to your customer(s). It is very easy to write off a loss or a stalled relationship to some process breakdown, but customers buy from people they trust.  Customers trust people who listen.
  6. It's a People Business. Software will eat the world, but humans still make the decisions. We're building a culture that values openness and rapid decision-making while aligning our corporate mission with individual responsibilities. This balance is a constant work in progress, and I understand that getting it right is key to successfully scaling the Sumo Logic business.
  7. Find the Right VCs at the Right Time. I can’t take any credit for getting Greylock or Sutter Hill to invest in our A and B rounds, respectively.  But I do have them to thank for hiring me and helping me. We partnered with Accel in November of 2012 and now Sequoia has led this recent investment. Do not underestimate the value of getting high quality VCs. Their access to customers, top talent, and strategic partners is invaluable.  Not to mention the guidance they give in Board meetings and at times of key decisions. The only advice I can give here is:  1) know your business cold, 2) execute your plan and 3) raise money when you have wind at your back.  Venture Capitalists make a living on picking the right markets with the right teams with the right momentum. Markets can swing (check Splunk’s stock price in last 3 months) and momentum can swing (watch the Bruins in the Stanley Cup – never mind they lost to the Canadiens).
  8. Believe. It may be cliché, but you have to believe in the mission. If you haven’t watched Twelve O’Clock High, watch it. It’s not politically correct, but it speaks volumes about how to lead and manage. You may choose the wrong strategy or tactics at times. But you’ll never know if you don’t have conviction about the goals.

OK, so I’m no Jack Welch or Steve Jobs, and many of these lessons are common sense. But no matter how much you think you know, there is way more that you don’t. Hopefully one person will be a little better informed or prepared by my own experience.

 


Building Scala at Scale

05.12.2014 | Posted by Jacek Migdal

The Scala compiler can be brutally slow. The community has a love-hate relationship with it. Love means “Yes, scalac is slow”. Hate means, “Scala — 1★ Would Not Program Again”. It’s hard to go a week without reading another rant about the Scala compiler.

Moreover, one of the Typesafe co-founders left the company shouting, “The Scala compiler will never be fast” (17:53). Even Scala inventor Martin Odersky provides a list of fundamental reasons why compiling is slow.

At Sumo Logic, we happily build over 600K lines of Scala code[1] with Maven and find this setup productive. Based on the public perception of the Scala build process, this seems about as plausible as a UFO landing on the roof of our building. Here’s how we do it:

Many modules

At Sumo Logic, we have more than 120 modules. Each has its own source directory, unit tests, and dependencies. As a result, each of them is reasonably small and well defined. Usually, you just need to modify one or a few of them, which means that you can just build them and fetch binaries of dependencies[2].

Using this method is a huge win in build time and also makes the IDE and test suites run more quickly. Fewer elements are always easier to handle.

We keep all modules in a single GitHub repository. Though we have experimented with a separate repository for each project, keeping track of version dependencies was too complicated.

Parallelism on module level

Although Moore’s law is still at work, single cores have not become much faster since 2004. The Scala compiler has some parallelism, but it’s nowhere close to saturating eight cores[3] in our use case.

Enabling parallel builds in Maven 3 helped a lot. At first, it caused a lot of non-deterministic failures, but it turns out that always forking the Java compiler fixed most of the problems[4]. That allows us to fully saturate all of the CPU cores during most of the build time. Even better, it allows us to overcome other bottlenecks (e.g., fetching dependencies).
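For reference, module-level parallelism in Maven 3 is switched on with the -T flag; an invocation in the spirit of our setup (exact goals illustrative) looks like:

mvn -T 1C clean install   # one build thread per CPU core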

Incremental builds with Zinc

Zinc brings features from sbt to other build systems, providing two major gains:

  • It keeps warmed compilers running, which avoids the startup JVM “warm-up tax”.
  • It allows incremental compilation. Usually we don't compile from a clean state; we just make a small change and recompile. This is a huge gain when doing Test Driven Development.

For a long time we were unable to use Zinc with parallel modules builds. As it turns out, we needed to tell Zinc to fork Java compilers. Luckily, an awesome Typesafe developer, Peter Vlugter, implemented that option and fixed our issue.

Time statistics

The following example shows the typical development workflow of building one module. For this benchmark, we picked the largest one by lines of code (53K LOC).

[Chart: example module build time]

This next example shows building all modules (674K LOC), the most time consuming task.

[Chart: total build time]

Usually we can skip test compilation, bringing build time down to 12 minutes.[5]

Wrapper utility

Still, some engineers were not happy, because:

  • Often they build and test more often than needed.
  • Computers get slow if you saturate the CPU (e.g., video conference becomes sluggish).
  • Passing the correct arguments to Maven is hard.

Educating developers might have helped, but we picked the easier route. We created a simple bash wrapper that:

  • Runs every Maven process with lower CPU priority (nice -n 15), so the build process doesn't slow the browser, IDE, or a video conference.
  • Makes sure that Zinc is running. If not, it starts it.
  • Allows you to compile all the dependencies (downstream) easily for any module.
  • Allows you to compile all the things that depend on a module (upstream).
  • Makes it easy to select the kind of tests to run.

Though it is a simple wrapper, it improves usability a lot. For example, if you fixed a library bug for a module called “stream-pipeline” and would like to build and run unit tests for all modules that depend on it, just use this command:

bin/quick-assemble.sh -tu stream-pipeline

Tricks we learned along the way

  1. Print the longest chain of module dependency by build time.
    That helps identify the “unnecessary or poorly designed dependencies,” which can be removed. This makes the dependency graph much more shallow, which means more parallelism.
  2. Run a build in a loop until it fails.
    As simple as in bash:  while bin/quick-assemble.sh; do :; done.
    Then leave it overnight. This is very helpful for debugging non-deterministic bugs, which are common in a multithreading environment.
  3. Analyze the bottlenecks of build time.
    CPU? IO? Are all cores used? Network speed? The limiting factor can vary during different phases. iStat Menus proved to be really helpful.
  4. Read the Maven documentation.
    Many things in Maven are not intuitive. The “trial and error” approach can be very tedious for this build system. Reading the documentation carefully is a huge time saver.

Summary

Building at scale is usually hard. Scala makes it harder because of its relatively slow compiler, so you will hit the issues much earlier than in other languages. However, the problems are solvable through general development best practices, especially:

  • Modular code
  • Parallel execution by default
  • Invest time in tooling

Then it just rocks!

 

[1] ( find ./ -name '*.scala' -print0 | xargs -0 cat ) | wc -l
[2] All modules are built and tested by Jenkins and the binaries are stored in Nexus.
[3] The author’s 15-inch Macbook Pro from late 2013 has eight cores.
[4] We have little Java code. Theoretically, the Java 1.6 compiler is thread-safe, but it has some concurrency bugs. We decided not to dig into that, as forking seems to be an easier solution.
[5] Benchmark methodology:

  • Hardware: MacBook Pro, 15-inch, Late 2013, 2.3 GHz Intel i7, 16 GB RAM.
  • All tests were run three times and median time was selected.
  • Non-incremental Maven goal: clean test-compile.
  • Incremental Maven goal: test-compile. A random change was introduced to trigger some recompilation.

Why You Should Never Catch Throwable In Scala

05.05.2014 | Posted by Russell

Scala is a subtle beast and you should heed its warnings. Most Scala and Java programmers have heard that catching Throwable, a superclass of all exceptions, is evil and patterns like the following should be avoided:

try {
  aDangerousFunction()
} catch {
  case ex: Throwable => println(ex)
  // Or even worse
  case ex => println(ex)
}

This pattern is absurdly dangerous. Here’s why:

The Problem

In Java, catching all throwables can do nasty things like preventing the JVM from properly responding to a StackOverflowError or an OutOfMemoryError. Certainly not ideal, but not catastrophic. In Scala, it is much more heinous. Scala uses exceptions to return from nested closures. Consider code like the following:

def inlineMeAgain[T](f: => T): T = {
  f
}

def inlineme(f: => Int): Int = {
  try {
    inlineMeAgain {
      return f
    }
  } catch {
    case ex: Throwable => 5
  }
}

def doStuff {
  val res = inlineme {
    10
  }
  println("we got: " + res + ". should be 10")
}

doStuff

We use a return statement from within two nested closures. This seems like it may be a bit of an obscure edge case, but it’s certainly possible in practice. In order to handle this, the Scala compiler will throw a NonLocalReturnControl exception. Unfortunately, it is a Throwable, and you’ll catch it. Whoops. That code will print 5, not 10. Certainly not what was expected.

The Solution

While we can say “don’t catch Throwables” until we’re blue in the face, sometimes you really want to make sure that absolutely no exceptions get through. You could include the other exception types everywhere you want to catch Throwable, but that’s cumbersome and error prone. Fortunately, this is actually quite easy to handle, thanks to Scala’s focus on implementing much of the language without magic—the “catch” part of the try-catch is just some sugar over a partial function—we can define partial functions!

import scala.util.control.ControlThrowable

def safely[T](handler: PartialFunction[Throwable, T]): PartialFunction[Throwable, T] = {
  // Never swallow the Throwables Scala uses for control flow
  case ex: ControlThrowable => throw ex
  // case ex: OutOfMemoryError (assorted other nasty exceptions you don't want to catch)
  // If it's an exception they handle, pass it on
  case ex: Throwable if handler.isDefinedAt(ex) => handler(ex)
  // If they didn't handle it, rethrow. This line isn't necessary, just for clarity
  case ex: Throwable => throw ex
}

// Usage:
/*
def doSomething: Unit = {
  try {
    somethingDangerous
  } catch safely {
    case ex: Throwable => println("AHHH")
  }
}
*/

This defines a function “safely”, which takes a partial function and yields another partial function. Now, by simply using catch safely { /* catch block */ } we’re free to catch Throwables (or anything else) safely and restrict the list of all the evil exception types to one place in the code. Glorious.
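For what it's worth, the Scala standard library ships a related helper, scala.util.control.NonFatal, which likewise refuses to match ControlThrowable (and other fatal errors). A sketch of its use, with somethingDangerous standing in for your risky call:

import scala.util.control.NonFatal

try {
  somethingDangerous
} catch {
  case NonFatal(ex) => println("AHHH: " + ex)
}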


Sumo Logic, ServiceNow and the Future of Event Management

04.29.2014 | Posted by Sanjay Sarathy, CMO

Today’s reality is that companies have to deal with disjointed systems when it comes to detecting, investigating and remediating issues in their infrastructure.  Compound that with the exponential growth of machine data and you have a recipe for frustrated IT and security teams who are tasked with uncovering insights from this data exhaust and then remediating issues as appropriate.  Customer dissatisfaction, at-risk SLAs and even revenue misses are invariable consequences of this fragmented approach.  

With our announcement today of a certified integration with ServiceNow, companies now have a closed loop system that makes it much easier for organizations to uncover known and unknown events in Sumo Logic and then immediately create alerts and incidents in ServiceNow.  The bi-directional integration supports the ability for companies to streamline the entire change management process, capture current and future knowledge, and lay the groundwork for integrated event management capabilities.  This integration takes advantage of all the Sumo Logic analytics capabilities, including LogReduce and Anomaly Detection, to identify what’s happening in your enterprise, even if you never had rules to detect issues in the first place.  

[Screenshot: ServiceNow integration]

The cloud-to-cloud integration of ServiceNow and Sumo Logic also boosts productivity by eliminating the whole concept of downloading, installing and managing software.  Furthermore, IT organizations also have the ability to elastically scale their data analytics needs to meet the service management requirements of the modern enterprise.

Let us know if you’re interested in seeing our integration with ServiceNow.  And while you’re at it, feel free to register for Sumo Logic Free.  It’s a zero price way to understand how our machine data analytics service works.

PS – check out our new web page which provides highlights of recent capabilities and features that we’ve launched. 
