
Sending CloudPassage Halo Event Logs to Sumo Logic

04.23.2013 | Posted by CloudPassage: Cloud Security

The below is a guest post from CloudPassage.

Automating your server security is about more than just one great tool – it’s also about linking together multiple tools to empower you with the information you need to make decisions.  For customers of CloudPassage and Sumo Logic, linking those tools to secure cloud servers is as easy as it is powerful.

The CloudPassage Halo Event Connector enables you to view security event logs from CloudPassage Halo in your Sumo Logic dashboard, including alerts from your configuration, file integrity, and software vulnerability scans. Through this connector, Halo delivers unprecedented visibility of your cloud servers via your log management console. You can track server events such as your server rebooting, shutting down, changing IP addresses, and much more.

The purpose of the Halo Event Connector is to retrieve event data from a CloudPassage Halo account and import it into Sumo Logic for indexing or processing. It is designed to execute repeatedly, keeping the Sumo Collector up-to-date with Halo events as time passes and new events occur.
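
The repeated-execution pattern described above can be sketched as an incremental fetch that remembers the timestamp of the newest event already forwarded. This is only a minimal sketch, not the actual connector: `Event`, `IncrementalFetch`, `nextBatch`, and the use of ISO-8601 strings as timestamps are all assumptions made for illustration.

```scala
// Hypothetical model of an event; the real Halo event format is not shown here.
final case class Event(createdAt: String, message: String)

object IncrementalFetch {
  // Returns the events newer than `checkpoint` plus the checkpoint for the next run.
  // ISO-8601 timestamps in the same zone compare correctly as plain strings.
  def nextBatch(all: List[Event], checkpoint: String): (List[Event], String) = {
    val fresh = all.filter(_.createdAt > checkpoint).sortBy(_.createdAt)
    val newCheckpoint = fresh.lastOption.map(_.createdAt).getOrElse(checkpoint)
    (fresh, newCheckpoint)
  }
}
```

Each run would pass in the checkpoint saved by the previous run, so events that were already imported are not sent twice.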

The Halo Event Connector is free to use, and will work with any Halo subscription. To get started integrating Halo events into Sumo Logic, make sure you have set up accounts for CloudPassage Halo and Sumo Logic.

Then, generate an API key in your CloudPassage Halo portal. Once you have an API key, follow the steps provided in the Halo – Sumo Logic documentation, using the scripts provided on GitHub. The documentation walks you through the process of testing the Halo Event Connector script.

Once you have tested the script, you will then add the output as a “Source” by selecting “Script” in Sumo Logic (see below).


 

When you have finished adding the new data source that integrates the Halo Event Connector with Sumo Logic (as detailed in the .pdf documentation), you will be taken back to the “Collectors” tab where the newly added Script source will be listed.

 

Once the Connector runs successfully and is importing event data into Sumo Logic, you will see Halo events such as the following appear in your Sumo Logic searches:

Try it out today – we are eager to hear your feedback! We hope that integrating these two tools makes your server security automation even more powerful.

Universal Collection of Machine Data

04.18.2013 | Posted by Sanjay Sarathy, CMO

Customers love flexibility, especially if that flexibility drives additional business value. In that vein, today we announced an expansion of our log data collection capabilities with our hosted HTTPS and Amazon S3 collectors that eliminate the need for any local software installation. There may be a variety of reasons why you don’t want or can’t have local collectors: for example, you may not have access to the underlying infrastructure, as often happens with Infrastructure-as-a-Service (IaaS) environments. Or you simply don’t feel like deploying any local software into your current infrastructure. Defining these hosted collectors is now baked into the set-up process, whether you’re using Sumo Logic Free or our Enterprise product.

 

 

With these new capabilities, companies can now unify how they collect and analyze log data generated from private clouds, public clouds, and their on-premise infrastructure.  They can then apply our unique analytics capabilities like LogReduce to generate insight across every relevant application and operational tier.

With companies increasingly moving towards the Cloud to power different parts of their business, it’s imperative that they have the necessary means to troubleshoot and monitor their diverse infrastructure.  Sumo Logic provides that flexibility.

Dirty Haskell Phrasebook

04.05.2013 | Posted by Máté Kovács, Sumo Logic Intern

Whenever people ask me whether Hungarian is difficult to learn, I half-jokingly say that it can’t be too hard given that I had learned it by the time I turned three. Having said that, I must admit that learning a new language as a grown-up is a whole new ball game. Our struggle for efficiency is reflected in the way we learn languages: we focus on the most common patterns, and reuse what we know as often as possible.

Programming languages are no different. When I started at Sumo Logic just two months ago, I wanted to become fluent in Scala as quickly as possible. Because I have a soft spot for functional languages such as Haskell, a main factor in my decision to do an internship here was that we use Scala. I soon realized that a large subset of Haskell can easily be translated into Scala, which has made the learning process a lot smoother so far.

You’ve probably guessed by now that this post is going to be a Scala phrasebook for Haskellers. I’m also hoping that it will give new insights to seasoned Scalaists, and spark the interest of programmers who are new to the functional paradigm. Here we go.

Basics

module Hello where
 
main :: IO ()
main = do
  putStrLn "Hello, World!"
object Hello {
 
  def main(args: Array[String]): Unit =
    println("Hello, World!")
}
 

While I believe that HelloWorld examples aren’t really useful, there are a few key points to make here.

The object keyword creates a singleton object with the given name and properties. Pretty much everything in Scala is an object, and has its place in the elaborate type hierarchy stemming from the root-type called Any. In other words, a set of types always has a common ancestor, which isn’t the case in Haskell. One consequence of this is that Scala’s ways of emulating heterogeneous collections are more coherent. For example, Haskell needs fairly involved machinery such as existential types to describe a list-type that can simultaneously hold elements of all types, which is simply Scala’s List[Any].
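
To make the `List[Any]` remark concrete, here is a small sketch of a heterogeneous list and a function that inspects its elements. The names `Heterogeneous`, `mixed`, and `describe` are made up for this example.

```scala
object Heterogeneous {
  // Every Scala type conforms to Any, so one list can hold unrelated values.
  val mixed: List[Any] = List(42, "forty-two", 42.0, List(4, 2))

  // Pattern matching on the runtime type recovers some of the lost information.
  def describe(x: Any): String = x match {
    case i: Int    => "Int " + i
    case s: String => "String " + s
    case d: Double => "Double " + d
    case _         => "something else"
  }
}
```

The price of this convenience is that the static type of each element is just `Any`, so anything more specific has to be recovered at run time as above.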

In Scala, every function (and value) needs an enclosing object or class. (In other words, every function is a method of some object.) Since object-orientation concepts don’t have direct analogues in Haskell, further examples will implicitly assume an enclosing object on the Scala side.

Haskell’s () type is Scala’s Unit, and its only value is called () just like in Haskell. Scala has no notion of purity, so functions might have side-effects without any warning signs. One particular case is easy to spot though: the sole purpose of a function with return type Unit is to exert side effects.

Values

answer :: Int
answer = 42
lazy val answer: Int = 42
 

Evaluation in Haskell is non-strict by default, whereas Scala is strict. To get the equivalent of Haskell’s behavior in Scala, we need to use lazy values (see also lazy collections). In most cases however, this makes no difference. From now on, the lazy keyword will be dropped for clarity. Besides val, Scala also has var which is mutable, akin to IORef and STRef in Haskell.
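
The difference between `val` and `lazy val` can be observed with a side effect. The sketch below uses a mutable counter purely for demonstration; `Laziness`, `evaluations`, `strict`, and `deferred` are names invented for this example.

```scala
object Laziness {
  var evaluations = 0 // mutable counter, only here to observe evaluation

  // Strict: the right-hand side runs as soon as the object is initialized.
  val strict: Int = { evaluations += 1; 42 }

  // Lazy: the right-hand side runs on first access and the result is cached.
  lazy val deferred: Int = { evaluations += 1; 42 }
}
```

After the object is initialized the counter is 1; it becomes 2 on the first read of `deferred` and stays at 2 on later reads.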

Okay, let’s see values of some other types.

question :: [Char]
question = "What's six by nine?"
val question: String =
  "What's six by nine?"
 

Can you guess what the type of the following value is?

judgement = (6*9 /= 42)
val judgement = (6*9 != 42)
 

Well, so can Haskell and Scala. Type inference makes it possible to omit type annotations. There are a few corner cases that get this mechanism confused, but a few well-placed type annotations will usually sort those out.

Data Structures

Lists and tuples are arguably the most ubiquitous data structures in Haskell.

In contrast with Haskell’s syntactic sugar for list literals, Scala’s notation seems fairly trivial, but in fact involves quite a bit of magic under the hood.

list :: [Int]
list = [3, 5, 7]
val list: List[Int] = List(3, 5, 7)
 

Lists can also be constructed from a head-element and a tail-list.

smallPrimes = 2 : list
val smallPrimes = 2 :: list
 

As you can see, : and :: basically switched roles in the two languages. This list-builder operator, usually called cons, will come in handy when we want to pattern match on lists (see Control Structures and Scoping below for pattern matching).

Common accessors and operations have the same name, but they are methods of the List class in Scala.

head list
list.head
tail list
list.tail
map func list
list.map(func)
zip list_1 list_2
list_1.zip(list_2)
 

If you need to rely on the non-strict evaluation semantics of Haskell lists, use Stream in Scala.
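
As a sketch of what that looks like, the following defines an infinite sequence of primes and takes a finite prefix, much like `take 5 (filter isPrime [2..])` in Haskell. It assumes Scala 2.13 or later, where `LazyList` is the recommended successor to `Stream`; on older versions `Stream.from(2)` plays the same role. The names `Lazies`, `isPrime`, `primes`, and `firstFive` are invented for the example.

```scala
object Lazies {
  def isPrime(n: Int): Boolean =
    n >= 2 && (2 to math.sqrt(n.toDouble).toInt).forall(n % _ != 0)

  // An infinite, lazily evaluated sequence; only the demanded prefix is computed.
  val primes: LazyList[Int] = LazyList.from(2).filter(isPrime)

  val firstFive: List[Int] = primes.take(5).toList // List(2, 3, 5, 7, 11)
}
```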

Tuples are virtually identical in the two languages.

tuple :: ([Char], Int)
tuple = (question, answer)
val tuple: (String, Int) =
  (question, answer)
 

Again, there are minor differences in Scala’s accessor syntax due to object-orientation.

fst tuple
tuple._1
snd tuple
tuple._2
 

Another widely-used parametric data type is Maybe, which can represent values that might be absent. Its equivalent is Option in Scala.

singer :: Maybe [Char]
singer = Just "Carly Rae Jepsen"
val singer: Option[String] =
  Some("Carly Rae Jepsen")
song :: Maybe [Char]
song = Nothing
val song: Option[String] =
  None
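
Working with an `Option` usually means transforming the value if it is there and supplying a default if it is not. The sketch below shows one way to do that; `Options` and `shout` are names invented for the example, and the comparison to `fmap` and `fromMaybe` is only approximate.

```scala
object Options {
  val singer: Option[String] = Some("Carly Rae Jepsen")
  val song: Option[String] = None

  // map transforms the value if present; getOrElse supplies a default otherwise.
  // These correspond roughly to fmap and fromMaybe in Haskell.
  def shout(o: Option[String]): String =
    o.map(_.toUpperCase).getOrElse("(unknown)")
}
```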
 

Algebraic data types translate to case classes.

data Tree
  = Leaf
  | Branch [Tree]
  deriving (Eq, Show)
sealed abstract class Tree
case class Leaf() extends Tree
case class Branch(kids: List[Tree]) extends Tree
 

Just like their counterparts, case classes can be used in pattern matching (see Control Structures and Scoping below), and there’s no need for the new keyword at instantiation. We also get structural equality check and conversion to string for free, in the form of the equals and toString methods, respectively.

The sealed keyword prevents anything outside this source file from subclassing Tree, just to make sure exhaustive pattern lists don’t become undone.
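
The generated `equals` and `toString` methods can be seen in a short sketch. It repeats the `Tree` definition so that it is self-contained; `TreeDemo` and its members are names invented for the example.

```scala
sealed abstract class Tree
case class Leaf() extends Tree
case class Branch(kids: List[Tree]) extends Tree

object TreeDemo {
  // No `new` needed; apply methods are generated for case classes.
  val a: Tree = Branch(List(Leaf(), Leaf()))
  val b: Tree = Branch(List(Leaf(), Leaf()))

  val sameStructure: Boolean = a == b // structural equality via generated equals
  val rendered: String = a.toString   // "Branch(List(Leaf(), Leaf()))"
}
```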

See also extractor objects for a generalization of case classes.

Functions

increment :: Int -> Int
increment x = x + 1
def increment(x: Int): Int = x + 1
 

If you’re coming from a Haskell background, you’re probably not surprised that the function body is a single expression. For a way to create more complex functions, see let-expressions in Control Structures and Scoping below.

three = increment 2
val three = increment(2)
 

Most of the expressive power of functional languages stems from the fact that functions are values themselves, which leads to increased flexibility in reusing algorithms.

Composition is probably the simplest form of combining functions.

incrementTwice =
  increment . increment
val incrementTwice =
  (increment: Int => Int).compose(increment)
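
The order of application is easy to get backwards, so here is a small sketch contrasting `compose` with its mirror image `andThen`. The functions are defined as values to keep the example short; `Composition` and its members are invented names.

```scala
object Composition {
  val increment: Int => Int = _ + 1
  val double: Int => Int = _ * 2

  // f.compose(g) applies g first, like Haskell's (f . g).
  val doubleThenIncrement: Int => Int = increment.compose(double)

  // f.andThen(g) applies f first, reading left to right.
  val incrementThenDouble: Int => Int = increment.andThen(double)
}
```

Applied to 5, the first function gives 11 and the second gives 12.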
 

Currying, Partial Application, and Function Literals

Leveraging the idea that functions are values, Haskell chooses to have only unary functions and emulate higher arities by returning functions, in a technique called currying. If you think that isn’t a serious name, you’re welcome to call it schönfinkeling instead.

Here’s how to write curried functions.

addCurry :: Int -> Int -> Int
addCurry x y = x + y
def addCurry(x: Int)(y: Int): Int =
  x + y
 
five = addCurry 2 3
val five = addCurry(2)(3)
 

The rationale behind currying is that it makes certain cases of partial application very succinct.

addSix :: Int -> Int
addSix = addCurry 6
val addSix: Int => Int =
  addCurry(6)
 
val addSix = addCurry(6) : (Int => Int)
 
val addSix = addCurry(6)(_)
 

The type annotation is needed to let Scala know that you didn’t forget an argument but really meant partial application. If you want to drop the type annotation, use the underscore placeholder syntax.
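
Neither the annotation nor the underscore is needed when the surrounding code already expects a function, for example when passing the partially applied method to `map`. The sketch below illustrates this; `Currying` and `shifted` are names invented for the example.

```scala
object Currying {
  def addCurry(x: Int)(y: Int): Int = x + y

  // Where a function is expected, the partially applied method is converted
  // automatically, so no annotation or underscore is needed here.
  val shifted: List[Int] = List(1, 2, 3).map(addCurry(6)) // List(7, 8, 9)
}
```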

To contrast with curried ones, functions that take many arguments at once are said to be uncurried. Scalaists seem to prefer their functions less spicy by default, most likely to save parentheses.

addUncurry :: (Int, Int) -> Int
addUncurry (x, y) = x + y
def addUncurry(x: Int, y: Int): Int =
  x + y
 
seven = addUncurry (2, 5)
val seven = addUncurry(2, 5)
 

Uncurried functions can still be partially applied with ease in Scala, thanks to underscore placeholder notation.

addALot :: Int -> Int
addALot =
  \x -> addUncurry (x, 42)
val addALot: Int => Int =
  addUncurry(_, 42)
 
val addALot =
  addUncurry(_: Int, 42)
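
Function values can also be converted between the two styles, much like Haskell's `curry` and `uncurry`. The sketch below uses the standard library's `curried`, `Function.uncurried`, and `tupled`; the object and value names are invented for the example.

```scala
object Conversions {
  val addUncurry: (Int, Int) => Int = _ + _

  // Like Haskell's curry: (Int, Int) => Int becomes Int => Int => Int.
  val addCurry: Int => Int => Int = addUncurry.curried

  // Like Haskell's uncurry, going back the other way.
  val roundTrip: (Int, Int) => Int = Function.uncurried(addCurry)

  // tupled yields a function of one pair argument, the closest match to
  // the Haskell type (Int, Int) -> Int.
  val addPair: ((Int, Int)) => Int = addUncurry.tupled
}
```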
 

When functions are values, it makes sense to have function literals, a.k.a. anonymous functions.

(brackets :: Int -> [Char]) =
  \x -> "<" ++ show x ++ ">"
val brackets: Int => String =
  x => "<%s>".format(x)
brackets = \(x :: Int) ->
  "<" ++ show x ++ ">"
val brackets =
  (x: Int) => "<%s>".format(x)
 

Infix Notation

In Haskell, any function whose name contains only certain operator characters will take its first argument from the left side when applied, which is infix notation if it has two arguments. Alphanumeric function names surrounded by backticks also behave that way. In Scala, any single-argument method can be used as an infix operator by omitting the dot and parentheses from the method call syntax.

data C = C [Char]
 
bowtie (C s) t =
  s ++ " " ++ t
 
(|><|) = bowtie
case class C(s: String) {
 
  def bowtie(t: String): String =
    s + " " + t
 
  val |><| = bowtie(_)
}
(C "James") |><| "Bond"
C("James") |><| "Bond"
(C "James") `bowtie` "Bond"
C("James") bowtie "Bond"
 

Haskell’s sections provide a way to create function literals from partially applied infix operators. They can then be translated to Scala using placeholder notation.

tenTimes = (10*)
val tenTimes = 10 * (_: Int)
 

Again, the type annotation is necessary so that Scala knows you meant what you wrote.

Higher-order Functions and Comprehensions

Higher order functions are functions that have arguments which are functions themselves. Along with function literals, they can be used to express complex ideas in a very compact manner. One example is operations on lists (and other collections in Scala).

map (3*) (filter (<5) list)
list.filter(_ < 5).map(3 * _)
 

That particular combination of map and filter can also be written as a list comprehension.

[3 * x | x <- list, x < 5]
for(x <- list if x < 5) yield (3 * x)
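
Comprehensions with several generators translate the same way. The sketch below shows a two-generator comprehension next to an approximation of the method calls the Scala compiler produces for it; the names are invented for the example.

```scala
object Comprehensions {
  val xs = List(1, 2, 3)
  val ys = List(10, 20)

  // Haskell: [x * y | x <- xs, y <- ys, odd x]
  val viaFor: List[Int] =
    for (x <- xs; y <- ys if x % 2 == 1) yield x * y

  // Roughly what the compiler generates for the comprehension above.
  val viaMethods: List[Int] =
    xs.flatMap(x => ys.withFilter(_ => x % 2 == 1).map(y => x * y))
}
```

Both values evaluate to `List(10, 20, 30, 60)`.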
 

Control Structures and Scoping

Pattern matching is a form of control transfer in functional languages.

countNodes :: Tree -> Int
countNodes t = case t of
  Leaf -> 1
  (Branch kids) ->
    1 + sum (map countNodes kids)
def countNodes(t: Tree): Int =
  t match {
    case Leaf() => 1
    case Branch(kids) =>
      1 + kids.map(countNodes).sum
  }
 

For a definition of Tree, see the Data Structures section above.
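
Lists can be taken apart the same way, using the cons operator from the Data Structures section as a pattern. Here is a minimal sketch; `ListPatterns` and `total` are names invented for the example.

```scala
object ListPatterns {
  // Haskell:
  //   total []     = 0
  //   total (x:xs) = x + total xs
  def total(xs: List[Int]): Int = xs match {
    case Nil          => 0
    case head :: tail => head + total(tail)
  }
}
```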

Even though they could be written as pattern matching, if-expressions are also supported for increased readability.

if condition
  then expr_0
  else expr_1
if (condition)
  expr_0
else
  expr_1
 

Let expressions are indispensable in organizing complex expressions.

result =
  let
    v_0 = bind_0
    v_1 = bind_1
    -- ...
    v_n = bind_n
  in
   expr
val result = {
 
  val v_0 = bind_0
  val v_1 = bind_1
  // ...
  val v_n = bind_n
 
  expr
}
 

A code block evaluates to its final expression if the control flow reaches that point. Curly brackets are mandatory; Scala isn’t indentation-sensitive.

Parametric Polymorphism

I’ve been using parametric types all over the place, so it’s time I said a few words about them. It’s safe to think of them as type-level functions that take types as arguments and return types. They are evaluated at compile time.

[a]
List[A]
(a, b)
(A, B)
 
// desugars to
Tuple2[A, B]
Maybe a
Option[A]
a -> b
A => B
 
// desugars to
Function1[A, B]
a -> b -> c
A => B => C
 
// desugars to
Function1[A, Function1[B, C]]
 

Type variables in Haskell are required to be lowercase, whereas they’re usually uppercase in Scala, but this is only a convention.
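
Functions can be parametric too. In Scala the type parameters are declared explicitly in square brackets, as in this sketch; `Generics`, `swap`, and `safeHead` are names chosen for the example.

```scala
object Generics {
  // Haskell: swap :: (a, b) -> (b, a)
  def swap[A, B](pair: (A, B)): (B, A) = (pair._2, pair._1)

  // Haskell: safeHead :: [a] -> Maybe a
  def safeHead[A](xs: List[A]): Option[A] = xs match {
    case Nil    => None
    case x :: _ => Some(x)
  }
}
```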

In this context, Haskell’s type classes loosely correspond to Scala’s traits, but that’s a topic for another time. Stay tuned.

Comments

-- single-line comment
// single-line comment
{-
Feel free to suggest additions
and corrections to the phrasebook
in the comments section below. :]
-}
/*
Feel free to suggest additions
and corrections to the phrasebook
in the comments section below. :]
*/
 

Here Be Dragons

Please keep in mind that this phrasebook is no substitute for the real thing; you will be able to write Scala code, but you won’t be able to read everything. Relying on it too much will inevitably yield some unexpected results. Don’t be afraid of being wrong and standing corrected, though. As far as we know, the only path to a truly deep understanding is the way children learn: by poking around, breaking things, and having fun.

 

Harder, Better, Faster, Stronger – Machine Data Analytics and DevOps

03.28.2013 | Posted by Ben Newton, Corporate Sales Engineering Manager

Work It Harder, Make It Better

Do It Faster, Makes Us Stronger

More Than Ever Hour After

Our Work Is Never Over

     Daft Punk – “Harder, Better, Faster, Stronger”

 

When trying to explain the essence of DevOps to colleagues last week, I found myself unwittingly quoting the kings of electronica, the French duo Daft Punk (and Kanye West, who sampled the song in “Stronger”). So often, I find the “spirit” of DevOps being reduced to mere automation, the takeover of Ops by Dev (or vice versa), or other oversimplifications. This is natural for any new, potentially over-hyped, trend. But how do we capture the DevOps “essence” – programmable architecture, agile development, and lean methodology – in a few words? It seems like the short lyrics really sum up the essence of the flexible, agile, constantly improving ideal of a DevOps “team”, and the continuous improvement aspects of lean and agile methodology.

So, what does this have to do with machine data analytics and Sumo Logic? Part of the DevOps revolution is a deep and wrenching re-evaluation of the state of IT Operations tools. As the pace of technological change and ferocity of competition keep increasing for any company daring to make money on the Internet (which is almost everybody at this point), IT departments are facing a difficult problem. Do they try to adapt the process-heavy, top-down approaches exemplified by ITIL, or do they embrace the state of constant change that is DevOps? In the DevOps model, the explosion of creativity that comes with unleashing your development and operations teams to innovate quickly overwhelms traditional, static tools. More fundamentally, the continuous improvement model of agile development and DevOps is only as good as the metrics used to measure success. So, the most successful DevOps teams are incredibly data hungry. And this is where machine data analytics, and Sumo Logic in particular, really comes into its own, and is fundamentally in tune with the DevOps approach.

 

1.  Let the data speak for itself

Unlike the management tools of the past, Sumo Logic makes only basic assumptions about the data being consumed (time stamped, text-based, etc.). The important patterns are determined by the data itself, and not by pre-judging what patterns are relevant, and which are not. This means that as the application rapidly changes, Sumo Logic can detect new patterns – both good and ill – that would escape the inflexible tools of the past.

2.  Continuous reinterpretation

Sumo Logic never tries to force the machine data into tired old buckets that are forever out of date. The data is stored raw so that it can continually be reinterpreted and re-parsed to reveal new meaning. Fast moving DevOps teams can’t wait for the stodgy software vendor to change their code or send their consultant onsite. They need it now.

3. Any metric you want, any time you want it

The power of the new DevOps approach to management is that the people that know the app the best, the developers, are producing the metrics needed to keep the app humming. This seems obvious in retrospect, yet very few performance management vendors support this kind of flexibility. It is much easier for developers to throw more data at Sumo Logic by outputting more data to the logs than to integrate with management tools. The extra insight that this detailed, highly specific data can provide into your customers’ experience and the operation of your applications is truly groundbreaking. 

4. Set the data free

Free-flow of data is the new norm, and mash-ups provide the most useful metrics. Specifically, pulling business data from outside of the machine data context allows you to put it in the proper perspective. We do this extensively at Sumo Logic with our own APIs, and it allows us to view our customers as more than nameless organization ID numbers. DevOps is driven by the need to keep customers happy.

5. Develop DevOps applications, not DevOps tools

The IT Software industry has fundamentally failed its customers. In general, IT software is badly written, buggy, hard to use, costly to maintain, and inflexible. Is it any wonder that the top DevOps shops overwhelmingly use open source tools and write much of the logic themselves?! Sumo Logic allows DevOps teams the flexibility and access to get the data they need when they need it, without forcing them into a paradigm that has no relevance for them. And why should DevOps teams even be managing the tools they use? It is no longer acceptable to spend months with vendor consultants, and then maintain extra staff and hardware to run a tool. DevOps teams should be able to do what they are good at – developing, releasing, and operating their apps, while the vendors should take the burden of tool management off their shoulders.

 

The IT industry is changing fast, and DevOps teams need tools that can keep up with the pace – and make their job easier, not more difficult. Sumo Logic is excited to be in the forefront of that trend. Sign up for Sumo Logic Free and prove it out for yourself.

Using the transpose operator

02.19.2013 | Posted by Yan Qiao, Software Engineer

Sumo Logic lets you access your logs through a powerful query language.  In addition to searching for individual log messages, you may extract, transform, filter and aggregate data from them using a sequence of operators.  There are currently about two dozen operators available and we are constantly adding new ones.  In this post I want to introduce you to a recent addition to the toolbox, the transpose operator.

Let’s say you work for an online brokerage firm, and your trading server logs lines that look like the following, among other things:

2013-02-14 01:41:36 10.20.11.102 GET /Trade/StockTrade.aspx action=buy&symbol=s:131 80 Cole 219.142.249.227 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_7_3)+AppleWebKit/536.5+(KHTML,+like+Gecko)+Chrome/19.0.1084.54+Safari/536.5 200 0 0 449

There is a wealth of information in this log line, but to keep it simple, let’s focus on the last number, in this case 449, which is the server response time in milliseconds.   We are interested in finding out the distribution of this number so as to know how quickly individual trades are processed.  One way to do that is to build a histogram of the response time using the following query:

stocktrade | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by response_time

Here we start with a search for “stocktrade” to get only the lines we are interested in, extract the response time using a regular expression, round it up to the next 100 milliseconds, and count the occurrences of each number.  The result looks like:
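
For readers who prefer code to query syntax, the same extract, round, and count steps can be sketched in Scala. This only illustrates the logic and says nothing about how Sumo Logic evaluates the query; `ResponseHistogram`, `bucket`, and `histogram` are names invented for the example.

```scala
object ResponseHistogram {
  // Captures the trailing run of digits, mirroring (?<response_time>\d+$).
  private val TrailingNumber = """(\d+)$""".r

  // Rounds a response time up to the next multiple of 100 ms: 449 becomes 500.
  def bucket(ms: Int): Int = math.ceil(ms / 100.0).toInt * 100

  // Counts log lines per bucket; lines without a trailing number are skipped.
  def histogram(lines: Seq[String]): Map[Int, Int] =
    lines
      .flatMap(line => TrailingNumber.findFirstMatchIn(line).map(_.group(1).toInt))
      .groupBy(bucket)
      .map { case (b, hits) => b -> hits.size }
}
```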

Now, it would also be interesting to see how the distribution changes over time.   That is easy with the timeslice operator:

stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time

and the result looks like the following:

This gets the data we want, but it is not presented in a format that is easy to digest.  For example, in the table above, the first five rows give us the distribution of response time at 8:00, the next five rows at 8:01, etc.  Wouldn’t it be nice if we could rearrange the data into the following table?

That is exactly what transpose does:

stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time | transpose row _timeslice column response_time

Here we tell the query engine to rearrange the table using time slice values as row labels, and response time as column labels.
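
The rearrangement itself is a pivot, which can be sketched in a few lines of Scala. This is an illustration of the idea rather than of the operator's implementation; `Pivot`, `Cell`, and the choice to fill missing combinations with 0 are assumptions made for the example.

```scala
object Pivot {
  // One row of the "count by _timeslice, response_time" result.
  final case class Cell(timeslice: String, responseTime: Int, count: Int)

  // Rearranges cells into one row per time slice, with one entry per
  // response-time bucket. Missing combinations are filled with 0.
  def transpose(cells: Seq[Cell]): Map[String, Map[Int, Int]] = {
    val columns = cells.map(_.responseTime).distinct
    cells.groupBy(_.timeslice).map { case (slice, row) =>
      val counts = row.map(c => c.responseTime -> c.count).toMap
      slice -> columns.map(col => col -> counts.getOrElse(col, 0)).toMap
    }
  }
}
```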

This is especially useful when the data is visualized.  The “stacking” option allows you to draw bar charts with values from different columns stacked onto each other, as shown below:

The length of bars represents number of trading requests per minute, and the colored segments represent the distribution of response time.

That’s it!  To find out other interesting ways to analyze your log data, sign up for Sumo Logic Free and try for yourself!

Why I joined Sumo Logic and Moved to Silicon Valley

01.28.2013 | Posted by Ben Newton, Corporate Sales Engineering Manager

Entering StartUP

We make hundreds of decisions every day, mostly small ones, that are just part of life’s ebb and flow. And then there are the big decisions that don’t merely create ripples in the flow of your life - they redirect it entirely. The massive, life-defining decisions like marriage and children; the career-defining decisions like choosing your first job after college. I’ve had my share of career-defining decisions – leaving a physics graduate program to chase after the dot com craze, leaving consulting for sales engineering, etc. The thing about this latest decision is that it combines both. I am joining Sumo Logic, leaving behind a safe job in marketing, and moving to Silicon Valley – away from my friends, family, and community. So, why did I do it? 

 

Now is the time for Start-Ups in Enterprise Software. 

Consumer start-ups get all the press, but the enterprise start-ups are where the real action is. The rash of consolidations in the last five years or so has created an innovation gap that companies like Sumo Logic are primed to exploit.  The perfect storm of cloud computing, SaaS, Big Data, and DevOps/Agile is forcing customers to start looking outside of their comfort zones to find the solutions they need. Sumo Logic brings together all of that innovation in a way that is too good not to be a part of.

The Enterprise SaaS Revolution is Inevitable.

The SaaS business model, combined with Agile development practices, is completely changing the ways companies buy enterprise software. Gartner sees companies replacing legacy software with SaaS more than ever. The antiquated term-licenses of on-premise software with its massive up-front costs, double digit maintenance charges, and “true-ups” seem positively barbaric by comparison to the flexibility of SaaS. And crucially for me, Sumo Logic is also one of the few true SaaS companies that is delving into the final frontier of the previously untouchable data center. 

Big Data is the “Killer App” for the Cloud.
“Big Data” analytics, using highly parallelized architectures like Hadoop or Cassandra, is one of the first innovations in enterprise IT to truly be “born in the cloud”. These new approaches were built to solve problems that just didn’t exist ten, or even five, years ago. The Big Data aspect of Sumo Logic is exciting to me. I am convinced that we are only scratching the surface of what is possible with Sumo Logic’s technology, and I want to be there on the bleeding edge with them.

Management Teams Matter.
When it really comes down to it, I joined Sumo Logic because I have first-hand knowledge of the skills that Sumo Logic’s management team brings to the table. I have complete confidence in Vance Loiselle’s leadership as CEO, and Sumo Logic has an unbeatable combination of know-how and get-it-done people. And clearly some of the top venture capital firms in the world agree with me. This is a winning team, and I like to win!

Silicon Valley is still Nirvana for Geeks and the best place for Start-Ups.
Other cities are catching up, but Silicon Valley is still the best place to start a tech company. The combination of brainpower, money, and critical mass is just hard to beat. On a personal level I have resisted the siren call of the San Francisco Bay Area for too long. I am strangely excited to be in a place where I can wear my glasses as a badge of honor, and discuss my love for gadgets and science fiction without shame. Luckily for me, I am blessed with a wife who has embraced my geek needs and supports me wholeheartedly (and a 21-month-old who doesn’t care either way).

So, here’s to a great adventure with the Sumo Logic team, to a new life in Silicon Valley, and to living on the edge of innovation. 

P.S.  If you want to see what I am so excited about, get a Sumo Logic Free account and check it out. 

Beyond LogReduce: Refinement and personalization

01.23.2013 | Posted by David Andrzejewski, Data Sciences Engineer

LogReduce is a powerful feature unique to the Sumo Logic offering. At the click of a single button, the user can apply the Summarize function to their previous search results, distilling hundreds of thousands of unstructured log messages into a discernible set of underlying patterns.

While this capability represents a significant advance in log analysis, we haven’t stopped there. One of the central principles of Sumo Logic is that, as a cloud-based log management service, we are uniquely positioned to deliver a superior service that learns and improves from user interactions with the system. In the case of LogReduce, we’ve added features that allow the system to learn better, more accurate patterns (refinement), and to learn which patterns a given user might find most relevant (personalization).

Refinement

Users have the ability to refine the automatically extracted signatures by splitting overly generalized patterns into finer-grained signatures or editing overly specific signatures to mark fields as wild cards. These modifications will then be remembered by the Sumo Logic system. As a result, all future queries run by users within the organization will be improved by returning higher-quality signatures.

Personalization

Personalized LogReduce helps users uncover the insights most important to them by capturing user feedback and using it to shape the ranking of the returned results. Users can promote or demote signatures to ensure that they do (or do not) appear at the top of Summarize results. Besides obeying this explicit feedback, Sumo Logic also uses this information to compute a relevance score which is used to rank signatures according to their content. These relevance profiles are individually tailored to each Sumo Logic user. For example, consider these Summarize query results:

 Results before feedback 

Since we haven’t given any feedback yet, their relevance scores are all equal to 5 (neutral) and they fall back to being ranked by count.
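
The ordering rule described here, relevance first with count as the tie-breaker, can be sketched as a two-key sort. This is only an illustration of the rule as stated, not the actual ranking code; `Ranking`, `Signature`, and `rank` are hypothetical names.

```scala
object Ranking {
  // Hypothetical shape of a Summarize result row.
  final case class Signature(pattern: String, count: Int, relevance: Int)

  // Higher relevance first; ties broken by higher count.
  def rank(sigs: Seq[Signature]): Seq[Signature] =
    sigs.sortBy(s => (-s.relevance, -s.count))
}
```

When every signature has the neutral score of 5, the first key is the same for all rows and the result is simply ordered by count.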

Promotion

Now, let’s pretend that we are in charge of ensuring that our database systems are functioning properly, so we promote one of the database-related signatures:

Results after promote

We can see that the signature we have promoted has now been moved to the top of the results, with the maximum relevance score of 10. When we do future Summarize queries, that signature will continue to appear at the top of results (unless we later choose to undo its promotion by simply clicking the thumb again).

The scores of the other two database-related signatures have increased as well, improving their rankings. This is because the content of these signatures is similar to the promoted database signature. This boost also will persist to future searches.

Demotion

This functionality works in the opposite direction as well. Continuing our running example, our intense focus on database management may mean that we find log messages about compute jobs to be distracting noise in our search results. We could try to “blacklist” these messages by putting Boolean negations in our original query string (e.g., “!comput*”), but this approach is not very practical or flexible. As we add more and more terms to our search, it becomes increasingly likely that we will unintentionally filter out messages that are actually important to us. With Personalized LogReduce, we can simply demote one of the computation-related logs:

Results after demote

This signature then drops to the bottom of the results. As with promotion, the relevance and ranking of the other, similar computation-related signature have also been lowered, and this behavior persists across other Summarize queries for this user.

Implicit feedback

Besides taking into account explicit user feedback (promotion and demotion), Summarize can also track and leverage the implicit signals present in user behavior. Specifically, when a user does a “View Details” drill-down into a particular signature to view the raw logs, this is also taken to be a weaker form of evidence to increase the relevance scores of related signatures.
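To make the feedback mechanics described above more concrete, here is a minimal sketch in Python. It is not Sumo Logic's actual algorithm, which this post does not describe. The word-overlap (Jaccard) similarity, the weights, the minimum score of 0, and all function names and sample signatures are assumptions made purely for illustration. Only the neutral score of 5 and the maximum of 10 come from the example above.

```python
import re

# Neutral (5) and maximum (10) come from the post; the minimum is assumed.
NEUTRAL, MAX_SCORE, MIN_SCORE = 5.0, 10.0, 0.0

# Hypothetical signal strengths: explicit feedback counts fully, while a
# "View Details" drill-down counts as weaker positive evidence.
WEIGHTS = {"promote": 1.0, "demote": -1.0, "view": 0.3}


def tokens(signature):
    """Lowercased word set of a signature, ignoring wildcards and punctuation."""
    return set(re.findall(r"[a-z]+", signature.lower()))


def similarity(a, b):
    """Jaccard similarity between the word sets of two signatures."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def relevance(signature, feedback):
    """Score a signature given a list of (action, signature) feedback pairs."""
    # Explicit feedback on this exact signature pins it to the top or bottom.
    for action, target in feedback:
        if target == signature and action == "promote":
            return MAX_SCORE
        if target == signature and action == "demote":
            return MIN_SCORE
    # Otherwise, nudge the neutral score by similarity to each feedback item.
    score = NEUTRAL
    for action, target in feedback:
        score += 5.0 * WEIGHTS[action] * similarity(signature, target)
    return max(MIN_SCORE, min(MAX_SCORE, score))


def rank(counts, feedback):
    """Order signatures by relevance, falling back to count to break ties."""
    return sorted(counts, key=lambda s: (-relevance(s, feedback), -counts[s]))


# Made-up signatures and counts, loosely mirroring the running example.
counts = {
    "compute job * finished in * ms": 900,
    "database connection to * established": 400,
    "database query on * took * ms": 300,
    "compute job * queued": 200,
}
feedback = [
    ("promote", "database connection to * established"),
    ("demote", "compute job * queued"),
]
for sig in rank(counts, feedback):
    print(round(relevance(sig, feedback), 2), sig)
```

With no feedback, every signature scores 5 and the ordering is purely by count. After the promotion and demotion, the promoted signature moves to the top, the other database signature rises slightly because it shares a word with it, and both compute signatures sink.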

Conclusion

The signature refinement and personalized relevance extensions to LogReduce enable the Sumo Logic service to learn from experience as users explore their log data. This kind of virtuous cycle holds great promise for helping users get from raw logs to business-critical insights in the quickest and easiest way possible, and we’re only getting started. Try these features out on your own logs at no cost with Sumo Logic Free and let us know what you think!

Real-time Enterprise Dashboards, Really

11.14.2012 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

Today we shipped a highly anticipated new capability with an approach that is novel not only to Sumo Logic but to our entire space: Real-time Enterprise Dashboards. Dashboard technologies have been around for many years, but not all dashboard technologies are created equal. Most existing technologies either leverage precomputed summary data sets or recompute the entire data set every time a dashboard is viewed. As such, they suffer from long load times, stale information, and an inability to handle the data volume.

Our customers faced a specific challenge: how to take terabytes of machine data per day, crunch it, transform it into information, and render that information in a way that supports making business and IT decisions in real time.  Now they can.

When machine data is used to troubleshoot and monitor today’s production applications or infrastructure, data volume is the enemy. Large farms of Apache or IIS servers, SaaS and other applications, or data center infrastructure like VMware farms, Cisco networking gear, or Linux or Microsoft Windows server farms generate volumes of data that obey Moore’s Law: the data volume doubles every two years. It only makes sense that the volume of machine data would follow Moore’s Law – if machine computing capacity doubles, those machines do twice the work and, as a result, generate twice the amount of machine data describing that work.

This exponential growth has put existing dashboarding technologies under an insurmountable strain. Some of us here at Sumo Logic built previous-generation dashboards in our past lives.  From our experience we realized that an entirely new approach is required to enable real-time monitoring and dashboarding and that realization drove development of a new architecture.

First, we adopted the cloud computing paradigm. That turned a data center into an API with lim(capacity)=∞. This enabled us to spin up and spin down additional capacity truly on demand with a single API call. Then we built our Streaming Query Engine, which leverages that capacity in an elastic manner. It continuously takes data off the wire and computes results before the data ever hits its permanent resting place. This “one-time” computing is more efficient and less costly than traditional recompute methods. When you view a Sumo Logic Dashboard, you simply attach to the existing state, which is continuously computed by our Streaming Query Engine in the background. What you get is the freshest data available, instantly, enabling real-time visibility into your infrastructure or applications. And the dashboards are beautiful to boot.
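The difference between “one-time” computing and recompute-on-view can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not the Streaming Query Engine itself: the class `StreamingCount`, the function `recompute`, and the sample events are all hypothetical names and data invented for this example.

```python
from collections import Counter


class StreamingCount:
    """Keeps a per-key running count that is updated as each event arrives."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.state = Counter()

    def ingest(self, event):
        # One-time computation: each event is touched once, on arrival.
        self.state[self.key_fn(event)] += 1

    def view(self):
        # A dashboard "attaches" to the current state; nothing is rescanned.
        return dict(self.state)


def recompute(events, key_fn):
    """The traditional alternative: rescan every stored event on each view."""
    return dict(Counter(key_fn(e) for e in events))


def by_status(event):
    """Group key for the example: the HTTP status code of an event."""
    return event["status"]


# Made-up web server events for illustration.
events = [
    {"status": 200}, {"status": 200}, {"status": 500},
    {"status": 404}, {"status": 200},
]

panel = StreamingCount(by_status)
for event in events:
    panel.ingest(event)

# Both approaches agree on the answer; they differ in when the work happens.
print(panel.view())
print(recompute(events, by_status))
```

In the streaming version the cost of viewing is independent of how many events have been ingested, whereas `recompute` grows with the stored data volume on every view.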

Try it for yourself.

Securing the Enterprise Cloud – SOC 2 Compliance

10.16.2012 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

In our earlier post, Cloudy Compliance Part 1, we discuss general standards, regulations and some basic compliance concepts. In Part 2, we further explore the relevance of current standards and regulations, including brief explanations of the American Institute of Certified Public Accountants (AICPA) and its Service Organization Control (SOC) reports.

Today we officially announced the successful completion of our SOC 2 Type 1 examination. Based on Trust Services Principles and Criteria, SOC 2 relates to enterprise-grade assurance, management and confidentiality capabilities.  It’s a significant validation for Sumo Logic, and further proof of the enterprise readiness of our cloud-based log management and analytics service.

What the announcement means to you

As part of the SOC 2 examination, Sumo Logic received evaluations that reviewed the controls protecting the confidentiality and integrity of customers’ log data and other machine data in the following three key areas:

  • Security – The system is protected against unauthorized access (both physical and logical).
  • Availability – The system is available for operation and use as committed or agreed.
  • Confidentiality – Information designated as confidential is protected as committed or agreed.

… Continue Reading

Scala at Sumo: type classes with a machine learning example

09.23.2012 | Posted by David Andrzejewski, Data Sciences Engineer

At Sumo Logic we use the Scala programming language, and we are always on the lookout for ways that we can leverage its features to write high-quality software. The type class pattern (an idea which comes from Haskell) provides a flexible mechanism for associating behaviors with types, and context bounds make this pattern particularly easy to express in Scala code.  Using this technique, we can extend existing types with new functionalities without worrying about inheritance.  In this post we introduce a motivating example and examine a few different ways to express our ideas in code before coming back to type classes, context bounds, and what they can do for us.

… Continue Reading
