Securing Protected Health Information
05.09.2013 | Posted by Joan Pepin, Director of Security
Pharmacy networks, electronic medical records, third-party billing, referrals: the medical establishment in this country runs on shared data. To ensure the safety and proper use of all of this highly sensitive and widely shared information, the US Congress passed the Health Insurance Portability and Accountability Act of 1996 (HIPAA). This law has changed the way healthcare-related businesses operate inside the United States, and has had wide-reaching and expensive effects on every aspect of the healthcare industry.
There is no central certification authority for HIPAA, and the onus is on individual medical providers to ensure they are compliant with all of the appropriate “rules” within the act. HIPAA, while affording important protection, is a complex and cumbersome regulation with potentially severe civil and criminal penalties for violation. As such, compliance with the act is of utmost importance to “covered entities” (largely billing providers, employer-sponsored health plans, health insurers, and medical service providers, including doctors’ offices and pharmacies), who must ensure that any service provider they do business with is compliant if there is any chance that “Protected Health Information” is involved.
In order to provide our cutting-edge log management and analytics platform to these businesses, we need to assure them that Sumo Logic can be trusted to handle this highly sensitive information in a secure and compliant manner. To accomplish this, Sumo Logic has undergone an extensive examination by a well-respected Certified Public Accounting firm, which determined that Sumo Logic’s information security program “incorporates the essential elements of the HIPAA final security rule, including but not limited to administrative, physical and technical safeguards.”
This report (available to Sumo Logic customers and prospects under NDA) is easily digestible by the compliance office at any medical company and will demonstrate our best-in-class dedication to the security of our customers’ data. Our commitment to data security and privacy makes Sumo Logic the only cloud-based log management solution able to demonstrate the ability to operate in a HIPAA-regulated environment (as well as the only cloud-based log management service to carry a SOC 2 attestation, the replacement for the venerable SAS 70).
And our compliance story is just beginning! We have several other very exciting initiatives on the way over the next 12 months which will continue to prove that our dedication to enterprise-grade information security practices sets us clearly apart from the rest.
Sumo Logic meets StatsD – DevOps tools Unite!
05.01.2013 | Posted by Ben Newton, Corporate Sales Engineering Manager
Of all of the new tools spawned by the DevOps movement, I find Etsy’s open-source tool, statsd, the most interesting. The enterprise software market is being shaken to its foundation, and statsd is one of the tools providing the vibrations. Instead of relying on the more generic metrics provided by application performance management (APM) vendors, Etsy, and others like them, are delivering highly specific and highly relevant metrics directly from their code with statsd. With just a few lines of code, developers can measure any part of their application they choose, in the way they choose. This is very similar to the freedom that developers gain with a proper log analysis tool – they can dump any data they want to a log and analyze it later. Freed from the issue of storage, and of the mechanics of log analysis, they can focus on using the data to enhance performance management, troubleshooting, business intelligence, etc.
For current users of statsd, the question might be – why would I want to put this in Sumo Logic, as opposed to using a tool like graphite for dashboard purposes? First of all, Sumo Logic provides analytics that supplement basic statsd metrics very well. For example, if you are watching your error count skyrocket and your user performance plummet, your next step will usually be to look for specific application errors and do root cause analysis, which is a perfect use case for Sumo Logic. Secondly, there is a lot of value in having both the statsd and Sumo Logic metrics in a “single pane of glass”, where performance metrics can be viewed alongside more complex analytics. Finally, for current users of Sumo Logic, statsd is a simple way to push application performance data straight into Sumo Logic, without filling up log files or worrying about data volumes.
Background for Statsd
First a little background on StatsD. The basis for the project started at Flickr, and was expanded at Etsy. This is appropriate since John Allspaw and his team helped kick-start the DevOps movement at Flickr, before coming over to Etsy. From the technical perspective statsd is, in their own words:
A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services.
So, statsd client modules forward clear-text metrics over UDP. StatsD supports a few different types of metrics, as well as analytics, but for the sake of simplicity, I will only cover two areas here: Counting and Timing. The counting metric sends the metric name, the amount to increment/decrement, and optionally the sample rate:
counter.sample:1|c
The timing metric looks very similar, with a metric name and value:
timing.sample:320|ms
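To make the wire format concrete, here is a minimal Python sketch that splits messages like the two above into their parts. The Metric type and the parse_statsd_line helper are hypothetical names introduced for illustration; they are not part of statsd or Sumo Logic, and the sketch only handles the name:value|type[|@rate] shape described here.

```python
from typing import NamedTuple, Optional


class Metric(NamedTuple):
    name: str
    value: float
    kind: str            # "c" for counters, "ms" for timers
    sample_rate: float   # 1.0 when the sender did not include a rate


def parse_statsd_line(line: str) -> Optional[Metric]:
    """Parse one 'name:value|type[|@rate]' message; return None if malformed."""
    name, sep, rest = line.strip().partition(":")
    if not sep or not name:
        return None
    fields = rest.split("|")
    if len(fields) < 2:
        return None
    try:
        value = float(fields[0])
        has_rate = len(fields) > 2 and fields[2].startswith("@")
        rate = float(fields[2][1:]) if has_rate else 1.0
    except ValueError:
        return None
    return Metric(name, value, fields[1], rate)


print(parse_statsd_line("counter.sample:1|c"))
print(parse_statsd_line("timing.sample:320|ms"))
```

Returning None for anything that does not fit the pattern keeps the sketch tolerant of unrelated text arriving on the same UDP listener.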
Generating the Metrics
To generate the data, I created a simple perl script using the statsd perl module Net::Statsd. I then created a Syslog Source on a Linux Collector over the standard port of 514. The Sumo Logic Syslog Source, essentially a listener for text over UDP, can receive the statsd message just fine. One caveat, though – since the statsd messages do not include a timestamp, Sumo Logic will assign the ingest time as the timestamp. This means that it is essential that you set the timezone setting correctly. I tested this with thousands of events, and there were no issues. To make some interesting, and relevant, metrics I added extra logic to my perl script to create some patterns with the rand() function and some math:
use strict;
use warnings;
use Net::Statsd;

# Configure where to send events.
# Here that is the Sumo Logic Syslog Source, not a statsd daemon.
$Net::Statsd::HOST = 'localhost';
$Net::Statsd::PORT = 514;  # Syslog Source port (statsd daemons conventionally use 8125)

# Initial values
my $basepercent = 0.50;
my $webTime     = 50;
my $appTime     = 100;
my $dbTime      = 150;
my $basecount   = 5;

# Infinite loop: drift the timings randomly and emit them every few seconds
while (1) {
    $basepercent = ($basepercent + (rand(100) + 50) / 100) / 2;
    $webTime = $basepercent * ($webTime + 50 + rand(750)) / 2;
    $appTime = $basepercent * ($appTime + 100 + rand(1000)) / 2;
    $dbTime  = $basepercent * ($dbTime + 150 + rand(1200)) / 2;
    Net::Statsd::timing('web.time', $webTime);
    Net::Statsd::timing('app.time', $appTime);
    Net::Statsd::timing('db.time', $dbTime);

    # Emit a random number of login increments
    my $k = 0;
    $basecount = $basepercent * ($basecount + rand(5)) / 2;
    while ($k < $basecount) {
        Net::Statsd::increment('site.logins');
        $k++;
    }
    sleep(5 + rand(10));
}
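For readers who do not use Perl, the same idea can be sketched in Python with nothing but the standard library: build the clear-text payload and send it as a single UDP datagram. The format_metric and send_metric helpers are hypothetical names for this illustration, and the host and port simply mirror the values used in the script above.

```python
import socket


def format_metric(name: str, value: float, kind: str) -> str:
    """Build the clear-text statsd payload, e.g. 'web.time:120|ms'."""
    return f"{name}:{int(value)}|{kind}"


def send_metric(payload: str, host: str = "localhost", port: int = 514) -> None:
    """Fire-and-forget UDP send; there is no acknowledgement from the listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


if __name__ == "__main__":
    send_metric(format_metric("web.time", 120, "ms"))
    send_metric(format_metric("site.logins", 1, "c"))
```

Because UDP is connectionless, the sender gets no signal that the datagram arrived, so it is worth confirming on the receiving side that events are actually being ingested.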
Making sense of the Metrics
Once the metrics were successfully being ingested into Sumo Logic, I needed to create some useful searches and Dashboard Monitors. With the statsd counter function, I simply wanted to extract the data, drop it into 1m buckets, and sum up the number of increments to the counter over each minute. The key-value structure of a statsd message can be easily parsed with our keyvalue operator. Basically, I just told Sumo Logic to look for a lower case key name with “.” in it [a-z\.]+ and a numerical value \d+. I only searched for “site.logins”, but you could use the statement to look for any number of different counters in the same dashboard.
_sourceCategory=*statsd*
| keyvalue regex "([a-z\.]+?):(\d+?)\|c" "site.logins" as logins
| timeslice by 1m
| sum(logins) by _timeslice
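To show what the search above computes, here is a rough Python equivalent under stated assumptions: events arrive as (ingest time, raw message) pairs, and logins_per_minute is a hypothetical helper, not a Sumo Logic API. It extracts the counter with a similar regular expression, truncates each timestamp to the minute, and sums the increments per bucket.

```python
import re
from collections import defaultdict
from datetime import datetime

# Same shape as the search: a lowercase dotted key, digits, then "|c".
COUNTER = re.compile(r"([a-z.]+):(\d+)\|c")


def logins_per_minute(events, key="site.logins"):
    """Sum counter increments for `key` in one-minute buckets."""
    totals = defaultdict(int)
    for ts, message in events:
        match = COUNTER.search(message)
        if match and match.group(1) == key:
            bucket = ts.replace(second=0, microsecond=0)  # the 1m timeslice
            totals[bucket] += int(match.group(2))
    return dict(sorted(totals.items()))


sample = [
    (datetime(2013, 5, 1, 12, 0, 5), "site.logins:1|c"),
    (datetime(2013, 5, 1, 12, 0, 40), "site.logins:1|c"),
    (datetime(2013, 5, 1, 12, 1, 10), "site.logins:1|c"),
    (datetime(2013, 5, 1, 12, 1, 15), "web.time:320|ms"),
]
for bucket, total in logins_per_minute(sample).items():
    print(bucket, total)
```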
With the timing metrics, an average over each minute seems most relevant (though other functions like max, min, or standard deviation could be useful here). I pulled out all three timings together by looking for a key that looks like *.time, using the pattern (?<tier>[a-z]+).time. Since I named my metrics web.time, app.time, and db.time, I was able to put each of the “tier” metrics on the same graph.
_sourceCategory=*statsd* AND time
| parse regex "(?<tier>[a-z]+).time:(?<test_time>\d+)\|ms"
| timeslice by 1m
| avg(test_time) by _timeslice, tier
| transpose row _timeslice column tier
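The shape of that result, one row per minute with one column per tier, can be sketched in Python as follows. This is an illustration only: avg_time_by_tier is a hypothetical helper, the input is assumed to be (ingest time, raw message) pairs, and the regular expression is a Python-flavoured version of the one in the search.

```python
import re
from collections import defaultdict
from datetime import datetime

TIMING = re.compile(r"(?P<tier>[a-z]+)\.time:(?P<test_time>\d+)\|ms")


def avg_time_by_tier(events):
    """Return {minute: {tier: average_ms}} from (ingest_time, raw_message) pairs."""
    samples = defaultdict(lambda: defaultdict(list))
    for ts, message in events:
        match = TIMING.search(message)
        if match:
            bucket = ts.replace(second=0, microsecond=0)
            samples[bucket][match["tier"]].append(int(match["test_time"]))
    # Each minute becomes a row and each tier a column, like the transpose step.
    return {
        bucket: {tier: sum(vals) / len(vals) for tier, vals in tiers.items()}
        for bucket, tiers in sorted(samples.items())
    }


sample = [
    (datetime(2013, 5, 1, 12, 0, 5), "web.time:100|ms"),
    (datetime(2013, 5, 1, 12, 0, 35), "web.time:300|ms"),
    (datetime(2013, 5, 1, 12, 0, 36), "db.time:450|ms"),
    (datetime(2013, 5, 1, 12, 1, 2), "app.time:220|ms"),
]
for bucket, row in avg_time_by_tier(sample).items():
    print(bucket, row)
```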
As I ran each of these searches, I clicked the “Add to Dashboard” button on the far right to add them to a newly created StatsD dashboard. I included a screenshot below (the tier metrics are on the left, and the counter is on the right):
Wrapping Up
You can see from this example how easy it is to analyze data in the statsd format. Once the data is in Sumo Logic, the sky is the limit to what you can do with it. There are other metrics and backend functions that Sumo Logic can support over the long term, but this simple integration provides the majority of functionality needed. Let us know what you think, and sign up for a free account to try it out yourself.
Sending CloudPassage Halo Event Logs to Sumo Logic
04.23.2013 | Posted by CloudPassage: Cloud Security
The below is a guest post from CloudPassage.
Automating your server security is about more than just one great tool – it’s also about linking together multiple tools to empower you with the information you need to make decisions. For customers of CloudPassage and Sumo Logic, linking those tools to secure cloud servers is as easy as it is powerful.
The CloudPassage Halo Event Connector enables you to view security event logs from CloudPassage Halo in your Sumo Logic dashboard, including alerts from your configuration, file integrity, and software vulnerability scans. Through this connector, Halo delivers unprecedented visibility of your cloud servers via your log management console. You can track server events such as your server rebooting, shutting down, changing IP addresses, and much more.
The purpose of the Halo Event Connector is to retrieve event data from a CloudPassage Halo account and import it into Sumo Logic for indexing or processing. It is designed to execute repeatedly, keeping the Sumo Collector up-to-date with Halo events as time passes and new events occur.
The Halo Event Connector is free to use, and will work with any Halo subscription. To get started integrating Halo events into Sumo Logic, make sure you have set up accounts for CloudPassage Halo and Sumo Logic.
Then, generate an API key in your CloudPassage Halo portal. Once you have an API key, follow the steps provided in the Halo – Sumo Logic documentation, using the scripts provided on GitHub. The documentation walks you through the process of testing the Halo Event Connector script.
Once you have tested the script, you will then add the output as a “Source” by selecting “Script” in Sumo Logic (see below).
When you have finished adding the new data source that integrates the Halo Event Connector with Sumo Logic (as detailed in the .pdf documentation), you will be taken back to the “Collectors” tab where the newly added Script source will be listed.
Once the Connector runs successfully and is importing event data into Sumo Logic, you will see Halo events such as the following appear in your Sumo Logic searches:
Try it out today – we are eager to hear your feedback! We hope that integrating these two tools makes your server security automation even more powerful.
Universal Collection of Machine Data
04.18.2013 | Posted by Sanjay Sarathy, CMO
Customers love flexibility, especially if that flexibility drives additional business value. In that vein, today we announced an expansion of our log data collection capabilities with our hosted HTTPS and Amazon S3 collectors that eliminate the need for any local software installation. There may be a variety of reasons why you don’t want or can’t have local collectors - for example, not having access to the underlying infrastructure, as often happens with Infrastructure-As-A-Service (IaaS) environments. Or you simply don’t feel like deploying any local software into your current infrastructure. Defining these hosted collectors is now baked into the set-up process, whether you’re using Sumo Logic Free or our Enterprise product.
With these new capabilities, companies can now unify how they collect and analyze log data generated from private clouds, public clouds, and their on-premise infrastructure. They can then apply our unique analytics capabilities like LogReduce to generate insight across every relevant application and operational tier.
With companies increasingly moving towards the Cloud to power different parts of their business, it’s imperative that they have the necessary means to troubleshoot and monitor their diverse infrastructure. Sumo Logic provides that flexibility.
Dirty Haskell Phrasebook
04.05.2013 | Posted by Máté Kovács, Sumo Logic Intern
Whenever people ask me whether Hungarian is difficult to learn, I half-jokingly say that it can’t be too hard given that I had learned it by the time I turned three. Having said that, I must admit that learning a new language as a grown-up is a whole new ball game. Our struggle for efficiency is reflected in the way we learn languages: we focus on the most common patterns, and reuse what we know as often as possible.
Programming languages are no different. When I started at Sumo Logic just two months ago, I wanted to become fluent in Scala as quickly as possible. I have a soft spot for functional languages such as Haskell, so a main factor in deciding to do an internship here was that we use Scala. I soon realized that a large subset of Haskell can easily be translated into Scala, which has made the learning process a lot smoother so far.
You’ve probably guessed by now that this post is going to be a Scala phrasebook for Haskellers. I’m also hoping that it will give new insights to seasoned Scalaists, and spark the interest of programmers who are new to the functional paradigm. Here we go.
Basics
Haskell:
    module Hello where
    main :: IO ()
    main = do
      putStrLn "Hello, World!"
Scala:
    object Hello {
      def main(args: Array[String]): Unit = println("Hello, World!")
    }
While I believe that HelloWorld examples aren’t really useful, there are a few key points to make here.
The object keyword creates a singleton object with the given name and properties. Pretty much everything in Scala is an object, and has its place in the elaborate type hierarchy stemming from the root-type called Any. In other words, a set of types always has a common ancestor, which isn’t the case in Haskell. One consequence of this is that Scala’s ways of emulating heterogeneous collections are more coherent. For example, Haskell needs fairly involved machinery such as existential types to describe a list-type that can simultaneously hold elements of all types, which is simply Scala’s List[Any].
In Scala, every function (and value) needs an enclosing object or class. (In other words, every function is a method of some object.) Since object-orientation concepts don’t have direct analogues in Haskell, further examples will implicitly assume an enclosing object on the Scala side.
Haskell’s () type is Scala’s Unit, and its only value is called () just like in Haskell. Scala has no notion of purity, so functions might have side-effects without any warning signs. One particular case is easy to spot though: the sole purpose of a function with return type Unit is to exert side effects.
Values
Haskell:
    answer :: Int
    answer = 42
Scala: lazy val answer: Int = 42
Evaluation in Haskell is non-strict by default, whereas Scala is strict. To get the equivalent of Haskell’s behavior in Scala, we need to use lazy values (see also lazy collections). In most cases however, this makes no difference. From now on, the lazy keyword will be dropped for clarity. Besides val, Scala also has var which is mutable, akin to IORef and STRef in Haskell.
Okay, let’s see values of some other types.
Haskell:
    question :: [Char]
    question = "What's six by nine?"
Scala: val question: String = "What's six by nine?"
Can you guess what the type of the following value is?
Haskell: judgement = (6*9 /= 42)
Scala: val judgement = (6*9 != 42)
Well, so can Haskell and Scala. Type inference makes it possible to omit type annotations. There are a few corner cases that get this mechanism confused, but a few well-placed type annotations will usually sort those out.
Data Structures
Lists and tuples are arguably the most ubiquitous data structures in Haskell.
In contrast with Haskell’s syntactic sugar for list literals, Scala’s notation seems fairly trivial, but in fact involves quite a bit of magic under the hood.
Haskell:
    list :: [Int]
    list = [3, 5, 7]
Scala: val list: List[Int] = List(3, 5, 7)
Lists can also be constructed from a head-element and a tail-list.
Haskell: smallPrimes = 2 : list
Scala: val smallPrimes = 2 :: list
As you can see, : and :: basically switched roles in the two languages. This list-builder operator, usually called cons, will come in handy when we want to pattern match on lists (see Control Structures and Scoping below for pattern matching).
Common accessors and operations have the same name, but they are methods of the List class in Scala.
Haskell: head list
Scala: list.head
Haskell: tail list
Scala: list.tail
Haskell: map func list
Scala: list.map(func)
Haskell: zip list_1 list_2
Scala: list_1.zip(list_2)
If you need to rely on the non-strict evaluation semantics of Haskell lists, use Stream in Scala.
Tuples are virtually identical in the two languages.
Haskell:
    tuple :: ([Char], Int)
    tuple = (question, answer)
Scala: val tuple: (String, Int) = (question, answer)
Again, there are minor differences in Scala’s accessor syntax due to object-orientation.
Haskell: fst tuple
Scala: tuple._1
Haskell: snd tuple
Scala: tuple._2
Another widely-used parametric data type is Maybe, which can represent values that might be absent. Its equivalent is Option in Scala.
Haskell:
    singer :: Maybe [Char]
    singer = Just "Carly Rae Jepsen"
Scala: val singer: Option[String] = Some("Carly Rae Jepsen")
Haskell:
    song :: Maybe [Char]
    song = Nothing
Scala: val song: Option[String] = None
Algebraic data types translate to case classes.
Haskell:
    data Tree = Leaf | Branch [Tree] deriving (Eq, Show)
Scala:
    sealed abstract class Tree
    case class Leaf() extends Tree
    case class Branch(kids: List[Tree]) extends Tree
Just like their counterparts, case classes can be used in pattern matching (see Control Structures and Scoping below), and there’s no need for the new keyword at instantiation. We also get structural equality check and conversion to string for free, in the form of the equals and toString methods, respectively.
The sealed keyword prevents anything outside this source file from subclassing Tree, just to make sure exhaustive pattern lists don’t become undone.
See also extractor objects for a generalization of case classes.
Functions
Haskell:
    increment :: Int -> Int
    increment x = x + 1
Scala: def increment(x: Int): Int = x + 1
If you’re coming from a Haskell background, you’re probably not surprised that the function body is a single expression. For a way to create more complex functions, see let-expressions in Control Structures and Scoping below.
Haskell: three = increment 2
Scala: val three = increment(2)
Most of the expressive power of functional languages stems from the fact that functions are values themselves, which leads to increased flexibility in reusing algorithms.
Composition is probably the simplest form of combining functions.
Haskell: incrementTwice = increment . increment
Scala: val incrementTwice = (increment: Int => Int).compose(increment)
Currying, Partial Application, and Function Literals
Leveraging the idea that functions are values, Haskell chooses to have only unary functions and emulate higher arities by returning functions, in a technique called currying. If you think that isn’t a serious name, you’re welcome to call it schönfinkeling instead.
Here’s how to write curried functions.
Haskell:
    addCurry :: Int -> Int -> Int
    addCurry x y = x + y
Scala: def addCurry(x: Int)(y: Int): Int = x + y
Haskell: five = addCurry 2 3
Scala: val five = addCurry(2)(3)
The rationale behind currying is that it makes certain cases of partial application very succinct.
Haskell:
    addSix :: Int -> Int
    addSix = addCurry 6
Scala (three equivalent alternatives):
    val addSix: Int => Int = addCurry(6)
    val addSix = addCurry(6) : (Int => Int)
    val addSix = addCurry(6)(_)
The type annotation is needed to let Scala know that you didn’t forget an argument but really meant partial application. If you want to drop the type annotation, use the underscore placeholder syntax.
To contrast with curried ones, functions that take many arguments at once are said to be uncurried. Scalaists seem to prefer their functions less spicy by default, most likely to save parentheses.
Haskell:
    addUncurry :: (Int, Int) -> Int
    addUncurry (x, y) = x + y
Scala: def addUncurry(x: Int, y: Int): Int = x + y
Haskell: seven = addUncurry (2, 5)
Scala: val seven = addUncurry(2, 5)
Uncurried functions can still be partially applied with ease in Scala, thanks to underscore placeholder notation.
Haskell:
    addALot :: Int -> Int
    addALot = \x -> addUncurry (x, 42)
Scala (two equivalent alternatives):
    val addALot: Int => Int = addUncurry(_, 42)
    val addALot = addUncurry(_: Int, 42)
When functions are values, it makes sense to have function literals, a.k.a. anonymous functions.
Haskell: (brackets :: Int -> [Char]) = \x -> "<" ++ show x ++ ">"
Scala: val brackets: Int => String = x => "<%s>".format(x)
Haskell: brackets = \(x :: Int) -> "<" ++ show x ++ ">"
Scala: val brackets = (x: Int) => "<%s>".format(x)
Infix Notation
In Haskell, any function whose name contains only certain operator characters will take its first argument from the left side when applied, which is infix notation if it has two arguments. Alphanumeric function names surrounded by backticks also behave that way. In Scala, any single-argument function can be used as an infix operator by omitting the dot and parentheses from the function call syntax.
Haskell:
    data C = C [Char]
    bowtie (C s) t = s ++ " " ++ t
    (|><|) = bowtie
Scala:
    case class C(s: String) {
      def bowtie(t: String): String = s + " " + t
      val |><| = bowtie(_)
    }
Haskell: (C "James") |><| "Bond"
Scala: C("James") |><| "Bond"
Haskell: (C "James") `bowtie` "Bond"
Scala: C("James") bowtie "Bond"
Haskell’s sections provide a way to create function literals from partially applied infix operators. They can then be translated to Scala using placeholder notation.
Haskell: tenTimes = (10*)
Scala: val tenTimes = 10 * (_: Int)
Again, the type annotation is necessary so that Scala knows you meant what you wrote.
Higher-order Functions and Comprehensions
Higher order functions are functions that have arguments which are functions themselves. Along with function literals, they can be used to express complex ideas in a very compact manner. One example is operations on lists (and other collections in Scala).
Haskell: map (3*) (filter (<5) list)
Scala: list.filter(_ < 5).map(3 * _)
That particular combination of map and filter can also be written as a list comprehension.
Haskell: [3 * x | x <- list, x < 5]
Scala: for(x <- list if x < 5) yield (3 * x)
Control Structures and Scoping
Pattern matching is a form of control transfer in functional languages.
Haskell:
    countNodes :: Tree -> Int
    countNodes t = case t of
      Leaf -> 1
      (Branch kids) -> 1 + sum (map countNodes kids)
Scala:
    def countNodes(t: Tree): Int = t match {
      case Leaf() => 1
      case Branch(kids) => 1 + kids.map(countNodes).sum
    }
For a definition of Tree, see the Data Structures section above.
Even though they could be written as pattern matching, if-expressions are also supported for increased readability.
Haskell: if condition then expr_0 else expr_1
Scala: if (condition) expr_0 else expr_1
Let expressions are indispensable in organizing complex expressions.
Haskell:
    result =
      let v_0 = bind_0
          v_1 = bind_1
          -- ...
          v_n = bind_n
      in expr
Scala:
    val result = {
      val v_0 = bind_0
      val v_1 = bind_1
      // ...
      val v_n = bind_n
      expr
    }
A code block evaluates to its final expression if the control flow reaches that point. Curly brackets are mandatory; Scala isn’t indentation-sensitive.
Parametric Polymorphism
I’ve been using parametric types all over the place, so it’s time I said a few words about them. It’s safe to think of them as type-level functions that take types as arguments and return types. They are evaluated at compile time.
Haskell: [a]
Scala: List[A]
Haskell: (a, b)
Scala: (A, B) // desugars to Tuple2[A, B]
Haskell: Maybe a
Scala: Option[A]
Haskell: a -> b
Scala: A => B // desugars to Function1[A, B]
Haskell: a -> b -> c
Scala: A => B => C // desugars to Function1[A, Function1[B, C]]
Type variables in Haskell are required to be lowercase, whereas they’re usually uppercase in Scala, but this is only a convention.
In this context, Haskell’s type classes loosely correspond to Scala’s traits, but that’s a topic for another time. Stay tuned.
Comments
Haskell: -- single-line comment
Scala: // single-line comment
Haskell:
    {-
    Feel free to suggest additions and corrections to the phrasebook in the comments section below. :]
    -}
Scala:
    /*
    Feel free to suggest additions and corrections to the phrasebook in the comments section below. :]
    */
Here Be Dragons
Please keep in mind that this phrasebook is no substitute for the real thing; you will be able to write Scala code, but you won’t be able to read everything. Relying on it too much will inevitably yield some unexpected results. Don’t be afraid of being wrong and standing corrected, though. As far as we know, the only path to a truly deep understanding is the way children learn: by poking around, breaking things, and having fun.
Harder, Better, Faster, Stronger – Machine Data Analytics and DevOps
03.28.2013 | Posted by Ben Newton, Corporate Sales Engineering Manager
Work It Harder, Make It Better
Do It Faster, Makes Us Stronger
More Than Ever Hour After
Our Work Is Never Over
Daft Punk – “Harder, Better, Faster, Stronger”
When trying to explain the essence of DevOps to colleagues last week, I found myself unwittingly quoting the kings of electronica, the French duo Daft Punk (and Kanye West, who sampled the song in “Stronger”). So often, I find the “spirit” of DevOps being reduced to mere automation, the takeover of Ops by Dev (or vice versa), or other over-simplifications. This is natural for any new, potentially over-hyped, trend. But how do we capture the DevOps “essence” – programmable architecture, agile development, and lean methodology – in a few words? It seems like the short lyrics really sum up the essence of the flexible, agile, constantly improving ideal of a DevOps “team”, and the continuous improvement aspects of lean and agile methodology.
So, what does this have to do with machine data analytics and Sumo Logic? Part of the DevOps revolution is a deep and wrenching re-evaluation of the state of IT Operations tools. As the pace of technological change and ferocity of competition keep increasing for any company daring to make money on the Internet (which is almost everybody at this point), IT departments are facing a difficult problem. Do they try to adapt the process-heavy, top-down approaches exemplified by ITIL, or do they embrace the state of constant change that is DevOps? In the DevOps model, the explosion of creativity that comes with unleashing your development and operations teams to innovate quickly overwhelms traditional, static tools. More fundamentally, the continuous improvement model of agile development and DevOps is only as good as the metrics used to measure success. So, the most successful DevOps teams are incredibly data hungry. And this is where machine data analytics, and Sumo Logic in particular, really comes into its own, and is fundamentally in tune with the DevOps approach.
1. Let the data speak for itself
Unlike the management tools of the past, Sumo Logic makes only basic assumptions about the data being consumed (time stamped, text-based, etc.). The important patterns are determined by the data itself, and not by pre-judging what patterns are relevant, and which are not. This means that as the application rapidly changes, Sumo Logic can detect new patterns – both good and ill – that would escape the inflexible tools of the past.
2. Continuous reinterpretation
Sumo Logic never tries to force the machine data into tired old buckets that are forever out of date. The data is stored raw so that it can continually be reinterpreted and re-parsed to reveal new meaning. Fast moving DevOps teams can’t wait for the stodgy software vendor to change their code or send their consultant onsite. They need it now.
3. Any metric you want, any time you want it
The power of the new DevOps approach to management is that the people that know the app the best, the developers, are producing the metrics needed to keep the app humming. This seems obvious in retrospect, yet very few performance management vendors support this kind of flexibility. It is much easier for developers to throw more data at Sumo Logic by outputting more data to the logs than to integrate with management tools. The extra insight that this detailed, highly specific data can provide into your customers’ experience and the operation of your applications is truly groundbreaking.
4. Set the data free
Free-flow of data is the new norm, and mash-ups provide the most useful metrics. Specifically, pulling business data from outside of the machine data context allows you to put it in the proper perspective. We do this extensively at Sumo Logic with our own APIs, and it allows us to view our customers as more than nameless organization ID numbers. DevOps is driven by the need to keep customers happy.
5. Develop DevOps applications, not DevOps tools
The IT Software industry has fundamentally failed its customers. In general, IT software is badly written, buggy, hard to use, costly to maintain, and inflexible. Is it any wonder that the top DevOps shops overwhelmingly use open source tools and write much of the logic themselves?! Sumo Logic allows DevOps teams the flexibility and access to get the data they need when they need it, without forcing them into a paradigm that has no relevance for them. And why should DevOps teams even be managing the tools they use? It is no longer acceptable to spend months with vendor consultants, and then maintain extra staff and hardware to run a tool. DevOps teams should be able to do what they are good at – developing, releasing, and operating their apps, while the vendors should take the burden of tool management off their shoulders.
The IT industry is changing fast, and DevOps teams need tools that can keep up with the pace – and make their job easier, not more difficult. Sumo Logic is excited to be in the forefront of that trend. Sign up for Sumo Logic Free and prove it out for yourself.
Finding Needles in the Machine Data Haystack – LogReduce in the Wild
03.19.2013 | Posted by Ben Newton, Corporate Sales Engineering Manager
As with any new, innovative feature in a product, it is one thing to say it is helpful for customers – it is quite another to see it in action in the wild. Case in point, I had a great discussion with a customer about using LogReduce™ in their environment. LogReduce is a groundbreaking tool for uncovering the unknown in machine data, and sifting through the inevitable noise in the sea of log data our customers put in Sumo Logic. The customer in question had some great use cases for LogReduce that I would like to share.
Daily Summaries
With massive amounts of log data flowing through modern data centers, it is very difficult to get a bird’s eye view of what is happening. More importantly, the kind of summary that provides actionable data about the day’s events is elusive at best. In our customer example, they have been using LogReduce to provide exactly that type of daily, high-level overview of the previous day’s log data. How does it work? Instead of using obvious characteristics to group log data like the source (e.g. Window’s Events) or host (e.g. server01 in data center A), LogReduce uses “fuzzy logic” to look for patterns across all of your machine data at once – letting the data itself dictate the summary. Log data with the same patterns, or signatures, are grouped together – meaning that new patterns in the data will immediately stand out, and the noise will be condensed to a manageable level.
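The idea of grouping log lines by signature can be illustrated with a deliberately crude Python sketch. This is not LogReduce’s algorithm, which is not described here; the masking rules and the signature and summarize helpers are assumptions chosen only to show how lines that differ in their variable parts collapse into one pattern.

```python
import re
from collections import Counter

# Crude masking rules: an assumption for illustration, not LogReduce's method.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\d+"), "<num>"),
]


def signature(line: str) -> str:
    """Replace variable-looking tokens so similar lines collapse to one pattern."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line


def summarize(lines):
    """Count lines per signature, most frequent first."""
    return Counter(signature(line) for line in lines).most_common()


logs = [
    "Connection from 10.0.0.7 closed after 120 ms",
    "Connection from 10.0.0.9 closed after 87 ms",
    "Disk /dev/sda1 at 91 percent",
    "Connection from 10.0.1.3 closed after 15 ms",
]
for sig, count in summarize(logs):
    print(count, sig)
```

Even this toy version shows the effect described above: three connection messages shrink to a single line with a count, so the one disk message is easy to spot.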
Our customer is also able to supply context to the LogReduce results – adjusting and extending signatures, and adjusting relevance as necessary. In particular, by adjusting the signatures that LogReduce finds, the customer is able to “teach” LogReduce to provide the best results in the most relevant way. This allows them to separate the critical errors out, while still acknowledging the background noise of known messages. The end result is a daily summary that is more relevant, because of the user-supplied business context, and still flexible enough to find important new patterns.
Discovering the Unknown
And finding those new patterns is the essence of Big Data analytics. A machine-data analytics tool should be able to find unknown patterns, not simply reinforce the well-known ones. In this use case, our customer already has alerting established for known, critical errors. The LogReduce summary provides a way to identify, and proactively address, new, unknown errors. In particular, by using LogReduce’s baseline and compare functionality, Sumo Logic customers can establish a known state for log data and then easily identify anomalies by comparing the current state to the known, baselined state.
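The baseline-and-compare idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's implementation: it assumes both states are available as signature counts, and the 3x spike threshold, the function name, and the sample signatures are all hypothetical.

```python
from collections import Counter

def compare_to_baseline(baseline: Counter, current: Counter, ratio: float = 3.0):
    """Return (new, spiking) signatures relative to a known-good baseline.

    new:     signatures never seen in the baseline
    spiking: known signatures whose count grew by at least `ratio` (assumed threshold)
    """
    new = sorted(s for s in current if s not in baseline)
    spiking = sorted(
        s for s in current
        if s in baseline and current[s] >= ratio * baseline[s]
    )
    return new, spiking

# Hypothetical signature counts, for illustration only.
baseline = Counter({"login ok for <user>": 100, "timeout calling <svc>": 2})
current = Counter({
    "login ok for <user>": 110,
    "timeout calling <svc>": 9,
    "out of memory in <proc>": 1,
})

new, spiking = compare_to_baseline(baseline, current)
print("new:", new)
print("spiking:", spiking)
```

The routine login message stays quiet, while the never-before-seen out-of-memory signature and the jump in timeouts are the two items surfaced for review.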
In summary, LogReduce provides the essence of Big Machine Data analytics to our customers – reducing the constant noise of today’s datacenter, while finding those needles in the proverbial haystack. This is good news for customers who want to leverage the true value of their machine data without the huge investments in the time and expertise required in the past.
Show Me the VPN Logs!!!
03.07.2013 | Posted by Praveen Rangnath, Product Marketing
Show Me the Money!!! Show Me the VPN Logs!!!
Move over Tesla automobile logs, it’s time for Yahoo VPN logs to get their moment in the sun!
No sooner had log data dropped out of the headlines than it came right back, as Yahoo CEO Marissa Mayer announced a ban on telecommuting – with the decision reportedly driven by analysis of the company’s VPN log data.
From the VPN data, it’s said that the Yahoo CEO determined too many remote workers were not pulling their weight, as evidenced by their lack of connecting to the VPN and accessing Yahoo’s IT systems. Certainly, VPN logs don’t tell the entire story around telecommuter productivity, but they are an important data point, and the information contained in those logs certainly was compelling for Ms. Mayer.
There is of course a bigger picture to this, and it starts with the fact that this is not the first time VPN logs have been in the news. (Not even the first time this year!) See this blog post from the Verizon RISK team, where they helped their client identify a developer who took global wage arbitrage to an extreme; he collected his six-figure paycheck in the USA and then outsourced his own job to a Chinese consulting firm, paying that firm a fraction of his salary to do his job for him!
How did he do this? Simple: He FedEx’d his RSA token to China. How did he get caught? Simple: They found him sitting in his office while the VPN logs showed him in China.
Busted.
All thanks to the logs.
At the highest level, what do the Tesla, Yahoo, and wage arbitrage stories tell us? Simply put, log data is immensely valuable, it’s increasingly becoming front and center, and it’s not going away anytime soon.
We at Sumo Logic couldn’t be happier, as this is further public recognition of the value hidden in machine data (the biggest component of which is log data). We’ve said it many times: log data holds the absolute and authoritative record of all the events that occurred. That’s true for automobile logs, server logs, application logs, device logs, and yes Mr. Developer who outsourced his job to China… VPN logs.
The Marriage of Machine Data and Customer Service
03.06.2013 | Posted by Sanjay Sarathy, CMO
Last week we announced how Atchik uses Sumo Logic and our ability to easily analyze machine data to reshape its customer service function. In fact, there are a variety of ways in which customer service organizations can become best friends with your log management infrastructure to improve your customers’ perception of your product or service. Specifically, companies can use a log management service to:
- Pinpoint exactly what the customer did during the course of a transaction or interaction with an application or service, as opposed to relying purely on email threads or phone logs. This root cause analysis can help in understanding bottlenecks that the customer complained about and, just as importantly, provide guidance to the development team on how customers are using the product or service. Actually it’s a great reason for the app development teams to use the service as well, but that’s the subject of another post.
- Easily correlate that application activity with the impact on other infrastructure elements that affect the consumer experience. Unfortunately, many companies today only focus on a single application view of the customer experience when, given how integrated applications and services are today, it’s critical to get a full picture of all the different ways in which the customer is affected.
- Proactively address potential customer-facing issues *before* they hit by receiving real-time alerts when the log management solution detects application anomalies.
- Create customer dashboards and reports that provide real-time insights into the customer activity you care most about tracking.
We use Sumo Logic internally to support every function in the organization from application development to QA to customer service and even marketing. Our co-founder and VP of Engineering, Kumar Saurabh, is hosting a webinar on March 26th to talk about “Sumo and Sumo”. We invite you to attend.
Using the transpose operator
02.19.2013 | Posted by Yan Qiao, Software Engineer
Sumo Logic lets you access your logs through a powerful query language. In addition to searching for individual log messages, you may extract, transform, filter and aggregate data from them using a sequence of operators. There are currently about two dozen operators available and we are constantly adding new ones. In this post I want to introduce you to a recent addition to the toolbox, the transpose operator.
Let’s say you work for an online brokerage firm, and your trading server logs lines that look like the following, among other things:
2013-02-14 01:41:36 10.20.11.102 GET /Trade/StockTrade.aspx action=buy&symbol=s:131 80 Cole 219.142.249.227 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_7_3)+AppleWebKit/536.5+(KHTML,+like+Gecko)+Chrome/19.0.1084.54+Safari/536.5 200 0 0 449
There is a wealth of information in this log line, but to keep it simple, let’s focus on the last number, in this case 449, which is the server response time in milliseconds. We are interested in finding out the distribution of this number so as to know how quickly individual trades are processed. One way to do that is to build a histogram of the response time using the following query:
stocktrade | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by response_time
Here we start with a search for “stocktrade” to get only the lines we are interested in, extract the response time using a regular expression, round it up to the next multiple of 100 milliseconds, and count the occurrences of each rounded value. The result looks like:
Now, it would also be interesting to see how the distribution changes over time. That is easy with the timeslice operator:
stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time
and the result looks like the following:
This gets the data we want, but it is not presented in a format that is easy to digest. For example, in the table above, the first five rows give us the distribution of response time at 8:00, the next five rows at 8:01, etc. Wouldn’t it be nice if we could rearrange the data into the following table?
That is exactly what transpose does:
stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time | transpose row _timeslice column response_time
Here we tell the query engine to rearrange the table using time slice values as row labels, and response time as column labels.
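For readers who think more easily in code than in tables, the same pipeline can be approximated in plain Python. This is only a sketch of the logic, not how the query engine works internally: the helper names are made up, the sample lines are shortened, hypothetical versions of the trading log, and filling empty cells with 0 is a choice made here for readability.

```python
import math
import re
from collections import Counter
from datetime import datetime

def bucket(response_ms: int, width: int = 100) -> int:
    """Round up to the next multiple of `width`, like toInt(ceil(x/100) * 100)."""
    return math.ceil(response_ms / width) * width

def minute_slice(line: str) -> str:
    """One-minute time slice, taken from the timestamp that starts each line."""
    ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    return ts.strftime("%H:%M")

def count_by_slice_and_bucket(lines):
    """Equivalent of: count by _timeslice, response_time."""
    counts = Counter()
    for line in lines:
        match = re.search(r"(\d+)$", line)  # the last number is the response time
        if match:
            counts[(minute_slice(line), bucket(int(match.group(1))))] += 1
    return counts

def transpose(counts):
    """Pivot (slice, bucket) -> count into one row per slice, one column per bucket."""
    columns = sorted({b for _, b in counts})
    rows = {}
    for (ts, b), n in counts.items():
        # Assumption: missing cells are shown as 0.
        rows.setdefault(ts, dict.fromkeys(columns, 0))[b] = n
    return columns, rows

# Hypothetical, shortened log lines, for illustration only.
lines = [
    "2013-02-14 08:00:05 GET /Trade/StockTrade.aspx 200 0 0 449",
    "2013-02-14 08:00:41 GET /Trade/StockTrade.aspx 200 0 0 120",
    "2013-02-14 08:00:59 GET /Trade/StockTrade.aspx 200 0 0 480",
    "2013-02-14 08:01:12 GET /Trade/StockTrade.aspx 200 0 0 200",
]

columns, rows = transpose(count_by_slice_and_bucket(lines))
print("time ", columns)
for ts in sorted(rows):
    print(ts, [rows[ts][c] for c in columns])
```

Note that a 449 ms response lands in the 500 bucket, because the query rounds up rather than to the nearest hundred. The long (time slice, bucket, count) list becomes one row per minute, which is the shape that a stacked chart needs.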
This is especially useful when the data is visualized. The “stacking” option allows you to draw bar charts with values from different columns stacked onto each other, as shown below:
The length of each bar represents the number of trading requests per minute, and the colored segments represent the distribution of response times.
That’s it! To find out other interesting ways to analyze your log data, sign up for Sumo Logic Free and try for yourself!





