04.29.2014 | Posted by Sanjay Sarathy, CMO
Today’s reality is that companies have to deal with disjointed systems when it comes to detecting, investigating and remediating issues in their infrastructure. Compound that with the exponential growth of machine data and you have a recipe for frustrated IT and security teams who are tasked with uncovering insights from this data exhaust and then remediating issues as appropriate. Customer dissatisfaction, at-risk SLAs and even revenue misses are invariable consequences of this fragmented approach.
With our announcement today of a certified integration with ServiceNow, organizations now have a closed-loop system that makes it much easier to uncover known and unknown events in Sumo Logic and then immediately create alerts and incidents in ServiceNow. The bi-directional integration lets companies streamline the entire change management process, capture current and future knowledge, and lay the groundwork for integrated event management capabilities. This integration takes advantage of all the Sumo Logic analytics capabilities, including LogReduce and Anomaly Detection, to identify what's happening in your enterprise, even if you never had rules to detect issues in the first place.
The cloud-to-cloud integration of ServiceNow and Sumo Logic also boosts productivity by eliminating the need to download, install, and manage software. IT organizations can also elastically scale their data analytics to meet the service management requirements of the modern enterprise.
Let us know if you're interested in seeing our integration with ServiceNow. And while you're at it, feel free to register for Sumo Logic Free. It's a zero-cost way to understand how our machine data analytics service works.
PS – check out our new web page which provides highlights of recent capabilities and features that we’ve launched.
02.23.2014 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy
Security is a tricky thing and it means different things to different people. It is truly in the eye of the beholder. There is the checkbox kind, there is the “real” kind, there is the checkbox kind that holds up, and there is the “real” kind that is circumvented, and so on. Don’t kid yourself: the “absolute” kind does not exist.
I want to talk about security solutions based on log data. This is the kind of security that kicks in after the perimeter security (firewalls), intrusion detection (IDS/IPS), vulnerability scanners, and dozens of other security technologies have done their thing. It ties all of these technologies together, correlates their events, reduces false positives and enables forensic investigation. Sometimes this technology is called Log Management and/or Security Information and Event Management (SIEM). I used to build these technologies years ago, but it seems like decades ago.
A typical SIEM product is a hulking appliance, sharp edges, screaming colors – the kind of design that instills confidence and says "Don't come close, I WILL SHRED YOU! GRRRRRRRRRR".
Ahhhh, SIEM makes you feel safe, doesn't it? It should not. I proclaim this at the risk of being yet another one of those guys who wants to rag on SIEM, but I built one, and beat many, so I feel I've got some ragging rights. So, what's wrong with SIEM? Where does it fall apart?
SIEM does not scale
It is hard enough to capture a terabyte of daily logs (40,000 Events Per Second, 3 Billion Events per Day) and store them. It is a couple of orders of magnitude harder to run correlation in real time and alert when something bad happens. SIEM tools are extraordinarily difficult to run at scales above 100GB of data per day. This is because they are designed to scale by adding more CPU, memory, and fast spindles to the same box. The exponential growth of data over the two decades since those SIEM tools were designed has outpaced the ability to add CPU, memory, and fast spindles into the box.
Result: Data growth outpaces capacity → Data dropped from collection → Significant data dropped from correlation → Gap in analysis → Serious gap in security
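As a quick sanity check on the numbers above, here is a back-of-the-envelope sketch; the ~330-byte average event size is our assumption for illustration, not a published figure:

```python
# Rough arithmetic behind "1 TB/day = 40,000 EPS, 3 billion events/day".
# Assumption (ours, for illustration): an average log event is ~330 bytes.
events_per_day = 3_000_000_000
seconds_per_day = 24 * 60 * 60            # 86,400
eps = events_per_day / seconds_per_day    # ~34,700 events/sec, roughly 40,000 EPS
daily_tb = events_per_day * 330 / 1e12    # ~0.99 TB/day, roughly 1 TB
print(f"{eps:,.0f} EPS, {daily_tb:.2f} TB/day")
```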
SIEM normalization can’t keep pace
SIEM tools depend on normalization (shoehorning) of all data into one common schema so that you can write queries across all events. That worked fifteen years ago when sources were few. These days sources and infrastructure types are expanding like never before. One enterprise might have multiple vendors and versions of network gear, many versions of operating systems, open source technologies, workloads running in infrastructure as a service (IaaS), and many custom written applications. Writing normalizers to keep pace with changing log formats is not possible.
Result: Too many data types and versions → Falling behind on adding new sources → Reduced source support → Gaps in analysis → Serious gaps in security
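To make the normalization problem concrete, here is a toy sketch (our own illustration, with hypothetical log formats, not any vendor's actual code) of what a normalizer has to do for just two firewall formats. Every new vendor, version, or field means another one of these, and a format change in a point release silently breaks it:

```python
import re

# Toy normalizers mapping two hypothetical firewall formats into one schema.
def normalize_vendor_a(line):
    # e.g. "DENY tcp 10.1.1.5:443 -> 10.2.2.9:51123"
    m = re.match(r"(ALLOW|DENY) (\w+) ([\d.]+):(\d+) -> ([\d.]+):(\d+)", line)
    if not m:
        return None  # format changed in a new release -> silent gap in analysis
    return {"action": m.group(1).lower(), "proto": m.group(2),
            "src_ip": m.group(3), "dst_ip": m.group(5)}

def normalize_vendor_b(line):
    # e.g. "action=drop proto=tcp src=10.1.1.5 dst=10.2.2.9"
    fields = dict(kv.split("=", 1) for kv in line.split())
    action = {"drop": "deny", "accept": "allow"}.get(fields.get("action"))
    return {"action": action, "proto": fields.get("proto"),
            "src_ip": fields.get("src"), "dst_ip": fields.get("dst")}
```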
SIEM is rules-only
This is a tough one. Rules are useful, even required, but not sufficient. Rules only catch the thing you express in them, the things you know to look for. To be secure, you must be ahead of new threats. A million monkeys writing rules in real-time: not possible.
Result: Your rules are stale → You hire a million monkeys → Monkeys eat all your bananas → You analyze only a subset of relevant events → Serious gap in security
SIEM is too complex
It is way too hard to run these things. I've had too many meetings and discussions with my former customers on how to keep the damned things running and too few on how to get value out of the fancy features we provided. In reality most customers use only 20% of the features because the rest are simply not reachable. It is like putting your best tools on a shelf just out of reach: you can see them, you could do oh so much with them, but you can't actually get to them.
Result: You spend a lot of money → Your team spends a lot of time running SIEM → They don’t succeed on leveraging the cool capabilities → Value is low → Gaps in analysis → Serious gaps in security
So, what is an honest, forward-looking security professional who does not want to duct-tape a solution together to do? What you need is what we just launched: Sumo Logic Enterprise Security Analytics. No, it is not absolute security, it is not checkbox security, but it is more real security because it:
Processes terabytes of your data per day in real time. Evaluates rules regardless of data volume and does not restrict what you collect or analyze. Furthermore, there is no SIEM-style normalization: just add data, a pinch of savvy, a tablespoon of massively parallel compute, and voila.
Result: you add all relevant data → you analyze it all → you get better security
It is SaaS, there are no appliances, there are no servers, there is no storage, there is just a browser connected to an elastic cloud.
Result: you don’t have to spend time on running it → you spend time on using it → you get more value → better analysis → better security
Rules, check. What about that other unknown stuff? Answer: machines that learn from data. They detect patterns without human input. They then figure out baselines and normal behavior across sources. In real time they compare new data to the baseline and notify you when things are sideways. Even if "things" are things you've NEVER even thought about and NOBODY in the universe has EVER written a single rule to detect. Sumo Logic detects those too.
Result: Skynet … nah, benevolent overlord, nah, not yet anyway. New stuff happens → machines go to work → machines notify you → you provide feedback → machines learn and get smarter → bad things are detected → better security
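To give a flavor of the baselining idea, here is a deliberately simplified sketch – not Sumo Logic's actual algorithms:

```python
from collections import deque
import statistics

# Toy baseline-and-deviate detector: learn what "normal" looks like for a
# metric (say, events per minute), then flag values far outside it.
class Baseline:
    def __init__(self, window=60, threshold=3.0):
        self.history = deque(maxlen=window)  # sliding window of recent values
        self.threshold = threshold           # std-devs that count as "sideways"

    def observe(self, value):
        anomalous = False
        if len(self.history) >= 10:          # need some history before judging
            mean = statistics.mean(self.history)
            stdev = statistics.pstdev(self.history) or 1.0
            anomalous = abs(value - mean) > self.threshold * stdev
        self.history.append(value)
        return anomalous
```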
Read more: Sumo Logic Enterprise Security Analytics
09.10.2013 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy
What is “anomaly detection”?
Here is how the peeps on the interweb and Wikipedia define it: Anomaly detection (also known as outlier detection) is the search for events which do not conform to an expected pattern. The detected patterns are called anomalies and often translate to critical and actionable insights that, depending on the application domain, are referred to as outliers, changes, deviations, surprises, intrusions, etc.
The domain: Machine Data
Machine data (most frequently referred to as log data) is generated by applications, servers, infrastructure, mobile devices, web servers, and more. It is the data generated by machines in order to communicate to humans or other machines exactly what they are doing (e.g. activity), what the status of that activity is (e.g. errors, security issues, performance), and the results of their activity (e.g. business metrics).
The problem of unknown unknowns
Most problems with analyzing machine data orbit around the fact that existing operational analytics technologies enable users to find only those things they know to look for. I repeat, only things they KNOW they need to look for. Nothing in these technologies helps users proactively discover events they don’t anticipate getting, events that have not occurred before, events that may have occurred before but are not understood, or complex events that are not easy or even possible to encode into queries and searches.
Our infrastructure and applications are desperately, and constantly, trying to tell us what’s going on through the massive real-time stream of data they relentlessly throw our way. And instead of listening, we ask a limited set of questions from some playbook. This is as effective as a patient seeking advice about massive chest pain from a doctor who, instead of listening, runs through a checklist containing skin rash, fever, and runny nose, and then sends the patient home with a clean bill of health.
This is not a good place to be; these previously unknown events hurt us by repeatedly causing downtime, performance degradations, poor user experience, security breaches, compliance violations, and more. Existing monitoring tools would be sufficient if we lived in static, three-system environments where we could enumerate all possible failure conditions and attack vectors. But we don't.
We operate in environments with thousands of sources across servers, networks, and applications, and the amount of data they generate is growing exponentially. They come from a variety of vendors, run a variety of versions, are geographically distributed, and, on top of that, are constantly updated, upgraded, and replaced. How can we then rely on hard-coded rules, queries, and known-condition tools to ensure our applications and infrastructure are healthy and secure? We can't – it is a fairy tale.
We believe that three major things are required in order to solve the problem of unknown unknowns at a multi-terabyte scale:
- Cloud: enables elastic compute at the massive scale needed to analyze this volume of data in real time across all vectors
- Big Data technologies: enable a holistic approach to analyzing all data without being bound to schemas, volumes, or batch analytics
- Machine learning engine: advanced algorithms that analyze and learn from data as well as from humans in order to get smarter over time
Sumo Logic Real-Time Anomaly Detection
Today we announced Beta access to our Anomaly Detection engine, an engine that uses thousands of machines in the cloud to continuously analyze ALL of your data in real time and proactively detect important changes and events in your infrastructure. It does this without requiring users to configure or tune the engine, write queries or rules, set thresholds, or write and apply data parsers. As it detects changes and events, it bubbles them up to users for investigation, so they can add knowledge, classify events, and apply relevance and severity. It is in fact this combination of a powerful machine learning algorithm and human expert knowledge that is the real power of our Anomaly Detection engine.
So, in essence, Sumo Logic Anomaly Detection continuously turns unknown events into known events. And that’s what we want: to make events known, because we know how to handle and what to do with known events. We can alert on them, we can create playbooks and remediation steps, we can prevent them, we can anticipate their impact, and, at least in some cases, we can make them someone else’s problem.
Sumo Logic Anomaly Detection has been more than three years in the making. During that time, it has had the energy of the whole company and our backers behind it. Sumo Logic was founded with the belief that this capability is transformational in the face of exponential data growth and infrastructure sprawl. We developed an architecture and adopted a business model that enable us to implement an analytics engine capable of solving the most complex problems of the Big Data decade.
06.12.2013 | Posted by Jacek Migdal
As human beings, we share quite a few life events that we keep track of, like birthdays, holidays, anniversaries, and so on. These are structured events that occur on exact dates or during specific times of year.
But how do you keep track of the unique, unexpected events that can be life-changing? The first meeting with someone, an inspiring conversation that sparked a realization—events that may seem common to many, but are so special to you.
Computer systems present the same dilemma. Some events are expected, like adding a new user. Other events look routine, but from time to time they carry crucial, unexpected information. Unfortunately, we most often realize how important those pivotal events were only after we experience a malfunction.
That’s where logs come in.
Virtually every computer program has some append-only structure for logs. Usually, it is as simple as a text file with a new line for each event. Sometimes the messages are saved to a database if the information may be used later. Why does it work that way? Well, it’s very easy to use and implement–usually it’s just one line of code. Don’t let the simplicity fool you. Logs provide a very powerful way of understanding and debugging systems. In many cases, logs are the sole method of figuring out the reason why something has happened.
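That one line of code looks something like this in practice (Python here, but the idea is the same in any language):

```python
import logging

# Append each event as a new line in a plain text file.
logging.basicConfig(filename="app.log",
                    format="%(asctime)s %(levelname)s %(message)s",
                    level=logging.INFO)

logging.info("user %s logged in from %s", "alice", "10.0.0.7")
```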
From time to time, I’ll read about a new log management tool that converts log data into some standardized format. Well, there is limited value in that approach. Extracting data from logs is useful and could answer many business and operational questions. This works well with things that we expect, and things that answer numerical questions, like determining how many users have signed up in a given period of time.
However, during the process of converting logs to a standardized format, valuable data could be lost. For example, it’s interesting that many users couldn’t log in to your service, but the crucial information is why it happened. The unexpected part is usually very important and often even more valuable.
So do logs have a schema? Well, for the expected things, sure. But for analyzing the unexpected events it’s hard to think of a schema at all, beyond perhaps some partial structure.
That’s why at Sumo Logic, we accept any kind of log you throw at us. During log collection we just need to understand the events (e.g. separate lines) and the timestamp format. Everything else can be derived when you run a query.
Our query language lets you find or extract structure, and data can be visualized and/or exported. Sumo Logic's key advantage is how we handle the unexpected with machine learning algorithms. Our patent-pending LogReduce groups similar events on the fly to find anomalies, enabling our customers to review large sets of events quickly and identify the root cause of unexpected behavior.
No one ever intends to create bugs, but with the complexity and fast pace of software development they are inevitable. Well-designed systems should be debuggable. Log management tools, such as Sumo Logic, are here to help you deal with the logs that are a huge part of today’s technology.
“Only those days are important that are still unknown to us
Only those few moments are important, those for which we still wait”
(lyrics from a famous Polish song by Marek Grechuta)
05.29.2013 | Posted by Amanda Saso, Sr. Tech Writer
Have you ever put your cell phone through the wash? Personally, I've done it. Twice. What did I finally learn? To always double-check where I put my iPhone before I turn on the washing machine. It's a very real and painful threat that I've learned to proactively manage by using a process with a low rate of failure. But, from time to time, other foreign objects slip through, like a lipstick, my kid's crayon, a blob of Silly Putty – things that are cheaper than an iPhone yet create havoc in the dryer. Clothes are stained, the dryer drum is a mess, and my schedule is thrown completely off while I try to remember my grandmother's instructions for removing red lipstick from a white shirt.
What do low-tech laundry woes have to do with Sumo Logic’s big data solution? Well, I see LogReduce as a tool that helps fortify your organization against known problems (for which you have processes in place) while guarding against unknown threats that may cause huge headaches and massive clean-ups.
When you think about it, a small but messy threat that you don’t know you need to look for is a nightmare. These days we’re dealing with an unbelievable quantity of machine data that may not be human-readable, meaning that a proverbial Chap Stick in the pocket could be lurking right below your nose. LogReduce takes the “noise” out of that data so you can see those hidden threats, problems, or issues that could otherwise take a lot of time to resolve.
Say you're running a generic search across a broad area of your deployment – billing errors, user creations, or logins. Whatever the search may be, it returns thousands and thousands of pages of results. So, you could spend your work day slogging through messages, hoping to find the real problem, or you can simply click LogReduce. Those results are logically sorted into signatures – groups of messages that contain similar or relevant information. Then, you can teach Sumo Logic which messages are more important, and which data you just don't need to see again. That translates into unknown problems averted.
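To illustrate what a signature is, here is a toy grouping sketch of our own – not Sumo Logic's actual LogReduce algorithm:

```python
import re
from collections import Counter

# Mask the variable parts of each message so that structurally similar
# lines collapse into one "signature" bucket.
def signature(line):
    line = re.sub(r"\b0x[0-9a-f]+\b", "*", line)  # hex ids -> wildcard
    line = re.sub(r"\b\d+\b", "*", line)          # numbers -> wildcard
    return line

logs = [
    "billing error for user 1042: retry 3",
    "billing error for user 2210: retry 1",
    "user 77 created",
]
print(Counter(signature(l) for l in logs).most_common())
# [('billing error for user *: retry *', 2), ('user * created', 1)]
```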
Of course your team has processes in place to prevent certain events. How do you guard against the unknown? LogReduce can help you catch a blip before it turns into a rogue wave. Oh, and if you ever put Silly Putty through the washer and dryer, a good dose of Goo Gone will do the trick.
04.23.2013 | Posted by CloudPassage: Cloud Security
Below is a guest post from CloudPassage.
Automating your server security is about more than just one great tool – it’s also about linking together multiple tools to empower you with the information you need to make decisions. For customers of CloudPassage and Sumo Logic, linking those tools to secure cloud servers is as easy as it is powerful.
The CloudPassage Halo Event Connector enables you to view security event logs from CloudPassage Halo in your Sumo Logic dashboard, including alerts from your configuration, file integrity, and software vulnerability scans. Through this connector, Halo delivers unprecedented visibility of your cloud servers via your log management console. You can track server events such as your server rebooting, shutting down, changing IP addresses, and much more.
The purpose of the Halo Event Connector is to retrieve event data from a CloudPassage Halo account and import it into Sumo Logic for indexing or processing. It is designed to execute repeatedly, keeping the Sumo Collector up-to-date with Halo events as time passes and new events occur.
The Halo Event Connector is free to use, and will work with any Halo subscription. To get started integrating Halo events into Sumo Logic, make sure you have set up accounts for CloudPassage Halo and Sumo Logic.
Then, generate an API key in your CloudPassage Halo portal. Once you have an API key, follow the steps provided in the Halo – Sumo Logic documentation, using the scripts provided on Github. The documentation walks you through the process of testing the Halo Event Connector script.
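Conceptually, the connector is a small polling loop. The sketch below is our own simplified illustration – the endpoint path and field names are hypothetical, and the real script on Github handles authentication, paging, and error cases:

```python
import time
import requests

HALO_API = "https://api.example.cloudpassage.com"  # hypothetical base URL
API_KEY = "your-halo-api-key"                      # generated in the Halo portal

def fetch_events(since):
    # Hypothetical endpoint and parameters, for illustration only.
    resp = requests.get(f"{HALO_API}/events", params={"since": since},
                        headers={"Authorization": f"Bearer {API_KEY}"})
    resp.raise_for_status()
    return resp.json().get("events", [])

# Poll repeatedly, printing events to stdout where the Sumo Script source
# picks them up, so the Collector stays current as new events occur.
last_seen = "2013-04-01T00:00:00Z"
while True:
    for event in fetch_events(last_seen):
        print(event)  # one event per line for collection
        last_seen = event.get("created_at", last_seen)
    time.sleep(60)
```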
Once you have tested the script, you will then add the output as a "Source" by selecting "Script" in Sumo Logic.
When you have finished adding the new data source that integrates the Halo Event Connector with Sumo Logic (as detailed in the .pdf documentation), you will be taken back to the “Collectors” tab where the newly added Script source will be listed.
Once the Connector runs successfully and is importing event data into Sumo Logic, you will see Halo events appear in your Sumo Logic searches.
Try it out today – we are eager to hear your feedback! We hope that integrating these two tools makes your server security automation even more powerful.
04.18.2013 | Posted by Sanjay Sarathy, CMO
Customers love flexibility, especially if that flexibility drives additional business value. In that vein, today we announced an expansion of our log data collection capabilities with hosted HTTPS and Amazon S3 collectors that eliminate the need for any local software installation. There may be a variety of reasons why you don't want or can't have local collectors – for example, not having access to the underlying infrastructure, as often happens with Infrastructure-as-a-Service (IaaS) environments. Or you simply don't feel like deploying any local software into your current infrastructure. Defining these hosted collectors is now baked into the set-up process, whether you're using Sumo Logic Free or our Enterprise product.
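Once a hosted HTTP source is defined, sending it data is a simple POST. A minimal sketch follows; the URL below is a placeholder, since each source gets its own unique endpoint when you create it:

```python
import requests

# Placeholder URL: the unique endpoint is generated during source setup.
SOURCE_URL = "https://collectors.sumologic.com/receiver/v1/http/<unique-token>"

log_line = "2013-04-18 12:01:03 WARN payment retry for order 8812"
resp = requests.post(SOURCE_URL, data=log_line)
resp.raise_for_status()  # a 2xx response means the collector accepted the log
```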
With these new capabilities, companies can now unify how they collect and analyze log data generated from private clouds, public clouds, and their on-premise infrastructure. They can then apply our unique analytics capabilities like LogReduce to generate insight across every relevant application and operational tier.
With companies increasingly moving towards the Cloud to power different parts of their business, it’s imperative that they have the necessary means to troubleshoot and monitor their diverse infrastructure. Sumo Logic provides that flexibility.
03.28.2013 | Posted by Ben Newton, Corporate Sales Engineering Manager
Do It Faster, Makes Us Stronger
More Than Ever Hour After
Our Work Is Never Over
Daft Punk – “Harder, Better, Faster, Stronger”
When trying to explain the essence of DevOps to colleagues last week, I found myself unwittingly quoting the kings of electronica, the French duo Daft Punk (and Kanye West, who sampled the song in "Stronger"). So often, I find the "spirit" of DevOps being reduced to mere automation, the takeover of Ops by Dev (or vice versa), or other over-simplifications. This is natural for any new, potentially over-hyped trend. But how do we capture the DevOps "essence" – programmable architecture, agile development, and lean methodology – in a few words? The short lyrics above really do sum up the flexible, constantly improving ideal of a DevOps "team" and the continuous improvement at the heart of lean and agile methodology.
So, what does this have to do with machine data analytics and Sumo Logic? Part of the DevOps revolution is a deep and wrenching re-evaluation of the state of IT Operations tools. As the pace of technological change and the ferocity of competition keep increasing for any company daring to make money on the Internet (which is almost everybody at this point), IT departments are facing a difficult problem. Do they try to adapt the process-heavy, top-down approaches exemplified by ITIL, or do they embrace the state of constant change that is DevOps? In the DevOps model, the explosion of creativity that comes with unleashing your development and operations teams to innovate quickly overwhelms traditional, static tools. More fundamentally, the continuous improvement model of agile development and DevOps is only as good as the metrics used to measure success. So, the most successful DevOps teams are incredibly data hungry. And this is where machine data analytics, and Sumo Logic in particular, really comes into its own and is fundamentally in tune with the DevOps approach.
1. Let the data speak for itself
Unlike the management tools of the past, Sumo Logic makes only basic assumptions about the data being consumed (time-stamped, text-based, etc.). The important patterns are determined by the data itself, not by pre-judging which patterns are relevant and which are not. This means that as the application rapidly changes, Sumo Logic can detect new patterns – both good and ill – that would escape the inflexible tools of the past.
2. Continuous reinterpretation
Sumo Logic never tries to force the machine data into tired old buckets that are forever out of date. The data is stored raw so that it can continually be reinterpreted and re-parsed to reveal new meaning. Fast moving DevOps teams can’t wait for the stodgy software vendor to change their code or send their consultant onsite. They need it now.
3. Any metric you want, any time you want it
The power of the new DevOps approach to management is that the people that know the app the best, the developers, are producing the metrics needed to keep the app humming. This seems obvious in retrospect, yet very few performance management vendors support this kind of flexibility. It is much easier for developers to throw more data at Sumo Logic by outputting more data to the logs than to integrate with management tools. The extra insight that this detailed, highly specific data can provide into your customers’ experience and the operation of your applications is truly groundbreaking.
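For example, a developer can expose a new metric just by logging it. A minimal sketch (the key=value convention and field names here are ours, for illustration):

```python
import logging
import time

logging.basicConfig(format="%(message)s", level=logging.INFO)

def checkout(cart):
    start = time.time()
    # ... business logic ...
    elapsed_ms = int((time.time() - start) * 1000)
    # One extra log line is the whole "integration": key=value pairs
    # that can be parsed and charted later with a query.
    logging.info("metric=checkout_latency_ms value=%d items=%d",
                 elapsed_ms, len(cart))

checkout(["book", "mug"])
```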
4. Set the data free
The free flow of data is the new norm, and mash-ups provide the most useful metrics. Specifically, pulling in business data from outside the machine data context allows you to put that machine data in the proper perspective. We do this extensively at Sumo Logic with our own APIs, and it allows us to view our customers as more than nameless organization ID numbers. DevOps is driven by the need to keep customers happy.
5. Develop DevOps applications, not DevOps tools
The IT Software industry has fundamentally failed its customers. In general, IT software is badly written, buggy, hard to use, costly to maintain, and inflexible. Is it any wonder that the top DevOps shops overwhelmingly use open source tools and write much of the logic themselves?! Sumo Logic gives DevOps teams the flexibility and access to get the data they need when they need it, without forcing them into a paradigm that has no relevance for them. And why should DevOps teams even be managing the tools they use? It is no longer acceptable to spend months with vendor consultants, and then maintain extra staff and hardware to run a tool. DevOps teams should be able to do what they are good at – developing, releasing, and operating their apps – while the vendors take the burden of tool management off their shoulders.
The IT industry is changing fast, and DevOps teams need tools that can keep up with the pace – and make their job easier, not more difficult. Sumo Logic is excited to be in the forefront of that trend. Sign up for Sumo Logic Free and prove it out for yourself.
03.06.2013 | Posted by Sanjay Sarathy, CMO
Last week we announced how Atchik uses Sumo Logic and our ability to easily analyze machine data to reshape its customer service function. In fact, there are a variety of ways in which your customer service organization can become best friends with your log management infrastructure and improve your customers' perception of your product or service. Specifically, companies can use a log management service to:
- Pinpoint exactly what the customer did during the course of a transaction or interaction with an application or service, as opposed to relying purely on email threads or phone logs. This root cause analysis can help in understanding bottlenecks that the customer complained about and, just as importantly, provide guidance to the development team on how customers are using the product or service. Actually it’s a great reason for the app development teams to use the service as well, but that’s the subject of another post.
- Easily correlate that application activity with the impact on other infrastructure elements that affect the consumer experience. Unfortunately, many companies today only focus on a single application view of the customer experience when, given how integrated applications and services are today, it’s critical to get a full picture of all the different ways in which the customer is affected.
- Proactively address potential customer-facing issues *before* they hit by receiving real-time alerts when application anomalies are diagnosed by the log management solution
- Create customer dashboards and reports that provide real-time insights into the customer activity you care most about tracking
We use Sumo Logic internally to support every function in the organization from application development to QA to customer service and even marketing. Our co-founder and VP of Engineering, Kumar Saurabh, is hosting a webinar on March 26th to talk about “Sumo and Sumo”. We invite you to attend.
02.19.2013 | Posted by Yan Qiao, Software Engineer
Sumo Logic lets you access your logs through a powerful query language. In addition to searching for individual log messages, you may extract, transform, filter and aggregate data from them using a sequence of operators. There are currently about two dozen operators available and we are constantly adding new ones. In this post I want to introduce you to a recent addition to the toolbox, the transpose operator.
Let’s say you work for an online brokerage firm, and your trading server logs lines that look like the following, among other things:
2013-02-14 01:41:36 10.20.11.102 GET /Trade/StockTrade.aspx action=buy&symbol=s:131 80 Cole 126.96.36.199 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_7_3)+AppleWebKit/536.5+(KHTML,+like+Gecko)+Chrome/19.0.1084.54+Safari/536.5 200 0 0 449
There is a wealth of information in this log line, but to keep it simple, let’s focus on the last number, in this case 449, which is the server response time in milliseconds. We are interested in finding out the distribution of this number so as to know how quickly individual trades are processed. One way to do that is to build a histogram of the response time using the following query:
stocktrade | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by response_time
Here we start with a search for "stocktrade" to get only the lines we are interested in, extract the response time using a regular expression, round it up to the next multiple of 100 milliseconds, and count the occurrences of each value. The result is a two-column table: each response_time bucket and its count.
Now, it would also be interesting to see how the distribution changes over time. That is easy with the timeslice operator:
stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time
and the result is a similar table, with a separate set of rows for each one-minute time slice.
This gets the data we want, but it is not presented in a format that is easy to digest: the first five rows give us the distribution of response time at 8:00, the next five rows at 8:01, and so on. Wouldn't it be nice if we could rearrange the data so that each time slice becomes a single row, with one column per response-time bucket?
That is exactly what transpose does:
stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time | transpose row _timeslice column response_time
Here we tell the query engine to rearrange the table using time slice values as row labels, and response time as column labels.
This is especially useful when the data is visualized. The "stacking" option allows you to draw bar charts with values from different columns stacked onto each other.
The length of each bar represents the number of trading requests per minute, and the colored segments represent the distribution of response times.
That's it! To find out other interesting ways to analyze your log data, sign up for Sumo Logic Free and try it for yourself!