
Caleb Sotelo

How To Add Sumo Logic To A Dockerized App

08.27.2014 | Posted by Caleb Sotelo

This is a guest post from Caleb Sotelo, a software engineer at OpenX, and is reprinted with his permission. You can view the original here.


Sumo Logic is a nifty service that makes it easy to watch and analyze your apps’ logs. You install a collector local to the app in question, point it at some files to watch, and your logs are sent to the cloud, where they can be sliced and diced from a nice web UI.

At a high level, Sumo Logic wants to transform big data logs into operational information. But at a practical level, it’s nice not to have to SSH to a production machine and tail or grep enormous files to take your app’s pulse. Not to mention the idea is totally in line with one of the Twelve Factors: treating logs as event streams enables better understanding of an app’s behavior over time.

I’ve had success using Sumo Logic at OpenX, so I wanted to give their free tier a shot for a personal project. The only limitations are 500MB of data per day and 7 days of retention. I was surprised not to find anything on the web for installing Sumo Logic alongside a Dockerized app, and I had a couple of Docker-based candidates. So without further ado, here’s how to add Sumo Logic to a Dockerized app:

1. Sign up for Sumo Logic Free

Head over to sumologic.com/signup to sign up. The only catch here is that you’ll need a company email address. For the project I’m going to use Sumo Logic for, I own and manage my own domain, so it wasn’t too much trouble to create an email address using my registrar’s mail service. Since I host the domain separately (with DigitalOcean, for example), I did have to add an MX record to my zone file to point to the registrar’s mail server.
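For reference, the record looks roughly like the following; the domain and mail host here are placeholders, and the exact syntax depends on your DNS provider:

example.com.    3600    IN    MX    10    mail.registrar-example.net.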

2. Download a Collector

Once you confirm your email address and log in, you’ll be stepped through a process for downloading and installing a collector. I chose the Installed Collector and downloaded sumocollector_19.91-2_amd64.deb, because my Docker image is based on Ubuntu.

[Screenshot: choosing a collector type]

After downloading the collector, the setup wizard proceeds to a screen that spins until it detects a newly installed collector. I didn’t yet know how I was going to install it, and I got logged out of Sumo Logic anyway due to inactivity, so I abandoned the wizard at that point. The Sumo Logic UI changed itself as soon as it detected that my first collector had been installed.

  • Since I planned to install the Sumo Logic collector during the docker build process, I uploaded the .deb file to Dropbox and grabbed the public link to use later.

3. Create Access Keys

When a collector client is installed, it has to have some way of authenticating to the Sumo Logic server. The docs for creating a sumo.conf file (we’ll get there soon) offer two choices: (1) provide your Sumo Logic email and password, or (2) provide access keys generated from the UI. The latter is recommended if only to avoid storing a username/password in plaintext. Keys can be generated from Manage › Collectors › Access Keys › Create.

4. Augment your Docker Container

Here’s the Docker-specific part of installing Sumo Logic. We’ll add some lines to our app’s Dockerfile and author two files that are ADDed to the container during a docker build. I assume working knowledge of Docker, but here is the list of Dockerfile commands for good measure.

4.1 Create sumo.conf

First create a sumo.conf file like the following:

name={collector_name}  
accessid={your_access_id}  
accesskey={your_access_key}  

 

where name is an arbitrary name for this collector, and accessid and accesskey are those generated in step 3. There are many more conf options specified here but the important ones, namely sources, can actually be configured through the UI later on.

By convention I put Docker-specific files into .docker/{resource}, so this one goes to .docker/sumo/sumo.conf. It’ll be referenced in our Dockerfile shortly.

4.2 Modify your Dockerfile

Add a block like the following to your Dockerfile (assumed to live in the root of your app’s code), preferably before your actual app is added:

# install sumologic
RUN apt-get -qq update  
RUN apt-get install -y wget  
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb  
RUN dpkg -i sumocollector_19.91-2_amd64.deb  
RUN rm sumocollector_19.91-2_amd64.deb  
ADD .docker/sumo/sumo.conf /etc/sumo.conf  
ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo  

 

Let’s break this down:

RUN apt-get -qq update  

Update sources. This may not be necessary, but I like to put this before each dependency installed by my Dockerfile to avoid issues with image caching.

RUN apt-get install -y wget  
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb  

We’ll use wget to grab the collector file we uploaded in step 2. You may opt to ADD the file locally, but this option avoids having to check the resource into your app’s source code, while housing it in a consistent location. Better practice would be to store it in some kind of artifact repository and version it.
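For completeness, here is a sketch of the ADD-based alternative, assuming the .deb were checked into the repo under .docker/sumo/:

# alternative: ship the installer with the repo instead of fetching it at build time
ADD .docker/sumo/sumocollector_19.91-2_amd64.deb /tmp/sumocollector.deb  
RUN dpkg -i /tmp/sumocollector.deb && rm /tmp/sumocollector.deb  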

RUN dpkg -i sumocollector_19.91-2_amd64.deb  
RUN rm sumocollector_19.91-2_amd64.deb  

Install the Debian package and clean up.

ADD .docker/sumo/sumo.conf /etc/sumo.conf  

Copy the newly created sumo.conf file to the place where the collector expects to find it.

Before we get to the last line, let’s pause. If you were able to catch the output from installing the collector, you saw something like:

Preparing to unpack sumocollector_19.91-2_amd64.deb ...
Unpacking sumocollector (1:19.91-2) ...
Setting up sumocollector (1:19.91-2) ...
configuring collector....
configuring collector to run as root
Detected Ubuntu:
Installing the SumoLogic Collector daemon using init.d..
 Adding system startup for /etc/init.d/collector ...
   /etc/rc0.d/K20collector -> ../init.d/collector
   /etc/rc1.d/K20collector -> ../init.d/collector
   /etc/rc6.d/K20collector -> ../init.d/collector
   /etc/rc2.d/S20collector -> ../init.d/collector
   /etc/rc3.d/S20collector -> ../init.d/collector
   /etc/rc4.d/S20collector -> ../init.d/collector
   /etc/rc5.d/S20collector -> ../init.d/collector
Collector has been successfully installed. Please provide account credential in /etc/sumo.conf and start it up via service or init.d script!

 

It was only after sifting through my docker output that I saw this and learned about the existence of a sumo.conf file. Before that, nothing was happening in the Sumo Logic UI because no collector had been correctly installed and started, even when I started the container. Anyway, we got /etc/sumo.conf out of the way, so what about starting it up “via service or init.d script”?

My solution was to include a simple bash script that starts the collector service on startup. But my Dockerfile extends phusion/baseimage-docker, which uses a custom init system. So the last Dockerfile command,

ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo  

adds a file called start_sumo like:

#!/bin/bash
service collector start  

into /etc/my_init.d. Make sure it’s executable with chmod +x. Like the conf file, this is saved into .docker/sumo/start_sumo of the app code repository.
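In other words, before building the image:

chmod +x .docker/sumo/start_sumo  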

I am very open to more elegant ways for getting the Sumo Logic collector to start. I’d also like to see how non-baseimage users deal with init requirements. I would have done this as a runit script as recommended by the baseimage-docker README, but the collector script appears to automatically daemonize itself, which breaks runit.

5. Build and Deploy!

I ran docker build and docker run as usual, and voilà, the newly installed collector popped up in Manage › Collectors.
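For reference, nothing more exotic than the usual two commands is needed (the image name here is hypothetical):

docker build -t myapp .  
docker run -d myapp  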

6. Configure Sources

Before we start seeing logs, we have to tell Sumo what a log file is. I clicked Manage › Collectors › Add › Add Source and added a Local File entry that had the absolute path to a log file I was interested in. One of the Sumo Logic videos I watched noted that specifying /path/to/log/dir/** will pick up all log files in a directory.

[Screenshot: the Manage › Collectors page]

I waited a couple of minutes, and log messages started coming into the UI. Sweet! Keep in mind that multiple sources can be added for a single collector.


So far, I’ve learned that I can get a bird’s eye view of all my logs from Manage › Status, and look at actual log messages from Search. I haven’t spent time really getting to know the various queries yet, but if they’re worth writing about, expect another post.

Possible Improvement: The above example installs Sumo Logic inside the app container. An alternate approach might have Sumo installed on the host (or in its own Docker container), reading log files from a shared data volume. This has the benefits of (1) requiring only a single Sumo Logic install for potentially more than one app container, and (2) architectural separation of app from log consumption.
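A minimal sketch of that alternative, assuming the app writes its logs to /var/log/myapp and that a separate collector image exists (sumo-collector is a hypothetical name):

# app container writes logs to a shared host directory
docker run -d --name myapp -v /srv/myapp-logs:/var/log/myapp myapp  
# collector container reads the same directory read-only
docker run -d --name sumo -v /srv/myapp-logs:/var/log/myapp:ro sumo-collector  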

That’s it! This turned out to be surprisingly simple. Kudos to Sumo Logic for offering an easy-to-use service and free tier that’s totally feasible for smallish apps.

Dwayne Hoover, Senior Sales Engineer

Pushing AWS RDS Logs to Sumo Logic

07.28.2014 | Posted by Dwayne Hoover, Senior Sales Engineer

Collecting log data from Amazon RDS instances can be done through a hosted HTTP collector.  There is some configuration required to make this happen, but once the foundation is built, this can be a seamless integration from RDS to Sumo Logic.

Required Configuration:

Install the AWS RDS Command Line Tools and Configure Access:

http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference/StartCLI.html

This tutorial was performed on a Linux-based EC2 machine; for detailed instructions on Windows, please refer to the documentation in the link above.

  1. Obtain the command line tools
    wget http://s3.amazonaws.com/rds-downloads/RDSCli.zip

  2. Copy the zip file to the desired installation path and unzip

  3. Set up the following environment variables (these might look different on your system; refer to the documentation for additional detail)
    export AWS_RDS_HOME=/home/ec2-user/RDSCli-1.15.001/
    export PATH=$PATH:$AWS_RDS_HOME/bin
    export JAVA_HOME=/usr/lib/jvm/jre

  4. Set up the proper credentials for RDS access by entering your access keys in:
    $AWS_RDS_HOME/credential-file-path.template
    (a sketch of the file format follows this list). For detailed instructions on providing credentials for the tools, please see: http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference/StartCLI.html
    You must also be sure that the user account interacting with RDS has the proper permissions configured in IAM: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAM.html

  5. Verify by issuing the following command

    $ rds-describe-db-log-files <rds instance name here>
  6. If a list of the available log files is returned, you are ready to push the data into Sumo Logic.
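Regarding step 4: if memory serves, the credential file follows the simple key/value format used by the other AWS Java-based command line tools, and the AWS_CREDENTIAL_FILE environment variable points the tools at it. Roughly (the values are placeholders):

# contents of $AWS_RDS_HOME/credential-file-path.template
AWSAccessKeyId=<your access key id>
AWSSecretKey=<your secret access key>

# point the RDS tools at the file
export AWS_CREDENTIAL_FILE=$AWS_RDS_HOME/credential-file-path.template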

Set Up a Sumo Logic Hosted HTTP Collector and Source:

http://help.sumologic.com/Help/Default.htm#Setting_up_a_Hosted_Collector.htm

http://help.sumologic.com/Help/Default.htm#Configuring_an_HTTP_Source.htm

  1. Log in to Sumo Logic and select Add Collector.

  2. Choose Hosted Collector, name it, and select OK when asked if you would like to add a data source.

  3. Select HTTP.

  4. Give the source a name and fill out relevant metadata. Also configure the options for timestamp parsing and multi-line settings.

  5. Upon saving the new source, you will be provided with a unique URL. This is the endpoint to which you will push the AWS RDS logs.

 

Collecting Logs from RDS and Pushing them to Sumo Logic:

To list available log files for your RDS instance, issue the following command:

$ rds-describe-db-log-files <db instance name>

You can limit the list by date last written as follows (note: this uses a UNIX POSIX timestamp in milliseconds, as shown):

$ rds-describe-db-log-files <db instance name> --file-last-written 1395341819000

Manually pushing logs to your newly configured HTTP endpoint can be done using curl. In the following example, we pull one log file and push it to Sumo Logic:

$ rds-download-db-logfile orasumo --log-file-name trace\/alert_ORASUMO.log | curl -X POST -d @- https://collectors.sumologic.com/receiver/v1/http/redactedKEY

Note: the forward slash in the file name is escaped with a backslash, and the output of rds-download-db-logfile is piped into a curl command that posts the data to Sumo Logic.

Luckily, the RDS command line tools provide an option to continuously monitor log files for activity. To use this feature for an HTTP push, you can do the following:

$ rds-watch-db-logfile sumopostgres --log-file-name error\/postgres.log | ./watch-rds.sh

Note that we are piping the output into a shell script. The contents of our sample script are shown below:

#!/bin/bash
# POST each line read from stdin to the Sumo Logic HTTP source
URL="https://collectors.sumologic.com/receiver/v1/http/<unique URL string>"
while read -r data;
do
        curl --data "$data" "$URL"
done


This script will run until cancelled, so it is best to launch it in the background with nohup.

$ nohup sh -c 'rds-watch-db-logfile <your db instance name> --log-file-name <your db log file name> | ./watch-rds.sh'
 

Installed Collector Alternative:

If you already have a Sumo Logic collector installed and can access your RDS logs from the command line utilities, simply piping the results from above to a local file and sending the log messages via the collector will also work.

$ rds-watch-db-logfile sumopostgres --log-file-name error\/postgres.log > /path/to/localfile.log

Where /path/to/localfile.log is a configured Sumo Logic source for the installed collector.

Helpful links:

http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference//CLIReference-cmd-DescribeDBLogFiles.html

http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference//CLIReference-cmd-DownloadDBLogFilePortion.html

This article originally appeared on DwayneHoover.com

Sanjay Sarathy, CMO

Sumo Logic, ServiceNow and the Future of Event Management

04.29.2014 | Posted by Sanjay Sarathy, CMO

Today’s reality is that companies have to deal with disjointed systems when it comes to detecting, investigating and remediating issues in their infrastructure.  Compound that with the exponential growth of machine data and you have a recipe for frustrated IT and security teams who are tasked with uncovering insights from this data exhaust and then remediating issues as appropriate.  Customer dissatisfaction, at-risk SLAs and even revenue misses are inevitable consequences of this fragmented approach.

With our announcement today of a certified integration with ServiceNow, companies now have a closed loop system that makes it much easier for organizations to uncover known and unknown events in Sumo Logic and then immediately create alerts and incidents in ServiceNow.  The bi-directional integration supports the ability for companies to streamline the entire change management process, capture current and future knowledge, and lay the groundwork for integrated event management capabilities.  This integration takes advantage of all the Sumo Logic analytics capabilities, including LogReduce and Anomaly Detection, to identify what’s happening in your enterprise, even if you never had rules to detect issues in the first place.  

ServiceNow Integration

The cloud-to-cloud integration of ServiceNow and Sumo Logic also boosts productivity by eliminating the whole concept of downloading, installing and managing software.  Furthermore, IT organizations also have the ability to elastically scale their data analytics needs to meet the service management requirements of the modern enterprise.

Let us know if you’re interested in seeing our integration with ServiceNow.  And while you’re at it, feel free to register for Sumo Logic Free.  It’s a zero price way to understand how our machine data analytics service works.

PS – check out our new web page which provides highlights of recent capabilities and features that we’ve launched. 

Bruno Kurtic, Founding Vice President of Product and Strategy

The New Era of Security – yeah, it’s that serious!

02.23.2014 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

Security is a tricky thing and it means different things to different people.   It is truly in the eye of the beholder.  There is the checkbox kind, there is the “real” kind, there is the checkbox kind that holds up, and there is the “real” kind that is circumvented, and so on.  Don’t kid yourself: the “absolute” kind does not exist. 

I want to talk about security solutions based on log data.  This is the kind of security that kicks in after the perimeter security (firewalls), intrusion detection (IDS/IPS), vulnerability scanners, and dozens of other security technologies have done their thing.  It ties all of these technologies together, correlates their events, reduces false positives and enables forensic investigation.  Sometimes this technology is called Log Management and/or Security Information and Event Management (SIEM).  I used to build these technologies years ago, but it seems like decades ago. 


A typical SIEM product is a hulking appliance, sharp edges, screaming colors – the kind of design that instills confidence and says “Don’t come close, I WILL SHRED YOU! GRRRRRRRRRR”.

Ahhhh, SIEM makes you feel safe, doesn’t it?  It should not.  I proclaim this at the risk of being yet another one of those guys who wants to rag on SIEM, but I built one, and beat many, so I feel I’ve got some ragging rights.  So, what’s wrong with SIEM?  Where does it fall apart?

SIEM does not scale

It is hard enough to capture a terabyte of daily logs (40,000 Events Per Second, 3 Billion Events per Day) and store them.  It is a couple of orders of magnitude harder to run correlation in real time and alert when something bad happens.  SIEM tools are extraordinarily difficult to run at scales above 100GB of data per day.  This is because they are designed to scale by adding more CPU, memory, and fast spindles to the same box.  The exponential growth of data over the two decades when those SIEM tools were designed has outpaced the ability to add CPU, memory, and fast spindles into the box.
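As a rough sanity check on those figures: 40,000 events per second × 86,400 seconds per day ≈ 3.5 billion events per day, and 1 TB ÷ 3.5 billion events ≈ 300 bytes per event (a typical log line size), so the numbers hang together.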

Result: Data growth outpaces capacity → Data dropped  from collection → Significant data dropped from correlation → Gap in analysis → Serious gap in security

SIEM normalization can’t keep pace

SIEM tools depend on normalization (shoehorning) of all data into one common schema so that you can write queries across all events.  That worked fifteen years ago when sources were few.  These days sources and infrastructure types are expanding like never before.  One enterprise might have multiple vendors and versions of network gear, many versions of operating systems, open source technologies, workloads running in infrastructure as a service (IaaS), and many custom written applications.  Writing normalizers to keep pace with changing log formats is not possible.

Result: Too many data types and versions → Falling behind on adding new sources → Reduced source support → Gaps in analysis → Serious gaps in security

SIEM is rule-only based

This is a tough one.  Rules are useful, even required, but not sufficient.  Rules only catch the thing you express in them, the things you know to look for.   To be secure, you must be ahead of new threats.  A million monkeys writing rules in real-time: not possible.

Result: Your rules are stale → You hire a million monkeys → Monkeys eat all your bananas → You analyze only a subset of relevant events → Serious gap in security

SIEM is too complex


It is way too hard to run these things.  I’ve had too many meetings and discussions with my former customers on how to keep the damned things running and too few meetings on how to get value out of the fancy features we provided.  In reality, most customers use only 20% of the features because the rest of the stuff is not reachable.  It is like putting your best tools on the shelf just out of reach.  You can see them, you could do oh so much with them, but you can’t really use them because they are out of reach.

Result: You spend a lot of money → Your team spends a lot of time running SIEM → They don’t succeed in leveraging the cool capabilities → Value is low → Gaps in analysis → Serious gaps in security

So, what is an honest, forward-looking security professional who does not want to duct tape a solution to do?  What you need is what we just started: Sumo Logic Enterprise Security Analytics.  No, it is not absolute security, it is not checkbox security, but it is a more real security because it:

Scales

Processes terabytes of your data per day in real time. Evaluates rules regardless of data volume and does not restrict what you collect or analyze.  Furthermore, there is no SIEM-style normalization: just add data, a pinch of savvy, a tablespoon of massively parallel compute, and voilà.

Result: you add all relevant data → you analyze it all → you get better security 

Simple

It is SaaS, there are no appliances, there are no servers, there is no storage, there is just a browser connected to an elastic cloud.

Result: you don’t have to spend time on running it → you spend time on using it → you get more value → better analysis → better security

Machine Learning

Rules, check.  What about that other unknown stuff?  Answer: a machine that learns from data.  It detects patterns without human input.  It then figures out baselines and normal behavior across sources.  In real-time it compares new data to the baseline and notifies you when things are sideways.  Even if “things” are things you’ve NEVER even thought about and NOBODY in the universe has EVER written a single rule to detect.  Sumo Logic detects those too.

Result: Skynet … nah, benevolent overlord, nah, not yet anyway.   New stuff happens → machines go to work → machines notify you → you provide feedback → machines learn and get smarter → bad things are detected → better security

Read more: Sumo Logic Enterprise Security Analytics

Bruno Kurtic, Founding Vice President of Product and Strategy

Sumo Logic Anomaly Detection is now in Beta!

09.10.2013 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

What is “anomaly detection”?

Here is how the peeps on the interweb and Wikipedia define it: Anomaly detection (also known as outlier detection) is the search for events which do not conform to an expected pattern. The detected patterns are called anomalies and often translate to critical and actionable insights that, depending on the application domain, are referred to as outliers, changes, deviations, surprises, intrusions, etc.

The domain: Machine Data

Machine data (most frequently referred to as log data) is generated by applications, servers, infrastructure, mobile devices, web servers, and more.  It is the data generated by machines in order to communicate to humans or other machines exactly what they are doing (e.g. activity), what the status of that activity is (e.g. errors, security issues, performance), and results of their activity (e.g. business metrics).

The problem of unknown unknowns

Most problems with analyzing machine data orbit around the fact that existing operational analytics technologies enable users to find only those things they know to look for.  I repeat, only things they KNOW they need to look for.  Nothing in these technologies helps users proactively discover events they don’t anticipate getting, events that have not occurred before, events that may have occurred before but are not understood, or complex events that are not easy or even possible to encode into queries and searches.  

Our infrastructure and applications are desperately, and constantly, trying to tell us what’s going on through the massive real-time stream of data they relentlessly throw our way.  And instead of listening, we ask a limited set of questions from some playbook. This is as effective as a patient seeking advice about massive chest pain from a doctor who, instead of listening, runs through a checklist containing skin rash, fever, and runny nose, and then sends the patient home with a clean bill of health.

This is not a good place to be; these previously unknown events hurt us by repeatedly causing downtime, performance degradations, poor user experience, security breaches, compliance violations, and more.  Existing monitoring tools would be sufficient if we lived in static, three-system environments where we could enumerate all possible failure conditions and attack vectors.  But we don’t.


We operate in environments where we have thousands of sources across servers, networks, and applications, and the amount of data they generate is growing exponentially.  They come from a variety of vendors, run a variety of versions, are geographically distributed, and on top of that, they are constantly updated, upgraded, and replaced.  How can we then rely on hard-coded rules and queries and known-condition tools to ensure our applications and infrastructure are healthy and secure?  We can’t – it is a fairy tale.

We believe that three major things are required in order to solve the problem of unknown unknowns at a multi-terabyte scale:

  1. Cloud: enables elastic compute at the massive scale needed to analyze data of this volume in real-time across all vectors

  2. Big Data technologies: enable a holistic approach to analyzing all data without being bound to schemas, volumes, or batch analytics

  3. Machine learning engine: advanced algorithms that analyze and learn from data as well as humans in order to get smarter over time

Sumo Logic Real-Time Anomaly Detection

Today we have announced Beta access to our Anomaly Detection engine, an engine that uses thousands of machines in the cloud to continuously analyze ALL of your data in real time and proactively detect important changes and events in your infrastructure.  It does this without requiring users to configure or tune the engine, to write queries or rules, to set thresholds, or to write and apply data parsers.  As it detects changes and events, it bubbles them up to users for investigation, so they can add knowledge, classify events, and apply relevance and severity.  It is in fact this combination of a powerful machine learning algorithm and human expert knowledge that is the real power of our Anomaly Detection engine.

So, in essence, Sumo Logic Anomaly Detection continuously turns unknown events into known events.  And that’s what we want: to make events known, because we know how to handle and what to do with known events.  We can alert on them, we can create playbooks and remediation steps, we can prevent them, we can anticipate their impact, and, at least in some cases, we can make them someone else’s problem.

 

In conclusion

Sumo Logic Anomaly Detection has been more than three years in the making.  During that time, it has had the energy of the whole company and our backers behind it.  Sumo Logic was founded with the belief that this capability is transformational in the face of exponential data growth and infrastructure sprawl.  We developed architecture and adopted a business model that enable us to implement an analytics engine that can solve the most complex problems of the Big Data decade.

We look forward to learning from the experience of our Beta customers and soon from all of our customers about how to continue to grow this game changing capability.  Read more here and join us.

Jacek Migdal

Do logs have a schema?

06.12.2013 | Posted by Jacek Migdal

As human beings, we share quite a few life events that we keep track of, like birthdays, holidays, anniversaries, and so on. These are structured events that occur on exact dates or during specific times of year. 

But how do you keep track of the unique, unexpected events that can be life-changing? The first meeting with someone, an inspiring conversation that sparked a realization—events that may seem common to many, but are so special to you.

Computer systems present the same dilemma. Some events are expected, like adding a new user. Other events look routine, but from time to time they carry crucial, unexpected information. Unfortunately, we most often realize how important those pivotal events were only after we experience a malfunction.

That’s where logs come in.

Virtually every computer program has some append-only structure for logs. Usually, it is as simple as a text file with a new line for each event. Sometimes the messages are saved to a database if the information may be used later. Why does it work that way? Well, it’s very easy to use and implement – usually it’s just one line of code. Don’t let the simplicity fool you. Logs provide a very powerful way of understanding and debugging systems. In many cases, logs are the sole method of figuring out the reason why something has happened.

From time to time, I’ll read about a new log management tool that converts log data into some standardized format. Well, there is limited value in that approach. Extracting data from logs is useful and could answer many business and operational questions. This works well with things that we expect, and things that answer numerical questions, like determining how many users have signed up in a given period of time.

However, during the process of converting logs to a standardized format, valuable data could be lost. For example, it’s interesting that many users couldn’t log in to your service, but the crucial information is why it happened. The unexpected part is usually very important and often even more valuable.

So do logs have a schema? Well, for the expected things, sure. But for analyzing the unexpected events it’s hard to think of a schema at all, beyond perhaps some partial structure.

That’s why at Sumo Logic, we accept any kind of log you throw at us. During log collection we just need to understand the events (e.g. separate lines) and the timestamp format. Everything else can be derived when you run a query.

Our query language lets you find or extract structure, and data can be visualized and/or exported. Sumo Logic’s key advantage is how we handle the unexpected with machine learning algorithms. Our patent-pending LogReduce groups similar events on the fly to find anomalies, enabling our customers to review large sets of events quickly to identify the root cause of unexpected things.
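As an illustration of what that looks like at search time (the source category, message text, and field name here are invented for the example), a query can impose structure on raw log lines on the fly:

_sourceCategory=prod/app "login failed"  
| parse "user=*," as user  
| count by user  
| sort by _count  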

No one ever intends to create bugs, but with the complexity and fast pace of software development they are inevitable. Well-designed systems should be debuggable. Log management tools, such as Sumo Logic, are here to help you deal with the logs that are a huge part of today’s technology.

“These days are only important, which are still unknown to us
These several moments are important, these for which we still wait”
(lyrics from famous Polish song by Marek Grechuta)

 

Amanda Saso, Sr. Tech Writer

Logs and laundry: What you don’t know can hurt you

05.29.2013 | Posted by Amanda Saso, Sr. Tech Writer

Have you ever put your cell phone through the wash?  Personally, I’ve done it. Twice.  What did I learn, finally?  To always double-check where I put my iPhone before I turn on the washing machine.  It’s a very real and painful threat that I’ve learned to proactively manage by using a process with a low rate of failure.  But, from time to time, other foreign objects slip through, like a lipstick, my kid’s crayon, a blob of Silly Putty—things that are cheaper than an iPhone yet create havoc in the dryer.  Clothes are stained, the dryer drum is a mess, and my schedule is thrown completely off while I try to remember my grandmother’s instructions for removing red lipstick from a white shirt.

What do low-tech laundry woes have to do with Sumo Logic’s big data solution? Well, I see LogReduce as a tool that helps fortify your organization against known problems (for which you have processes in place) while guarding against unknown threats that may cause huge headaches and massive clean-ups.

When you think about it, a small but messy threat that you don’t know you need to look for is a nightmare. These days we’re dealing with an unbelievable quantity of machine data that may not be human-readable, meaning that a proverbial Chap Stick in the pocket could be lurking right below your nose. LogReduce takes the “noise” out of that data so you can see those hidden threats, problems, or issues that could otherwise take a lot of time to resolve.

Say you’re running a generic search for a broad area of your deployment, such as billing errors, user creations, or logins. Whatever the search may be, it returns thousands and thousands of pages of results. So, you could spend your work day slogging through messages, hoping to find the real problem, or you can simply click LogReduce. Those results are logically sorted into signatures: groups of messages that contain similar or relevant information. Then, you can teach Sumo Logic which messages are more important, and which data you just don’t need to see again. That translates into unknown problems averted.

Of course your team has processes in place to prevent certain events. How do you guard against the unknown? LogReduce can help you catch a blip before it turns into a rogue wave. Oh, and if you ever put Silly Putty through the washer and dryer, a good dose of Goo Gone will do the trick.

Sending CloudPassage Halo Event Logs to Sumo Logic

04.23.2013 | Posted by CloudPassage: Cloud Security

Below is a guest post from CloudPassage.

Automating your server security is about more than just one great tool – it’s also about linking together multiple tools to empower you with the information you need to make decisions.  For customers of CloudPassage and Sumo Logic, linking those tools to secure cloud servers is as easy as it is powerful.

The CloudPassage Halo Event Connector enables you to view security event logs from CloudPassage Halo in your Sumo Logic dashboard, including alerts from your configuration, file integrity, and software vulnerability scans. Through this connector, Halo delivers unprecedented visibility of your cloud servers via your log management console. You can track server events such as your server rebooting, shutting down, changing IP addresses, and much more.

The purpose of the Halo Event Connector is to retrieve event data from a CloudPassage Halo account and import it into Sumo Logic for indexing or processing. It is designed to execute repeatedly, keeping the Sumo Collector up-to-date with Halo events as time passes and new events occur.

The Halo Event Connector is free to use, and will work with any Halo subscription. To get started integrating Halo events into Sumo Logic, make sure you have set up accounts for CloudPassage Halo and Sumo Logic.

Then, generate an API key in your CloudPassage Halo portal. Once you have an API key, follow the steps provided in the Halo – Sumo Logic documentation, using the scripts provided on Github. The documentation walks you through the process of testing the Halo Event Connector script.  

Once you have tested the script, you will then add the output as a “Source” by selecting “Script” in Sumo Logic (see below).

[Screenshot: adding a Script source in Sumo Logic]

When you have finished adding the new data source that integrates the Halo Event Connector with Sumo Logic (as detailed in the .pdf documentation), you will be taken back to the “Collectors” tab where the newly added Script source will be listed.

[Screenshot: the Collectors tab listing the new Script source]

Once the Connector runs successfully and is importing event data into Sumo Logic, you will see Halo events such as the following appear in your Sumo Logic searches:

[Screenshot: Halo events appearing in Sumo Logic search results]

Try it out today – we are eager to hear your feedback! We hope that integrating these two tools makes your server security automation even more powerful.

Sanjay Sarathy, CMO

Universal Collection of Machine Data

04.18.2013 | Posted by Sanjay Sarathy, CMO

Customers love flexibility, especially if that flexibility drives additional business value.  In that vein, today we announced an expansion of our log data collection capabilities with hosted HTTPS and Amazon S3 collectors that eliminate the need for any local software installation.  There may be a variety of reasons why you don’t want or can’t have local collectors: for example, not having access to the underlying infrastructure, as often happens with Infrastructure-as-a-Service (IaaS) environments.  Or you simply don’t feel like deploying any local software into your current infrastructure.  Defining these hosted collectors is now baked into the set-up process, whether you’re using Sumo Logic Free or our Enterprise product.
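For example, once a hosted HTTP source has been created, getting a log message into Sumo Logic is a single request against the source’s unique URL (the token at the end is a placeholder):

curl -X POST -d "my first hosted-collector log line" https://collectors.sumologic.com/receiver/v1/http/<unique URL string>  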

 

 

With these new capabilities, companies can now unify how they collect and analyze log data generated from private clouds, public clouds, and their on-premise infrastructure.  They can then apply our unique analytics capabilities like LogReduce to generate insight across every relevant application and operational tier.

With companies increasingly moving towards the Cloud to power different parts of their business, it’s imperative that they have the necessary means to troubleshoot and monitor their diverse infrastructure.  Sumo Logic provides that flexibility.

Ben Newton, Corporate Sales Engineering Manager

Harder, Better, Faster, Stronger – Machine Data Analytics and DevOps

03.28.2013 | Posted by Ben Newton, Corporate Sales Engineering Manager

Work It Harder, Make It Better

Do It Faster, Makes Us Stronger

More Than Ever Hour After

Our Work Is Never Over

     Daft Punk – “Harder, Better, Faster, Stronger”

 

When trying to explain the essence of DevOps to colleagues last week, I found myself unwittingly quoting the kings of electronica, the French duo Daft Punk (and Kanye West, who sampled the song in “Stronger”). So often, I find the “spirit” of DevOps being reduced to mere automation, the takeover of Ops by Dev (or vice versa), or other over-simplifications. This is natural for any new, potentially over-hyped trend. But how do we capture the DevOps “essence” – programmable architecture, agile development, and lean methodology – in a few words? It seems like those short lyrics really capture the flexible, agile, constantly improving ideal of a DevOps “team”, and the continuous improvement spirit of lean and agile methodology.

So, what does this have to do with machine data analytics and Sumo Logic? Part of the DevOps revolution is a deep and wrenching re-evaluation of the state of IT Operations tools. As the pace of technological change and ferocity of competition keep increasing for any company daring to make money on the Internet (which is almost everybody at this point), the IT departments are facing a difficult problem. Do they try to adapt the process-heavy, top-down approaches as exemplified by ITIL, or do they embrace a state of constant change that is DevOps?  In the DevOps model, the explosion of creativity that comes with unleashing your development and operations teams to innovate quickly overwhelms traditional, static tools. More fundamentally, the continuous improvement model of agile development and DevOps is only as good as the metrics used to measure success. So, the most successful DevOps teams are incredibly data hungry. And this is where machine data analytics, and Sumo Logic in particular, really comes into its own, and is fundamentally in tune with the DevOps approach.

 

1.  Let the data speak for itself

Unlike the management tools of the past, Sumo Logic makes only basic assumptions about the data being consumed (time stamped, text-based, etc.). The important patterns are determined by the data itself, and not by pre-judging what patterns are relevant, and which are not. This means that as the application rapidly changes, Sumo Logic can detect new patterns – both good and ill – that would escape the inflexible tools of the past.

2.  Continuous reinterpretation

Sumo Logic never tries to force the machine data into tired old buckets that are forever out of date. The data is stored raw so that it can continually be reinterpreted and re-parsed to reveal new meaning. Fast moving DevOps teams can’t wait for the stodgy software vendor to change their code or send their consultant onsite. They need it now.

3. Any metric you want, any time you want it

The power of the new DevOps approach to management is that the people who know the app best, the developers, are producing the metrics needed to keep the app humming. This seems obvious in retrospect, yet very few performance management vendors support this kind of flexibility. It is much easier for developers to throw more data at Sumo Logic by outputting more data to the logs than to integrate with management tools. The extra insight that this detailed, highly specific data can provide into your customers’ experience and the operation of your applications is truly groundbreaking.
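As a trivial, invented illustration: emitting a new metric can be as simple as writing one more key/value line to the application log (here via the syslog logger utility), which can then be parsed and charted at search time rather than wired into a separate management tool:

logger -t checkout "event=checkout_complete user=42 latency_ms=123"  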

4. Set the data free

Free-flow of data is the new norm, and mash-ups provide the most useful metrics. Specifically, pulling business data from outside of the machine data context allows you to put it in the proper perspective. We do this extensively at Sumo Logic with our own APIs, and it allows us to view our customers as more than nameless organization ID numbers. DevOps is driven by the need to keep customers happy.

5. Develop DevOps applications, not DevOps tools

The IT Software industry has fundamentally failed its customers. In general, IT software is badly written, buggy, hard to use, costly to maintain, and inflexible. Is it any wonder that the top DevOps shops overwhelmingly use open source tools and write much of the logic themselves?! Sumo Logic allows DevOps teams the flexibility and access to get the data they need when they need it, without forcing them into a paradigm that has no relevance for them. And why should DevOps teams even be managing the tools they use? It is no longer acceptable to spend months with vendor consultants, and then maintain extra staff and hardware to run a tool. DevOps teams should be able to do what they are good at – developing, releasing, and operating their apps, while the vendors should take the burden of tool management off their shoulders.

 

The IT industry is changing fast, and DevOps teams need tools that can keep up with the pace – and make their job easier, not more difficult. Sumo Logic is excited to be in the forefront of that trend. Sign up for Sumo Logic Free and prove it out for yourself.
