
Mike Cook

Why TuneIn Chose Sumo Logic For Machine Data Analytics

09.15.2014 | Posted by Mike Cook

The following is a guest post from Mike Cook, Director of Technical Operations at TuneIn.

Introduction

TuneIn is a rapidly growing service that allows consumers to listen to over 100,000 radio stations and more than four million podcasts from every continent. During the recently held Soccer World Cup, over 10.5 million people listened live to the games on radio stations streamed via TuneIn, making it one of the biggest events in our company's history.

The State of Machine Data Analytics, pre-Sumo Logic

We had no consolidated strategy for analyzing our logs across systems and applications. As a result, and especially because of our rapid growth, troubleshooting and event correlation became manual, increasingly painful affairs that involved looking at individual server and application logs. We tried internal tools including syslog-ng and rsyslog, but the maintenance and overhead costs for a lean IT team were too high.

Why Sumo Logic?

There were a number of reasons why Sumo Logic was appealing to us:

  • As a cloud-based service, Sumo Logic immediately offloaded the maintenance burden of running our own home-grown log management infrastructure.
  • With pre-built support for a number of infrastructure components that TuneIn runs on, including AWS, Cisco, VMware, Windows/IIS and Linux, we were confident that we could get insights around our logs far faster than other options.
  • In addition, the Sumo Logic LogReduce technology provided a more robust investigation tool to find root causes of issues that traditional monitoring tools just can’t detect.
  • Finally, Sumo Logic provides compelling business value for what we are trying to accomplish.

Internal Adoption of Sumo Logic

We started with creating basic dashboards and alerts around our key operating system logs. As the application development teams realized the value of what Sumo Logic could provide them, we added additional log sources and launched a series of lunch-and-learns to demonstrate the value of the service. These lunch-and-learns have rapidly broadened the adoption of Sumo Logic across TuneIn. We’re now seeing support teams using it to get customer statistics; different development teams using it to analyze API performance; and the executive team getting overall visibility through real-time dashboards.

Business Benefits

It didn't take us long to see the benefit of Sumo Logic. Since TuneIn is a distributed PaaS, it was frequently difficult to correlate and troubleshoot issues in a particular API. Time to resolution began to drop from several hours to several minutes as developer and operations staff could search and pinpoint issues. Our development teams were quick to create custom searches and alerts for errors and exceptions in the logs, allowing us to reduce overall error rates by close to 20%. Without Sumo Logic we wouldn't have even known most of those errors were occurring. We're just scratching the surface with Sumo Logic. We continue to expand our usage and gain critical insight into API performance, what our most popular user activities are, and where our application bottlenecks are. Sumo Logic isn't just a log consolidation tool; it also serves as a critical tool in our Business Intelligence toolbox.

Dwayne Hoover, Senior Sales Engineer

Four Ways to Collect Docker Logs in Sumo Logic

09.03.2014 | Posted by Dwayne Hoover, Senior Sales Engineer

Docker is incredibly popular right now and is changing the way sysadmins, developers, and engineers go about their day-to-day lives.  This walkthrough isn't going to introduce you to Docker; I'll assume that you know what it is and how to use it.  If not, don't worry, here is a great place to start: What is Docker?

With Sumo Logic's firm entrenchment in the DevOps and cloud culture, it's no surprise that a large number of our customers are utilizing Docker.  Some are already pushing their Docker-related logs and machine data to Sumo Logic; others have reached out with some interesting use cases, asking for guidance and best practices on getting their valuable machine data into Sumo Logic.  This post will walk you through a few scenarios, each of which ultimately results in your machine data landing in one centralized location for troubleshooting and monitoring.

Four (OK, Five) Ways to Push Your Logs from Docker into Sumo Logic

  1. Install a Sumo Logic collector per container
  2. Collect local JSON logs from the host
  3. Stream the live logs from your container to a Sumo Logic HTTP endpoint
  4. Bind your containers to a volume on the host and collect from there
  5. *BONUS* Install a collector running in a helper container

Install a Sumo Logic Collector per Container

Before the Docker purists jump all over this one, let me clarify that I'm fully aware this method may invoke some heated debate.  If you believe in strictly one application or service per container, skip to the next method.  Otherwise, thanks for sticking around.

In this approach, we will install an ephemeral collector as part of a container build, deploy a sumo.conf file to provide authentication (and other information) and optionally, deploy a sources.json file to provide paths to log files.

Here is an example Dockerfile to accomplish this task.  Note: I deployed MongoDB as an example application, but this could represent any application; just modify accordingly.

# Sumo Logic docker
# VERSION 0.3
# Adjusted to follow approach outlined here:
# http://paislee.io/how-to-add-sumologic-to-a-dockerized-app/
FROM phusion/baseimage:latest
MAINTAINER Dwayne Hoover
ADD https://collectors.sumologic.com/rest/download/deb/64 /tmp/collector.deb
ADD sumo.conf /etc/sumo.conf
# ADD sources.json /etc/sources.json

# ensure that the collector gets started when you launch
ADD start_sumo /etc/my_init.d/start_sumo
RUN chmod 755 /etc/my_init.d/start_sumo

# this start_sumo file should look something like this (minus the comments)
# #!/bin/bash
# service collector start

# install deb
RUN dpkg -i /tmp/collector.deb

# let’s install something
# put your own application here
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
RUN echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | tee /etc/apt/sources.list.d/mongodb.list
RUN apt-get update
RUN apt-get install -y -q mongodb-org
# Create the MongoDB data directory
RUN mkdir -p /data/db
EXPOSE 27017

# ensure that mongodb gets started when you launch
ADD start_mongo /etc/my_init.d/start_mongo
RUN chmod 755 /etc/my_init.d/start_mongo

CMD /sbin/my_init

It's worth noting that the container will start the services specified in the scripts deployed to your /etc/my_init.d directory by issuing the /sbin/my_init command at container launch.  Unless specified, logs will go straight to STDOUT and will not be available via a file path source in the Sumo Logic UI using the collector that you just installed.  However, you can add logs from /var/log just as if it were a standard Linux image:

[Screenshot: docker source 1]

Let’s add the Sumo Logic collector logs as well:

[Screenshot: docker source 2]

And now, we have log data flowing in:

[Screenshot: docker collectors status]

Simply add the additional paths that you are interested in gathering machine data from and it will flow into your Sumo Logic account while the container is running. 
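For reference, the optional sources.json mentioned above (commented out in the example Dockerfile) can pre-register file sources at install time so you don't have to add them through the UI.  Here is a minimal sketch; the field names follow Sumo Logic's JSON source configuration as I understand it, and the MongoDB path is purely illustrative, so check the current documentation before relying on it:

{
  "api.version": "v1",
  "sources": [
    {
      "sourceType": "LocalFile",
      "name": "mongodb-logs",
      "pathExpression": "/var/log/mongodb/*.log"
    }
  ]
}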

Collect Local JSON Logs From the Docker Host

If you would like to see all of the log data from all of your containers, it's as easy as installing a collector on the Docker host and pointing it to the directory containing the container logs.  Follow the instructions here to deploy a collector.

Once installed and registered, point it to the Docker logs:

docker source local

The default path for the Docker logs is: /var/lib/docker/containers/[container id]/[container-id]-json.log

These logs are in JSON format and are easily parsed using our anchor based parse or JSON operator.  For more information, see the following:

http://help.sumologic.com/Help/Default.htm#Parse__Anchor_.htm
http://help.sumologic.com/Help/Default.htm#JSON_Operator.htm
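For reference, a single line in one of those json-file logs generally looks like the sketch below (based on Docker's json-file logging driver; the log content itself is made up):

{"log":"waiting for connections on port 27017\n","stream":"stdout","time":"2014-09-03T17:02:01.004838749Z"}

A minimal search using the JSON operator might then look like the following; the source category here is an assumption, so substitute whatever metadata you assigned to the source:

_sourceCategory=docker/containers
| json "log", "stream", "time"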

Stream the Live Logs From a Container to a Sumo Logic HTTP Endpoint

If you don’t want to install a collector, you can use a “Hosted Collector” and push your log messages from a container to your unique HTTP endpoint.  This is done on the Docker host and utilizes the “docker logs” command.

First, configure a hosted collector with an HTTP Source:
http://help.sumologic.com/Help/Default.htm#Setting_up_a_Hosted_Collector.htm
http://help.sumologic.com/Help/Default.htm#Configuring_an_HTTP_Source.htm

[Screenshot: Add New Collector]

[Screenshot: HTTP Source]

The basic steps here are to choose "Add Collector", then "Hosted Collector", then "Add Source" and select HTTP.  Once the source metadata is entered, you will be provided a unique URL that can accept incoming data via HTTP POST.  Scripting a basic wrapper for curl will do the trick.  Here is an example:

#!/bin/bash
URL="https://collectors.sumologic.com/receiver/v1/http/[your unique URL]"
while read data;
do
     curl --data "$data" "$URL"
done

Edit the URL variable to match the unique URL you obtained when setting up the HTTP source. 
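Before wiring this up to docker logs, it's worth sanity-checking the endpoint with a one-off POST.  A quick sketch (substitute your own unique URL):

# send a single test message to the hosted HTTP source
echo "test message from $(hostname)" | curl -X POST --data-binary @- "https://collectors.sumologic.com/receiver/v1/http/[your unique URL]"

If the message shows up in a search against that source within a few minutes, the endpoint is good to go.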

Using the docker logs command, you can follow the STDOUT data that your container is generating.  Assuming that we named our curl wrapper “watch-docker.sh” and we are running a container named “mongo_instance_002” we could issue a command like:

nohup sh -c 'docker logs --follow=true --timestamps=true mongo_instance_002 | ./watch-docker.sh' &

 While the container is running, its output will be sent directly to Sumo Logic.  Note that “follow” is set to true.  This ensures that we are collecting a live stream of the STDOUT data from the container.  I’ve also chosen to turn on timestamps so that Sumo Logic will use the timestamp from the docker logs command when the data is indexed. 

For more information on docker logs, go here.

Bind Your Containers to a Volume on the Host and Collect There

Docker provides the ability to mount a volume on the host to the container.  Without getting into too much detail about how to achieve this (different methods may be applicable for your needs), check out the Docker documentation here:

https://docs.docker.com/userguide/dockervolumes/

Assuming that your dockerized applications are logging to a predefined directory and that directory is mounted to the host filesystem, you can install a Sumo Logic collector on your Docker host and create a new source that monitors the shared volume that you are logging to.
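As a concrete sketch (the image name and host path here are illustrative, not something from an existing setup):

# run a container with its log directory bind-mounted to a host path
docker run -d --name mongo_instance_002 \
  -v /var/log/docker-volumes/mongo:/var/log/mongodb \
  my_mongo_image

# then add a Local File source on the Docker host that watches the mounted path,
# e.g. /var/log/docker-volumes/mongo/*

The collector never has to touch the container itself; it just tails the files that appear on the host side of the mount.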

Again, the instructions for setting up a collector and sources can be found here.

There are some concerns that this could open up a potential attack vector if your container is compromised, so please ensure that the necessary security measures have been put into place.  For internal use cases on an already secured environment, this is likely a non-factor.

*BONUS* Install a Collector Running in a “Helper” Container

Borrowing from Caleb Sotelo's post, it's possible to stick to the "one process, one container" mentality and run a Sumo Logic collector within its own Docker container.

You can create a container that only contains a Sumo Logic collector.  Assuming that this container has the ability to see other containers in your Docker environment, you can configure data sources to pull logs from other containers.  This can be done over SSH using remote sources in Sumo Logic or by utilizing shared volumes and using Local File sources.
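One way to wire up the shared-volume variant is Docker's --volumes-from flag; the container and image names below are hypothetical:

# the app container exposes a volume for its logs
docker run -d --name app01 -v /var/log/myapp my_app_image

# a collector-only "helper" container mounts the app container's volumes
docker run -d --name sumo_helper --volumes-from app01 my_sumo_collector_image

The collector inside sumo_helper can then be configured with Local File sources pointing at /var/log/myapp.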

Setting up remote sources requires SSH access either via a shared key or username/password combo.  Here is some additional detail on setting up remote sources.

Please note, this is dependent upon the configuration of your Docker environment supporting communication between containers via SSH or shared volumes and is a bit out of scope for this walkthrough.  For some additional information, check these out:

  • Docker's instructions on setting up volumes
  • The phusion baseimage that includes the SSH daemon and server

Caleb Sotelo

How To Add Sumo Logic To A Dockerized App

08.27.2014 | Posted by Caleb Sotelo

This is a guest post from Caleb Sotelo who is a software engineer at OpenX and has been reprinted with his permission.  You can view the original here.  

[Image: Utagawa Kuniyoshi, "The Sumo Wrestler"]

Sumo Logic is a nifty service that makes it easy to watch and analyze your apps' logs. You install a collector local to the app in question, point it at some files to watch, and your logs are sent to the cloud, where they can be sliced and diced from a nice web UI.

At a high level, Sumo Logic wants to transform big data logs into operational information. But at a practical level, it’s nice not to have to SSH to a production machine and tail or grep enormous files to take your app’s pulse. Not to mention the idea is totally in line with one of the Twelve Factors: treating logs as event streams enables better understanding of an app’s behavior over time.

I've had success using Sumo Logic at OpenX, so I wanted to give their free tier a shot for a personal project. The only limitations are 500MB of data per day and 7 days of retention. I was surprised not to find anything on the web about installing Sumo Logic alongside a Dockerized app, and I had a couple of Docker-based candidates. So without further ado, here's how to add Sumo Logic to a Dockerized app:

1. Sign up for Sumo Logic Free

Head over to sumologic.com/signup to sign up. The only catch here is that you’ll need a company email address. For the project I’m going to use SumoLogic for, I own and manage my own domain, so it wasn’t too much trouble to create an email address using my registrar’s mail service. Since I host the domain separately, I did have to add an MX record to my zone file to point to the registrar’s mail server. For example, with DigitalOcean.

2. Download a Collector

Once you confirm your email address and log in, you'll be stepped through a process for downloading and installing a collector. I chose the Installed Collector and downloaded sumocollector_19.91-2_amd64.deb, because my Docker image is based on Ubuntu.

[Screenshot: sumo_logic_choose_collector]

After downloading the collector, the setup wizard proceeds to a screen that spins until it detects a newly installed collector. I didn’t yet know how I was going to install it, and I got logged out of Sumo Logic anyway due to inactivity, so I abandoned the wizard at that point. The Sumo Logic UI changed itself as soon as it detected that my first collector had been installed.

  • As I plan to install the Sumo Logic collector during the docker build process, I uploaded the .deb file to a Dropbox and grabbed the public link to use later.

3. Create Access Keys

When a collector client is installed it has to have some way of authenticating to the Sumo Logic server. The docs for creating a sumo.conf file (we'll get there soon) offer two choices: (1) provide your Sumo Logic email and password, or (2) provide access keys generated from the UI. The latter is recommended if only to avoid storing a username/password in plaintext. Keys can be generated from Manage → Collectors → Access Keys → Create.

4. Augment your Docker Container

Here’s the Docker-specific part of installing Sumo Logic. We’ll add some lines to our app’s Dockerfile and author two files that are ADDed to the container during a docker build. I assume working knowledge of Docker, but here is the list of Dockerfile commands for good measure.

4.1 Create sumo.conf

First create a sumo.conf file like the following:

name={collector_name}  
accessid={your_access_id}  
accesskey={your_access_key}  

 

where name is an arbitrary name for this collector, and accessid and accesskey are those generated in step 3. There are many more conf options specified here but the important ones, namely sources, can actually be configured through the UI later on.

By convention I put Docker-specific files into .docker/{resource}, so this one goes to .docker/sumo/sumo.conf. It’ll be referenced in our Dockerfile shortly.

4.2 Modify your Dockerfile

Add a block like the following to your Dockerfile (assumed to live in the root of your app’s code), preferably before your actual app is added:

# install sumologic
RUN apt-get -qq update  
RUN apt-get install -y wget  
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb  
RUN dpkg -i sumocollector_19.91-2_amd64.deb  
RUN rm sumocollector_19.91-2_amd64.deb  
ADD .docker/sumo/sumo.conf /etc/sumo.conf  
ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo  

 

Let’s break this down:

RUN apt-get -qq update  

Update sources. This may not be necessary, but I like to put this before each dependency installed by my Dockerfile to avoid issues with image caching.

RUN apt-get install -y wget  
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb  

We’ll use wget to grab the collector file we uploaded in step 2. You may opt to ADD the file locally, but this option avoids having to check the resource into your app’s source code, while housing it in a consistent location. Better practice would be to store it in some kind of artifact repository and version it.

RUN dpkg -i sumocollector_19.91-2_amd64.deb  
RUN rm sumocollector_19.91-2_amd64.deb  

Install the debian package and clean up.

ADD .docker/sumo/sumo.conf /etc/sumo.conf  

Copy the newly created sumo.conf file to the place where the collector expects to find it.

Before we get to the last line, let’s pause. If you were able to catch the output from installing the collector, you saw something like:

Preparing to unpack sumocollector_19.91-2_amd64.deb ...
Unpacking sumocollector (1:19.91-2) ...
Setting up sumocollector (1:19.91-2) ...
configuring collector....
configuring collector to run as root
Detected Ubuntu:
Installing the SumoLogic Collector daemon using init.d..
 Adding system startup for /etc/init.d/collector ...
   /etc/rc0.d/K20collector -> ../init.d/collector
   /etc/rc1.d/K20collector -> ../init.d/collector
   /etc/rc6.d/K20collector -> ../init.d/collector
   /etc/rc2.d/S20collector -> ../init.d/collector
   /etc/rc3.d/S20collector -> ../init.d/collector
   /etc/rc4.d/S20collector -> ../init.d/collector
   /etc/rc5.d/S20collector -> ../init.d/collector
Collector has been successfully installed. Please provide account credential in /etc/sumo.conf and start it up via service or init.d script!

 

It was only after sifting through my docker output that I saw this and learned about the existence of a sumo.conf file. Before that, nothing was happening in the Sumo Logic UI because no collector had been correctly installed and started, even when I started the container. Anyway, we got /etc/sumo.conf out of the way, so what about starting it up “via service or init.d script”?

My solution was to include a simple bash script that starts the collector service on startup. But my Dockerfile extends phusion/baseimage-docker, which uses a custom init system. So the last Dockerfile command,

ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo  

adds a file called start_sumo like:

#!/bin/bash
service collector start  

into /etc/my_init.d. Make sure it’s executable with chmod +x. Like the conf file, this is saved into .docker/sumo/start_sumo of the app code repository.
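For example, from the root of the app repository before building the image:

chmod +x .docker/sumo/start_sumo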

I am very open to more elegant ways for getting the Sumo Logic collector to start. I’d also like to see how non-baseimage users deal with init requirements. I would have done this as a runit script as recommended by the baseimage-docker README, but the collector script appears to automatically daemonize itself, which breaks runit.

5. Build and Deploy!

I ran docker build and docker run as usual, and voilà!, the newly installed collector popped up in Manage → Collectors.

6. Configure Sources

Before we start seeing logs, we have to tell Sumo what a log file is. I clicked Manage → Collectors → Add → Add Source and added a Local File entry that had the absolute path to a log file I was interested in. One of the Sumo Logic videos I watched noted that specifying /path/to/log/dir/** will pick up all log files in a directory.

[Screenshot: sumo_logic_manage_collectors]

I waited a couple of minutes, and log messages started coming into the UI. Sweet! Keep in mind that multiple sources can be added for a single collector.


So far, I've learned that I can get a bird's eye view of all my logs from Manage → Status, and look at actual log messages from Search. I haven't spent time really getting to know the various queries yet, but if they're worth writing about, expect another post.

Possible Improvement: The above example installs Sumo Logic inside the app container. An alternate approach might have Sumo installed on the host (or in its own Docker container), reading log files from a shared data volume. This has the benefits of (1) requiring only a single Sumo Logic install for potentially more than one app container, and (2) architectural separation of app from log consumption.

That's it! This turned out to be surprisingly simple. Kudos to Sumo Logic for offering an easy-to-use service plus a free tier that's totally feasible for smallish apps.

Sanjay Sarathy, CMO

Machine Data Analytics, Down Under

08.20.2014 | Posted by Sanjay Sarathy, CMO

Not often have I spent two weeks in August in a “winter” climate, but it was a great opportunity to spend some time with our new team in Australia, visit with prospects, customers and partners, and attend a couple of Amazon Web Service Summits to boot.  

Here are some straight-off-the-plane observations.

A Local “Data Center” Presence Matters:  We now have production instances in Sydney, Dublin and the United States.  In conversations with Australian enterprises and government entities, the fact that we have both a local team and a local production instance went extremely far when determining whether we were a good match for their needs.  This was true whether their use case centered around supporting their security initiatives or enabling their DevOps teams to release applications faster to market.  You can now select where your data resides when you sign up for Sumo Logic Free.

Australia is Ready For the Cloud:  From the smallest startup to extremely large mining companies, everyone was interested in how we could support their cloud initiatives.  The AWS Summits were packed and the conversations we had revolved not just around machine data analytics but what we could do to support their evolving infrastructure strategy.  The fact that we have apps for Amazon S3, Cloudfront, CloudTrail and ELB made the conversations even more productive, and we’ve seen significant interest in our special trial for AWS customers.  

We're A Natural Fit for Managed Service Providers:  As a multi-tenant service born in the Cloud, we have a slew of advantages for MSPs and MSSPs looking to embed proactive analytics into their service offerings, as our work with The Herjavec Group and Medidata shows.  We've had success with multiple partners in the US, and the many discussions we had in Australia indicate that there's a very interesting partner opportunity there as well.

Analytics and Time to Insights:  In my conversations with dozens of people at the two summits and in 1-1 meetings, two trends immediately stood out.  While people remain extremely interested in how they can take advantage of real-time dashboards and alerts, one of their bigger concerns typically revolved around how quickly they could get to that point.  "I don't have time to do a lot of infrastructure management" was the common refrain, and we certainly empathize with that thought.  The second is just a reflection on how we sometimes take for granted our pattern recognition technology, aka LogReduce.  Having shown this to quite a few people at the booth, the reaction on their faces never gets old, especially after they see the order of magnitude by which we reduce the time taken to find something interesting in their machine data.

At the end of the day, this is a people business.  We have a great team in Australia and look forward to publicizing their many successes over the coming quarters.


Dwayne Hoover, Senior Sales Engineer

Pushing AWS RDS Logs to Sumo Logic

07.28.2014 | Posted by Dwayne Hoover, Senior Sales Engineer

Collecting log data from Amazon RDS instances can be done through a hosted HTTP collector.  There is some configuration required to make this happen, but once the foundation is built, this can be a seamless integration from RDS to Sumo Logic.

Required Configuration:

Install the AWS RDS Command Line Tools and Configure Access:

http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference/StartCLI.html

This tutorial was performed on a Linux-based EC2 machine; for detailed instructions on Windows, please refer to the documentation in the link above.

  1. Obtain the command line tools
    wget http://s3.amazonaws.com/rds-downloads/RDSCli.zip

  2. Copy the zip file to the desired installation path and unzip

  3. Set up the following environment variables (these might look different on your system; refer to the documentation for additional detail)
    export AWS_RDS_HOME=/home/ec2-user/RDSCli-1.15.001/
    export PATH=$PATH:$AWS_RDS_HOME/bin
    export JAVA_HOME=/usr/lib/jvm/jre

  4. Set up the proper credentials for RDS access by entering your access keys in the credential file below (a sketch of its contents appears after this list):
    $AWS_RDS_HOME/credential-file-path.template
    For detailed instructions for RDS access, please see (Providing Credentials for the Tools): http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference/StartCLI.html
    You must also be sure that the user account interacting with RDS has the proper permissions configured in IAM: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAM.html

  5. Verify by issuing the following command

    $ rds-describe-db-log-files <rds instance name here>
  6. If a list of the available log files is returned, you are ready to push the data into Sumo Logic.
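The credential file referenced in step 4 is just a two-line properties file.  Here is a minimal sketch; the exact placeholder text in the template differs, and AWS_CREDENTIAL_FILE is how the classic AWS command line tools typically locate it, so verify both against the documentation linked above:

# contents of $AWS_RDS_HOME/credential-file-path.template (fill in your own keys)
AWSAccessKeyId=<your access key id>
AWSSecretKey=<your secret access key>

# point the RDS tools at the file (assumption: the classic tools read AWS_CREDENTIAL_FILE)
export AWS_CREDENTIAL_FILE=$AWS_RDS_HOME/credential-file-path.template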

Set Up a Sumo Logic Hosted HTTP Collector and Source:

http://help.sumologic.com/Help/Default.htm#Setting_up_a_Hosted_Collector.htm

http://help.sumologic.com/Help/Default.htm#Configuring_an_HTTP_Source.htm

  1. Log in to Sumo Logic and select Add Collector

  2. Choose Hosted Collector, Name it and Select OK when asked if you would like to add a data source:

  3. Select HTTP:

  4. Give the source a name and fill out relevant metadata.  Also configure the options for timestamp parsing and multi line settings:

  5. Upon saving the new source, you will be provided with a unique URL.  This is the endpoint to which you will push the AWS RDS logs:

 

Collecting Logs from RDS and Pushing them to Sumo Logic:

To list available log files for your RDS instance, issue the following command:

$ rds-describe-db-log-files <db instance name>

You can limit the list by date last written as follows (note, uses UNIX POSIX timestamp):

$ rds-describe-db-log-files <db instance name> --file-last-written 1395341819000
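Hypothetically, if you only want files written in the last hour, you can compute that timestamp on the fly; this sketch assumes GNU date and that the flag takes milliseconds, which matches the example value above:

$ rds-describe-db-log-files <db instance name> --file-last-written $(( $(date -d '1 hour ago' +%s) * 1000 ))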

Pushing logs manually to your newly configured HTTP endpoint can be done using curl.  In the following example, we are pulling one log file and pushing it to Sumo Logic:

$ rds-download-db-logfile orasumo --log-file-name trace\/alert_ORASUMO.log | curl -X POST -d @- https://collectors.sumologic.com/receiver/v1/http/redactedKEY

Note: the forward slash in the file name is escaped with a backslash, and the output of rds-download-db-logfile is piped into a curl command that posts the data to Sumo Logic.

Luckily, the RDS command line tools provide an option to continuously monitor log files for activity.  To use this feature for an HTTP push, you can do the following:

$ rds-watch-db-logfile sumopostgres --log-file-name error\/postgres.log | ./watch-rds.sh

Note that we are piping the output into a shell script.  The contents of our sample script can be seen below:

#! /bin/bash
URL="https://collectors.sumologic.com/receiver/v1/http/<unique URL string>"
while read data;
do
        curl --data "$data" $URL
done


This script will run until cancelled, so it is best to launch it in the background/nohup.

$ nohup sh -c 'rds-watch-db-logfile <your db instance name> --log-file-name <your db log file name> | ./watch-rds.sh' &
 

Installed Collector Alternative:

If you already have a Sumo Logic collector installed and can access your RDS logs from the command line utilities, simply piping the results from above to a local file and sending the log messages via the collector will also work.

$ rds-watch-db-logfile sumopostgres --log-file-name error\/postgres.log > /path/to/localfile.log

Where /path/to/localfile.log is a configured Sumo Logic source for the installed collector.
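To keep that running after you log out, the same nohup pattern from earlier applies; a sketch, substituting your own instance, log file, and path:

$ nohup sh -c 'rds-watch-db-logfile <your db instance name> --log-file-name <your db log file name> > /path/to/localfile.log' &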

Helpful links:

http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference//CLIReference-cmd-DescribeDBLogFiles.html

http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference//CLIReference-cmd-DownloadDBLogFilePortion.html

This article originally appeared on DwayneHoover.com

Bruno Kurtic, Founding Vice President of Product and Strategy

The New Era of Security – yeah, it’s that serious!

02.23.2014 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

Security is a tricky thing and it means different things to different people.   It is truly in the eye of the beholder.  There is the checkbox kind, there is the “real” kind, there is the checkbox kind that holds up, and there is the “real” kind that is circumvented, and so on.  Don’t kid yourself: the “absolute” kind does not exist. 

I want to talk about security solutions based on log data.  This is the kind of security that kicks in after the perimeter security (firewalls), intrusion detection (IDS/IPS), vulnerability scanners, and dozens of other security technologies have done their thing.  It ties all of these technologies together, correlates their events, reduces false positives and enables forensic investigation.  Sometimes this technology is called Log Management and/or Security Information and Event Management (SIEM).  I used to build these technologies years ago, but it seems like decades ago. 


A typical SIEM product is a hulking appliance with sharp edges and screaming colors – the kind of design that instills confidence and says "Don't come close, I WILL SHRED YOU! GRRRRRRRRRR".

Ahhhh, SIEM.  It makes you feel safe, doesn't it?  It should not.  I proclaim this at the risk of being yet another one of those guys who wants to rag on SIEM, but I built one, and beat many, so I feel I've got some ragging rights.  So, what's wrong with SIEM?  Where does it fall apart?

SIEM does not scale

It is hard enough to capture a terabyte of daily logs (40,000 Events Per Second, 3 Billion Events per Day) and store them.  It is a couple of orders of magnitude harder to run correlation in real time and alert when something bad happens.  SIEM tools are extraordinarily difficult to run at scales above 100GB of data per day.  This is because they are designed to scale by adding more CPU, memory, and fast spindles to the same box.  The exponential growth of data over the two decades since those SIEM tools were designed has outpaced the ability to add CPU, memory, and fast spindles to the box.

Result: Data growth outpaces capacity → Data dropped  from collection → Significant data dropped from correlation → Gap in analysis → Serious gap in security

SIEM normalization can’t keep pace

SIEM tools depend on normalization (shoehorning) of all data into one common schema so that you can write queries across all events.  That worked fifteen years ago when sources were few.  These days sources and infrastructure types are expanding like never before.  One enterprise might have multiple vendors and versions of network gear, many versions of operating systems, open source technologies, workloads running in infrastructure as a service (IaaS), and many custom written applications.  Writing normalizers to keep pace with changing log formats is not possible.

Result: Too many data types and versions → Falling behind on adding new sources → Reduced source support → Gaps in analysis → Serious gaps in security

SIEM is rule-only based

This is a tough one.  Rules are useful, even required, but not sufficient.  Rules only catch the thing you express in them, the things you know to look for.   To be secure, you must be ahead of new threats.  A million monkeys writing rules in real-time: not possible.

Result: Your rules are stale → You hire a million monkeys → Monkeys eat all your bananas → You analyze only a subset of relevant events → Serious gap in security

SIEM is too complex

[Image: duct-taped SIEM]

It is way too hard to run these things.  I've had too many meetings and discussions with my former customers on how to keep the damned things running and too few meetings on how to get value out of the fancy features we provided.  In reality, most customers get to use only 20% of the features because the rest of the stuff is not reachable.  It is like putting your best tools on a shelf just out of reach.  You can see them, you could do oh so much with them, but you can't really use them because they are out of reach.

Result: You spend a lot of money → Your team spends a lot of time running SIEM → They don’t succeed on leveraging the cool capabilities → Value is low → Gaps in analysis → Serious gaps in security   

So, what is an honest, forward-looking security professional who does not want to duct tape a solution to do?  What you need is what we just started: Sumo Logic Enterprise Security Analytics.  No, it is not absolute security, it is not checkbox security, but it is a more real security because it:

Scales

Processes terabytes of your data per day in real time.  Evaluates rules regardless of data volume and does not restrict what you collect or analyze.  Furthermore, there is no SIEM-style normalization; just add data, a pinch of savvy, a tablespoon of massively parallel compute, and voila.

Result: you add all relevant data → you analyze it all → you get better security 

Simple

It is SaaS, there are no appliances, there are no servers, there is no storage, there is just a browser connected to an elastic cloud.

Result: you don’t have to spend time on running it → you spend time on using it → you get more value → better analysis → better security

Machine Learning

Rules, check.  What about that other unknown stuff?  Answer: a machine that learns from data.  It detects patterns without human input.  It then figures out baselines and normal behavior across sources.  In real time it compares new data to the baseline and notifies you when things are sideways.  Even if "things" are things you've NEVER even thought about and NOBODY in the universe has EVER written a single rule to detect.  Sumo Logic detects those too.

Result: Skynet … nah, benevolent overlord, nah, not yet anyway.   New stuff happens → machines go to work → machines notify you → you provide feedback → machines learn and get smarter → bad things are detected → better security

Read more: Sumo Logic Enterprise Security Analytics

Vance Loiselle, CEO

Black Friday, Cyber Monday and Machine Data Intelligence

11.25.2013 | Posted by Vance Loiselle, CEO

The annual craze of getting up at 4am to either stand in line or shop online for the “best” holiday deals is upon us.  I know first-hand, because my daughter and I have participated in this ritual for the last four years (I know – what can I say – I grew up in Maine).  While we are at the stores fighting for product, many Americans will be either watching football, or surfing the web from the comfort of their couch looking for that too-good-to-be-true bargain.  And with data indicating a 50% jump in Black Friday and Cyber Monday deals this year, it’s incumbent on companies to ensure that user experiences are positive.  As a result, the leading companies are realizing the need to obtain visibility end-to-end across their applications and infrastructure, from the origin to the edge.  Insights from machine data (click-stream in the form of log data), generated from these environments, helps retailers of all stripes maximize these two critical days and the longer-term holiday shopping season.  

What are the critical user and application issues that CIOs should be thinking about in the context of these incredibly important shopping days?

  • User Behavior Insights. From an e-commerce perspective, companies can use log data to obtain detailed insights into how their customers are interacting with the application, what pages they visit, how long they stay, and the latency of specific transactions.  This helps companies, for example, correlate user behavior with the effectiveness of specific promotional strategies (coupons, etc) that allow them to rapidly make adjustments before the Holiday season ends.

  • The Elasticity of The Cloud.  If you’re going to have a problem, better it be one of “too much” rather than “too little”.  Too frequently, we hear of retail web sites going down during this critical time.  Why? The inability to handle peak demand – because often they don’t know what that demand will be.   Companies need to understand how to provision for the surge in customer interest on these prime shopping days that in turn deliver an exponential increase in the volume of log data.  The ability to provide the same level of performance at 2, 3 or even 10x usual volumes in a *cost-effective* fashion is a problem few companies have truly solved.  The ability of cloud-based architectures to easily load-balance and provision for customer surges at any time is critical to maintaining that ideal shopping experience while still delivering the operational insights needed to support customer SLAs.

  • Machine Learning for Machine Data. It’s difficult enough for companies to identify the root cause of an issue that they know something about.  Far more challenging for companies is getting insights into application issues that they know nothing about.  However, modern machine learning techniques provide enterprises with a way to proactively uncover the symptoms, all buried within the logs, that lead to these issues.  Moreover, machine learning eliminates the traditional requirement of users writing rules to identify anomalies, which by definition limit the ability to understand *all* the data.  We also believe that the best analytics combine machine learning with human knowledge about the data sets – what we call Machine Data Intelligence – and that helps companies quickly and proactively root out operational issues that limit revenue generation opportunities.

  • Security and Compliance Analytics.  With credit cards streaming across the internet in waves on this day, it’s imperative that you’ve already set up the necessary environment to both secure your site from fraudulent behavior and ensure your brand and reputation remain intact.  As I mentioned in a previous post, the notion of a perimeter has long since vanished which means companies need to understand that user interactions might occur across a variety of devices on a global basis.  The ability to proactively identify what is happening in real-time across your applications and the infrastructure on which they run is critical to your underlying security posture.  All this made possible by your logs and the insights they contain.  

Have a memorable shopping season and join me on twitter – @vanceloiselle – to continue the conversation.

Vance Loiselle, CEO

What CIOs (and I) Can Learn From Healthcare.gov

11.19.2013 | Posted by Vance Loiselle, CEO

There is little debate that the “Obamacare” rollout has been choppy at best.  Regardless of which side of the political debate you fall, many of us in technology, CIOs and software executives alike, can learn from this highly publicized initiative as we approach the November 30th deadline.  Development of new applications, especially web applications, is no longer just about the myopic focus on Design, Develop, Test, and Rollout.  The successful development and deployment of these applications must have a holistic, information-driven approach, which includes the following four key processes:

  1. Application Quality Analytics – the constant tracking of the errors, exceptions, and problems that are occurring in each new release of the code.
  2. Application Performance Analytics – the real-time measurement of the performance of the application as the users are experiencing it.
  3. Security Analytics – the real-time information required to analyze and conduct forensics on the security of the entire application and the infrastructure on which it runs.
  4. User Analytics – real-time insights on which users are in the application, what pages they are viewing, and the success they’ve had in conducting transactions in the application.

Application Quality Analytics – Is it really necessary that in the year 2013, development of applications still needs to be 4 parts art and 1 part science?  I'm sure that the Secretary of Health and Human Services, Kathleen Sebelius, wished it was more science when she testified in front of Congress about why the site was not ready.  She had no real-time information or metrics at her disposal about the number of defects being fixed each day, the number of errors being encountered by users, the severity of the errors, and the pace at which these errors and defects were being resolved.

These metrics are sitting there in the log files (data exhaust from all applications and software components to track what the software is doing), and are largely untapped by most development organizations.   Moreover, this data could be shared between multiple teams to pinpoint the root cause of problems between the application itself and the network and infrastructure on which it is running.  It was so frustrating to see CGI (the contractor hired to build the application) and Verizon (the contractor hired to host the application in their “cloud”) passing the buck between each other in front of Congress.

Application Performance Management – Much has been made about the performance of Healthcare.gov.  The HHS secretary even had the gall to say that the site had not crashed, it was just "performing slowly", while in the congressional hearing there was a live image on the screen informing users that the site was down.  The site was down AND performing slowly because the site's developers are stuck in a previous generation of thinking – that you can measure site performance without taking user analytics into account.  It's not good enough to measure application performance by sampling transaction response times periodically.  Testers and managers need access to real-time information about each user, the session they were running, the performance at each step, and the outcomes (e.g. new plan sign-up created or failed, 5 insurance plans compared, pricing returned from 2 out of 3 carriers, etc.) along the way.  Most monitoring tools look at just the application or just the network and infrastructure it runs on, and have little to no visibility into the outcomes the user is experiencing.  Guess what?  Most, if not all, of this information is sitting in the logs waiting to be harnessed.

Security Analytics – I appreciated when Secretary Sebelius was asked about what steps had been taken to ensure the privacy and security of the data that fellow Americans had submitted on Healthcare.gov.  The reality is that most IT organizations have very bifurcated organizations to address security and application development.  The old school view is that you put a web application firewall in place and you close down the ports, and your perimeter is safe.  The reality today is that there is no perimeter.  People have mobile phones and tablets and use third-party services to store their documents.  Healthcare.gov itself is dependent on 3rd parties (insurance carriers) to provide and receive private information.

The most effective way today to ensure some level of security is to have a real-time security analytics and forensics solution in place.  These solutions can scan every element of user and system activity, from – you guessed it – the log data, and determine everything from invalid logins and potential breaches to unauthorized changes to firewall rules and user permissions.     

User Analytics – Ok, I get it, the Obama administration did not want to share information about how many people had signed up on Healthcare.gov for weeks.  The argument was made that the accuracy of the data could not be trusted.  Either it was a political maneuver or incompetence, but either reason is sad in the year 2013.  And why do the White House and HHS have the right to keep this information a secret?  The American taxpayers are paying the hefty sum of $200M+ to get this application up and running.  Shouldn’t we know, in real-time, the traction and success of the Affordable Care Act?  It should be posted on the home page of the web site.  I guarantee the information about every enrollee, every signup – successful or failed – every county from which they logged in, every plan that was browsed, every price that was displayed, every carrier that’s providing quotes was AVAILABLE IN THE LOG DATA.

 There has been a lot of coverage about President Kennedy recently and we are reminded that he challenged our government to put people on the moon in the 1960s, and they did – with a very limited set of computer and software tools at their disposal.  I would ask the CIOs and software types out there, let’s learn from the Healthcare.gov rollout, and embrace a modern, information-driven approach to developing and rolling out applications.  And President Obama, if you need some help with this, give me a shout – I’m ready to serve.

Bruno Kurtic, Founding Vice President of Product and Strategy

Sumo Logic Application for AWS CloudTrail

11.13.2013 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

Cloud is opaque

One of the biggest adoption barriers of SaaS, PaaS, and IaaS is the opaqueness and lack of visibility into changes and activities that affect cloud infrastructure.  While running an on-premise infrastructure, you have the ability to audit activity; for example, you can easily tell who is starting and stopping VMs in virtualization clusters, see who is creating and deleting users, and watch who is making firewall configuration changes. This lack of visibility has been one of the main roadblocks to adoption, even though the benefits have been compelling enough for many enterprises to adopt the Cloud.

This information is critical to securing infrastructure, applications, and data. It’s critical to proving and maintaining compliance, critical to understanding utilization and cost, and finally, it’s critical for maintaining excellence in operations.

Not all Clouds are opaque any longer

Today, the world’s biggest cloud provider, Amazon Web Services (AWS),  announced a new product that, in combination with Sumo Logic, changes the game for cloud infrastructure audit visibility.  AWS CloudTrail is the raw log data feed that will tell you exactly who is doing what, on which sets of infrastructure, at what time, from which IP addresses, and more.  Sumo Logic is integrated with AWS CloudTrail and collects this audit data in real-time and enables SOC and NOC style visibility and analytics.

Here are a few examples of what AWS CloudTrail data contains:

  • Network ACL changes.

  • Creation and deletion of network interfaces.

  • Authorized Ingress/Egress across network segments and ports.

  • Changes to privileges, passwords and user profiles.

  • Deletion and creation of security groups.

  • Starting and terminating instances.

  • And much more.

Sumo Logic Application for AWS CloudTrail

Cloud data comes to life with our Sumo Logic Application for AWS CloudTrail, helping our customers across security and compliance, operational visibility, and cost containment. Sumo Logic Application for AWS CloudTrail delivers:

[Dashboard screenshot: User Activity]

  • Seamless integration with AWS CloudTrail data feed.

  • SOC-style, real-time Dashboards in order to monitor access and activity.

  • Forensic analysis to understand the “who, what, when, where, and how” of  events and logs.

  • Alerts when important activities and events occur.

  • Correlation of AWS CloudTrail data with other security data sets, such as intrusion detection system data, operating system events, application data, and more.

This integration delivers improved security posture and better compliance with internal and external regulations that protect your brand.  It also improves operational analytics that can improve SLAs and customer satisfaction.  Finally, it provides deep visibility into the utilization of AWS resources that can help improve efficiency and reduce cost.

The integration is simple: AWS CloudTrail deposits data in near-real time into your S3 account,  and Sumo Logic collects it as soon as it is deposited using an S3 Source.  Sumo Logic also provides a set of pre-built Dashboards and searches to analyze the CloudTrail Data.
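As an illustration of the kind of analysis this enables, here is a rough sketch of a search over CloudTrail events.  The source category is whatever you assigned to your S3 Source, and the field names come from CloudTrail's JSON record format; depending on how the records are parsed you may need json auto or different paths, so treat this as a starting point rather than a recipe:

_sourceCategory=aws/cloudtrail
| json "eventName", "eventSource", "sourceIPAddress", "awsRegion"
| count by eventName, sourceIPAddress
| sort by _count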

To learn more, click here for more details: http://www.sumologic.com/applications/aws-cloudtrail/ and read the documentation: https://support.sumologic.com/entries/30216746-Sumo-Logic-for-Amazon-CloudTrail-App.

Bruno Kurtic, Founding Vice President of Product and Strategy

Akamai and Sumo Logic integrate for real-time application insights!

10.09.2013 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

I’m very pleased to announce our strategic alliance with Akamai. Our integrated solution delivers a unified view of application availability, performance, security, and business analytics based on application log data.  Customers who rely on Akamai’s globally distributed infrastructure now can get the real-time feed of all logs generated by Akamai’s infrastructure into their Sumo Logic account in order to integrate and cross-analyze them with their internally generated application data sets!

What problems does the integrated solution solve?

To date, there have been two machine data sets generated by applications that leverage Akamai:

1. Application logs at the origin data centers, which application owners can usually access.

2. Logs generated by Akamai as an application is distributed globally. Application owners typically have zero or limited access to these logs.

Both of these data sets provide important metrics and insights for delivering highly-available, secure applications that also provide detailed view of business results. Until today there was no way to get these data sets into a single tool for real-time analysis, causing the following issues:

  • No single view of performance. Origin performance could be monitored, but that provided little confidence that the app was performant for end users.
  • Difficult to understand user interaction. Without data on how real users interacted with an application, it was difficult to gauge what content was served, how the app performed for those users, and whether performance had any impact on conversions.
  • Issues impacting customer experience remained hidden. The root cause of end-user issues  caused at the origin remained hidden, impacting customer experience for long periods of time.
  • Web App Firewall (WAF) security information not readily available. Security teams were not able to detect and respond to attacks in real-time and take defensive actions to minimize exposure.

The solution!

[Dashboard screenshot: Quality of Service]

Akamai Cloud Monitor and Sumo Logic provide an integrated approach to solving these problems. Sumo Logic has developed an application specifically crafted for customers to extract insights from their Akamai data, which is sent to Sumo Logic in real time.  The solution has been deployed by joint customers (at terabyte scale) to address the following use cases:

  • Real-time analytics about user behavior.  Combine Akamai real-user monitoring data and internal data sets to gain granular insights into user behavior. For example, learn how users behave across different device types, geographies, or even how Akamai quality of service impacts user behavior and business results.

  • Security information management and forensics. Security incidents and attacks on an application can be investigated by deep-diving into sessions, IP addresses, and individual URLs that attackers are attempting to exploit and breach.

  • Application performance management from edge to origin. Quickly determine if an application’s performance issue is caused by your origin or by Akamai’s infrastructure, and which regions, user agents, or devices are impacted.

  • Application release and quality management. Receive an alert as soon as Akamai detects that one or more origins have an elevated number of 4xx or 5xx errors that may be caused by new code push, configuration change, or another issue within your origin application infrastructure.

  • Impact of quality of service and operational excellence. Correlate how quality of service impacts conversions or other business metrics to optimize performance and drive better results.

I could go on, but I’m sure you have plenty of ideas of your own.

Join us for a free trial here – as always, there is nothing to install, nothing to manage, nothing to run – we do it all for you.  You can also read our announcement here or read more about the Sumo Logic application for Akamai here.  Take a look at the Akamai press release here.
