November 7, 2018 By Tanay Jha

Near Real-Time Log Collection From Amazon S3 Storage

We are very excited to announce a new capability for our Amazon S3 sources.

Until recently, the only method Sumo Logic used for discovering new data in an S3 bucket was periodic polling. However, with our new notification-based approach, users can now configure S3 sources such that Sumo Logic is notified immediately (via AWS SNS) whenever a new item is added to an S3 bucket, eliminating the need to wait for new objects to be discovered via periodic polling.

This capability is available today for all S3 sources, either new or pre-existing.

Benefits of Notification-Based Approach

Polling is still an effective approach in many cases. If your bucket is not very large, or is regularly groomed, polling could still be the right choice. It is also easier to set up, since you don’t need to configure SNS notifications. For very large buckets, however, a polling-based approach introduces significant lag. With the new notification-based approach, users get near real-time data and better reliability.
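To get a rough feel for why polling lag grows with bucket size, here is a back-of-the-envelope sketch. It assumes a poller that discovers objects by listing the whole bucket, which is an illustrative assumption and not a description of how Sumo Logic's poller is implemented. The only AWS fact it relies on is that a single S3 list request returns at most 1,000 keys. The function names and the per-request latency are hypothetical.

```python
import math

# An S3 list request (ListObjectsV2) returns at most 1,000 keys per call.
MAX_KEYS_PER_LIST_REQUEST = 1000


def list_requests_for_full_scan(object_count):
    """Number of list requests needed to enumerate every object once."""
    return math.ceil(object_count / MAX_KEYS_PER_LIST_REQUEST)


def full_scan_seconds(object_count, seconds_per_request):
    """Estimated time for one sequential full listing pass.

    seconds_per_request is a hypothetical figure you would measure yourself.
    """
    return list_requests_for_full_scan(object_count) * seconds_per_request
```

Under these assumptions, the number of list requests grows linearly with the number of objects, so a new object in a very large bucket may wait for a long listing pass before it is seen. A notification arrives independently of how many objects the bucket already holds.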

Near Real-time Data

With SNS notifications powering object discovery, Sumo Logic is always notified of new S3 objects within seconds, regardless of how many other objects are present in the bucket. This guarantees near real-time data collection from S3 buckets of any size. So with this approach, data from S3 buckets can be reliably used for real-time alerting.

Best of Both Worlds

Sumo Logic can now offer a unique hybrid approach that combines SNS notifications with a poller, providing the best of both worlds: speed and a data reliability guarantee.

This places us ahead of our competitors who are using just one of these methods. Maintaining a polling system as a backup provides two advantages:

  • Capability to discover and collect historical data already present in an S3 bucket.
  • Guarantee of 100 percent data discovery even in the event of a service disruption or missing SNS notifications. The polling system works as a backup in this case.
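The idea of combining notifications with a backup poller can be sketched as a small in-memory model. This is only an illustration of the general pattern, not Sumo Logic's implementation: the class and method names are hypothetical, and a real system would persist its state rather than keep it in a set.

```python
class HybridDiscovery:
    """Toy model: notifications are the fast path, polling is the safety net."""

    def __init__(self):
        self._seen = set()   # keys already discovered by either path
        self.queue = []      # keys waiting to be collected, in discovery order

    def _enqueue_once(self, key):
        # Deduplicate so an object found by both paths is collected only once.
        if key in self._seen:
            return False
        self._seen.add(key)
        self.queue.append(key)
        return True

    def on_notification(self, key):
        """Fast path: called when a notification announces a new object."""
        return self._enqueue_once(key)

    def on_poll(self, listed_keys):
        """Backup path: called with the keys returned by a periodic listing.

        Returns only the keys that the notification path had not delivered,
        such as historical objects or objects whose notification was missed.
        """
        return [key for key in listed_keys if self._enqueue_once(key)]
```

In this model, calling `on_notification("a.log")` and later `on_poll(["a.log", "b.log"])` queues `a.log` once and picks up `b.log` from the poll, which mirrors the two advantages listed above: historical data is discovered, and a missed notification does not mean missed data.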

Architecture

The above diagram explains how the notification-based approach works. The workflow can be summarized as: S3 bucket -> SNS topic -> SNS subscription -> Sumo Logic source.
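To make the last hop of that workflow concrete, here is a minimal sketch of what a subscriber to the SNS topic might do with an incoming message. It relies on the standard AWS formats: an SNS HTTP(S) delivery is a JSON envelope whose `Message` field is a string, and for S3 event notifications that string is itself JSON with a `Records` list. The function name is hypothetical, and this is not a description of what the Sumo Logic source does internally.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sns_body(body):
    """Return (bucket, key) pairs for newly created objects in an SNS delivery."""
    envelope = json.loads(body)

    # Other envelope types (for example SubscriptionConfirmation) carry no objects.
    if envelope.get("Type") != "Notification":
        return []

    # The S3 event is JSON encoded as a string inside the SNS envelope.
    s3_event = json.loads(envelope["Message"])

    found = []
    # S3 test events have no "Records" list, so default to an empty one.
    for record in s3_event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        found.append((bucket, key))
    return found
```

Because each message names the exact object that was created, the subscriber never has to list the bucket to find it.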

An easy way to create and configure these AWS resources is to download the CloudFormation template provided while creating the source in Sumo Logic. You can then upload the template in your AWS console, which completes most of the setup.

Setup

The notification-based discovery approach is easy to set up. The steps can be found on our documentation page.

Although SNS-based discovery is an opt-in capability, we recommend using it for all your S3 sources. It is just as easy to set up for your existing sources. A video showing the configuration steps can be found below:

Results

We used this feature internally to collect logs from our own production S3 buckets. We turned this feature on for the buckets containing ELB logs from various Sumo Logic deployments. After switching to notification-based discovery, we saw a drop in ingest delay from 8 minutes to 2 minutes. This drop in log collection latency has improved our agility when responding to production alerts and incidents. Our SRE teams are now able to identify issues more quickly and respond to them more proactively thanks to the reduction in delay.

We have heard the same story from our early-adopter customers. During our customer beta, we saw a similar reduction in S3 ingest lag for those customers. In one particular case, where the customer had a very large S3 bucket with hundreds of objects being added every second, we were able to reduce ingest lag from 7 hours to a couple of minutes.

Here is a graph showing the improvement in object discovery time for the above-mentioned customer:

Since making the new notification-based approach generally available for our customers, we’ve been seeing significant adoption. Want to try it out on your S3 sources? Log into the Sumo Logic platform today, and follow these steps.

If you don’t yet have a Sumo Logic account, you can sign up for a free trial today.

Tanay Jha

Tanay Jha is a software engineer at Sumo Logic, working with the data collection team to develop a reliable, scalable and efficient method to ingest machine data into the Sumo Logic platform. His interest lies in large scale distributed systems.
