
January 13, 2021 By Frank Reno

Embracing open source data collection

Open source has come a long way. One of my favorite reports on the subject is Red Hat’s State of Enterprise Open Source. For 2020, 95% of respondents said that open source is strategically important to their business needs.

Here, I'll recap my recent Illuminate presentation on embracing open source data collection. First, though, it's important to talk about how open source has changed.

Take projects like the open source container-orchestration system Kubernetes. They've been wildly successful, building massive communities and changing the way open source works in enterprise software.

At Sumo, we just passed our tenth year, and we’re migrating away from a lot of homegrown tooling into open source. When we started, some of the tools we needed just didn’t exist, so we had to create them ourselves. For example, we built a homegrown tool called DSH for cloud provisioning, and now we’re moving from that to Terraform as our default open source solution. We’re also moving from AWS EC2 to Kubernetes, from jmxtrans to Prometheus, and from our own Installed Collectors to open source technologies like Fluentd.

We’ve made it part of our work to help customers move to open source not only for provisioning and moving into different architectures, but for data collection as well.

Benefits of open source data collection

There are a number of key benefits in moving to open source data collection.

  1. Building on Open Standards

    The top benefit is that you’re building on open standards. There are a number of open standards and protocols for telemetry (OpenMetrics, OpenTracing, OpenAPM, Carbon, Graphite) that help unify how data is collected. Most collection agents support at least one of them.

  2. Comprehensive Data Collection

    Building on open standards gives you a very comprehensive set of data to collect from. Many open source agents have been built and expanded upon over the years to grow what they can integrate with, so you can collect all of your data without having to reinvent or build integrations from scratch. If you do have to build, open source often gives you a pluggable framework that you can use to get started very quickly.

  3. Flexibility

    You can pour your focus into your business and not data collection. Data collection is not an easy task, and being able to do it quickly on open standards gives you a lot of flexibility.

  4. Choice

    Open source data collection gives you the power of choice. As long as the tooling embraces open standards, then the tooling doesn't matter.
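
To make the point about choice concrete, here is a minimal sketch of one metric sample rendered in two open text formats: the Graphite plaintext line (`<path> <value> <timestamp>`) and a Prometheus-style exposition line. The `Metric` type, the helper names, and the sample values are assumptions made up for this illustration; they don't come from any agent's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """A vendor-neutral metric sample (hypothetical type for this sketch)."""
    name: str
    value: float
    timestamp: int  # Unix time in seconds
    labels: dict = field(default_factory=dict)


def to_graphite(metric: Metric) -> str:
    """Render as a Graphite plaintext line: '<path> <value> <timestamp>'."""
    # Label values are folded into the dotted path, sorted by key for stability.
    parts = [metric.labels[key] for key in sorted(metric.labels)] + [metric.name]
    return f"{'.'.join(parts)} {metric.value} {metric.timestamp}"


def to_prometheus(metric: Metric) -> str:
    """Render as a Prometheus text exposition line: 'name{k="v"} value'."""
    labels = ",".join(f'{k}="{v}"' for k, v in sorted(metric.labels.items()))
    label_part = f"{{{labels}}}" if labels else ""
    return f"{metric.name}{label_part} {metric.value}"


# Hypothetical sample: the same data, two open formats.
sample = Metric("cpu_usage", 0.75, 1610496000, {"host": "web01"})
graphite_line = to_graphite(sample)
prometheus_line = to_prometheus(sample)
print(graphite_line)
print(prometheus_line)
```

Because the sample itself is format-neutral, switching the backend only means switching the renderer. The collection side stays the same.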

Open source collection agent landscape

When it comes to the open source landscape, there are a number of agents out there. This table shows you the options available depending on what you’re collecting.

When it comes to agents that collect traces, OpenTelemetry stands apart. Of course, you have Jaeger and Zipkin, among others, but OpenTelemetry sees the value of all machine data, be it logs, metrics, or traces. It recognizes that customers need the ability to collect all of their machine data, and it is the only universal data collector out there that is 100% open source and maintained solely in the community.

I think this is a really interesting approach: a number of vendors are coming together around OpenTelemetry for data collection and, in a way, democratizing it.

It all goes back to my point about choice as a benefit of open source. You don't have vendor lock-in, and you're able to send your data anywhere.

Now, let's take a look at some examples of how you can use open source agents to get data into Sumo Logic. The first one I want to start with is Fluentd.


We've had an integration with Fluentd for a number of years now. Like most open source agents, Fluentd is built on a plugin-driven architecture. Input plugins get data into an agent such as Fluentd, and output plugins send that data to the destination you choose. Inputs can be log files, database logs, system logs, or anything else you need, because open source plugins exist for a wide range of technologies.

Once you have data in Fluentd, you can get it wherever you need it to go as long as there’s an output plugin. For example, we have a Sumo Logic output plugin, and that's how we get data from Fluentd into Sumo Logic. Anything that can go into Fluentd through its vast set of input plugins can now come to Sumo Logic.
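
The input/output plugin idea can be sketched in a few lines. This is only a conceptual illustration in Python, not how Fluentd is implemented (its real plugins are Ruby classes). The registries, plugin names, and sample records are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# Hypothetical plugin registries for this sketch.
INPUTS: Dict[str, Callable[[], Iterable[Record]]] = {}
OUTPUTS: Dict[str, Callable[[Record], None]] = {}


def input_plugin(name: str):
    """Register a function that yields records from some source."""
    def register(fn):
        INPUTS[name] = fn
        return fn
    return register


def output_plugin(name: str):
    """Register a function that delivers one record to a destination."""
    def register(fn):
        OUTPUTS[name] = fn
        return fn
    return register


@input_plugin("app_log")
def read_app_log():
    # Stand-in for tailing a log file.
    yield {"source": "app_log", "message": "user logged in"}


@input_plugin("syslog")
def read_syslog():
    # Stand-in for reading system logs.
    yield {"source": "syslog", "message": "disk usage at 81%"}


delivered: List[Record] = []


@output_plugin("vendor")
def send_to_vendor(record: Record) -> None:
    # Stand-in for sending the record to a backend over the network.
    delivered.append(record)


def run_pipeline(output_name: str) -> int:
    """Route every record from every input to the chosen output."""
    emit = OUTPUTS[output_name]
    count = 0
    for name in sorted(INPUTS):
        for record in INPUTS[name]():
            emit(record)
            count += 1
    return count


routed = run_pipeline("vendor")
print(routed, [record["source"] for record in delivered])
```

Changing the destination means registering a different output plugin and passing its name to `run_pipeline`. None of the inputs need to change.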


Telegraf is an open source metrics collector. It primarily collects metrics data, and it is built on the same plugin logic as Fluentd.

Telegraf is really the first metrics agent for which we added full open source support, so users of Telegraf can now very easily get anything that comes into Telegraf into Sumo Logic.

Telegraf supports a number of metrics sources, including databases, operating systems, and system stats. It also supports the networking layer, message queues, and even very specific applications like NGINX, JMX, and Apache. As part of this, we've updated some of our own integrations so that our apps not only collect logs for a component but now add metrics as well.

During the conference, we also announced an updated NGINX app and an updated Redis app, both of which now support logs and metrics. We're also adding a new app for JMX to enable collecting JMX metrics out of the box. Each app helps you visualize all that content and provides a great out-of-the-box experience with low time to value.

Kubernetes collection process

Last year, we released our Kubernetes collection process, which is completely open source and built on a lot of the technologies that I've discussed so far. There are a number of components in play.

We use Fluent Bit for log collection. It runs as a DaemonSet on the cluster and is responsible for gathering the logs from everything running in the cluster. We also run Fluentd, which serves as the enrichment point. It's the central point that logs and metrics both flow through, which gives us consistent, unified metadata across the system. It collects the events that happen in the Kubernetes cluster as well. For metrics, we leverage Prometheus, sending the data from Prometheus into Fluentd and, from there, into Sumo.
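
The value of a single enrichment point is easier to see in a small sketch. The code below is a conceptual illustration only, not the actual Fluentd enrichment logic. The metadata cache, pod name, and field names are hypothetical, and in a real cluster the metadata would come from Kubernetes rather than a hard-coded dictionary.

```python
from typing import Dict

# Hypothetical metadata cache keyed by pod name.
POD_METADATA: Dict[str, Dict[str, str]] = {
    "checkout-7d9f": {
        "namespace": "shop",
        "deployment": "checkout",
        "node": "node-1",
    },
}


def enrich(record: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the record with pod metadata attached, if known."""
    metadata = POD_METADATA.get(str(record.get("pod")), {})
    return {**record, **metadata}


# A log record and a metric record from the same pod pass through the
# same enrichment step, so they end up with identical metadata.
log_record = enrich({"pod": "checkout-7d9f", "message": "payment accepted"})
metric_record = enrich(
    {"pod": "checkout-7d9f", "metric": "http_requests_total", "value": 42}
)

shared_keys = ("namespace", "deployment", "node")
print({key: log_record[key] for key in shared_keys})
print({key: metric_record[key] for key in shared_keys})
```

Because both data streams carry the same metadata keys, they can be joined and navigated by the same hierarchy downstream.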

Now, we've also incorporated OpenTelemetry, specifically for tracing. That gives us full visibility into every component in Kubernetes and every data stream that's crucial to watch. All of this gets packaged up as a Helm chart, so installing it is as easy as running a Helm install.

We can see the benefits I discussed earlier in this kind of architecture.

  • Comprehensive data collection for all observability telemetry
  • Leverages open source Kubernetes stack
  • Native, metadata-based K8s hierarchy navigation
  • Deep integration across logs, metrics, traces and metadata
  • Efficiently manages ephemeral container time-series
  • Sumo Logic out-of-the-box advanced analytics with dependency mapping and correlation

Putting all of these things together, you get a really great observability experience. It's seamlessly integrated across your infrastructure, your applications, and all of your data streams at Sumo, and it focuses on an entity-driven troubleshooting approach to get from signal to root cause as fast as possible.

Open source requires investment

The one really important consideration is that open source still requires an investment. You still have an agent that you need to go learn. You still have an agent or a framework that you need to go deploy.

It's not all for nothing. When you select the right agent that suits your needs and is also built on open standards that most vendors support, then you're able to make that a truly one-time investment. You don't have to spend time ripping everything out if you decide to change vendors.

Embracing open source data collection

Open source expands what you can collect and gives you flexibility. In my presentation, I left the audience with three key things to think over.

  1. Know that collection built on open standards makes it easy to pivot to any tool.
  2. Don’t reinvent the wheel of getting the data you need into a system.
  3. Consider whether you can make the investment to get the benefits of open source.

Frank Reno

Principal Product Manager

Frank Reno is a Principal Product Manager at Sumo Logic, where he leads Product for Data Collection. He also serves as Sumo Logic's Open Source Ambassador co-leading all efforts around Open Source. He is also an active contributor to Sumo Logic's open source solutions and the general open source community.
