Here at Sumo Logic, we share a lot of thoughts about managing data at scale, and the innovative ways we help customers address their unique use cases. It’s not just about analysis of logs.
In this article, I will talk about another important observability signal: distributed traces. Drawing on our own experience, I will share a few observations about how we at Sumo Logic see the adoption of distributed tracing evolving.
We believe that an application’s observability gains a lot from the fact that telemetry signals are designed, composed, and produced by an application developer/vendor in compliance with industry standards, and are not a proprietary, black box component of the monitoring vendor.
Log analytics was the cradle of the Continuous Intelligence market that Sumo Logic continues to drive, lead, and innovate in, and for a very simple reason: it is hard to overstate how useful and content-rich log data is. Logs can contain messages, geographic coordinates, metric values, or anything else you can turn into text.
I believe the value of analyzing such data is very high for the end user, mainly because the application development teams produce the log content, own it, and fully control what lands there. Their end goal is to get the right answers from analytics of that data at massive scale.
There's a plethora of free frameworks and standards that you can include in your applications that can produce logs. Almost all applications today produce logs out of the box, allowing developers to modify, amend, and add information to them--and it has worked like that for a long time.
You just need help to collect the logs and analyze them in a scalable, efficient, easy, and cost-effective way. In other words: You don't need anyone to provide you a proprietary, expensive agent to produce logs, right?
When I talk to customers who are used to this easy and flexible paradigm, they often ask how they can extend that approach to tracing data. Distributed tracing, for those who are new to it, is a way to track the execution of a transaction across distributed application tiers, and it is particularly insightful in a microservices-based environment. As you may know, to achieve full visibility into the health of such applications, it is recommended to rely on logs, metrics, and traces together.
Most of the time, when asked about tracing, the default answer from established market vendors is, “You need to buy an agent from us.” These vendors don't require an agent of their own to produce log telemetry, but for some reason tracing data is treated differently. Vendors will sell you an agent. They will charge you for generating tracing data in a proprietary format. They won't allow you to customize the data, and, since you don't own the data, you don't even know what’s inside! Established vendors also tend to be slow and reactive in supporting new application frameworks, which slows your innovation.
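To make the idea of following one transaction across tiers more concrete, here is a minimal sketch in plain Python. It is not any vendor's or library's API: the `Span` class, the `start_trace` and `child_of` helpers, and the service names are all hypothetical, chosen only to show how a shared trace ID and parent links tie the work of separate services into one trace.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work done by one service (hypothetical model)."""
    service: str
    operation: str
    trace_id: str  # shared by every span in the same transaction
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    parent_id: Optional[str] = None  # None marks the root span


def start_trace(service: str, operation: str) -> Span:
    """Create the root span and a fresh trace ID for a new transaction."""
    return Span(service, operation, trace_id=secrets.token_hex(16))


def child_of(parent: Span, service: str, operation: str) -> Span:
    """Create a span in a downstream tier that continues the parent's trace."""
    return Span(service, operation,
                trace_id=parent.trace_id, parent_id=parent.span_id)


# A single transaction crossing three (made-up) application tiers.
frontend = start_trace("frontend", "GET /checkout")
cart = child_of(frontend, "cart-service", "load_cart")
db = child_of(cart, "cart-db", "SELECT items")

for span in (frontend, cart, db):
    print(span.service, span.trace_id, span.span_id, span.parent_id)
```

Because every span carries the same trace ID, a backend can group the three records into one transaction, and the parent IDs let it rebuild the call tree.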
Just think - when you deploy a new service built on a new language, on day 1 you can produce logs from it and include anything you need to diagnose it in there. Why wouldn't that be the case for traces?
Admittedly, software framework vendors neglected this area of observability for years, but that is changing fast. Traditionally, their main audience was developers whose only task was to write and debug code. Logs were enough to achieve that goal.
With the DevOps revolution, that changed. DevOps and Site Reliability Engineering (SRE) teams are also responsible for running the code in production, and that requires much wider end-to-end visibility, which often becomes part of the initial non-functional requirements when designing an app. After all, observability is not a product that collects logs, metrics, and traces. It is a model in which the people responsible for an application’s reliability can get the information they need to monitor and troubleshoot its health. Therefore, every new service or system you build should produce telemetry in the form of logs, metrics, and traces in order to become observable. Modern software framework vendors that help you build these systems recognize this requirement fully (take a look at service meshes!) and provide such telemetry out of the box, and it is reasonable to expect that much more will come.
According to Gartner (source: MQ for APM 2020): “By 2025, 50% of new cloud-native application monitoring will use open-source instrumentation instead of vendor-specific agents for improved interoperability, up from 5% in 2019.”
Obviously, you also need a backend to analyze that data, so the format of telemetry data should not be proprietary. It needs to be built on industry-wide, vendor-neutral standards. That way, you are free to choose the analytics platform that is right for your needs, and you avoid vendor lock-in that forces you to rip and replace all of your client-side observability instrumentation when you want to switch backends.
Historically, there were a few competing ways of achieving this goal. Fortunately, the OpenTelemetry project has since emerged, giving hope for a single industry-wide standard for observability across application frameworks.
We at Sumo Logic fully support and contribute to this project, and we encourage our customers to build in full observability from day 1 of their apps’ lifetimes, using open standards that let them produce telemetry data for free, without locking themselves in to any vendor.
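As one illustration of what a vendor-neutral format looks like in practice, the sketch below reads and writes the `traceparent` HTTP header defined by the W3C Trace Context specification, which OpenTelemetry can use to pass trace identity between services. The article itself does not name a specific wire format, so treat this choice as an assumption made for the example. The `TraceContext` type and both function names are hypothetical, and the parser is deliberately strict: it accepts only the four-field layout of version `00`.

```python
import re
from typing import NamedTuple, Optional

# version "-" trace-id "-" parent-id "-" trace-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceContext(NamedTuple):
    trace_id: str   # 32 hex chars, shared by the whole trace
    parent_id: str  # 16 hex chars, the calling span
    sampled: bool   # lowest bit of the trace-flags field


def format_traceparent(ctx: TraceContext) -> str:
    """Build the header value a service would send downstream."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.parent_id}-{flags}"


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the context carried by a header, or None if it is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version "ff" and all-zero identifiers.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


ctx = TraceContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True)
header = format_traceparent(ctx)
print(header)
print(parse_traceparent(header))
```

Since the header is plain text with a published layout, any service or backend can read it without an agent from a particular vendor, which is the property the open-standards argument above depends on.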