The Sumo Logic Architecture

A Scalable, Service-Based Log Management Architecture

Sumo Logic's cloud log management system follows a divide-and-conquer approach: it composes many small, independent services into a scalable, flexible log management platform. Major components of the system, such as the log data intake and collection management facilities, the full-text indexing pipeline, and the interactive analytics platform, are encapsulated in a set of decoupled services. The services share a set of lower-level modules for common functionality. Each service runs as an independent executable and can scale independently of the others in the cloud, based on demand and on its own CPU and I/O requirements.

Messaging Affords Reliability and Redundancy

Individual services are not only physically separated in the Sumo Logic service, but are also logically decoupled by a fast message bus. The Sumo Logic system uses asynchronous messaging to decouple all senders and receivers within the system, and the message bus adds durability. The message bus also allows for inter-service load balancing internal to the Sumo Logic system, which gives the system elasticity along several dimensions. Message durability is important because it lets the system continue to operate despite unavoidable partial failures in the infrastructure layer. To scale well as a whole, the Sumo Logic system is designed to scale at many layers.
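The decoupling and load-balancing idea can be illustrated with a minimal sketch. It is not Sumo Logic's implementation: an in-process `queue.Queue` stands in for the message bus, the names `run_pipeline` and `worker_count` are hypothetical, and durability is not modeled at all. The sender only puts messages on the bus, and whichever worker is free picks up the next one.

```python
import queue
import threading


def run_pipeline(messages, worker_count=3):
    """Fan messages out to competing consumers through a shared queue."""
    bus = queue.Queue()        # in-process stand-in for a message bus (assumption)
    processed = []             # (worker_id, result) pairs
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            msg = bus.get()
            if msg is None:    # sentinel: shut this worker down
                bus.task_done()
                return
            with lock:
                processed.append((worker_id, msg.upper()))
            bus.task_done()

    workers = [
        threading.Thread(target=worker, args=(i,)) for i in range(worker_count)
    ]
    for t in workers:
        t.start()

    # The sender never addresses a specific receiver; it only talks to the bus.
    for msg in messages:
        bus.put(msg)
    for _ in workers:
        bus.put(None)          # one sentinel per worker

    bus.join()                 # wait until every message has been handled
    for t in workers:
        t.join()
    return processed
```

Changing `worker_count` adds or removes consumers without touching the sending side, which is the property the text describes as independent scaling of services behind the bus.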

Optimized for Throughput and Low Latency

Sumo Logic uses Amazon S3 for log data persistence. The properties of S3 make it possible for Sumo Logic to offer durable, reliable retention of even large amounts of log data for extended periods of time at a very competitive cost. Since low latency is a major concern in real-time log processing, the Sumo Logic system is tuned all the way down to the collector to trade off batching against delay, in order to achieve high throughput at low latency. At the same time, customer log data is always tagged, and the separation of different customers' data is strictly enforced during processing as well as at rest, ensuring reliable and secure log management for the end user.
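The trade-off between batching and delay can be sketched as a buffer that flushes either when it is full or when its oldest entry has waited too long. This is an illustrative sketch only, not the Sumo Logic collector: the class `Batcher`, its parameters `max_items` and `max_delay_s`, and the `flush` callback are all hypothetical names chosen for the example.

```python
import time


class Batcher:
    """Buffer log lines; flush on size (throughput) or age (latency)."""

    def __init__(self, flush, max_items=100, max_delay_s=0.5,
                 clock=time.monotonic):
        self._flush = flush              # callable that receives a list of lines
        self._max_items = max_items      # larger batches favor throughput
        self._max_delay_s = max_delay_s  # smaller delays favor latency
        self._clock = clock              # injectable so the logic can be tested
        self._buffer = []
        self._first_at = None            # arrival time of the oldest buffered line

    def add(self, line):
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(line)
        self._maybe_flush()

    def tick(self):
        """Call periodically so a quiet source still gets flushed."""
        self._maybe_flush()

    def _maybe_flush(self):
        if not self._buffer:
            return
        full = len(self._buffer) >= self._max_items
        stale = self._clock() - self._first_at >= self._max_delay_s
        if full or stale:
            batch, self._buffer = self._buffer, []
            self._flush(batch)
```

Under heavy load the size limit dominates and the sender ships full batches; under light load the age limit bounds how long any single line waits before it is sent.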

Innovative Streaming Query Engine

All log data query processing in the Sumo Logic system, be it real-time or historical and forensics-oriented, is performed by the same underlying stream processing engine. Expressing and configuring real-time alerts is therefore no different from running interactive queries, which blurs the line between queries and alerts and makes it straightforward to operationalize log data analytics. For interactive analytics, the stream engine makes it possible to evaluate queries incrementally while pushing intermediate results immediately to the web-based UI. Users see results on their screens as soon as the query starts; the results are refined and extended until the query completes.
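Incremental evaluation with intermediate results can be illustrated with a small generator. This is a sketch under stated assumptions, not the Sumo Logic query engine: the function `incremental_count`, the `key` extractor, and the `emit_every` parameter are hypothetical, and a simple count-by-key stands in for a real query.

```python
from collections import Counter


def incremental_count(lines, key, emit_every=2):
    """Count lines by key, yielding a snapshot of the partial result as it grows."""
    counts = Counter()
    seen = 0
    for line in lines:
        counts[key(line)] += 1
        seen += 1
        if seen % emit_every == 0:
            yield dict(counts)           # intermediate result, e.g. pushed to a UI
    if seen == 0 or seen % emit_every != 0:
        yield dict(counts)               # final result, if not already emitted


logs = ["ERROR disk full", "INFO started", "ERROR timeout"]
for snapshot in incremental_count(logs, key=lambda l: l.split()[0]):
    print(snapshot)
```

Because `lines` can be any iterable, the same function works over a finite historical range or an unbounded live stream. In the live case, a caller could compare each snapshot against a threshold, which is one way to read the text's point that alerts and interactive queries share a single engine.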