Why Prometheus isn’t enough to monitor complex environments

FAQs

Not always. The accuracy of the analysis depends on data quality, the expertise of the individuals conducting the analysis and the thoroughness of the investigation process. Log data is the most granular data available, which makes it the most helpful and accurate input for root cause analysis.

A Kubernetes workload can have many problems, and modern application monitoring tools must pinpoint which combination of pod and node is having issues, then drill into the associated container logs to identify the root cause. Ideally, Kubernetes infrastructure failures should be visualized in a monitoring tool that can capture container metrics, node metrics, resource metrics, Kubernetes cluster logs and trace data in histograms and charts.
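As a rough illustration of the "pinpoint the pod and node combination" step, the Python sketch below counts error events per (pod, node) pair and returns the worst offender. The event records, field names and the `worst_pod_node` helper are hypothetical and exist only for this example; a real monitoring tool supplies its own data model.

```python
from collections import Counter

# Hypothetical sample events; the field names are assumptions for illustration.
EVENTS = [
    {"pod": "checkout-7d9f", "node": "node-a", "level": "error"},
    {"pod": "checkout-7d9f", "node": "node-a", "level": "error"},
    {"pod": "checkout-5c2b", "node": "node-b", "level": "info"},
    {"pod": "cart-1a2b", "node": "node-a", "level": "error"},
]


def worst_pod_node(events):
    """Return ((pod, node), error_count) for the pair with the most errors.

    Returns None when there are no error events at all.
    """
    counts = Counter(
        (e["pod"], e["node"]) for e in events if e["level"] == "error"
    )
    if not counts:
        return None
    return counts.most_common(1)[0]
```

Once the noisiest pair is known, the next step described above is to open the container logs for that pod on that node.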

Legacy monitoring solutions impose a server-based solution on a microservices problem. Your team wastes precious minutes correlating serious customer and security issues with infrastructure problems at the pod, container and node levels. Sumo Logic has turned this model on its head.

With Sumo Logic, you can explore the logs, metrics and events from your Kubernetes environment in various hierarchies, allowing you to view your cluster through the lens of your choice. For example, you can use native Kubernetes metadata like a namespace to visualize the performance of all pods associated with that namespace.
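To make the namespace example concrete, here is a minimal Python sketch that groups per-pod CPU samples by namespace and averages them. It does not use any Sumo Logic API; the sample records, the `cpu_millicores` field and the `mean_cpu_by_namespace` helper are assumptions invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-pod samples tagged with Kubernetes namespace metadata.
POD_SAMPLES = [
    {"namespace": "payments", "pod": "api-1", "cpu_millicores": 250},
    {"namespace": "payments", "pod": "api-2", "cpu_millicores": 350},
    {"namespace": "search", "pod": "indexer-1", "cpu_millicores": 900},
]


def mean_cpu_by_namespace(samples):
    """Average CPU usage (millicores) across all pods in each namespace."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["namespace"]].append(sample["cpu_millicores"])
    return {namespace: mean(values) for namespace, values in grouped.items()}
```

The same grouping idea would apply to any other metadata key, such as a deployment or service name.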

There are many critical metrics for monitoring Kubernetes clusters. Monitoring occurs at two levels: cluster and pod. Cluster monitoring tracks the health of the entire Kubernetes cluster: whether nodes function properly and at the right capacity, how many applications run on each node and how the cluster utilizes resources. Pod monitoring tracks issues affecting individual pods, using metrics such as resource utilization, application metrics and pod replication or autoscaling metrics.

At the cluster level, you want to measure how many nodes are available and healthy to determine the cloud resources you need to run the cluster. You also need to measure which computing resources your nodes use (including memory, CPU, bandwidth and disk utilization) to know if you should decrease or increase the size or number of nodes in a cluster.
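The cluster-level reasoning above can be sketched in a few lines of Python: count only healthy nodes, compute utilization per resource and compare the busiest resource against thresholds. The node snapshots, the field names and the 80% and 30% thresholds are all assumptions chosen for illustration, and bandwidth and disk are left out for brevity.

```python
# Hypothetical node snapshots; field names and values are illustrative only.
NODES = [
    {"name": "node-a", "ready": True, "cpu_used": 3.5, "cpu_capacity": 4.0,
     "mem_used_gib": 8.0, "mem_capacity_gib": 16.0},
    {"name": "node-b", "ready": True, "cpu_used": 3.3, "cpu_capacity": 4.0,
     "mem_used_gib": 10.0, "mem_capacity_gib": 16.0},
    {"name": "node-c", "ready": False, "cpu_used": 0.0, "cpu_capacity": 4.0,
     "mem_used_gib": 0.0, "mem_capacity_gib": 16.0},
]

# (used, capacity) field pairs for each resource that is tracked.
RESOURCES = [("cpu_used", "cpu_capacity"), ("mem_used_gib", "mem_capacity_gib")]


def scaling_hint(nodes, high=0.80, low=0.30):
    """Suggest 'scale_up', 'scale_down' or 'hold' from healthy-node utilization.

    The thresholds are assumed values, not recommendations.
    """
    healthy = [n for n in nodes if n["ready"]]
    if not healthy:
        return "scale_up"  # no healthy capacity is available at all
    # Utilization of the most constrained resource across healthy nodes.
    busiest = max(
        sum(n[used] for n in healthy) / sum(n[cap] for n in healthy)
        for used, cap in RESOURCES
    )
    if busiest > high:
        return "scale_up"
    if busiest < low:
        return "scale_down"
    return "hold"
```

With the sample data, CPU is the most constrained resource on the two healthy nodes, so the hint is to add capacity.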

At the pod level, there are three key categories of metrics:

Container: network, CPU and memory usage

Application: specific to the application and related to its business logic

Pod health and availability: how the orchestrator handles a specific pod, including health checks, network data and in-progress deployments.

David Girvin
Lead Technical Advocate
David Girvin is a Technical Advocate at Sumo Logic, facilitating technical accuracy in the cloud of marketing. Previously, he was an AppSec / offensive security architect at companies such as 1Password and Red Canary. When not working, David travels to surf destinations to surf and foil.