December 14, 2021 By Michael Baldani | Rishi Divate | Himanshu Pal

Host and process metrics - monitoring beyond apps

Consumers and users of applications expect near-100% availability and reliability so they can work, transact, and collaborate. There's a lot of talk about monitoring the performance of the application itself, but what about the underlying systems and components supporting the app, and in particular the infrastructure it sits on? If any piece of this stack fails, it can negatively impact the user experience and, in turn, your business.

It's critical to have full-stack observability, including the infrastructure your application is running on. That means granular visibility into the performance and resource utilization of the hosts and processes running your production applications, with proactive notifications when something goes wrong. Real-time alerts equip your team to make the necessary changes before an end user experiences an issue.

Monitoring host and process metrics is key

Sumo Logic's new Host and Process Metrics capability gives you one place to measure and manage compute, memory, storage, and network resource utilization for both hosts and the processes that run on them, so you can:

  • Ensure the infrastructure is functioning reliably

  • Provide the best possible end-user experience

  • Quickly troubleshoot issues to reduce MTTI and MTTR

  • Optimize infrastructure and maintenance costs

How does it work?

Collection

The new Host and Process Metrics app supports the collection of host and process metrics telemetry from Windows and Linux hosts, whether physical or virtual, running in hybrid environments.

This app uses Telegraf for the collection of metrics from your hosts. Telegraf is an open-source data collection agent and uses built-in input plugins to fetch metrics from hosts and software applications. We use a variety of input plugins to collect CPU, memory, disk, network, and process metrics from hosts. The Sumo Logic output plugin sends collected metrics to Sumo Logic.
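
The input-plugin/output-plugin flow described above can be pictured with a minimal Python sketch. This is not Telegraf's code or configuration: the function names (`disk_input`, `cpu_input`, `gather`, `send`) and the metric names are hypothetical, and the "output" simply appends to a list instead of sending anything to Sumo Logic.

```python
import os
import shutil
import time
from typing import Any, Callable, Dict, List

# A metric sample: name, value, tags, and (once gathered) a Unix timestamp.
Metric = Dict[str, Any]


def disk_input(path: str = os.sep) -> List[Metric]:
    """Hypothetical input plugin: disk usage for one mount point."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return [{"name": "disk_used_percent", "value": used_percent, "tags": {"path": path}}]


def cpu_input() -> List[Metric]:
    """Hypothetical input plugin: number of logical CPUs on the host."""
    return [{"name": "cpu_count", "value": float(os.cpu_count() or 1), "tags": {}}]


def gather(inputs: List[Callable[[], List[Metric]]], host: str) -> List[Metric]:
    """Run every input plugin once and tag each sample with host and timestamp."""
    now = int(time.time())
    batch: List[Metric] = []
    for plugin in inputs:
        for metric in plugin():
            metric["tags"]["host"] = host
            metric["timestamp"] = now
            batch.append(metric)
    return batch


def send(batch: List[Metric], sink: List[Metric]) -> int:
    """Hypothetical output plugin: hand the batch to a sink (here, a plain list)."""
    sink.extend(batch)
    return len(batch)
```

In this sketch, calling `gather([disk_input, cpu_input], host="web-1")` and passing the result to `send` mirrors one collection interval: each input contributes samples, and a single output forwards the batch.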

Figure: Host and Process Metrics - Collection

For more information on how collection works, see Collect Metrics for Host and Processes.

Using the Sumo Logic app

Once data collection has been set up, the next step is to analyze it with dashboards and set up alerts to get notified when critical conditions occur.

Alerts for host and process metrics

Pre-packaged alerts enable you to get proactively notified when critical conditions occur on your hosts. These alerts are based on Sumo Logic Monitors, which allow you to set robust and configurable alerting policies that notify you about critical conditions in your application infrastructure that could adversely affect your production applications and customer experience.

Monitors for host and process metrics include preset thresholds for high CPU/memory/filesystem/swap utilization, network errors, unusual network throughput, page faults, and open file descriptors. For a complete list, see Host and Process Metrics Alerts.
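
As a rough illustration of how threshold-based alerting of this kind works, here is a minimal Python sketch. It does not use the Sumo Logic Monitors API, and the metric names and threshold values are assumptions made up for the example, not the app's actual presets.

```python
from typing import Dict, List, Tuple

# Hypothetical (warning, critical) thresholds in percent.
# These are illustrative values, not Sumo Logic's preset thresholds.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu_used_percent": (80.0, 90.0),
    "mem_used_percent": (80.0, 90.0),
    "swap_used_percent": (50.0, 75.0),
    "filesystem_used_percent": (85.0, 95.0),
}


def evaluate(
    host: str,
    metrics: Dict[str, float],
    thresholds: Dict[str, Tuple[float, float]] = THRESHOLDS,
) -> List[Tuple[str, str, str]]:
    """Return (host, metric, severity) for every metric at or above a threshold."""
    alerts: List[Tuple[str, str, str]] = []
    for name, value in metrics.items():
        if name not in thresholds:
            continue  # no monitor defined for this metric
        warning, critical = thresholds[name]
        if value >= critical:
            alerts.append((host, name, "critical"))
        elif value >= warning:
            alerts.append((host, name, "warning"))
    return alerts
```

Separating warning and critical levels lets a monitor notify early without paging anyone until the condition becomes severe.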

Monitoring hosts

While running your applications in production, it's critical that you monitor all your hosts across various dimensions. The Host Metrics - Overview dashboard gives you exactly this, with an at-a-glance view of key metrics like CPU, memory, disk, network, and TCP connections across all your hosts. You can use this dashboard to quickly identify hosts with high CPU, disk, or memory utilization and to spot anomalies over time.

For more detailed investigations, we have dashboards for analyzing disk, memory, network, and TCP connections.

You can drill down from this Host Metrics Overview dashboard to any of the detailed dashboards by using the honeycombs or line charts in all the panels.

Figure: Host Metrics Overview - Drill down to Memory

You can also filter each of the host dashboards by the individual hosts you want to monitor.

Figure: Host Metrics - Memory

Monitoring processes

Once you've established the overall resource utilization on a host, it's essential to understand what processes are causing spikes.

To do so, we have the Process Metrics Overview dashboard, which gives you a view of the top processes by open file descriptors, CPU usage, memory usage, disk read/write operations, and thread count. You can also use this dashboard to identify the longest-running processes and the users that have spawned the most processes.
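
The kind of ranking this dashboard performs can be sketched in a few lines of Python. This is only an illustration of the idea, not how the dashboard is implemented: the sample fields (`name`, `user`, `cpu_percent`, `open_fds`) and the helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# One process sample; the field names are assumptions made for this example.
ProcessSample = Dict[str, Any]


def top_processes(samples: List[ProcessSample], metric: str, n: int = 5) -> List[ProcessSample]:
    """Return the n samples with the highest value for the given metric."""
    return sorted(samples, key=lambda s: s[metric], reverse=True)[:n]


def processes_per_user(samples: List[ProcessSample]) -> List[Tuple[str, int]]:
    """Count processes per user, most prolific user first."""
    return Counter(s["user"] for s in samples).most_common()


samples = [
    {"name": "java", "user": "app", "cpu_percent": 72.5, "open_fds": 812},
    {"name": "nginx", "user": "www", "cpu_percent": 12.0, "open_fds": 140},
    {"name": "cron", "user": "root", "cpu_percent": 0.1, "open_fds": 9},
    {"name": "worker", "user": "app", "cpu_percent": 8.4, "open_fds": 64},
]

for proc in top_processes(samples, "cpu_percent", n=2):
    print(proc["name"], proc["cpu_percent"])
```

The same `top_processes` helper can rank by any numeric field, for example `open_fds`, which matches how the dashboard offers several "top processes by" views.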

You can drill down from this dashboard to the Process Metrics - Details dashboard by using the honeycombs or line charts in all the panels.

Figure: Process Metrics Overview - Drill down

The Process Metrics - Details dashboard can give you a detailed view of key process-related metrics such as CPU and memory utilization, disk read/write throughput, and major/minor page faults. You can also use this dashboard to:

  • Determine the number of open file descriptors across processes. If that number exceeds the maximum file descriptor limit, your applications will get IOException errors

  • Identify anomalies in CPU usage, memory usage, major/minor page faults and reads/writes over time

  • Troubleshoot memory leaks using the resident set memory trend chart
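
Two of the checks in the list above can be approximated with simple arithmetic. The Python sketch below is an assumption-laden illustration, not the dashboard's logic: `fd_utilization` and `rss_slope` are hypothetical helpers, and a steadily positive slope is only a hint of a leak, not proof.

```python
from typing import Sequence


def fd_utilization(open_fds: int, fd_limit: int) -> float:
    """Fraction of the file descriptor limit currently in use."""
    if fd_limit <= 0:
        raise ValueError("fd_limit must be positive")
    return open_fds / fd_limit


def rss_slope(samples: Sequence[float]) -> float:
    """Least-squares slope of resident set size against sample index.

    The result is in the samples' unit per collection interval. A slope that
    stays positive over a long window suggests memory is not being released.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator
```

For example, `fd_utilization(900, 1000)` reports that 90% of the limit is in use, which leaves little headroom before descriptor exhaustion.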

Figure: Process Metrics - Details


For a complete list of Host and Process Metrics dashboards, see Host and Process Metrics Dashboards.

Related apps

This new Host and Process Metrics app can also be used in conjunction with other Sumo Logic Apps:

  • The Linux app allows you to view information about events, logins, and the security status of your Linux hosts. The app consists of predefined searches and dashboards that provide visibility into your environment for real-time or historical analysis.

  • The Windows app provides insight into the operations of your Windows hosts and consists of predefined searches and dashboards that provide visibility into security status, system activity, OS updates, user activity, and application installation activity.

Summary

In conclusion, the new Sumo Logic app for host and process metrics helps you comprehensively monitor your critical application infrastructure, identify key service level indicators (SLIs), and reduce MTTI and MTTR, which in turn helps you achieve your service level objectives (SLOs).

Get started now!

To get started, check out the documentation for the new Host and Process Metrics app.

If you don't yet have a Sumo Logic account, you can sign up for a free trial today.

Additional resources

For more great DevOps/Observability and security-focused reads, check out the Sumo Logic blog.

Michael Baldani | Rishi Divate | Himanshu Pal

Product Marketing Manager | Principal Technical Product Manager | Senior Software Engineer