Monitoring Redis, the open source in-memory data platform, is complicated enough when you are hosting your Redis instance on just a single server. It gets even more complex when you build a Redis cluster that consists of multiple nodes and distribute your data across them.
But as long as you know which metrics to prioritize and how to collect them, Redis monitoring is manageable. This article offers an overview of how to monitor the state of Redis database clusters. We’ll discuss which metrics to prioritize and how to work with them in a multi-node Redis cluster.
Redis is an open source platform for storing data. It’s often referred to as a database because it offers NoSQL-style database functionality, but Redis is technically more than a mere database. It can also operate as a cache and message broker.
Redis also stands apart from most other databases because it keeps its working data set in memory while also being able to persist that data to disk. In-memory storage enables high throughput (because data can be read from and written to memory much faster than disk), while persistence to disk ensures that the data can be restored if the system shuts down.
These features make Redis one of the most popular NoSQL-style data storage platforms.
You can set up a Redis installation in two ways. The simplest is to host all of your data on a single server.
The second is to build a cluster and distribute data across it. This feature, which Redis introduced in 2015, can increase the performance and reliability of Redis by allowing applications to read data from and write data to multiple servers, reducing the risk of bottlenecks or failures caused by problems with a single server.
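As a rough illustration of how a cluster spreads data across nodes: Redis Cluster maps every key to one of 16,384 hash slots using a CRC16 of the key, and each node serves a subset of those slots. The sketch below computes a key's slot in Python. The function name key_slot is our own (it is not a Redis API), and it relies on the standard library's binascii.crc_hqx, which with an initial value of 0 computes the same CRC16 variant (XMODEM).

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in a Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the hash slot for a key (illustrative helper, not a Redis API)."""
    # If the key contains a non-empty {hash tag}, only the tag is hashed,
    # which lets related keys be placed in the same slot.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with an initial value of 0 is CRC16/XMODEM.
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


if __name__ == "__main__":
    for name in (b"user:1000", b"{user:1000}:profile", b"{user:1000}:cart"):
        print(name.decode(), key_slot(name))
```

When monitoring, knowing that keys are spread by slot helps explain why one node can run hot while the others stay idle: a few very busy keys may all live in slots owned by the same node.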
Of course, the ability of a Redis cluster to deliver fully on the performance benefits it can potentially offer depends on how healthy the cluster is. And maintaining cluster health requires continuous monitoring of what is happening within the cluster.
For this purpose, there are four main metrics to track in order to understand the state of the overall cluster: CPU usage, memory usage, disk usage, and load average.
By tracking the total CPU usage of all nodes within the cluster, admins can identify situations where node resources are being maxed out. Events like this could mean that the cluster’s capacity is close to being exhausted and more nodes should be added. Or, it could signal a problem with an application that is placing unnecessary demand on the cluster.
Monitoring CPU usage in Redis clusters also helps identify situations where the cluster is over-provisioned, meaning it has more nodes than necessary. In that case, admins can shut down some nodes to save money.
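One way to put a number on per-node CPU usage is to sample the cumulative used_cpu_sys and used_cpu_user counters from the INFO output twice and divide the difference by the elapsed time. The following is a minimal sketch under that assumption. The function names and the 0.8 and 0.1 thresholds are hypothetical and should be tuned to your workload. Because Redis executes commands mostly on a single thread, a value close to 1.0 (one full core) usually deserves attention.

```python
def cpu_utilization(prev: dict, curr: dict, interval_seconds: float) -> float:
    """Fraction of one CPU core used by a Redis process between two samples.

    prev and curr are parsed INFO dictionaries taken interval_seconds apart.
    """
    prev_total = float(prev["used_cpu_sys"]) + float(prev["used_cpu_user"])
    curr_total = float(curr["used_cpu_sys"]) + float(curr["used_cpu_user"])
    return (curr_total - prev_total) / interval_seconds


def classify_nodes(utilization_by_node: dict, high: float = 0.8, low: float = 0.1) -> dict:
    """Label each node as busy, idle, or normal (thresholds are assumptions)."""
    labels = {}
    for node, utilization in utilization_by_node.items():
        if utilization >= high:
            labels[node] = "busy"
        elif utilization <= low:
            labels[node] = "idle"
        else:
            labels[node] = "normal"
    return labels
```

If most nodes are labeled busy, the cluster may need more capacity. If most are idle, it may be over-provisioned.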
The total memory consumption of the cluster is another key metric for identifying under- or over-provisioned clusters: if total memory usage is too high or low, nodes may need to be added or removed from the cluster.
Memory usage is also particularly important to monitor in the case of Redis because Redis stores data in memory. Thus, if your cluster runs low on memory, not only will it take longer for nodes to respond to requests from applications, but the data store itself may run out of space, making it impossible for applications to continue writing data.
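A simple way to express memory pressure is the ratio of used_memory to maxmemory, both of which appear in the memory section of the INFO output. The sketch below assumes you already have one parsed INFO dictionary per node. The function names and the shape of the summary are our own choices. A maxmemory of 0 means no limit is configured, in which case the ratio is reported as None rather than guessed.

```python
from typing import Optional


def memory_usage_ratio(info: dict) -> Optional[float]:
    """Return used_memory / maxmemory for one node, or None if no limit is set."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", 0))
    if limit == 0:
        return None  # maxmemory 0 means "no limit configured"
    return used / limit


def cluster_memory_summary(infos: dict) -> dict:
    """Aggregate memory usage across nodes (keys are node names)."""
    total_used = sum(int(info["used_memory"]) for info in infos.values())
    limits = [int(info.get("maxmemory", 0)) for info in infos.values()]
    # Only report a cluster-wide limit when every node has one configured.
    total_limit = sum(limits) if limits and all(limits) else None
    ratio = total_used / total_limit if total_limit else None
    return {"used_bytes": total_used, "limit_bytes": total_limit, "ratio": ratio}
```

A per-node ratio is often more useful than the cluster total, because a single node can run out of memory while the cluster as a whole still looks healthy.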
Likewise, if you run out of disk space, you won’t be able to continue writing data to your Redis cluster until more space is added or freed up. That’s because, as noted above, Redis stores data on disk as well as in memory.
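Disk headroom can be checked on each node with Python's standard library, and the persistence section of the INFO output reports whether the most recent writes to disk succeeded. The sketch below combines the two. The function names are hypothetical, the path you pass should be the directory where your Redis instance writes its data files, and the INFO values are assumed to be strings as they appear in the raw output.

```python
import shutil


def disk_free_fraction(path: str) -> float:
    """Return the fraction of free space on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def persistence_problems(info: dict) -> list:
    """List persistence failures reported in a parsed INFO dictionary."""
    problems = []
    if str(info.get("rdb_last_bgsave_status", "ok")) != "ok":
        problems.append("last RDB snapshot failed")
    aof_enabled = str(info.get("aof_enabled", "0")) == "1"
    if aof_enabled and str(info.get("aof_last_write_status", "ok")) != "ok":
        problems.append("last AOF write failed")
    return problems
```

A failed snapshot combined with a low free fraction is a strong hint that the node is running out of disk space.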
Another metric to track is the total average load placed on the cluster. The load average, which is reported across different time intervals (commonly 1, 5, and 15 minutes), reflects how many processes are running or waiting to run on the cluster's nodes. A spike in load average could simply mean a sudden increase in application requests, but it could also be a sign of a problem such as a buggy application that is issuing redundant or unnecessary requests.
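Load average is easier to interpret once it is divided by the number of CPU cores on a node, since a load of 4 means something very different on a 2-core machine than on a 16-core one. The sketch below shows that normalization. The function names and the threshold of 1.0 per core are assumptions, and on a Unix-like node the inputs could come from os.getloadavg() and os.cpu_count().

```python
import os


def load_per_core(load_averages, cpu_count: int) -> tuple:
    """Divide the 1, 5, and 15 minute load averages by the core count."""
    return tuple(load / cpu_count for load in load_averages)


def is_overloaded(load_averages, cpu_count: int, threshold: float = 1.0) -> bool:
    """Flag a node whose 5-minute load per core exceeds the threshold.

    The 5-minute value is used so that brief spikes do not trigger the flag.
    """
    _, five_minute, _ = load_per_core(load_averages, cpu_count)
    return five_minute > threshold


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        print(load_per_core(os.getloadavg(), os.cpu_count() or 1))
```

Comparing the 1-minute value against the 15-minute value also helps separate a sudden spike from sustained pressure.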
The metrics discussed above are the most important to monitor in order to gain a high-level overview of the health of your cluster. However, you can track a range of other metrics in a Redis cluster, too, such as the total number of connected clients and request latency. You may need to look at this additional data in order to drill down into specific performance issues that you identify based on the high-level metrics.
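Two of these secondary metrics are straightforward to collect. The connected_clients field in the INFO output can be summed across nodes, and latency can be approximated by timing a cheap round trip such as PING. The sketch below is written against a generic callable so that it does not depend on any particular client library. The function names are hypothetical, and the ping argument is assumed to be a function that performs one request (for example, the ping method of whichever Redis client you use).

```python
import time


def total_connected_clients(infos: dict) -> int:
    """Sum connected_clients across parsed INFO dictionaries, one per node."""
    return sum(int(info["connected_clients"]) for info in infos.values())


def measure_latency_ms(ping, samples: int = 5) -> float:
    """Average round-trip time in milliseconds over several calls to ping()."""
    timings = []
    for _ in range(samples):
        started = time.perf_counter()
        ping()  # one request/response round trip
        timings.append((time.perf_counter() - started) * 1000.0)
    return sum(timings) / len(timings)
```

Measured this way, latency includes network time between the monitoring host and the node, so it is best compared against a baseline taken from the same host.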
There are multiple ways to get metrics data from a Redis cluster.
One is to run the info command in a Redis shell (which you can open by running redis-cli in your terminal). Graphical management frontends for Redis, such as Redsmin and Redis Commander, also provide some monitoring functionality.
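The output of the info command is plain text: section headers that start with # and one key:value pair per line. That makes it easy to turn into a dictionary for further processing. The parser below is a minimal sketch. The function name parse_info is our own, and all values are kept as strings so that nothing is converted incorrectly.

```python
def parse_info(text: str) -> dict:
    """Parse the text output of the Redis INFO command into a dictionary."""
    metrics = {}
    for line in text.splitlines():  # handles both \n and \r\n line endings
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Memory"
        key, separator, value = line.partition(":")
        if separator:
            metrics[key] = value
    return metrics


if __name__ == "__main__":
    sample = "# Memory\r\nused_memory:1048576\r\nmaxmemory:0\r\n"
    print(parse_info(sample))
```

The same function can be fed the text produced by running redis-cli info against each node in turn.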
Those tools are useful if you need a one-off look at the most recent metrics data. For comprehensive Redis monitoring, however, you will be better served by a tool that automatically collects metrics and logs from across your Redis cluster and aggregates them in a central location where you can analyze them alongside other data sources, such as application logs.
In addition to being much more efficient than collecting Redis metrics manually, this approach makes it possible to correlate data from Redis with other data sources. That’s critical if, for example, you are working to understand the relationship between a performance issue in Redis and a performance issue in your application. On top of this, the aggregation of Redis logs into a central location also makes it easy to retain and rotate Redis log data based on whichever timeline works best for you.
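Correlation across sources is much easier when every Redis sample carries a timestamp and the name of the node it came from. One common approach is to emit each sample as a single line of JSON that a log collector can forward. The sketch below shows one possible shape. The function name and the field names (timestamp, source, node, metrics) are purely illustrative and are not a schema required by any particular tool.

```python
import json
import time
from typing import Optional


def to_log_record(node: str, metrics: dict, timestamp: Optional[float] = None) -> str:
    """Serialize one metrics sample as a single-line JSON record."""
    record = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "source": "redis",  # hypothetical tag for filtering in the central store
        "node": node,
        "metrics": metrics,
    }
    # sort_keys keeps the output stable, which makes records easier to compare.
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(to_log_record("node-a", {"used_memory": 1048576, "connected_clients": 3}))
```

With application logs carrying comparable timestamps, a spike in Redis memory or latency can be lined up against what the application was doing at the same moment.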
To learn more about how Sumo Logic provides automated, centralized log analysis and management for Redis, read about the Sumo Logic Redis ULM.