Sign up for a live Kubernetes or DevSecOps demo

Click here


Learn how to get started with Kubernetes including how to monitor and manage your clusters, view your Kubernetes logs, and how to improve your Kubernetes security

Monitoring Kubernetes Clusters

Now that we know what goes into monitoring Kubernetes, let's discuss how to monitor all of the pieces. In order to simplify your Kubernetes monitoring strategy, it's helpful to break monitoring operations down into different parts, each focused on a different "layer" of your Kubernetes environment or part of the overall workload (clusters, pods, applications and the end-user experience).

Monitoring Kubernetes Clusters

The highest-level component of Kubernetes is the cluster. As noted above, in most cases each Kubernetes installation consists of only one cluster. Thus, by monitoring the cluster, you gain an across-the-board view of the overall health of all of the nodes, pods and applications that form your cluster. (If you use federation to maintain multiple clusters, you'll have to monitor each cluster separately. But that is not very difficult, especially because it would be very rare to have more than two or three clusters per organization.)

Specific areas to monitor at the cluster level include:

  • Cluster usage: Which portion of your cluster infrastructure is currently in use? Cluster usage lets you know when it's time to add more nodes so that you don't run out of resources to power your workloads. Or, if your cluster is significantly under-utilized, tracking cluster usage will help you know it's time to scale down so that you're not paying for more infrastructure than you need.
  • Node consumption: You should also track the load on each node. If some nodes are experiencing much more usage than others, you may want to rebalance the distribution of workloads using DaemsonSets.
  • Failed pods: Pods are destroyed naturally as part of normal operations. But if there is a pod that you think should be running but is not active anywhere on your cluster, that's an important issue to investigate.

Monitoring Kubernetes Pods

While cluster monitoring provides a high-level overview of your Kubernetes environment, you should also collect monitoring data from individual pods. This data will provide much deeper insight into the health of pods (and the workloads that they host) than you can glean by simply identifying whether or not pods are running at the cluster level.

When monitoring pods, you'll want to focus on:

  • Pod deployment patterns: Monitoring how pods are being deployed – which nodes they are running on and how resources are distributed to them – helps identify bottlenecks or misconfigurations that could compromise the high availability of pods.
  • Total pod instances: You want enough instances of a pod to ensure high availability, but not so many that you waste hosting resources running more pod instances than you need.
  • Expected vs. actual pod instances: You should also monitor how many instances for each pod are actually running, and compare it to how many you expected to be running. If you find that these numbers are frequently different, it could be a sign that your ReplicaSets are misconfigured and/or that your cluster does not have enough resources to achieve the desired state regarding pod instances.

Monitoring Applications Running in Kubernetes

Although applications are not a specifically defined component within Kubernetes, hosting applications is a reason why you ultimately use Kubernetes in the first place. Thus, it's important to monitor the applications being hosted on your cluster (or clusters) by checking:

  • Application availability: Are your apps actually up and responding?
  • Application responsiveness: How long do your apps take to respond to requests? Do they struggle to maintain acceptable rates of performance under heavy load?
  • Transaction traces: If your apps are experiencing performance or availability problems, transaction traces can help to troubleshoot them.
  • Errors: When application errors occur, it's important to get to the bottom of them before they impact end-users.

Problems revealed by application monitoring in Kubernetes could be the result of an issue with your Kubernetes environment, or they could be rooted in your application code itself. Either way, you'll want to be sure to identify the problem so you can correct it

Monitoring End-user Experience when Running Kubernetes

Like applications, the end-user experience is not a technical part of the Kubernetes platform. But delivering a positive experience for end-users – meaning the people who use the applications hosted on Kubernetes – is a critical consideration for any successful Kubernetes strategy.

Toward that end, it's important to collect data that provides insight into the performance and usability of applications. We discussed some of this above in the context of monitoring for application responsiveness, which provides insight into performance. When it comes to assessing usability, performing both synthetic and real-user monitoring is critical for understanding how users are interacting with Kubernetes workloads and whether there are any adjustments you can make within Kubernetes (such as enhancing your application frontend) to improve usability.

Monitoring Kubernetes in a Cloud Environment

In addition to the various Kubernetes monitoring considerations described above, which apply to any type of Kubernetes environment, there are some special factors to weigh when you're running Kubernetes in the cloud.

In a cloud-based installation, you'll also need to monitor for:

  • Cloud APIs: Your cloud provider has its own APIs, which your Kubernetes installation will use to request resources.
  • IAM events: Monitoring for IAM activity, like logins or permissions changes, is important for staying on top of security in a cloud-based environment.
  • Cost: Cloud bills can get large quickly. Performing cost monitoring will help ensure you are not overspending on your cloud-based Kubernetes service.

Network performance: In the cloud, the network is often the biggest performance bottleneck for your applications. Monitoring the cloud network to ensure that it is moving data as quickly as you need, helps to safeguard against network-related performance issues.