Amazon Elastic Kubernetes Service, or EKS, is a managed Kubernetes service. That means that Amazon Web Services (AWS) handles some of the deployment and management tasks for users. But the fact that EKS is a managed service doesn’t mean that AWS manages all administrative tasks.
One key management task that isn’t fully covered as part of EKS is monitoring. Although AWS provides some tools to help collect metrics from EKS, responsibility for analyzing those metrics and solving problems that they may reveal lies primarily with users.
That’s why developing an EKS monitoring strategy is a prerequisite for deploying EKS effectively. This article explains how to monitor EKS, including which metrics to collect and which tools, both within the AWS cloud and from external vendors, can assist with EKS monitoring.
Defining AWS EKS
EKS is a managed Kubernetes service in the AWS cloud.
Compared to generic Kubernetes – meaning a Kubernetes cluster that you set up and manage yourself – EKS offers key benefits such as automated infrastructure provisioning, autoscaling, and a simplified process for upgrading to a newer Kubernetes release. EKS also integrates with various AWS services, like CloudWatch and Identity and Access Management (IAM), to address some administrative tasks.
Through features like these, EKS provides a simpler Kubernetes experience than you would typically get by setting up Kubernetes yourself. Alongside similar managed Kubernetes services, like Azure AKS and Google Cloud GKE, EKS lowers the barrier to entry for Kubernetes.
What EKS does do and doesn’t do
Although EKS simplifies Kubernetes deployment, EKS is hardly a set-it-and-forget-it solution.
AWS handles most of the management tasks related to the EKS host infrastructure, but it doesn’t proactively monitor or secure your clusters for you. It gives you some tools to help monitor and secure EKS, but performing these tasks is left up to users.
What EKS does do
AWS runs the Kubernetes control plane for you and works to ensure that the servers hosting it remain stable and available.
What EKS doesn’t do
EKS won’t monitor containers that you deploy in Kubernetes to detect performance issues, for instance.
Nor will it monitor Kubernetes audit logs to alert you to potential security problems.
EKS logging and monitoring: a piece-by-piece approach
Because EKS (like any distribution of Kubernetes) is a complex system that consists of many different parts, the easiest way to approach EKS monitoring is to focus on monitoring the various parts of Kubernetes. Let’s do that by walking through the main components of an EKS cluster and explaining which data to collect to monitor them.
EKS nodes
Nodes are the servers that form a Kubernetes cluster. They come in two forms: worker nodes, which host running containers, and master nodes (also called control plane nodes), which host the Kubernetes control plane software. Problems affecting master nodes are generally more serious because a master node failure could bring down your entire cluster. In EKS, AWS runs the master nodes for you, so your own node monitoring will focus mainly on worker nodes. That monitoring is critical, given that your applications will cease to work well if your worker nodes lack sufficient resources.
As we’ve noted, EKS automates some aspects of node infrastructure management. With EKS, you don’t need to worry about actually setting up and provisioning virtual machines or bare-metal servers to create nodes.
However, you do need to monitor for and address performance issues related to nodes, such as node CPU and memory usage. These problems could arise from trying to deploy more containers than your nodes can handle or from a problem with EKS cluster autoscaling.
Amazon CloudWatch’s Container Insights feature allows you to collect key metrics related to node performance, including:
node_cpu_usage_total: The number of CPU units in use on the nodes in your cluster.
node_cpu_utilization: The percentage of the nodes’ CPU capacity in use.
node_filesystem_utilization: The percentage of node file system capacity in use.
node_memory_utilization: The percentage of node memory in use.
CloudWatch can track other node-related metrics, too (see the documentation for a full list of EKS metrics), but these are the most important node metrics to start with.
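As a rough illustration of how these node metrics can drive alerting, the following Python sketch computes a CPU utilization percentage from a usage value and a capacity value, which Container Insights reports separately as node_cpu_limit. It then flags nodes that cross thresholds. The node names, sample values, thresholds, and the NodeSample type are hypothetical. In practice, you would read the metrics from CloudWatch, for example through a CloudWatch alarm, rather than from hand-built objects.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical alert thresholds, in percent; tune these for your own workloads.
CPU_THRESHOLD = 80.0
MEMORY_THRESHOLD = 80.0
FILESYSTEM_THRESHOLD = 85.0


@dataclass
class NodeSample:
    """One sample of node metrics, modeled loosely on Container Insights fields."""
    name: str
    cpu_usage_millicores: float     # like node_cpu_usage_total
    cpu_limit_millicores: float     # like node_cpu_limit (node CPU capacity)
    memory_utilization: float       # like node_memory_utilization (percent)
    filesystem_utilization: float   # like node_filesystem_utilization (percent)


def cpu_utilization(sample: NodeSample) -> float:
    """Percentage of the node's CPU capacity currently in use."""
    if sample.cpu_limit_millicores <= 0:
        raise ValueError(f"node {sample.name} reports no CPU capacity")
    return 100.0 * sample.cpu_usage_millicores / sample.cpu_limit_millicores


def node_alerts(samples: list[NodeSample]) -> list[str]:
    """Return one human-readable alert for every threshold a node exceeds."""
    alerts = []
    for s in samples:
        cpu = cpu_utilization(s)
        if cpu > CPU_THRESHOLD:
            alerts.append(f"{s.name}: CPU at {cpu:.1f}%")
        if s.memory_utilization > MEMORY_THRESHOLD:
            alerts.append(f"{s.name}: memory at {s.memory_utilization:.1f}%")
        if s.filesystem_utilization > FILESYSTEM_THRESHOLD:
            alerts.append(f"{s.name}: filesystem at {s.filesystem_utilization:.1f}%")
    return alerts


if __name__ == "__main__":
    samples = [
        NodeSample("node-a", 1800, 2000, 55.0, 40.0),  # 90% CPU: should alert
        NodeSample("node-b", 500, 2000, 91.5, 30.0),   # high memory: should alert
    ]
    for alert in node_alerts(samples):
        print(alert)
```

Sustained high values on these metrics usually point to the two causes described above: more containers than the nodes can handle, or autoscaling that isn’t adding capacity fast enough.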
EKS pods
In EKS (and Kubernetes in general), a pod is a container or set of containers used to host an application. While EKS makes it easy to set up a cluster to host pods, ensuring the availability and optimizing the performance of pods is up to you.
Here again, CloudWatch Container Insights provides utilities for collecting essential pod-related metrics, including:
pod_cpu_utilization: The percentage of CPU units in use by pods.
pod_memory_utilization: The percentage of the total memory used by pods.
service_number_of_running_pods: The total number of pods running a service or services.
Note that you can track these metrics based on the pods, namespaces, and services to which they correspond. Thus, you can determine which individual pods, namespaces, or services are associated with potential issues, like a spike in CPU or memory utilization.
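The following Python sketch shows that drill-down idea. It averages pod CPU utilization per namespace, picks the busiest namespace, and then finds the busiest pod inside it. The sample tuples and function names are hypothetical stand-ins for data you would pull from Container Insights.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (namespace, pod name, pod_cpu_utilization in percent).
SAMPLES = [
    ("payments", "api-7d9f", 62.0),
    ("payments", "api-x2k1", 78.0),
    ("frontend", "web-5c4b", 12.0),
    ("frontend", "web-9z8y", 18.0),
    ("batch", "report-1", 35.0),
]


def cpu_by_namespace(samples):
    """Average pod CPU utilization for each namespace."""
    grouped = defaultdict(list)
    for namespace, _pod, cpu in samples:
        grouped[namespace].append(cpu)
    return {ns: mean(values) for ns, values in grouped.items()}


def hottest_namespace(samples):
    """Namespace with the highest average pod CPU utilization."""
    averages = cpu_by_namespace(samples)
    return max(averages, key=averages.get)


def hottest_pod(samples, namespace):
    """Pod in the given namespace with the highest CPU utilization."""
    pods = [(cpu, pod) for ns, pod, cpu in samples if ns == namespace]
    return max(pods)[1]


if __name__ == "__main__":
    ns = hottest_namespace(SAMPLES)
    print(ns, hottest_pod(SAMPLES, ns))
```

The same grouping works for services: swap the namespace field for a service name.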
EKS audit logs
In Kubernetes, an audit log is a record of API requests and how they were handled. In other words, audit logs let you track who (or what) issued a request to the Kubernetes API, which resources were requested, when the request took place, and how the API responded to the request.
Using audit logs, you can identify potential security issues (like repeated unauthorized requests to create or list a Kubernetes resource), as well as gain context on performance problems. For example, if pod CPU utilization spikes, you can check the audit log to see which API requests correspond with the change in CPU utilization.
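To make the two uses above concrete, here is a Python sketch. It reads audit events, one JSON object per line in the Kubernetes audit.k8s.io/v1 Event format, and does two things. First, it flags users who repeatedly receive 401 or 403 responses for the same verb and resource. Second, it lists requests received close to a given moment, such as the start of a CPU spike. The function names, the threshold of three denials, and the two-minute window are assumptions for illustration. The sketch counts only ResponseComplete events so that a request logged at several stages is counted once.

```python
import json
from collections import Counter
from datetime import datetime, timedelta


def parse_time(ts):
    """Parse an RFC 3339 timestamp such as '2023-05-01T12:00:00.123456Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def load_events(lines):
    """Parse one JSON audit event per line, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def completed(events):
    """Keep only ResponseComplete events so each request is counted once."""
    return [e for e in events if e.get("stage") == "ResponseComplete"]


def repeated_denials(events, min_count=3):
    """Count 401/403 responses per (user, verb, resource); keep repeated ones."""
    counts = Counter()
    for e in completed(events):
        if e.get("responseStatus", {}).get("code") in (401, 403):
            key = (
                e.get("user", {}).get("username", "<unknown>"),
                e.get("verb"),
                e.get("objectRef", {}).get("resource"),
            )
            counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


def requests_near(events, spike_time, window=timedelta(minutes=2)):
    """List (time, user, verb, resource) for requests within +/- window of spike_time."""
    hits = []
    for e in completed(events):
        t = parse_time(e["requestReceivedTimestamp"])
        if abs(t - spike_time) <= window:
            hits.append((
                t,
                e.get("user", {}).get("username", "<unknown>"),
                e.get("verb"),
                e.get("objectRef", {}).get("resource"),
            ))
    return sorted(hits, key=lambda hit: hit[0])
```

For example, calling requests_near(events, parse_time("2023-05-01T12:01:30Z")) returns every completed request received between 11:59:30 and 12:03:30 UTC, oldest first.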
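Audit logging in generic Kubernetes is typically complex because Kubernetes provides the facilities for generating audit logs but leaves it up to users to write an audit policy and configure where the logs are written. In EKS, audit logging is considerably easier: audit logs are one of the control plane log types you can enable for a cluster, and EKS delivers them to Amazon CloudWatch Logs. (AWS CloudTrail, the AWS cloud’s native auditing service, records calls to the EKS API itself, such as creating or updating a cluster, rather than requests to the Kubernetes API.)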
You may need to deploy additional tools to help interpret the audit logs, but at least creating and collecting them is simple in EKS.
EKS control plane logs
In addition to audit logs (which EKS treats as one type of control plane log), you can enable four other types of logs related to the EKS control plane (the software that manages your EKS clusters). These are:
API server logs: For tracking the performance of the Kubernetes API server.
Authenticator logs: For tracking authentication requests that use AWS IAM credentials. This component is specific to EKS and maps IAM identities to Kubernetes Role-Based Access Control (RBAC).
Controller manager logs: For tracking activity in the Kubernetes controller manager, which enforces Kubernetes configurations.
Scheduler logs: For monitoring the Kubernetes scheduler, which is responsible for determining which worker nodes should host which pods.
You can enable control plane logs using the AWS CLI tool and a command like this:
aws eks update-cluster-config \
--region <region-code> \
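--name <cluster-name> \
--logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
This command enables all five available control plane log types (including audit logs). To enable only certain logs, list just those in the “types” array.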
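If you only want some log types, you can generate the value passed to --logging instead of editing it by hand. This Python sketch builds that JSON for a chosen subset of the five log types. It can also add a second entry that disables the rest, on the assumption that EKS accepts enabled and disabled entries side by side in the clusterLogging list. The helper name logging_config is hypothetical. The helper only produces the string and leaves the actual aws eks update-cluster-config call to you.

```python
import json

# The five EKS control plane log types, spelled as the --logging "types" array expects.
ALL_LOG_TYPES = ("api", "audit", "authenticator", "controllerManager", "scheduler")


def logging_config(enabled_types, disable_others=False):
    """Build the JSON value for `aws eks update-cluster-config --logging`.

    enabled_types: log types to turn on.
    disable_others: if True, also emit an entry that turns every other type off.
    """
    requested = set(enabled_types)
    unknown = requested - set(ALL_LOG_TYPES)
    if unknown:
        raise ValueError(f"unknown log types: {sorted(unknown)}")

    # Keep the canonical order so the output is stable regardless of input order.
    enabled = [t for t in ALL_LOG_TYPES if t in requested]
    disabled = [t for t in ALL_LOG_TYPES if t not in requested]

    entries = []
    if enabled:
        entries.append({"types": enabled, "enabled": True})
    if disable_others and disabled:
        entries.append({"types": disabled, "enabled": False})
    if not entries:
        raise ValueError("no log types to enable or disable")
    return json.dumps({"clusterLogging": entries})


if __name__ == "__main__":
    # Prints a value you could pass to --logging in the command above.
    print(logging_config(["api", "audit"]))
```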
Control plane logging is disabled by default. Once you enable it, EKS sends the logs to Amazon CloudWatch Logs, where they sit alongside your other EKS logs and metrics.
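To find these logs, it helps to know how they are organized. EKS writes them to a CloudWatch Logs group named /aws/eks/<cluster-name>/cluster, with separate log streams per component. The Python sketch below does three things. It builds that log group name. It maps a stream name to its log type using stream-name prefixes such as kube-apiserver-audit- for audit logs, which are an assumption based on EKS’s naming and worth checking against your own log group. It also produces a CloudWatch Logs Insights query for recent audit entries. All function names here are hypothetical.

```python
from typing import Optional

# Assumed stream-name prefixes for EKS control plane logs. The audit prefix is
# checked before the plain API server prefix because it starts with it.
STREAM_PREFIXES = (
    ("kube-apiserver-audit-", "audit"),
    ("kube-apiserver-", "api"),
    ("authenticator-", "authenticator"),
    ("kube-controller-manager-", "controllerManager"),
    ("kube-scheduler-", "scheduler"),
)


def log_group_for(cluster_name: str) -> str:
    """CloudWatch Logs group that receives a cluster's control plane logs."""
    return f"/aws/eks/{cluster_name}/cluster"


def log_type_of(stream_name: str) -> Optional[str]:
    """Map a log stream name to its control plane log type, or None if unknown."""
    for prefix, log_type in STREAM_PREFIXES:
        if stream_name.startswith(prefix):
            return log_type
    return None


def recent_audit_query(limit: int = 20) -> str:
    """CloudWatch Logs Insights query for the newest audit log entries."""
    return (
        "fields @timestamp, @message\n"
        "| filter @logStream like /kube-apiserver-audit/\n"
        "| sort @timestamp desc\n"
        f"| limit {limit}"
    )
```

You would run the query against the log group returned by log_group_for, for example from the Logs Insights page of the CloudWatch console.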
Using external tools to monitor EKS
While CloudWatch and CloudTrail are useful for collecting basic EKS logs and metrics and generating alerts based on them, they’re not full-featured log analytics or visualization solutions. For that reason, many EKS users opt to leverage an external tool that provides more extensive EKS monitoring features.
For example, you can use the Amazon EKS App for Sumo Logic to centralize and analyze monitoring data from all layers of the EKS architecture – nodes, pods, and the control plane. Sumo Logic provides built-in dashboards that are preconfigured specifically for EKS monitoring, making it easy to make sense of the large volumes of data that the various components of EKS generate.
In addition to a bird’s-eye view of overall EKS health and status, Sumo Logic lets you drill down into monitoring data. For instance, you can track the performance of individual pods, nodes, and other components, or trace a specific API request within audit logs.
Conclusion: getting the most from EKS monitoring
Although AWS provides some tools to help collect metrics and log data from EKS, it leaves users to figure out how to derive the most value out of that data. Tools like CloudWatch are useful for basic metrics collection and alerting, but you can gain deeper visibility into EKS clusters and workloads by using tools that are purpose-built for EKS monitoring, like the Amazon EKS App for Sumo Logic.