Metrics to Monitor in Kubernetes

Now that we know which types of monitoring to perform for Kubernetes, let's discuss the specific metrics to collect in order to achieve visibility into a Kubernetes installation.

Common Metrics

Common metrics are metrics you can collect from the code of Kubernetes itself, which is written in Go. This information helps you understand what is happening deep under the hood of Kubernetes.

| Metric | Components | Description |
| --- | --- | --- |
| go_gc_duration_seconds | All | A summary of the GC invocation durations |
| go_threads | All | Number of OS threads created |
| go_goroutines | All | Number of goroutines that currently exist |
| etcd_helper_cache_hit_count | API Server, Controller Manager | Counter of etcd helper cache hits |
| etcd_helper_cache_miss_count | API Server, Controller Manager | Counter of etcd helper cache misses |
| etcd_request_cache_add_latencies_summary | API Server, Controller Manager | Latency in microseconds of adding an object to etcd cache |
| etcd_request_cache_get_latencies_summary | API Server, Controller Manager | Latency in microseconds of getting an object from etcd cache |
| etcd_request_latencies_summary | API Server, Controller Manager | Etcd request latency summary in microseconds for each operation and object type |
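
The two cache counters above are most useful when read as a ratio. The following is a minimal sketch, assuming the component serves its metrics in the Prometheus text exposition format; the sample payload and the helper names `parse_samples` and `cache_hit_ratio` are illustrative and not part of Kubernetes.

```python
import re
from typing import Dict, Optional

# Matches "name value" or "name{labels} value" lines of the text format.
# Simplification: label values containing "}" are not handled.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)')


def parse_samples(text: str) -> Dict[str, float]:
    """Sum sample values per metric name, ignoring labels and comments."""
    totals: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match:
            name, value = match.group(1), float(match.group(2))
            totals[name] = totals.get(name, 0.0) + value
    return totals


def cache_hit_ratio(samples: Dict[str, float]) -> Optional[float]:
    """Return hits / (hits + misses), or None if there were no lookups."""
    hits = samples.get("etcd_helper_cache_hit_count", 0.0)
    misses = samples.get("etcd_helper_cache_miss_count", 0.0)
    total = hits + misses
    return hits / total if total > 0 else None


# Hypothetical scrape output, for illustration only.
EXAMPLE = """\
# TYPE etcd_helper_cache_hit_count counter
etcd_helper_cache_hit_count 900
etcd_helper_cache_miss_count 100
go_goroutines 42
"""

if __name__ == "__main__":
    print(cache_hit_ratio(parse_samples(EXAMPLE)))
```

Because both counters are cumulative, this ratio covers the whole lifetime of the process. To see recent behavior, apply the same calculation to the differences between two scrapes.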

API Server Metrics

Since APIs serve as the glue that binds the Kubernetes frontend together, API metrics are crucial for achieving visibility into the API Server – and, by extension, into your entire frontend.

| Metric | Description |
| --- | --- |
| apiserver_request_count | Count of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code |
| apiserver_request_latencies | Response latency distribution in microseconds for each verb, resource and subresource |
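
Since apiserver_request_count is broken out by HTTP response code, one common use is to track the share of requests that fail on the server side. The sketch below assumes you have already grouped the counter by its code label for two consecutive scrapes; the function name `server_error_ratio` and the input shape are hypothetical.

```python
from typing import Dict


def server_error_ratio(before: Dict[str, float],
                       after: Dict[str, float]) -> float:
    """Fraction of requests between two scrapes that returned HTTP 5xx.

    Both arguments map an HTTP status code (the `code` label) to the
    cumulative apiserver_request_count value for that code.
    """
    total = 0.0
    errors = 0.0
    for code, count in after.items():
        delta = count - before.get(code, 0.0)
        if delta < 0:
            # The counter went backwards, so it was reset (for example by
            # an API Server restart); count everything since the reset.
            delta = count
        total += delta
        if code.startswith("5"):
            errors += delta
    return errors / total if total > 0 else 0.0


if __name__ == "__main__":
    earlier = {"200": 1000.0, "500": 10.0}
    later = {"200": 1090.0, "500": 20.0}
    print(server_error_ratio(earlier, later))
```

Working on deltas rather than raw totals keeps old errors from masking or exaggerating the current state of the API Server.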

Etcd Metrics

Since Etcd stores all of the configuration data for Kubernetes itself, Etcd metrics deliver critical visibility into the state of your cluster.

| Metric | Description |
| --- | --- |
| etcd_server_has_leader | 1 if a leader exists, 0 if not |
| etcd_server_leader_changes_seen_total | Number of leader changes |
| etcd_server_proposals_applied_total | Number of proposals that have been applied |
| etcd_server_proposals_committed_total | Number of proposals that have been committed |
| etcd_server_proposals_pending | Number of proposals that are pending |
| etcd_server_proposals_failed_total | Number of proposals that have failed |
| etcd_debugging_mvcc_db_total_size_in_bytes | Total size of the underlying database in bytes, including free space awaiting defragmentation |
| etcd_disk_backend_commit_duration_seconds | Latency distributions of commit called by the backend |
| etcd_disk_wal_fsync_duration_seconds | Latency distributions of fsync called by the WAL |
| etcd_network_client_grpc_received_bytes_total | Total number of bytes received from gRPC clients |
| etcd_network_client_grpc_sent_bytes_total | Total number of bytes sent to gRPC clients |
| grpc_server_started_total | Total number of RPCs started on the server |
| grpc_server_handled_total | Total number of RPCs handled on the server |
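
Several of the metrics above translate directly into health checks: whether a leader exists, how far applied proposals trail committed ones, and how many proposals are pending. The following sketch shows one way to combine them. The function name `etcd_warnings` and both thresholds are assumptions chosen for illustration, not values recommended by etcd.

```python
from typing import Dict, List


def etcd_warnings(sample: Dict[str, float],
                  max_apply_lag: float = 50.0,
                  max_pending: float = 10.0) -> List[str]:
    """Return human-readable warnings for one scrape of etcd metrics.

    `sample` maps metric names to their current values. The thresholds
    are illustrative defaults and should be tuned per cluster.
    """
    warnings: List[str] = []
    if sample.get("etcd_server_has_leader", 0.0) != 1.0:
        warnings.append("no leader")
    committed = sample.get("etcd_server_proposals_committed_total", 0.0)
    applied = sample.get("etcd_server_proposals_applied_total", 0.0)
    if committed - applied > max_apply_lag:
        warnings.append("apply lag: %d proposals" % (committed - applied))
    pending = sample.get("etcd_server_proposals_pending", 0.0)
    if pending > max_pending:
        warnings.append("pending proposals: %d" % pending)
    return warnings


if __name__ == "__main__":
    print(etcd_warnings({
        "etcd_server_has_leader": 1.0,
        "etcd_server_proposals_committed_total": 1000.0,
        "etcd_server_proposals_applied_total": 998.0,
        "etcd_server_proposals_pending": 0.0,
    }))
```

A missing etcd_server_has_leader sample is treated the same as a value of 0, on the assumption that a member you cannot scrape should not be considered healthy.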

Scheduler Metrics

Monitoring latency in the Scheduler helps identify delays that could prevent Kubernetes from deploying pods smoothly.

| Metric | Description |
| --- | --- |
| scheduler_e2e_scheduling_latency_microseconds | The end-to-end scheduling latency, which is the sum of the scheduling algorithm latency and the binding latency |
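
Latency metrics like this one are usually read as percentiles rather than averages. The sketch below estimates a quantile from cumulative histogram buckets by linear interpolation, similar in spirit to what Prometheus's `histogram_quantile` function does. It assumes the metric is exposed as a histogram whose buckets carry an upper bound and a cumulative count; the bucket values in the example are made up.

```python
import math
from typing import List, Optional, Tuple


def histogram_quantile(q: float,
                       buckets: List[Tuple[float, float]]) -> Optional[float]:
    """Estimate the q-quantile from (upper_bound, cumulative_count) buckets.

    The bucket list must include a final bucket with an infinite upper
    bound. Returns None when there is not enough data to estimate.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if len(ordered) < 2 or not math.isinf(ordered[-1][0]):
        return None
    total = ordered[-1][1]
    if total <= 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # No finite upper bound to interpolate toward, so fall
                # back to the highest bound known to be exceeded.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical buckets: upper bound in microseconds, cumulative count.
    example = [(1000.0, 10.0), (2000.0, 60.0), (4000.0, 90.0),
               (math.inf, 100.0)]
    print(histogram_quantile(0.5, example))
```

With the example buckets, the median lands in the 1000 to 2000 microsecond bucket and is interpolated to roughly 1800 microseconds. The estimate is only as precise as the bucket boundaries allow.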

Controller Manager Metrics

Watching the requests that the Controller Manager makes to external APIs helps ensure that workloads can be orchestrated successfully, especially in cloud-based Kubernetes deployments.

| Metric | Description |
| --- | --- |
| cloudprovider_*_api_request_duration_seconds | The latency of the cloud provider API call |
| cloudprovider_*_api_request_errors | Cloud provider API request errors |

Kube-State-Metrics

Kube-State-Metrics is an optional Kubernetes add-on that generates metrics from the Kubernetes API. These metrics cover a range of resources; the most valuable ones are listed below.

| Metric | Description |
| --- | --- |
| kube_pod_status_phase | The current phase of the pod |
| kube_pod_container_resource_limits_cpu_cores | Limit on CPU cores that can be used by the container |
| kube_pod_container_resource_limits_memory_bytes | Limit on the amount of memory that can be used by the container |
| kube_pod_container_resource_requests_cpu_cores | The number of requested cores by a container |
| kube_pod_container_resource_requests_memory_bytes | The number of requested memory bytes by a container |
| kube_pod_container_status_ready | Will be 1 if the container is ready, and 0 if it is in a not ready state |
| kube_pod_container_status_restarts_total | Total number of restarts of the container |
| kube_pod_container_status_terminated_reason | The reason that the container is in a terminated state |
| kube_pod_container_status_waiting | The reason that the container is in a waiting state |
| kube_daemonset_status_desired_number_scheduled | The number of nodes that should be running the pod |
| kube_daemonset_status_number_unavailable | The number of nodes that should be running the pod, but are not able to |
| kube_deployment_spec_replicas | The number of desired pod replicas for the Deployment |
| kube_deployment_status_replicas_unavailable | The number of unavailable replicas per Deployment |
| kube_node_spec_unschedulable | Whether a node can schedule new pods or not |
| kube_node_status_capacity_cpu_cores | The total CPU resources available on the node |
| kube_node_status_capacity_memory_bytes | The total memory resources available on the node |
| kube_node_status_capacity_pods | The number of pods the node can schedule |
| kube_node_status_condition | The current status of the node |
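
Many of these metrics become more informative when paired. As one example, dividing kube_deployment_status_replicas_unavailable by kube_deployment_spec_replicas shows how degraded each Deployment is. This is a minimal sketch that assumes both metrics have already been collected into dictionaries keyed by Deployment name; the function name `unavailable_fraction` is hypothetical.

```python
from typing import Dict


def unavailable_fraction(spec_replicas: Dict[str, float],
                         unavailable: Dict[str, float]) -> Dict[str, float]:
    """Map each Deployment to the fraction of its replicas unavailable.

    Deployments scaled to zero are skipped, because the ratio is
    undefined when no replicas are desired.
    """
    result: Dict[str, float] = {}
    for name, desired in spec_replicas.items():
        if desired <= 0:
            continue
        result[name] = unavailable.get(name, 0.0) / desired
    return result


if __name__ == "__main__":
    desired = {"web": 4.0, "api": 2.0, "batch": 0.0}
    missing = {"web": 1.0}
    for name, fraction in sorted(unavailable_fraction(desired, missing).items()):
        print(name, fraction)
```

A ratio is easier to alert on than the raw count, since one unavailable replica matters far more for a two-replica Deployment than for one with fifty.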

Kubelet Metrics

Monitoring the Kubelet agent will help ensure that the Control Plane can communicate effectively with each of the nodes that Kubelet runs on. Beyond the common Go runtime metrics described above, Kubelet exposes some internals about its actions that are good to track as well.

| Metric | Description |
| --- | --- |
| kubelet_running_container_count | The number of containers that are currently running |
| kubelet_runtime_operations | The cumulative number of runtime operations, broken out by operation type |
| kubelet_runtime_operations_latency_microseconds | The latency of each operation by type in microseconds |
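
Because kubelet_runtime_operations is cumulative, the raw value says little on its own; what matters is how fast it grows. The sketch below turns two samples of a cumulative counter into a per-second rate and tolerates a counter reset, such as after a Kubelet restart. The function name `counter_rate` and its argument layout are assumptions for illustration.

```python
from typing import Optional


def counter_rate(prev_value: float, prev_time: float,
                 curr_value: float, curr_time: float) -> Optional[float]:
    """Per-second increase of a cumulative counter between two samples.

    Times are in seconds. Returns None if the samples are not in
    chronological order or share the same timestamp.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        return None
    delta = curr_value - prev_value
    if delta < 0:
        # The counter restarted from zero between the two samples, so
        # everything in the current value happened after the reset.
        delta = curr_value
    return delta / elapsed


if __name__ == "__main__":
    # 60 operations over 30 seconds.
    print(counter_rate(100.0, 0.0, 160.0, 30.0))
```

The same helper works for any of the cumulative counters in this article, provided each operation type is tracked separately.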

Node Metrics

Monitoring standard metrics from the operating systems that power Kubernetes nodes provides insight into the health of each node. Common node metrics to monitor include CPU load, memory consumption, filesystem activity and usage, and network activity.
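
To make a couple of these node metrics concrete, the following sketch reads the one-minute CPU load average and filesystem usage using only the Python standard library. It is an illustration of what such metrics look like, not a replacement for a real node agent; the function name `node_snapshot` and the returned keys are hypothetical.

```python
import os
import shutil
from typing import Dict, Optional


def node_snapshot(path: str = "/") -> Dict[str, Optional[float]]:
    """Collect a few basic operating system metrics for the local node."""
    try:
        # One-minute load average; not available on every platform.
        load1: Optional[float] = os.getloadavg()[0]
    except (AttributeError, OSError):
        load1 = None

    usage = shutil.disk_usage(path)
    disk_used: Optional[float] = None
    if usage.total > 0:
        disk_used = usage.used / usage.total

    return {
        "load1": load1,
        "cpu_count": float(os.cpu_count() or 1),
        "disk_used_fraction": disk_used,
    }


if __name__ == "__main__":
    for key, value in node_snapshot().items():
        print(key, value)
```

Comparing the load average against the CPU count gives a rough sense of saturation: a load that stays well above the number of CPUs suggests that work is queuing on the node.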

Container Metrics

While metrics from Kubernetes can provide insight into many parts of your workload, you should also home in on individual containers to monitor for resource consumption. cAdvisor, which analyzes resource usage inside containers, is helpful for this purpose.

Log Data

When you need to investigate an issue revealed by metrics, logs are invaluable for diving deeper by collecting information that goes beyond metrics themselves. Kubernetes offers a range of logging facilities for most of its components. Applications themselves also typically generate log data.