If your team falls into the majority of organizations that use NGINX – which remains the world’s most popular Web server – to host websites and Web applications, monitoring NGINX usage, performance, and transactions is critical for maintaining a positive end-user experience.
Keep reading for tips on doing so. This article identifies the most important metrics to monitor for NGINX in order to understand key usage and performance trends within NGINX transactions.
NGINX is most famous as a Web server, although it can also function as a reverse proxy, HTTP cache, and load balancer.
NGINX has become massively popular as a Web server solution due in part to the fact that it is open source. It is also simpler to administer in some ways than Apache, the next-most-popular open source Web server. NGINX tends to consume fewer resources, too, which makes it useful in fast-moving environments that need to operate efficiently and scale quickly.
To make sure NGINX delivers the performance it is capable of, it’s critical to monitor it continuously. Like any Web server, NGINX is a complicated platform that performs a complex set of tasks: it accepts requests from clients, retrieves the requested content, serves that content, caches content to improve performance, and so on.
On top of this, NGINX’s load-balancing and reverse proxy features add other facets of functionality that may be critical to monitor, depending on what your team is using NGINX for.
To ensure that NGINX transactions perform adequately and that NGINX resource usage does not become dangerously high, teams should track the following metrics.
Request time tracks the total time that NGINX takes to handle a request, from reading the first bytes of the client request to sending the last bytes of the response.
Long request time could indicate a range of problems: an overwhelmed server, exhaustion of network bandwidth, malformed requests, and more. To gain greater clarity into the issue, it’s helpful to track whether all requests are slow, or just requests for certain clients or certain types of content.
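As a minimal sketch of how you might separate “everything is slow” from “only some content is slow”, the Python below groups request times by path. It assumes an access log in a combined-style format with `$request_time` appended as the last field; that log layout, the function name `slow_paths`, and the 1-second threshold are illustrative assumptions, not NGINX defaults.

```python
from collections import defaultdict
from statistics import mean


def slow_paths(lines, threshold=1.0):
    """Return {path: mean request time} for paths whose mean exceeds threshold seconds.

    Assumes each line quotes the request ('"GET /path HTTP/1.1"') and ends
    with the value of $request_time (hypothetical log format).
    """
    times = defaultdict(list)
    for line in lines:
        try:
            request = line.split('"')[1]              # 'GET /path HTTP/1.1'
            path = request.split()[1]                 # '/path'
            seconds = float(line.rsplit(None, 1)[1])  # last whitespace-separated field
        except (IndexError, ValueError):
            continue                                  # skip lines that do not match
        times[path].append(seconds)
    averages = {path: mean(values) for path, values in times.items()}
    return {path: avg for path, avg in averages.items() if avg > threshold}
```

If only a handful of paths come back, the problem is more likely tied to specific content or upstream applications than to the server as a whole.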
Tracking the total number of connections that NGINX accepts each second helps measure the overall load placed on the server. In addition, by measuring how this metric varies over time, you can predict periods of high demand (like certain times of day) when you may need to add more resources to the server or create additional NGINX instances to handle increased demand.
While connections per second measures how many new client connections the server accepts, requests per second measures how many requests those clients actually send. Look at this metric alongside connections per second to gain a stronger sense of what your server load actually is: high connections per second paired with low requests per second doesn’t necessarily translate to high load.
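One way to derive both rates, sketched here under the assumption that the `stub_status` module is enabled and that you sample its plain-text output at a known interval: the counters it reports are cumulative, so each rate is the difference between two samples divided by the elapsed time. The function names are hypothetical, and fetching the status page is left out.

```python
import re

# Matches the cumulative counter line of stub_status output.
COUNTERS = re.compile(r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_counters(text):
    """Extract the cumulative accepts, handled, and requests counters."""
    match = COUNTERS.search(text)
    if match is None:
        raise ValueError("unrecognized stub_status output")
    accepts, handled, requests = map(int, match.groups())
    return {"accepts": accepts, "handled": handled, "requests": requests}


def per_second(earlier, later, interval_seconds):
    """Turn two counter samples into connection and request rates."""
    return {
        "connections_per_second":
            (later["accepts"] - earlier["accepts"]) / interval_seconds,
        "requests_per_second":
            (later["requests"] - earlier["requests"]) / interval_seconds,
    }
```

Comparing the two rates over the same window shows whether clients are mostly opening connections or mostly sending requests over connections they already hold.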
Sometimes, a client connection may stay active for a long period of time. This could be benign behavior, like a Web service that needs long-lasting connectivity rather than just a one-off request for content. On the other hand, a large number of active connections from clients with no genuine need for long-lived connectivity could lead to inefficiency and an unnecessary increase in server load.
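To see how many active connections are actually doing work, you could break the total down by state. The sketch below again assumes `stub_status` output, where `Waiting` counts idle connections waiting for a request; the `SAMPLE` text mirrors the format shown in the NGINX module documentation, and the function name and the `idle_share` field are hypothetical.

```python
import re

SAMPLE = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)


def connection_states(text):
    """Return active, reading, writing, and waiting counts plus the idle share."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    states = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if active is None or states is None:
        raise ValueError("unrecognized stub_status output")
    reading, writing, waiting = map(int, states.groups())
    total = int(active.group(1))
    return {
        "active": total,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
        # Fraction of active connections that are idle.
        "idle_share": waiting / total if total else 0.0,
    }
```

A persistently high idle share is not a problem in itself, but it is a reasonable cue to check whether those connections belong to clients that really need them.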
Typically, NGINX shouldn’t drop connections. If you see a large number of dropped connections, there’s a good chance that the server is overloaded and needs more resources. Or, dropped connections could result from problems reading or processing data from applications.
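NGINX does not report dropped connections directly in `stub_status`, but they can be derived: the `handled` counter normally equals `accepts` unless a resource limit was reached, so the gap between the two is the number of connections dropped. The sketch below assumes you already have the status text; both function names are hypothetical.

```python
import re

COUNTERS = re.compile(r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)")


def dropped_connections(text):
    """Cumulative dropped connections: accepted minus handled."""
    match = COUNTERS.search(text)
    if match is None:
        raise ValueError("unrecognized stub_status output")
    accepts, handled, _requests = map(int, match.groups())
    return accepts - handled


def dropped_between(earlier_text, later_text):
    """Connections dropped between two samples of the status page."""
    return dropped_connections(later_text) - dropped_connections(earlier_text)
```

Because the counters are cumulative, alerting on the change between samples is usually more useful than alerting on the running total.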
In addition to the NGINX-specific metrics described above, you should track key infrastructure metrics on your NGINX servers in order to monitor their overall health. The three core metrics here are:
CPU usage: CPU usage above about 90 percent means that you should probably allocate more resources to the server before it becomes maxed out. You should also dig deeper to determine whether the high CPU usage is simply the result of high traffic, or whether it is the result of an inefficient configuration or a buggy application.
Memory usage: Likewise, you don’t want to run out of memory and cause NGINX to start dropping connections or crash. When memory usage gets high, you should allocate more memory if possible, then figure out whether there is an underlying problem to address.
Disk usage: High disk usage is less likely to be the result of a critical application or server problem than of simple exhaustion of storage resources. Still, when you get close to running out of disk space, you’ll need to provision more or free up some of what you have to ensure NGINX can keep writing (and, where applicable, caching) data.
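The three checks above can be sketched with the Python standard library alone. This is a rough illustration, not a replacement for a monitoring agent: the load average is only a proxy for CPU usage and is Unix-only, the memory figure assumes Linux `/proc/meminfo` text with `MemTotal` and `MemAvailable` fields, and every limit other than the 90 percent CPU figure mentioned above is a placeholder you would choose yourself.

```python
import os
import shutil


def percent(used, total):
    """Share of total that is used, as a percentage."""
    return 100.0 * used / total if total else 0.0


def disk_percent(path="/"):
    """Disk usage of the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    return percent(usage.used, usage.total)


def cpu_load_percent():
    """1-minute load average relative to CPU count (Unix only, rough proxy)."""
    load1, _, _ = os.getloadavg()
    return percent(load1, os.cpu_count() or 1)


def memory_percent(meminfo_text):
    """Memory in use, from the text of Linux /proc/meminfo (assumed format)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name] = int(rest.split()[0])
    return percent(fields["MemTotal"] - fields["MemAvailable"], fields["MemTotal"])


def over_threshold(readings, limits):
    """Names of readings that exceed their limit (default limit: 100 percent)."""
    return [name for name, value in readings.items()
            if value > limits.get(name, 100.0)]
```

For example, `over_threshold({"cpu": 95.0, "disk": 40.0}, {"cpu": 90.0, "disk": 85.0})` flags only the CPU reading.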
There are a variety of other metrics that you can collect in NGINX. Ideally, you’ll monitor more than just the basic usage and performance data described above.
Ideally, you will also correlate a variety of NGINX metrics in order to understand complex problems. Monitoring individual metrics in isolation is often not very useful; to identify the root cause of problems with NGINX transactions or performance, you’ll need to evaluate different types of metrics (like CPU usage, active connections, and dropped connections) side by side.
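As a simple illustration of evaluating two metrics side by side, the sketch below computes a Pearson correlation coefficient between two equally sampled series, for instance CPU usage and dropped connections per minute. The function name is hypothetical, and correlation is only a first hint at a relationship, not proof of a root cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length metric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance / (spread_x * spread_y)
```

A value near 1 suggests the two metrics rise together, a value near -1 suggests one falls as the other rises, and a value near 0 suggests no linear relationship.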
Sumo Logic makes it simple to do this. The Sumo Logic App for NGINX automatically and continuously collects data from across your NGINX environment, then makes it easy to analyze with prebuilt dashboards and advanced analytics tailored to NGINX. You can monitor complex transactions and drill down into usage and performance patterns to understand what is really happening within complex NGINX instances, without having to spend hours.