System administrators hold many key responsibilities within an IT organization. Most importantly, they must ensure that all systems, services, and applications are up, running, and performing as expected. When a system starts to lag or an application is down, the system administrators are called upon to troubleshoot and resolve the issue as quickly as possible to limit the impact on customers.
Reacting to and resolving these issues in a timely manner requires useful metrics that can be leveraged to diagnose problems. One way to collect these metrics is with collectd, a Unix daemon for collecting system and application performance statistics. In this article, I’ll explain how to get started with collectd by showing you how to install and configure the daemon, along with detailed instructions for collecting system metrics that can be used to resolve performance-related issues.
As mentioned above, collectd is a Unix daemon for collecting system and application performance statistics. As a daemon, collectd runs in the background and gathers key system metrics that can be used to produce valuable visualizations for gaining insight into issues within a particular system. If you are in the process of evaluating options for tools that record system metrics, there are several key advantages to working with collectd that you should take into consideration:
Now that we’ve familiarized ourselves with collectd and its advantages, let’s take a look at how to get started with it by gathering metrics on a machine running Ubuntu 18.04. In this example, I will install an Apache web server on an Ubuntu machine, and then install and configure collectd to gather metrics from that web server.
As a first step, you will need to check to see if you have Python installed on your machine and install it if you do not. This is a relatively straightforward process and can be accomplished with a few simple commands:
Prior to installing any software on the machine, first run the following command:
sudo apt-get update
This will update our package lists to ensure that we will be downloading and installing the latest and greatest versions of the software that we require. Once this command has been executed successfully, we can continue with our installations.
For the purposes of this exercise, let’s install Python 3.7 on the Ubuntu 18.04 machine. To do so, run the following command:
sudo apt-get install python3.7
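To confirm the result (or to check beforehand whether an install is needed at all), a small check like the following can help. This is a minimal sketch: check_python is a hypothetical helper name, and the interpreter name python3.7 is assumed from the package installed above.

```shell
# Report the version of a given interpreter, or note that it is missing.
# check_python is a hypothetical helper, not part of any package.
check_python() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --version 2>&1
    else
        echo "$1 not found"
    fi
}

# python3.7 is the interpreter name assumed from the install step above.
check_python python3.7
```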
Next, let’s install the Apache web server before installing collectd. The following command will install Apache with the default setup for an Ubuntu machine:
sudo apt-get install apache2
Once the install command for Apache executes successfully, we should start our web server to ensure that it has been properly installed:
sudo service apache2 start
After starting the web server, you should be able to reach it on localhost. Open a web browser on the machine and go to http://localhost to confirm that the default Apache page loads.
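On a machine without a desktop browser, the same check can be made from the terminal. The sketch below assumes that curl is installed (it is not part of the steps in this article) and uses a hypothetical helper named http_status.

```shell
# Print the HTTP status code returned for a URL.
# Assumes curl is available; http_status is a hypothetical helper name.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# A 200 status means Apache answered the request successfully.
code=$(http_status http://localhost/) || true
echo "HTTP status: ${code:-unavailable}"
```

If the request cannot be completed at all (for example, because Apache is not running), the helper does not yield a 200, which is the signal to revisit the start command.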
Now that we have our Apache web server running on our machine, it’s time to install collectd! This can be done using the apt package manager. Simply execute the following command and the apt package utility will install collectd on our host:
sudo apt-get install collectd
Now collectd is installed on the Ubuntu machine and you’re ready to collect a variety of system and application metrics.
The next step is to configure collectd for our purposes. This configuration is defined in the collectd.conf file located in /etc/collectd on the Linux machine. The command for using vim to modify this configuration file is as follows:
sudo vim /etc/collectd/collectd.conf
Collectd makes configuration simple by providing as much information as possible to help you get started. You will find that many lines within the configuration file are commented out, and simply commenting/uncommenting will help you set up a basic configuration that will work for you. As we’ll see later, collectd also provides commented configurations for plugins that are disabled by default to help format your configuration file properly when enabling them.
Right now, we’re just going to set the name of the host machine that we’re running collectd on and disable the FQDNLookup option to prevent the daemon from trying to discern the fully qualified domain name. I am choosing “localhost” for my host name, so my configuration file sets Hostname to “localhost” and FQDNLookup to false.
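The two settings described here can be sketched as follows. To stay with the command-line style of this article, the fragment is wrapped in a shell here-document that writes it to a scratch file for review; the scratch file is an assumption for illustration only, and the lines themselves belong in /etc/collectd/collectd.conf.

```shell
# Write the two global settings to a scratch file for review.
# The scratch file is illustrative; the real settings live in
# /etc/collectd/collectd.conf.
global_sketch=$(mktemp)
cat > "$global_sketch" <<'EOF'
# Name reported for this machine in the collected metrics
Hostname "localhost"
# Do not try to resolve the fully qualified domain name
FQDNLookup false
EOF

cat "$global_sketch"
```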
While we will demonstrate the LoadPlugin option later in this tutorial, there are also a variety of other configuration options that are beyond the scope of this article. Please visit collectd’s configuration documentation for more insight.
In order to gather metrics for the web server, the Apache plugin for collectd queries the status page generated by the Apache status module, mod_status. Thus, we must first ensure that the mod_status module is enabled for apache2 on the host machine. To see if it is enabled by default, visit http://localhost/server-status in a browser on the host machine.
If this link brings you to the Apache web server statistics page generated by your Apache instance, you are all set! If not, you must enable the mod_status module. There are a few ways to do this, and one is to run the following command in your terminal:
sudo a2enmod status
Another way to enable mod_status is to open the status.conf configuration file and either uncomment or add a few lines of code. On a machine running Ubuntu 18.04, the status.conf file will be located in /etc/apache2/mods-enabled.
Using vim, we can open the configuration file with the following command:
sudo vim /etc/apache2/mods-enabled/status.conf
Either uncommenting or adding the following lines should enable the mod_status module and allow the apache2 instance to generate the web statistics page at the /server-status endpoint.
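A minimal sketch of such lines, assuming the Apache 2.4 access-control syntax that ships with Ubuntu 18.04, is shown below. It is written to a scratch file through a here-document so that it can be compared with the existing contents of status.conf; the scratch file is illustrative, and your distribution's default status.conf may wrap these lines in additional directives.

```shell
# Write a sample mod_status location block to a scratch file for review.
# The scratch file is illustrative; the real file is
# /etc/apache2/mods-enabled/status.conf.
status_sketch=$(mktemp)
cat > "$status_sketch" <<'EOF'
<Location /server-status>
    # Serve the status page generated by mod_status
    SetHandler server-status
    # Only allow requests that originate from the local machine
    Require local
</Location>
EOF

cat "$status_sketch"
```

Restricting access with Require local keeps the statistics page private to the host, which is enough here because collectd queries it over localhost.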
Stop and restart the Apache web server with the following commands and revisit the /server-status link to view the statistics page:
sudo service apache2 stop
sudo service apache2 start
With the status module enabled within the Apache instance and the /server-status URL available, it’s time to configure the Apache plugin within the collectd configuration file.
Configuration documentation: https://collectd.org/wiki/index.php/Plugin:Apache
Once again, open the collectd.conf file located in /etc/collectd. Locate the LoadPlugin section within the configuration file and add or uncomment the line LoadPlugin apache.
Then insert a <Plugin apache> block that defines an instance of the plugin and the status URL it should query.
This configures one instance of the Apache plugin. The instance is referred to as “web-tracking,” and collectd uses http://localhost/server-status?auto to gather the web server metrics. Be sure to append “?auto” to the end of the URL: without it, the status page is returned with a MIME type of “text/html,” which is incompatible with the plugin, whereas “?auto” forces the MIME type to be “text/plain.” Please consult the official collectd documentation for this plugin for further information on configuration.
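The plugin configuration described in this section can be sketched as follows, using the instance name and URL given above. As before, the fragment is written to a scratch file through a here-document purely for illustration; the lines themselves belong in /etc/collectd/collectd.conf.

```shell
# Write the Apache plugin configuration to a scratch file for review.
# The scratch file is illustrative; the real settings live in
# /etc/collectd/collectd.conf.
apache_sketch=$(mktemp)
cat > "$apache_sketch" <<'EOF'
LoadPlugin apache

<Plugin apache>
  <Instance "web-tracking">
    # "?auto" requests the plain-text status output the plugin expects
    URL "http://localhost/server-status?auto"
  </Instance>
</Plugin>
EOF

cat "$apache_sketch"
```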
Collecting metrics with collectd can lead to data overload for an organization. In other words, while it’s great that collectd can be so granular in its data collection, it is also useful to aggregate these statistics where doing so simplifies things for the SysAdmin analyzing the data. Fortunately, collectd has a plugin for that. The Aggregation plugin has a variety of applications and configuration options designed to let you take the raw data gathered by collectd and consolidate it into a form that is easier to read. For example, you can take the CPU utilization statistics for each core on a particular host machine and calculate the average across those cores.
Once you have identified the metrics that would be more useful to you when aggregated in a particular manner, open the collectd.conf file (located at /etc/collectd/collectd.conf) and add the line LoadPlugin aggregation in the LoadPlugin section.
From here, you will need to configure an instance of the plugin to aggregate the desired metric in a particular manner. This means adding a <Plugin aggregation> block that contains one or more <Aggregation> blocks to the configuration file.
Within the <Aggregation> tag, you will implement your particular configuration for metrics aggregation. Keep in mind that aggregated values are given new names based on this configuration, so you’ll want to understand the naming schema. If you wish to dive into these options a little further, I encourage you to check out the naming schema information as well as a few useful sample configurations, which are readily available in the official collectd wiki.
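As an illustration of the CPU example mentioned earlier, the sketch below averages the per-core CPU statistics for each host. It assumes that the cpu plugin is also loaded so that there is data to aggregate, and it writes the fragment to an illustrative scratch file rather than to /etc/collectd/collectd.conf.

```shell
# Write a sample Aggregation plugin configuration to a scratch file.
# The scratch file is illustrative; the real settings live in
# /etc/collectd/collectd.conf. Assumes the cpu plugin is loaded too.
agg_sketch=$(mktemp)
cat > "$agg_sketch" <<'EOF'
LoadPlugin aggregation

<Plugin "aggregation">
  <Aggregation>
    # Select the per-core values reported by the cpu plugin
    Plugin "cpu"
    Type "cpu"
    # Keep hosts and CPU states separate; merge the individual cores
    GroupBy "Host"
    GroupBy "TypeInstance"
    # Emit the average across the grouped cores
    CalculateAverage true
  </Aggregation>
</Plugin>
EOF

cat "$agg_sketch"
```

Grouping by Host and TypeInstance while leaving the plugin instance (the core number) out of the grouping is what collapses the cores into a single series per CPU state.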
Earlier, we stopped and restarted the Apache web server, but what about the collectd service? In order to refresh the configuration values within collectd, you should stop (if running) and restart the collectd service. This can be done using straightforward stop and start commands in your Linux terminal:
sudo service collectd stop
sudo service collectd start
After executing these commands, I recommend checking the status of all services to ensure that both apache2 and collectd are running on the host machine:
sudo service --status-all
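The full status listing can be long, so it may be easier to check only the two services that matter here. The sketch below relies on the exit status of the service status command (0 when the service is running); service_state is a hypothetical helper name.

```shell
# Report whether a service is running, based on the exit status of
# "service <name> status" (0 means running).
# service_state is a hypothetical helper, not part of any package.
service_state() {
    if service "$1" status >/dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

for name in apache2 collectd; do
    service_state "$name"
done
```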
Tracking other system metrics with collectd is often as simple as enabling other plugins within the collectd configuration. There are a variety of plugins to consider, some of which are configured with the default installation of collectd.
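To view the collected metrics as graphs, you can install kcollectd, a graphical viewer for data gathered by collectd:
sudo apt-get install kcollectd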
Once installed, and assuming that collectd is configured properly and the service is running, we can view our data using kcollectd. First, launch it by running kcollectd from a terminal.
The interface for kcollectd should now be open. You will see your configured instances in the left pane. In our case, let’s take a look at the graphs for the Apache plugin. Since we applied the name “web-tracking” to our instance, we will be looking for that name in the tree. Selecting that instance and the various categories nested within it will allow us to visualize the Apache metrics gathered by collectd.
As we discussed earlier, there are simple programs (such as kcollectd) that enable data visualizations for gleaning insights from metrics gathered by collectd, but sometimes you need a more complete solution. Sumo Logic is a log management and analytics platform that (with a little help from an open source plugin) can read the statistics gathered by collectd and produce visualizations that assist system administrators with network management.
Once you’re set up with a Sumo Logic account (free trials are available), the process for integrating with collectd is relatively straightforward: