What are VPC Flow Logs?
As the complexity of websites and networking has increased, so has Amazon’s support for monitoring all of the rich data that sites and networks hosted in the AWS cloud generate. In summer 2015, Amazon released a Flow Logs feature for AWS Virtual Private Cloud (VPC). VPC Flow Logs make it easy to collect log data for an entire VPC, a specific subnet or an individual Elastic Network Interface (ENI).
Here’s a look at why VPC Flow Logs are useful, how to enable them, and how to connect them to Sumo Logic for deep analysis of the log data.
The Value of VPC Flow Logs
Let’s start by discussing why you would want to use VPC Flow Logs.
If you use an AWS VPC to host a website or web app, a lot of web traffic flows through your virtual hosting environment. Monitoring and analyzing that data can lead to key insights about who is using your site or app, how they are connecting, when they are logging in and so on.
VPC Flow data is equally important for monitoring internal application metrics, especially for apps built using microservice architectures. That’s because the microservices that constitute an app rely heavily on the internal network to communicate. Monitoring internal traffic is therefore key to understanding how the app is performing.
VPC Flow Logs help provide those insights because they collect the following types of information:
- Where a connection originated (such as the source IP).
- Where a connection ended (such as the destination IP).
- The protocol used to send the data.
- Port numbers used for requests.
- Success or failure of the data flow.
- Traffic rejected due to security group and network Access Control List rules.
With that information, you can answer questions such as the following:
- Which geographic regions are generating the most users of your site or app?
- How many repeat visits are being made to your site?
- At which times of day does your site experience the heaviest load?
- Are attempts being made to find open ports or other potential security vulnerabilities in your configuration?
- Is data bottlenecking on internal or external network connections, and if so, where?
The answers to these and similar questions benefit several different groups within the organization. For example, marketing can use VPC Flow data to create a more accurate profile of the current user base.
Likewise, security admins can leverage the VPC information in many ways:
- Augment data collected by other threat-detection systems to bolster security insights.
- Create a baseline of normal activity, which is useful when trying to identify abnormal events that could signal an attempted attack.
- Identify potential botnet activity on a network by comparing the time-stamps and periodicity of certain traffic, or looking for connections to hosts associated with known botnets.
- Detect and block vulnerability scans against their network by checking for ping sweeps, port scans and other malicious activity associated with attempts to discover weaknesses in the network. Once the sources of such scans are identified, security admins can block them from further access in order to prevent intrusions.
- Improve troubleshooting of performance problems and the optimization of connectivity for development, testing and ITOps teams.
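To make the port-scan idea concrete, here is a minimal Python sketch, not a production detector. It assumes each flow log record has already been parsed into a dictionary with srcaddr, dstport and action keys (names chosen to mirror the Flow Log fields), and it flags any source whose rejected connections touch an unusually large number of distinct ports. The function name find_port_scanners and the min_ports threshold are hypothetical and would need tuning against your own baseline.

```python
from collections import defaultdict


def find_port_scanners(records, min_ports=10):
    """Return {source IP: distinct rejected ports} for suspicious sources.

    `records` is an iterable of dicts with "srcaddr", "dstport" and
    "action" keys. `min_ports` is an illustrative threshold, not an
    AWS-recommended value.
    """
    ports_by_source = defaultdict(set)
    for record in records:
        # Only rejected traffic counts toward the scan heuristic.
        if record["action"] == "REJECT":
            ports_by_source[record["srcaddr"]].add(record["dstport"])
    return {
        source: len(ports)
        for source, ports in ports_by_source.items()
        if len(ports) >= min_ports
    }


# Synthetic records for illustration only: one source probes 15 ports,
# another is rejected once and accepted once.
sample_records = [
    {"srcaddr": "203.0.113.5", "dstport": port, "action": "REJECT"}
    for port in range(20, 35)
]
sample_records.append({"srcaddr": "198.51.100.7", "dstport": 443, "action": "REJECT"})
sample_records.append({"srcaddr": "198.51.100.7", "dstport": 80, "action": "ACCEPT"})

suspects = find_port_scanners(sample_records)
print(suspects)
```

Counting distinct ports rather than raw rejections keeps a single misconfigured client, which retries the same port repeatedly, from being reported as a scanner.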
VPC Flow Logs vs. Other Data Sources
To be sure, VPC Flow logs are not the only way to gain visibility into some of the trends outlined above. You could also use Web server access logs to determine the geographic origins of your traffic and which times of day traffic is heaviest, for example. You could use tcpdump or a similar tool to monitor connections on a local network.
What makes VPC Flow Logs especially valuable, however, is that they provide a single source of information for monitoring data across all parts of the network. Inbound network connections from external IP addresses, traffic produced by traditional services (such as NFS file shares) on the internal network and connections between microservices are all visible from VPC Flow Logs.
So, while VPC Flow logs may not be the exclusive source of information for most types of network activity, they offer a centralized, comprehensive way to monitor all aspects of the network. That makes them an especially useful source of information for DevOps teams focused on efficiency and across-the-board visibility.
Setting Up VPC Flow Logs
There are two ways to enable VPC Flow Logs. The first approach uses the command line, and the second involves pointing and clicking your way through the VPC GUI. (For the record, you could also do this with the CreateFlowLogs action on the AWS API, but that is a topic for a future article.)
AWS CLI set up
I’m a command-line guy, so I’ll start with that option. To enable VPC Flow Logs from the command line, you use the create-flow-logs command on the AWS command-line interface.
The details of the arguments are specified in the command documentation, and the arguments are probably self-explanatory for the most part if you are already familiar with AWS, so we won’t rehash them here.
But to provide a basic example, here is a command that would enable VPC Flow Logs for traffic on the subnet with the resource ID subnet-1234567890abcdef0. The logs would capture all types of traffic (in other words, both accepted and rejected traffic), save the records to a CloudWatch Logs log group called flow-logs under account 123456789101 and use the IAM role publishFlowLogs.
aws ec2 create-flow-logs --resource-ids subnet-1234567890abcdef0 --resource-type Subnet --traffic-type ALL --log-group-name flow-logs --deliver-logs-permission-arn arn:aws:iam::123456789101:role/publishFlowLogs
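For readers who script their AWS setup in Python, the same request can be sketched with boto3, whose EC2 client exposes create_flow_logs. This is a sketch under assumptions rather than a tested deployment script: the helper names build_flow_log_request and create_flow_log are made up for illustration, the subnet ID is a placeholder, and the log group and IAM role are assumed to exist already.

```python
def build_flow_log_request(resource_id, resource_type, log_group, role_arn,
                           traffic_type="ALL"):
    """Build keyword arguments for the EC2 create_flow_logs call."""
    if resource_type not in {"VPC", "Subnet", "NetworkInterface"}:
        raise ValueError(f"unsupported resource type: {resource_type}")
    if traffic_type not in {"ACCEPT", "REJECT", "ALL"}:
        raise ValueError(f"unsupported traffic type: {traffic_type}")
    return {
        "ResourceIds": [resource_id],
        "ResourceType": resource_type,
        "TrafficType": traffic_type,
        "LogGroupName": log_group,
        "DeliverLogsPermissionArn": role_arn,
    }


def create_flow_log(params):
    """Send the request to AWS. Needs boto3 and valid credentials."""
    # Imported here so the request builder can be used without boto3.
    import boto3

    ec2 = boto3.client("ec2")
    return ec2.create_flow_logs(**params)


# Placeholder values mirroring the CLI example; nothing is sent to AWS here.
params = build_flow_log_request(
    resource_id="subnet-1234567890abcdef0",
    resource_type="Subnet",
    log_group="flow-logs",
    role_arn="arn:aws:iam::123456789101:role/publishFlowLogs",
)
print(params)
```

Calling create_flow_log(params) would issue the actual request. Keeping the request builder separate makes it easy to review or log the parameters before anything is created.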
AWS Management Console GUI
If you prefer to use a graphical interface, you can also enable VPC Flow Logs through the AWS Management Console.
To do so, first select the VPC for which you want to enable logs, then click Create Flow Log, as below:
This will open up a wizard where you set up the Flow Log. It will allow you to configure the same parameters that we covered in the CLI section above. Here’s an example of what the wizard looks like:
That’s all there is to it.
Accessing Your VPC Flow Logs
Once your logs are enabled, you can access them through the CloudWatch interface by selecting the appropriate log group, then the log stream that you want to view. Log stream data will look something like this:
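As a rough guide to reading those records, the Python sketch below splits one line into named fields. It assumes the default record format, which is a space-separated line of fourteen fields (version, account ID, interface ID, source and destination address, source and destination port, protocol number, packets, bytes, start and end time, action and log status). The sample line is illustrative rather than captured from a real account, and the function name parse_record is hypothetical.

```python
# Field order of the default flow log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()

# Fields that hold integers when data is present.
NUMERIC_FIELDS = ("srcport", "dstport", "protocol", "packets", "bytes",
                  "start", "end")


def parse_record(line):
    """Turn one default-format flow log line into a dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for name in NUMERIC_FIELDS:
        # Records without data use "-" as a placeholder, so leave those as-is.
        if record[name] != "-":
            record[name] = int(record[name])
    return record


# Illustrative record: accepted TCP (protocol 6) traffic to port 22.
SAMPLE = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")

record = parse_record(SAMPLE)
print(record["srcaddr"], "->", record["dstaddr"], record["action"])
```

Once records are dictionaries like this, filtering on action or grouping by srcaddr takes only a few more lines.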
Analyzing VPC Flow Log Data
That’s a lot of information, and it would be hard to make sense of it all by hand. That’s where log analysis tools like Sumo Logic come in.
Sumo Logic offers an app for VPC Flow, which delivers rich visualizations of VPC Flow log data. For example, this Sumo Logic dashboard helps you visualize data related to geographic sources and destinations of data:
Here’s another example from Sumo Logic’s log analysis app, which provides a visual summary of rejected traffic:
When combined with real-time visualizations like the ones above, the diverse data sources that VPC Flow Logs can monitor deliver crucial information to many different teams across your organization.
Flow Log Limitations
For all that they can do, VPC Flow Logs are subject to some limitations. Amazon explains them in full, but the most significant ones to note include:
- You can’t modify a Flow Log’s configuration parameters once it is created. Instead, you have to delete it and create a new log. That’s not difficult, but it’s a bit annoying from a usability perspective.
- Network interfaces with multiple IP addresses will have data logged only for the primary IP as the destination address. This makes Flow Logs less useful in configurations involving multiple IPs on a single interface.
- Flow Logs exclude traffic related to DHCP requests and Amazon DNS activity. (Traffic for a non-Amazon DNS server is logged.) In many cases, this may not matter, but it is a limitation if you need to troubleshoot an issue with your site related to DHCP or DNS. For example, you may be experiencing poor performance due to slow DNS resolution. There are also valuable security insights that you can glean from DHCP and DNS traffic, such as detecting packet sniffing attempts by looking for unusual rates of IP conflicts, usage of the same MAC address by multiple hosts or the sharing of DNS records by machines with the same IP address.
Despite these relatively minor drawbacks, VPC Flow logs are a powerful tool if you’re seeking to get the most out of your site. They not only help make sure your site is running smoothly and securely on AWS, but also deliver insights that can be leveraged by non-technical teams to drive a business forward.