May 30, 2024 By Anton Ovrutsky

What’s going on? The power of normalization in Cloud SIEM

Many of us in the information security sphere have sat in front of a console and furiously executed various queries while either mumbling internally or externally, with varying levels of stress and frustration: what is going on?

When investigating a particular system, an odd event, or a declared incident, we are all attempting to answer this question in one way or another.

Detections, documented threat hunts and security operations procedures do not manifest out of thin air. To craft detections, hunts or operational alerts, we first need to get a handle on our network and the systems that make up our digital footprint.

Many years ago, this network meant something much different than it does today. In today’s world, a network is composed of SaaS services, cloud workloads, identity providers and on-premises infrastructure, all working together to enable some kind of business operation and process.

How then do we as defensive security professionals begin to get a handle on our network and examine aspects such as data source availability, odd or malicious behaviors and deviations from policies and baselines?

One data feature that can assist us with this task is normalization. Normalization is critical when trying to answer the proverbial question of what’s going on. While plenty of solutions claim to offer normalization on paper, not all normalization is normal or easy to use.

Let’s focus on queries that sit at the intersection of detection engineering, security operations and threat hunting:

[Figure: Intersection of detection engineering, security operations and threat hunting]

The queries are not designed to instantly flag on malicious activity, but rather to provide a broad overview of network and cloud operations and oddities. They are also designed to be tweaked and modified to suit particular environments.

A common thread throughout these queries, however, is normalization. As we will demonstrate, normalization has the potential to serve as the bedrock of your detection engineering, threat hunting and security operations initiatives.

Normalization in Cloud SIEM

Cloud SIEM has a robust record-processing pipeline that turns raw messages into records. Each record contains various enrichments as well as data that are normalized to the Cloud SIEM schema.

The command line is a critical source for detection engineering and threat-hunting efforts, and it is also a great example of why normalization matters.

The command line data source component can potentially have different field names, depending on the data source that is generating the telemetry. In practical terms, this means that an analyst writing a query based on the command line field being called “command_Line” may or may not be capturing all the relevant telemetry.

In visual form, this dynamic looks something like the image below:

[Figure: Normalization in Cloud SIEM]

In this example, regardless of the origin of the telemetry, so long as a proper mapper and parser are applied, all the variations of “command line” will get normalized in accordance with the Cloud SIEM schema into a field named commandLine. Sumo Logic is uniquely capable of this form of normalization compared to other solutions because we can ingest both structured and unstructured data, so you won’t simply drop data that doesn’t fit into the existing schema.
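To make the idea concrete, here is a minimal Python sketch of what mapping several field-name variants onto one normalized name looks like. It is an illustration only, not how Cloud SIEM mappers are implemented: the alias list, the sample events and the helper name normalize_command_line are all hypothetical, and only the target field name commandLine comes from the text above.

```python
# Hypothetical raw field names that different sources might use for the
# process command line. Only "commandLine" is taken from the article.
COMMAND_LINE_ALIASES = ("command_Line", "CommandLine", "cmdline", "process.command_line")


def normalize_command_line(raw_event: dict) -> dict:
    """Return a copy of raw_event with a single normalized commandLine field."""
    record = dict(raw_event)
    for alias in COMMAND_LINE_ALIASES:
        if alias in record:
            # Move the first alias found to the normalized field name.
            record["commandLine"] = record.pop(alias)
            break
    return record


# Made-up sample events from two imaginary sources.
events = [
    {"source": "sysmon", "CommandLine": "whoami /all"},
    {"source": "edr", "cmdline": "net user"},
]
normalized = [normalize_command_line(e) for e in events]

# One field name now covers both sources.
command_lines = [r["commandLine"] for r in normalized]
```

After this step, a single filter on commandLine sees telemetry from both sources, which is the property the queries later in this post rely on.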

This dynamic applies to other fields as well, such as the username values, source and destination IP addresses and other critical fields.

Although we think normalized data is exciting in and of itself, there are even more exciting features in our platform. Namely, users can query this normalized data with the exact same Sumo Search Query Language that is used for raw, unnormalized data. Let’s take a look at some examples of this.

Normalization use cases and examples

General data insights

Before diving into any detection engineering or threat hunting activities, it’s a good idea to familiarize yourself with the data that are available in your environment.

We can get a very high level overview of what data sources are available to us that contain a username value of some kind, using the following query:

| count(user_username) by metadata_product

The results, in pie chart format, will look something like the following:

[Figure: General data insights]

With this view, we can gain a high-level overview of what data sources are available, as well as their volume.

Taking this example further, we can expand on the above query and look at a more detailed event breakdown, grouped by the normalized username field:

| where !isBlank(user_username)
| where !isBlank(action)
| count(action) by user_username,action,metadata_deviceEventId,metadata_product

This type of query is great to get a sense of which users or service accounts are generating telemetry and insights into what exact telemetry is being generated.

Looking at data in this manner is also useful for getting a handle on what’s going into your SIEM and identifying potential spikes or noise in data:

[Figure: General data insights 2]
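For readers less familiar with the query language, the breakdown above amounts to dropping records with a blank username or action and then counting per group. The Python sketch below mirrors that logic on a handful of made-up records. The product names and users are invented for illustration, and metadata_deviceEventId is left out of the grouping key to keep the example short.

```python
from collections import Counter

# Made-up normalized records. Field names follow the query above.
records = [
    {"user_username": "alice", "action": "logon", "metadata_product": "Windows"},
    {"user_username": "alice", "action": "logon", "metadata_product": "Windows"},
    {"user_username": "svc-backup", "action": "", "metadata_product": "Okta"},
    {"user_username": "bob", "action": "logoff", "metadata_product": "Okta"},
]


def is_blank(value) -> bool:
    """Treat missing, empty and whitespace-only values as blank."""
    return value is None or str(value).strip() == ""


# Count events per (user, action, product), skipping blank users or actions.
counts = Counter(
    (r["user_username"], r["action"], r["metadata_product"])
    for r in records
    if not is_blank(r.get("user_username")) and not is_blank(r.get("action"))
)
```

The service account with a blank action is excluded, and the two identical logon events collapse into one row with a count of two.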

Pivoting to cloud data sources, we can also get a broad overview of what kinds of actions users are taking in various cloud providers and services:

| where !isBlank(cloud_provider)
| values(normalizedAction) as actions by user_username,cloud_provider

And the results:

[Figure: General data insights 3]

As anyone who has looked at telemetry from disparate cloud providers knows, overlaying data isn’t easy. Indeed, this is an area where normalization does a lot of the heavy lifting.

Peering into authentication events

Authentication events are critical to understand, both from a general systems operations perspective and from a security operations perspective.

We can use normalized data to quickly craft a time chart of authentication events by user name and product:

| timeslice 1d
| count(user_username) by metadata_product,user_username,_timeslice
| transpose row _timeslice column user_username,metadata_product

The results are a bit difficult to present in screenshot form, but look something like the following:

[Figure: Peering into authentication events]

From here, we can check and uncheck the usernames on the right-hand side with the chart dynamically adjusting. Through this view, we can gain information regarding which user generated what authentication data, and if any data that we expect to see from a user is missing.

In addition to looking at authentication events from a user for a particular product, we can also look at these events generally, with an outlier function overlaid in order to flag on any dips or spikes in authentication events.

In query form, this looks like:

| timeslice 1d
| count(user_username) as auth_count by _timeslice
| outlier auth_count window=5,threshold=3,consecutive=2,direction=+-

And looking at the results:

[Figure: Peering into authentication events 2]

We can see a very slight negative outlier (a dip) in authentications on October 10th compared to other days - this represents a thread that we can pull and follow up on to determine if the dip was caused by legitimate purposes, system issues or a security incident.
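The general idea behind this kind of outlier check can be sketched in a few lines of Python: compare each day's count with the mean and standard deviation of a trailing window, and flag a point only after enough consecutive deviations. This is a simplified illustration under stated assumptions, not the implementation of the outlier operator. The function name flag_outliers and the sample counts are hypothetical, and keeping deviating points out of the baseline is a choice made for this sketch.

```python
from statistics import mean, pstdev


def flag_outliers(counts, window=5, threshold=3.0, consecutive=2):
    """Return indices where `consecutive` points in a row deviate from the baseline.

    The baseline is the last `window` non-deviating values, so a dip or spike
    does not inflate the standard deviation it is measured against.
    """
    history = list(counts[:window])  # seed the baseline with the first points
    flagged, run = [], 0
    for i in range(window, len(counts)):
        baseline = history[-window:]
        mu, sigma = mean(baseline), pstdev(baseline)
        if abs(counts[i] - mu) > threshold * sigma:
            # Deviation in either direction, like direction=+- in the query.
            run += 1
            if run >= consecutive:
                flagged.append(i)
        else:
            run = 0
            history.append(counts[i])
    return flagged


# Made-up daily authentication counts with a two-day dip.
daily_auth_counts = [100, 102, 98, 101, 99, 100, 40, 38, 100]
dip_days = flag_outliers(daily_auth_counts)
```

With the defaults matching the query parameters (window=5, threshold=3, consecutive=2), only the second day of the dip is flagged, since a single low day is not enough to meet the consecutive requirement.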

In addition to generally looking at authentication events, we can dig in a bit further and look at multi-factor authentication related events as well.

Multi-factor authentication (MFA) is a critical component of modern-day networks. Normalization in Cloud SIEM works to normalize MFA events across various authentication providers, making queries like the example below possible:

| where !isNull(mfa)
| values(mfa) as mfa by user_username,metadata_product,metadata_sourceCategory,metadata_deviceEventId

Looking at the results, we see some authentication events occurring without multi-factor authentication:

[Figure: Peering into authentication events 3]

We can expand on the above query and add some logic to flag on specific data sources that we may deem sensitive:

| where !isNull(mfa)
| if(!mfa and metadata_deviceEventId = "AwsConsoleSignIn-ConsoleLogin","true","false") as aws_mfa_missing
| if(!mfa and metadata_deviceEventId = "SignInLogs" ,"true","false") as azure_mfa_missing
| where azure_mfa_missing = "true" OR aws_mfa_missing = "true"
| values(mfa) as mfa by user_username,metadata_product,metadata_sourceCategory,metadata_deviceEventId

The above query looks at all normalized authentication events and returns results only where mfa is false in either the AWS console sign-in logs or the Azure sign-in logs:

[Figure: Peering into authentication events 4]

From here, we can investigate further and determine whether the accounts authenticating to our cloud services are risky and require MFA to be enabled.
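The filtering logic of the MFA query can be restated as a short Python sketch, which may help when adapting it to other sensitive data sources. The two event IDs are the ones used in the query above. The sample records, usernames and the helper name missing_mfa are made up for illustration.

```python
# Event IDs taken from the query above. Extend this set for other sources.
SENSITIVE_EVENT_IDS = {"AwsConsoleSignIn-ConsoleLogin", "SignInLogs"}


def missing_mfa(records):
    """Return records from sensitive sources where mfa is explicitly False.

    Records with no mfa value at all are skipped, mirroring the
    `where !isNull(mfa)` clause in the query.
    """
    return [
        r
        for r in records
        if r.get("mfa") is False
        and r.get("metadata_deviceEventId") in SENSITIVE_EVENT_IDS
    ]


# Made-up normalized authentication records.
auth_records = [
    {"user_username": "alice", "mfa": True, "metadata_deviceEventId": "SignInLogs"},
    {"user_username": "bob", "mfa": False, "metadata_deviceEventId": "SignInLogs"},
    {"user_username": "carol", "mfa": False, "metadata_deviceEventId": "OtherEvent"},
    {"user_username": "dave", "mfa": None, "metadata_deviceEventId": "AwsConsoleSignIn-ConsoleLogin"},
]
users_without_mfa = [r["user_username"] for r in missing_mfa(auth_records)]
```

Only bob is returned here: alice used MFA, carol's event is not from a source marked as sensitive, and dave's record carries no MFA value.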

The network layer

For detection engineers and SOC analysts alike, the network layer presents a particularly thorny challenge when it comes to gaining insights from telemetry.

A network typically contains many devices and appliances, often from different vendors, each with a different logging format.

Here, Cloud SIEM’s normalization and enrichment features shine brightly. For example, consider a scenario where we want to look at what traffic was leaving our network on non-standard web ports.

Looking at destination IP addresses alone would not be very useful, so instead we want to look at the associated Autonomous System Number (ASN) of the external IP.

In order to accomplish this task using non-normalized data, you would first have to figure out what fields were available in your telemetry from various network devices, and then craft queries that take into account the various permutations of the particular field you want to look at (dstIp, dstDeviceIP, DestinationIP and so forth).

Then, you would need to create some kind of lookup to associate a particular IP address with its ASN.
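Done by hand, those two steps might look something like the Python sketch below: coalesce the destination IP out of whichever field a device happens to use, then resolve it against an ASN lookup. The field variants are the ones named above. The lookup table, its organization names (on documentation-range networks) and the helper names are hypothetical placeholders, not real ASN data.

```python
import ipaddress

# Field-name variants mentioned in the text above.
DST_IP_FIELDS = ("dstIp", "dstDeviceIP", "DestinationIP")

# Hypothetical lookup table: documentation-range networks, placeholder names.
ASN_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "EXAMPLE-ORG-A",
    ipaddress.ip_network("198.51.100.0/24"): "EXAMPLE-ORG-B",
}


def destination_ip(event: dict):
    """Return the destination IP from whichever known field is populated."""
    for field in DST_IP_FIELDS:
        if event.get(field):
            return event[field]
    return None


def asn_org(ip: str):
    """Return the organization for the network containing ip, if known."""
    address = ipaddress.ip_address(ip)
    for network, org in ASN_TABLE.items():
        if address in network:
            return org
    return None


# Made-up events from two imaginary devices with different field names.
raw_events = [
    {"device": "fw-01", "dstIp": "203.0.113.7"},
    {"device": "proxy-01", "DestinationIP": "198.51.100.20"},
]
orgs = [asn_org(destination_ip(e)) for e in raw_events]
```

Every new device type adds another entry to the field list, and the lookup table has to be sourced and kept current, which is the maintenance burden normalization and enrichment are meant to remove.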

The Sumo Logic platform and Cloud SIEM complete both these steps for you, making queries like the one below possible:

| where !isNull(dstPort) 
| where action = "success"
| where dstPort not in("80","443")
| values(dstDevice_ip_asnOrg) by device_hostname

With the following results:

[Figure: The network layer]

Any network device, appliance and even hosts that log network traffic via tools like Sysmon or endpoint detection and response (EDR) products will have their telemetry normalized. This type of dynamic reduces toil on an analyst who needs to quickly find information without worrying about renaming potentially tens of different field names.

Another use case where normalization comes in extremely handy is when looking for exfiltration or any type of spike in data sent outbound. Without normalization, analysts who are attempting to wrangle network exfiltration need to ensure that all fields containing outbound data are accounted for. With Cloud SIEM’s normalization, once again, this is done for you.

In query form, this dynamic looks something like:

| where !isNull(bytesOut)
| where !isBlank(http_userAgent)
| bytesOut/1024/1024 as MB
| round(MB)
| sum(MB) as MBOutbound group by http_userAgent
| where MBOutbound > 0

Or, looking at gigabytes instead of megabytes:

| where !isNull(bytesOut)
| where !isBlank(http_userAgent)
| bytesOut/1024/1024/1024 as GB
| round(GB)
| sum(GB) as GBOutbound group by http_userAgent
| where GBOutbound > 0

And the results will look something like:

[Figure: The network layer]

In this example, we are grouping user agents, but depending on the type of telemetry being used, other fields such as user names or even applications can be used.
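As a rough illustration of the aggregation, the Python sketch below totals outbound bytes per user agent and reports megabytes. It is an assumption-laden restatement rather than a translation of the query: the function name and sample records are made up, and it sums raw bytes first and rounds once at the end, a choice made here so that many small transfers still add up.

```python
from collections import defaultdict


def outbound_mb_by_user_agent(records):
    """Sum bytesOut per user agent and return whole megabytes above zero."""
    totals = defaultdict(int)
    for r in records:
        user_agent = r.get("http_userAgent")
        bytes_out = r.get("bytesOut")
        # Skip records with no byte count or a blank user agent.
        if bytes_out is None or not user_agent or not user_agent.strip():
            continue
        totals[user_agent] += bytes_out
    megabytes = {ua: round(total / 1024 / 1024) for ua, total in totals.items()}
    return {ua: mb for ua, mb in megabytes.items() if mb > 0}


# Made-up normalized network records.
net_records = [
    {"http_userAgent": "curl/8.0", "bytesOut": 3 * 1024 * 1024},
    {"http_userAgent": "curl/8.0", "bytesOut": 1024 * 1024},
    {"http_userAgent": "Mozilla/5.0", "bytesOut": 2048},
    {"http_userAgent": "", "bytesOut": 5 * 1024 * 1024},
]
mb_outbound = outbound_mb_by_user_agent(net_records)
```

Swapping http_userAgent for a username or application field changes the grouping without touching the rest of the logic.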

Final thoughts

Normalization is a critical aspect of detection engineering, threat hunting and security operations in general. Without normalization in place, analysts and those crafting SIEM queries need to juggle, rename and otherwise look up various fields that stem from the many sources of telemetry within their networks.

Normalization in Cloud SIEM solves this problem and provides analysts with a terrific starting point for any data wrangling efforts.

Learn more about Cloud SIEM or check out our SIEM product tours.

Anton Ovrutsky

Senior Threat Research Engineer

Anton Ovrutsky leverages his 10+ years of expertise and experience as a BSides Toronto speaker, C3X volunteer, and an OSCE, OSCP, CISSP, CSSP and KCNA certificate holder in his role at Sumo Logic's Threat Labs. He enjoys the defensive aspects of cybersecurity and loves logs and queries. When not diving into the details of security, he enjoys listening to music and cycling.
