Setting sourceCategory values may seem trivial at first, especially for a small set of sources. In the long term, however, good sourceCategory values are indispensable for scale and performance. This blog post discusses some best practices for choosing them.

Source Categories have 3 main purposes:

1) Scoping your searches

2) Indexing (partitioning) your data

3) Controlling who sees what data (RBAC)

Our recommendation for sourceCategory values follows this nomenclature:

component1/component2/component3…

Start with the least descriptive, highest-level grouping and get more descriptive with each component, so that the full value describes the subset of data in detail.

For example, assume you have several firewall appliances: ASA and FWSM from Cisco and a 7050 from Palo Alto Networks. In addition, you have a Cisco 800 series router.

Following the above nomenclature, we could set the following values (instead of simply using “FWSM”, “ASA”, etc.):

Networking/Firewall/Cisco/FWSM
Networking/Firewall/Cisco/ASA
Networking/Firewall/PAN/7050
Networking/Router/Cisco/800

While the leading components may not seem to add much on their own, they do provide a high-level grouping of this data. This allows us to:

1) Easily and effectively define the scope of our search:

_sourceCategory=Networking/Firewall/* (all firewall data) or _sourceCategory=Networking/*/Cisco/* (all Cisco data)

With one sourceCategory specification and wildcards, we can find the subset of data we need without any Boolean logic (OR).

2) If we want to create a separate index for the networking data for better performance, we can specify an index with the following routing expression:

_sourceCategory=Networking*

Indexes cannot be modified; they can only be disabled and recreated with a new name and/or routing expression. We therefore want to make sure that we do not have to change them (and re-educate all users) unless something major changes. An index defined with a high-level group and a wildcard maintains itself as new sources are added, which helps drive adoption of the indexes among your users.

3) Similarly, if you want to restrict access to this data, you can use the high-level values in your access rules, which reduces the effort of managing those rules as you add more data.
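
The effect of the wildcard patterns above can be sketched in a few lines of Python. This is only an illustration: Python's fnmatch module stands in for Sumo Logic's own wildcard matching, which may differ in details, and the helper name select and the value Networking/Switch/Cisco/Catalyst are hypothetical.

```python
from fnmatch import fnmatchcase

# sourceCategory values from the example above.
CATEGORIES = [
    "Networking/Firewall/Cisco/FWSM",
    "Networking/Firewall/Cisco/ASA",
    "Networking/Firewall/PAN/7050",
    "Networking/Router/Cisco/800",
]


def select(pattern, categories):
    """Return the categories matched by a glob-style pattern.

    fnmatchcase is a stand-in (assumption) for Sumo Logic's matching.
    """
    return [c for c in categories if fnmatchcase(c, pattern)]


# 1) Search scoping: one pattern each, no OR needed.
firewalls = select("Networking/Firewall/*", CATEGORIES)
cisco = select("Networking/*/Cisco/*", CATEGORIES)

# 2) Index routing: a source added later (hypothetical value) matches
# the same high-level pattern, so the pattern itself never changes.
new_source = "Networking/Switch/Cisco/Catalyst"
later = CATEGORIES + [new_source, "Prod/Web/Apache/Access"]
routed = select("Networking*", later)

print(firewalls)
print(cisco)
print(routed)
```

With flat values such as “FWSM” or “ASA”, each of these selections would instead need an explicit list of names that has to be extended whenever a source is added.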

High-level groupings can be built from a variety of items, for example environment details (prod vs. dev), geographical information (east vs. west), application, business unit, or any other value that makes sense for your data.
The order in which you use these values should be determined by how you search the data.

For example, if most of your use cases do not need data from both prod and dev environments, you could use:

Prod/Web/Apache/Access
Dev/Web/Apache/Access
Prod/DB/MySQL/Error
Dev/DB/MySQL/Error

You can still search across both when needed, but this scheme splits all your data into prod and dev more intuitively.

If, on the other hand, you frequently need to search prod and dev data together, you could use:

Web/Apache/Access/Prod
Web/Apache/Access/Dev
DB/MySQL/Error/Prod
DB/MySQL/Error/Dev

This simple change completely changes your high-level grouping. Both schemes allow you to cover both use cases; the difference is which one looks more intuitive for the majority of your use cases.
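
To make the trade-off concrete, here is a minimal Python sketch that runs the same two questions against both schemes. As an assumption, Python's fnmatch module stands in for Sumo Logic's wildcard matching, and the helper name select is hypothetical.

```python
from fnmatch import fnmatchcase

# Scheme A: environment first.
ENV_FIRST = [
    "Prod/Web/Apache/Access",
    "Dev/Web/Apache/Access",
    "Prod/DB/MySQL/Error",
    "Dev/DB/MySQL/Error",
]

# Scheme B: environment last.
ENV_LAST = [
    "Web/Apache/Access/Prod",
    "Web/Apache/Access/Dev",
    "DB/MySQL/Error/Prod",
    "DB/MySQL/Error/Dev",
]


def select(pattern, categories):
    """Return the categories matched by a glob-style pattern."""
    return [c for c in categories if fnmatchcase(c, pattern)]


# "All prod data": a plain prefix in scheme A, a leading wildcard in B.
prod_a = select("Prod/*", ENV_FIRST)
prod_b = select("*/Prod", ENV_LAST)

# "Apache access logs in every environment": the reverse situation.
apache_a = select("*/Web/Apache/Access", ENV_FIRST)
apache_b = select("Web/Apache/Access/*", ENV_LAST)

print(prod_a, prod_b)
print(apache_a, apache_b)
```

Both schemes answer both questions; the scheme to prefer is the one in which your most common question is a simple prefix pattern.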

Good sourceCategory, bad sourceCategory
