Threat correlation and prioritization (what do I pay attention to in an avalanche of highlighted threats?) and threat investigation (how do I quickly decide what happened and what to do?) are extremely challenging core functions of security defense. In many cases, fewer than 10% of high-priority threats are fully investigated. The accelerating migration to the cloud and modern application deployment are making these already difficult workflows untenable in traditional models, raising questions such as: How do we gather and correlate all of the new sources of data at cloud scale? How do we understand and triangulate new dynamic data from many layers in the stack? How do we react at the pace demanded by new DevSecOps deployment models? And how do we collaborate to connect the dots across evolving boundaries and silos?
Last week, an acquaintance who is a veteran of many cloud migration security projects described many SOCs as “groping in the dark” with these challenges and looking for a new approach, despite all of the vendor claims mapped to their pains. The usual crowd of incremental enhancements (e.g., bringing cloud data into the traditional SIEM, automating manual workflows, layering on more tools for specialized analytics, leveraging the wisdom of crowds) leaves three dragons roaming the countryside. They need to be slain for security to keep pace with the unstoppable, accelerating migration to the cloud.
Dragon #1 – Siloed Security and IT Ops Investigation Workflows
A basic dilemma in security for the cloud is that the knowledge needed to pursue an investigation to its conclusion is often split between two groups. Security analysts understand the process of investigation and the broad context, but often only IT ops understands the essential specific context – application behavior and customer content, for example – needed to interpret evidence and form hypotheses at many steps in a security investigation. A frequent comment goes something like, “The SOC understands the infrastructure, but they don’t know how to interpret app logs or new data sources like container orchestration.” This gap in understanding makes real-time collaboration essential to prevent exploding backlogs, partial investigations, and a bias toward more easily solved on-prem alerts.
Aside from requiring analysts to understand unfamiliar, new, and rapidly changing data sources within a single security investigation, cloud deployments more frequently generate “Dual Ticket” cases, in which it is unknown whether a security issue or an IT issue is the root cause. (For example: my customer is complaining they can’t access our app. Is it network congestion? A cloud provider outage? Server CPU overload? A DDoS attack? Malware? A customer issue?) It isn’t just that two separate investigations take more time and resources to complete and integrate. Often, in cloud cases, neither side can reach a conclusion without the other. Working from common data isn’t enough – analytics and workflow need to be common as well to enable the seamless collaboration required.
In addition, modern cloud deployments often employ DevSecOps models in which the pace of application update, rollout, and change is measured in days or hours as opposed to months or quarters. One implication for security threat investigation is that the threat resolution backlog must be processed at a matching pace, so that current resources can be applied to current environments without being mired in “old” cases or chasing continuous flux in the data. This is challenge enough, but having to manage this triage across two separate backlogs in IT and security, with the usual integration taxes, makes operating on the scale of hours and days extremely challenging.
While separate silos for IT ops and security investigations were feasible and logical in classic on-prem IT, modern cloud deployments and application architectures demand a seamless back-and-forth workflow in which, at each step, the skills and perspective of both IT and security are needed to properly interpret the results of queries, the evidence uncovered, or unfamiliar data. Asking each side to completely absorb the knowledge of the other is unrealistic in the short term – a much better solution is to converge their workflows so they can collaborate in real time.
Dragon #2 – Traditional Security Bias on Infrastructure vs. Application Insight
Traditional SIEMs have long been exhorted to look up the stack to the application layer, and in several instances new product areas have sprung up when they have not. In the cloud world this application layer “nice to have” becomes a “must have.” Cloud providers have taken on some of the infrastructure defense previously done by individual companies, creating harder targets that cause attackers to seek softer ones. At the same time, much of the traditional infrastructure defense from the on-prem world has not yet been replicated in the cloud, so application layer assessment is often the only investigation method available.
In addition to the defensive need to incorporate the application layer, there is clearly additional insight at that layer which is unavailable at the infrastructure layer (e.g., customer context and behavioral analytics). This is particularly true when it is unclear whether a security or an IT problem exists. Many point systems specialize in extracting actionable insight from this layer, but holistic correlation and investigation of threats is more difficult, in part because of wide variations in APIs, log formats, and nomenclature. Looking forward, modern application deployment in the cloud also increases the surface area for investigation and threat assessment. For example, chained microservices create many possible transitions in variables important to investigators.
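To make the point about varying log formats and nomenclature concrete, here is a minimal Python sketch of mapping events from two sources onto one common schema so they can be correlated. The source names (`web_app`, `orchestrator`), their field names, and the common schema are all hypothetical, invented purely for illustration; real sources will differ and need far more handling than this.

```python
from datetime import datetime, timezone

# Hypothetical mappings: common field name -> field name used by each source.
FIELD_MAPS = {
    "web_app": {"user": "username", "src_ip": "client_ip", "time": "ts"},
    "orchestrator": {"user": "actor", "src_ip": "source_address", "time": "received_at"},
}


def parse_time(value):
    """Accept epoch seconds or an ISO 8601 string; return an aware UTC datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def normalize(source, raw):
    """Translate one raw event from a known source into the common schema."""
    mapping = FIELD_MAPS[source]
    return {
        "source": source,
        "user": raw[mapping["user"]].lower(),  # fold case so names compare equal
        "src_ip": raw[mapping["src_ip"]],
        "time": parse_time(raw[mapping["time"]]),
    }


app_event = normalize(
    "web_app", {"username": "Alice", "client_ip": "10.0.0.5", "ts": 1704067200}
)
infra_event = normalize(
    "orchestrator",
    {"actor": "alice", "source_address": "10.0.0.5",
     "received_at": "2024-01-01T00:00:00+00:00"},
)
```

Once both events share field names, user casing, and a UTC timestamp, an analyst can join them on user, address, or time without first translating each source by hand.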
For all of these reasons, adding insight from the application layer is necessary and good for cloud deployments, but integrating this insight quickly with infrastructure insight is better. Many investigation workflows jump back and forth across these layers several times in a single step, so fully integrated workflows will be essential to leverage the assimilation of new insight.
Dragon #3 – Investigation Times Measured in Tens of Minutes and Hours
In cloud and modern application deployment, the sheer volume of incoming data will make yesterday’s data avalanche seem like a pleasant snow dusting. Also, dynamic and transient data, entities, and nomenclature make workflows that were straightforward (although still slow and annoying) in the old world, such as tracking changing IP addresses for a user or machine, extremely challenging in the cloud. Finally, collaboration will require new models of distributed knowledge transfer, since investigation workflows will be shared across both security and IT ops.
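As an illustration of the IP-tracking problem mentioned above, here is a small Python sketch that answers “which host held this address at this time?” from a history of address assignments. The `IpHistory` class, the host names, and the integer timestamps are hypothetical. The sketch also assumes an assignment stays in effect until the next one for the same address begins, which ignores explicit releases and is only a simplification.

```python
from bisect import bisect_right
from collections import defaultdict


class IpHistory:
    """Hypothetical lookup of which host held an IP address at a given time."""

    def __init__(self):
        self._starts = defaultdict(list)  # ip -> sorted assignment start times
        self._hosts = defaultdict(list)   # ip -> hosts, parallel to _starts

    def record(self, ip, start, host):
        """Record that `host` was assigned `ip` beginning at time `start`."""
        idx = bisect_right(self._starts[ip], start)
        self._starts[ip].insert(idx, start)
        self._hosts[ip].insert(idx, host)

    def host_at(self, ip, when):
        """Return the host holding `ip` at time `when`, or None if unknown."""
        starts = self._starts.get(ip, [])
        idx = bisect_right(starts, when) - 1  # latest assignment at or before `when`
        return self._hosts[ip][idx] if idx >= 0 else None


history = IpHistory()
history.record("10.0.0.5", 200, "pod-b")  # records may arrive out of order
history.record("10.0.0.5", 100, "pod-a")
```

With this history, an alert on 10.0.0.5 at time 150 resolves to pod-a, while the same address at time 250 resolves to pod-b. In a cloud environment where addresses are reused within minutes, attributing an event to the wrong entity is an easy mistake to make without this kind of time-aware lookup.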
Many SOCs are at the breaking point in traditional environments, with growing backlogs of investigations and reactive triage. Achieving investigation times measured in minutes to keep pace in the cloud, despite these additional challenges, will require breakthrough innovation in getting rapid insight from huge dynamic data sets and in scaling learning models across both humans and machines.
Slaying these dragons will not be easy or quick – new solutions and thinking will collide with comfort zones, entrenched interests, perceived roles of people and process, and more than a few “sacred cows.” Despite these headwinds, I’m optimistic looking ahead, based on two core beliefs: 1) the massive economic and technological leverage of the cloud has already led to many other transition dragons of comparable ferocity being attacked with zeal (e.g., DevSecOps, data privacy, regional regulation), and 2) unlike in many other transitions, a broad cross section of the individuals on the front lines of these messy transitions have far more to gain from the leap forward in their own skills, learning, and opportunity than they have to lose. Aside from that, the increasingly public scorecard of the attackers vs. the defenders will help keep us honest about progress along the way.