Data analysis in a cyber security platform
CACI Information Intelligence are working with the Government on a prototype cyber security platform to improve capability in the area of automated network defence.
The platform ingests messages from disparate sources, including mainstream cyber security sensors and several bespoke sensors unique to our customer, and analyses the resulting data so that a corresponding action can be automated.
Before you can respond to a cyber threat you must first detect it in real time. With attackers aiming to subvert detection by any means necessary, 'real time' can sometimes mean data spread across days, months or years, depending on the value of the target.
Our ingest pipeline was designed with these factors in mind: a scalable solution built on message queues, with data stored in Elasticsearch, which allows us to receive a large amount of fine-grained data from sensors. The data received into the system is characteristically small and noisy. The natural background noise on these networks can make a legitimate cyber-attack difficult to decipher, and sensors often report only small snippets of information. The first step is to normalise the data, extracting common features from the sensor data such as the devices involved, files, URLs, timings and severity.
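The normalisation step can be sketched as mapping each raw sensor message onto a common model. This is a minimal illustration only: the field names, the example sensor format and the severity scale are assumptions, not the platform's real schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class NormalisedEvent:
    """Common model: the features extracted from every sensor message."""
    sensor: str
    timestamp: datetime
    severity: int                                  # illustrative 0 (info) .. 10 (critical)
    devices: list = field(default_factory=list)
    urls: list = field(default_factory=list)
    files: list = field(default_factory=list)

def normalise(raw: dict) -> NormalisedEvent:
    """Map one raw sensor message (hypothetical format) onto the common model."""
    return NormalisedEvent(
        sensor=raw.get("source", "unknown"),
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        severity=int(raw.get("sev", 0)),
        devices=list(raw.get("hosts", [])),
        urls=list(raw.get("urls", [])),
        files=list(raw.get("files", [])),
    )

event = normalise({"source": "ids-7", "ts": 1700000000, "sev": 6, "hosts": ["10.0.0.5"]})
print(event.sensor, event.severity)   # ids-7 6
```

Because every downstream stage works against `NormalisedEvent` rather than raw sensor output, adding a new bespoke sensor only requires a new mapping function.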
Once all data is normalised into a common model, we seek to understand more about it by passing it through an enrichment process. In an ideal world you would perform a high level of enrichment on every message, but this is computationally expensive, especially if it requires a third-party service such as a DNS lookup. We aim to perform a basic level of enrichment on every message; for example, using internal databases we can geocode external IPs to their country. An asset library of all known devices within the system is a valuable resource, adding device information, physical location and operational status. We can optionally use third-party enrichment, or even query back into the network using tools like Osquery, on an on-demand basis. This allows us to decide how to enhance the dataset while balancing the load on the system and network.
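The tiered approach above can be sketched as two enrichment passes: cheap internal lookups that run on every message, and expensive on-demand lookups that run only when requested. The lookup tables, field names and `reverse_dns` helper below are stand-ins for illustration, not the platform's real services.

```python
# Tier 1 sources: internal-only, cheap to query on every message.
GEO_DB = {"203.0.113.9": "GB"}                     # external IP -> country (illustrative)
ASSET_LIBRARY = {"10.0.0.5": {"location": "DC-1", "status": "operational"}}

def reverse_dns(ip: str) -> str:
    # Placeholder for a costly third-party or query-back lookup (e.g. DNS, Osquery).
    return f"host-{ip.replace('.', '-')}.example"

def enrich(event: dict, deep: bool = False) -> dict:
    """Always apply basic enrichment; apply expensive enrichment only on demand."""
    # Tier 1: geocoding and asset-library lookups from internal databases.
    for ip in event.get("devices", []):
        if ip in GEO_DB:
            event.setdefault("geo", {})[ip] = GEO_DB[ip]
        if ip in ASSET_LIBRARY:
            event.setdefault("assets", {})[ip] = ASSET_LIBRARY[ip]
    # Tier 2: opt-in, expensive enrichment, gated to balance system and network load.
    if deep:
        event["reverse_dns"] = {ip: reverse_dns(ip) for ip in event.get("devices", [])}
    return event

e = enrich({"devices": ["10.0.0.5", "203.0.113.9"]})
print(e["geo"], e["assets"]["10.0.0.5"]["status"])
```

Keeping the expensive tier behind an explicit flag means the pipeline, not the sensor, decides when a message is worth the extra cost.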
Once the ingest process has finished enrichment, we have a large pool of data to analyse. To reduce the burden on the Cyber Analyst we make use of ML techniques. Using recommendation engines, we can look at the actions the user previously performed for similar messages and suggest the correct response. If the confidence of the recommendation is high, the response can be automated by the system, for example blocking an attacker's access to the network.
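A toy version of this recommendation step helps make the idea concrete: score past analyst actions by feature overlap with the new message, and auto-apply the top action only above a confidence threshold. The Jaccard similarity, the 0.9 threshold and the feature sets are all illustrative choices, not the system's actual model.

```python
def recommend(message: set, history: list) -> tuple:
    """history: (feature_set, action_taken) pairs from previous analyst cases.

    Returns the best-matching past action and a 0..1 confidence score.
    """
    best_action, best_score = "escalate_to_analyst", 0.0
    for features, action in history:
        overlap = len(message & features) / len(message | features)  # Jaccard similarity
        if overlap > best_score:
            best_action, best_score = action, overlap
    return best_action, best_score

AUTO_THRESHOLD = 0.9  # illustrative: only fully trusted recommendations are automated

history = [({"ssh", "brute-force", "external-ip"}, "block_source_ip")]
action, confidence = recommend({"ssh", "brute-force", "external-ip"}, history)
if confidence >= AUTO_THRESHOLD:
    print(f"auto-response: {action}")              # auto-response: block_source_ip
else:
    print(f"suggest to analyst: {action} ({confidence:.2f})")
```

Below the threshold the recommendation is only a suggestion, so the analyst stays in the loop for anything the system is unsure about.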
Our UI is a key part of the application and allows Cyber Analysts to browse the data within the system. They follow threads through the data, spotting patterns that could indicate the presence of an attacker. It's important that the UI enables the user's workflow of pivoting on the data and following these threads. We provide tools to group the data together to provide context, and the system has built-in tasks to replicate these groups and present them to the user should a similar pattern of events happen in the future. The sequence of events can be as important as the body of the events themselves, and our system's UI accounts for this with a timeline view.
Ultimately, the automated data analysis and ML applied in this project reduce the Cyber Analyst's caseload: responses to cyber threats can be made at all hours of the day with significant levels of trust, and can result in the fast, automatic removal of a malicious entity from an unmonitored network.