Anomaly Detection - Bitdefender TechZone

Abstract

Discover the power of Anomaly Detection in cybersecurity. Learn how Bitdefender Labs customizes machine-learning models to safeguard systems with personalized protection.

Anomaly detection is a technique used to identify patterns or observations that do not conform to an expected behavior or norm. The goal is to identify anomalies or outliers that might indicate unusual or suspicious behavior. Anomaly detection typically involves building a model of normal behavior based on historical data. This model could be statistical, machine learning-based, or rule-based. In many cases, anomaly detection is performed in an unsupervised machine-learning setting, meaning that the algorithm does not require labeled examples of normal and abnormal behavior during training. Instead, it learns what is considered normal based on the majority of the training data.

Anomaly Detection

When anomalies result from malicious actions, adversaries often adapt to make anomalous observations appear normal, which makes the task of defining normal behavior more difficult. The Bitdefender Labs team created several custom machine-learning models that we use in our anomaly detection mechanism. In line with regulations such as GDPR, the data always stays on the computer or server and is not transferred anywhere outside the system. Our custom machine-learning models used in anomaly detection are trained individually on each customer’s system. This is not a mistake: each system in each customer’s environment has its own machine-learning model, customized for the uniqueness of that system.

The model observes the system's behavior, trying to find anomalies in the observed behavior of users, processes, and the system. Using an EDR sensor that monitors activity data and generates events related to User Logins, Network Connections, and Process Creations, we can build an Anomaly Detection system on top of EDR events. This involves correlating events, extracting and processing features, and creating a behavioral baseline. Any deviation from the baseline can be considered potentially malicious.
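As an illustration of the feature-extraction step, the sketch below turns raw events into an hourly count series per event type, which is the kind of time series a baseline can be built from. The event shape (an ISO timestamp paired with an event type string) and the names `hourly_counts` and `series_for` are assumptions made for this example, not the EDR sensor's actual format or API.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(events):
    """Count events per (event_type, hour bucket).

    `events` is an iterable of (iso_timestamp, event_type) pairs. This shape
    is an assumption for the example, not the sensor's real event format.
    """
    counts = Counter()
    for timestamp, event_type in events:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(event_type, hour)] += 1
    return counts


def series_for(counts, event_type, hours):
    """Return one count per hour in `hours`, using 0 for hours with no events."""
    return [counts.get((event_type, hour), 0) for hour in hours]


# Hypothetical events, for illustration only.
events = [
    ("2024-01-01T09:05:00", "process_creation"),
    ("2024-01-01T09:40:00", "process_creation"),
    ("2024-01-01T09:41:00", "user_login"),
    ("2024-01-01T11:15:00", "process_creation"),
]
hours = [datetime(2024, 1, 1, h) for h in (9, 10, 11)]

counts = hourly_counts(events)
process_series = series_for(counts, "process_creation", hours)
print(process_series)  # [2, 0, 1]
```

Filling quiet hours with zeros matters here: a time series model needs evenly spaced observations, and an hour with no events is itself part of normal behavior.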

Finally, it compares observed behavior with MITRE® indicators of attacks, custom indicators of attacks developed by Bitdefender Labs, and user-specific events. Over time, the model is adjusted continuously as the baseline of expected versus unexpected behavior changes, and anomalies are identified and communicated to the security teams.

Anomaly Detection Algorithms

The most common algorithms we use in our Anomaly Detection model are Seasonal Auto-Regressive Integrated Moving Average (SARIMA), Seasonal and Trend decomposition using Loess (STL), and Moving Average. The sections below briefly describe all three algorithms.

SARIMA

SARIMA is a time series forecasting method designed to handle seasonality. The model training process begins by collecting a historical time-series dataset that represents the normal behavior of the system. Using this data, we can train a SARIMA model capable of capturing both trend and seasonality in time series data, which is then used to predict future time points. By comparing the actual value with the predicted value, the module can calculate a marker that measures how far the observation deviates from the forecast. If the marker exceeds the set threshold, the data point is marked as an anomaly.

For example, if the system observes that a specific action, like launching a particular software tool such as PowerShell, typically occurs once a month but suddenly happens more frequently, a few times in one day, SARIMA might alert us to investigate this event. It could be a sign of an anomaly.
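The comparison step can be sketched in a few lines. The example below is not SARIMA itself: to stay dependency-free it swaps in a seasonal-naive forecaster (predict the value seen one season earlier) where a fitted SARIMA model would supply the prediction. The marker definition (absolute forecast error divided by the spread of past forecast errors), the threshold of 3.0, the weekly period, and the daily launch counts are all assumptions chosen for illustration.

```python
from statistics import stdev


def seasonal_naive_forecast(history, period):
    """Stand-in forecaster: predict the value observed one season earlier.

    In practice a fitted SARIMA model would provide this prediction.
    """
    return history[-period]


def residual_spread(history, period):
    """Standard deviation of the stand-in forecaster's errors on `history`."""
    errors = [history[i] - history[i - period] for i in range(period, len(history))]
    # Fall back to 1.0 so a perfectly regular history does not divide by zero.
    return stdev(errors) or 1.0


def marker(actual, history, period):
    """Absolute forecast error, in units of past forecast-error spread."""
    predicted = seasonal_naive_forecast(history, period)
    return abs(actual - predicted) / residual_spread(history, period)


THRESHOLD = 3.0  # assumed cut-off; tune per event type
PERIOD = 7       # assumed weekly seasonality on daily counts

# Hypothetical daily launch counts for one tool over three weeks.
history = [
    0, 1, 1, 2, 1, 0, 0,
    0, 1, 2, 2, 1, 0, 0,
    0, 1, 1, 2, 1, 0, 0,
]

print(marker(0, history, PERIOD) > THRESHOLD)  # False: matches the usual pattern
print(marker(6, history, PERIOD) > THRESHOLD)  # True: far above the forecast
```

Scaling the error by the spread of past errors means a noisy event type needs a larger deviation to be flagged than a very regular one.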

STL

STL is a time series decomposition algorithm that separates a time series into its three main components:

  • Seasonality - refers to the regular and repeating patterns in a time series that occur at fixed intervals.

  • Trend - represents the underlying long-term direction in the data. It captures the overall movement or tendency of the time series.

  • Remainder - represents the variability or noise that is left in the time series after removing the seasonality and trend. It includes short-term fluctuations and irregularities that are not part of the overall pattern.

STL can identify unusual patterns or outliers in time series data by focusing on the remainder component, which encapsulates irregularities and unexpected variations in the data. This method is effective when anomalies exhibit patterns not easily captured by the seasonality and trend components.

For example, STL can detect anomalies in events such as a process encrypting files, a scheduled task change, procdump being run, a process being executed, shadow copy deletion, or a task kill process being executed.
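The idea of flagging outliers in the remainder can be sketched as follows. This is not STL: instead of Loess smoothing, it uses a simpler classical additive decomposition (a centered moving average for the trend and a per-phase median for the seasonal profile), which keeps the example dependency-free. The function names, the z-score threshold of 3.0, and the data are assumptions for illustration.

```python
from statistics import mean, median, stdev


def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + remainder.

    Simplified stand-in for STL. Supports odd periods only; points without
    a full centered window at either edge are skipped.
    """
    half = period // 2
    valid = range(half, len(series) - half)
    # Trend: centered moving average over exactly one full period.
    trend = {i: sum(series[i - half : i + half + 1]) / period for i in valid}
    detrended = {i: series[i] - trend[i] for i in valid}
    # Seasonal profile: median detrended value at each position in the cycle.
    # The median keeps a single spike from distorting the profile.
    seasonal = [
        median(detrended[i] for i in valid if i % period == phase)
        for phase in range(period)
    ]
    remainder = {i: detrended[i] - seasonal[i % period] for i in valid}
    return trend, seasonal, remainder


def flag_remainder_outliers(remainder, threshold=3.0):
    """Return indices whose remainder lies more than `threshold` std devs out."""
    values = list(remainder.values())
    center, spread = mean(values), stdev(values)
    if spread == 0:
        return []
    return [i for i, r in remainder.items() if abs(r - center) / spread > threshold]


# Hypothetical daily event counts: five identical weeks plus one spike.
weekly_pattern = [5, 8, 9, 9, 8, 3, 2]
series = weekly_pattern * 5
series[17] += 20  # injected anomaly

trend, seasonal, remainder = decompose(series, period=7)
print(flag_remainder_outliers(remainder))  # [17]
```

Because the weekly pattern is absorbed by the seasonal profile, the regular high-count days are not flagged; only the value that neither trend nor seasonality explains stands out.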

Moving Average

Moving Averages are used in anomaly detection by comparing the actual values of a time series to the expected or smoothed values predicted by the moving average. Anomalies are identified when the observed values deviate significantly from the expected values. As a general approach, we first calculate the moving average based on time series data to obtain residuals. Residuals represent the difference between the observed data and the smoothed trend. By analyzing the residuals, we can establish a threshold that determines which data points are flagged as anomalies.

For example, an anomaly can be detected when there is an unexpectedly high volume of outgoing network traffic. This could suggest data exfiltration or unauthorized access that requires immediate investigation.
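A minimal sketch of this approach, under stated assumptions: the expected value is the mean of the preceding `window` points, the threshold is three standard deviations of the residuals seen during a baseline stretch assumed to be normal, and only positive residuals are flagged because the example concerns unexpectedly high traffic. The window size, the multiplier, and the traffic figures are hypothetical.

```python
from statistics import mean, stdev


def moving_average_residuals(series, window):
    """Residual of each point against the mean of the `window` points before it."""
    return {
        i: series[i] - mean(series[i - window : i])
        for i in range(window, len(series))
    }


def threshold_from_baseline(baseline_residuals, k=3.0):
    """Threshold = k standard deviations of residuals seen in normal operation."""
    return k * stdev(baseline_residuals)


WINDOW = 4

# Hypothetical outgoing traffic per hour (MB).
baseline = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99]  # assumed normal
new_points = [100, 250, 101]
series = baseline + new_points

residuals = moving_average_residuals(series, WINDOW)
baseline_residuals = [r for i, r in residuals.items() if i < len(baseline)]
threshold = threshold_from_baseline(baseline_residuals)

# One-sided check: flag only values far above the expected level.
flagged = [
    i for i, r in residuals.items()
    if i >= len(baseline) and r > threshold
]
print(flagged)  # [11]
```

The one-sided check also sidesteps a known side effect of trailing averages: the point right after a spike is compared against an average inflated by the spike, so it shows a large negative residual even though it is normal.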

Recommended Content

To learn more about the technologies included in the Detection layer, we recommend reading the next article, Integrity Monitoring.