Unsupervised learning

Your SIEM flags anomalous network behaviour. A fraud engine clusters transactions and quarantines the ones that fall outside established groups. A user behaviour analytics platform profiles employees and alerts when someone deviates from pattern. Behind each of these systems is an unsupervised learning model that was never told what an attack looks like. It was told what normal looks like, and everything else is suspect. That distinction is the attack surface.

This is the next entry in the AI red teaming series. Previous articles covered supervised models: logistic regression, decision trees, Naive Bayes, support vector machines. Each of those learns from labelled data, where the training set includes explicit examples of “good” and “bad.” Unsupervised learning operates without labels. The model receives raw data and has to discover structure on its own, such as clusters, patterns, and outliers. No one tells it what a threat is; it has to infer what does not belong by first learning what does.

For a red teamer, this creates a different kind of opportunity. Supervised models have decision boundaries you can map and cross. Unsupervised models have definitions of normality you can learn and mimic. The evasion strategy shifts from “cross the boundary without triggering the classifier” to “look normal enough that the model never flags you at all.”

The three problems unsupervised learning solves

Unsupervised learning covers a broad family of algorithms, but in security-adjacent deployments, the work usually falls into three categories.

Clustering groups similar data points together based on feature similarity. K-Means, DBSCAN, and hierarchical clustering are the workhorses. In production, clustering powers customer segmentation, network traffic grouping, and malware family identification. The model decides which data points belong together. Anything that falls outside established clusters, or forms a cluster of its own, draws attention.

Dimensionality reduction compresses data from many features into fewer features while preserving the relationships that matter. Principal Component Analysis (PCA) and t-SNE are the most common techniques. Security teams use dimensionality reduction to visualise high-dimensional log data, compress feature sets before feeding them into downstream classifiers, and reduce noise in network telemetry.

Anomaly detection identifies data points that deviate from expected patterns. Isolation Forest, Local Outlier Factor, and autoencoders are common choices. This is the category most directly relevant to red teaming, because anomaly detection models are the ones actively hunting for you during an engagement.

How similarity measures shape the attack surface

Every unsupervised model relies on a definition of distance or similarity to decide which data points are alike and which are not. The choice of distance metric determines what the model can see and what it is blind to.

Euclidean distance measures the straight-line distance between two points in feature space. It is the default for K-Means and many anomaly detection algorithms. Euclidean distance gives every feature the same weight in its formula, which makes it sensitive to scale: a feature measured in thousands (like packet sizes) will dominate the distance calculation over features measured in single digits (like connection counts). If the defender has not applied feature scaling, you can hide anomalous behaviour in the small-scale dimensions the model effectively ignores.
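To make the scale effect concrete, here is a minimal sketch with an invented five-record training sample of two features, mean packet size in bytes and connection count. Every number and name here is an illustrative assumption, not real telemetry. It compares a legitimate peer that differs from the training mean only in bytes with an attacker that differs only in connections, first on raw values and then after standardising with the training mean and standard deviation.

```python
import numpy as np

# Hypothetical training sample: [mean packet size (bytes), connection count].
normal = np.array([
    [1200.0, 3.0],
    [900.0, 4.0],
    [1500.0, 2.0],
    [1100.0, 3.0],
    [1300.0, 4.0],
])
center = normal.mean(axis=0)
spread = normal.std(axis=0)

peer = np.array([1500.0, 3.0])       # legitimate host that sends larger packets
attacker = np.array([1200.0, 30.0])  # typical packets, roughly ten times the usual connections


def euclidean(a, b):
    return float(np.linalg.norm(a - b))


# Unscaled: the byte-valued feature dominates, so the attacker looks closer to normal than the peer.
raw_peer = euclidean(peer, center)          # about 300
raw_attacker = euclidean(attacker, center)  # about 27

# Standardised with the training mean and standard deviation: the connection spike now dominates.
scaled_peer = euclidean((peer - center) / spread, np.zeros(2))          # about 1.5
scaled_attacker = euclidean((attacker - center) / spread, np.zeros(2))  # about 36

print(f"raw: peer={raw_peer:.1f} attacker={raw_attacker:.1f}")
print(f"scaled: peer={scaled_peer:.2f} attacker={scaled_attacker:.2f}")
```

On raw values the attacker is roughly ten times closer to "normal" than a legitimate host. Standardisation reverses that ranking.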

Cosine similarity measures the angle between two feature vectors rather than the distance between them. Two vectors pointing in the same direction have high cosine similarity regardless of their magnitude. This metric is common in text analysis and document clustering. An attacker manipulating text content (phishing emails, malicious documents) can shift cosine similarity by injecting tokens that rotate the feature vector toward the “benign” cluster, without needing to match the magnitude of legitimate documents.
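Here is a small sketch of both properties, using invented term counts over a toy six-word vocabulary. The vocabulary, the benign centroid and the padding tokens are all hypothetical. Scaling a vector leaves its cosine similarity unchanged. Appending benign-looking tokens, while leaving the malicious terms as they are, rotates the vector toward the benign centroid.

```python
import numpy as np


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical term counts over a toy vocabulary:
# ["invoice", "urgent", "password", "meeting", "agenda", "lunch"]
benign_centroid = np.array([1.0, 0.0, 0.0, 4.0, 3.0, 2.0])
phish = np.array([2.0, 3.0, 3.0, 0.0, 0.0, 0.0])

# Magnitude is irrelevant: a message twice as long with the same proportions scores the same.
assert np.isclose(cosine_similarity(phish, benign_centroid),
                  cosine_similarity(2 * phish, benign_centroid))

# Appending benign-looking tokens rotates the vector toward the benign centroid
# while leaving the malicious terms untouched.
padding = np.array([0.0, 0.0, 0.0, 4.0, 3.0, 2.0])
before = cosine_similarity(phish, benign_centroid)           # about 0.08
after = cosine_similarity(phish + padding, benign_centroid)  # about 0.79
print(f"before={before:.2f} after={after:.2f}")
```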

Manhattan distance sums the absolute differences across all feature dimensions. It is less sensitive to outliers in individual features than Euclidean distance, but more sensitive to coordinated small deviations across many features. Red teamers operating against Manhattan distance models need to avoid changing many features simultaneously, even by small amounts, because the deviations accumulate linearly.
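Here is a quick numerical illustration of the difference, assuming a hypothetical profile of 20 already-standardised features. It compares one large change on a single feature with a small change spread across every feature.

```python
import numpy as np


def manhattan(a, b):
    return float(np.abs(a - b).sum())


def euclidean(a, b):
    return float(np.linalg.norm(a - b))


baseline = np.zeros(20)  # hypothetical profile with 20 standardised features

one_big = baseline.copy()
one_big[0] = 2.0             # one feature moved by 2
many_small = baseline + 0.5  # every feature nudged by 0.5

for name, x in [("one_big", one_big), ("many_small", many_small)]:
    print(name, "L1 =", manhattan(baseline, x), "L2 =", round(euclidean(baseline, x), 3))
# Euclidean rates the two changes as similar (2.0 vs about 2.24);
# Manhattan rates the spread-out change five times larger (10.0 vs 2.0).
```

Euclidean distance grows with the square root of the number of features that moved. Manhattan distance grows linearly with it, so it punishes broad, low-amplitude changes much harder.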

The practical takeaway is that the distance metric the model uses determines what evasion looks like. Before you can blend into “normal,” you need to know what the model is measuring when it calculates normal.

Clustering

K-Means is the most widely deployed clustering algorithm, and understanding how it works reveals why it is exploitable.

The algorithm starts by placing K initial centroids in feature space, typically chosen at random from the data or with a seeding heuristic such as k-means++ (the default in scikit-learn). Each data point is assigned to the nearest centroid. The centroids are then recalculated as the mean of all points assigned to them. This assignment-and-update cycle repeats until the centroids stop moving. The result is K clusters, each represented by its centroid, with membership decided purely by which centroid a point is nearest to.
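The assignment-and-update cycle fits in a few lines of numpy. This is a minimal sketch of Lloyd's algorithm under simple assumptions. Initial centroids are sampled from the data rather than seeded with k-means++. An empty cluster keeps its previous centroid. Convergence is checked with `np.allclose`. It is not how any particular library implements K-Means.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm K-Means sketch (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    # Initialise by sampling k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, labels


rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
    rng.normal([10.0, 10.0], 0.5, size=(50, 2)),
])
centroids, labels = kmeans(X, k=2)
print(np.round(centroids, 2))
```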

Three properties of K-Means matter for red teaming.

First, K is chosen before the algorithm runs. The defender picks the number of clusters in advance. If network traffic is clustered into five groups and an attacker’s traffic profile falls between two existing clusters, the model will force it into whichever cluster is closest rather than creating a new group for it. This means that a moderately unusual traffic pattern gets absorbed into a legitimate cluster rather than flagged. You do not need to perfectly mimic normal traffic. You only need to sit close enough to a legitimate centroid to stay inside whatever distance threshold the defender uses to flag outliers.
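The sketch below illustrates forced assignment with two hypothetical centroids, as if taken from an already-trained model with K fixed at 2. It assumes a hypothetical alerting rule that fires when a point's distance to its nearest centroid exceeds `ALERT_DISTANCE`. K-Means itself does not define that rule; it is one common way such detectors are wired up.

```python
import numpy as np

# Hypothetical centroids from a K-Means model trained with K fixed at 2,
# plus a hypothetical alerting rule on distance to the nearest centroid.
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
ALERT_DISTANCE = 5.0


def assign(point):
    dists = np.linalg.norm(centroids - point, axis=1)
    label = int(dists.argmin())
    return label, float(dists[label]), bool(dists[label] > ALERT_DISTANCE)


# Neither point resembles either cluster closely, but K-Means still hands each one a label.
print(assign(np.array([4.0, 1.0])))  # absorbed into cluster 0, distance about 4.12, no alert
print(assign(np.array([4.0, 4.0])))  # also cluster 0, distance about 5.66, alert
```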

Second, centroids are means. They are pulled by every data point in the cluster. If an attacker can inject data into the training pipeline (a poisoning attack), even a small number of strategically placed data points will shift a centroid’s position. Over time, this can stretch a cluster to encompass traffic patterns that would previously have been flagged. The model’s definition of “normal” quietly expands to include you.
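Because a K-Means centroid is exactly the mean of its assigned points, the shift from poisoning can be computed directly. This sketch assumes a synthetic clean cluster centred on the origin. It also assumes 50 hypothetical injected points at a chosen target are assigned to this cluster, which holds only if this is their nearest centroid.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(size=(1000, 2))
clean -= clean.mean(axis=0)  # centre the clean cluster exactly on the origin

# Hypothetical poisoning: 50 injected points (under 5% of the total) at a chosen target,
# assumed to be assigned to this cluster because it is their nearest centroid.
target = np.array([8.0, 0.0])
poison = np.tile(target, (50, 1))

centroid_before = clean.mean(axis=0)
centroid_after = np.vstack([clean, poison]).mean(axis=0)

# The mean moves by 50 / 1050 of the way toward the target: about 0.38 units along x.
print(np.round(centroid_before, 3), np.round(centroid_after, 3))
```

Repeating this across retraining cycles compounds the drift, since each new centroid becomes the starting point for the next round of injected data.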

Third, K-Means uses Euclidean distance by default. All the weaknesses of Euclidean distance described above apply directly. If the feature space includes both high-variance and low-variance features, the algorithm’s attention is concentrated on the high-variance dimensions. Anomalous behaviour in low-variance features goes unnoticed.

DBSCAN works differently. Instead of specifying the number of clusters in advance, DBSCAN finds clusters by identifying dense regions of data points and labelling sparse regions as noise. Points that do not belong to any dense region are classified as outliers. This makes DBSCAN more resistant to the “forced assignment” weakness of K-Means, because it is willing to label data as noise rather than assigning everything to a cluster. But DBSCAN has its own parameters that determine what counts as dense: epsilon, the neighbourhood radius, and minPts, the number of points that must lie within epsilon of a point for it to count as a core point of a dense region. If an attacker knows those parameters, they can operate just inside the density threshold: close enough to an existing cluster to avoid the noise label, different enough to pursue their objective.
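Here is a sketch of operating just inside the threshold, using scikit-learn's `DBSCAN`, whose `eps` and `min_samples` correspond to epsilon and minPts. The grid of "normal" points and the parameter values are assumptions chosen for illustration. Each candidate is fitted separately, so the two candidates cannot act as each other's neighbours.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical dense region: a 5 x 5 grid of points spaced 0.1 apart.
grid = np.array([[x, y] for x in np.arange(5) * 0.1 for y in np.arange(5) * 0.1])
EPS, MIN_SAMPLES = 0.15, 4  # assumed parameter values; min_samples counts the point itself


def label_of(candidate):
    """Fit DBSCAN on the grid plus one candidate and return the candidate's label (-1 = noise)."""
    X = np.vstack([grid, candidate])
    return int(DBSCAN(eps=EPS, min_samples=MIN_SAMPLES).fit_predict(X)[-1])


print(label_of([0.52, 0.2]))  # 0.12 from the nearest core point: joins the cluster as a border point
print(label_of([0.58, 0.2]))  # 0.18 from every core point: labelled noise
```

A 0.06 difference in position decides whether the point is "normal" or "noise". An attacker who knows `eps` can aim for the near side of that line.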

Dimensionality reduction

PCA works by identifying the directions of maximum variance in the data and projecting all data points onto those directions. The first principal component captures the most variance, the second captures the next most, and so on. In practice, analysts keep enough components to explain 90-95% of the variance and discard the rest.
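The variance-threshold rule can be sketched with an SVD in numpy. This is a minimal illustration, not scikit-learn's `PCA` class. The synthetic data, with three independent features of standard deviation 10, 3 and 0.5, is an assumption. The rule keeps the fewest components whose cumulative explained variance reaches 95%.

```python
import numpy as np


def pca_fit(X, variance_to_keep=0.95):
    """PCA via SVD; keep the fewest components whose cumulative explained variance reaches the target."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    n_keep = min(int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1, len(s))
    return mean, Vt[:n_keep], explained


rng = np.random.default_rng(0)
# Hypothetical features with very different variances (standard deviations 10, 3 and 0.5).
X = rng.normal(size=(5000, 3)) * np.array([10.0, 3.0, 0.5])
mean, components, explained = pca_fit(X)
print(components.shape[0], np.round(explained, 4))
```

With these variances the first two components explain more than 99% of the total, so the third direction, dominated by the low-variance feature, is dropped.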

The discarded components are where the red team opportunity lives.

If a malicious feature’s variance is small relative to the dominant features in the dataset, PCA will project it onto a component that gets dropped during dimensionality reduction. The feature still exists in the raw data, but it is invisible to any downstream model that operates on the reduced feature set. An attacker who understands which features are captured by the retained components, and which are captured by the discarded ones, can concentrate their activity in the dimensions the defender has explicitly decided to ignore.

This is not a theoretical concern. Consider a network monitoring system that applies PCA to reduce 50 features of connection metadata to 10 principal components before feeding them into an anomaly detector. The reduced representation captures the broad patterns of normal traffic: packet sizes, connection durations, port distributions. But subtle features like specific byte patterns in payloads, timing jitter between requests, or low-frequency DNS queries may land on components 30 through 50, the ones that were discarded. An attacker can operate freely in those dimensions because the model has already decided they are noise.
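The blind spot follows from the geometry: principal directions are orthogonal, so movement along a discarded direction projects to zero on every retained one. This sketch uses synthetic three-feature data in which the third feature stands in for something like timing jitter; the data and the 10-unit shift are assumptions. It keeps two components and moves a point purely along the dropped direction.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical telemetry: two high-variance features and one low-variance feature
# (think timing jitter) that a 95% variance threshold will discard.
X = rng.normal(size=(2000, 3)) * np.array([50.0, 20.0, 0.5])

mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
retained = Vt[:2]   # components the downstream detector sees
discarded = Vt[2]   # lowest-variance direction, dropped


def project(x):
    return (x - mean) @ retained.T


normal_point = X[0]
# The attacker moves 10 units along the discarded direction, about 20 standard deviations
# of normal variation there, yet the reduced representation does not change.
attacker_point = normal_point + 10.0 * discarded
print(np.round(project(normal_point) - project(attacker_point), 10))
```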

t-SNE and UMAP are used primarily for visualisation rather than as preprocessing for detection, but they introduce a different risk. These algorithms preserve local structure (nearby points stay nearby) at the expense of global structure (the overall distances between clusters become unreliable). A security analyst using t-SNE to visually inspect cluster separation might conclude that malicious traffic is clearly separated from normal traffic when in reality the visual separation is an artefact of the algorithm’s distortion of global distances. The analyst’s confidence in their detection capability is higher than the model’s actual capability.

Anomaly detection

Anomaly detection is where unsupervised learning most directly opposes red team operations. These models are actively trying to find the data points that do not fit.

Isolation Forest works by randomly partitioning feature space with decision trees. The intuition is that anomalies are few and different, so they are easier to isolate. A data point that can be separated from the rest with very few splits is likely anomalous. Normal data points are deeply embedded in dense clusters and require many splits to isolate. The score is based on the average number of splits needed to isolate a point across many random trees.

For evasion, this means an attacker needs their traffic to be difficult to isolate. Sitting in a dense region of feature space, surrounded by many similar data points, increases the number of splits needed. An attacker who generates traffic that closely mirrors a high-volume, high-density normal traffic pattern forces the Isolation Forest to work harder to separate them. The model is not looking for specific signatures. It is looking for anything that is easy to separate. Making yourself hard to separate is the evasion objective.
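Here is a short sketch with scikit-learn's `IsolationForest` on a synthetic dense blob that stands in for a high-volume normal pattern; the data and probe points are assumptions. `score_samples` returns higher values for points that need more splits on average. `predict` returns 1 for inliers and -1 for outliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical high-volume normal pattern: one dense blob of traffic features.
normal = rng.normal(loc=0.0, scale=0.5, size=(1000, 2))
forest = IsolationForest(n_estimators=200, random_state=0).fit(normal)

blended = np.array([[0.1, -0.1]])  # inside the dense region: many splits to isolate
isolated = np.array([[4.0, 4.0]])  # far from everything: a few splits suffice

# score_samples: higher means more normal (longer average path length across trees).
print(forest.score_samples(blended), forest.score_samples(isolated))
print(forest.predict(blended), forest.predict(isolated))  # 1 = inlier, -1 = outlier
```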

Local Outlier Factor (LOF) compares the local density around a data point to the local density around its neighbours. If a point’s neighbourhood is much sparser than its neighbours’ neighbourhoods, it is flagged as an outlier. LOF is harder to evade than Isolation Forest because it uses relative density rather than absolute isolation. You cannot simply sit in a moderately dense region. You need your local density to be comparable to your neighbours’ density. This means you need to generate enough data points around your position in feature space that the neighbourhood looks populated, or you need to position yourself within an existing dense cluster where the local density ratio will not flag you.
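The difference between relative and absolute density shows up clearly with two synthetic clusters, one tight and one diffuse; both are assumptions for illustration. This sketch uses scikit-learn's `LocalOutlierFactor` with `novelty=True`, which allows scoring new points. Two probes each sit one unit from a cluster centre. Only the probe near the tight cluster looks anomalous, because its neighbours are far more densely packed than it is.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Hypothetical training data: one tight cluster and one diffuse cluster far away.
dense = rng.normal(loc=0.0, scale=0.1, size=(200, 2))
diffuse = rng.normal(loc=20.0, scale=2.0, size=(200, 2))
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(np.vstack([dense, diffuse]))

# Both probes sit 1 unit from a cluster centre.
near_dense = np.array([[1.0, 0.0]])
near_diffuse = np.array([[21.0, 20.0]])

# score_samples returns the negated LOF, so flip the sign; values near 1 look normal.
lof_dense = -lof.score_samples(near_dense)[0]
lof_diffuse = -lof.score_samples(near_diffuse)[0]
print(f"LOF near dense cluster: {lof_dense:.1f}, near diffuse cluster: {lof_diffuse:.2f}")
```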

Autoencoders take a different approach entirely. An autoencoder is a neural network that compresses input data into a lower-dimensional representation and then reconstructs it. The model is trained only on normal data. When it encounters anomalous data, it reconstructs it poorly because the anomalous patterns were not captured during training. The reconstruction error, the difference between the input and the model’s reconstruction, is the anomaly score. High reconstruction error means the model does not recognise the pattern.

Evading an autoencoder means presenting data that the model can reconstruct accurately. If the attacker’s traffic shares structural features with the normal training data (similar distributions, similar correlations between features, similar temporal patterns), the autoencoder will reconstruct it with low error. The evasion challenge is that autoencoders capture non-linear relationships between features, which makes it harder to simply adjust individual features independently. You need to maintain the statistical relationships between features, not just the feature values themselves.
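The point about preserving relationships can be shown without training a neural network. This is an explicit stand-in: a linear autoencoder trained with squared error learns the same subspace as the top principal components, so the sketch uses a one-component PCA as the encoder and decoder. A real autoencoder also captures non-linear structure, which this version does not. The traffic data, in which bytes sent scales with request count, is hypothetical. The "inconsistent" probe uses values that are each individually in range but break that relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical normal traffic: bytes sent scales with request count.
requests = rng.uniform(1, 100, size=2000)
bytes_sent = 500.0 * requests + rng.normal(0, 200, size=2000)
X = np.column_stack([requests, bytes_sent])

# Standardise, then learn a one-dimensional linear encoder/decoder. A linear autoencoder
# trained on squared error spans the same subspace as the top principal components,
# so PCA stands in for a (non-linear) neural autoencoder here.
mu, sigma = X.mean(axis=0), X.std(axis=0)
_, _, Vt = np.linalg.svd((X - mu) / sigma, full_matrices=False)
W = Vt[:1]  # encoder weights (1 x 2); the decoder uses W.T


def reconstruction_error(x):
    z = (x - mu) / sigma
    z_hat = (z @ W.T) @ W  # encode to one number, then decode back to two features
    return float(np.sum((z - z_hat) ** 2))


consistent = np.array([60.0, 30_000.0])   # values typical and relationship intact
inconsistent = np.array([5.0, 30_000.0])  # each value in range, relationship broken
print(reconstruction_error(consistent), reconstruction_error(inconsistent))
```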

Feature scaling

Unsupervised models are sensitive to feature scaling in ways that supervised models often are not. K-Means, PCA, and distance-based anomaly detectors all behave differently depending on whether features are scaled.

If a defender applies Min-Max scaling (compressing all features to a 0-1 range), every feature spans the same range, so no feature dominates distance calculations simply because of its units. This largely eliminates the “hide in the low-variance dimension” strategy. But Min-Max scaling introduces a different vulnerability: it is sensitive to outliers in the training data. A single extreme value in the training set stretches that feature’s range, squeezing every normal value into a sliver of the 0-1 interval and reducing the model’s resolution for that dimension. If an attacker can inject an extreme value during training (via poisoning), they can effectively blind the model to variation in that feature.
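Here is the resolution loss in numbers, as a small sketch with a hypothetical feature that normally ranges from 0 to 100, plus one injected value of 10,000. It measures how much of the scaled range a fixed 50-unit change occupies before and after poisoning.

```python
import numpy as np


def min_max(train, x):
    """Scale x using the min and max observed in the training data."""
    lo, hi = train.min(), train.max()
    return (x - lo) / (hi - lo)


clean = np.arange(0.0, 101.0)          # hypothetical feature, normally between 0 and 100
poisoned = np.append(clean, 10_000.0)  # one injected extreme value

probe = np.array([20.0, 70.0])  # two values 50 units apart
gaps = {name: float(np.diff(min_max(train, probe))[0])
        for name, train in [("clean", clean), ("poisoned", poisoned)]}
print({k: round(v, 6) for k, v in gaps.items()})
# the same 50-unit change shrinks from 0.5 to 0.005 of the scaled range
```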

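Standardisation (Z-score normalisation) centres features at zero mean with unit variance. It is less sensitive to a single outlier than Min-Max scaling, because one extreme value shifts the mean and standard deviation less than it shifts the range, but it is not immune. It is also a linear transform, so it does not correct skew. If a feature’s true distribution is heavily skewed, its long tail inflates the standard deviation, the bulk of normal values is squeezed into a narrow band, and the distance calculations made on the standardised data do not accurately reflect the original data’s structure.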

The practical point: the scaling method a defender uses is part of the attack surface. It determines which evasion strategies are viable and which are not.

Defence

Defending unsupervised systems starts with accepting their fundamental limitation: they define normality, and an attacker who understands that definition can conform to it. No single mitigation eliminates this, but several measures raise the cost of evasion substantially.

Rotate or retrain models on fresh data regularly. If an attacker is slowly poisoning cluster centroids, frequent retraining limits how far those centroids can drift before the influence is reset. Monitor centroid positions between training cycles. Significant movement without a corresponding business reason is a signal.
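Here is one way to monitor centroid movement between cycles, as a sketch. Cluster indices are not stable across retraining runs, so this version matches each old centroid to its nearest new centroid. That matching rule is an assumption, and so is the `DRIFT_THRESHOLD` value; both would need tuning against known-clean retraining runs.

```python
import numpy as np

DRIFT_THRESHOLD = 0.5  # hypothetical; tune per feature space and scaling


def centroid_drift(old, new):
    """For each old centroid, the distance to the nearest centroid in the retrained model."""
    d = np.linalg.norm(old[:, None, :] - new[None, :, :], axis=2)
    return d.min(axis=1)


old = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
new = np.array([[5.1, 4.9], [0.0, 0.1], [11.0, 0.5]])  # indices shuffled by retraining
drift = centroid_drift(old, new)
alerts = np.flatnonzero(drift > DRIFT_THRESHOLD)
print(np.round(drift, 3), alerts)  # only the third cluster moved more than the threshold
```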

Use ensemble approaches. Run multiple unsupervised algorithms with different distance metrics on the same data. An attacker who optimises their evasion for Euclidean distance in K-Means may be caught by a cosine similarity model running in parallel. The cost of evading one model is manageable. The cost of evading three simultaneously is much higher.
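Here is the ensemble idea reduced to two detectors against a single hypothetical reference profile. One uses a Euclidean distance limit and one uses a cosine similarity floor; the profile and both thresholds are illustrative assumptions. The ensemble alerts if either detector fires, so a point tuned to stay inside the Euclidean limit can still be caught because its direction is off.

```python
import numpy as np

# Hypothetical reference profile and per-detector thresholds.
profile = np.array([10.0, 10.0, 10.0])
EUCLID_MAX = 6.0
COSINE_MIN = 0.97


def euclid_flag(x):
    return bool(np.linalg.norm(x - profile) > EUCLID_MAX)


def cosine_flag(x):
    cos = x @ profile / (np.linalg.norm(x) * np.linalg.norm(profile))
    return bool(cos < COSINE_MIN)


def ensemble_flag(x):
    # Alert if any detector fires, so an evader has to satisfy every detector at once.
    return euclid_flag(x) or cosine_flag(x)


tuned_for_euclid = np.array([14.0, 7.0, 8.0])  # about 5.39 from the profile: passes Euclidean
ordinary = np.array([11.0, 10.0, 9.0])

for name, x in [("tuned_for_euclid", tuned_for_euclid), ("ordinary", ordinary)]:
    print(name, euclid_flag(x), cosine_flag(x), ensemble_flag(x))
```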

Validate dimensionality reduction choices. Do not accept a 95% variance threshold for PCA without examining what the discarded components contain. If security-relevant features load heavily on discarded components, the reduction is actively harmful. Periodically audit the loadings of the retained components against the features that matter for detection.
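A loadings audit can be as simple as measuring how much of each security-relevant feature's direction survives in the retained components. This sketch assumes hypothetical feature names and synthetic unscaled data in which `dns_query_rate` has tiny variance. Because the PCA basis is orthonormal, the squared loadings of a feature on the retained components give the fraction of that feature the reduced representation can still see.

```python
import numpy as np

# Hypothetical feature set; "dns_query_rate" is the security-relevant, low-variance feature.
feature_names = ["bytes_out", "duration", "dns_query_rate"]
security_relevant = {"dns_query_rate"}

rng = np.random.default_rng(2)
X = rng.normal(size=(3000, 3)) * np.array([100.0, 30.0, 0.2])

_, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
explained = s**2 / np.sum(s**2)
n_keep = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1

# Vt is square and orthogonal here, so each column has unit norm and the squared loadings
# on the retained rows give the fraction of that feature's direction still represented.
captured = (Vt[:n_keep] ** 2).sum(axis=0)
for name, frac in zip(feature_names, captured):
    flag = "  <-- mostly discarded" if name in security_relevant and frac < 0.5 else ""
    print(f"{name}: {frac:.3f} captured{flag}")
```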

Treat feature scaling as a security decision. The choice between Min-Max, standardisation, and robust scaling (which uses median and interquartile range instead of mean and variance) affects which evasion strategies work. Robust scaling is more resistant to poisoning via extreme values. Document the scaling choice and its implications as part of the model’s security posture.
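Here is a quick way to compare how the three scalers react to poisoning, as a sketch. It takes a hypothetical feature with values 0 to 100, injects one value of 10,000, and measures how much that single value inflates each scaler's denominator: the range for Min-Max, the standard deviation for standardisation, and the interquartile range for robust scaling. A larger inflation means normal variation is compressed more after scaling.

```python
import numpy as np

clean = np.arange(0.0, 101.0)          # hypothetical feature values, 0 to 100
poisoned = np.append(clean, 10_000.0)  # one injected extreme value


def min_max_spread(x):
    return x.max() - x.min()


def z_score_spread(x):
    return x.std()


def robust_spread(x):
    # Robust scaling divides by the interquartile range (after subtracting the median).
    q1, q3 = np.percentile(x, [25, 75])
    return q3 - q1


# How much one poisoned value inflates each scaler's denominator.
inflation = {f.__name__: float(f(poisoned) / f(clean))
             for f in (min_max_spread, z_score_spread, robust_spread)}
print({k: round(v, 2) for k, v in inflation.items()})
```

Here the range grows a hundredfold and the standard deviation more than thirtyfold, while the interquartile range barely moves.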

Monitor for clustering instability. If cluster assignments for the same data points change significantly between training runs, it may indicate that the data distribution is being manipulated. Stable clusters on clean data should produce consistent assignments.
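Comparing assignments between runs needs a measure that ignores cluster IDs, because retraining can rename clusters without changing them. This sketch uses scikit-learn's `adjusted_rand_score`, one common choice. The label arrays and `STABILITY_THRESHOLD` are hypothetical.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

STABILITY_THRESHOLD = 0.9  # hypothetical; tune on known-clean retraining runs

# Cluster labels for the same nine records from three training runs.
run_a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
run_b = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])  # same grouping, cluster IDs renamed
run_c = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0])  # one record in each group has moved

for name, run in [("run_b", run_b), ("run_c", run_c)]:
    ari = adjusted_rand_score(run_a, run)
    status = "stable" if ari >= STABILITY_THRESHOLD else "investigate"
    print(f"{name}: ARI={ari:.2f} -> {status}")
```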

The real problem with unsupervised detection

Unsupervised learning models are the backbone of behavioural detection in modern security stacks: UEBA platforms, network anomaly detection, fraud engines. They run where signatures fail and labelled data does not exist. Their strength is generality. Their weakness is the same.

A model that was never told what an attack looks like can only tell you what normal looks like. Everything else is an inference. And an adversary who studies the model’s definition of normality does not need to find a vulnerability; they need to be boring. They need to generate traffic that is statistically indistinguishable from the thousand other things the model has already decided are unremarkable.
