K-means clustering

Inject fifty carefully positioned data points into a training pipeline and a K-means clustering model will shift its cluster centroids to accommodate them, no labels need to be forged, no ground truth needs to be corrupted. The model recalculates the mean of each cluster, absorbs the poisoned points, and adjusts its boundaries accordingly. It does this because the algorithm’s entire learning mechanism is arithmetic averaging, and arithmetic averages are trivially influenced by strategic outliers. The model has no mechanism to question whether the data it just absorbed belongs there.

When a supervised classifier misclassifies an attack as benign, someone can eventually check the labelled dataset and trace the error. When a K-means model absorbs malicious activity into a benign cluster, there is no label to check. The model simply decided that the attacker’s traffic looked like it belonged. The defender may never know the boundary moved.

Where K-means sits in security infrastructure

K-means clustering appears in production security systems more often than most practitioners realise. User and Entity Behaviour Analytics (UEBA) platforms use it to establish behavioural baselines, grouping users into clusters based on login times, access patterns, and resource usage. Network traffic analysis tools use it to segment flows into “normal” categories and flag outliers. Malware triage pipelines cluster samples by behavioural features to identify families without requiring manual labelling of every binary. Log aggregation systems use it to group similar events and surface anomalies.

n each of these cases, the security logic follows the same pattern where the system clusters the known activity and then treats anything that falls outside those clusters as suspicious. The implicit assumption is that the clusters represent ground truth, they do not. They represent whatever structure K-means found in whatever data it was given. If the data is compromised, the structure is compromised. There is no secondary validation.

How K-means learns

The algorithm is simple enough to describe in four steps, and that simplicity is itself an attack surface.

First, K centroids are initialised, either randomly or using K-means++ (which spreads the initial centroids more evenly to avoid poor starting positions). Second, every data point is assigned to the nearest centroid based on Euclidean distance. Third, each centroid is recalculated as the mean of all points assigned to it. Fourth, the assignment and recalculation steps repeat until the centroids stabilise or a maximum iteration count is reached.

The Euclidean distance between two data points with n features is:

d(x, y) = sqrt(sum((xi - yi)^2))

Every property of this algorithm creates a specific attack surface. The centroid is a mean, so it moves toward injected data. The assignment uses Euclidean distance, so it is dominated by high-magnitude features. The initialisation is random (or pseudo-random), so it produces inconsistent clusters across runs. The algorithm does not validate whether K is correct, so the defender’s choice of K determines what gets grouped together.

Choosing K, and why attackers care

The number of clusters is a hyperparameter. The model does not learn it. The defender chooses it, typically using one of two methods.

The elbow method runs K-means for a range of K values and plots the within-cluster sum of squares (WCSS) against K. WCSS measures how tightly packed each cluster is, meaning that lower values indicate the points within clusters are closer to their centroids. The WCSS always decreases as K increases (more clusters means less variance per cluster), so the defender looks for an “elbow” in the curve where the rate of decrease drops off sharply. The K value at the elbow is the typical choice.

Silhouette analysis is more quantitative. For each data point, a silhouette score measures how similar the point is to its own cluster compared to the nearest neighbouring cluster. Scores range from -1 (wrong cluster) through 0 (on the boundary) to 1 (well-matched). The K value that maximises the average silhouette score across all points is considered optimal.

Both methods assume clean data. Neither accounts for an adversary who has influence over the training pipeline. If an attacker can inject data before the defender runs the elbow method or silhouette analysis, the optimal K will be calculated against a corrupted dataset. The defender will confidently select a K value that has already been influenced by the attacker.

Suppose a UEBA system clusters employees into four behavioural groups as a concrete example. An attacker with access to a compromised account gradually shifts that account’s behaviour over several weeks, moving its feature values closer to a cluster that includes system administrators with broad access. When the model is retrained, the centroid of the admin cluster shifts to accommodate the attacker’s data. The account that should have been flagged as anomalous is now centroid-adjacent, It belongs.

The attack surfaces, systematically

Centroid poisoning

K-means centroids are arithmetic means. This is the single most exploitable property of the algorithm. The mean of a dataset is sensitive to every data point, weighted equally. An attacker who can inject or modify training data can shift a centroid in a predictable direction by a calculable amount.

The mechanics are direct. If a cluster contains n points and its centroid is at position c, adding m new points at position p shifts the centroid to:

c_new = (n * c + m * p) / (n + m)

The attacker controls m and p. The defender’s cluster size n is often observable or estimable. The centroid displacement is deterministic. This is not a probabilistic attack. It is arithmetic.

In practice, this means an attacker can expand a “benign” cluster to encompass their own activity, or contract a “suspicious” cluster to exclude it. The model will recalculate its boundaries and the new boundaries will be treated as authoritative.

Evasion through feature manipulation

Once an attacker understands which features K-means uses and how the clusters are structured, evasion is a distance calculation problem. The attacker crafts inputs whose feature values place them closer to a benign centroid than to any suspicious one.

K-means uses Euclidean distance, which means every feature contributes to the distance calculation proportionally to its variance. If the defender has not normalised the features, a single high-magnitude feature can dominate the distance. An attacker who knows this can manipulate that one feature to control cluster assignment while leaving all other features untouched.

If the features are normalised, the attacker’s task is harder but not fundamentally different. They must distribute their manipulation across features rather than concentrating it in one. The geometry of the problem remains: get closer to the target centroid than to any other.

Exploiting initialisation instability

Standard K-means uses random initialisation, which means the same dataset can produce different clusterings on different runs. K-means++ improves this by spreading initial centroids to reduce variance, but it does not eliminate non-determinism.

For an attacker, this means the clusters they are trying to evade may look different each time the model retrains. This is both an obstacle and an opportunity. The obstacle is that a static evasion strategy may not survive retraining. The opportunity is that unstable clusterings produce unstable alert boundaries. A defender who retrains weekly may have different “normal” baselines each week, which creates windows where anomalous behaviour is temporarily normal according to the model.

The spherical cluster assumption

K-means assumes clusters are spherical (equal variance in all directions) and roughly equal in size. Real data rarely cooperates. Network traffic patterns form elongated, irregularly shaped distributions. User behaviour data clusters into groups of vastly different sizes, where most users behave similarly and a small number of privileged accounts have distinct patterns.

When the data violates these assumptions, K-means forces spherical boundaries onto non-spherical data. This creates dead zones, which are regions of feature space that are technically inside a cluster boundary but do not actually contain normal data. An attacker whose crafted inputs land in these dead zones will be assigned to a cluster and treated as normal, even though no legitimate data point occupies that region.

Outlier sensitivity

K-means has no built-in mechanism for handling outliers. Because the centroid is a mean, a single extreme data point can drag it significantly away from the cluster’s true centre. An attacker who injects extreme outliers can distort cluster geometry without needing to inject a large volume of data.

This is particularly effective in low-density clusters. If a cluster contains twenty points, a single outlier injected at distance d from the centroid will shift it by approximately d/21. For a cluster with a thousand points, the same outlier shifts the centroid by d/1001. Attackers who know the cluster sizes can calculate exactly how many outliers are needed to achieve a target centroid displacement.

What defenders actually miss

The most common failure pattern is not a specific attack. It is the assumption that K-means output represents ground truth. A security team that builds alerting logic on top of K-means clusters, without independently validating that the clusters correspond to meaningful categories, is building on sand. The model will always produce K clusters. It will produce K clusters even if the data has no real cluster structure. It will produce K clusters even if half the data has been poisoned.

The second failure is retraining on poisoned data without recognising it. Most K-means deployments in production retrain on rolling windows of recent data. An attacker who maintains a slow, consistent data poisoning campaign over several retraining cycles can shift cluster boundaries incrementally. Each individual retraining looks normal because the change is small. The cumulative effect is a model that has silently reclassified the attacker’s behaviour as legitimate.

The third failure is treating the choice of K as a one-time decision. The optimal K for a dataset changes as the underlying data distribution changes. Defenders who set K during initial deployment and never revisit it are using a model whose structural assumptions may no longer hold.

Defences that actually work

Monitor centroid drift between retraining cycles. If a centroid moves by more than a defined threshold (calibrated to the cluster’s historical variance), flag the retraining batch for review before deploying the updated model. This catches both gradual poisoning and bulk injection.

Validate clusters against independent ground truth. If the UEBA system clusters users into four groups, a human analyst should periodically verify that those four groups correspond to meaningful behavioural categories. If cluster 3 is supposed to be “standard office workers” but now includes two accounts with domain admin privileges, something has shifted.

Use robust variants of K-means. K-medoids replaces the mean with the median data point as the centroid, which is less sensitive to outliers. Trimmed K-means excludes a percentage of the most extreme points before calculating centroids. Both are more expensive to compute but significantly harder to poison.

Normalise features before clustering, and audit the normalisation pipeline. If features are not normalised, an attacker who controls a single high-magnitude feature controls cluster assignment. If features are normalised but the normalisation statistics (mean, standard deviation) are calculated from the same data the attacker can poison, the normalisation itself becomes an attack vector. Calculate normalisation statistics from a clean holdout dataset when possible.

Run silhouette analysis after each retraining, not just during initial setup. A sudden drop in average silhouette score indicates that the cluster structure has degraded, which may signal poisoning, concept drift, or a K value that no longer fits the data.

The unsupervised gap

K-means is the first model in this series that learns without labels. Every supervised model covered previously had a built-in accountability mechanism because the training labels could be audited, the predictions could be checked against known outcomes, and the error rate could be measured. K-means has none of that, it produces structure from data and declares that structure valid. The defender’s only option is to validate externally, and external validation requires the kind of manual effort that clustering was supposed to eliminate.

That tension is not a flaw in K-means. It is the fundamental trade-off of unsupervised learning. You gain the ability to find structure without labelling. You lose the ability to verify that the structure is correct. For a red teamer, that gap is the operating environment. Every attack against K-means exploits the same root cause: the model cannot distinguish between structure it discovered and structure the attacker planted.

Type to search