{"id":412,"date":"2026-04-24T00:00:00","date_gmt":"2026-04-23T23:00:00","guid":{"rendered":"https:\/\/kosokoking.com\/?p=412"},"modified":"2026-04-18T20:37:39","modified_gmt":"2026-04-18T19:37:39","slug":"pca","status":"publish","type":"post","link":"https:\/\/kosokoking.com\/index.php\/technology\/pca\/","title":{"rendered":"PCA"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">A network monitoring system ingests 200 features per connection. PCA reduces them to 15 principal components before the anomaly detector ever sees the data. That means 185 dimensions of information are gone. If your malicious traffic&#8217;s distinguishing characteristics live in those discarded dimensions, you are invisible by design.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The previous entry in this series introduced dimensionality reduction as a red team opportunity, since models that throw away information create blind spots, and blind spots are operational cover. This article goes deeper into the specific mechanism. PCA is not a black box: its maths is fully deterministic, its outputs are inspectable, and the information it discards can be calculated precisely. For a red teamer, that means the blind spot is not just exploitable but also predictable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What PCA actually does<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">PCA takes a dataset with many features and compresses it into fewer features while keeping as much of the original variation as possible. The compressed features are called principal components. Each one is a linear combination of the original features, weighted by how much each original feature contributes to the overall spread of the data.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first principal component captures the direction of maximum variance. The second captures the next most variance, subject to being orthogonal (perpendicular) to the first. The third is orthogonal to both, and so on. 
Each successive component captures less variance than the one before it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In practice, an analyst retains enough components to explain 90-95% of the total variance and drops the rest. The retained components become the input to whatever model sits downstream, such as a classifier, a clustering algorithm, or an anomaly detector. Everything captured by the dropped components is absent from the model&#8217;s view of the world.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Variance, covariance, and eigenvectors<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Understanding why PCA keeps what it keeps requires three concepts. They are not optional background; they are the mechanism.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Variance<\/strong>&nbsp;measures how spread out a feature&#8217;s values are. A feature where every data point is nearly identical has low variance. A feature with a wide range of values has high variance. PCA treats variance as a proxy for information: the more a feature varies, the more it presumably tells you about the differences between data points.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This assumption is worth questioning. Variance is not the same thing as relevance. A feature that fluctuates wildly but carries no meaningful signal (noise) will score higher in PCA&#8217;s ranking than a feature with subtle but consistent differences between classes. PCA cannot distinguish between informative variance and noisy variance. It measures spread, not meaning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Covariance<\/strong>&nbsp;measures how two features move together. If feature A tends to increase when feature B increases, their covariance is positive. If they move in opposite directions, it is negative. If they are independent, their covariance is near zero. 
PCA uses the full covariance matrix (every feature&#8217;s covariance with every other feature) to identify which directions in the combined feature space capture the most overall spread.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Eigenvectors and eigenvalues<\/strong>&nbsp;are the outputs of decomposing that covariance matrix. Each eigenvector points in a direction through the feature space. Its corresponding eigenvalue tells you how much of the total variance sits along that direction. The eigenvector with the largest eigenvalue is the first principal component. The eigenvector with the second largest eigenvalue is the second principal component. Sort them by eigenvalue, pick the top k, and you have your reduced feature set.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The eigenvalue equation is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>C * v = \u03bb * v\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Where C is the covariance matrix, v is the eigenvector, and \u03bb is the eigenvalue. Solving this equation identifies the directions (v) and their associated magnitudes (\u03bb). In practice, this is computed using eigenvalue decomposition or singular value decomposition (SVD), which is more numerically stable when the covariance matrix is large or poorly conditioned.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The algorithm, step by step<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">PCA follows a fixed sequence. Each step has implications for how the output can be attacked or predicted.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Standardise the data.<\/strong>&nbsp;Subtract the mean and divide by the standard deviation for each feature. This puts all features on the same scale. 
Without this step, features with large numeric ranges (bytes transferred, for instance) would dominate the variance calculation simply because their numbers are bigger, not because they carry more information.<\/li>\n\n\n\n<li><strong>Compute the covariance matrix.<\/strong>&nbsp;Calculate the covariance between every pair of standardised features. The result is a square matrix where element (i, j) represents how features i and j co-vary. This matrix encodes the relationships between all features simultaneously.<\/li>\n\n\n\n<li><strong>Decompose the covariance matrix.<\/strong>&nbsp;Extract the eigenvectors and eigenvalues. Each eigenvector is a direction in feature space. Each eigenvalue is the variance along that direction.<\/li>\n\n\n\n<li><strong>Sort by eigenvalue.<\/strong>&nbsp;Rank the eigenvectors from highest eigenvalue to lowest. The first eigenvector captures the most variance, the last captures the least.<\/li>\n\n\n\n<li><strong>Select the top k components.<\/strong>&nbsp;Choose how many components to retain. The standard heuristic is to keep enough components to explain 90-95% of the total variance. Everything else gets dropped.<\/li>\n\n\n\n<li><strong>Project the data.<\/strong>&nbsp;Multiply the standardised data matrix by the matrix of selected eigenvectors. 
The result is the lower-dimensional representation that feeds into the downstream model.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>Y = X * V\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Where X is the standardised data from step 1, V is the matrix of selected eigenvectors (one per column), and Y is the transformed data in the reduced space.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why this matters for red teaming<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The adversarial implications of PCA sit in three places: what it keeps, what it drops, and how the decision of what to drop gets made.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The discarded dimensions are unmonitored territory<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The previous article introduced this concept. Here is the operational detail. When PCA reduces 200 features to 15 components, the 185 dimensions it discards are not just deprioritised. They are structurally absent from the downstream model. No amount of tuning the anomaly detector&#8217;s sensitivity will help it detect patterns in dimensions it never receives.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If a red teamer can determine which original features are captured by the retained components (and which are captured by the discarded ones), they can concentrate their activity in the ignored dimensions. This is not a matter of being &#8220;subtle.&#8221; It is a matter of operating in a space the defender has explicitly elected to discard.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Practically, this means understanding the loadings, which are the weights that define how much each original feature contributes to each principal component. A feature with near-zero loadings across all retained components is invisible to the downstream model. 
A feature with high loadings on discarded components is invisible by definition.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Variance is not importance<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">PCA&#8217;s core assumption is that variance equals information. While this is often reasonable, it is not always true.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Consider a malware classifier that uses PCA to reduce binary features before classification. If a particular API call (say, a rare call to a cryptographic library) appears in only 2% of samples, its variance is low. PCA will rank it below features that vary widely across the dataset, like file size or import count. But that rare API call might be the single most discriminative feature for identifying ransomware. PCA does not know this. It measures spread, not class separation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">An adversary who understands this gap can craft samples that are anomalous in low-variance features (which PCA will discard) while remaining normal in high-variance features (which PCA will retain). The downstream model sees a normal-looking data point. The malicious behaviour is confined to dimensions the model was never given.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The standardisation step is a lever<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Before PCA runs, the data is standardised so that each feature is centred to zero mean and scaled to unit variance. This step determines how much influence each feature has on the covariance matrix. If standardisation is done incorrectly, or if the training data used to compute the mean and standard deviation is poisoned, the resulting principal components will reflect the poisoned statistics rather than the true data distribution.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">An attacker with access to the training pipeline (or the ability to inject data into it) can shift the mean or inflate the standard deviation of specific features. 
This alters how PCA weights those features during decomposition, potentially pushing a genuinely informative feature into the discarded components or elevating a noisy feature into the retained ones.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is subtle because the PCA step runs before the model trains, and therefore poisoning PCA does not require poisoning the model&#8217;s labels. It requires poisoning the data statistics that PCA uses to decide what to keep.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Where PCA appears in security infrastructure<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">PCA is a preprocessing step, not a classifier. It rarely appears in security tooling as a visible, named component. Instead, it sits inside pipelines:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Network intrusion detection.<\/strong>&nbsp;High-dimensional connection metadata (packet sizes, inter-arrival times, protocol flags, byte distributions) is reduced to a handful of components before being fed into a clustering or classification model.<\/li>\n\n\n\n<li><strong>Malware analysis.<\/strong>&nbsp;Static features extracted from binaries (import tables, section headers, string distributions, opcode frequencies) are reduced before classification to manage the curse of dimensionality.<\/li>\n\n\n\n<li><strong>User behaviour analytics.<\/strong>&nbsp;Login times, access patterns, data volumes, and application usage are compressed into behavioural profiles that anomaly detectors evaluate.<\/li>\n\n\n\n<li><strong>Image-based systems.<\/strong>&nbsp;Facial recognition and OCR systems historically used PCA (eigenfaces being the classic example) to reduce pixel-space representations to manageable dimensions before matching.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">In each case, the dimensionality reduction step is a design choice that trades information for tractability. The model downstream is faster and more stable with fewer features. 
But &#8220;faster and more stable&#8221; is a defender&#8217;s priority. An attacker&#8217;s priority is to identify what was sacrificed for that stability.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Choosing the number of components<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The most common method for selecting k (the number of retained components) is the explained variance ratio. Analysts plot the cumulative percentage of variance explained as components are added, and choose the point where the curve reaches 90-95%.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This heuristic assumes that the remaining 5-10% of variance is noise. That is sometimes the case, but it can equally be the exact signal an attacker is using to remain undetected.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There is no principled way to determine, from PCA alone, whether the discarded variance is noise or signal. PCA operates on the covariance structure of the data. It has no access to labels, no concept of classes, and no notion of which features are relevant to the downstream task. It is a statistical compression technique being used as a feature selector, and those are different jobs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A defender who retains 95% of variance and assumes the other 5% is disposable has made an assumption about their threat model, whether they realise it or not. They have assumed that no adversary will concentrate their activity in the low-variance subspace. That assumption is testable. And if you are reading this series, you know how to test it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Assumptions and their failure modes<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">PCA makes three assumptions about the data. Each one is a potential attack surface.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Linearity.<\/strong>&nbsp;PCA finds linear combinations of features. 
If the real structure of the data is nonlinear (curved decision boundaries, interaction effects between features), PCA will miss it. The principal components will capture a linear approximation of the true structure, and the gap between the approximation and reality is exploitable. Kernel PCA exists to address this, but it is computationally expensive and less commonly deployed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Correlation.<\/strong>&nbsp;PCA works best when features are correlated, because correlated features can be compressed into fewer components without much information loss. If features are already independent, PCA offers no reduction and no benefit. In security datasets, feature independence often means the analyst chose features carefully. In adversarial settings, it means the attacker can perturb features independently without the perturbation being &#8220;corrected&#8221; by correlated features pulling the data point back toward normalcy.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Scale sensitivity.<\/strong>&nbsp;Without standardisation, PCA is dominated by features with large numerical ranges. This is well understood and almost always addressed in practice. But the standardisation itself introduces a dependency on the training data statistics. If those statistics are compromised, the downstream PCA is compromised.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Defence considerations<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Defending PCA as a pipeline component requires treating dimensionality reduction as a security-relevant decision, not just a performance optimisation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Monitor the explained variance ratio across retraining cycles. If the distribution of variance across components shifts, it may indicate distribution drift in the data, which could be organic or adversarial. 
A sudden change in which components capture the most variance warrants investigation before the downstream model is retrained on the new representation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Audit the loadings matrix. Know which original features contribute to each retained component, and which features are captured only by discarded components. If a feature that is operationally significant (a known indicator of compromise, a high-fidelity detection feature) has negligible loadings on the retained components, that feature is effectively absent from the model. This is a design flaw, not a tuning problem. The fix is either to retain more components or to exclude that feature from PCA and pass it directly to the downstream model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Test the downstream model&#8217;s sensitivity to perturbations in discarded dimensions. Generate synthetic samples that are normal in the retained component space but anomalous in the discarded dimensions, and verify that the pipeline detects them. If it does not (and by construction, it should not), the gap needs compensating controls: separate monitors on the raw features, parallel detection paths, or a different reduction technique that preserves class-relevant variance rather than total variance (such as linear discriminant analysis).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Validate the standardisation statistics. If the training pipeline is accessible to untrusted data sources, the mean and standard deviation used for standardisation are poisonable. Robust alternatives (median and interquartile range, for example) are harder to shift with outlier injection.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The reduction is the risk<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">PCA is good at what it does. It finds the directions of maximum spread in a dataset and compresses the data along those directions. 
The maths is clean, the implementation is deterministic, and the compression ratio can be dramatic. Fifty dimensions become five, and the downstream model trains faster and often generalises better and overfits less.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">But every compression is a choice about what to discard. And in an adversarial context, what the defender discards is what the attacker inherits. PCA does not know which variance is signal and which is noise. It does not know which features a red teamer will target. It only knows which directions have the widest spread, and it keeps those while everything else goes quiet.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How PCA works, why its discarded dimensions are unmonitored attack surface, and what red teamers need to know about exploiting dimensionality reduction.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[640,630,51,635,671,669,136,633,670,663],"class_list":["post-412","post","type-post","status-publish","format-standard","hentry","category-technology","tag-adversarial-ai","tag-ai-red-teaming","tag-cybersecurity","tag-data-poisoning","tag-dimensionality-reduction","tag-feature-engineering","tag-machine-learning","tag-ml-fundamentals","tag-principal-component-analysis","tag-unsupervised-learning"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts\/412","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/comments?post=412"}],"version-history":[{"count"
:2,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts\/412\/revisions"}],"predecessor-version":[{"id":414,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts\/412\/revisions\/414"}],"wp:attachment":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/media?parent=412"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/categories?post=412"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/tags?post=412"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}