The ML OWASP Top 10

OWASP maintains a Machine Learning Security Top 10 (currently in draft) that catalogues the most significant risks facing ML-based systems. It is a separate project from the better-known OWASP Top 10 for LLM Applications, which covers prompt injection, output handling, and the generative AI stack. The ML Security Top 10 applies to the broader field: classifiers, regression models, anomaly detectors, and any deployment where a trained model makes decisions. The two lists overlap in places (supply chain risks, model manipulation), but they address different threat models and different points in the technology stack.

The ten risks map to four clusters based on what the attack targets and where in the ML lifecycle it lands. That grouping is more useful than the raw numbering, because it reflects how an adversary would actually approach an ML system.

Inference-time attacks

These two risks target the moment the model processes input and produces output.

ML01 (input manipulation) covers any attack that modifies input data to cause incorrect model output. The typical mechanism applies small perturbations to benign input: changes that are imperceptible to humans, or that look innocuous, but that shift the model’s classification. Eykholt et al. demonstrated a physical version in their 2017 RP2 (Robust Physical Perturbations) research: black-and-white stickers placed on a stop sign caused a road sign classifier to misread it as a speed limit sign in 84.8% of video frames captured from a moving vehicle. The perturbations were robust to changes in angle, distance, and lighting. This class of attack applies to any model that accepts external input, whether that is images, text, network packets, or binary features.
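The perturbation idea can be sketched with the fast gradient sign method (FGSM), a simpler digital technique than the physical RP2 attack. This is a minimal illustration on a toy logistic-regression model. The weights, bias, input, and epsilon are made-up values chosen for the example, not taken from any real system.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fgsm_perturb(weights, bias, x, true_label, epsilon):
    """Move every feature by +/- epsilon in the direction that increases the loss."""
    p = predict_proba(weights, bias, x)
    # For cross-entropy loss, the gradient with respect to the input is (p - y) * w.
    grad = [(p - true_label) * w for w in weights]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (illustrative values only).
WEIGHTS = [2.0, -1.5, 0.5]
BIAS = -0.2
x_clean = [0.4, 0.2, 0.3]
x_adv = fgsm_perturb(WEIGHTS, BIAS, x_clean, true_label=1, epsilon=0.2)

print(predict_proba(WEIGHTS, BIAS, x_clean))  # above 0.5: classified as class 1
print(predict_proba(WEIGHTS, BIAS, x_adv))    # below 0.5: classification flipped
```

No feature moves by more than epsilon, yet the predicted class changes. Real attacks on deep models follow the same pattern but obtain the gradient through automatic differentiation or estimate it from queries.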

ML09 (output integrity) targets the model’s output rather than its input. The model classifies correctly, but the attacker intercepts and alters the result before the downstream system processes it. In a malware classification pipeline, for example, this means changing a “malicious” label to “benign” after the model has already made the correct call. The model itself is never compromised, which makes detection difficult through model-level monitoring alone. Defence here sits at the application layer: integrity checks and authenticated channels between the model and the system consuming its output.
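One way to implement an integrity check between the model and its consumer is to attach a message authentication code to each prediction. The sketch below uses Python's standard `hmac` module. The key, field names, and function names are hypothetical, and a real deployment would load the key from a secrets manager rather than hard-coding it.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice this comes from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"

def sign_prediction(sample_id: str, label: str, key: bytes = SECRET_KEY) -> dict:
    """Model side: serialise the prediction and attach an HMAC-SHA256 tag."""
    payload = json.dumps({"sample_id": sample_id, "label": label}, sort_keys=True)
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_prediction(message: dict, key: bytes = SECRET_KEY) -> dict:
    """Consumer side: recompute the tag and reject the message on mismatch."""
    expected = hmac.new(key, message["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, message["tag"]):
        raise ValueError("prediction failed integrity check")
    return json.loads(message["payload"])
```

An attacker who flips "malicious" to "benign" in transit cannot produce a valid tag without the key, so the consumer rejects the altered result.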

Training-time attacks

Four risks target the training phase, where the model’s parameters and decision boundaries are shaped.

ML02 (data poisoning) involves injecting malicious or mislabelled samples into the training dataset. The goal is to shift the model’s learned boundaries so that it misclassifies specific inputs or degrades in overall accuracy. Systems that collect training data at scale from public or semi-public sources are most exposed, because the ingestion pipeline often lacks sample-level provenance verification. Gu, Dolan-Gavitt, and Garg demonstrated the severity of this with their 2017 BadNets research, in which a poisoned training pipeline installed a backdoor in a neural network that passed standard validation on clean test data but misclassified any input containing the attacker’s chosen trigger pattern.
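The backdoor mechanism can be illustrated with a toy poisoning routine in the spirit of BadNets. This is not the authors' code: the patch shape, its position, the poisoning rate, and the function names are all assumptions made for the example, and images are plain nested lists of pixel values.

```python
import copy
import random

def stamp_trigger(image, size=2, value=1.0):
    """Return a copy of the image with a bright size x size patch in the bottom-right corner."""
    poisoned = copy.deepcopy(image)
    for row in poisoned[-size:]:
        row[-size:] = [value] * size
    return poisoned

def poison_dataset(samples, target_label, rate, seed=0):
    """Stamp the trigger on a fraction of (image, label) pairs and relabel them."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    result = []
    for index, (image, label) in enumerate(samples):
        if index in chosen:
            result.append((stamp_trigger(image), target_label))
        else:
            result.append((image, label))
    return result

# Hypothetical dataset: twenty blank 4x4 images, all labelled 0.
clean = [([[0.0] * 4 for _ in range(4)], 0) for _ in range(20)]
poisoned = poison_dataset(clean, target_label=7, rate=0.1)
```

A model trained on `poisoned` sees mostly clean data, which is why accuracy on a clean test set can stay high while the trigger patch still steers predictions toward the target label.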

ML08 (model skewing) is a targeted variant of data poisoning. Rather than degrading overall performance, the attacker biases the model’s output in a specific direction. The mechanism is the same (injecting manipulated training samples), but the intent is precision: creating a blind spot for a particular payload or class of input rather than broadly undermining the model.

ML10 (model poisoning) bypasses the training data and manipulates the model’s weights directly. This requires access to the model parameters, which limits the attack surface to scenarios involving insider access, compromised storage, or supply chain tampering. Arbitrary weight changes produce obvious performance degradation, but carefully crafted parameter modifications can introduce targeted misbehaviour while preserving normal accuracy on standard benchmarks.

ML07 (transfer learning attacks) exploits the widespread practice of fine-tuning pre-trained base models rather than training from scratch. If the base model has been tampered with (backdoored weights, embedded biases), that manipulation can persist through fine-tuning on clean task-specific data. The risk is structural: training a foundation model can cost millions in compute, so most organisations rely on open-source pre-trained weights from public repositories. Verification of those weights before fine-tuning is rarely thorough enough to catch well-designed backdoors, particularly when the malicious behaviour activates only on specific trigger inputs.

Confidentiality attacks

Three risks target the secrecy of the model itself or its training data.

ML05 (model theft) is the process of duplicating a model’s functionality through systematic querying. The attacker sends inputs to a deployed model, collects the corresponding outputs, and uses those input-output pairs to train a replica. No access to the model’s architecture or parameters is required, only API access. The primary defences are rate limiting, query pattern monitoring, and limiting the information density of model responses (returning labels rather than probability distributions).
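Rate limiting, the first of those defences, can be sketched as a per-client sliding window. The class name, parameters, and thresholds below are illustrative assumptions. The caller passes the current time explicitly so the logic is easy to test; a service would typically pass `time.monotonic()`.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_queries per client within any window_seconds interval."""

    def __init__(self, max_queries: int, window_seconds: float):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> timestamps of accepted queries

    def allow(self, client_id: str, now: float) -> bool:
        window = self._history[client_id]
        # Discard timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_queries:
            return False
        window.append(now)
        return True
```

A fixed cap slows extraction rather than preventing it, which is why it is usually paired with query pattern monitoring.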

ML03 (model inversion) reverses the model’s function. The attacker trains a separate model on the target model’s outputs to reconstruct information about its training inputs. This is a privacy risk when training data contains sensitive records, such as medical imaging or financial information. The attack becomes significantly harder when the target model returns less information per query, making output minimisation (bare labels rather than full class probabilities) a practical countermeasure.
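Output minimisation can be as simple as a wrapper between the model and the API response. The sketch below assumes the model exposes class probabilities as a dictionary; the function name and response shape are hypothetical.

```python
def minimise_output(class_probabilities: dict) -> dict:
    """Return only the top label, discarding the probability distribution."""
    label = max(class_probabilities, key=class_probabilities.get)
    return {"label": label}

# Hypothetical model output for one query.
raw = {"benign": 0.07, "malicious": 0.91, "unknown": 0.02}
print(minimise_output(raw))
```

The caller still gets a usable decision, but loses the per-class scores that inversion and extraction attacks rely on.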

ML04 (membership inference) determines whether a specific data sample was part of the model’s training set. Models tend to exhibit higher confidence and lower error rates on samples they were trained on, and that behavioural gap is the signal the attacker exploits. Where training data includes medical, financial, or otherwise regulated records, confirming membership can itself constitute a privacy breach under frameworks such as GDPR.
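The confidence gap can be turned into a very simple attack: guess "member" whenever the model's confidence exceeds a threshold. The sketch below is a minimal version of that idea. The confidence values are invented for illustration and are not measurements from any model.

```python
def infer_membership(confidence: float, threshold: float) -> bool:
    """Guess that a sample was in the training set if confidence meets the threshold."""
    return confidence >= threshold

def attack_accuracy(member_confidences, non_member_confidences, threshold):
    """Fraction of samples whose membership the threshold rule guesses correctly."""
    correct = sum(infer_membership(c, threshold) for c in member_confidences)
    correct += sum(not infer_membership(c, threshold) for c in non_member_confidences)
    return correct / (len(member_confidences) + len(non_member_confidences))

# Hypothetical confidences: training members tend to score higher.
members = [0.99, 0.97, 0.95, 0.80]
non_members = [0.92, 0.70, 0.65, 0.55]
print(attack_accuracy(members, non_members, threshold=0.9))
```

Anything meaningfully above 0.5 on a balanced set indicates leakage. Defences such as differential privacy aim to shrink the gap that makes this rule work.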

Supply chain attacks

ML06 (AI supply chain attacks) covers vulnerabilities across the ML ecosystem: third-party libraries, pre-trained model weights, training data sources, ML frameworks, and deployment infrastructure. The ML supply chain extends beyond traditional software dependencies to include artefacts that most software composition analysis tools do not track, specifically datasets, model checkpoints, and training configurations. A trojaned model hosted on a public repository, a compromised dataset, or a vulnerability in a training framework like PyTorch or TensorFlow all fall under this category. Standard supply chain controls (dependency pinning, integrity hashing, provenance verification) apply, but they need to be extended to cover these ML-specific artefacts.
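Integrity hashing extends naturally to datasets and checkpoints. The sketch below checks files against a manifest of expected SHA-256 digests. The manifest format and function names are assumptions for the example, and the manifest itself must come from a trusted source for the check to mean anything.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifacts(manifest: dict, base_dir: Path) -> list:
    """Return the names of artefacts that are missing or whose hash differs."""
    failures = []
    for name, expected in manifest.items():
        path = base_dir / name
        if not path.is_file() or sha256_of_file(path) != expected:
            failures.append(name)
    return failures
```

A pipeline would call `verify_artifacts` before loading any weights or data and refuse to continue if the returned list is not empty. A hash confirms that an artefact is the one that was published; it does not show that the published artefact was free of backdoors.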

Defensive mapping

The four clusters map directly to defensive priorities:

Inference-time (ML01, ML09): input validation, adversarial robustness testing (IBM’s Adversarial Robustness Toolbox is purpose-built for this), and integrity verification between the model and the consuming system.
Training-time (ML02, ML07, ML08, ML10): data provenance tracking, statistical monitoring of training distributions, and verification of pre-trained weights before fine-tuning.
Confidentiality (ML03, ML04, ML05): output minimisation (labels over probability distributions), query rate limiting and pattern detection, and differential privacy during training.
Supply chain (ML06): integrity checks covering datasets, model checkpoints, and frameworks alongside standard code dependencies.

The security controls that most organisations have in place were designed for deterministic software, not for systems whose behaviour was learned from data. The OWASP ML Security Top 10 names that gap. Closing it requires extending existing security programmes to cover the full ML lifecycle, from data ingestion through model deployment, with tooling and processes that most security teams are still building.
