ML system security

ML system security covers the infrastructure layer beneath the model and its application, including hardware, operating systems, and system configuration. Traditional infrastructure hardening principles apply here, but insecure model deployments and the computational cost of inference introduce risks that do not exist in conventional systems. This article covers misconfigurations, insecure deployments, resource exhaustion attacks, and the tactics adversaries use to exploit them.

What the system component includes

The system component of an ML-based system is everything that sits below the application layer. This includes the physical or virtual hardware, the operating system, network configuration, firewall rules, and the deployment infrastructure that serves the model. In cloud-hosted deployments, it also includes container orchestration platforms, storage backends for training data and model artefacts, and the CI/CD pipelines that push models from development into production.

Because this layer is structurally identical to traditional IT infrastructure, it carries the same categories of risk. The OWASP Top 10:2025 ranks security misconfiguration at number two, up from number five in 2021, and notes that 100% of tested applications showed some form of misconfiguration. ML infrastructure is no exception. If anything, it is more exposed, because ML serving frameworks, notebook environments, and model registries are often deployed by data science teams who may not apply the same hardening standards as infrastructure engineers.

Misconfiguration risks

Misconfigurations occur when security settings or system parameters are left in their default state, improperly set, or inadvertently exposed. In ML infrastructure, common examples include open network ports on model serving endpoints, weak access control lists on training data storage, exposed administrative interfaces for platforms like MLflow or Jupyter, and default credentials on dashboards and APIs.

These are often trivial to find. Automated scanning tools can identify exposed services, default credentials, and missing security headers across large IP ranges in minutes. The simplicity of discovery is what makes misconfigurations dangerous. An attacker does not need a sophisticated exploit when the administrative console is reachable from the internet and still uses the factory password.
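The checks an automated scanner performs are simple enough to run against your own inventory first. The following is a minimal sketch, assuming a hypothetical service descriptor format; the field names (bind_address, auth_enabled, tls_enabled) and the short list of default credentials are illustrative and do not come from any real tool.

```python
# Hypothetical default username/password pairs; a real audit would use a fuller list.
DEFAULT_CREDENTIALS = {("admin", "admin"), ("admin", "password"), ("root", "root")}


def audit_service(service: dict) -> list[str]:
    """Return human-readable findings for one (hypothetical) service descriptor."""
    findings = []
    # Binding to all interfaces makes the service reachable from every network
    # the host is attached to.
    if service.get("bind_address") in ("0.0.0.0", "::"):
        findings.append("listens on all interfaces")
    if not service.get("auth_enabled", False):
        findings.append("authentication disabled")
    if (service.get("username"), service.get("password")) in DEFAULT_CREDENTIALS:
        findings.append("default credentials in use")
    if not service.get("tls_enabled", False):
        findings.append("no TLS")
    return findings


if __name__ == "__main__":
    inventory = [
        {"name": "tracking-ui", "bind_address": "0.0.0.0",
         "auth_enabled": False, "tls_enabled": False},
        {"name": "inference-api", "bind_address": "10.0.0.5",
         "auth_enabled": True, "tls_enabled": True},
    ]
    for svc in inventory:
        print(svc["name"], audit_service(svc) or "no findings")
```

Running a check like this in the deployment pipeline catches the cheapest findings before an external scanner does.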

IBM’s research on MLOps platform abuse documents how attackers obtain credentials through file shares, intranet sites, user workstations, and social engineering, then use those credentials to access MLOps platforms that manage the entire model lifecycle. Once inside, an attacker can modify training data, tamper with model artefacts, or pivot to enterprise data lakes connected to the platform.

Insecure model deployments

Insecure deployments introduce a separate category of risk beyond general infrastructure misconfiguration. When ML models are deployed without authentication on their inference endpoints, without encryption in transit, or without input validation on the data they accept, they become directly exploitable through the attack vectors covered in previous articles in this series.

A model serving endpoint exposed over HTTP without authentication allows anyone with network access to submit inference requests, extract model predictions, and potentially reverse-engineer the model’s behaviour through systematic querying. Google Cloud’s AI security guidance recommends using private endpoints based on Private Service Connect with prebuilt or custom containers, and running automated checks for insecure endpoints and misconfigured serving infrastructure before any model reaches production.
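As a rough illustration of the minimum an inference endpoint should do before running the model, the sketch below checks a bearer token from the request headers. It assumes the headers arrive as a plain dict keyed by "Authorization" and that the expected key is loaded from a secret store elsewhere; both are assumptions for this example, and the function name is hypothetical.

```python
import hmac


def is_authorized(headers: dict, expected_key: str) -> bool:
    """Check a bearer token from request headers against the expected key."""
    value = headers.get("Authorization", "")
    scheme, _, presented = value.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # compare_digest takes time independent of where the first mismatch occurs,
    # so response timing does not reveal how much of the key was guessed.
    return hmac.compare_digest(presented.encode(), expected_key.encode())
```

A static shared key is the weakest acceptable option; it is shown here only because it is the smallest example of refusing unauthenticated inference requests.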

Missing input validation on the serving endpoint is equally problematic. If the endpoint accepts arbitrary payloads without size limits, type checking, or rate limiting, it is vulnerable to both adversarial ML attacks (crafted inputs designed to manipulate model behaviour) and conventional infrastructure attacks like resource exhaustion.
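The size limits and type checks mentioned above might look like the following sketch. The request shape (a JSON object with "prompt" and "max_tokens") and all three limits are assumptions chosen for illustration, not values from any particular serving framework.

```python
import json

MAX_BODY_BYTES = 64 * 1024   # hypothetical limit on the raw request body
MAX_PROMPT_CHARS = 8_000     # hypothetical limit on prompt length
MAX_OUTPUT_TOKENS = 512      # hypothetical ceiling on requested output length


def validate_inference_request(raw_body: bytes) -> dict:
    """Parse and validate a JSON inference request; raise ValueError to reject."""
    # Check size before parsing so oversized payloads are never deserialised.
    if len(raw_body) > MAX_BODY_BYTES:
        raise ValueError("request body too large")
    try:
        payload = json.loads(raw_body)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("'prompt' must be a non-empty string")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("'prompt' exceeds length limit")

    max_tokens = payload.get("max_tokens", MAX_OUTPUT_TOKENS)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
        raise ValueError("'max_tokens' must be an integer")
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("'max_tokens' out of range")

    # Return only the validated fields; unknown keys are dropped.
    return {"prompt": prompt, "max_tokens": max_tokens}
```

Validation of this kind addresses the infrastructure side of the problem only. It does not detect adversarial inputs that are well-formed and within limits.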

Resource exhaustion and denial of service

Denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks against ML systems follow the same general principle as traditional DoS, which is to overwhelm system resources until the service becomes unavailable. The difference is that ML inference is computationally expensive by nature, which means an attacker needs fewer requests to achieve the same effect.

In traditional web applications, adversaries flood the system with a high volume of HTTP requests to exhaust CPU, memory, and network bandwidth. In ML-based systems, adversaries can achieve the same result more efficiently by submitting inference requests whose inputs are designed to maximise processing time. For large language models, this means crafting prompts that push the model toward maximum-length output generation, or submitting inputs near the context window limit so that every request incurs close to the worst-case compute cost. The OWASP Top 10 for LLM Applications lists model denial of service as a distinct risk category for this reason (LLM04 in version 1.1; the 2025 edition broadens it into LLM10, Unbounded Consumption).
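Because a single expensive request can cost as much as many cheap ones, counting requests per client is a poor defence on its own. One possible approach, sketched below, is a token bucket that charges each request according to its estimated compute cost. The cost model (prompt tokens plus requested output tokens) and the class name are assumptions made for illustration.

```python
import time


class CostBucket:
    """Token bucket where each request spends budget proportional to its cost."""

    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock            # injectable so the bucket can be tested
        self._level = capacity
        self._last = clock()

    def allow(self, prompt_tokens: int, max_output_tokens: int) -> bool:
        now = self._clock()
        # Refill according to elapsed time, never above capacity.
        self._level = min(self.capacity,
                          self._level + (now - self._last) * self.refill_per_sec)
        self._last = now
        # Assumed cost model: input length plus the requested output ceiling.
        cost = prompt_tokens + max_output_tokens
        if cost > self._level:
            return False
        self._level -= cost
        return True
```

In use, one bucket would be kept per client or API key, so that a caller who repeatedly requests maximum-length generations exhausts their own budget rather than the service's capacity.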

A concrete example is CVE-2025-48956, a DoS vulnerability in vLLM, one of the most widely used inference and serving engines for large language models. The vulnerability allowed an unauthenticated attacker to crash a vLLM server by sending a single HTTP GET request with an extremely large header value. The server’s HTTP parser attempted to load the entire header into memory, causing memory exhaustion and a crash. The vulnerability affected vLLM versions from 0.1.0 up to, but not including, 0.10.1.1, the release that contains the fix, and received a CVSS score of 7.5 (High). The fix was to enforce limits on HTTP header sizes. The fact that this limit was missing in the first place is a textbook example of an insecure default.
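The kind of limit that was missing can be illustrated with a short check. This is a generic sketch, not vLLM's actual patch; the two limits and the function name are assumptions chosen for the example.

```python
MAX_HEADER_FIELD_BYTES = 8 * 1024    # hypothetical per-field limit
MAX_HEADER_TOTAL_BYTES = 32 * 1024   # hypothetical limit for all fields combined


def headers_within_limits(headers: list[tuple[bytes, bytes]]) -> bool:
    """Return False as soon as one field, or the running total, exceeds its limit."""
    total = 0
    for name, value in headers:
        size = len(name) + len(value)
        if size > MAX_HEADER_FIELD_BYTES:
            return False
        total += size
        if total > MAX_HEADER_TOTAL_BYTES:
            return False
    return True
```

In a real server the limit has to be enforced while bytes are being read from the socket, so that an oversized header is rejected before it is buffered. A check applied after parsing, as here, only shows the logic.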

In systems with automated scaling, resource exhaustion attacks have a secondary effect. The infrastructure attempts to scale up to handle the surge in demand, which increases operational costs even if the service itself stays available. This turns a denial-of-service attack into a denial-of-wallet attack.
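A back-of-the-envelope calculation shows how quickly the bill can grow. The sketch below assumes a simple autoscaler that adds replicas in proportion to request rate; the throughput and price figures in the usage note are invented for illustration.

```python
import math
from typing import Optional


def hourly_cost(requests_per_sec: float, per_replica_rps: float,
                replica_hourly_usd: float,
                max_replicas: Optional[int] = None) -> float:
    """Estimate hourly spend when an autoscaler adds replicas to absorb load."""
    needed = max(1, math.ceil(requests_per_sec / per_replica_rps))
    # A replica cap bounds spend, at the price of dropping or queueing excess load.
    replicas = needed if max_replicas is None else min(needed, max_replicas)
    return replicas * replica_hourly_usd
```

With hypothetical figures of 5 requests per second per replica and 3 USD per replica-hour, normal traffic of 20 requests per second costs hourly_cost(20, 5, 3.0) = 12 USD per hour. An attack at 400 requests per second raises that to 240 USD per hour, while max_replicas=10 holds it at 30 USD per hour. The cap converts the attack back into a partial availability problem, which is a trade-off each operator has to make deliberately.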

Resource exhaustion can also serve as a diversionary tactic. While security teams focus on mitigating the visible effects of the DoS, adversaries can exploit vulnerabilities in another component and evade detection during the disruption.

Tactics, techniques, and procedures

Vulnerability scanning and exploitation

Adversaries use vulnerability scanners to identify outdated software components, unpatched libraries, and known CVEs in the ML serving stack. ML infrastructure is particularly susceptible to this because the frameworks evolve rapidly and organisations often run older versions of tools like TensorFlow Serving, Triton Inference Server, or vLLM that contain known vulnerabilities.

Once a vulnerable component is identified, the attacker matches it against public vulnerability and exploit databases (NVD, Exploit-DB, GitHub advisories) and attempts exploitation. The speed of this process means that a publicly disclosed vulnerability in an ML serving framework can be under active exploitation within days of disclosure if patches are not applied.
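The matching step is mechanical, which is why defenders can run it before attackers do. The sketch below compares an installed version against an advisory's affected range. The range values used in the usage note are hypothetical, and the parser assumes purely numeric dotted versions.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as '0.10.1.1'."""
    return tuple(int(part) for part in text.split("."))


def is_affected(installed: str, introduced: str, fixed: str) -> bool:
    """True if introduced <= installed < fixed, compared numerically."""
    return parse_version(introduced) <= parse_version(installed) < parse_version(fixed)
```

For a hypothetical advisory with introduced="0.1.0" and fixed="0.10.2", is_affected("0.9.2", "0.1.0", "0.10.2") is true and is_affected("0.10.2", "0.1.0", "0.10.2") is false. Comparing integer tuples matters here: as strings, "0.9.2" sorts after "0.10.2", which would give the wrong answer. Versions with pre-release or build suffixes need a fuller parser than this one.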

Credential attacks

Password spraying and brute-force attacks target authentication mechanisms on exposed services. Password spraying is particularly effective against ML infrastructure because many serving endpoints, notebook environments, and model management dashboards are reachable through SSH, web-based login pages, or API keys that are protected only by weak or default credentials.

In a password spraying attack, the adversary tries a small number of commonly used passwords against a large number of accounts, staying below lockout thresholds. This is harder to detect than a brute-force attack against a single account because the failed login attempts are spread across many usernames. If the system exposes administrative interfaces to the public internet without rate limiting or multi-factor authentication, password spraying becomes a low-effort, high-reward tactic.
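Because the signal is spread across accounts, detection has to group failures by source rather than by username. The following is a minimal sketch, assuming failed-login events are available as (timestamp, source IP, username) tuples; the window and threshold defaults are illustrative.

```python
from collections import defaultdict


def find_spray_sources(failed_logins, window_seconds=600, min_distinct_users=10):
    """Return source IPs whose failures hit many distinct usernames in one window.

    failed_logins: iterable of (timestamp_seconds, source_ip, username) tuples.
    """
    by_source = defaultdict(list)
    for ts, ip, user in failed_logins:
        by_source[ip].append((ts, user))

    flagged = set()
    for ip, events in by_source.items():
        events.sort()
        start = 0
        counts = defaultdict(int)  # username -> failures inside the current window
        for ts, user in events:
            counts[user] += 1
            # Advance the window start until it spans at most window_seconds.
            while ts - events[start][0] > window_seconds:
                old_user = events[start][1]
                counts[old_user] -= 1
                if counts[old_user] == 0:
                    del counts[old_user]
                start += 1
            if len(counts) >= min_distinct_users:
                flagged.add(ip)
                break
    return flagged
```

A brute-force attack against one account never trips this check, because it only ever touches one username. A spray distributed across many source addresses also evades it, so grouping by other attributes (password hash attempted, user agent, time pattern) is a natural extension.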

Brute-force attacks against encryption keys or API tokens follow a similar pattern. If the key space is small or the token is predictable, an attacker can enumerate valid values and gain access to model endpoints, training data, or system configuration.
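The size of the key space can be made concrete with a short calculation. The sketch below computes the entropy of a uniformly random token and generates one from the operating system's cryptographic random source using Python's standard secrets module; the function names are illustrative.

```python
import math
import secrets


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random token with the given shape."""
    return length * math.log2(alphabet_size)


def new_api_token() -> str:
    # 32 bytes (256 bits) from the OS CSPRNG, encoded as URL-safe text.
    return secrets.token_urlsafe(32)
```

A six-digit numeric token has 10^6 possible values, or about 20 bits, which an attacker can enumerate quickly against an endpoint without rate limiting. A 256-bit random token cannot be enumerated in practice. Predictability matters as much as length: a token derived from a timestamp or a counter has far less effective entropy than its length suggests.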

Configuration probing

Adversaries probe server software, firewalls, and access control configurations using the same techniques as legitimate security testing. This includes sending requests designed to trigger verbose error messages that reveal software versions, directory structures, or internal network topology. It also includes testing firewall rules by probing port ranges and checking whether access control lists permit unintended traffic.
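One way to deny probes the verbose error messages described above is to keep the detail in server-side logs and return only an opaque reference to the client. The sketch below uses Python's standard logging and uuid modules; the logger name and response shape are assumptions for illustration.

```python
import logging
import uuid

logger = logging.getLogger("inference-api")  # hypothetical logger name


def to_client_error(exc: Exception) -> dict:
    """Log full detail server-side and return an opaque body for the client."""
    incident_id = uuid.uuid4().hex
    # exc_info attaches the exception and traceback to the log record only;
    # nothing from the exception is copied into the response.
    logger.error("request failed, incident_id=%s", incident_id, exc_info=exc)
    return {"error": "internal error", "incident_id": incident_id}
```

The incident identifier lets an operator correlate a user report with the logged traceback, while the response reveals no versions, file paths, or hostnames.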

In ML infrastructure, configuration probing often targets the gap between what the data science team deployed and what the security team expected to be deployed. Model serving endpoints, experiment tracking dashboards, and Jupyter notebook servers are frequently stood up as internal tools and later exposed to the internet without anyone updating the security posture.
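That gap can be measured directly by comparing what was approved for exposure with what an external scan actually finds. The sketch below assumes both inventories are available as sets of (host, port) pairs; how they are collected is outside the scope of this example, and the host names in the usage note are made up.

```python
def exposure_drift(approved: set[tuple[str, int]],
                   observed: set[tuple[str, int]]):
    """Compare approved exposure against what a scan actually observed."""
    unexpected = observed - approved   # reachable but never approved
    missing = approved - observed      # approved but not currently reachable
    return sorted(unexpected), sorted(missing)
```

For example, if only ("api.example.internal", 443) is approved and a scan also finds ("nb.example.internal", 8888), the second entry appears in the unexpected list. Anything in that list is a service the security team did not expect, which is exactly what an adversary's probing is looking for.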

Summary

The system component of an ML-based system carries every traditional infrastructure risk, from misconfigurations and default credentials through to unpatched software and exposed administrative interfaces. Insecure model deployments add ML-specific risks, particularly around unauthenticated inference endpoints and missing input validation. Resource exhaustion attacks are amplified by the computational cost of ML inference, turning a modest volume of malicious requests into a service outage or a cost escalation. The TTPs adversaries use at this layer (vulnerability scanning, credential attacks, and configuration probing) are the same ones used against any other infrastructure, which means the same baseline defences apply: patching, strong authentication, and a minimal exposed surface.
