Google’s Secure AI Framework (SAIF)

The OWASP ML Top 10 and LLM Top 10 are vulnerability checklists. They name specific risks and rank them. Google’s Secure AI Framework (SAIF) does something different. It maps the entire AI application lifecycle, from data ingestion through model deployment to agent orchestration, and then overlays risks, responsible parties, and controls onto that map. Where OWASP gives you a ranked list of what can go wrong, SAIF gives you a structural model of where things go wrong, who is responsible for fixing them, and what controls exist at each point.

SAIF was first published in June 2023. In September 2025, Google donated the SAIF data to the Coalition for Secure AI (CoSAI), an OASIS Open project with 35 industry partners, making the Risk Map available as an open industry resource. In early 2026, the framework was substantially expanded to version 2.0 to cover agentic AI systems.

SAIF vs OWASP

The two frameworks serve different purposes and work best together.

Aspect	OWASP ML/LLM Top 10	Google SAIF
Format	Ranked vulnerability list	Lifecycle risk map with components, risks, and controls
Scope	Specific vulnerability classes	Entire AI development pipeline
Responsibility model	No explicit ownership assignment	Maps each control to model creator, model consumer, or both
Risk coverage	10 risks per list (ML and LLM separately)	15 risks covering both ML and LLM in a single framework
Control mapping	General mitigations described per risk	Named controls mapped to specific risks with explicit ownership
Governance	Not covered	Assurance and governance controls apply to all risks

OWASP is a technical checklist for identifying what can be attacked. SAIF is an organisational framework for building secure AI applications across the full pipeline. Many of SAIF’s 15 risks map directly to OWASP entries, but SAIF adds risks that OWASP covers only partially or not at all, including Unauthorised Training Data, Excessive Data Handling, Model Source Tampering, Model Deployment Tampering, and Inferred Sensitive Data.

The four areas

SAIF divides an AI application into four areas, each containing multiple components. Every risk in the framework maps to one or more of these components, showing where the risk is introduced, where it is exposed, and where it can be mitigated.

Data

Covers everything related to the data the model learns from.

Data Sources are the original repositories from which data is gathered (databases, APIs, web scrapes, sensor feeds)
Data Filtering and Processing covers cleaning, transforming, labelling, deduplication, and synthetic data generation
Training Data is the final curated dataset fed into the model during training

In traditional software, code defines behaviour. In AI, data defines behaviour. Compromising training data is the AI equivalent of modifying application source code.

Infrastructure

Covers the hardware, storage, frameworks, and deployment systems underpinning the AI pipeline.

Model Frameworks and Code is the code and libraries required to train and run the model (PyTorch, TensorFlow, JAX, etc.)
Training, Tuning, and Evaluation covers the process of teaching, adjusting, and testing the model
Data and Model Storage includes both training data storage and model storage (local checkpoints, published model hubs)
Model Serving covers the systems and processes that deploy a model in production

# Where infrastructure risks sit in a typical pipeline

Data Sources --> Data Filtering --> Training Data
                                         |
                                         v
Model Frameworks/Code --> Training, Tuning, Evaluation --> Model Storage
                                                              |
                                                              v
                                                        Model Serving --> Application

Model

The central area. Covers the model itself and how inputs and outputs are handled.

The Model is the pairing of code and weights produced by training
Input Handling covers filtering, sanitising, and protecting against malicious inputs
Output Handling covers filtering, sanitising, and protecting against unwanted or dangerous outputs
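The relationship between the three Model components can be sketched as a wrapper around a model call. This is a minimal illustration, not SAIF's prescribed implementation: the deny-list pattern, the email redaction rule, and the `fake_model` stand-in are all hypothetical choices made for the example, and real input and output handling would need far more than pattern matching.

```python
import re
from typing import Callable

# Hypothetical patterns, chosen only for illustration.
BLOCKED_INPUT_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def handle_input(prompt: str) -> str:
    """Input Handling: reject prompts that match a known-bad pattern."""
    for pattern in BLOCKED_INPUT_PATTERNS:
        if pattern.search(prompt):
            raise ValueError("prompt rejected by input handling")
    return prompt.strip()


def handle_output(text: str) -> str:
    """Output Handling: redact email addresses before the text is returned."""
    return EMAIL_PATTERN.sub("[redacted email]", text)


def guarded_call(model: Callable[[str], str], prompt: str) -> str:
    """Apply input handling before the model call and output handling after it."""
    return handle_output(model(handle_input(prompt)))


def fake_model(prompt: str) -> str:
    # Stand-in for a real model: echoes the prompt and leaks an address.
    return f"You asked: {prompt}. Contact admin@example.com for help."
```

Calling `guarded_call(fake_model, "hello")` returns the model's text with the address redacted, while a prompt containing the blocked phrase raises `ValueError` before the model is ever invoked.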

Application

Covers how users and external systems interact with the model.

Application is the product or feature that uses the model (a chatbot, a code assistant, an internal tool)
Agents are services or additional models called by the AI application to complete specific tasks (tool use, plugin calls, external API interactions)

Each agent or plugin connection opens a transitive set of risks: the application inherits the risks of every service the agent can reach, so exposure grows with each external integration.
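For threat-modelling notes or tooling, the four areas and their components can be written down as a simple lookup. The sketch below only restates the component names listed above; the `SAIF_AREAS` dictionary and the `area_of` helper are illustrative names, not part of SAIF itself.

```python
# Areas and components as listed in this section.
SAIF_AREAS = {
    "Data": [
        "Data Sources",
        "Data Filtering and Processing",
        "Training Data",
    ],
    "Infrastructure": [
        "Model Frameworks and Code",
        "Training, Tuning, and Evaluation",
        "Data and Model Storage",
        "Model Serving",
    ],
    "Model": ["The Model", "Input Handling", "Output Handling"],
    "Application": ["Application", "Agents"],
}


def area_of(component: str) -> str:
    """Return the SAIF area that contains the given component."""
    for area, components in SAIF_AREAS.items():
        if component in components:
            return area
    raise KeyError(f"unknown component: {component}")
```

For example, `area_of("Model Serving")` returns `"Infrastructure"`, which is the area whose controls you would look at first for a serving-side finding.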

The 15 risks

SAIF defines 15 risks across the four areas. Each risk is mapped to who can mitigate it (model creator, model consumer, or both) and which controls address it.

Risk	OWASP equivalent	Who mitigates
Data Poisoning	ML02, LLM04	Model creators
Unauthorised Training Data	No direct OWASP equivalent	Model creators
Model Source Tampering	ML03 (partially)	Model creators
Excessive Data Handling	No direct OWASP equivalent	Model creators, consumers
Model Exfiltration	ML06	Model creators, consumers
Model Deployment Tampering	No direct OWASP equivalent	Model creators
Denial of ML Service	ML05, LLM10	Model consumers
Model Reverse Engineering	ML07	Model consumers
Insecure Integrated Component	LLM03 (supply chain)	Model consumers
Prompt Injection	LLM01	Model creators, consumers
Model Evasion	ML01	Model creators, consumers
Sensitive Data Disclosure	LLM02	Model creators, consumers
Inferred Sensitive Data	No direct OWASP equivalent	Model creators, consumers
Insecure Model Output	LLM05	Model creators, consumers
Rogue Actions	LLM06 (excessive agency)	Model consumers

Several risks have no direct OWASP equivalent. These are worth understanding because they cover gaps that OWASP’s lists do not address.

Unauthorised Training Data is when a model is trained on data it is not authorised to use. This is a legal and ethical risk rather than a technical attack. In 2023, streaming platforms removed ‘Heart on My Sleeve’, an AI-generated track that cloned the voices of Drake and The Weeknd without authorisation, following a copyright complaint from Universal Music Group. The risk is about compliance with privacy policies, licensing agreements, and data protection regulations.

Excessive Data Handling occurs when data collection or retention exceeds what is permitted by privacy policies. This is distinct from data poisoning or disclosure because the data itself may be legitimate, but the way it is collected, stored, or retained violates policy or regulation.

Model Source Tampering targets the model’s code, dependencies, or weights directly, either through supply chain attacks or insider access. This includes model architecture backdoors: backdoors embedded in the neural network architecture definition that can survive full retraining.

Inferred Sensitive Data is distinct from Sensitive Data Disclosure. In disclosure, the model reveals data it was trained on. In inference, the model provides sensitive information it never had direct access to by reasoning from patterns in training data or prompts. The model works something out that it was never explicitly told.

Model Deployment Tampering targets the serving infrastructure rather than the model itself, compromising components used to deploy models in production.
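The risk table in this section can also be held as data, which makes it easy to filter by responsible party or to pull out the risks with no direct OWASP equivalent. The sketch below transcribes the table as given; the `Risk` class, the role constants, and the helper functions are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

CREATOR, CONSUMER = "model creator", "model consumer"
BOTH = frozenset({CREATOR, CONSUMER})


@dataclass(frozen=True)
class Risk:
    name: str
    owasp: tuple             # OWASP IDs from the table; empty if none
    mitigated_by: frozenset  # roles that can mitigate the risk


RISKS = [
    Risk("Data Poisoning", ("ML02", "LLM04"), frozenset({CREATOR})),
    Risk("Unauthorised Training Data", (), frozenset({CREATOR})),
    Risk("Model Source Tampering", ("ML03",), frozenset({CREATOR})),  # partial match
    Risk("Excessive Data Handling", (), BOTH),
    Risk("Model Exfiltration", ("ML06",), BOTH),
    Risk("Model Deployment Tampering", (), frozenset({CREATOR})),
    Risk("Denial of ML Service", ("ML05", "LLM10"), frozenset({CONSUMER})),
    Risk("Model Reverse Engineering", ("ML07",), frozenset({CONSUMER})),
    Risk("Insecure Integrated Component", ("LLM03",), frozenset({CONSUMER})),
    Risk("Prompt Injection", ("LLM01",), BOTH),
    Risk("Model Evasion", ("ML01",), BOTH),
    Risk("Sensitive Data Disclosure", ("LLM02",), BOTH),
    Risk("Inferred Sensitive Data", (), BOTH),
    Risk("Insecure Model Output", ("LLM05",), BOTH),
    Risk("Rogue Actions", ("LLM06",), frozenset({CONSUMER})),
]


def without_owasp_equivalent() -> list:
    """Names of risks the table marks as having no direct OWASP equivalent."""
    return [r.name for r in RISKS if not r.owasp]


def risks_for(role: str) -> list:
    """Names of risks that the given role is expected to help mitigate."""
    return [r.name for r in RISKS if role in r.mitigated_by]
```

A model consumer scoping an assessment could start from `risks_for(CONSUMER)`, then check `without_owasp_equivalent()` to see which of those an OWASP-only review would miss.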

Controls

SAIF organises controls into six categories: one for each of the four areas, plus two cross-cutting categories (Assurance and Governance).

Category	Scope	Example controls
Data	Training pipeline	Training Data Sanitisation, Training Data Management, Privacy-Preserving Technologies (PETs), User Data Management
Infrastructure	Storage, serving, tooling	Model and Data Access Controls, Integrity Management, Inventory Management, Secure-by-Default ML Tooling
Model	Input/output handling	Input Validation and Sanitisation, Output Validation and Sanitisation, Adversarial Training and Testing
Application	User and agent interaction	Application Access Management, User Transparency and Controls, Agent User Control, Agent Permissions
Assurance	Cross-cutting	Applied to all risks, all stages of the lifecycle
Governance	Cross-cutting	Applied to all risks, all stages of the lifecycle

Each control specifies who is responsible for implementation.

# Example: Output Validation and Sanitisation

Control:        Output Validation and Sanitisation
Action:         Block, nullify, or sanitise insecure output before passing to applications or users
Implemented by: Model Creators, Model Consumers
Risk mapping:   Prompt Injection, Rogue Actions, Sensitive Data Disclosure, Inferred Sensitive Data

# Example: Agent Permissions

Control:        Agent Permissions
Action:         Apply least-privilege as the upper bound on agent permissions,
                minimise tools the agent can interact with and actions it can take
Implemented by: Model Consumers
Risk mapping:   Insecure Integrated Component, Sensitive Data Disclosure, Rogue Actions

The distinction between model creator and model consumer is practical. If HackTheBox uses Google’s Gemini for a chatbot, Google is the model creator (responsible for training data sanitisation, adversarial training) and HackTheBox is the model consumer (responsible for application access management, agent permissions, output validation in their application layer).
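The ownership split can be made concrete by answering a practical question: for a given risk, which controls fall to my role? The sketch below covers only the two example controls shown above, with their ownership and risk mappings copied from those examples; the `CONTROLS` structure and the `controls_for` helper are hypothetical names, not a SAIF artefact.

```python
# Only the two example controls from this section are included.
CONTROLS = {
    "Output Validation and Sanitisation": {
        "implemented_by": {"model creator", "model consumer"},
        "risks": {
            "Prompt Injection",
            "Rogue Actions",
            "Sensitive Data Disclosure",
            "Inferred Sensitive Data",
        },
    },
    "Agent Permissions": {
        "implemented_by": {"model consumer"},
        "risks": {
            "Insecure Integrated Component",
            "Sensitive Data Disclosure",
            "Rogue Actions",
        },
    },
}


def controls_for(role: str, risk: str) -> list:
    """Controls that address `risk` and that `role` is responsible for."""
    return sorted(
        name
        for name, control in CONTROLS.items()
        if role in control["implemented_by"] and risk in control["risks"]
    )
```

For Rogue Actions, a model consumer gets both controls back, while a model creator gets only Output Validation and Sanitisation, which mirrors the division of responsibility in the chatbot example.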

The Risk Map

The SAIF Risk Map is the central reference that ties everything together. For each of the 15 risks, the map shows three things.

Risk introduction is where in the pipeline the risk originates (e.g. data poisoning is introduced at data sources, data filtering, or data storage)
Risk exposure is where the risk manifests in the running system (e.g. data poisoning is exposed during training/evaluation or in the model’s outputs)
Risk mitigation is where controls can be applied to address the risk (e.g. data poisoning is mitigated through data sanitisation, access controls, and integrity management early in development)

This three-point mapping makes the Risk Map useful for threat modelling AI applications. Instead of working from a flat list of vulnerabilities, you can trace each risk through the pipeline from origin to exposure to mitigation.
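A single Risk Map entry can be sketched as a record with the three points, using the data poisoning example above. Mapping the prose ("data filtering", "access controls") onto named components and controls is an assumption made for this example, as are the `RiskMapEntry` class and the helper functions; the authoritative mapping is the Risk Map itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskMapEntry:
    risk: str
    introduced_at: tuple  # where the risk originates
    exposed_at: tuple     # where it manifests
    mitigated_by: tuple   # controls that address it


# Data poisoning, as described in the list above (names are assumed mappings).
DATA_POISONING = RiskMapEntry(
    risk="Data Poisoning",
    introduced_at=(
        "Data Sources",
        "Data Filtering and Processing",
        "Data and Model Storage",
    ),
    exposed_at=("Training, Tuning, and Evaluation", "Model outputs"),
    mitigated_by=(
        "Training Data Sanitisation",
        "Model and Data Access Controls",
        "Integrity Management",
    ),
)


def trace(entry: RiskMapEntry) -> str:
    """Render one risk as an origin -> exposure -> mitigation trace."""
    return (
        f"{entry.risk}: introduced at {', '.join(entry.introduced_at)}"
        f" -> exposed at {', '.join(entry.exposed_at)}"
        f" -> mitigated by {', '.join(entry.mitigated_by)}"
    )


def risks_introduced_at(component: str, entries: list) -> list:
    """Risks whose origin includes the given pipeline component."""
    return [e.risk for e in entries if component in e.introduced_at]
```

With one entry per risk, `risks_introduced_at` turns the map into a per-component checklist, which is the direction a threat model usually walks: start from a component you own and ask what can originate there.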

SAIF 2.0 and agent security

SAIF 2.0, released in early 2026, extends the framework to cover agentic AI systems. Agents differ from standard LLM applications because they take autonomous actions, calling tools, querying APIs, modifying data, and interacting with external services on the user’s behalf.

The agent extension adds new components to the Risk Map.

Perception processes user inputs and contextual data before they reach the reasoning core, and must distinguish trusted commands from untrusted environmental data
Reasoning Core plans and iterates on multi-step actions, with the level of autonomy directly governing the severity of a security failure
Orchestration manages agent memory, tool calls, RAG content, and auxiliary models, each of which is an attack surface
Response Rendering formats agent output for display and is a critical security boundary for preventing injection through dynamic content

New controls for agents include Agent Observability (monitoring agent actions), Agent User Control (requiring user approval for state-changing actions), and Agent Permissions (least-privilege applied dynamically based on context rather than statically).
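The three agent controls can be combined in a single gate that sits in front of every tool call. The sketch below is a simplified illustration under stated assumptions: the `ToolGate` class, its field names, and the log format are hypothetical, and the allowlist here is static, whereas SAIF describes Agent Permissions as applied dynamically based on context.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolGate:
    allowed_tools: set                     # Agent Permissions: least-privilege allowlist
    state_changing: set                    # tools that need explicit user approval
    approve: Callable[[str, dict], bool]   # Agent User Control: asks the user
    audit_log: list = field(default_factory=list)  # Agent Observability

    def call(self, tool: str, args: dict, run: Callable[..., Any]) -> Any:
        """Run a tool only if it is permitted and, where required, approved."""
        if tool not in self.allowed_tools:
            self.audit_log.append((tool, "denied"))
            raise PermissionError(f"tool not permitted: {tool}")
        if tool in self.state_changing and not self.approve(tool, args):
            self.audit_log.append((tool, "rejected by user"))
            raise PermissionError(f"user rejected: {tool}")
        result = run(**args)
        self.audit_log.append((tool, "executed"))
        return result
```

Every outcome is logged, including denials, so the audit trail shows what the agent attempted as well as what it actually did. In a real deployment `approve` would prompt the user, and the allowlist would be derived from the current task rather than fixed at construction time.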

The self-assessment

Google provides an interactive Risk Self-Assessment at saif.google/risk-self-assessment. It asks questions about your AI system’s security posture covering training, tuning and evaluation, access controls, adversarial input handling, coding frameworks, and agent configurations, then generates a tailored checklist of relevant SAIF controls. The assessment runs locally (Google does not collect answers or results) and is designed for security practitioners as a starting point for conversations about AI-specific risks in their organisation.
