Google’s Secure AI Framework (SAIF)

The OWASP ML Top 10 and LLM Top 10 are vulnerability checklists. They name specific risks and rank them. Google’s Secure AI Framework (SAIF) does something different. It maps the entire AI application lifecycle, from data ingestion through model deployment to agent orchestration, and then overlays risks, responsible parties, and controls onto that map. Where OWASP gives you a ranked list of what can go wrong, SAIF gives you a structural model of where things go wrong, who is responsible for fixing them, and what controls exist at each point.

SAIF was first published in June 2023. In September 2025, Google donated the SAIF data to the Coalition for Secure AI (CoSAI), an OASIS Open project with 35 industry partners, making the Risk Map available as an open industry resource. In early 2026, the framework was substantially expanded to version 2.0 to cover agentic AI systems.

SAIF vs OWASP

The two frameworks serve different purposes and work best together.

| Aspect | OWASP ML/LLM Top 10 | Google SAIF |
| --- | --- | --- |
| Format | Ranked vulnerability list | Lifecycle risk map with components, risks, and controls |
| Scope | Specific vulnerability classes | Entire AI development pipeline |
| Responsibility model | No explicit ownership assignment | Maps each control to model creator, model consumer, or both |
| Risk coverage | 10 risks per list (ML and LLM separately) | 15 risks covering both ML and LLM in a single framework |
| Control mapping | General mitigations described per risk | Named controls mapped to specific risks with explicit ownership |
| Governance | Not covered | Assurance and governance controls apply to all risks |

OWASP is a technical checklist for identifying what can be attacked. SAIF is an organisational framework for building secure AI applications across the full pipeline. Many of SAIF’s 15 risks map directly to OWASP entries, but SAIF adds risks that OWASP covers only partially or not at all, including Unauthorised Training Data, Excessive Data Handling, Model Source Tampering, Model Deployment Tampering, Model Exfiltration, and Model Reverse Engineering.

The four areas

SAIF divides an AI application into four areas, each containing multiple components. Every risk in the framework maps to one or more of these components, showing where the risk is introduced, where it is exposed, and where it can be mitigated.

Data

Covers everything related to the data the model learns from.

  • Data Sources are the original repositories from which data is gathered (databases, APIs, web scrapes, sensor feeds)
  • Data Filtering and Processing covers cleaning, transforming, labelling, deduplication, and synthetic data generation
  • Training Data is the final curated dataset fed into the model during training

In traditional software, code defines behaviour. In AI, data defines behaviour. Compromising training data is the AI equivalent of modifying application source code.

Infrastructure

Covers the hardware, storage, frameworks, and deployment systems underpinning the AI pipeline.

  • Model Frameworks and Code is the code and libraries required to train and run the model (PyTorch, TensorFlow, JAX, etc.)
  • Training, Tuning, and Evaluation covers the process of teaching, adjusting, and testing the model
  • Data and Model Storage includes both training data storage and model storage (local checkpoints, published model hubs)
  • Model Serving covers the systems and processes that deploy a model in production

# Where the data and infrastructure components sit in a typical pipeline

Data Sources --> Data Filtering --> Training Data
                                         |
                                         v
Model Frameworks/Code --> Training, Tuning, Evaluation --> Model Storage
                                                              |
                                                              v
                                                        Model Serving --> Application

Model

The central area. Covers the model itself and how inputs and outputs are handled.

  • The Model is the pairing of code and weights produced by training
  • Input Handling covers filtering, sanitising, and protecting against malicious inputs
  • Output Handling covers filtering, sanitising, and protecting against unwanted or dangerous outputs
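
As a rough illustration of Output Handling, the sketch below redacts secret-looking strings and escapes HTML before model output reaches a user interface. The `SECRET_PATTERN` regex and the `sanitise_output` name are assumptions made for this example and are not part of SAIF; a real deployment would need rules tuned to its own data and rendering context.

```python
import html
import re

# Hypothetical pattern for secret-looking tokens such as "token_..." or "sk-...".
# This is an illustrative assumption, not a rule defined by SAIF.
SECRET_PATTERN = re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9]{16,}\b")


def sanitise_output(text: str) -> str:
    """Redact secret-looking tokens, then escape HTML special characters."""
    redacted = SECRET_PATTERN.sub("[REDACTED]", text)
    # html.escape converts &, <, >, and quotes so the text cannot inject markup.
    return html.escape(redacted)


if __name__ == "__main__":
    print(sanitise_output("<script>alert(1)</script>"))
    print(sanitise_output("use token_abcdefghijklmnop1234 now"))
```

Redaction runs before escaping so the pattern matches the raw model output rather than its escaped form.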

Application

Covers how users and external systems interact with the model.

  • Application is the product or feature that uses the model (a chatbot, a code assistant, an internal tool)
  • Agents are services or additional models called by the AI application to complete specific tasks (tool use, plugin calls, external API interactions)

Each agent or plugin connection brings a transitive set of risks: the application inherits the risks of everything the agent can reach, so exposure grows with each external integration.
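
The four areas and their components can be written down as a small lookup table, which is handy when scoping a review. The sketch below uses the component names from this section; the `SAIF_AREAS` dictionary and both helper functions are illustrative assumptions, not an official SAIF data format.

```python
# Illustrative encoding of SAIF's four areas and their components.
# Names follow the lists in this section; the structure itself is an assumption.
SAIF_AREAS: dict[str, list[str]] = {
    "Data": ["Data Sources", "Data Filtering and Processing", "Training Data"],
    "Infrastructure": [
        "Model Frameworks and Code",
        "Training, Tuning, and Evaluation",
        "Data and Model Storage",
        "Model Serving",
    ],
    "Model": ["The Model", "Input Handling", "Output Handling"],
    "Application": ["Application", "Agents"],
}


def area_of(component: str) -> str:
    """Return the SAIF area that contains the given component."""
    for area, components in SAIF_AREAS.items():
        if component in components:
            return area
    raise KeyError(f"unknown component: {component}")


def in_scope_components(areas_you_own: list[str]) -> list[str]:
    """List the components to review for the areas a team is responsible for."""
    return [c for area in areas_you_own for c in SAIF_AREAS[area]]


if __name__ == "__main__":
    print(area_of("Model Serving"))
    print(in_scope_components(["Model", "Application"]))
```

A team that only consumes a hosted model would typically start from the Model and Application areas, though that split depends on the deployment.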

The 15 risks

SAIF defines 15 risks across the four areas. Each risk is mapped to who can mitigate it (model creator, model consumer, or both) and which controls address it.

| Risk | OWASP equivalent | Who mitigates |
| --- | --- | --- |
| Data Poisoning | ML02, LLM04 | Model creators |
| Unauthorised Training Data | No direct OWASP equivalent | Model creators |
| Model Source Tampering | ML03 (partially) | Model creators |
| Excessive Data Handling | No direct OWASP equivalent | Model creators, consumers |
| Model Exfiltration | ML06 | Model creators, consumers |
| Model Deployment Tampering | No direct OWASP equivalent | Model creators |
| Denial of ML Service | ML05, LLM10 | Model consumers |
| Model Reverse Engineering | ML07 | Model consumers |
| Insecure Integrated Component | LLM03 (supply chain) | Model consumers |
| Prompt Injection | LLM01 | Model creators, consumers |
| Model Evasion | ML01 | Model creators, consumers |
| Sensitive Data Disclosure | LLM02 | Model creators, consumers |
| Inferred Sensitive Data | No direct OWASP equivalent | Model creators, consumers |
| Insecure Model Output | LLM05 | Model creators, consumers |
| Rogue Actions | LLM06 (excessive agency) | Model consumers |
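
The table above can be treated as a small risk register and queried by role. The sketch below copies the rows as given here; the `Risk` dataclass and the helper names are assumptions for illustration, and the OWASP identifiers are reproduced from the table without further validation.

```python
from dataclasses import dataclass

CREATOR = "model creator"
CONSUMER = "model consumer"


@dataclass(frozen=True)
class Risk:
    name: str
    owasp: tuple[str, ...]         # OWASP IDs as given in the table; empty if none
    mitigated_by: tuple[str, ...]  # roles from the "Who mitigates" column


# Short aliases for the three ownership patterns used in the table.
C, U, B = (CREATOR,), (CONSUMER,), (CREATOR, CONSUMER)

RISKS = [
    Risk("Data Poisoning", ("ML02", "LLM04"), C),
    Risk("Unauthorised Training Data", (), C),
    Risk("Model Source Tampering", ("ML03",), C),
    Risk("Excessive Data Handling", (), B),
    Risk("Model Exfiltration", ("ML06",), B),
    Risk("Model Deployment Tampering", (), C),
    Risk("Denial of ML Service", ("ML05", "LLM10"), U),
    Risk("Model Reverse Engineering", ("ML07",), U),
    Risk("Insecure Integrated Component", ("LLM03",), U),
    Risk("Prompt Injection", ("LLM01",), B),
    Risk("Model Evasion", ("ML01",), B),
    Risk("Sensitive Data Disclosure", ("LLM02",), B),
    Risk("Inferred Sensitive Data", (), B),
    Risk("Insecure Model Output", ("LLM05",), B),
    Risk("Rogue Actions", ("LLM06",), U),
]


def risks_for(role: str) -> list[str]:
    """Names of the risks that the given role is expected to help mitigate."""
    return [r.name for r in RISKS if role in r.mitigated_by]


def without_owasp_equivalent() -> list[str]:
    """Names of the risks the table lists with no direct OWASP equivalent."""
    return [r.name for r in RISKS if not r.owasp]


if __name__ == "__main__":
    print(len(risks_for(CONSUMER)), "risks involve the model consumer")
    print(without_owasp_equivalent())
```

With the rows as listed, eleven of the fifteen risks involve the model consumer, which is a useful reminder that using a hosted model does not hand most of the risk to the provider.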

Several risks have no direct OWASP equivalent. These are worth understanding because they cover gaps that OWASP’s lists do not address.

Unauthorised Training Data is when a model is trained on data it is not authorised to use. This is a legal and ethical risk rather than a technical attack. In 2023, streaming platforms removed ‘Heart on My Sleeve,’ an AI-generated track that cloned the voices of Drake and The Weeknd without authorisation, following a copyright complaint from Universal Music Group. The risk is about compliance with privacy policies, licensing agreements, and data protection regulations.

Excessive Data Handling occurs when data collection or retention exceeds what is permitted by privacy policies. This is distinct from data poisoning or disclosure because the data itself may be legitimate, but the way it is collected, stored, or retained violates policy or regulation.

Model Source Tampering targets the model’s code, dependencies, or weights directly, either through supply chain attacks or insider access. This includes model architecture backdoors, which are backdoors embedded in the neural network architecture definition and can survive full retraining.

Inferred Sensitive Data is distinct from Sensitive Data Disclosure. In disclosure, the model reveals data it was trained on. With inferred sensitive data, the model provides sensitive information it never had direct access to by reasoning from patterns in training data or prompts. The model works something out that it was never explicitly told.

Model Deployment Tampering targets the serving infrastructure rather than the model itself, compromising components used to deploy models in production.
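
One common way to detect the kind of tampering described for Model Source Tampering and Model Deployment Tampering is to check a model artifact against a known-good digest before loading it. The sketch below is a minimal example of that idea using SHA-256; the function names are assumptions, and it presumes the expected digest was recorded at training time and is obtained through a trusted channel. A digest check does not help if the attacker can also change the recorded digest.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's digest differs from the expected one."""
    actual = sha256_of(path)
    # compare_digest avoids leaking where the two strings first differ.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"digest mismatch for {path}: refusing to load")


if __name__ == "__main__":
    weights = Path("demo_weights.bin")  # hypothetical artifact for the demo
    weights.write_bytes(b"original weights")
    recorded = sha256_of(weights)       # stands in for a digest stored at training time
    verify_artifact(weights, recorded)  # passes silently
    weights.unlink()
```

In practice the check would run in the serving pipeline just before the weights are deserialised, so that a mismatch stops deployment rather than producing a warning after the fact.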

Controls

SAIF organises controls into six categories, mapped to the four areas plus two cross-cutting categories.

| Category | Scope | Example controls |
| --- | --- | --- |
| Data | Training pipeline | Training Data Sanitisation, Training Data Management, Privacy-Preserving Technologies (PETs), User Data Management |
| Infrastructure | Storage, serving, tooling | Model and Data Access Controls, Integrity Management, Inventory Management, Secure-by-Default ML Tooling |
| Model | Input/output handling | Input Validation and Sanitisation, Output Validation and Sanitisation, Adversarial Training and Testing |
| Application | User and agent interaction | Application Access Management, User Transparency and Controls, Agent User Control, Agent Permissions |
| Assurance | Cross-cutting | Applied to all risks, all stages of the lifecycle |
| Governance | Cross-cutting | Applied to all risks, all stages of the lifecycle |

Each control specifies who is responsible for implementation.

# Example: Output Validation and Sanitisation

Control:        Output Validation and Sanitisation
Action:         Block, nullify, or sanitise insecure output before passing to applications or users
Implemented by: Model Creators, Model Consumers
Risk mapping:   Prompt Injection, Rogue Actions, Sensitive Data Disclosure, Inferred Sensitive Data

# Example: Agent Permissions

Control:        Agent Permissions
Action:         Apply least-privilege as the upper bound on agent permissions,
                minimise tools the agent can interact with and actions it can take
Implemented by: Model Consumers
Risk mapping:   Insecure Integrated Component, Sensitive Data Disclosure, Rogue Actions

The distinction between model creator and model consumer is practical. If HackTheBox uses Google’s Gemini for a chatbot, Google is the model creator (responsible for training data sanitisation, adversarial training) and HackTheBox is the model consumer (responsible for application access management, agent permissions, output validation in their application layer).
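
The creator/consumer split can be turned into a per-role checklist. The sketch below covers only the five controls whose ownership is stated in this section; the `CONTROL_OWNERS` mapping and the `checklist` function are assumptions for illustration, and ownership of the remaining SAIF controls would need to be taken from the framework itself.

```python
# Ownership as stated in this section only. This is not the full SAIF control list.
CONTROL_OWNERS: dict[str, set[str]] = {
    "Training Data Sanitisation": {"creator"},
    "Adversarial Training and Testing": {"creator"},
    "Output Validation and Sanitisation": {"creator", "consumer"},
    "Application Access Management": {"consumer"},
    "Agent Permissions": {"consumer"},
}


def checklist(role: str) -> list[str]:
    """Return the controls the given role is responsible for, sorted by name."""
    return sorted(name for name, owners in CONTROL_OWNERS.items() if role in owners)


if __name__ == "__main__":
    for role in ("creator", "consumer"):
        print(role, "->", checklist(role))
```

Shared controls such as Output Validation and Sanitisation appear on both lists, which reflects that each party applies the control at its own layer.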

The Risk Map

The SAIF Risk Map is the central reference that ties everything together. For each of the 15 risks, the map shows three things.

  • Risk introduction is where in the pipeline the risk originates (e.g. data poisoning is introduced at data sources, data filtering, or data storage)
  • Risk exposure is where the risk manifests in the running system (e.g. data poisoning is exposed during training/evaluation or in the model’s outputs)
  • Risk mitigation is where controls can be applied to address the risk (e.g. data poisoning is mitigated through data sanitisation, access controls, and integrity management early in development)

This three-point mapping makes the Risk Map useful for threat modelling AI applications. Instead of working from a flat list of vulnerabilities, you can trace each risk through the pipeline from origin to exposure to mitigation.
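
To make the three-point mapping concrete, the sketch below records the data poisoning example from the list above as a single entry and traces it from introduction to mitigation. The `RiskMapEntry` structure and function names are assumptions for illustration and do not mirror the published Risk Map's data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskMapEntry:
    risk: str
    introduction: tuple[str, ...]  # where the risk originates
    exposure: tuple[str, ...]      # where it manifests in the running system
    mitigation: tuple[str, ...]    # controls that address it


# Values taken from the data poisoning example in this section.
DATA_POISONING = RiskMapEntry(
    risk="Data Poisoning",
    introduction=("data sources", "data filtering", "data storage"),
    exposure=("training/evaluation", "model outputs"),
    mitigation=("data sanitisation", "access controls", "integrity management"),
)


def trace(entry: RiskMapEntry) -> str:
    """Describe a risk from origin to exposure to mitigation in one line."""
    return (
        f"{entry.risk}: introduced at {', '.join(entry.introduction)}; "
        f"exposed at {', '.join(entry.exposure)}; "
        f"mitigated by {', '.join(entry.mitigation)}"
    )


def risks_introduced_at(component: str, entries: list[RiskMapEntry]) -> list[str]:
    """List the risks that can originate at the given pipeline component."""
    return [e.risk for e in entries if component in e.introduction]


if __name__ == "__main__":
    print(trace(DATA_POISONING))
    print(risks_introduced_at("data storage", [DATA_POISONING]))
```

During threat modelling, a lookup like `risks_introduced_at` lets you start from a component you own and work outward, rather than scanning every risk in turn.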

SAIF 2.0 and agent security

SAIF 2.0, released in early 2026, extends the framework to cover agentic AI systems. Agents differ from standard LLM applications because they take autonomous actions, calling tools, querying APIs, modifying data, and interacting with external services on the user’s behalf.

The agent extension adds new components to the Risk Map.

  • Perception processes user inputs and contextual data before they reach the reasoning core, and must distinguish trusted commands from untrusted environmental data
  • Reasoning Core plans and iterates on multi-step actions, with the level of autonomy directly governing the severity of a security failure
  • Orchestration manages agent memory, tool calls, RAG content, and auxiliary models, each of which is an attack surface
  • Response Rendering formats agent output for display and is a critical security boundary for preventing injection through dynamic content

New controls for agents include Agent Observability (monitoring agent actions), Agent User Control (requiring user approval for state-changing actions), and Agent Permissions (least-privilege applied dynamically based on context rather than statically).
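
A minimal sketch of how the three agent controls might fit together in a tool dispatcher is shown below. Everything in it is hypothetical: the `Tool` and `AgentGate` classes, the approval callback, and the audit log format are assumptions, and the allowlist is static for brevity, whereas the text above describes permissions adjusted dynamically by context.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    func: Callable[..., Any]
    changes_state: bool  # True if the tool modifies data or external systems


@dataclass
class AgentGate:
    tools: dict[str, Tool]
    allowed: set[str]                               # Agent Permissions (static here)
    approve: Callable[[str, dict[str, Any]], bool]  # Agent User Control
    audit_log: list[tuple[str, str]] = field(default_factory=list)  # Agent Observability

    def call(self, name: str, **kwargs: Any) -> Any:
        """Run a tool only if it is permitted and, when needed, user-approved."""
        if name not in self.allowed or name not in self.tools:
            self.audit_log.append((name, "denied"))
            raise PermissionError(f"tool not permitted: {name}")
        tool = self.tools[name]
        if tool.changes_state and not self.approve(name, kwargs):
            self.audit_log.append((name, "rejected"))
            raise PermissionError(f"user rejected state-changing call: {name}")
        self.audit_log.append((name, "executed"))
        return tool.func(**kwargs)


if __name__ == "__main__":
    gate = AgentGate(
        tools={
            "read_ticket": Tool(lambda ticket_id: f"ticket {ticket_id}", False),
            "delete_ticket": Tool(lambda ticket_id: "deleted", True),
        },
        allowed={"read_ticket", "delete_ticket"},
        approve=lambda name, args: False,  # stand-in for a real user prompt
    )
    print(gate.call("read_ticket", ticket_id=7))
    print(gate.audit_log)
```

Read-only tools pass straight through, while state-changing tools wait on the approval callback, so a successful prompt injection still has to get past the user before it can modify anything.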

The self-assessment

Google provides an interactive Risk Self-Assessment at saif.google/risk-self-assessment. It asks questions about your AI system’s security posture covering training, tuning and evaluation, access controls, adversarial input handling, coding frameworks, and agent configurations, then generates a tailored checklist of relevant SAIF controls. The assessment runs locally (Google does not collect answers or results) and is designed for security practitioners as a starting point for conversations about AI-specific risks in their organisation.
