Introduction to prompt injection

Prompt injection is a vulnerability class in which an attacker manipulates an LLM’s input to override its intended behaviour, bypass safety controls, or extract information that should not be accessible through the prompt interface. It is ranked as LLM01:2025 in the OWASP Top 10 for LLM Applications, the highest-severity risk category. This article covers the architectural reason prompt injection exists, how system and user prompts interact, how multi-turn conversation context works, and how multimodal inputs extend the attack surface.

System prompts and user prompts

Most real-world LLM deployments use two types of prompt. The system prompt contains the guidelines and rules that govern the model’s behaviour, and the user prompt is the actual input from the person interacting with the system. Consider a customer support chatbot. Its system prompt might look similar to this.

You are a friendly customer support chatbot.
You are tasked to help the user with any technical issues regarding our platform.
Only respond to queries that fit in this domain.
This is the user's query:

The system prompt attempts to restrict the model to a specific task. The user prompt is whatever the customer types into the chat window.

The root cause of prompt injection

The core issue is that LLMs do not have separate input channels for system prompts and user prompts. The model operates on a single text input. To make the model follow both the system instructions and the user query, the two are concatenated into one prompt before being fed to the model.

You are a friendly customer support chatbot.
You are tasked to help the user with any technical issues regarding our platform.
Only respond to queries that fit in this domain.
This is the user's query:

Hello World! How are you doing?

From the model’s perspective, this is one continuous stream of tokens. There is no inherent mechanism that tells the model which tokens are instructions from the developer and which are input from the user. The model processes all of them through the same attention layers, and any part of the input can influence the output.

This is the architectural root cause of prompt injection. Because the model cannot reliably distinguish system instructions from user input, an attacker can craft a user prompt that mimics or overrides the system prompt’s instructions. The OWASP community description of LLM01 states that prompt injection vulnerabilities exist because of how models process prompts, and how input may force the model to incorrectly pass prompt data to other parts of the model. Unlike SQL injection, where parameterised queries provide a reliable structural defence, there is no equivalent separation mechanism for natural language prompts.

This distinction matters for red teamers. Prompt injection is not a bug in a specific implementation that can be patched. It is a property of how current LLM architectures process input. Defences can reduce the attack surface and raise the difficulty, but they cannot eliminate the vulnerability entirely as long as the model treats all input as a single text stream.

Multi-turn conversation context

LLM-based applications typically support back-and-forth conversations where the model appears to remember earlier messages. This is achieved by including previous messages as additional context in each new prompt.

For example, the first prompt in a conversation might be structured as follows.

You are ChatGPT, a helpful chatbot. Assist the user with any legal requests.

USER: How do I print "Hello World" in Python?

When the user sends a follow-up message, the previous exchange is included to provide context.

You are ChatGPT, a helpful chatbot. Assist the user with any legal requests.

USER: How do I print "Hello World" in Python?
ChatGPT: To print "Hello World" in Python, simply use the print() function like
this: print("Hello World")

USER: How do I do the same in C?

The model can infer from the context that “the same” refers to printing “Hello World”, even though the second prompt does not state this explicitly. This context-passing mechanism is what makes conversational LLM applications feel coherent.

From a security perspective, multi-turn context introduces additional attack surface. If an attacker’s prompt injection payload is included in the conversation history, it persists across subsequent turns and continues to influence the model’s behaviour. The exact structure of multi-turn prompts (how different actors and messages are separated, what role labels are used, whether special tokens demarcate boundaries) varies between implementations and is often kept secret in production deployments.

Multimodal attack surfaces

This article focuses on text-based prompt injection, but it is important to understand that multimodal models extend the attack surface significantly. Models that process images, audio, and video alongside text are vulnerable to prompt injection payloads delivered through any of those input types.

Image-based prompt injection is the most studied multimodal variant. An attacker embeds text instructions directly into an image, either as visible text that the model’s vision encoder reads, or as adversarial perturbations optimised to influence the model’s internal representations. A Cloud Security Alliance research note on image-based prompt injection documents that current vision-language models do not distinguish between the visual content a user intends to show the model and instructions embedded in that content. Once processed by the vision encoder, adversarial instructions enter the same instruction-following pathway as legitimate text prompts.

Audio-based injection works on a similar principle. Malicious instructions can be encoded into audio inputs, either as spoken commands or as adversarial noise patterns below the human hearing threshold. Research on multimodal prompt injection notes that audio-based attacks have achieved success rates above 86% against models like Phi-4-Multimodal and Qwen2.5-Omni, with the adversarial audio remaining inaudible to human listeners.

Video-based injection combines visual and audio vectors. Adversarial instructions can be embedded in individual frames, in audio tracks, or spread across both. Peer-reviewed surveys of the 2022 to 2025 literature note that video-based prompt injection is less extensively studied than image-based attacks, but the theoretical attack surface is broad because video combines multiple modalities that the model processes through separate encoders before merging them into a shared representation.

The key takeaway for text-focused red teaming is that defences designed to detect text-based prompt injection (input sanitisation filters, injection classifiers, safety alignment training) do not transfer to other modalities. A model that is resilient to text-based injection may be fully vulnerable to the same payload delivered as text within an image.

Summary

Prompt injection exists because LLMs process system prompts and user prompts as a single text stream with no reliable boundary between them. Multi-turn conversation context compounds this by persisting earlier inputs across subsequent turns. Multimodal models extend the attack surface to images, audio, and video, where text-based defences do not apply. The vulnerability is architectural rather than implementation-specific, which is why OWASP ranks it as the number one risk in the LLM Top 10.

Type to search