AI Fundamentals for Red Teaming AI Systems

You cannot attack a system you have not bothered to understand. That principle holds whether you are staring at a misconfigured Active Directory forest or a large language model obediently following instructions smuggled in through a prompt injection. The tooling differs but the principle does not. This is the first post in a series where I learn to red team AI, in public, from the ground up. Not by skipping to the exploits and pretending I already know the theory, but by doing the unglamorous work of understanding how these systems actually function before I try to break them. If you want to come along for the ride, this is where we start.

Why Practitioners Need This

Most red teamers and security professionals interact with AI daily, whether through tooling, detection pipelines, or the LLMs now embedded in half the SaaS products we manage. But ask the average pentester to explain the difference between a neural network and a machine learning model, and the answer gets vague fast.

That vagueness is a liability. AI red teaming is not prompt engineering with attitude. It requires understanding what sits beneath the interface: how models learn, what they optimise for, where their assumptions break, and why certain architectures are more susceptible to specific attack classes than others.

So let us build the map before we try to navigate the territory.

Artificial Intelligence

Artificial Intelligence is the broadest term in this stack. It describes any system designed to perform tasks that would ordinarily require human cognition: understanding language, recognising objects, making decisions, solving problems. AI is the ambition. Everything else is a method for achieving it.

The field spans several disciplines:

Natural Language Processing (NLP): enabling machines to parse, interpret, and generate human language
Computer Vision: giving machines the ability to interpret images and video
Robotics: building systems that act autonomously in physical environments
Expert Systems: encoding human decision-making logic into rule-based frameworks

From a red teaming perspective, the important thing to internalise here is scope. AI is not one technology. It is a collection of approaches, each with its own attack surface. An expert system built on hand-coded rules breaks differently from a transformer model trained on terabytes of internet text. Knowing which flavour of AI you are facing determines which adversarial techniques even apply.

AI already operates in domains where the stakes are not theoretical. In healthcare, it drives diagnostic models and drug discovery pipelines. In finance, it flags fraudulent transactions and informs trading strategies. In cybersecurity (our home turf), it powers threat detection, malware classification, and behavioural analytics. Every one of those use cases carries consequences when the model gets it wrong, or when someone makes it get it wrong deliberately.

Machine Learning

Machine Learning is a subfield of AI, and it is where things start to get mechanically interesting. Rather than programming explicit rules (“if X, then Y”), ML systems learn patterns from data and use those patterns to make predictions or decisions on inputs they have never seen before.

This distinction matters for adversarial work. A rule-based system fails when you find an edge case the developer did not anticipate. An ML system fails when you manipulate the patterns it has learned to trust. Different failure mode. Different attack surface.

ML splits into three broad categories, and each one has its own adversarial implications:

Supervised Learning

The model trains on labelled data, where every input has a known correct output. Think image classification (this is a cat, this is a dog), spam detection, or fraud prevention. The model learns the mapping between features and labels, then applies that mapping to new data.

Why it matters for red teaming: supervised models are only as reliable as their training data. Poison the labels, skew the distribution, or craft inputs that sit right next to the decision boundary between classes, and the model’s confidence becomes a vulnerability rather than a feature.
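
To make the label-poisoning point concrete, here is a minimal sketch. It is a toy of my own, not a real spam filter: a nearest-centroid classifier over a single hypothetical feature (a count of suspicious keywords in an email), trained on invented data. The names train_centroids and predict are illustrative assumptions.

```python
from statistics import mean


def train_centroids(samples):
    """Learn one centroid per label from (feature, label) pairs."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


# Invented training set: feature = number of suspicious keywords.
clean = [(0, "ham"), (1, "ham"), (2, "ham"),
         (8, "spam"), (9, "spam"), (10, "spam")]

# Attacker slips in a few high-scoring samples mislabelled as "ham".
poisoned = clean + [(9, "ham"), (10, "ham"), (10, "ham")]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(poisoned)

print(predict(clean_model, 6))     # spam
print(predict(poisoned_model, 6))  # ham
```

Three mislabelled samples drag the "ham" centroid from 1 to roughly 5.3, which moves the decision boundary far enough that an email the clean model flags now passes. Real models have far more features, but the mechanism (the boundary goes wherever the labels say it should) is the same.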

Unsupervised Learning

No labels. The model finds structure in raw data on its own: customer segmentation, anomaly detection, dimensionality reduction. It clusters, groups, and identifies patterns without being told what to look for.

Why it matters for red teaming: unsupervised models that power anomaly detection are the backbone of many security tools. Understanding how they define “normal” is the first step to operating beneath their threshold.
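
As a rough illustration of what "operating beneath the threshold" means, here is a sketch of about the simplest anomaly detector there is: a z-score check against a learned baseline. Everything in it is an assumption for the sake of the example, including the metric (outbound megabytes per hour from one host), the numbers, and the threshold of three standard deviations. Production tools use richer models, but many still reduce to "how far is this from normal?"

```python
from statistics import mean, stdev


def fit_baseline(values):
    """Learn what 'normal' looks like: mean and sample standard deviation."""
    return mean(values), stdev(values)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    return abs(value - mu) / sigma > threshold


# Invented history of outbound MB per hour for one host.
history = [48, 52, 50, 47, 53, 49, 51, 50]
baseline = fit_baseline(history)  # mean 50, standard deviation 2

print(is_anomalous(500, baseline))  # True: one large burst is flagged
print(is_anomalous(55, baseline))   # False: 5 MB over baseline is not
```

The burst is caught immediately. The same data leaving at a few extra megabytes per hour never crosses the line, and the only thing the attacker needed to know was where the detector draws "normal".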

Reinforcement Learning

The model learns through interaction with an environment, receiving rewards for desirable outcomes and penalties for undesirable ones. This is how game-playing agents and many robotics controllers are trained, and it is an active research direction for autonomous driving.

Why it matters for red teaming: reinforcement learning agents can be manipulated by altering the reward signal or the environment they interact with. If you control what the agent perceives as success, you control the agent.
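
The claim about reward signals can be sketched with a deliberately tiny, bandit-style learner. This is not a full reinforcement learning algorithm (there are no states and no exploration strategy), and the actions and reward functions are hypothetical. It only shows that the agent's learned preference is whatever the reward channel says it is.

```python
def train_agent(reward_fn, actions, rounds=50):
    """Estimate each action's average reward by trying them round-robin."""
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for step in range(rounds):
        action = actions[step % len(actions)]
        totals[action] += reward_fn(action)
        counts[action] += 1
    return {a: totals[a] / counts[a] for a in actions}


def best_action(values):
    """Greedy policy: pick the action with the highest estimated reward."""
    return max(values, key=values.get)


ACTIONS = ["block_connection", "allow_connection"]


def true_reward(action):
    # Hypothetical environment: blocking the malicious connection is correct.
    return 1.0 if action == "block_connection" else -1.0


def tampered_reward(action):
    # Attacker sits on the reward channel and flips the sign the agent sees.
    return -true_reward(action)


print(best_action(train_agent(true_reward, ACTIONS)))      # block_connection
print(best_action(train_agent(tampered_reward, ACTIONS)))  # allow_connection
```

The learning code is identical in both runs. Only the feedback changed, and the agent's idea of success changed with it.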

ML is the engine that makes modern AI functional. It provides the learning and adaptation capability that underpins everything from your email spam filter to the model behind ChatGPT. Every application of ML, across healthcare, finance, marketing, cybersecurity, and transport, represents a system that learned its behaviour from data. And anything that learned from data can be taught the wrong lessons.

Deep Learning

Deep Learning is where ML meets neural networks with serious depth. Multiple layers of interconnected nodes learn increasingly abstract representations of data, extracting features automatically rather than relying on manual engineering.

This is the subfield responsible for most of the AI capabilities making headlines: image generation, language models, speech recognition, autonomous systems. If you are going to red team modern AI, deep learning is the architecture you will encounter most often.

Three characteristics define it:

Hierarchical feature learning. Each layer in a deep network captures a different level of abstraction. In an image model, early layers detect edges and textures. Deeper layers recognise shapes, objects, faces. This layered abstraction is powerful, but it also means a small adversarial perturbation to the input can be amplified as it passes through the layers and produce failures that look unintuitive to a human.

End-to-end learning. Deep learning models map raw input directly to output. No manual feature engineering in between. The model decides what matters. This is efficient, but it also means the model’s internal logic is opaque. You cannot easily audit what features it relies on, which makes adversarial manipulation harder to detect.

Scalability. These models thrive on data and compute. More of both generally means better performance. But scale also means larger attack surfaces, more training data to potentially poison, and more parameters whose behaviour is difficult to predict or explain.
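
The adversarial perturbations mentioned under hierarchical feature learning are easier to reason about on a model small enough to hold in your head. The sketch below swaps the deep network for a single linear layer with a sigmoid, which is an assumption made purely for readability; the weights, bias, and input are invented. The step itself follows the idea behind the fast gradient sign method: nudge every input feature a small, fixed amount in the direction that most reduces the model's score.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Probability of the 'malicious' class under a linear model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, x, epsilon):
    """Shift each feature by at most epsilon to push the score down.

    For a linear model the gradient of the logit with respect to the
    input is the weight vector, so the step is -epsilon * sign(w).
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


# Invented model parameters and input sample.
WEIGHTS = [2.0, -1.0, 1.5, 0.5]
BIAS = -1.0
x = [0.6, 0.2, 0.5, 0.4]

x_adv = perturb(WEIGHTS, x, epsilon=0.25)

print(score(WEIGHTS, BIAS, x) > 0.5)      # True: original is flagged
print(score(WEIGHTS, BIAS, x_adv) > 0.5)  # False: perturbed copy is missed
```

No single feature moved by more than 0.25, yet the verdict flipped. In a deep network the gradient has to be computed through every layer rather than read straight off the weights, but the attacker is solving the same problem.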

The Architectures You Will See

Convolutional Neural Networks (CNNs): the workhorses of computer vision. They use convolutional layers to detect spatial patterns, making them dominant in image classification, object detection, and segmentation.
Recurrent Neural Networks (RNNs): designed for sequential data like text and speech. They maintain internal state across time steps, allowing them to process context. Largely superseded in NLP by transformers, but still relevant in certain domains.
Transformers: the architecture behind GPT, Claude, Gemini, and most modern language models. They use self-attention mechanisms to handle long-range dependencies in data, and they have become the dominant architecture in NLP, code generation, and increasingly in vision tasks too.

If you are following this series, transformers are where we will spend most of our time. They are the architecture powering the systems most people think of when they say “AI” today, and they are the primary target for adversarial AI research: prompt injection, jailbreaking, data extraction, alignment bypasses. All of it runs through transformers.
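
Since self-attention is going to follow us through the rest of the series, here is a minimal sketch of scaled dot-product attention in plain Python. It makes one big simplifying assumption: a real transformer first passes each token embedding through learned query, key, and value projections, whereas this sketch uses the raw (invented) embeddings for all three. The function names are mine.

```python
import math


def softmax(row):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(a * v[j] for a, v in zip(attn, values))
                        for j in range(len(values[0]))])
    return outputs, weights


# Three invented 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how much one token draws from every other token in the sequence. Notice that nothing in this computation knows where a token came from: every position is scored against every other position by the same arithmetic.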

How AI, ML, and DL Relate

Think of these three as concentric circles, not competing technologies.

AI is the goal: build systems that exhibit intelligent behaviour. ML is the primary method: let systems learn from data rather than coding rules by hand. DL is the most powerful current implementation of ML: use deep neural networks to learn complex representations automatically.

In practice, they work together. Autonomous driving stacks can combine classical ML for sensor fusion with deep CNNs for object recognition, and some add reinforcement learning for decision-making. Modern NLP systems use transformer-based deep learning models that are pretrained on unlabelled text and then fine-tuned on labelled examples. Robotics blends reinforcement learning with deep networks to handle dynamic environments.

For the adversarial practitioner, the relationship matters because vulnerabilities cascade. A weakness introduced at the data level (poisoned training data, for example) is baked into the deep learning model trained on it, which in turn undermines the AI system that depends on that model’s predictions. Understanding the stack means understanding where to apply pressure.

What Comes Next

This was the conceptual foundation. Necessary, but only the starting line.

In the posts that follow, we will move from theory into territory that gets progressively more hands-on: how models are trained and where that process introduces risk, how adversarial examples work at a mechanical level, what prompt injection actually exploits in a transformer’s architecture, and how to build a practical red teaming methodology for AI systems.

I am learning this as I go, documenting the process honestly rather than pretending I arrived with all the answers. If that sounds like a journey worth taking, stick around.

The most dangerous assumption in AI security right now is that understanding how to use these systems is the same as understanding how they fail. It is not. And the gap between those two things is where the interesting work lives.
