Decision trees

A decision tree trained on your fraud detection pipeline will tell an attacker precisely which features to manipulate, in what order, and by how much. Every split threshold is a written instruction. Every leaf node is a guaranteed outcome. No other model architecture hands over its entire reasoning chain this willingly.

This is the fifth entry in the AI red teaming series, where we work through machine learning fundamentals from the perspective of someone who needs to understand them well enough to break them. Decision trees are where that project gets interesting, because the same property that makes these models interpretable to defenders makes them fully legible to adversaries.

How a decision tree thinks

A decision tree is a supervised learning model that classifies data by asking a sequence of yes/no questions about input features. Each question splits the data into subsets. The process repeats recursively until every subset is pure enough to assign a label, or until a stopping condition triggers.

The structure has three components. The root node contains the full dataset and asks the first question. Internal nodes ask subsequent questions, each splitting data further. Leaf nodes are the endpoints and each one holds a class prediction.
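As a rough illustration of that structure, the sketch below models a tree as nested nodes and walks an input from the root to a leaf. The `Node` class, the feature order (amount, account age in days), the thresholds, and the labels are all hypothetical, and sending values at or below the threshold to the left is an assumed convention.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    # Internal nodes carry a question: "is x[feature] <= threshold?"
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when the answer is yes
    right: Optional["Node"] = None   # branch taken when the answer is no
    prediction: Optional[str] = None  # set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.prediction is not None

def predict(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf, answering one question per node."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

# Hypothetical two-question tree over (amount, account_age_days).
root = Node(
    feature=0, threshold=5000,
    left=Node(prediction="allow"),
    right=Node(feature=1, threshold=30,
               left=Node(prediction="flag"),
               right=Node(prediction="allow")),
)

print(predict(root, (7000, 5)))  # large amount, new account
```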

The model’s intelligence is defined by how it chooses the feature for each node and where it sets the split threshold. Get those right, and the tree classifies accurately. Get those wrong, and you have an expensive flowchart.

Choosing the split

The algorithm selects splits by measuring how “mixed” a subset is. Two metrics dominate: Gini impurity and entropy. Both measure the same underlying concept (subset homogeneity) from slightly different angles.

Gini impurity measures the probability of misclassifying a randomly chosen element:

Gini(S) = 1 - Σ pᵢ²

Where pᵢ is the proportion of class i in the set. A Gini score of 0 means the subset is perfectly pure: every element belongs to the same class. A score of 0.5 is the maximum for a binary problem: the classes are evenly distributed, and a split that produces such a subset has told the model nothing useful.

Take a dataset with 30 instances of class A and 20 of class B:

Gini(S) = 1 - (0.6² + 0.4²) = 1 - (0.36 + 0.16) = 0.48

That 0.48 tells you the set is still fairly mixed. The algorithm will keep looking for a split that drives the Gini score closer to zero.
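The same arithmetic can be checked in a few lines of Python. This is a minimal sketch: the function name `gini_impurity` is illustrative, and treating an empty set as pure is an assumed convention.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_i ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumed convention: an empty set counts as pure
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# The worked example: 30 instances of class A and 20 of class B.
labels = ["A"] * 30 + ["B"] * 20
print(round(gini_impurity(labels), 2))  # 0.48
```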

Entropy measures disorder using information theory:

Entropy(S) = - Σ pᵢ * log₂(pᵢ)

The same dataset gives:

Entropy(S) = - (0.6 * log₂(0.6) + 0.4 * log₂(0.4)) ≈ 0.971

Entropy of 0 means perfect purity. Entropy of 1.0 (for a binary problem) means maximum uncertainty. The algorithm’s job is to find splits that collapse this uncertainty as fast as possible.
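A short sketch of the entropy calculation, using the same 30/20 dataset. The function name `entropy` is illustrative; classes with zero members never appear in the counts, so there is no log of zero to guard against.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumed convention: an empty set has no uncertainty
    counts = Counter(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

# The worked example: 30 instances of class A and 20 of class B.
labels = ["A"] * 30 + ["B"] * 20
print(round(entropy(labels), 3))  # 0.971
```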

Information gain

Information gain is the reduction in entropy that a particular split achieves. The algorithm calculates it for every candidate feature and picks the winner:

Information Gain(S, A) = Entropy(S) - Σ ((|Sᵥ| / |S|) * Entropy(Sᵥ))

Where Sᵥ is the subset of data where feature A takes value v. The feature with the highest information gain becomes the next node in the tree.

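Consider a feature F that splits 50 instances into two groups. Group 1 (F=1) has 30 instances: 20 of class A, 10 of class B. Group 2 (F=2) has 20 instances: 10 of each class. The parent set is the same 30/20 mix as before, with entropy of about 0.971. Group 1 has entropy of about 0.918 and Group 2 has entropy of exactly 1.0, so the weighted child entropy is 0.6 * 0.918 + 0.4 * 1.0 ≈ 0.951. The information gain from splitting on F therefore works out to roughly 0.02. That is a weak split. The algorithm would prefer a feature that separates the classes more cleanly.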
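The feature F example can be reproduced with a small sketch. The helper names are illustrative, and the two groups are built directly from the counts given above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted

group_1 = ["A"] * 20 + ["B"] * 10   # F = 1
group_2 = ["A"] * 10 + ["B"] * 10   # F = 2
parent = group_1 + group_2          # 30 of class A, 20 of class B

print(round(information_gain(parent, [group_1, group_2]), 3))  # 0.02
```

A split that separated the classes perfectly would score the full parent entropy (about 0.971) on the same function, which is the gap the greedy search is trying to close.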

This calculation happens at every node, for every feature. The tree grows greedily, always picking the locally optimal split. It never backtracks. This greedy construction is fast, but it means the tree can miss globally better configurations. Red teamers can use this predictability.

When the tree stops growing

Three conditions typically halt the recursion:

Maximum depth reached. A hard limit on how many layers deep the tree can grow. Shallow trees underfit. Deep trees memorise.
Minimum samples per node. If a node contains fewer data points than a threshold, it becomes a leaf. This prevents the model from building rules around individual examples.
Pure nodes. If every data point in a node belongs to the same class, there is nothing left to split.

These stopping conditions are the primary defence against overfitting. They are also configuration parameters that defenders must set correctly. In practice, many teams leave them at their default values. Defaults are predictable, and predictable configurations are exploitable.
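To make the three stopping conditions concrete, here is a minimal greedy tree builder. It is a sketch rather than any library's implementation: the parameter names `max_depth` and `min_samples`, the dict-based node layout, the `stopped_by` field, and the toy rows of (amount, account age in days) are all assumptions made for illustration.

```python
from collections import Counter

def gini(labels):
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def partition(rows, labels, feature, threshold):
    """Split into the (<= threshold) side and the (> threshold) side."""
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return left, right

def best_split(rows, labels):
    """Greedy search for the (feature, threshold) with lowest weighted Gini."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({r[feature] for r in rows}):
            left, right = partition(rows, labels, feature, threshold)
            if not left or not right:
                continue
            score = sum(len(side) * gini([y for _, y in side])
                        for side in (left, right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best

def build(rows, labels, depth=0, max_depth=3, min_samples=2):
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:            # pure node
        return {"leaf": majority, "stopped_by": "pure"}
    if depth >= max_depth:               # maximum depth reached
        return {"leaf": majority, "stopped_by": "max_depth"}
    if len(labels) < min_samples:        # too few samples to split
        return {"leaf": majority, "stopped_by": "min_samples"}
    split = best_split(rows, labels)
    if split is None:                    # no split improves on the parent
        return {"leaf": majority, "stopped_by": "no_improvement"}
    feature, threshold = split
    children = [build([r for r, _ in side], [y for _, y in side],
                      depth + 1, max_depth, min_samples)
                for side in partition(rows, labels, feature, threshold)]
    return {"feature": feature, "threshold": threshold,
            "left": children[0], "right": children[1]}

# Toy data: (amount, account_age_days) with hypothetical labels.
rows = [(100, 400), (200, 30), (5200, 10), (6000, 500), (7000, 5), (150, 20)]
labels = ["ok", "ok", "fraud", "ok", "fraud", "ok"]
tree = build(rows, labels)
print(tree["feature"], tree["threshold"])  # 1 10
```

On this toy data the builder splits once on account age and both children are pure, so the recursion stops without ever reaching the depth limit. Calling `build(rows, labels, max_depth=0)` or `build(rows, labels, min_samples=10)` returns a single leaf whose `stopped_by` field names the condition that fired.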

Why red teamers should care

Decision trees are the base learner inside the most commonly deployed ML models in production security systems. XGBoost, LightGBM, and random forests are all ensembles of decision trees. They power fraud detection, credit scoring, intrusion detection, and malware classification across the industry. Understanding the base learner is a prerequisite to attacking the ensemble.

But individual decision trees have properties that make them uniquely interesting from an adversarial perspective.

The model is its own documentation

A trained decision tree is a set of if/else rules. Extract it (via model theft, insider access, or a model extraction attack against an API), and you have the complete decision logic. No gradients to approximate. No hidden representations to probe. The tree tells you: if feature X is above threshold T, go left; otherwise, go right. Follow the path to the leaf, and you know the classification.

This is the opposite of a neural network, where understanding the decision boundary requires iterative probing. With a decision tree, you read the boundary directly.
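Reading the boundary directly can be as simple as walking the tree and printing every root-to-leaf path. The sketch below assumes a stolen or extracted tree has already been loaded into nested dicts; the layout, feature names, thresholds, and labels are hypothetical.

```python
def extract_rules(node, feature_names, conditions=()):
    """Yield (rule_text, label) for every root-to-leaf path in the tree."""
    if "leaf" in node:
        rule = " and ".join(conditions) if conditions else "always"
        yield rule, node["leaf"]
        return
    name = feature_names[node["feature"]]
    threshold = node["threshold"]
    yield from extract_rules(node["left"], feature_names,
                             conditions + (f"{name} <= {threshold}",))
    yield from extract_rules(node["right"], feature_names,
                             conditions + (f"{name} > {threshold}",))

# Hypothetical extracted tree over two features.
feature_names = ["amount", "account_age_days"]
tree = {
    "feature": 0, "threshold": 5000,
    "left": {"leaf": "allow"},
    "right": {
        "feature": 1, "threshold": 30,
        "left": {"leaf": "flag"},
        "right": {"leaf": "allow"},
    },
}

rules = list(extract_rules(tree, feature_names))
for rule, label in rules:
    print(f"if {rule}: {label}")
```

Each printed line is one complete decision rule, so the output is both the model's documentation and the attacker's map.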

Decision boundaries are axis-aligned

Each split in a decision tree is perpendicular to a single feature axis. The model asks “is feature X above or below value Y?” It never asks “is the combination of features X and Z above some diagonal threshold?” This means an adversarial perturbation can often be as small as shifting a single feature past a single threshold, provided that threshold sits on the path the input currently takes.

If a fraud detection tree splits on transaction amount at £5,000, a transaction at £4,999 goes down a different branch from one at £5,001. The attacker does not need to understand the full model. They need to know the threshold for one feature and adjust by two pounds.
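When the threshold is not already known, a black-box attacker can locate it by bisecting over that one feature. The sketch below assumes a hypothetical `black_box` scoring function standing in for a prediction API, and assumes the decision changes exactly once across the probed range.

```python
def find_threshold(predict, low, high, tolerance=1.0):
    """Bisect [low, high] until the decision change is pinned within tolerance.

    Assumes predict() changes its answer exactly once in the interval.
    Returns (low, high, queries) with the threshold between low and high.
    """
    low_label = predict(low)
    if predict(high) == low_label:
        raise ValueError("no decision change between low and high")
    queries = 2
    while high - low > tolerance:
        mid = (low + high) / 2
        queries += 1
        if predict(mid) == low_label:
            low = mid
        else:
            high = mid
    return low, high, queries

def black_box(amount):
    """Hypothetical stand-in for a remote model with one split on amount."""
    return "flag" if amount > 5000 else "allow"

low, high, queries = find_threshold(black_box, 0, 20000)
print(f"threshold lies in ({low:.2f}, {high:.2f}] after {queries} queries")
```

The query count grows with the logarithm of the range divided by the tolerance, so a wide range costs only a handful of extra probes. Queries that cluster ever more tightly around one value of one feature are also the pattern a defender can watch for.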

Feature importance is a target list

The features at the top of the tree (closest to the root) are the ones the model considers most informative. From a red teaming perspective, this is a prioritised target list. Manipulating the root feature has the highest chance of flipping the classification, because it controls which major branch of the tree the input enters.

If the root node splits on “number of login failures in the last hour”, and a red teamer wants to evade an account-takeover detection model, they know exactly which behavioural signal to suppress first.

Overfitting leaks training data

A deep, unpruned decision tree memorises its training set. Each leaf node may correspond to a tiny cluster of training examples, sometimes just one. This memorisation makes the model vulnerable to membership inference attacks: given an input, you can determine whether it was in the training data by checking whether the tree classifies it with unusually high confidence (routing it to a pure leaf node).

In security-sensitive applications (medical data, financial records, user behaviour logs), this leakage has real consequences beyond model accuracy.
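A rough sketch of that membership signal follows. It assumes the attacker can see, or has reconstructed, the class counts stored at each leaf; the tree, the counts, and the `max_leaf_size` cut-off are hypothetical, and the output is a heuristic hint rather than proof of membership.

```python
def route(node, x):
    """Follow the splits until a leaf (a node holding class counts) is reached."""
    while "counts" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node

def membership_signal(tree, x, max_leaf_size=2):
    """Flag inputs that land in a tiny, perfectly pure leaf."""
    leaf = route(tree, x)
    total = sum(leaf["counts"].values())
    confidence = max(leaf["counts"].values()) / total
    return {
        "confidence": confidence,
        "leaf_size": total,
        "suspected_member": confidence == 1.0 and total <= max_leaf_size,
    }

# Hypothetical overfitted tree over (amount, account_age_days).
tree = {
    "feature": 0, "threshold": 5000,
    "left": {"counts": {"allow": 40, "flag": 3}},
    "right": {
        "feature": 1, "threshold": 12,
        "left": {"counts": {"flag": 1}},              # one memorised example
        "right": {"counts": {"allow": 5, "flag": 4}},
    },
}

print(membership_signal(tree, (7200, 9)))
print(membership_signal(tree, (100, 50)))
```

The first input lands in a leaf built around a single training example and is flagged; the second lands in a large mixed leaf and is not.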

The classic example, weaponised

The standard textbook demonstration uses weather data to predict a binary outcome (play or don't play) from four features: outlook, temperature, humidity, and wind. It is deliberately simple, and that simplicity is the point.

If the root node splits on Outlook and the “Overcast” branch leads directly to a “Yes” leaf, the model has learned a hard rule: overcast weather always results in play. An adversary who wants to force a “Yes” prediction only needs to set Outlook to Overcast, and then no other feature matters. The tree has revealed that its entire classification for one branch depends on a single input value.

Scale this to a production model with 50 features and 20 levels of depth, and the logic is the same. The tree is more complex, but every path through it is still a deterministic sequence of single-feature comparisons. The attacker’s job is to find the shortest path to the desired leaf node and manipulate the minimum number of features to traverse it.
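One way to sketch that search: enumerate every path that ends in the desired label, count how many features of the current input violate each path, and keep the cheapest. The tree layout, feature indices, and labels below are hypothetical, and the sketch treats every feature as equally easy to change, which real attackers rarely can.

```python
def paths_to(node, target, constraints=()):
    """Yield the constraint list of every path ending in a `target` leaf."""
    if "leaf" in node:
        if node["leaf"] == target:
            yield constraints
        return
    f, t = node["feature"], node["threshold"]
    yield from paths_to(node["left"], target, constraints + ((f, "<=", t),))
    yield from paths_to(node["right"], target, constraints + ((f, ">", t),))

def features_to_change(x, constraints):
    """Indices of features whose current value violates a path constraint."""
    changed = set()
    for f, op, t in constraints:
        satisfied = x[f] <= t if op == "<=" else x[f] > t
        if not satisfied:
            changed.add(f)
    return changed

def cheapest_evasion(tree, x, target):
    """Path to `target` needing the fewest feature changes (ValueError if none)."""
    candidates = [(features_to_change(x, c), c) for c in paths_to(tree, target)]
    return min(candidates, key=lambda item: len(item[0]))

# Hypothetical tree over (amount, account_age_days, failed_logins).
tree = {
    "feature": 0, "threshold": 5000,
    "left": {"feature": 2, "threshold": 3,
             "left": {"leaf": "allow"}, "right": {"leaf": "flag"}},
    "right": {"feature": 1, "threshold": 30,
              "left": {"leaf": "flag"}, "right": {"leaf": "allow"}},
}

x = (7200, 10, 5)  # currently routed to a "flag" leaf
changed, path = cheapest_evasion(tree, x, "allow")
print(changed, path)
```

For this input the cheapest route to an "allow" leaf changes only feature 1 (account age), even though the root splits on amount. The root feature decides the branch, but it is not always the cheapest one to manipulate.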

What decision trees get right

Decision trees have minimal assumptions about the data. They handle non-linear relationships, do not require feature normalisation, and are reasonably robust to outliers (since splits are based on ordering, not distance). For a security team building a first-pass detection model, these properties are genuinely useful.

The interpretability is also a defensive asset. Unlike a neural network that flags a transaction as fraudulent for reasons buried in a weight matrix, a decision tree gives you an audit trail: this transaction was flagged because the amount exceeded £10,000, the merchant category was “cryptocurrency exchange”, and the account was less than 30 days old. That audit trail is reviewable, explainable, and legally defensible.

The problem is that the same audit trail is available to anyone who can access the model. Interpretability is a double-edged property.

Defending the tree

If you are deploying decision trees or tree-based ensembles in a security-critical pipeline, three mitigations matter most:

Restrict model access. If the model is served via an API, rate-limit queries and monitor for systematic probing patterns (inputs that walk along feature thresholds are a strong signal of a model extraction attempt). Log prediction confidence scores separately from predictions and watch for clients that consistently query near decision boundaries.
Randomise thresholds where tolerance permits. If a split threshold sits at exactly £5,000 and the business logic allows a margin of £100 either side, add controlled noise to the threshold at inference time. This does not eliminate the vulnerability, but it forces an attacker to probe more aggressively to pin down the exact boundary.
Prune aggressively. Deep trees are more exploitable because they encode more specific (and more useful) information about the training data. Set maximum depth deliberately rather than relying on defaults. Use cross-validation to find the shallowest tree that maintains acceptable performance. Every layer you remove is information an attacker does not get.
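The threshold randomisation idea can be sketched in a few lines. This assumes a single split on amount with a hypothetical threshold of 5,000 and a tolerated margin of 100; uniform noise is one possible choice of distribution, not a recommendation.

```python
import random

def jittered_decision(amount, threshold=5000.0, margin=100.0, rng=random):
    """Compare against a threshold perturbed uniformly within +/- margin."""
    effective = threshold + rng.uniform(-margin, margin)
    return "flag" if amount > effective else "allow"

rng = random.Random(0)  # seeded only so the demonstration is repeatable

# Far from the boundary the answer never changes.
print({jittered_decision(5200, rng=rng) for _ in range(200)})
print({jittered_decision(4800, rng=rng) for _ in range(200)})

# Inside the margin, repeated queries disagree, which blurs the boundary.
near = [jittered_decision(5000, rng=rng) for _ in range(200)]
print(near.count("flag"), near.count("allow"))
```

Inputs outside the margin behave exactly as before, so the business rule is unchanged where it matters. Inside the margin an attacker sees inconsistent answers and has to average over many more queries, which is the extra probing the rate limits and monitoring are meant to catch.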

Generic advice like “implement defence in depth” does not help here. The threat is specific because the model encodes its logic as readable rules, and anyone with access can read them. The defence must address that specific exposure.

The real lesson

Decision trees are the clearest mainstream case where “understanding the model” and “knowing how to attack it” are the same activity. Reading the tree is the attack. Most other model types force an adversary to approximate the decision boundary through repeated queries, gradient estimation, or surrogate modelling. Decision trees skip that step entirely.

Red teamers studying AI fundamentals must internalise the fact that interpretability is not a security property; it is a transparency property. Whether that transparency serves you or your adversary depends entirely on who has access to the model.
