Malware classification

A PE binary converted to a grayscale image retains enough structural fingerprint to classify its malware family with over 99% accuracy. The classifier never executes the binary, never parses its import table, never observes its runtime behaviour. It reads byte values as pixel intensities, finds visual patterns that correlate with family membership, and assigns a label. For an attacker, this means the model’s entire classification logic is anchored to byte-level texture, and byte-level texture is something you can manipulate without changing what the binary does when it runs.

Earlier in this series, we trained and evaluated a random forest classifier on network traffic data and examined how confusion matrix outputs map the exact class-level weaknesses an adversary can exploit. Over the upcoming sections, we will train a convolutional neural network to classify malware samples based on their visual representation as grayscale images, using the approach described by Bensaoud, Abudawaood, and Kalita in their 2020 paper on CNN-based malware image classification. Before we write any code, we need to understand why this approach exists, how it works, and where it breaks.

Why malware gets grouped into families

Malware does not exist as a collection of unique, unrelated binaries. Samples share code, reuse infrastructure, borrow techniques, and evolve from common ancestors. When analysts group related samples, the grouping is called a malware family. Malpedia, maintained by Fraunhofer FKIE, catalogues these family relationships alongside unpacked reference samples, YARA rules, and attribution metadata. Families like Emotet and WannaCry are well-known examples, but the database tracks hundreds of families across Windows, Linux, Android, and macOS.

The classification problem is straightforward in principle: given an unknown binary, determine which family it belongs to. In practice, that classification has traditionally required a combination of static analysis (disassembling the binary, examining strings, inspecting the import address table) and dynamic analysis (executing the sample in a sandbox and observing its behaviour). Both approaches are time-consuming. Static analysis requires reverse engineering expertise and breaks down against obfuscation. Dynamic analysis requires sandboxing infrastructure and can be evaded by malware that detects virtualised environments.

This is why ML-based classification is attractive. If a model can learn the structural patterns that distinguish one family from another, it can classify new samples in milliseconds rather than hours. The features used for classification vary across approaches. Some models operate on raw byte sequences, others on API call traces, others on import table metadata. The approach we will implement uses none of these. It classifies malware based on what the binary looks like when you render it as an image.

Turning binaries into images

The idea was first proposed by Nataraj et al. in 2011 at the International Symposium on Visualization for Cyber Security (VizSec). The process is mechanical. Read the binary as a raw byte stream, where each byte is an integer between 0 and 255. Arrange those bytes into a two-dimensional array whose width is fixed per file (chosen from the file size), and let the height vary with the file’s length. The result is a grayscale image where 0 maps to black and 255 maps to white.
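The conversion described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the width table is the one commonly attributed to the Nataraj et al. paper and should be treated as an assumption to check against the source, the function names are ours, and the input is a synthetic byte string rather than a real binary.

```python
import numpy as np

# (upper bound on file size in kB, image width in pixels).
# ASSUMPTION: breakpoints as commonly quoted from Nataraj et al. (2011);
# verify against the paper before relying on them.
WIDTH_TABLE = [
    (10, 32), (30, 64), (60, 128), (100, 256),
    (200, 384), (500, 512), (1000, 768),
]


def width_for_size(num_bytes: int) -> int:
    """Pick an image width from the file size; anything larger gets 1024."""
    size_kb = num_bytes / 1024
    for upper_kb, width in WIDTH_TABLE:
        if size_kb < upper_kb:
            return width
    return 1024


def bytes_to_image(data: bytes) -> np.ndarray:
    """Lay the byte stream out row by row as a 2-D uint8 array."""
    width = width_for_size(len(data))
    height = -(-len(data) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)  # zero-pad the last row
    padded[: len(data)] = np.frombuffer(data, dtype=np.uint8)
    return padded.reshape(height, width)


# Synthetic stand-in for a binary: 5,120 bytes counting 0..255 repeatedly.
sample = bytes(range(256)) * 20
img = bytes_to_image(sample)
print(img.shape)  # (160, 32)
```

Zero-padding the final row is one choice among several; truncating the incomplete row is equally defensible, as long as training and inference use the same rule.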

This might sound like an arbitrary transformation, but it works because PE binaries have internal structure that survives the conversion. The .text section (executable code) produces a visually distinct texture from the .data section (initialised data), which looks different again from the .rsrc section (resources like icons and strings). Malware families that share code produce images with visually similar textures in their code sections. Families that embed similar payloads produce similar patterns in their data sections. A CNN does not need to understand what these sections are. It learns that samples labelled “Alueron” share a visual fingerprint that samples labelled “Fakerean” do not.

Nataraj et al. achieved 97.18% accuracy on their original dataset using GIST features (a compact representation of the image’s spatial frequency structure) and a k-nearest neighbour classifier. The 2020 Bensaoud et al. paper replaced the GIST-KNN pipeline with deep CNNs and pushed accuracy to 99.24% using Inception V3. The improvement came from the CNN’s ability to learn hierarchical features directly from the pixel data rather than relying on a fixed, hand-designed feature extractor.
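To make the shape of the original pipeline concrete, here is a rough sketch of "fixed descriptor plus nearest neighbours". It deliberately swaps GIST for a much cruder block-mean descriptor (GIST is built from oriented filter-bank responses, which we do not reproduce here), and it runs on synthetic images with two made-up "families". Every name in it is hypothetical.

```python
import numpy as np


def block_mean_descriptor(img: np.ndarray, grid: int = 4) -> np.ndarray:
    """Average intensity over a grid x grid layout of blocks (NOT real GIST)."""
    h, w = img.shape
    img = img[: h - h % grid, : w - w % grid].astype(np.float64)
    bh, bw = img.shape[0] // grid, img.shape[1] // grid
    return img.reshape(grid, bh, grid, bw).mean(axis=(1, 3)).ravel()


def knn_predict(train_x: np.ndarray, train_y: np.ndarray,
                query: np.ndarray, k: int = 3) -> int:
    """Majority vote among the k training descriptors closest to the query."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    return int(np.bincount(nearest).argmax())


rng = np.random.default_rng(3)


def synthetic_sample(family: int) -> np.ndarray:
    """Family 0 has a dark top half, family 1 a dark bottom half."""
    img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    if family == 0:
        img[:32] //= 8
    else:
        img[32:] //= 8
    return img


labels = np.array([0] * 10 + [1] * 10)
train = np.stack([block_mean_descriptor(synthetic_sample(f)) for f in labels])

pred_0 = knn_predict(train, labels, block_mean_descriptor(synthetic_sample(0)))
pred_1 = knn_predict(train, labels, block_mean_descriptor(synthetic_sample(1)))
```

The descriptor is fixed before any data is seen, which is the limitation the CNN removes: a convolutional model learns its own filters from the labelled images instead.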

The practical advantage for a learning environment is significant. We never need to handle live malware binaries. The dataset we will use in the next entry, Malimg, contains 9,339 samples already converted to PNG images across 25 malware families. We are training on images, not executables, which means there is zero risk of accidental infection during experimentation.

What the classification features actually capture

The features a CNN learns from malware images are worth examining, because they determine exactly what the classifier can and cannot distinguish. The first convolutional layers learn low-level textures: the granularity of byte patterns in code sections, the uniformity of zero-padded regions, the noise-like appearance of encrypted or compressed data. Deeper layers combine these textures into spatial arrangements that correspond to the binary’s section layout and the structural patterns specific to each family’s build process, compiler choices, and embedded resources.
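One way to build intuition for those low-level textures is to measure them by hand. The sketch below computes Shannon entropy over three synthetic byte regions standing in for zero padding, encrypted or compressed data, and code. A CNN does not compute entropy explicitly, and the "code-like" region is only a small repeated alphabet of byte values chosen for illustration, so read this as an analogy for what early filters can separate rather than a description of what they learn.

```python
import numpy as np


def block_entropy(block: np.ndarray) -> float:
    """Shannon entropy in bits per byte: 0 is uniform fill, 8 is maximal noise."""
    counts = np.bincount(block.ravel(), minlength=256)
    probs = counts[counts > 0] / block.size
    return float(-(probs * np.log2(probs)).sum())


rng = np.random.default_rng(0)

# Three synthetic regions of 4,096 bytes each (none taken from a real binary).
zero_padding = np.zeros(4096, dtype=np.uint8)
noise_like = rng.integers(0, 256, size=4096, dtype=np.uint8)
code_alphabet = np.array([0x8B, 0x45, 0xE8, 0xFF, 0x00, 0x55], dtype=np.uint8)
code_like = rng.choice(code_alphabet, size=4096)

for name, region in [("zero padding", zero_padding),
                     ("noise-like", noise_like),
                     ("code-like", code_like)]:
    print(f"{name:>12}: {block_entropy(region):.2f} bits/byte")
```

The three regions land far apart on this single statistic, which is why even shallow filters can tell them apart in the rendered image.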

This is a genuinely useful signal for classification. Malware authors within the same family typically compile with the same toolchain, embed resources in the same order, and structure their payloads consistently across variants. Those compilation and packaging habits produce visual signatures that are remarkably stable across samples within a family, even when the functional code changes between variants. The Bensaoud et al. paper found that six different CNN architectures all achieved above 96% accuracy on the same dataset, which suggests the visual signal is strong enough that the choice of model matters less than the quality of the representation.

But those features also reveal the classifier’s fundamental assumption, and its fundamental vulnerability. The model is learning byte-level statistical texture, not functional behaviour. It cannot distinguish between a byte pattern that executes a keylogger and a byte pattern that renders an icon, except by their visual appearance in the grayscale image. If two different operations happen to produce similar pixel intensities at similar spatial positions, the model treats them as equivalent.

What an attacker sees in this architecture

The image-based classification approach has a specific vulnerability that traditional static and dynamic analysis do not share. The model never examines what the binary does. It examines what the binary looks like as a byte stream rendered as pixels. This means the classification is based on statistical texture, not on functional behaviour, and the gap between those two things is where evasion lives.

The ATMPA attack exploits this gap directly by applying FGSM and Carlini-Wagner perturbations to the grayscale image representation. The resulting adversarial image evades the classifier, but converting the perturbed image back into a functional binary is non-trivial, because arbitrary pixel changes do not map cleanly to valid x86 instructions. This constraint is specific to the image-based approach: perturbations in pixel space must also be valid in binary space, which significantly narrows the attacker’s manipulation budget compared to adversarial attacks in other image classification domains.
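The FGSM step itself is short enough to write out. The sketch below is not the ATMPA implementation: it uses a random linear softmax model as a stand-in for a trained CNN, a random vector as a stand-in for a malware image, and hypothetical names throughout. What it does show faithfully is the update rule, which nudges every pixel by epsilon in the direction of the sign of the loss gradient.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def cross_entropy(x, label, W, b) -> float:
    return float(-np.log(softmax(W @ x + b)[label]))


def fgsm(x, label, W, b, eps):
    """x_adv = clip(x + eps * sign(dLoss/dx)) for a linear softmax model."""
    p = softmax(W @ x + b)
    y = np.zeros_like(p)
    y[label] = 1.0
    grad_x = W.T @ (p - y)  # gradient of cross-entropy w.r.t. the input
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)


rng = np.random.default_rng(0)
n_pixels, n_classes = 64 * 64, 25

# ASSUMPTION: random weights stand in for a trained classifier.
W = rng.normal(scale=0.05, size=(n_classes, n_pixels))
b = np.zeros(n_classes)

x = rng.integers(0, 256, size=n_pixels) / 255.0  # synthetic "image" in [0, 1]
label = int(np.argmax(W @ x + b))                # current prediction as the label
eps = 8 / 255
x_adv = fgsm(x, label, W, b, eps)

# Count how many bytes would differ if we mapped the pixels back to 0..255.
changed = np.count_nonzero(np.round(x_adv * 255) != np.round(x * 255))
print(f"{changed} of {n_pixels} byte positions modified")
```

The count at the end is the important part. A perturbation that touches nearly every pixel touches nearly every byte, including headers and instructions, which is why mapping the result back to a working executable is the hard part of this attack.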

More practical evasion techniques target the byte regions that the CNN attends to most heavily. Appending benign-looking byte sequences to sections the classifier weights highly (typically the .text section boundary regions and the PE header) can shift the visual fingerprint toward a different family classification without affecting execution. Koch and Begoli’s 2024 survey of adversarial binary instrumentation methods found that appending 10,000 gradient-optimised bytes achieved a 60% evasion rate against CNN-based detectors, compared to less than 20% for random byte appending. The gap between random and gradient-optimised appending indicates that these classifiers have learnable, exploitable decision boundaries, which is exactly what the CNN entry in this series predicted about convolutional architectures in general.
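The mechanism behind byte appending is easy to see on synthetic data. The sketch below involves no gradient optimisation and no real binary: it appends plain zero bytes to a random byte string, renders both versions at a hypothetical fixed width of 64, and then resizes both to the same height the way a fixed-input model would require. The helper names and the nearest-neighbour row sampling are our assumptions.

```python
import numpy as np

WIDTH = 64  # hypothetical fixed width, chosen only for this illustration


def to_rows(data: bytes, width: int = WIDTH) -> np.ndarray:
    """Render bytes as rows of `width` pixels, zero-padding the last row."""
    height = -(-len(data) // width)
    buf = np.zeros(height * width, dtype=np.uint8)
    buf[: len(data)] = np.frombuffer(data, dtype=np.uint8)
    return buf.reshape(height, width)


def resize_rows(img: np.ndarray, target_height: int) -> np.ndarray:
    """Nearest-neighbour row sampling down to a fixed model input height."""
    idx = np.arange(target_height) * img.shape[0] // target_height
    return img[idx]


rng = np.random.default_rng(1)
original = rng.integers(0, 256, size=64 * 64, dtype=np.uint8).tobytes()
appended = original + bytes(64 * 64)  # same bytes plus 4,096 zero bytes

img_a, img_b = to_rows(original), to_rows(appended)

# The original content is untouched: the first 64 rows are identical.
assert np.array_equal(img_a, img_b[:64])

# After resizing to a common 32-row input, most pixels no longer line up.
small_a, small_b = resize_rows(img_a, 32), resize_rows(img_b, 32)
frac_changed = float(np.mean(small_a != small_b))
print(f"{frac_changed:.0%} of model-input pixels differ")
```

None of the original bytes change, yet the model input changes almost everywhere, because the appended rows compress the original content into a smaller share of the resized image. A gradient-optimised attack chooses the appended values to steer that shift; this sketch only shows that the shift exists.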

The packing problem

The dataset we will use in the upcoming entries contains unpacked malware. In production, most malware arrives packed, meaning the actual payload is compressed or encrypted and only unpacked at runtime. A packed binary’s grayscale image looks nothing like its unpacked version, because the byte-level texture is dominated by the packer’s wrapper rather than the malware’s actual code. UPX-packed samples from different families can look nearly identical as images, because UPX’s decompression stub produces the same visual signature regardless of what it contains.
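The effect is simple to demonstrate with a stand-in. In the sketch below, `zlib` compression plays the role of a packer, which is a loud assumption: it is not UPX, it adds no decompression stub, and it produces no runnable output. The two "families" are synthetic byte strings with deliberately disjoint byte distributions.

```python
import zlib

import numpy as np


def byte_histogram(data: bytes) -> np.ndarray:
    """Normalised frequency of each of the 256 byte values."""
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    return counts / counts.sum()


def l1_distance(a: bytes, b: bytes) -> float:
    """Sum of absolute histogram differences: 0 is identical, 2 is disjoint."""
    return float(np.abs(byte_histogram(a) - byte_histogram(b)).sum())


def entropy(data: bytes) -> float:
    p = byte_histogram(data)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(2)

# Two synthetic "families" drawing from non-overlapping byte alphabets.
alphabet_a = np.array([0x00, 0x8B, 0xE8, 0xFF], dtype=np.uint8)
alphabet_b = np.arange(0x41, 0x5B, dtype=np.uint8)
family_a = rng.choice(alphabet_a, size=200_000).tobytes()
family_b = rng.choice(alphabet_b, size=200_000).tobytes()

# ASSUMPTION: zlib stands in for a packer's compression stage.
packed_a = zlib.compress(family_a, 9)
packed_b = zlib.compress(family_b, 9)

dist_before = l1_distance(family_a, family_b)
dist_after = l1_distance(packed_a, packed_b)
print(f"histogram distance before packing: {dist_before:.2f}")
print(f"histogram distance after packing:  {dist_after:.2f}")
print(f"entropy after packing: {entropy(packed_a):.2f} / {entropy(packed_b):.2f}")
```

Before compression the two byte distributions are as far apart as they can be. Afterwards both collapse toward the same high-entropy, near-uniform texture, which is the statistical version of two packed samples looking alike as images.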

This is a fundamental limitation of the image-based approach as a standalone classifier. It works well on curated, unpacked datasets. It degrades sharply when confronted with packed binaries, because the visual features the model learned during training no longer correspond to the features present at inference time. Recent work by Alkhateeb et al. (2025) explored using grayscale byte-plot representations specifically to detect whether a binary is packed, treating the packing detection itself as a classification problem rather than attempting to classify the underlying family through the packing layer.

For our purposes, this limitation is acceptable. We are building a classifier to understand the technique and its adversarial properties, not deploying it as a production detection system. But knowing where the approach fails in the real world is part of the red teaming value. A classifier that only works on unpacked samples is a classifier that any packer defeats, and packing is the oldest evasion technique in the malware author’s toolkit.
