Preprocessing the spam dataset
Every text cleaning step in a spam classifier either blocks an evasion path or opens one. See how preprocessing shapes what the model can and cannot see.
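The trade-off can be seen in a couple of toy cleaners. This is a minimal sketch, not the article's own pipeline. `clean_minimal` and `clean_aggressive` are hypothetical helpers invented for illustration, and the sample messages are made up. Aggressive cleaning folds fullwidth letters and punctuation-split words back into `free`, which closes those evasion paths. It also deletes every digit, so a phone number (often a useful spam cue in SMS) disappears from what the model can see.

```python
import re
import unicodedata


def clean_minimal(text: str) -> str:
    # Hypothetical cleaner: lowercase only, so "FREE" and "free" become one token,
    # but fullwidth letters and punctuation tricks pass through unchanged.
    return text.lower()


def clean_aggressive(text: str) -> str:
    # Hypothetical cleaner: NFKC folds compatibility characters (e.g. fullwidth
    # "ＦＲＥＥ") to plain ASCII, then lowercase, then drop everything except a-z
    # and whitespace. Digits and punctuation are removed along with the tricks.
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"[^a-z\s]", "", text)


def tokens(text: str) -> list:
    # Whitespace tokenisation, as a simple bag-of-words model might use.
    return text.split()


if __name__ == "__main__":
    msg = "F.R.E.E entry! Call 08712345678"
    print(tokens(clean_minimal(msg)))     # punctuation trick survives, number kept
    print(tokens(clean_aggressive(msg)))  # "free" recovered, number gone

    fullwidth = "\uff26\uff32\uff25\uff25"  # "ＦＲＥＥ"
    print(clean_minimal(fullwidth))       # still fullwidth, never matches "free"
    print(clean_aggressive(fullwidth))    # folded to "free"
```

Neither cleaner is "correct". Each one decides which variations collapse into a single feature and which signals are discarded, and an adversary only needs the ones that are discarded.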
Preparing the SMS Spam Collection dataset for Bayesian classification, covering download, extraction, loading, and cleaning through an adversarial lens.
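For the loading step, a minimal sketch follows. It assumes the extracted file uses the layout of the UCI `SMSSpamCollection` file: one message per line, a `ham` or `spam` label, a tab, then the raw text. `load_sms_lines` and `class_counts` are hypothetical helpers. The sketch splits only on the first tab and never applies CSV quote handling, because message text can contain stray quotes or tabs that a general CSV reader may misinterpret.

```python
from collections import Counter
from typing import Iterable, List, Tuple

VALID_LABELS = {"ham", "spam"}


def load_sms_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    # Parse "label<TAB>message" lines into (label, message) pairs.
    rows = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at EOF
        # partition splits on the FIRST tab only, so tabs inside the message survive
        label, sep, text = line.partition("\t")
        if not sep or label not in VALID_LABELS:
            # Fail loudly: a malformed row silently dropped or mislabelled
            # is exactly the kind of gap poisoning can hide in.
            raise ValueError(f"line {line_no}: unexpected format: {line[:40]!r}")
        rows.append((label, text))
    return rows


def class_counts(rows: List[Tuple[str, str]]) -> Counter:
    # Label distribution; spam/ham imbalance affects both priors and metrics.
    return Counter(label for label, _ in rows)


if __name__ == "__main__":
    # Illustrative lines only, not taken from the real dataset.
    sample = ["ham\tsee you at 5\n", "spam\tWIN a prize\tnow\n", "\n"]
    rows = load_sms_lines(sample)
    print(rows)
    print(class_counts(rows))
    # With the real file (adjust the encoding if decoding fails):
    # with open("SMSSpamCollection", encoding="utf-8") as f:
    #     rows = load_sms_lines(f)
```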