{"id":475,"date":"2026-06-06T00:00:00","date_gmt":"2026-06-05T23:00:00","guid":{"rendered":"https:\/\/kosokoking.com\/?p=475"},"modified":"2026-05-25T20:57:34","modified_gmt":"2026-05-25T19:57:34","slug":"training-and-evaluating-your-first-spam-classifier","status":"publish","type":"post","link":"https:\/\/kosokoking.com\/index.php\/technology\/training-and-evaluating-your-first-spam-classifier\/","title":{"rendered":"Training and evaluating your first spam classifier"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">A Multinomial Naive Bayes classifier trained on a few thousand SMS messages can hit an F1-score that looks production-ready in under a minute. That speed is precisely what makes it a useful teaching model and precisely what makes it a dangerous one to trust without scrutiny. Spam filtering was one of the first domains where adversarial machine learning was <a href=\"https:\/\/blog.jgc.org\/2023\/07\/how-to-beat-adaptivebayesian-spam.html\" title=\"\">studied<\/a> seriously, with researchers at the 2004 MIT Spam Conference demonstrating that one ML filter could learn to defeat another by automatically selecting which words to inject into a message. If you are going to build a classifier, you need to understand how it learns, how it breaks, and how an attacker sees the gap between the two.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This entry picks up from our earlier work on data preprocessing and feature extraction. We have clean, stemmed text and a CountVectorizer ready to produce numerical features. Now we train, tune, evaluate, and persist a working spam detection model, and we examine what the results actually tell us about robustness.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Building the pipeline<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Scikit-learn&#8217;s&nbsp;<code>Pipeline<\/code>&nbsp;object chains the vectorisation and classification steps into a single callable unit. The convenience matters less than the consistency. 
The pipeline ensures that the same transformation is applied identically during training and inference, which eliminates a common source of silent bugs where the vectoriser is fitted on different data or with different parameters at prediction time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from sklearn.model_selection import train_test_split, GridSearchCV\nfrom sklearn.naive_bayes import MultinomialNB\nfrom sklearn.pipeline import Pipeline\n\n# Build the pipeline by combining vectorisation and classification\npipeline = Pipeline(&#91;\n    (\"vectorizer\", vectorizer),\n    (\"classifier\", MultinomialNB())\n])\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The pipeline takes two named steps. The first is the&nbsp;<code>CountVectorizer<\/code>&nbsp;instance we built in the previous entry, which converts preprocessed text into a sparse matrix of token counts. The second is a&nbsp;<code>MultinomialNB<\/code>&nbsp;classifier, which applies Bayes&#8217; theorem under the assumption that features (word counts) follow a multinomial distribution and are conditionally independent given the class. That independence assumption is a simplification, because words in natural language are not statistically independent, but in practice it works surprisingly well for text classification: the classifier only needs the correct class to score highest, and that ordering often survives even when the independence assumption is violated.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Hyperparameter tuning with GridSearchCV<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A classifier with default parameters is a baseline, not a finished model.&nbsp;<code>GridSearchCV<\/code>&nbsp;automates the search for better configurations by evaluating every combination of specified parameter values with k-fold cross-validation, then selecting the combination that scores highest on a chosen metric.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For&nbsp;<code>MultinomialNB<\/code>, the parameter worth tuning is&nbsp;<code>alpha<\/code>, the additive (Laplace) smoothing factor. 
This value controls what happens when the classifier encounters a word during prediction that appeared in one class but not the other during training. Without smoothing (<code>alpha=0<\/code>), that word would produce a zero probability, which would dominate the entire classification regardless of every other word in the message. Smoothing adds a small pseudo-count to every feature, preventing any single absent word from collapsing the prediction. Too much smoothing, however, flattens the probability distributions and washes out the signal that distinguishes spam from ham.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Define the parameter grid for hyperparameter tuning\nparam_grid = {\n    \"classifier__alpha\": &#91;0.01, 0.1, 0.15, 0.2, 0.25, 0.5, 0.75, 1.0]\n}\n\n# Perform the grid search with 5-fold cross-validation and F1-score as metric\ngrid_search = GridSearchCV(\n    pipeline,\n    param_grid,\n    cv=5,\n    scoring=\"f1\"\n)\n\n# Fit the grid search on the full dataset\ngrid_search.fit(df&#91;\"message\"], y)\n\n# Extract the best model identified by the grid search\nbest_model = grid_search.best_estimator_\nprint(\"Best model parameters:\", grid_search.best_params_)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">We score on F1 rather than accuracy for a specific reason. In a typical SMS dataset, spam accounts for roughly 13 percent of messages. A classifier that labels everything as ham would achieve 87 percent accuracy while catching zero spam. F1 balances precision (of the messages flagged as spam, how many actually were) and recall (of all the actual spam, how much did the classifier catch), which makes it a far more honest metric for imbalanced classes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The five-fold cross-validation means the dataset is split into five parts, and the model is trained on four parts and tested on the remaining one, rotating through all five. 
This gives a more reliable estimate of performance than a single train-test split, which can be skewed by an unlucky partition.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Evaluating on unseen messages<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The confusion matrix from our tuned model tells a clean story: 889 true negatives, 140 true positives, 5 false positives, and 0 false negatives.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img decoding=\"async\" width=\"642\" height=\"552\" src=\"https:\/\/kosokoking.com\/wp-content\/uploads\/2026\/05\/image.png\" alt=\"Spam Classifier Evaluation\" class=\"wp-image-476\" style=\"width:827px;height:auto\" srcset=\"https:\/\/kosokoking.com\/wp-content\/uploads\/2026\/05\/image.png 642w, https:\/\/kosokoking.com\/wp-content\/uploads\/2026\/05\/image-300x258.png 300w\" sizes=\"(max-width: 642px) 100vw, 642px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Those numbers look excellent, and that is exactly the moment to be suspicious. Zero false negatives means the classifier caught every spam message in the test set, but the test set was drawn from the same distribution as the training data. Real-world spam does not hold still. Spammers adapt their language, inject legitimate-sounding words, and mutate their templates specifically to evade classifiers like this one.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To get a better sense of how the model behaves on messages it has never seen, we can feed it a small batch of hand-crafted examples that reflect the kinds of messages a deployed classifier would encounter.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Example SMS messages for evaluation\nnew_messages = &#91;\n    \"Congratulations! You've won a $1000 Walmart gift card. Go to http:\/\/bit.ly\/1234 to claim now.\",\n    \"Hey, are we still meeting up for lunch today?\",\n    \"Urgent! Your account has been compromised. 
Verify your details here: www.fakebank.com\/verify\",\n    \"Reminder: Your appointment is scheduled for tomorrow at 10am.\",\n    \"FREE entry in a weekly competition to win an iPad. Just text WIN to 80085 now!\",\n]\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Before these messages can enter the model, they must pass through the same preprocessing pipeline used during training. This is a non-negotiable requirement. If the training data was lowercased, stripped of characters other than letters, whitespace, dollar signs, and exclamation marks, tokenised, stop-word filtered, and stemmed, then the evaluation data must be too. A mismatch here does not produce an error; it produces silently wrong predictions, which is worse.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import re\n\nfrom nltk.tokenize import word_tokenize  # assumes NLTK, as in the preprocessing entry\n\n# stop_words and stemmer are the objects built in the preprocessing entry\n\ndef preprocess_message(message):\n    # Lowercase, then keep only letters, whitespace, $ and !\n    message = message.lower()\n    message = re.sub(r\"&#91;^a-z\\s$!]\", \"\", message)\n    # Tokenise, drop stop words, and stem the remaining tokens\n    tokens = word_tokenize(message)\n    tokens = &#91;word for word in tokens if word not in stop_words]\n    tokens = &#91;stemmer.stem(word) for word in tokens]\n    return \" \".join(tokens)\n\n# Preprocess the evaluation messages\nprocessed_messages = &#91;preprocess_message(msg) for msg in new_messages]\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">With the messages preprocessed, we extract the vectoriser and classifier from the pipeline separately. 
This is useful when you want to inspect intermediate representations or debug prediction behaviour, because it lets you see exactly what the vectoriser produced before the classifier made its decision.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Transform preprocessed messages into feature vectors\nX_new = best_model.named_steps&#91;\"vectorizer\"].transform(processed_messages)\n\n# Predict with the trained classifier\npredictions = best_model.named_steps&#91;\"classifier\"].predict(X_new)\nprediction_probabilities = best_model.named_steps&#91;\"classifier\"].predict_proba(X_new)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The output gives us both a binary label and the probability estimates behind it, which is where things get interesting from a red team perspective.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for i, msg in enumerate(new_messages):\n    prediction = \"Spam\" if predictions&#91;i] == 1 else \"Not-Spam\"\n    spam_probability = prediction_probabilities&#91;i]&#91;1]\n    ham_probability = prediction_probabilities&#91;i]&#91;0]\n\n    print(f\"Message: {msg}\")\n    print(f\"Prediction: {prediction}\")\n    print(f\"Spam Probability: {spam_probability:.2f}\")\n    print(f\"Not-Spam Probability: {ham_probability:.2f}\")\n    print(\"-\" * 50)\n<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>Message: Congratulations! You've won a $1000 Walmart gift card. Go to http:\/\/bit.ly\/1234 to claim now.\nPrediction: Spam\nSpam Probability: 1.00\nNot-Spam Probability: 0.00\n--------------------------------------------------\nMessage: Hey, are we still meeting up for lunch today?\nPrediction: Not-Spam\nSpam Probability: 0.00\nNot-Spam Probability: 1.00\n--------------------------------------------------\nMessage: Urgent! Your account has been compromised. 
Verify your details here: www.fakebank.com\/verify\nPrediction: Spam\nSpam Probability: 0.94\nNot-Spam Probability: 0.06\n--------------------------------------------------\nMessage: Reminder: Your appointment is scheduled for tomorrow at 10am.\nPrediction: Not-Spam\nSpam Probability: 0.00\nNot-Spam Probability: 1.00\n--------------------------------------------------\nMessage: FREE entry in a weekly competition to win an iPad. Just text WIN to 80085 now!\nPrediction: Spam\nSpam Probability: 1.00\nNot-Spam Probability: 0.00\n--------------------------------------------------\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The classifier correctly identifies all five messages, and the probability estimates show high confidence. But notice the third message, the phishing attempt, sits at 0.94 rather than 1.00. That six percent uncertainty is worth paying attention to. The message uses language (&#8220;account&#8221;, &#8220;verify&#8221;, &#8220;details&#8221;) that legitimately appears in ham messages from banks and service providers, which pulls the probability toward ham even though the overall message is clearly malicious. 
An attacker who understood this could add more legitimate-sounding words to push that probability below the classification threshold.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Saving the model with joblib<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Once a model is trained, tuned, and evaluated, retraining it from scratch every time the application starts is wasteful.&nbsp;<code>joblib<\/code>&nbsp;serialises the entire pipeline, including the fitted vectoriser&#8217;s vocabulary and the classifier&#8217;s learned probability tables, into a binary file that can be reloaded instantly.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import joblib\n\n# Save the trained model to a file for future use\nmodel_filename = 'spam_detection_model.joblib'\njoblib.dump(best_model, model_filename)\nprint(f\"Model saved to {model_filename}\")\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Serialisation here means converting the in-memory Python objects into a binary format that preserves their complete state. 
When&nbsp;<code>joblib<\/code>&nbsp;saves a scikit-learn pipeline, it captures the vectoriser&#8217;s learned vocabulary mapping, the classifier&#8217;s conditional probability tables, the smoothing parameter, and every other fitted attribute.&nbsp;<code>joblib<\/code>&nbsp;is preferred over Python&#8217;s built-in&nbsp;<code>pickle<\/code>&nbsp;for scikit-learn models because it handles large NumPy arrays more efficiently, with optional compression to reduce file size and memory mapping to speed up loading.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Loading the model back is a single call, and the restored object is immediately ready to predict.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Load the saved model\nloaded_model = joblib.load(model_filename)\n\n# Preprocess new messages before prediction\nnew_data_processed = &#91;preprocess_message(msg) for msg in new_messages]\n\n# Make predictions on the preprocessed data\npredictions = loaded_model.predict(new_data_processed)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">There is one important caveat. The serialised model captures the vectoriser and classifier, but it does not capture the preprocessing function. If you change&nbsp;<code>preprocess_message<\/code>&nbsp;after saving the model (perhaps by adding a new stop word or switching from Porter to Snowball stemming), the loaded model will still expect text in the old format, and it will not raise an error when it receives something else. In production, the preprocessing logic either needs to be versioned alongside the model or wrapped into a custom scikit-learn transformer and included in the pipeline itself.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What the red team sees<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This classifier works. It catches obvious spam with high confidence, correctly passes legitimate messages, and even handles the ambiguous phishing case with reasonable probability estimates. 
But from an adversarial perspective, the model has several properties worth noting.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Multinomial Naive Bayes is a linear classifier. Its decision boundary is a hyperplane in the feature space, which means an attacker who can estimate which words carry high ham probability can craft messages that cross that boundary by injecting a handful of carefully chosen tokens. This is not theoretical. Good-word attacks of exactly this kind were demonstrated against statistical spam filters in the mid-2000s, and a related poisoning attack on SpamBayes showed that an attacker who controls just one percent of the training messages can render the filter useless.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The&nbsp;<code>predict_proba<\/code>&nbsp;output we examined earlier is itself a signal, and an attacker does not need to see it directly to benefit. If an attacker can query the model (and in many deployed systems, the spam\/not-spam decision is visible to the sender through delivery receipts or bounce messages), they can probe the boundary iteratively, adjusting their message until it slips through.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The model also has no concept of message structure, sender reputation, or temporal patterns. It sees a bag of words and nothing else. A message that says &#8220;free win prize claim now&#8221; and a message that embeds those same words inside an otherwise legitimate paragraph produce different feature vectors, but the classifier&#8217;s only defence is the relative word counts.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These are not reasons to discard Naive Bayes. 
They are reasons to understand what it is actually doing, so that when we start examining adversarial attacks in later entries, we have a concrete model to attack rather than an abstract concept to theorise about.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The spam classifier you just built is your first target.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Build, tune, and evaluate a Naive Bayes spam classifier with scikit-learn, then examine what the model reveals to an adversary in this AI red teaming entry.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[668,630,51,136,745,650,726,738,761,760],"class_list":["post-475","post","type-post","status-publish","format-standard","hentry","category-technology","tag-adversarial-machine-learning","tag-ai-red-teaming","tag-cybersecurity","tag-machine-learning","tag-model-evaluation","tag-naive-bayes","tag-python","tag-scikit-learn","tag-spam-detection","tag-text-classification"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts\/475","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/comments?post=475"}],"version-history":[{"count":1,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts\/475\/revisions"}],"predecessor-version":[{"id":477,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts\/475\/revisions\/477"}],"wp:attachment":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/media?parent=475"}],"wp:term":[{"taxonomy":"cat
egory","embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/categories?post=475"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/tags?post=475"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}