{"id":383,"date":"2026-04-14T00:00:00","date_gmt":"2026-04-13T23:00:00","guid":{"rendered":"https:\/\/kosokoking.com\/?p=383"},"modified":"2026-04-12T15:25:20","modified_gmt":"2026-04-12T14:25:20","slug":"the-maths-behind-the-models","status":"publish","type":"post","link":"https:\/\/kosokoking.com\/index.php\/multifarious\/the-maths-behind-the-models\/","title":{"rendered":"The maths behind the models"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Every anomaly detector, classifier, and language model you will encounter in this series runs on the same small set of mathematical operations. You do not need to derive them from scratch. You need to recognise them when they appear in a model&#8217;s documentation, loss function, or configuration, and understand what they are doing to your data.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This reference exists for that purpose. It is a companion to the broader AI for Security series on <a href=\"http:\/\/Kosokoking.com\" title=\"\">Kosokoking<\/a>. Bookmark it. When a later article drops a symbol you have not seen in years, come back <a href=\"https:\/\/kosokoking.com\/index.php\/category\/multifarious\/\" title=\"\">here<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Notation you will see in every paper<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before anything else, the notation. AI papers and model documentation reuse the same handful of conventions, and misreading one subscript can send you down the wrong path entirely.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Subscript notation (x_t)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">A subscript indexes a variable by position, time step, or category. 
In security AI work, you will encounter this constantly in sequential data:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>x_t = the value of x at time step t\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">When a network intrusion detection model processes packet sequences, each&nbsp;<code>x_t<\/code>&nbsp;is a feature vector for the t-th packet in the flow. The subscript tells you where you are in the sequence.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Superscript notation (x^n)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Superscripts denote exponents.&nbsp;<code>x^2<\/code>&nbsp;is x multiplied by itself. This appears everywhere in distance calculations, error functions, and polynomial feature engineering:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>x^2 = x * x\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">When your anomaly detection model computes squared error between predicted and observed traffic volume, that is superscript notation at work.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Summation (\u03a3)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The summation symbol tells you to add up a sequence of terms:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u03a3_{i=1}^{n} a_i = a_1 + a_2 + ... + a_n\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Loss functions are summations. When a malware classifier computes cross-entropy loss across a batch of 256 samples, it is summing the individual losses for each sample to produce one number the optimiser can act on.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Norms (||&#8230;||)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">A norm measures the size of a vector. The Euclidean norm (L2) is the most common:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>||v|| = sqrt(v_1^2 + v_2^2 + ... + v_n^2)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Two other norms appear frequently:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>||v||_1 = |v_1| + |v_2| + ... 
+ |v_n|       (L1 norm \/ Manhattan distance)\n||v||_\u221e = max(|v_1|, |v_2|, ..., |v_n|)     (L-infinity norm)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">In practice: when a UEBA (user and entity behaviour analytics) system flags an account because its activity vector is &#8220;far&#8221; from its historical baseline, it is computing a norm. L1 and L2 norms also appear in model regularisation, where they penalise large weights to prevent overfitting. L1 regularisation tends to zero out irrelevant features entirely, which is useful when you want a sparse model that tells you which log fields actually matter.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Logarithms and exponentials<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">These two function families underpin information theory, probability, and nearly every loss function you will encounter.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Logarithm base 2 (log2)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>log2(8) = 3    (because 2^3 = 8)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Log base 2 measures information in bits. Entropy, the core metric in information theory, is computed with log2. When a decision tree in a threat classification model chooses which feature to split on, it picks the feature that maximises information gain, measured in bits. If your IDS model reports an entropy value for DNS query distributions, it is using log2.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Natural logarithm (ln)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>ln(e^2) = 2\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The natural logarithm uses Euler&#8217;s number (e \u2248 2.718) as its base. Cross-entropy loss, the standard loss function for classification models, uses natural logarithms. 
When your phishing detection model outputs a probability of 0.95 that an email is malicious and the email really is malicious, the loss is&nbsp;<code>-ln(0.95)<\/code>, roughly 0.05. If the email is actually benign, the loss is&nbsp;<code>-ln(0.05)<\/code>, roughly 3.0. A wrong prediction with high confidence produces a large loss. That feedback is what forces the model to calibrate.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Exponential function (e^x)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>e^2 \u2248 7.389\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The exponential function is the inverse of the natural logarithm. It appears in the softmax function, which converts raw model outputs (logits) into probabilities. When your malware classifier outputs a vector of scores for [benign, trojan, ransomware, worm], softmax exponentiates each score and normalises them so they sum to 1. The exponential amplifies differences: logits of 5.0 and 4.0 become e^5 \u2248 148.4 and e^4 \u2248 54.6 before normalisation, so a gap of 1 turns into a ratio of about 2.7.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Exponential function, base 2 (2^x)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>2^3 = 8\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Base-2 exponentials appear in binary encoding, hash function analysis, and information-theoretic metrics. When assessing password strength or brute-force resistance, you express the search space as 2^n where n is the number of bits of entropy. A 128-bit key means 2^128 possible values. That number is why brute force does not work against properly implemented symmetric encryption.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Vectors, matrices, and the operations that run neural networks<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">If logarithms power the loss functions, linear algebra powers everything else. Every layer in a neural network is a matrix operation.
Understanding this section means understanding what the model is actually doing to your data at each step.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Matrix-vector multiplication (A * v)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>A * v = &#91;&#91;1, 2], &#91;3, 4]] * &#91;5, 6] = &#91;17, 39]\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This is the fundamental operation of a neural network layer. The matrix A contains the learned weights. The vector v is your input (or the output of the previous layer). The multiplication transforms v into a new representation. When a network-based IDS processes a feature vector representing a single network flow, the first layer multiplies that vector by its weight matrix. The result is a new vector that encodes learned patterns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Matrix-matrix multiplication (A * B)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>A * B = &#91;&#91;1, 2], &#91;3, 4]] * &#91;&#91;5, 6], &#91;7, 8]] = &#91;&#91;19, 22], &#91;43, 50]]\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Batched operations. Instead of processing one input vector at a time, models process matrices where each row is a separate input. When your SIEM&#8217;s ML pipeline ingests a batch of 512 log entries simultaneously, it is performing matrix-matrix multiplication: the weight matrix multiplied by the input batch matrix.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Transpose (A^T)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>A = &#91;&#91;1, 2], &#91;3, 4]]\nA^T = &#91;&#91;1, 3], &#91;2, 4]]\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Transposition swaps rows and columns. It appears in attention mechanisms (the backbone of transformer models), dot product calculations, and data reshaping. When a transformer-based log analysis model computes self-attention, it transposes the key matrix before multiplying it with the query matrix. 
That transpose is what allows the model to compute similarity scores between every pair of positions in the input sequence.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Inverse (A^{-1})<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>A = &#91;&#91;1, 2], &#91;3, 4]]\nA^{-1} = &#91;&#91;-2, 1], &#91;1.5, -0.5]]\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The inverse of a matrix A is the matrix that, when multiplied by A, produces the identity matrix. In security analytics, matrix inversion appears in Mahalanobis distance calculations, which measure how far a data point is from a distribution while accounting for correlations between features. If your anomaly detection model uses Mahalanobis distance to flag unusual authentication patterns, it is inverting the covariance matrix of normal behaviour.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Determinant (det(A))<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>A = &#91;&#91;1, 2], &#91;3, 4]]\ndet(A) = 1*4 - 2*3 = -2\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The determinant is a scalar value, defined only for square matrices, that tells you whether a matrix is invertible (non-zero determinant) and how the matrix scales space. A determinant of zero means the matrix collapses at least one dimension of information, which signals that its rows or columns are linearly dependent. If you are building a feature set for a threat classifier and the covariance matrix of your features has a near-zero determinant, some of your features are close to redundant. Remove or combine them before training.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Trace (tr(A))<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>A = &#91;&#91;1, 2], &#91;3, 4]]\ntr(A) = 1 + 4 = 5\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The trace is the sum of the diagonal elements. It equals the sum of the eigenvalues and appears in matrix decomposition methods and some regularisation techniques.
In covariance analysis for anomaly detection, the trace of the covariance matrix gives you the total variance across all features, a quick measure of how spread out normal behaviour is.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Eigenvalues and eigenvectors<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">These concepts feel abstract until you see what they do in practice. They decompose a transformation into its fundamental directions and magnitudes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Eigenvalue (\u03bb) and eigenvector (v)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>A * v = \u03bb * v\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">An eigenvector of a matrix A is a vector whose direction does not change when A is applied to it. It only gets scaled by the eigenvalue \u03bb. In security AI, the primary application is principal component analysis (PCA). When you have 200 features extracted from network flow data, PCA uses eigenvalues and eigenvectors of the covariance matrix to identify which directions in that 200-dimensional space capture the most variance. You keep the top eigenvectors (the ones with the largest eigenvalues) and discard the rest. The result is a lower-dimensional representation that retains the signal and drops the noise.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is how some IDS systems reduce a massive feature space into something a model can process efficiently without losing the patterns that distinguish normal traffic from attack traffic.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Probability and statistics<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Machine learning models are probabilistic. They do not output certainty. They output distributions, likelihoods, and confidence intervals. 
Understanding this section is understanding what your model&#8217;s output actually means.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Conditional probability (P(x | y))<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>P(Output | Input)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The probability of x given that y is true. Every classification model outputs a conditional probability.&nbsp;<code>P(malicious | features)<\/code>&nbsp;is what your phishing detector computes: the probability that an email is malicious given the observed features (sender reputation, URL structure, header anomalies, language patterns). Bayesian spam filters were among the earliest security applications of conditional probability, and the principle has not changed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Expectation (E[X])<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>E&#91;X] = \u03a3 x_i * P(x_i)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The expected value is the probability-weighted average of all possible outcomes. In reinforcement learning for automated penetration testing, the agent selects actions that maximise expected reward. In risk quantification, expected loss is the sum, across all threat scenarios, of each scenario&#8217;s probability multiplied by its impact.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Variance (Var(X)) and standard deviation (\u03c3)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>Var(X) = E&#91;(X - E&#91;X])^2]\n\u03c3(X) = sqrt(Var(X))\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Variance measures how spread out a distribution is. Standard deviation is the square root of variance and is expressed in the same units as the data, which makes it more interpretable. In anomaly detection, these are baseline metrics. If the mean number of failed login attempts per hour is 12 with a standard deviation of 3, an hour with 45 failures is (45 - 12) \/ 3 = 11 standard deviations from normal. Your model should flag that. If it does not, the problem is not the maths.
It is the threshold.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Covariance (Cov(X, Y))<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>Cov(X, Y) = E&#91;(X - E&#91;X])(Y - E&#91;Y])]\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Covariance measures how two variables move together. Positive covariance means they increase together. Negative means one increases as the other decreases. In security analytics, understanding covariance between features helps you spot redundancy and correlation. If bytes_sent and bytes_received are highly covariant in your training data, a model may over-weight that relationship and miss attacks that break the pattern.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Correlation (\u03c1)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>\u03c1(X, Y) = Cov(X, Y) \/ (\u03c3(X) * \u03c3(Y))\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Correlation normalises covariance to a range of -1 to 1, making it comparable across features with different scales. A correlation of 0.98 between two features means they carry nearly identical information. In feature engineering for security models, correlation analysis is how you prune redundant inputs before they waste model capacity.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Sets and classification logic<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Set theory maps directly onto how detection systems categorise events.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cardinality (|S|)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>S = {1, 2, 3, 4, 5}\n|S| = 5\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The number of elements in a set. 
In security: the cardinality of your IoC (indicator of compromise) set, the number of unique source IPs in an alert cluster, the size of a user&#8217;s typical application set for behavioural baselining.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Union (\u222a)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>A = {1, 2, 3}, B = {3, 4, 5}\nA \u222a B = {1, 2, 3, 4, 5}\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">All elements in either set. When correlating alerts from two detection engines, the union of their alert sets gives you the total coverage. If engine A flags {event1, event2, event3} and engine B flags {event3, event4, event5}, the union is five distinct events.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Intersection (\u2229)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>A = {1, 2, 3}, B = {3, 4, 5}\nA \u2229 B = {3}\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Elements common to both sets. The intersection of two detection engines&#8217; alert sets tells you where they agree. High intersection means redundancy. Low intersection means complementary coverage. Both are useful to know.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Complement (A^c)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>U = {1, 2, 3, 4, 5}, A = {1, 2, 3}\nA^c = {4, 5}\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Everything not in A. In detection terms: if A is the set of known-good processes on a host, A^c relative to all observed processes is your set of unknowns. Allowlisting is complement logic applied to endpoint security.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Functions you will see in model architectures<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">max and min<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>max(4, 7, 2) = 7\nmin(4, 7, 2) = 2\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The ReLU activation function, used in most modern neural networks, is defined as&nbsp;<code>max(0, x)<\/code>. 
If the input is negative, the output is zero. If positive, it passes through unchanged. This simple operation is what gives neural networks their non-linearity. Without it (or a function like it), stacking layers would be pointless because any number of linear transformations collapse into a single linear transformation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The min function appears in clipping operations, learning rate schedules, and threshold logic.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Function notation (f(x))<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>f(x) = x^2 + 2x + 1\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">A function maps an input to an output. Every neural network is a function, a composition of many smaller functions (layers).&nbsp;<code>f(x) = model(input_features)<\/code>&nbsp;is the abstraction that unifies everything in this reference. The maths above describes what happens inside that function.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Reciprocal (1\/x)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>1\/5 = 0.2\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Division by a value. Appears in learning rate calculations, normalisation (dividing by the number of samples or the norm of a vector), and attention score scaling. In the transformer attention mechanism, raw dot product scores are divided by&nbsp;<code>sqrt(d_k)<\/code>&nbsp;(the square root of the key dimension) to prevent the scores from growing too large. 
That division is a reciprocal operation, and without it, the softmax would saturate and gradients would vanish.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Comparison operators<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">These are straightforward but appear constantly in threshold-based detection logic and conditional model behaviour.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Operator<\/th><th>Meaning<\/th><th>Security example<\/th><\/tr><\/thead><tbody><tr><td><code>&gt;=<\/code><\/td><td>Greater than or equal to<\/td><td>Alert if&nbsp;<code>risk_score &gt;= 0.85<\/code><\/td><\/tr><tr><td><code>&lt;=<\/code><\/td><td>Less than or equal to<\/td><td>Suppress if&nbsp;<code>confidence &lt;= 0.3<\/code><\/td><\/tr><tr><td><code>==<\/code><\/td><td>Equal to<\/td><td>Match if&nbsp;<code>protocol == \"DNS\"<\/code><\/td><\/tr><tr><td><code>!=<\/code><\/td><td>Not equal to<\/td><td>Flag if&nbsp;<code>expected_hash != observed_hash<\/code><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Every detection rule you write uses these. Every model threshold is a comparison operator applied to a probability or score.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Where this connects<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">None of these operations exist in isolation. A single forward pass through a malware classification model chains them together: matrix multiplications transform input features, ReLU (max) introduces non-linearity, softmax (exponentials and reciprocals) produces probabilities, cross-entropy loss (logarithms and summation) measures error, and the gradient (derivatives, not covered here, but coming in a later article) tells the optimiser which weights to adjust.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Understanding each piece means you can read a model&#8217;s architecture and know what it is doing to your data at every step. 
That is the difference between configuring a tool and understanding it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A working reference for the maths behind AI security tools. Covers linear algebra, probability, and information theory, grounded in real detection use cases.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[628,626,131,51,627,136,625,137,629,390],"class_list":["post-383","post","type-post","status-publish","format-standard","hentry","category-multifarious","tag-ai-for-security-series","tag-anomaly-detection","tag-artificial-intelligence","tag-cybersecurity","tag-linear-algebra","tag-machine-learning","tag-mathematics","tag-neural-networks","tag-probability","tag-threat-detection-2"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts\/383","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/comments?post=383"}],"version-history":[{"count":2,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts\/383\/revisions"}],"predecessor-version":[{"id":385,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/posts\/383\/revisions\/385"}],"wp:attachment":[{"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/media?parent=383"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/categories?post=383"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kosokoking.com\/index.php\/wp-json\/wp\/v2\/tags?post=383"}],
"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}