Matrix-Matrix Product for ML Classification

Summary

The matrix-matrix product is the operation behind many of the transformations inside machine learning models, from image classifiers to recommendation systems. You will learn how to multiply two matrices in NumPy, why their dimensions must align, and how the transpose fixes shape mismatches so you can score multiple data points against multiple categories in a single operation.

What is the matrix-matrix product and when can you use it?

Think of it as the natural extension of the matrix-vector product you saw before. Instead of predicting one outcome at a time, you process a batch of inputs against a batch of categories at once.
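To make the "extension" idea concrete, here is a small sketch using two arbitrary matrices `A` and `B` made up for illustration: each column of the product is just a matrix-vector product of `A` with the matching column of `B`, so one matrix-matrix product computes all of those matrix-vector products in a single call.

```python
import numpy as np

# Arbitrary example matrices (made up for illustration)
A = np.array([[1, 2], [3, 4], [5, 6]])      # shape (3, 2)
B = np.array([[7, 8, 9], [10, 11, 12]])     # shape (2, 3)

# One matrix-matrix product...
C = A.dot(B)

# ...equals one matrix-vector product per column of B, stacked side by side
columns = [A.dot(B[:, j]) for j in range(B.shape[1])]
C_stacked = np.column_stack(columns)

print(np.array_equal(C, C_stacked))  # True
```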

The rule is simple: the number of columns of matrix A must equal the number of rows of matrix B. If A has shape M by N and B has shape N by P, the resulting matrix C will have shape M by P. The outer dimensions define the result; the inner ones must match [0:48].
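The shape rule follows from how each entry is computed: entry (i, j) of C is the sum over k of A[i, k] * B[k, j], which only works if k runs over the same range in both matrices. The sketch below is a deliberately slow, loop-based version; `matmul_naive` is a hypothetical helper written for illustration, not a NumPy function, and in practice you would call `np.dot` or the `@` operator instead.

```python
import numpy as np

def matmul_naive(A, B):
    """Loop-based matrix product for illustration only (hypothetical helper, not part of NumPy)."""
    M, N = A.shape
    N_b, P = B.shape
    # The inner dimensions must match: columns of A == rows of B
    if N != N_b:
        raise ValueError(f"inner dimensions differ: {N} != {N_b}")
    C = np.zeros((M, P))
    for i in range(M):          # each row of A
        for j in range(P):      # each column of B
            # Entry (i, j) = sum over k of A[i, k] * B[k, j]
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(N))
    return C

rng = np.random.default_rng(0)
A = rng.random((4, 3))   # M = 4, N = 3
B = rng.random((3, 5))   # N = 3, P = 5

C = matmul_naive(A, B)
print(C.shape)                   # (4, 5): the outer dimensions
print(np.allclose(C, A.dot(B)))  # True
```

The explicit loops make the M by N by P structure visible; NumPy's built-in product gives the same numbers far faster.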

What is the matrix-matrix product? It is an operation that multiplies two matrices A and B when the columns of A equal the rows of B, producing a new matrix whose shape is defined by the outer dimensions.

How can you classify movies into genres with matrix multiplication?

Imagine you have three movies, each described by two numerical features: action level and comedy level. A trained model holds its knowledge inside a weights matrix, where each column represents the importance of those features for a genre like adventure, family, or romance [1:30].

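In Google Colab, start by importing NumPy and creating the features matrix and the weights matrix:

```python
import numpy as np

# Features matrix (3 x 2): one row per movie, columns = [action, comedy]
peliculas_features = np.array([[5, 2], [1, 5], [4, 4]])

# Weights matrix (2 x 3): one row per feature, columns = [adventure, family, romance]
pesos_generos = np.array([[1.1, 0.4, 0.1], [0.2, 0.5, 1.2]])
```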

The features matrix has shape 3 by 2 (three movies, two features). The weights matrix has shape 2 by 3 (two features, three genres). Since the inner dimensions match, multiplication is possible [3:50].

How do you read the resulting score matrix?

The operation peliculas_features.dot(pesos_generos) returns a 3 by 3 matrix. Each row corresponds to a movie and each column to a genre. For the first movie, the values are 5.9, 3.0, and 2.9 (the adventure score, for example, is 5 × 1.1 + 2 × 0.2 = 5.9), so the model is most confident that it belongs to adventure, which makes sense because that movie has a lot of action [5:20].
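One way to turn the score matrix into a single predicted genre per movie is to take the highest-scoring column in each row with `np.argmax(..., axis=1)`. This step is an illustrative addition, not part of the original walkthrough, and the `generos` list is an assumption matching the column order described above. The sketch also checks that the `@` operator gives the same result as `.dot` for 2-D arrays.

```python
import numpy as np

peliculas_features = np.array([[5, 2], [1, 5], [4, 4]])
pesos_generos = np.array([[1.1, 0.4, 0.1], [0.2, 0.5, 1.2]])
generos = ["adventure", "family", "romance"]  # assumed column order

# Score matrix: rows = movies, columns = genres
puntajes = peliculas_features.dot(pesos_generos)
print(puntajes.shape)  # (3, 3)

# For 2-D arrays, the @ operator computes the same product
print(np.allclose(puntajes, peliculas_features @ pesos_generos))  # True

# Index of the highest-scoring genre for each movie
mejor = np.argmax(puntajes, axis=1)
for i, g in enumerate(mejor):
    print(f"movie {i + 1}: {generos[g]}")
```

With these numbers, the first movie is labeled adventure and the second romance (6.1). The third movie scores 5.2 for both adventure and romance, so a single argmax label hides a tie; in a case like that, the full row of scores says more than the label.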

This is called a score matrix, or a matrix of confidence levels. In a single operation, you applied the model's weights to a batch of inputs and got an interpretable result for every movie at once.

Why does the transpose matter when multiplying two matrices?

Shape mismatches are one of the most frequent issues in machine learning code. The transpose, written as .T in NumPy, flips rows and columns so dimensions can align.
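A quick sketch of what `.T` does, using a small made-up matrix: the shape flips and the element at position (i, j) moves to (j, i). For NumPy arrays, `.T` returns a view rather than a copy, so transposing is cheap even for large matrices.

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])  # made-up 2 x 3 example
MT = M.T

print(M.shape, MT.shape)        # (2, 3) (3, 2)
print(M[0, 2], MT[2, 0])        # 3 3: element (i, j) moves to (j, i)
print(np.shares_memory(M, MT))  # True: .T is a view, not a copy
```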

Picture a recommendation problem: you have a matrix of three users and a matrix of two movies, both described by four features (action, comedy, drama, science fiction). The goal is a similarity matrix that tells you how much each user would like each movie [7:00].

```python
# Users (3 x 4) and movies (2 x 4); columns = [action, comedy, drama, science fiction]
usuarios = np.array([[0.9, 0.2, 0.1, 0.8], [0.1, 0.8, 0.7, 0.2], [0.8, 0.7, 0.1, 0.1]])

peliculas = np.array([[0.8, 0.1, 0.1, 0.9], [0.2, 0.9, 0.8, 0.1]])
```

The users matrix has shape 3 by 4 and the movies matrix has shape 2 by 4. The inner dimensions (4 and 2) do not match, so multiplying them directly raises a ValueError in NumPy [9:10].
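The failure can be reproduced with the same `usuarios` and `peliculas` arrays. This is a minimal sketch; the exact wording of NumPy's error message can vary between versions, so it only relies on the `ValueError` type, and the `fallo` flag is a hypothetical name used to record whether the error occurred.

```python
import numpy as np

usuarios = np.array([[0.9, 0.2, 0.1, 0.8],
                     [0.1, 0.8, 0.7, 0.2],
                     [0.8, 0.7, 0.1, 0.1]])   # 3 users x 4 features
peliculas = np.array([[0.8, 0.1, 0.1, 0.9],
                      [0.2, 0.9, 0.8, 0.1]])  # 2 movies x 4 features

print(usuarios.shape, peliculas.shape)  # (3, 4) (2, 4)

# Inner dimensions are 4 (columns of usuarios) and 2 (rows of peliculas)
try:
    usuarios.dot(peliculas)
    fallo = False
except ValueError as err:
    fallo = True
    print("ValueError:", err)
```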

How does the transpose fix the dimension problem?

Applying peliculas.T turns the 2 by 4 matrix into a 4 by 2 matrix. Now the columns of users (4) match the rows of the transposed movies (4), so the dot product works:

```python
# (3 x 4) . (4 x 2) -> (3 x 2): transposing movies makes the inner dimensions match
matriz_similitud = usuarios.dot(peliculas.T)
```

The result is a 3 by 2 matrix where each row is a user and each column is a movie. The value at position (0, 0) is 1.47, the highest in the matrix, because user one and movie one both score high on action and science fiction. User one paired with movie two scores only 0.52, consistent with their low overlap [10:30].
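To go from the similarity matrix to a recommendation, one simple option is to pick the highest-scoring movie in each row with `np.argmax(..., axis=1)`. This is an illustration under that assumption, not a step the original pipeline prescribes; `recomendacion` is a hypothetical variable name, and rounding is only for readable output.

```python
import numpy as np

usuarios = np.array([[0.9, 0.2, 0.1, 0.8],
                     [0.1, 0.8, 0.7, 0.2],
                     [0.8, 0.7, 0.1, 0.1]])
peliculas = np.array([[0.8, 0.1, 0.1, 0.9],
                      [0.2, 0.9, 0.8, 0.1]])

# (3 x 4) . (4 x 2) -> (3 x 2): rows = users, columns = movies
matriz_similitud = usuarios.dot(peliculas.T)
print(np.round(matriz_similitud, 2))

# Index of the best-matching movie for each user
recomendacion = np.argmax(matriz_similitud, axis=1)
for u, m in enumerate(recomendacion):
    print(f"user {u + 1} -> movie {m + 1}")
```

With these numbers, user one is matched to movie one (1.47 versus 0.52), user two to movie two (1.32 versus 0.41), and user three to movie two by a much narrower margin (0.88 versus 0.81).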

Why use the transpose in matrix multiplication? Because it flips rows and columns so the inner dimensions of two matrices align, letting you compute the dot product between datasets that were originally stored in incompatible shapes.

What does this teach you about how models actually work?

You simulated a full classification pipeline and a full recommendation pipeline using nothing but matrix multiplication and the transpose. The model's knowledge lives in a weights matrix, and a single dot product applies that knowledge to an entire batch of data, returning interpretable affinity scores.

A practical exercise to lock in the concept: create a data matrix A of shape 4 by 3 with any context you like, and a weights matrix B of shape 3 by 2. Compute A times B and share the shape of the resulting matrix in the comments.

Until now, those weights have been predefined. The next step is to discover how a model finds them on its own by reversing the transformation, and that is exactly what comes next.