Determinant and Rank to Diagnose Your Matrix

Summary

Before trusting the weights of a linear regression model, you need to know if the matrix behind it is healthy. Learning to diagnose a matrix with determinant and rank lets you anticipate failures in the normal equation and protect your code from broken inversions, especially when working with real datasets in NumPy.

Why does the normal equation sometimes fail?

In the previous class we found the weights for a linear regression model and everything worked. But the real question is why it worked. The answer depends on matrix A, the X.T @ X matrix that the normal equation has to invert, and on whether that inversion is actually possible.

To evaluate that, you only need two numbers you already know from the fundamentals of linear algebra: the determinant and the rank. Together they tell you if your system has a unique solution or if it is about to collapse.

What is the determinant of a matrix? It is a number that describes how a matrix scales space. In the normal equation, it works as a quick invertibility test: if it is different from zero, the matrix is invertible.
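As a minimal sketch of the scaling idea, the two 2x2 matrices below are invented for illustration and do not come from the class. One stretches the plane and has a nonzero determinant; the other flattens the plane onto a line and has a determinant of zero.

```python
import numpy as np

# Hypothetical matrices, chosen only to illustrate the idea.
S = np.array([[2.0, 0.0],
              [0.0, 3.0]])   # stretches x by 2 and y by 3
C = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row is twice the first

det_S = np.linalg.det(S)     # area scaling factor: 2 * 3
det_C = np.linalg.det(C)     # 1*4 - 2*2: the plane collapses onto a line

print(det_S, det_C)
```

Any unit square transformed by S becomes a rectangle six times larger, while C squeezes every shape to zero area, which is why it cannot be undone with an inverse.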

What do determinant and rank tell you about your data?

The determinant gives you a fast yes-or-no answer. If it is nonzero, your matrix is invertible and the system has a unique solution. If it is zero, that is a red flag: the inverse method will fail and your model will not return reliable weights.

The rank goes deeper. It tells you how many truly independent features live inside your data. If you have three columns but the rank is two, one of those columns is redundant and adds no new information.

The link between both is direct:

If features are redundant, the determinant becomes zero.
If the determinant is zero, the rank drops below the number of columns.
If the rank equals the number of columns, your matrix is healthy and invertible.
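A small sketch of this chain, assuming a made-up 4x3 matrix whose third column is the sum of the first two. The matrix and the names X_demo and G_demo are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data: column 3 = column 1 + column 2, so it is redundant.
X_demo = np.array([[1.0, 2.0, 3.0],
                   [1.0, 3.0, 4.0],
                   [1.0, 5.0, 6.0],
                   [1.0, 7.0, 8.0]])

G_demo = X_demo.T @ X_demo                # 3x3 matrix that would be inverted
rank_X = np.linalg.matrix_rank(X_demo)    # number of independent columns
det_G = np.linalg.det(G_demo)             # zero, up to rounding error

print("columns:", X_demo.shape[1], "rank:", rank_X, "det:", det_G)
```

Three columns with a rank of two is the redundancy signal, and the determinant of the matrix built from them vanishes accordingly.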

How do I calculate determinant and rank in NumPy?

The diagnosis starts with a healthy version of your data. In Colab, you rename the matrix from the previous class as A_saludable to mark it as the healthy baseline, and then you run two functions from NumPy.

Diagnosing a healthy matrix

For the determinant you use np.linalg.det, and for the rank you use np.linalg.matrix_rank. Both receive the matrix directly:

```python
# np and A_saludable come from the previous class's notebook
det_saludable = np.linalg.det(A_saludable)
print(det_saludable)

rango_saludable = np.linalg.matrix_rank(A_saludable)
print(rango_saludable)
```

The output shows a determinant clearly different from zero and a rank of three. That makes sense: the matrix has three columns, a column of ones added through X_bias plus the two original features, and all of them carry independent information.
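To try this diagnosis without the previous notebook, here is a self-contained sketch. The four houses in X are invented stand-in values, not the class dataset, and X_bias is assumed to be built with np.c_ in the same way the lesson builds the bias version of X.

```python
import numpy as np

# Hypothetical stand-in data: 4 houses, columns = size (tens of m^2), rooms.
X = np.array([[5.0, 1.0],
              [8.0, 2.0],
              [12.0, 3.0],
              [15.0, 3.0]])

X_bias = np.c_[np.ones((X.shape[0], 1)), X]   # prepend the column of ones
A_saludable = X_bias.T @ X_bias               # 3x3 Gram matrix

det_saludable = np.linalg.det(A_saludable)
rango_saludable = np.linalg.matrix_rank(A_saludable)

print(det_saludable, rango_saludable)
```

With these stand-in values the determinant is clearly nonzero and the rank matches the three columns, which is the healthy pattern described above.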

Breaking the matrix on purpose

Now you can make the dataset "sick" (enfermo), meaning you intentionally add a redundant column to see how the diagnosis changes. The trick is to create a copy of the rooms column multiplied by two, then concatenate it with np.c_:

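```python
# Redundant feature: the rooms column (index 1 of X) multiplied by two
habitaciones_doble = X[:, 1] * 2
X_enfermo = np.c_[X, habitaciones_doble]
# Prepend the bias column of ones (the example dataset has 4 rows)
X_enfermo_bias = np.c_[np.ones((4, 1)), X_enfermo]
```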

With four columns now in play, you rebuild matrix A using the Gram matrix structure: the transpose of X_enfermo_bias multiplied by itself.

```python
A_enfermo = X_enfermo_bias.T @ X_enfermo_bias
```

When you compute determinant and rank again, the determinant returns zero (or, because of floating-point rounding, a value negligibly close to zero) and the rank stays at three even though the matrix has four columns. That gap is the smoking gun: one column is redundant, the space collapses, and the inverse cannot exist.
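The full "sick" diagnosis can be sketched end to end as follows. The data in X is an invented stand-in for the class dataset, and the names n_columnas and es_singular are hypothetical additions used only to make the comparison explicit.

```python
import numpy as np

# Hypothetical stand-in data: 4 houses, columns = size (tens of m^2), rooms.
X = np.array([[5.0, 1.0],
              [8.0, 2.0],
              [12.0, 3.0],
              [15.0, 3.0]])

habitaciones_doble = X[:, 1] * 2                      # redundant feature
X_enfermo = np.c_[X, habitaciones_doble]
X_enfermo_bias = np.c_[np.ones((X.shape[0], 1)), X_enfermo]
A_enfermo = X_enfermo_bias.T @ X_enfermo_bias         # 4x4 Gram matrix

det_enfermo = np.linalg.det(A_enfermo)
rango_enfermo = np.linalg.matrix_rank(A_enfermo)
n_columnas = A_enfermo.shape[1]

# Compare rank with column count instead of testing det == 0 exactly.
es_singular = rango_enfermo < n_columnas

print(det_enfermo, rango_enfermo, n_columnas, es_singular)
```

Checking the rank against the number of columns tends to be a sturdier test than comparing the determinant with an exact zero, since rounding can leave a tiny nonzero value behind.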

Why does the determinant become zero when I duplicate a column? Because the duplicated column points in the same direction as the original, so it contributes no new dimension. The matrix loses a dimension, the space collapses, and the determinant becomes zero.
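One way to see the collapse directly, sketched with invented stand-in data: if the fourth column is twice the third, the combination 2 * (column 3) - 1 * (column 4) is a vector of zeros. The vector v below encodes that combination; it is a hypothetical name introduced for this illustration.

```python
import numpy as np

# Hypothetical stand-in data: columns = size (tens of m^2), rooms.
X = np.array([[5.0, 1.0],
              [8.0, 2.0],
              [12.0, 3.0],
              [15.0, 3.0]])

# Columns of the sick matrix: [ones, size, rooms, 2 * rooms]
X_enfermo_bias = np.c_[np.ones((X.shape[0], 1)), X, X[:, 1] * 2]
A_enfermo = X_enfermo_bias.T @ X_enfermo_bias

# Nonzero weights that cancel the redundant pair: 2 * rooms - 1 * (2 * rooms)
v = np.array([0.0, 0.0, 2.0, -1.0])

print(X_enfermo_bias @ v)   # the data matrix sends v to the zero vector
print(A_enfermo @ v)        # and therefore so does A_enfermo
```

A matrix that sends a nonzero vector to zero cannot be inverted, because no inverse could recover v from a vector of zeros. That is the same fact the zero determinant reports.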

What does it mean when rank is lower than the number of features?

A mismatch between the number of columns and the rank is your earliest warning of trouble. It tells you that even if your dataset looks rich, some of its features are mathematically saying the same thing.

Think about a practical case: you have a matrix X with shape 100 by 5, meaning 100 houses with 5 features, and the rank returns 4. That single number is telling you that one of those five features is a linear combination of the others, so it does not add real information to the model.
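The 100 by 5 case can be reproduced with synthetic data. This is only a sketch: the random features, the seed, and the weights used to build the fifth column are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed for reproducibility
base = rng.normal(size=(100, 4))          # 4 hypothetical independent features

# Fifth feature: a linear combination of the first two (arbitrary weights).
fifth = 3.0 * base[:, 0] - 0.5 * base[:, 1]
X_houses = np.c_[base, fifth]             # shape (100, 5)

rank_houses = np.linalg.matrix_rank(X_houses)
print(X_houses.shape, rank_houses)
```

The matrix has five columns, yet the rank reports four, because the fifth column carries nothing that the first two do not already contain.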

This is exactly the kind of diagnosis you want to run before training, because it saves you from chasing bugs later when your weights explode or your inverse silently breaks.
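A pre-training check along these lines could look like the sketch below. The helper diagnose_matrix is hypothetical, not something shown in the class, and the data is an invented stand-in.

```python
import numpy as np

def diagnose_matrix(X_bias):
    """Hypothetical helper: compare the rank with the number of columns."""
    n_columns = X_bias.shape[1]
    rank = int(np.linalg.matrix_rank(X_bias))
    return {"columns": n_columns, "rank": rank, "healthy": rank == n_columns}

# Invented stand-in data: bias column, size (tens of m^2), rooms.
X_ok = np.array([[1.0, 5.0, 1.0],
                 [1.0, 8.0, 2.0],
                 [1.0, 12.0, 3.0],
                 [1.0, 15.0, 3.0]])
X_bad = np.c_[X_ok, X_ok[:, 2] * 2]       # add a redundant copy of rooms

report_ok = diagnose_matrix(X_ok)
report_bad = diagnose_matrix(X_bad)
print(report_ok)
print(report_bad)
```

Running a check like this before solving the normal equation lets a notebook stop early with a clear message instead of producing unreliable weights.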

Key skills and concepts from the class

These are the tools you should leave with, ready to apply in your own notebooks:

Determinant as a quick invertibility test for matrix A in the normal equation [00:38].
Rank as a measure of independent features inside your data [01:10].
np.linalg.det to compute the determinant directly in NumPy [02:03].
np.linalg.matrix_rank to obtain the rank of any matrix [02:20].
np.c_ to concatenate columns and build the bias version of X [03:18].
Gram matrix built as X.T @ X to feed the normal equation [04:05].
Redundant column as the trigger that forces the determinant to zero and lowers the rank [04:40].

The next step is to give this problem its formal names: singularity and multicollinearity. If you already ran the exercise with the 100 by 5 matrix, drop your interpretation in the comments and compare it with what others found.

Gabriel Obregón

Student

📊 Linear regression diagnosis

🔢 Determinant and 📐 Rank

🎯 OBJECTIVE

👉 Learn to diagnose whether a linear regression model will work before the calculations fail, using:

✔ Determinant

✔ Rank

🛑 To detect in time:

· Singularity

· Multicollinearity

· Redundant features

💡 CENTRAL IDEA

🧠 The model works only if the matrix of the normal equation is invertible.

🔍 The determinant and the rank let you verify this directly.

🔢 TOOL 1: DETERMINANT

❓ What does it indicate?

📏 It shows how the matrix "scales" space

⚡ It works as a quick invertibility test

✅ Interpretation

✔ Determinant ≠ 0 → invertible matrix → unique solution

❌ Determinant = 0 → non-invertible matrix → the inverse method fails

🧠 Key use: detect singularity before training the model.

📐 TOOL 2: RANK

❓ What does it measure?

🔎 How many truly independent features there are in the data

✅ Interpretation

✔ Rank = number of columns → every feature contributes information

⚠ Rank < number of columns → at least one feature is redundant

🧩 Important idea: a redundant feature = a linear combination of others.

🔗 CONNECTION BETWEEN DETERMINANT AND RANK

🔁 Everything is related:

· Redundant features ⬇

· The rank drops ⬇

· The determinant becomes zero

📌 That is why these usually appear together:

· Multicollinearity

· Singularity

· Determinant = 0

· Reduced rank

🧪 DIAGNOSIS WITH NUMPY

🟢 "Healthy" case

📂 Situation

· Matrix A built from X

· ➕ Column of ones (bias)

· Total: 3 independent columns

🎯 Expected result

✔ Determinant ≠ 0

✔ Rank = 3

🛠 Functions used

· np.linalg.det → determinant

· np.linalg.matrix_rank → rank

✅ Conclusion: the matrix is invertible and the model is valid.

🦠 "SICK" CASE

Redundant column

🔧 What is done

· A feature is duplicated (e.g., rooms)

· It is added to the dataset

· Matrix A is rebuilt

⚠ What happens

· The new column depends on another one

· The data space "collapses"

🚨 Diagnosis

❌ Determinant = 0

⚠ Rank < number of columns

📛 Interpretation:

· The matrix is not invertible

· One feature contributes no information

· Multicollinearity appears
