Determinant and Rank to Diagnose Your Matrix

Summary

Before trusting the weights of a linear regression model, you need to know if the matrix behind it is healthy. Learning to diagnose a matrix with determinant and rank lets you anticipate failures in the normal equation and protect your code from broken inversions, especially when working with real datasets in NumPy.

Why does the normal equation sometimes fail?

In the previous class we found the weights for a linear regression model and everything worked. The real question is why it worked. The answer lives inside matrix A, the one the normal equation asks us to invert (the transpose of X multiplied by X), and in whether that inversion is actually possible.

To evaluate that, you only need two numbers you already know from the fundamentals of linear algebra: the determinant and the rank. Together they tell you if your system has a unique solution or if it is about to collapse.

What is the determinant of a matrix? It is a number that describes how much a square matrix scales space: areas in two dimensions, volumes in three. In the normal equation, it works as a quick invertibility test: the matrix is invertible exactly when its determinant is different from zero.
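A minimal sketch of the scaling idea, using two small hypothetical 2 by 2 matrices that are not part of the class dataset. The first one stretches the plane, so its determinant should come out close to 6. The second one has a column that is twice the other, so it flattens the plane onto a line and its determinant should come out close to 0.

```python
import numpy as np

# Stretches the x axis by 2 and the y axis by 3,
# so every area is multiplied by 2 * 3 = 6.
stretch = np.array([[2.0, 0.0],
                    [0.0, 3.0]])
det_stretch = np.linalg.det(stretch)

# The second column is exactly twice the first one,
# so the whole plane is squashed onto a single line.
flat = np.array([[1.0, 2.0],
                 [2.0, 4.0]])
det_flat = np.linalg.det(flat)

print(det_stretch)
print(det_flat)

# Compare with a tolerance instead of ==, because np.linalg.det
# works in floating point and may return a tiny nonzero value.
print(np.isclose(det_flat, 0.0))
```

Comparing with np.isclose rather than an exact equality is a defensive choice here, since floating-point results are not guaranteed to be exactly zero.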

What do determinant and rank tell you about your data?

The determinant gives you a fast yes-or-no answer. If it is nonzero, your matrix is invertible and the system has a unique solution. If it is zero, that is a red flag: the matrix has no inverse, so the inverse method fails and your model cannot return reliable weights.

The rank goes deeper. It tells you how many truly independent features live inside your data. If you have three columns but the rank is two, one of those columns is redundant and adds no new information.

The link between both is direct:

  • If features are redundant, the determinant becomes zero.
  • If the determinant is zero, the rank drops below the number of columns.
  • If the rank equals the number of columns, your matrix is healthy and invertible.

How do I calculate determinant and rank in NumPy?

The diagnosis starts with a healthy version of your data. In Colab, you rename the matrix from the previous class as A_saludable to mark it as the healthy baseline, and then you run two functions from NumPy.

Diagnosing a healthy matrix

For the determinant you use np.linalg.det, and for the rank you use np.linalg.matrix_rank. Both receive the matrix directly:

```python
det_saludable = np.linalg.det(A_saludable)
print(det_saludable)

rango_saludable = np.linalg.matrix_rank(A_saludable)
print(rango_saludable)
```

The output shows a determinant clearly different from zero and a rank of three. That makes sense: the matrix has three columns, one of ones added through X_bias plus the two original features, and all of them carry independent information.
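To try the healthy diagnosis without the notebook from the previous class, here is a self-contained sketch. The four houses and their values are hypothetical stand-ins, not the class dataset. The variable names follow the ones used in the class. With this data the script should print a positive determinant and a rank of 3.

```python
import numpy as np

# Hypothetical data: 4 houses, columns = [size in hundreds of m2, rooms].
X = np.array([[0.5, 1.0],
              [0.8, 3.0],
              [1.2, 2.0],
              [2.0, 4.0]])

# Prepend the column of ones that represents the bias term.
X_bias = np.c_[np.ones((X.shape[0], 1)), X]

# Matrix A of the normal equation: transpose of X_bias times X_bias.
A_saludable = X_bias.T @ X_bias

det_saludable = np.linalg.det(A_saludable)
rango_saludable = np.linalg.matrix_rank(A_saludable)

print(det_saludable)    # clearly different from zero
print(rango_saludable)  # equals the number of columns of A_saludable
```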

Breaking the matrix on purpose

Now you can make the dataset sick (enfermo), meaning you intentionally add a redundant column to see how the diagnosis changes. The trick is to create a copy of the rooms column multiplied by two, then append it with np.c_:

```python
habitaciones_doble = X[:, 1] * 2
X_enfermo = np.c_[X, habitaciones_doble]
X_enfermo_bias = np.c_[np.ones((4, 1)), X_enfermo]
```

With four columns now in play, you rebuild matrix A using the Gram matrix structure: the transpose of X_enfermo_bias multiplied by itself.

```python
A_enfermo = X_enfermo_bias.T @ X_enfermo_bias
```

When you compute determinant and rank again, the determinant comes back as zero (or, because of floating-point rounding, a value negligibly close to zero) and the rank stays at three even though the matrix has four columns. That gap is the smoking gun: one column is redundant, the space collapses, and the inverse cannot exist.

Why does the determinant become zero when I duplicate a column? Because the duplicated column points in the same direction as the original. The matrix loses a dimension, the space collapses, and the determinant drops to zero.
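The broken-matrix experiment can be sketched end to end as follows. The house data is hypothetical, not the class dataset. The sketch also assumes you want a tolerance-based check with np.isclose, since the determinant of a singular matrix may print as a tiny nonzero number instead of an exact 0.

```python
import numpy as np

# Hypothetical data: 4 houses, columns = [size in hundreds of m2, rooms].
X = np.array([[0.5, 1.0],
              [0.8, 3.0],
              [1.2, 2.0],
              [2.0, 4.0]])

# Redundant feature: the rooms column multiplied by two.
habitaciones_doble = X[:, 1] * 2
X_enfermo = np.c_[X, habitaciones_doble]

# Add the bias column; the row count is read from the data.
X_enfermo_bias = np.c_[np.ones((X_enfermo.shape[0], 1)), X_enfermo]

# Rebuild matrix A, now 4 by 4.
A_enfermo = X_enfermo_bias.T @ X_enfermo_bias

det_enfermo = np.linalg.det(A_enfermo)
rango_enfermo = np.linalg.matrix_rank(A_enfermo)
columnas = A_enfermo.shape[1]

# Treat the determinant as zero if it is within a small tolerance.
det_es_cero = np.isclose(det_enfermo, 0.0)

print(det_enfermo, det_es_cero)
print(rango_enfermo, "of", columnas, "columns")
```

The rank is usually the more dependable signal of the two, because np.linalg.matrix_rank already applies its own tolerance to the singular values.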

What does it mean when rank is lower than the number of features?

A mismatch between the number of columns and the rank is your earliest warning of trouble. It tells you that even if your dataset looks rich, some of its features are mathematically saying the same thing.

Think about a practical case: you have a matrix X with shape 100 by 5, meaning 100 houses with 5 features, and the rank returns 4. That single number is telling you that one of those five features is a linear combination of the others, so it does not add real information to the model.
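A quick way to reproduce that 100 by 5 scenario is to generate random data and build the fifth feature from two of the others. Everything in this sketch is hypothetical: the random features, the seed, and the coefficients 3 and -2 are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 houses with 4 independent random features.
independent = rng.normal(size=(100, 4))

# Fifth feature: a linear combination of the first two features,
# so it carries no new information.
combined = 3.0 * independent[:, 0] - 2.0 * independent[:, 1]

X = np.c_[independent, combined]

rango = np.linalg.matrix_rank(X)

print(X.shape)  # 100 rows, 5 columns
print(rango)    # one less than the number of columns
```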

This is exactly the kind of diagnosis you want to run before training, because it saves you from chasing bugs later when your weights explode or your inverse silently breaks.
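One possible way to turn the diagnosis into a guard is sketched below. The function name solve_normal_equation and the sample data are hypothetical, not part of the class. The sketch also swaps the explicit inverse for np.linalg.solve, which solves the same system without forming the inverse.

```python
import numpy as np

def solve_normal_equation(X_bias, y):
    """Return regression weights, refusing to proceed if A is rank deficient."""
    A = X_bias.T @ X_bias
    n_cols = A.shape[1]
    rank = np.linalg.matrix_rank(A)
    if rank < n_cols:
        raise ValueError(
            f"rank {rank} < {n_cols} columns: redundant features, A is not invertible"
        )
    # Solve A w = X^T y directly instead of computing the inverse of A.
    return np.linalg.solve(A, X_bias.T @ y)

# Hypothetical data: 4 houses, columns = [size in hundreds of m2, rooms].
X = np.array([[0.5, 1.0],
              [0.8, 3.0],
              [1.2, 2.0],
              [2.0, 4.0]])
X_bias = np.c_[np.ones((X.shape[0], 1)), X]

# Hypothetical prices generated from known weights, so the result can be checked.
true_weights = np.array([10.0, 20.0, 5.0])
y = X_bias @ true_weights

weights = solve_normal_equation(X_bias, y)
print(weights)

# Same data plus a redundant column: the guard should stop the computation.
X_sick_bias = np.c_[X_bias, X[:, 1] * 2]
try:
    solve_normal_equation(X_sick_bias, y)
    guard_triggered = False
except ValueError as error:
    guard_triggered = True
    print(error)
```

Checking the rank rather than the determinant is a design choice in this sketch: the rank comparison does not depend on the scale of the data, while a determinant can be very small or very large for reasons unrelated to redundancy.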

Key skills and concepts from the class

These are the tools you should leave with, ready to apply in your own notebooks:

  • Determinant as a quick invertibility test for matrix A in the normal equation [00:38].
  • Rank as a measure of independent features inside your data [01:10].
  • np.linalg.det to compute the determinant directly in NumPy [02:03].
  • np.linalg.matrix_rank to obtain the rank of any matrix [02:20].
  • np.c_ to concatenate columns and build the bias version of X [03:18].
  • Gram matrix built as X.T @ X to feed the normal equation [04:05].
  • Redundant column as the trigger that forces the determinant to zero and lowers the rank [04:40].

The next step is to give this problem its formal name: singularity and multicollinearity. If you already ran the exercise with the 100 by 5 matrix, drop your interpretation in the comments and compare it with what others found.