Multicollinearity: Why Redundant Features Break Models
Summary
When two features in your dataset measure the same thing, your model breaks before it even starts learning. That problem has a name: multicollinearity, and it is one of the most common reasons the matrices behind a linear model turn out singular. Understanding it will save you from training models that look fine on paper but collapse the moment you trust their weights.
Why does multicollinearity make a matrix singular?
Multicollinearity happens when you have redundant features. Think of one column in square meters and another in square centimeters, or duplicated room counts. Both measure the same thing, just on a different scale, so they add zero new information [0:35].
When this happens, the columns of your data matrix X become linearly dependent. That dependency carries over into the Gram matrix XᵀX, the matrix the normal equation has to invert, and makes it singular: its determinant is zero and it has no inverse. No inverse, no unique solution.
What is a singular matrix in machine learning? It is a matrix whose determinant equals zero, so it cannot be inverted. In training, this means the normal equation has no unique solution for the model weights.
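To make the link between redundant columns and a singular Gram matrix concrete, here is a minimal sketch. The toy numbers are hypothetical (they are not the lesson's dataset), and the vector v is introduced only for illustration: it encodes the dependency "square feet minus 10.764 times square meters equals zero". If X sends v to zero, then XᵀX does too, and a matrix that sends a nonzero vector to zero cannot have an inverse.

```python
import numpy as np

# Hypothetical toy data (not from the lesson): [square meters, rooms].
X = np.array([[50.0, 2.0],
              [80.0, 3.0],
              [120.0, 4.0],
              [65.0, 2.0]])

# Redundant feature: the same area expressed in square feet.
sq_feet = X[:, 0] * 10.764
X_redundant = np.c_[X, sq_feet]

# v encodes the dependency: -10.764 * (square meters) + 1 * (square feet) = 0.
v = np.array([-10.764, 0.0, 1.0])

gram = X_redundant.T @ X_redundant

# A loose absolute tolerance absorbs floating-point round-off.
columns_dependent = np.allclose(X_redundant @ v, 0.0, atol=1e-6)
gram_kills_v = np.allclose(gram @ v, 0.0, atol=1e-6)

print("X @ v is zero:   ", columns_dependent)
print("gram @ v is zero:", gram_kills_v)
```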
What problems does multicollinearity cause in your model?
Redundant features do not just slow things down. They actively destroy three things you need from any reliable model [1:15].
- Solution ambiguity: with redundant features, infinite combinations of weights produce the same prediction, so the normal equation cannot be solved with an inverse.
- Model instability: when the determinant is close to zero, a tiny change in your input data triggers drastic, erratic shifts in the learned weights.
- Loss of interpretability: unstable weights cannot tell you which feature actually matters, so you lose the ability to explain your model's decisions.
In other words, you stop trusting the math and you stop trusting the story the model tells.
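The first problem in the list, solution ambiguity, is easy to see with numbers. The sketch below uses hypothetical toy data and two hand-picked weight vectors (w_a, w_b, and the shift t are illustrative names, not part of the lesson). Moving weight from the square-meters column to its square-feet copy changes the weights but not the predictions.

```python
import numpy as np

# Hypothetical toy data: column 0 is square meters, column 1 the same area in square feet.
sq_m = np.array([50.0, 80.0, 120.0, 65.0])
X = np.c_[sq_m, sq_m * 10.764]

# One weight vector that puts everything on square meters...
w_a = np.array([3.0, 0.0])

# ...and another that moves part of that weight onto the redundant copy.
t = 0.25  # arbitrary amount; any value of t gives the same predictions
w_b = w_a + t * np.array([-10.764, 1.0])

pred_a = X @ w_a
pred_b = X @ w_b

print("weights a:", w_a, "predictions:", pred_a)
print("weights b:", w_b, "predictions:", pred_b)
print("same predictions:", np.allclose(pred_a, pred_b))
```

Because every value of t works, there are infinitely many equally good weight vectors, which is exactly why no single one can be singled out by inverting the matrix.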
How do you diagnose singularity in Python with NumPy?
The fastest way to see this in action is to deliberately break a healthy dataset. In Google Colab, you can take the original X and add a redundant column [2:30].
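```python
import numpy as np

# Redundant feature: square meters (column 0) converted to square feet.
pies2 = X_original[:, 0] * 10.764
X_enfermo = np.c_[X_original, pies2]
X_enfermo_bias = np.c_[np.ones((4, 1)), X_enfermo]  # prepend the bias column
```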
The new column pies2 is just the square meters column converted to square feet by multiplying by 10.764. Same information, different scale. Pure redundancy.
Next, build the pieces of the normal equation:
```python
# Left- and right-hand sides of the normal equation A @ theta = B.
A_enfermo = X_enfermo_bias.T @ X_enfermo_bias
B = X_enfermo_bias.T @ Y
```
Now run the two diagnostics you already know.
What does a determinant of zero tell you?
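When you compute np.linalg.det(A_enfermo), the result is zero [4:10] (in floating-point arithmetic it can also show up as a value that is only negligibly different from zero). That is your first red flag. The matrix is sick.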
What does it mean when a determinant equals zero? It means the matrix is singular and has no inverse, so the system of equations has either no solution or infinite solutions, never a unique one.
What does the rank reveal about redundancy?
The second check is the rank with np.linalg.matrix_rank(A_enfermo). The result is 3, even though the matrix is 4x4. That mismatch confirms that one column is a linear combination of the others.
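Both diagnostics can be run side by side on a healthy and a sick matrix. This is a minimal, self-contained sketch: the values in X_original are hypothetical stand-ins for the lesson's data, and gram_with_bias and A_sano are names made up for this example. Since floating-point round-off can leave the determinant slightly off zero, the rank is usually the more dependable of the two checks.

```python
import numpy as np

# Hypothetical stand-ins for the lesson's data: [square meters, rooms].
X_original = np.array([[50.0, 2.0],
                       [80.0, 3.0],
                       [120.0, 4.0],
                       [65.0, 2.0]])

def gram_with_bias(X):
    """Hypothetical helper: prepend a bias column and return X_b.T @ X_b."""
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    return X_b.T @ X_b

A_sano = gram_with_bias(X_original)
A_enfermo = gram_with_bias(np.c_[X_original, X_original[:, 0] * 10.764])

for name, A in [("healthy", A_sano), ("sick", A_enfermo)]:
    rank = np.linalg.matrix_rank(A)
    det = np.linalg.det(A)
    # Full rank means rank == number of columns; anything less signals redundancy.
    print(f"{name}: shape={A.shape}, rank={rank}, det={det:.3e}")
```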
Why does NumPy refuse to solve a singular system?
Knowing the theory is one thing. Watching NumPy throw the error is another. If you try to compute the weights directly:
```python
theta = np.linalg.inv(A_enfermo) @ B
```
You get a Singular matrix error [5:20]. Same story if you try the dedicated solver:
```python
theta = np.linalg.solve(A_enfermo, B)
```
Same error. NumPy is not being dramatic. It is telling you that no inverse exists, so there is no unique theta to return.
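If you would rather have your code handle the failure than crash, one option is to wrap the solver. The sketch below is an assumption about how you might do that, not something from the lesson: try_solve is a hypothetical helper, and the two small matrices are made-up examples. It checks the rank first, because round-off can occasionally let a singular matrix slip past the solver, and then catches np.linalg.LinAlgError as a second line of defense.

```python
import numpy as np

def try_solve(A, b):
    """Hypothetical helper: return the weights, or None if A is singular."""
    # Guard first: a rank below the matrix size means there is no unique solution.
    if np.linalg.matrix_rank(A) < A.shape[0]:
        return None
    try:
        return np.linalg.solve(A, b)
    except np.linalg.LinAlgError:
        # NumPy raises this when it detects an exactly singular matrix.
        return None

A_singular = np.array([[1.0, 2.0],
                       [2.0, 4.0]])   # second row is 2 * first row
A_healthy = np.array([[2.0, 0.0],
                      [0.0, 4.0]])
b = np.array([1.0, 2.0])

result_singular = try_solve(A_singular, b)
result_healthy = try_solve(A_healthy, b)

print("singular system:", result_singular)
print("healthy system: ", result_healthy)
```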
And here is the lesson: a redundant feature, something as innocent as adding square feet next to square meters, was enough to take your training pipeline from working to completely broken.
How can you experiment with near-singular matrices?
Take the X_enfermo matrix and tweak one value in the last column so it is no longer exactly the square meters times 10.764. Pick any number you want and rerun the code.
Then answer in the comments:
- Is the determinant now exactly zero or just a very small number?
- Does the inverse work, or does np.linalg.solve still fail?
- If weights appear, do they look stable or chaotic?
This little experiment shows the difference between perfect singularity and near-singularity, and why even almost-redundant features can wreck your model's reliability.
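Here is one way the experiment could be set up, as a sketch under stated assumptions. The data values, the one-square-foot tweak, and the names X_casi, Y_nudged, and theta_nudged are all hypothetical. The condition number from np.linalg.cond is an extra diagnostic not used in the lesson: the size of the determinant depends on the scale of your data, while the condition number tells you how strongly errors get amplified. To probe stability, the sketch solves the system twice, the second time with a single target value nudged by 1.

```python
import numpy as np

# Hypothetical stand-ins for the lesson's data: [square meters, rooms] and prices.
X_original = np.array([[50.0, 2.0], [80.0, 3.0], [120.0, 4.0], [65.0, 2.0]])
Y = np.array([150.0, 240.0, 350.0, 180.0])

# Redundant square-feet column, then break the exact dependency in one cell.
X_casi = np.c_[X_original, X_original[:, 0] * 10.764]
X_casi[0, -1] += 1.0  # arbitrary tweak of one square foot
X_casi_bias = np.c_[np.ones((X_casi.shape[0], 1)), X_casi]

A_casi = X_casi_bias.T @ X_casi_bias
cond_number = np.linalg.cond(A_casi)
print("determinant:     ", np.linalg.det(A_casi))
print("condition number:", cond_number)

# Solve once with the original targets, once with a single price nudged by 1.
theta = np.linalg.solve(A_casi, X_casi_bias.T @ Y)
Y_nudged = Y.copy()
Y_nudged[0] += 1.0
theta_nudged = np.linalg.solve(A_casi, X_casi_bias.T @ Y_nudged)

largest_shift = np.abs(theta_nudged - theta).max()
print("weights:        ", theta)
print("nudged weights: ", theta_nudged)
print("largest shift:  ", largest_shift)
```

Try shrinking the tweak toward zero and watch how the condition number and the weight shift respond.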
The good news is that linear algebra has a tool built precisely for these cases: the pseudoinverse. With it, you can find the best possible solution even when a unique one does not exist. Share what you got in your experiment and tell me which value broke your matrix the most.
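As a small preview, and only as a sketch, NumPy exposes the pseudoinverse as np.linalg.pinv. The data below is hypothetical, and the explicit rcond=1e-10 cutoff is a choice made for this example so that the round-off-sized singular value is treated as zero. The check at the end compares predictions from the redundant dataset against those from the healthy one.

```python
import numpy as np

# Hypothetical stand-ins for the lesson's data: [square meters, rooms] and prices.
X_original = np.array([[50.0, 2.0], [80.0, 3.0], [120.0, 4.0], [65.0, 2.0]])
Y = np.array([150.0, 240.0, 350.0, 180.0])

X_sano_bias = np.c_[np.ones((4, 1)), X_original]
X_enfermo_bias = np.c_[X_sano_bias, X_original[:, 0] * 10.764]  # add square feet

# The pseudoinverse still returns weights for the redundant design matrix.
theta_pinv = np.linalg.pinv(X_enfermo_bias, rcond=1e-10) @ Y

# Reference: ordinary normal equation on the healthy (non-redundant) data.
theta_sano = np.linalg.solve(X_sano_bias.T @ X_sano_bias, X_sano_bias.T @ Y)

pred_pinv = X_enfermo_bias @ theta_pinv
pred_sano = X_sano_bias @ theta_sano

print("pseudoinverse weights:", theta_pinv)
print("same predictions as the healthy model:", np.allclose(pred_pinv, pred_sano))
```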