Vectors, Matrices, and Tensors in NumPy

Summary

Before any machine learning model can learn, you need to translate the real world into a language it understands: numbers organized as vectors, matrices, and tensors. This guide shows you how to build these structures with NumPy, inspect their shape and dimensions, and visualize them in Python so you can move from raw data to objects a model can actually process.

What is a vector in machine learning? A vector is an ordered list of scalars (single numbers) that together describe one object, like a product with price and rating: [20.0, 4.5].

What are scalars and vectors and why do they matter?

Everything starts with two building blocks. A scalar is a single number: a price, an age, the square meters of a house. A vector is an ordered list of scalars that describes a complete object.

If you want to describe a house, one number is not enough. You stack square meters, number of rooms, and price into a vector with three components. That is how a machine sees: not isolated features, but whole objects.

How do I create and inspect a vector with NumPy?

In Google Colab, import NumPy and define a product with two features, price and rating:

```python
import numpy as np

# One product described by two features: [price, rating]
producto_A = np.array([20.0, 4.5])
print(f"Product vector: {producto_A}")
```

Now diagnose what is inside:

```python
print(f"Number of axes: {producto_A.ndim}")
print(f"Shape: {producto_A.shape}")
print(f"Total elements: {producto_A.size}")
```

You get ndim = 1, shape = (2,), and size = 2. NumPy treats this as a one-axis array with two elements [03:00].

Why does the number of components define the dimensions?

Here is the idea that prevents confusion later: the array structure in NumPy is one thing, the geometric space the vector represents is another.

  • A vector with 2 components lives in a 2D plane.
  • A vector with 3 components needs a 3D space.
  • A vector with hundreds of components, like the ones Netflix uses to represent users, lives in spaces of hundreds or thousands of dimensions.

That last point is where the magic of machine learning shows up [04:30]. You will work with up to 3D for visualization, but real models handle far more.
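The difference between array axes and geometric dimensions can be checked directly. This is a minimal sketch: the `casa` and `usuario` vectors, along with their values, are invented for illustration and are not part of the original example. The point is that `ndim` stays at 1 while the number of components, and therefore the dimension of the space, changes.

```python
import numpy as np

producto_A = np.array([20.0, 4.5])        # 2 components: a point in a 2D plane
casa = np.array([85.0, 3.0, 150000.0])    # hypothetical house: 3 components, 3D space
usuario = np.zeros(300)                   # hypothetical user vector with 300 components

for nombre, vector in [("producto_A", producto_A), ("casa", casa), ("usuario", usuario)]:
    # ndim counts array axes; shape[0] counts components (the geometric dimension)
    print(f"{nombre}: ndim={vector.ndim}, components={vector.shape[0]}")
```

All three arrays report one axis, even though they live in 2, 3, and 300 dimensions respectively.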

How do I plot a vector in Python with matplotlib?

Use quiver to draw the vector as an arrow from the origin, and label each axis with the feature it represents:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 6))
# Draw the vector as an arrow from the origin (0, 0) to (price, rating)
plt.quiver(0, 0, producto_A[0], producto_A[1],
           angles='xy', scale_units='xy', scale=1, color='red')
plt.title("Product A")
plt.xlabel("Price")
plt.ylabel("Rating")
plt.xlim(0, 25)
plt.ylim(0, 10)
plt.grid(True, alpha=0.3)
plt.show()
```

The arrow points to (20, 4.5), mapping price on the x-axis and rating on the y-axis [06:00]. You just turned a data point into geometry.

What is a matrix and how does shape work in machine learning?

Vectors describe one object. Real datasets bring thousands of users or millions of pixels, so you need to jump from vectors to matrices.

A matrix is a collection of vectors stacked into rows and columns. The convention in machine learning is strict and worth memorizing:

  • Rows are observations: each row is one example (a user, a house, a patient).
  • Columns are features: each column is an attribute (age, price, rating).
  • Shape reads as (observations, features).

This is the structure libraries like Scikit-Learn expect.
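One way to see the convention is to stack several vectors into a matrix. The sketch below assumes two extra products, `producto_B` and `producto_C`, whose values are made up for illustration; `np.vstack` is used here as one reasonable way to place each 1-D vector on its own row.

```python
import numpy as np

# Three products, each described by [price, rating]
producto_A = np.array([20.0, 4.5])
producto_B = np.array([35.0, 3.8])   # hypothetical values
producto_C = np.array([12.5, 4.9])   # hypothetical values

# np.vstack stacks the 1-D arrays as rows: one row per observation
productos = np.vstack([producto_A, producto_B, producto_C])

print(productos)
print(productos.shape)  # (3, 2): 3 observations, 2 features
```

Each row is still a complete product vector, and each column collects one feature across all products.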

How do I read the shape of a matrix? Shape is a tuple (rows, columns). A (4, 4) matrix means 4 observations and 4 features, for a total of 16 elements.

How do I build and slice a Netflix-style matrix?

Simulate ratings from 4 users for 4 movies:

```python
calificaciones = np.array([
    [5, 1, 2, 0],
    [3, 4, 4, 2],
    [3, 2, 4, 1],
    [5, 1, 2, 4]
])
print(calificaciones.ndim)   # 2
print(calificaciones.shape)  # (4, 4)
print(calificaciones.size)   # 16
```

To get the first user's ratings, index the row directly. To get all ratings for the third movie, use slicing with : to take every row and 2 to pick the column:

```python
calificaciones_usuario_1 = calificaciones[0]       # first row: every rating from user 1
calificaciones_pelicula_3 = calificaciones[:, 2]   # third column: every rating for movie 3
```

The colon means "take everything along this axis," and the comma separates rows from columns [10:30].
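The same row, column notation extends to single values and sub-blocks. The following sketch reuses the ratings matrix; the variable names `una_calificacion`, `primeros_dos_usuarios`, and `bloque` are illustrative choices, not names from the original lesson.

```python
import numpy as np

calificaciones = np.array([
    [5, 1, 2, 0],
    [3, 4, 4, 2],
    [3, 2, 4, 1],
    [5, 1, 2, 4],
])

una_calificacion = calificaciones[1, 3]      # user 2, movie 4: a single scalar
primeros_dos_usuarios = calificaciones[:2]   # rows 0 and 1, every column
bloque = calificaciones[:2, 1:3]             # rows 0-1, columns 1-2: a 2x2 sub-matrix

print(una_calificacion)             # 2
print(primeros_dos_usuarios.shape)  # (2, 4)
print(bloque.shape)                 # (2, 2)
```

Remember that indices start at 0 and that the end of a slice is exclusive, so `1:3` selects columns 1 and 2.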

What is a tensor and how do you represent an image with one?

A matrix works for tabular data, but a color image has height, width, and color depth. That is where tensors come in.

A tensor is a generalization of a matrix to more than two dimensions. In NumPy terms, a vector has 1 axis (ndim = 1), a matrix has 2, and a tensor has 3 or more. Tensors are the native language of deep learning libraries like TensorFlow and PyTorch.
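The progression from scalar to tensor can be seen by checking `ndim` on each kind of array. This is a small sketch with placeholder values; the names `escalar`, `vector`, `matriz`, and `tensor` are chosen only for illustration.

```python
import numpy as np

escalar = np.array(20.0)              # a single number: 0 axes
vector = np.array([20.0, 4.5])        # 1 axis
matriz = np.array([[5, 1], [3, 4]])   # 2 axes
tensor = np.zeros((2, 3, 3))          # 3 axes, filled with zeros as a placeholder

for nombre, arreglo in [("escalar", escalar), ("vector", vector),
                        ("matriz", matriz), ("tensor", tensor)]:
    print(f"{nombre}: ndim={arreglo.ndim}, shape={arreglo.shape}")
```

All four objects are the same NumPy type, `ndarray`; only the number of axes changes.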

A pixel combines three RGB channels (red, green, blue), each with a value from 0 to 255. Build a tiny image with two rows of three pixels each:

```python
imagen_tensor = np.array([
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[125, 0, 0], [0, 100, 0], [40, 0, 115]]
])
print(imagen_tensor.ndim)   # 3
print(imagen_tensor.shape)  # (2, 3, 3)
print(imagen_tensor.size)   # 18
```

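Read the shape (2, 3, 3) as: 2 matrices, each with 3 rows and 3 columns. For an image, that means 2 rows of pixels, 3 pixels per row, and 3 color channels per pixel. Multiply them and you get the 18 total elements [13:30]. Visualize it with plt.imshow(imagen_tensor) and you will see six pixels, arranged in 2 rows and 3 columns, each rendered with its RGB combination.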
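When the tensor is treated as an image, as `plt.imshow` does, its three axes can be read as (height, width, channels). The sketch below rebuilds the same tensor and indexes it along those axes; the names `alto`, `ancho`, `canales`, `primer_pixel`, and `canal_rojo` are illustrative and do not come from the original lesson.

```python
import numpy as np

imagen_tensor = np.array([
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[125, 0, 0], [0, 100, 0], [40, 0, 115]],
])

alto, ancho, canales = imagen_tensor.shape   # 2 rows, 3 pixels per row, 3 channels
primer_pixel = imagen_tensor[0, 0]           # RGB values of the top-left pixel (pure red)
canal_rojo = imagen_tensor[:, :, 0]          # red value of every pixel, shape (2, 3)

print(primer_pixel)
print(canal_rojo)
print(alto * ancho * canales == imagen_tensor.size)  # True
```

Slicing the last axis isolates one color channel as an ordinary matrix, which ties the tensor back to the row and column indexing used for the ratings matrix.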

Now your turn: share in the comments a matrix you would build from a real problem. Describe what the rows and columns represent.