Cosine Similarity Explained With Word Vectors

Summary

Measuring vector length is useful, but in machine learning the real question is often different: how aligned are two vectors? The dot product is the operation that answers this, and it powers everything from Netflix recommendations to ChatGPT's attention mechanism. If you want to understand how AI compares concepts, this is where it starts.

Why does the dot product matter in AI?

The dot product measures alignment between vectors, and that alignment translates directly into similarity between ideas, products, or words.

You interact with it every day without noticing:

  • In recommendation systems, Netflix computes the dot product between your profile vector and a series vector to generate an affinity score. The higher the score, the better the match.
  • In semantic search, when you type "healthy food" and Google returns salad recipes, it's because the dot product between the vector for your query and the vectors for those recipes is high. It measures semantic relevance, not exact word matches.
  • In language models like ChatGPT, the dot product between query and key vectors is computed for every pair of tokens to calculate attention, deciding which words deserve more weight when generating the next token.
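To make the recommendation case concrete, here is a minimal sketch of an affinity score. The genre axes, the profile, the series titles, and every number are invented for illustration. They are not how any real service represents users or content.

```python
import numpy as np

# Hypothetical taste profile over three made-up axes: [drama, comedy, sci-fi].
user_profile = np.array([0.9, 0.1, 0.8])

# Hypothetical series vectors on the same three axes.
series = {
    "space_saga": np.array([0.7, 0.0, 1.0]),
    "sitcom": np.array([0.1, 1.0, 0.0]),
}

# Affinity score = dot product between the profile and each series vector.
scores = {title: float(user_profile @ vec) for title, vec in series.items()}

# Rank titles from highest to lowest affinity.
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranking)
```

The series whose vector points in roughly the same direction as the profile ends up first in the ranking.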

There's one rule you cannot skip: both vectors must have the same number of components. Always check this with the shape property before computing anything.
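One way to enforce that rule is to compare shapes before multiplying. The sketch below assumes 1-D vectors, and safe_dot is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def safe_dot(a, b):
    """Dot product of two vectors, refusing inputs whose shapes differ."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(f"Shape mismatch: {a.shape} vs {b.shape}")
    return a @ b

u = np.array([2, 3])
v = np.array([4, 1])
w = np.array([1, 0, 5])  # three components, so it cannot pair with u

print(u.shape, v.shape, w.shape)
print(safe_dot(u, v))

try:
    safe_dot(u, w)
except ValueError as err:
    print(err)
```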

What is the dot product used for in machine learning? It measures how aligned two vectors are. That alignment becomes a similarity score for recommendations, search results, and attention in language models.

How do you compute the dot product in NumPy?

Once NumPy is imported, you can create two vectors and compute their dot product in two ways. Say you have u = np.array([2, 3]) and v = np.array([4, 1]).

The first option uses the classic method:

```python
product_np = np.dot(u, v)
print(f"Result with np.dot: {product_np}")
```

The second option, more modern and recommended, uses the @ operator:

```python
product_current = u @ v
print(f"Result with @ operator: {product_current}")
```

Both return 11. But what does 11 actually mean? On its own, not much. A dot product of 5000 isn't necessarily better, because the result depends on the magnitude of the vectors. That's why you need a standardized metric.
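A quick way to see the dependence on magnitude is to stretch one vector and watch the dot product grow while the direction stays the same. This is a small sketch that reuses the same u and v. The factor of 10 is an arbitrary choice.

```python
import numpy as np

u = np.array([2, 3])
v = np.array([4, 1])

# The dot product is the sum of element-wise products: 2*4 + 3*1.
manual = sum(a * b for a, b in zip(u, v))

# Multiplying u by 10 keeps its direction but multiplies the dot product by 10.
scaled = (10 * u) @ v

print(manual)
print(scaled)
```

The second value is ten times the first even though the two vectors are exactly as aligned as before, so the raw number cannot be read as a similarity score.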

What is cosine similarity and why use cosine instead of sine?

From linear algebra, the dot product can also be defined as the norm of the first vector times the norm of the second, times the cosine of the angle between them. If you isolate the cosine, you get the most important similarity metric in natural language processing and recommender systems: cosine similarity.

The formula is the dot product of two vectors divided by the product of their norms. The result always lives between -1 and 1.
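Written out, the definition and the isolated cosine look like this. The second line plugs in u = (2, 3) and v = (4, 1) from the NumPy example as a worked check. The symbol θ for the angle is notation chosen here.

```latex
\[
\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \, \|\mathbf{v}\| \cos\theta
\quad\Longrightarrow\quad
\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \, \|\mathbf{v}\|}
\]
\[
\cos\theta = \frac{2 \cdot 4 + 3 \cdot 1}{\sqrt{2^2 + 3^2}\,\sqrt{4^2 + 1^2}}
= \frac{11}{\sqrt{13}\,\sqrt{17}} \approx 0.74
\]
```

The division only works when neither vector has zero length, so cosine similarity is undefined for a zero vector.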

Why cosine and not sine?

Because cosine behaves intuitively at the angles that matter:

  • At 0 degrees (perfectly aligned vectors), cosine equals 1. Maximum similarity.
  • At 90 degrees (orthogonal vectors), cosine equals 0. No linear relationship.
  • At 180 degrees (opposite vectors), cosine equals -1. Maximum opposition.

Sine would fail here. It returns 0 for both 0 and 180 degrees, which would treat identical and opposite vectors as equally similar. It also peaks at 90 degrees, which makes no sense as a similarity measure.
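The comparison can be checked numerically with Python's standard math module. This is a small sketch that rounds the values to hide floating-point noise.

```python
import math

# Map each angle in degrees to its (cosine, sine) pair.
table = {}
for degrees in (0, 90, 180):
    theta = math.radians(degrees)
    # Rounding hides tiny float errors, e.g. sin(180 deg) is about 1.2e-16.
    table[degrees] = (round(math.cos(theta), 6), round(math.sin(theta), 6))

for degrees, (cos_value, sin_value) in table.items():
    print(f"{degrees:>3} deg  cos = {cos_value:+.1f}  sin = {sin_value:+.1f}")
```

Cosine gives three different values for the three angles, while sine gives the same value at 0 and 180 degrees.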

What does cosine similarity mean? It's the dot product normalized by the magnitude of both vectors. The result ranges from -1 (opposite directions) to 1 (same direction), with 0 meaning the vectors are orthogonal.

How do you measure word similarity with pretrained vectors?

To see this in action, you can use gensim, a natural language processing library. In a notebook cell, install it with !pip install gensim (drop the leading ! if you run it in a terminal), then import the downloader module:

```python
import gensim.downloader as api

print("Downloading word vector model...")
word_vectors = api.load("glove-wiki-gigaword-50")
print("Model loaded")
```

The glove-wiki-gigaword-50 model holds a vocabulary of roughly 400,000 words, each mapped to a 50-component vector, but stays light enough to download quickly. Once loaded, the object behaves like a dictionary where keys are words and values are their vectors.

Grab a few words to compare:

```python
king = word_vectors["king"]
man = word_vectors["man"]
queen = word_vectors["queen"]
cat = word_vectors["cat"]
```

Now define cosine similarity using the formula:

```python
import numpy as np

def cosine_similarity(v1, v2):
    # Dot product divided by the product of the two norms.
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
```
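The formula divides by the norms, so it breaks down if either vector has zero length. A more defensive variant might look like the sketch below. The name cosine_similarity_safe and the choice to raise an error (rather than return 0) are assumptions made for this example.

```python
import numpy as np

def cosine_similarity_safe(v1, v2):
    """Cosine similarity that rejects zero-length vectors."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    norm_product = np.linalg.norm(v1) * np.linalg.norm(v2)
    if norm_product == 0.0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return float(np.dot(v1, v2) / norm_product)

print(cosine_similarity_safe([2, 3], [4, 1]))    # same vectors as the NumPy example
print(cosine_similarity_safe([1, 0], [0, 1]))    # orthogonal pair
print(cosine_similarity_safe([1, 2], [-1, -2]))  # opposite directions
```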

Apply it to the pairs:

```python
sim_king_man = cosine_similarity(king, man)
sim_king_queen = cosine_similarity(king, queen)
sim_king_cat = cosine_similarity(king, cat)
```

When you print the results rounded to two decimals, you'll see that king and queen score very high, king and man also score high, and king and cat drops to around 0.39. That low value makes total sense, since the concepts barely overlap.

The pretrained vectors did the heavy lifting of representing meaning. Your job was only to apply cosine similarity to compare them. The closer the result gets to 1, the more related the words are.
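The same comparison can be turned into a ranking. The sketch below runs without downloading anything because it uses a plain dictionary of toy 3-component vectors in place of the loaded model. Those numbers are invented and are not GloVe values, and rank_against is a hypothetical helper name.

```python
import numpy as np

def cosine_similarity(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# Toy vectors for illustration only. They are NOT taken from GloVe,
# whose vectors in this lesson have 50 components.
toy_vectors = {
    "king": np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "cat": np.array([0.1, 0.2, 0.9]),
}

def rank_against(word, vectors):
    """Return (other_word, similarity) pairs, most similar first."""
    target = vectors[word]
    pairs = [
        (other, round(float(cosine_similarity(target, vec)), 2))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

ranking = rank_against("king", toy_vectors)
for other, score in ranking:
    print(f"king vs {other}: {score}")
```

With real pretrained vectors the scores will differ, but the pattern is the same: compute one similarity per candidate word, then sort.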

How do you interpret a cosine similarity score? Values close to 1 mean the vectors point in the same direction (high similarity). Values near 0 mean no relationship. Negative values mean opposition.

Practice exercise: measure your alignment with someone else

Create a three-component vector for your interests in, say, technology, art, and sports. Build another vector for a friend or family member with their own scores. Compute the cosine similarity between both and answer: how aligned are your interests?
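As a starting point, the exercise could be set up as follows. The 0 to 10 scale and both sets of scores are placeholders, so replace them with your own.

```python
import numpy as np

def cosine_similarity(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# Placeholder scores from 0 to 10 for [technology, art, sports].
me = np.array([9, 4, 2])
friend = np.array([3, 8, 6])

alignment = cosine_similarity(me, friend)
print(f"Alignment: {alignment:.2f}")
```

Because every score is non-negative, the result here can never drop below 0. To see negative values you would need a scale that allows dislike, for example from -5 to 5.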

Share your results in the comments. And here's a teaser for what comes next: when the dot product equals zero, vectors are orthogonal. What happens if you build an entire coordinate system using only vectors that are orthogonal to each other? See you in the next class.