Vector Space & High-Dimensional Data

Cosine similarity, dot product, Euclidean distance, and approximate nearest neighbor algorithms (HNSW, IVF).

Intermediate · 18 min read

Similarity Metrics

| Metric | Formula | Best For | Range |
|---|---|---|---|
| Cosine Similarity | cos(θ) = A·B / (‖A‖ ‖B‖) | Text embeddings | -1 to 1 |
| Dot Product | A·B = Σ(aᵢ × bᵢ) | Normalized vectors | Unbounded |
| Euclidean (L2) | √(Σ(aᵢ − bᵢ)²) | Images, spatial data | 0 to ∞ |

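As a quick illustration of the three metrics, here is a minimal NumPy sketch. The function names and the two toy vectors are made up for this example and are not part of any library.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(θ) = A·B / (‖A‖ ‖B‖): depends only on direction, not length
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot_product(a, b):
    # A·B = Σ(aᵢ × bᵢ): grows with both alignment and vector length
    return float(np.dot(a, b))

def euclidean_distance(a, b):
    # √(Σ(aᵢ − bᵢ)²): straight-line distance between the two points
    return float(np.linalg.norm(a - b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the length

print(cosine_similarity(a, b))   # close to 1.0 (identical direction)
print(dot_product(a, b))         # 28.0
print(euclidean_distance(a, b))  # about 3.74 (the points are still far apart)
```

The two vectors point the same way, so cosine similarity treats them as a perfect match, while Euclidean distance still reports a gap because their lengths differ.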
import numpy as np
from sentence_transformers import SentenceTransformer

# Load a small sentence-embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dim embeddings

documents = [
    "Machine learning is a type of artificial intelligence",
    "Deep learning uses neural networks with many layers",
    "The Eiffel Tower is located in Paris, France",
    "Neural networks are inspired by the human brain",
]

# normalize_embeddings=True returns unit-length vectors,
# so the dot product below is equal to cosine similarity
embeddings = model.encode(documents, normalize_embeddings=True)

query = "What is deep learning?"
query_emb = model.encode([query], normalize_embeddings=True)[0]

# Score every document against the query, then rank from most to least similar
scores = [(doc, np.dot(query_emb, emb)) for doc, emb in zip(documents, embeddings)]
scores.sort(key=lambda x: x[1], reverse=True)

for doc, score in scores:
    print(f"  {score:.4f} | {doc}")
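The example above calls np.dot on embeddings encoded with normalize_embeddings=True. For unit-length vectors the three metrics agree on ranking: the dot product equals cosine similarity, and squared Euclidean distance is 2 − 2(A·B). The sketch below checks this on random vectors instead of real embeddings, so it runs without downloading a model; the array shapes and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five random "documents" and one "query", each scaled to unit length
vectors = rng.normal(size=(5, 8))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
query = rng.normal(size=8)
query /= np.linalg.norm(query)

dot = vectors @ query
cos = dot / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
l2 = np.linalg.norm(vectors - query, axis=1)

# On unit vectors: dot product == cosine, and ‖a − b‖² == 2 − 2(a·b)
print(np.allclose(dot, cos))
print(np.allclose(l2 ** 2, 2 - 2 * dot))

# Highest dot product first gives the same order as smallest distance first
order_by_dot = np.argsort(-dot)
order_by_l2 = np.argsort(l2)
print(order_by_dot, order_by_l2)
```

This is why a search over normalized embeddings can use whichever of the three metrics the index supports without changing the ranking.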

Approximate Nearest Neighbor Algorithms

| Algorithm | Index Type | Speed | Notes |
|---|---|---|---|
| HNSW | Hierarchical graph | Very fast | Best recall/speed tradeoff; used by Pinecone |
| IVF-Flat | Inverted file | Fast | Partitions space into Voronoi cells |
| IVF-PQ | IVF + Product Quantization | Fastest | Compresses vectors 4–32× |
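To make the IVF-Flat row concrete, here is a rough NumPy sketch of the idea: cluster the vectors into cells with a few k-means steps, then at query time scan only the cells whose centroids are closest to the query. The names build_ivf, search_ivf, and nearest_centroid are hypothetical helpers written for this illustration, and nprobe is borrowed from common ANN terminology; production libraries implement this far more efficiently.

```python
import numpy as np

def nearest_centroid(vectors, centroids):
    # Squared L2 distance from every vector to every centroid
    d = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def build_ivf(vectors, n_cells, n_iters=10, seed=0):
    """Partition vectors into cells (Voronoi regions) using simple k-means."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), n_cells, replace=False)
    centroids = vectors[start].copy()
    for _ in range(n_iters):
        assign = nearest_centroid(vectors, centroids)
        for c in range(n_cells):
            members = vectors[assign == c]
            if len(members):  # leave a centroid unchanged if its cell is empty
                centroids[c] = members.mean(axis=0)
    assign = nearest_centroid(vectors, centroids)
    # Inverted lists: for each cell, the ids of the vectors assigned to it
    cells = [np.where(assign == c)[0] for c in range(n_cells)]
    return centroids, cells

def search_ivf(query, vectors, centroids, cells, k=3, nprobe=2):
    """Scan only the nprobe cells whose centroids are nearest to the query."""
    cell_dist = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(cell_dist)[:nprobe]
    candidates = np.concatenate([cells[c] for c in probe])
    dist = ((vectors[candidates] - query) ** 2).sum(axis=1)
    top = np.argsort(dist)[:k]
    return candidates[top], dist[top]

rng = np.random.default_rng(42)
data = rng.normal(size=(1000, 16)).astype(np.float32)
centroids, cells = build_ivf(data, n_cells=8)

query = data[0]
ids, dists = search_ivf(query, data, centroids, cells, k=3, nprobe=2)
print(ids, dists)

# Probing every cell degenerates into an exact brute-force search
exact_ids, _ = search_ivf(query, data, centroids, cells, k=3, nprobe=8)
brute_ids = np.argsort(((data - query) ** 2).sum(axis=1))[:3]
print(sorted(exact_ids), sorted(brute_ids))
```

Raising nprobe trades speed for recall: more cells are scanned, so fewer true neighbors are missed. IVF-PQ follows the same layout but stores compressed codes in each cell instead of the full vectors.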

Part of the LangChain, LangGraph & Vector DBs series on Tekivex.