Vector Space & High-Dimensional Data

Name: GridStorm
Availability: InStock

Cosine similarity, dot product, Euclidean distance, and approximate nearest neighbor algorithms (HNSW, IVF).

Intermediate · 18 min read

Similarity Metrics

Metric	Formula	Best For	Range
Cosine Similarity	cos(θ) = A·B / (	A
Dot Product	A·B = Σ(aᵢ × bᵢ)	Normalized vectors	Unbounded
Euclidean (L2)	√Σ(aᵢ - bᵢ)²	Images, spatial data	0 to ∞

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dim embeddings

documents = [
    "Machine learning is a type of artificial intelligence",
    "Deep learning uses neural networks with many layers",
    "The Eiffel Tower is located in Paris, France",
    "Neural networks are inspired by the human brain",
]

embeddings = model.encode(documents, normalize_embeddings=True)

query = "What is deep learning?"
query_emb = model.encode([query], normalize_embeddings=True)[0]

scores = [(doc, np.dot(query_emb, emb)) for doc, emb in zip(documents, embeddings)]
scores.sort(key=lambda x: x[1], reverse=True)

for doc, score in scores:
    print(f"  {score:.4f} | {doc}")

Algorithm	Index Type	Speed	Notes
HNSW	Hierarchical graph	Very fast	Best recall/speed tradeoff; used by Pinecone
IVF-Flat	Inverted file	Fast	Partitions space into Voronoi cells
IVF-PQ	IVF + Product Quantization	Fastest	Compresses vectors 4–32×

Part of the LangChain, LangGraph & Vector DBs series on Tekivex. Browse all tutorials or explore our open-source products.