GenAI – Approximate Nearest Neighbors (ANN)
Table of Contents:
- Foundational Concepts
What is Nearest Neighbor Search (NNS)?
Exact vs Approximate Nearest Neighbors
Trade-offs: Speed vs Accuracy vs Memory
Use cases in GenAI: Semantic Search, RAG, Recommendation Systems
- Distance Metrics
Euclidean Distance
Cosine Similarity
Manhattan (L1) Distance
Dot Product Similarity
Choosing the right metric based on data and task
- Core ANN Algorithms & Techniques
Locality-Sensitive Hashing (LSH): Concept and hash function families; MinHash, SimHash
Hierarchical Navigable Small World Graphs (HNSW): Graph-based ANN; navigation and hierarchy
Product Quantization (PQ): Vector compression for large-scale retrieval
IVF (Inverted File Index) + PQ: Clustering + quantization
Tree-based Methods: KD-Trees, Ball Trees (less common in ANN but foundational)
Navigable Small World Graphs (NSW)
ScaNN (by Google): Quantization + re-ranking + hardware optimization
- ANN Libraries & Tools
FAISS (Facebook AI Similarity Search): IVF, PQ, Flat, HNSW support; GPU acceleration
Annoy (by Spotify): Forest of random projection trees
HNSWlib: Fast, high-recall graph-based ANN (memory-heavy compared with quantized indexes)
ScaNN (by Google): High-speed vector search
NMSLIB: General-purpose nearest neighbor library
Milvus / Qdrant / Weaviate / Pinecone / Vespa: Vector databases with built-in ANN
- Vector Indexing Strategies
Flat Index
Partitioned Indexes (IVF)
Hierarchical Indexes (HNSW, trees)
Quantized Indexes (PQ, OPQ, SQ)
Hybrid Indexes (e.g., IVF + PQ with an HNSW coarse quantizer, as in FAISS)
- Evaluation of ANN Performance
Recall@K
Precision@K
Latency
Index build time and size
Throughput (QPS)
- Application Areas
LLMs & RAG: Embedding-based retrieval
Search Engines: Query-document similarity
Recommender Systems: Item-item and user-item similarity
Anomaly Detection: Flagging vectors that lie far from their nearest neighbors
Image/Audio Retrieval: Perceptual similarity search
- Advanced Concepts
Multi-modal vector search (text+image)
Hybrid search (ANN + keyword search)
Dynamic indexing and deletions
Sharding and distributed ANN (for large corpora)
Federated or secure ANN search
- Integration & Optimization
ANN with embedding models (OpenAI, SBERT, BGE, etc.)
ANN in RAG pipelines (LangChain, LlamaIndex)
Caching & re-ranking strategies
Streaming data updates in ANN indexes
GPU vs CPU trade-offs for embedding inference and vector search

