SmartFAQs.ai

Search Algorithms


TLDR

Modern Search Algorithms have evolved from simple character-matching routines into complex, multi-stage pipelines that synthesize lexical precision with semantic intelligence. At the core of this evolution is the transition from Keyword Search (Sparse Retrieval), which relies on exact token matching via inverted indices and BM25, to Vector Search (Dense Retrieval), which utilizes high-dimensional embeddings to capture conceptual intent.

To resolve the inherent trade-offs between these two—where keywords excel at technical identifiers and vectors excel at synonyms—the industry has converged on Hybrid Search. By utilizing fusion algorithms like Reciprocal Rank Fusion (RRF), systems can merge disparate result sets into a single, high-relevance list. This process is further refined by Semantic Search Ranking, a second-stage process where computationally expensive models like Cross-Encoders re-evaluate a small subset of candidates to ensure maximum precision. Together, these components form the backbone of modern Information Retrieval (IR) and Retrieval-Augmented Generation (RAG) systems.


Conceptual Overview

To understand modern search is to understand the "Search Funnel." In a system containing millions or billions of documents, it is mathematically impossible to perform a deep semantic analysis of every document for every query in real-time. Instead, search algorithms operate as a series of filters, progressively increasing in complexity while decreasing the volume of data they handle.

The Sparse vs. Dense Paradigm

The fundamental tension in search lies between Sparse and Dense representations:

  1. Sparse Retrieval (Keyword Search): Documents are represented as vectors where the dimensions equal the size of the entire vocabulary. Most values are zero. This is the realm of the Inverted Index. It is incredibly fast and handles "out-of-vocabulary" terms (like a specific serial number SN-9921-X) perfectly.
  2. Dense Retrieval (Vector Search): Documents are compressed into fixed-length numerical vectors (e.g., 768 or 1536 dimensions) by neural networks. Every value is non-zero. This allows the system to bridge the "Semantic Gap"—understanding that "feline" and "cat" are the same concept despite having no shared characters.
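The contrast can be sketched in a few lines of Python. The tiny vocabulary and 3-dimensional "embeddings" below are illustrative assumptions (real models use hundreds of dimensions), but they show why "feline" and "cat" are invisible to each other in sparse space yet nearly identical in dense space:

```python
import math

# Sparse view: one dimension per vocabulary term; matching requires shared tokens.
vocab = ["cat", "feline", "sat", "mat", "SN-9921-X"]

def sparse(tokens):
    return [tokens.count(t) for t in vocab]

q = sparse(["feline"])
d = sparse(["cat", "sat", "mat"])
sparse_overlap = sum(a * b for a, b in zip(q, d))  # 0: no shared tokens

# Dense view: toy embeddings place related concepts near each other
# regardless of surface form.
emb = {"feline": [0.9, 0.1, 0.2], "cat": [0.85, 0.15, 0.25]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

dense_sim = cosine(emb["feline"], emb["cat"])  # close to 1.0

print(sparse_overlap, round(dense_sim, 3))
```

Note the flip side: the sparse representation would match the serial number SN-9921-X exactly, while a dense model has no guarantee of preserving such a rare token.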

The Systems View: The Search Funnel

A production-grade search architecture typically follows this flow:

  • Stage 1: Candidate Generation (The Net): Using Hybrid Search to pull the top 100–1,000 documents from both Sparse and Dense indices.
  • Stage 2: Scoring & Fusion (The Filter): Applying RRF to normalize scores from different mathematical regimes.
  • Stage 3: Re-ranking (The Scalpel): Using a Cross-Encoder to perform a "deep read" of the top 50 candidates to determine the final order.

Infographic: The Multi-Stage Retrieval Pipeline

Architecture Diagram Description: A vertical funnel divided into three horizontal tiers.

  1. Top Tier (Retrieval): Two parallel tracks. Track A shows a "Query" hitting an "Inverted Index (BM25)" producing a "Sparse Result Set." Track B shows the same "Query" converted to an "Embedding" hitting a "Vector Database (HNSW)" producing a "Dense Result Set."
  2. Middle Tier (Fusion): An "RRF (Reciprocal Rank Fusion)" block where the two result sets merge. The width of the funnel narrows here.
  3. Bottom Tier (Re-ranking): A "Cross-Encoder" model that takes the Query + Top 50 fused results, performing pairwise comparisons to output the "Final Top 10 Results."
  4. Sidecar: A feedback loop labeled "A/B Testing (comparing prompt variants)" pointing back to the Query Embedding stage, indicating optimization of the input.

Practical Implementations

Implementing the Inverted Index

The workhorse of keyword search is the BM25 (Best Matching 25) algorithm. Unlike its predecessor TF-IDF, BM25 implements "Term Frequency Saturation." In a document about "Quantum Computing," the first five mentions of the word "Quantum" are highly informative, but the 500th mention adds very little additional relevance. BM25 dampens the impact of high-frequency terms, preventing "keyword stuffing" from gaming the system.
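The saturation effect falls out of the standard BM25 term formula. The sketch below uses conventional parameter values (k1 = 1.5, b = 0.75); the corpus statistics are illustrative assumptions:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.5, b=0.75):
    """Score contribution of a single term; term frequency saturates as it grows."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Saturation: 100x more occurrences of "quantum" yields only a small score bump,
# so keyword stuffing cannot game the ranking.
low  = bm25_term_score(tf=5,   doc_len=1000, avg_doc_len=1000, n_docs=10_000, doc_freq=50)
high = bm25_term_score(tf=500, doc_len=1000, avg_doc_len=1000, n_docs=10_000, doc_freq=50)
print(round(low, 2), round(high, 2))
```

Because tf appears in both the numerator and denominator, the score asymptotically approaches idf * (k1 + 1) no matter how often the term repeats.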

Scaling Vector Search with HNSW

Vector search faces a "curse of dimensionality." Calculating the Euclidean distance between a query and 10 million 1536-dimensional vectors is too slow for a web response. Practical implementations use Approximate Nearest Neighbor (ANN) algorithms, most notably HNSW (Hierarchical Navigable Small Worlds). HNSW builds a multi-layered graph where the top layers contain long-distance links (for fast traversal) and the bottom layers contain short-distance links (for local precision), similar to a skip-list.

The Hybrid Integration

In practice, developers often use a unified database (like Elasticsearch, Weaviate, or Pinecone) that manages both indices. The implementation requires a "Fusion Model." While simple weighted averaging (e.g., 0.3 * keyword_score + 0.7 * vector_score) is possible, it is fragile because the scores are on different scales. Reciprocal Rank Fusion (RRF) is the preferred implementation because it relies on the rank of the document rather than the raw score, making it robust across different query types.
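A minimal RRF implementation, assuming each retriever returns document ids ordered best-first (the ids below are hypothetical):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of doc ids using ranks only, never raw scores."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked ids from each retriever (best first).
bm25_results   = ["d7", "d2", "d9", "d4"]
vector_results = ["d2", "d5", "d7", "d1"]
fused = reciprocal_rank_fusion([bm25_results, vector_results])
print(fused[:3])  # d2 and d7 rise to the top: both retrievers agree on them
```

Because only ranks enter the formula, it makes no difference that BM25 scores are unbounded while cosine similarities live in [-1, 1].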


Advanced Techniques

Re-ranking with Cross-Encoders

While Bi-Encoders (used in Vector Search) generate independent embeddings for queries and documents, Cross-Encoders process the query and the document simultaneously. This allows the model to attend to the interactions between specific words in the query and specific sentences in the document. Because this is computationally expensive (a full model forward pass for every query-document pair, so cost grows linearly with the number of candidates), it is strictly reserved for the final re-ranking stage.
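The difference in call patterns can be sketched with trivial stand-ins. The "models" below are toy functions, not real encoders: the point is that bi-encoder document vectors can be precomputed offline, while a cross-encoder must look at every query-document pair at query time:

```python
def bi_encode(text):
    """Bi-encoder stand-in: each text gets its own vector, computed independently."""
    # Toy embedding: bag-of-characters counts over a tiny alphabet.
    return [text.count(c) for c in "abcdefghijklmnopqrstuvwxyz "]

def cross_score(query, doc):
    """Cross-encoder stand-in: scores the *pair* jointly, so it can see
    token-level interactions (here, crudely, shared whole words)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

docs = ["the cat sat on the mat", "catapult siege engines", "a feline on a rug"]
doc_vecs = [bi_encode(d) for d in docs]  # precomputed offline, once per document

query = "cat on mat"
q_vec = bi_encode(query)                             # one cheap encoding per query
bi_scores = [dot(q_vec, v) for v in doc_vecs]        # cheap: reuses stored vectors
pair_scores = [cross_score(query, d) for d in docs]  # expensive: one pass per pair
print(bi_scores, pair_scores)
```

In a real system the cross-encoder is a full transformer, which is why it only ever sees the few dozen candidates that survive the earlier stages.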

Late Interaction (ColBERT)

A middle ground between Bi-Encoders and Cross-Encoders is Late Interaction, popularized by the ColBERT model. Instead of compressing a whole document into one vector, it keeps a vector for every token. During search, it performs a "MaxSim" operation. This provides the precision of a Cross-Encoder with much of the speed of a Bi-Encoder, though it significantly increases storage requirements.
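A minimal MaxSim sketch with hypothetical 2-dimensional token vectors (real ColBERT token embeddings are around 128 dimensions): for each query token, take its best match among the document's token vectors, then sum those maxima.

```python
def maxsim(query_token_vecs, doc_token_vecs):
    """ColBERT-style late interaction: sum, over query tokens, of each
    token's maximum similarity to any document token."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return sum(max(dot(q, d) for d in doc_token_vecs) for q in query_token_vecs)

# Hypothetical per-token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]              # two query tokens
doc_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # covers both query tokens well
doc_b = [[0.9, 0.1], [0.8, 0.2]]              # only matches the first token

print(maxsim(query, doc_a), maxsim(query, doc_b))
```

The storage cost is visible here too: doc_a stores three vectors instead of one, which is exactly the trade-off the technique makes at corpus scale.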

Optimization via A/B Testing

In production RAG environments, the quality of retrieval is highly sensitive to the query's phrasing. A/B testing (comparing prompt variants) is a critical optimization technique where developers test different ways of augmenting the user's query before it hits the search indices. For example, a raw user query can be compared against a "HyDE" (Hypothetical Document Embeddings) variant, in which an LLM first generates a hypothetical answer that is then embedded and used as the search vector.
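A minimal harness for this kind of comparison, measuring recall@k for each query variant over a small labeled set. The retriever is a stub and the "expansion" variant is a toy stand-in for a real LLM rewrite; all names and data below are hypothetical:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k results."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def run_variant(transform, labeled_queries, retriever, k):
    """Average recall@k of a query-transformation strategy."""
    scores = [recall_at_k(retriever(transform(q)), relevant, k)
              for q, relevant in labeled_queries]
    return sum(scores) / len(scores)

# Stub corpus and retriever: rank docs by shared words with the query.
docs = {"d3": "billing and invoices",
        "d2": "password policy for admins",
        "d1": "reset your password via email"}

def retriever(query):
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(words & set(docs[d].split())))

labeled = [("how do i reset my password", ["d1"]),
           ("cannot sign in", ["d1"])]

variant_a = lambda q: q                           # raw user query
variant_b = lambda q: q + " reset password email"  # toy stand-in for an LLM rewrite

print(run_variant(variant_a, labeled, retriever, k=1),
      run_variant(variant_b, labeled, retriever, k=1))
```

The second query ("cannot sign in") shares no words with the relevant document, so the raw variant misses it; the expanded variant recovers it, which is the kind of lift a real HyDE-style rewrite aims for.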


Research and Future Directions

The frontier of search algorithms is moving toward Multi-modal Retrieval and Learned Sparse Representations.

  1. Multi-modal Search: Future systems will not just search text but will use unified embedding spaces (like CLIP) to search across images, video, and audio using a single text query. A search for "that scene where the hero jumps off a bridge" will retrieve a specific timestamp in a video file by calculating the distance between the text vector and the video frame vectors.
  2. Learned Sparsity (SPLADE): Researchers are looking for ways to get the benefits of keyword search (precision and efficiency) using neural networks. Models like SPLADE transform a document into a sparse vector of "activated" terms that might not even appear in the original text, effectively performing "expansion" and "retrieval" in a single step.
  3. End-to-End RAG Optimization: Instead of treating the search engine and the LLM as separate components, future research is focusing on "differentiable search indices" where the model is trained to navigate the document corpus directly within its weights.

Frequently Asked Questions

Q: Why is Reciprocal Rank Fusion (RRF) preferred over simple score weighting?

RRF is preferred because keyword search scores (like BM25) are unbounded and depend on corpus statistics, while vector search scores (like Cosine Similarity) are bounded between -1 and 1. Combining these directly is like adding Celsius to Fahrenheit. RRF bypasses this by looking only at the rank (1st, 2nd, 3rd...), using the formula Score(d) = Sum(1 / (k + rank(d))) over each result list, where k is a constant (usually 60) that smooths the impact of low-ranked results.

Q: How does A/B testing (comparing prompt variants) specifically improve search recall?

"A" allows developers to identify which query transformations—such as expansion, summarization, or translation—align best with the underlying embedding model's training distribution. For instance, an embedding model trained on Wikipedia might perform better if the query is transformed into a "fact-seeking" statement rather than a conversational question.

Q: When should I choose Keyword Search over Vector Search?

Keyword search is superior when the user is looking for "needle-in-a-haystack" identifiers: product SKUs, error codes, specific names of obscure libraries, or rare technical jargon. Vector search often "smooths over" these unique tokens in favor of general semantic meaning, leading to "hallucinated" relevance where a document is conceptually similar but lacks the specific entity requested.

Q: What is the "Cold Start" problem in Vector Search?

The cold start problem occurs when new documents are added to the system. In keyword search, the inverted index can be updated almost instantly. In vector search, if the underlying embedding model's "worldview" changes or if the new documents are in a domain the model hasn't seen (e.g., a new legal framework), the embeddings may be clustered poorly, leading to low retrieval quality until the model is fine-tuned.

Q: How do Cross-Encoders solve the "Vocabulary Mismatch" problem differently than Bi-Encoders?

Bi-Encoders solve it by placing "cat" and "feline" in the same neighborhood of a vector space. Cross-Encoders solve it by looking at the contextual relationship between the query and document. A Cross-Encoder can realize that even if "cat" and "feline" are present, the document is actually about "catapults," rejecting it based on the deep interaction of tokens that a single-vector representation might miss.
