Recent research has demonstrated a simple but powerful technique: adding contextual information to chunks before embedding them.
The Problem
Standard RAG systems chunk documents without considering the broader context. A chunk that refers to "the agreement" loses the information a reader would need to tell which agreement is meant once it is separated from the rest of the document.
The Solution
Contextual retrieval prepends a brief description of each chunk's context—derived from the surrounding document structure—to the chunk before it is embedded.
For example:
- "This chunk is from the liability section of a software license agreement"
- "This passage discusses authentication in the API documentation"
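The core transformation can be sketched in a few lines. The function name and the hardcoded context string below are illustrative; in a real pipeline the context would be generated from the full document (for example by an LLM, as discussed under Implementation Notes):

```python
def contextualize(chunk: str, context: str) -> str:
    """Prepend a brief contextual description to a chunk.

    The returned string—not the bare chunk—is what gets embedded,
    so the vector carries the document-level context.
    """
    return f"{context}\n\n{chunk}"


chunk = "Liability is limited to fees paid in the prior 12 months."
context = "This chunk is from the liability section of a software license agreement."

embed_input = contextualize(chunk, context)
```

The embedding model then sees both the context sentence and the chunk text, so queries like "software license liability cap" can match even though the chunk itself never names the agreement.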
Results
In controlled experiments, contextual retrieval showed:
- 49% reduction in failed retrievals
- Improved relevance for ambiguous queries
- Better handling of documents with similar content
Implementation Notes
The technique requires an additional LLM call per chunk during ingestion, but the retrieval improvements often justify the cost.
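An ingestion loop along these lines shows where that extra LLM call sits. The `generate_context` stub and the prompt wording are assumptions for illustration—substitute your model provider's client and tune the prompt to your corpus:

```python
# Hypothetical prompt: asks the model to situate a chunk within its document.
CONTEXT_PROMPT = (
    "Here is a document:\n{document}\n\n"
    "Here is a chunk from that document:\n{chunk}\n\n"
    "Write one short sentence describing where this chunk fits in the document."
)


def generate_context(document: str, chunk: str) -> str:
    """Stub standing in for a real LLM call (one call per chunk)."""
    # A real implementation would send CONTEXT_PROMPT.format(...) to a model.
    return "This chunk is from the example document."


def ingest(document: str, chunks: list[str]) -> list[str]:
    """Return context-prepended strings ready to be embedded."""
    contextualized = []
    for chunk in chunks:
        context = generate_context(document, chunk)
        contextualized.append(f"{context}\n\n{chunk}")
    return contextualized
```

Because context generation happens once at ingestion time, the per-chunk cost is paid up front rather than on every query.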
We're currently evaluating this approach for future SmartFAQs.ai updates.