Definition
HotpotQA is a benchmarking dataset designed to evaluate the multi-hop reasoning capabilities of AI agents and RAG pipelines, requiring models to retrieve and synthesize information across multiple disparate documents to reach a final answer. It specifically measures the performance of complex retrieval strategies where a single search query is insufficient to resolve the user's intent.
A multi-step reasoning benchmark, not a culinary application or a simple single-document QA task.
"A scavenger hunt where the first clue leads to a second location, and both pieces of information must be combined to find the treasure."
Conceptual Overview
HotpotQA is a benchmarking dataset designed to evaluate the multi-hop reasoning capabilities of AI agents and RAG pipelines, requiring models to retrieve and synthesize information across multiple disparate documents to reach a final answer. It specifically measures the performance of complex retrieval strategies where a single search query is insufficient to resolve the user's intent.
Disambiguation
A multi-step reasoning benchmark, not a culinary application or a simple single-document QA task.
Visual Analog
A scavenger hunt where the first clue leads to a second location, and both pieces of information must be combined to find the treasure.