HotpotQA

HotpotQA

HotpotQA is a benchmarking dataset designed to evaluate the multi-hop reasoning capabilities of AI agents and RAG pipelines, requiring models to retrieve and synthesize information across multiple disparate documents to reach a final answer. It specifically measures the performance of complex retrieval strategies where a single search query is insufficient to resolve the user's intent.

Definition

Disambiguation

A multi-step reasoning benchmark, not a culinary application or a simple single-document QA task.

Visual Metaphor

"A scavenger hunt where the first clue leads to a second location, and both pieces of information must be combined to find the treasure."

Key Tools

DSPyLangChainLlamaIndexHugging Face DatasetsPyTorch

Related Connections

Multi-hop Retrieval(Prerequisite)
ReAct Pattern(Component)
Chain-of-Thought (CoT)(Component)
Supporting Facts(Component)

Conceptual Overview

Disambiguation

A multi-step reasoning benchmark, not a culinary application or a simple single-document QA task.

Visual Analog

A scavenger hunt where the first clue leads to a second location, and both pieces of information must be combined to find the treasure.

Definition

Conceptual Overview

Disambiguation

Visual Analog

Related Articles