Definition
The aggregate collection of raw documents and unstructured data that serves as the authoritative knowledge source for an AI agent's retrieval system. Expanding the corpus increases domain depth, at the cost of higher computational overhead for indexing and a greater chance of retrieving irrelevant context (noise).
- Chunking (Downstream Process)
- Vector Store (Component)
- Knowledge Base (Synonym)
- Ground Truth (Prerequisite)
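The trade-off named in the definition can be made concrete with a toy experiment. The sketch below is illustrative only: the sample documents, the query, and the helper names (`build_index`, `retrieve`, `precision_at_k`, `index_cost`) are all assumptions, and plain term-count cosine similarity stands in for whatever retrieval method a real system would use.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately naive)."""
    return text.lower().split()


def build_index(corpus):
    """Build one term-count vector per document."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}


def index_cost(corpus):
    """Crude proxy for indexing overhead: total tokens processed."""
    return sum(len(tokenize(text)) for text in corpus.values())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(index, query, k):
    """Return the ids of the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return ranked[:k]


def precision_at_k(retrieved, relevant):
    """Fraction of retrieved documents that are actually relevant."""
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)


# Hypothetical sample data, invented for this illustration.
SMALL_CORPUS = {
    "refund-policy": "a refund is issued within 30 days of purchase",
    "refund-howto": "to request a refund open a support ticket",
}
EXPANDED_CORPUS = {
    **SMALL_CORPUS,
    "hr-memo": "how do i request a parking permit",
    "press-release": "we request feedback on how we do",
    "lunch-menu": "the cafeteria menu changes every monday",
}
RELEVANT = {"refund-policy", "refund-howto"}
QUERY = "how do i request a refund"

if __name__ == "__main__":
    for name, corpus in [("small", SMALL_CORPUS), ("expanded", EXPANDED_CORPUS)]:
        hits = retrieve(build_index(corpus), QUERY, k=2)
        print(name, index_cost(corpus), hits, precision_at_k(hits, RELEVANT))
```

With these sample documents, the expanded corpus more than doubles the tokens to index (17 to 37), and the parking-permit memo outranks a genuinely relevant document, so precision at 2 drops from 1.0 to 0.5.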
Disambiguation
The raw source material before it is transformed into vector embeddings.
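To illustrate where the corpus ends and the downstream artifacts begin, here is a minimal sketch. It assumes a naive fixed-size chunker and a toy hashed bag-of-words function (`toy_embed`) standing in for a real embedding model; the file names and contents are hypothetical.

```python
import hashlib

# The corpus: raw text keyed by source, before any processing.
corpus = {
    "handbook.txt": (
        "Employees accrue vacation monthly. Unused days roll over once per "
        "year. Requests go through the HR portal."
    ),
    "faq.md": "Q: How do I reset my password? A: Use the account settings page.",
}


def chunk(text, size=8):
    """Split a document into consecutive windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text, dim=16):
    """Hypothetical stand-in for an embedding model: hashed word counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    return vec


# Downstream of the corpus: chunks and their vectors, as a vector store
# would hold them. The corpus itself is left unchanged.
vector_store = [
    (doc_id, piece, toy_embed(piece))
    for doc_id, text in corpus.items()
    for piece in chunk(text)
]

if __name__ == "__main__":
    for doc_id, piece, vec in vector_store:
        print(doc_id, repr(piece), len(vec))
```

The `corpus` dictionary holds only raw strings; everything in `vector_store` is derived from it and could be rebuilt with a different chunk size or embedding function without touching the source documents.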
Visual Analog
A warehouse of unsorted boxes containing all the files an organization owns before they are categorized.