SmartFAQs.ai
Concept

Corpus

Definition

The aggregate collection of raw documents and unstructured data that serves as the authoritative knowledge source for an AI agent's retrieval system. Expanding the corpus increases domain depth, but at a cost: more computational overhead for indexing and a higher probability of retrieving irrelevant context (noise).
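
The trade-off described above can be illustrated with a toy example. This is only a sketch under simplifying assumptions: it uses a plain term-count scorer rather than a real embedding-based retriever, and the names `tokenize`, `build_index`, and `top_k`, as well as the sample documents, are hypothetical and chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only (a deliberately crude tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(corpus: list[str]) -> list[Counter]:
    # Indexing work grows with the corpus: every document is tokenized once.
    return [Counter(tokenize(doc)) for doc in corpus]


def top_k(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Score each document by how often the query terms occur in it
    # (an assumed stand-in for vector similarity).
    index = build_index(corpus)
    terms = tokenize(query)
    scored = [(sum(counts[t] for t in terms), i) for i, counts in enumerate(index)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [corpus[i] for score, i in scored[:k] if score > 0]


small_corpus = [
    "Refund policy: refunds are issued within 14 days.",
    "Shipping policy: orders ship in 2 days.",
]

# Expanding the corpus with loosely related material adds a noisy document.
large_corpus = small_corpus + [
    "Office chat: refund refund refund, has anyone seen my refund?",
]

best_small = top_k("refund policy", small_corpus)
best_large = top_k("refund policy", large_corpus)
```

With the small corpus, the query surfaces the policy document. After the expansion, the chat message outranks it because it repeats the query term more often, and the index has one more document to build. Real retrievers score differently, but the same pressure applies as the corpus grows.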

Disambiguation

The corpus is the raw source material as it exists before it is transformed into vector embeddings.
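
To make the distinction concrete, the sketch below gathers a corpus as plain raw text, with no embedding step involved. It assumes the documents are UTF-8 text files under a single directory. The function name `load_corpus` and the default suffix list are hypothetical, and format-specific parsing (PDF, Office files, and so on) is out of scope here.

```python
from pathlib import Path


def load_corpus(root: str, suffixes: tuple[str, ...] = (".txt", ".md")) -> dict[str, str]:
    """Collect raw document text keyed by path relative to `root`.

    Nothing is chunked or embedded here; the result is the corpus itself.
    """
    base = Path(root)
    corpus: dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        # Keep only regular files with an accepted extension.
        if path.is_file() and path.suffix.lower() in suffixes:
            key = path.relative_to(base).as_posix()
            # Replace undecodable bytes instead of failing on a single bad file.
            corpus[key] = path.read_text(encoding="utf-8", errors="replace")
    return corpus
```

A call such as `load_corpus("docs")` (a hypothetical directory) returns a mapping from relative file paths to raw text, which a later pipeline stage would turn into embeddings.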

Visual Metaphor

"A warehouse of unsorted boxes containing all the files an organization owns before they are categorized."

Key Tools
LangChain Document Loaders, LlamaIndex, Unstructured.io, Apache Tika