Definition
Chunk Overlap is the deliberate repetition of tokens between adjacent text segments (chunks) when a document is split for indexing. Its purpose is to preserve semantic continuity: a sentence or fact that straddles a chunk boundary still appears whole in at least one chunk, provided it is no longer than the overlap, and can therefore be retrieved with its surrounding context. In effect, the splitter moves a sliding window across the text, advancing by less than one chunk length at each step.
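The sliding-window behaviour described above can be sketched in a few lines. This is a minimal illustration, not any particular library's splitter: the function name `chunk_with_overlap` and its parameters are hypothetical, and whitespace-separated words stand in for real tokenizer output.

```python
from typing import List, Sequence


def chunk_with_overlap(
    tokens: Sequence[str], chunk_size: int, overlap: int
) -> List[List[str]]:
    """Split tokens into chunks where adjacent chunks share `overlap` tokens.

    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    stride = chunk_size - overlap  # how far the window advances each step
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the window has reached the end of the document
        start += stride
    return chunks


# Assumption: whitespace splitting stands in for a real tokenizer.
tokens = "the refund policy lasts thirty days after the delivery date".split()
chunks = chunk_with_overlap(tokens, chunk_size=4, overlap=2)

for chunk in chunks:
    print(" ".join(chunk))
```

With a chunk size of 4 and an overlap of 2, the phrase "lasts thirty days" is cut by the first boundary but appears whole in the second chunk, which is the effect the overlap is meant to achieve.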
- Chunk Size (Prerequisite)
- Recursive Character Splitting (Component)
- Sliding Window (Component)
- Context Window (Constraint)
Disambiguation
Chunk Overlap is not data deduplication. It is the opposite: tokens are intentionally repeated across adjacent chunks so that context is not fragmented at chunk boundaries.
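To put a rough number on that intentional redundancy, here is a back-of-the-envelope estimate. The symbols are introduced here for illustration and are not from the source: N is the document length in tokens, c the chunk size, o the overlap, s the stride, and n the number of chunks. It assumes a fixed-size token window with 0 ≤ o < c and N ≥ c.

```latex
s = c - o, \qquad
n = \left\lceil \frac{N - o}{s} \right\rceil, \qquad
\text{tokens indexed} \le n \cdot c
```

For example, with N = 1000, c = 200 and o = 40, the stride is s = 160 and n = 6 chunks are produced, so 1200 tokens are indexed: 20% more than the source text.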
Visual Analog
Overlapping shingles on a roof: each shingle covers part of the next, so there are no gaps for water (or context) to leak through.