Definition
A deterministic strategy for splitting source documents into segments of a set number of characters or tokens, often with a fixed overlap between consecutive segments. It is computationally cheap, which suits retrieval-augmented generation (RAG) pipelines, but it can break semantic integrity by cutting sentences or paragraphs mid-thought.
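The splitting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `chunk_fixed` and the parameters `size` and `overlap` are hypothetical, and it counts characters rather than tokens (a token-based variant would run the same loop over a tokenizer's output).

```python
def chunk_fixed(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split `text` into windows of at most `size` characters.

    Each window after the first repeats the last `overlap` characters
    of the window before it. Names and defaults are illustrative.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window start advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog."
    for piece in chunk_fixed(sample, size=20, overlap=5):
        print(repr(piece))
```

The cut points depend only on `size` and `overlap`, never on the content, which is why the output is deterministic and why a cut can land in the middle of a word or sentence.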
Disambiguation
Distinct from semantic chunking: this strategy cuts at hard count limits, not at linguistic boundaries such as sentence or paragraph breaks.
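To make the contrast concrete, the sketch below runs a hard-count splitter and a boundary-aware splitter over the same text. Both helpers (`split_fixed`, `split_sentences`) are hypothetical names, and the regex sentence splitter is only a naive stand-in for a linguistic boundary; it is an assumption for illustration, and practical semantic chunkers generally rely on more than punctuation.

```python
import re


def split_fixed(text: str, size: int) -> list[str]:
    # Cut every `size` characters, ignoring what the characters are.
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_sentences(text: str) -> list[str]:
    # Naive boundary rule: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


if __name__ == "__main__":
    sample = "Chunks feed the retriever. Bad cuts hurt recall."
    print(split_fixed(sample, 20))   # pieces end wherever the count runs out
    print(split_sentences(sample))   # pieces end at sentence boundaries
```

The first splitter produces equal-length pieces that can end mid-word; the second produces pieces of uneven length that each end where a sentence does.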
Visual Analog
A long sourdough loaf sliced into exactly 1-inch pieces, regardless of where the crust or internal air bubbles are located.