Definition
The process of partitioning unstructured text into smaller, semantically meaningful segments so that each retrieved segment fits within an LLM's context window and vector search returns more relevant results. Effective chunking balances granularity against context retention: chunks that are too large dilute what a single embedding represents, while chunks that are too small lose the surrounding context. Common strategies include fixed-size splitting and recursive character splitting, usually with overlapping boundaries between adjacent chunks.
Disambiguation: Not to be confused with database partitioning or network packet segmentation; this specifically refers to text preprocessing for vectorization.
Visual analog: "Slicing a long loaf of bread into individual pieces that fit perfectly into a toaster without losing the flavor of the whole loaf."
- Sliding Window / Overlap (Component)
- Vector Embedding (Prerequisite)
- Context Window (Constraint)
- Semantic Similarity (Objective)
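
The two strategies named in the definition, fixed-size splitting with overlap and recursive character splitting, can be sketched roughly as follows. This is a minimal illustration rather than any particular library's implementation: the function names `chunk_fixed` and `chunk_recursive`, the default separator list, and the choice to measure size in characters (production pipelines often count tokens instead) are all assumptions made for the example. Overlap is applied only in the fixed-size variant to keep the sketch short.

```python
from typing import List, Sequence


def chunk_fixed(text: str, size: int, overlap: int = 0) -> List[str]:
    """Fixed-size splitting: windows of `size` characters, each sharing
    `overlap` characters with the previous window (a sliding window)."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def chunk_recursive(text: str, size: int,
                    separators: Sequence[str] = ("\n\n", "\n", " ")) -> List[str]:
    """Recursive character splitting: try the coarsest separator first and
    fall back to finer ones only for pieces that are still too long.
    The separator list is an assumed default, not a standard."""
    if len(text) <= size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard fixed-size cut.
        return chunk_fixed(text, size)
    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(chunk_recursive(piece, size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Calling `chunk_fixed(document, size=500, overlap=50)` would produce 500-character windows that each repeat the last 50 characters of the previous one, while `chunk_recursive(document, size=500)` prefers to break at paragraph boundaries, then line breaks, then spaces, so chunks are less likely to cut a sentence in half. The specific sizes here are placeholders; suitable values depend on the embedding model and the context window being targeted.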