Definition
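Stemming is a rule-based heuristic used in the retrieval stage of a RAG pipeline, typically by lexical (keyword) retrievers such as BM25. It reduces inflected words to a common 'stem' by truncating suffixes, and the stem is not necessarily a valid word. The same stemmer is applied to documents at indexing time and to queries at search time, so grammatical variants such as "connected" and "connecting" map to a single index entry, which increases recall. The trade-off is reduced precision from over-stemming, where unrelated words collapse to the same stem. The Porter stemmer, for example, reduces both "university" and "universe" to "univers".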
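The suffix-stripping idea can be sketched with a handful of ordered rules. The rule list and the minimum-stem-length guard below are hypothetical and far simpler than a real stemmer such as Porter's. The sketch only shows two things: how variants conflate, and how a blind rule over-stems.

```python
# Toy suffix-stripping stemmer: hypothetical rules, NOT the Porter algorithm.
# Longer suffixes are listed first so that "es" is tried before "s".
SUFFIXES = ("ingly", "ness", "ing", "ed", "ly", "es", "s")
MIN_STEM = 3  # hypothetical guard: never leave fewer than 3 characters


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, with no dictionary lookup."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    # Grammatical variants collapse to one index term (higher recall)
    for w in ["connect", "connected", "connecting", "connects"]:
        print(w, "->", toy_stem(w))
    # Over-stemming: an unrelated word collides with "new" (lower precision)
    print("news ->", toy_stem("news"), "| new ->", toy_stem("new"))
```

All four variants become `connect`. The rule that strips a final "s" also turns "news" into `new`, which is the precision cost in miniature.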
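Unlike lemmatization, which uses a vocabulary and morphological analysis to return a word's dictionary form (its lemma), stemming applies crude suffix-stripping rules with no dictionary lookup. As a result, it cannot relate irregular forms such as "ran" and "run".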
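The contrast can be shown side by side. In the sketch below, `LEMMAS` is a tiny hypothetical lookup table standing in for a real lemmatizer's vocabulary. Real lemmatizers typically also use the word's part of speech. `crude_stem` uses hypothetical rules rather than any published stemmer.

```python
# Hypothetical lemma table standing in for a real dictionary/vocabulary
LEMMAS = {"runs": "run", "running": "run", "ran": "run", "mice": "mouse"}


def crude_stem(word: str) -> str:
    """Strip a common suffix by rule alone (toy rules, no dictionary)."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def lookup_lemma(word: str) -> str:
    """Return the dictionary form if the table knows it, else the word."""
    w = word.lower()
    return LEMMAS.get(w, w)


if __name__ == "__main__":
    for w in ["runs", "running", "ran", "mice"]:
        print(f"{w:8} stem={crude_stem(w):6} lemma={lookup_lemma(w)}")
```

The stemmer leaves "ran" and "mice" untouched and turns "running" into the non-word "runn". The lookup maps all three verb forms to "run" and "mice" to "mouse".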
"A hedge trimmer roughly squaring off different branches of a bush to make them look uniform from a distance."
- Lemmatization (Sophisticated Alternative)
- Tokenization (Prerequisite)
- Recall (Optimization Target)
- BM25 (Retrieval Algorithm Context)
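Tokenization, stemming and BM25 fit together as one analysis chain, and the stemmer must be applied identically to documents and queries. The sketch below makes several assumptions: a simplistic regex tokenizer, hypothetical toy stemming rules, and a common BM25 variant (Lucene-style IDF, with k1 = 1.2 and b = 0.75). It compares scores with and without stemming to show the effect on recall.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of ASCII letters (a simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(word: str) -> str:
    """Hypothetical suffix rules, not Porter: strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str, stem: bool) -> list[str]:
    """Same chain for documents and queries: tokenize, then optionally stem."""
    tokens = tokenize(text)
    return [toy_stem(t) for t in tokens] if stem else tokens


def bm25_scores(query: str, docs: list[str], stem: bool = True,
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with BM25."""
    doc_tfs = [Counter(analyze(d, stem)) for d in docs]
    n_docs = len(docs)
    avgdl = sum(sum(tf.values()) for tf in doc_tfs) / n_docs
    query_terms = set(analyze(query, stem))
    # Document frequency of each query term in the (stemmed or raw) index
    df = {t: sum(1 for tf in doc_tfs if t in tf) for t in query_terms}
    scores = []
    for tf in doc_tfs:
        dl = sum(tf.values())
        score = 0.0
        for t in query_terms:
            f = tf[t]  # Counter returns 0 for absent terms
            if f == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "The server connected to the database.",
        "Connecting cables for home theaters.",
        "A guide to baking bread.",
    ]
    print("stemmed:  ", bm25_scores("connecting", docs, stem=True))
    print("unstemmed:", bm25_scores("connecting", docs, stem=False))
```

With stemming, the first document ("connected") also scores above zero. Without it, only the exact form "connecting" matches.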