Definition
Stop Word Removal is a preprocessing technique in RAG pipelines in which high-frequency words that carry little semantic content (e.g., 'the', 'is', 'on') are filtered out during indexing or query processing, so that retrieval prioritizes tokens with domain-specific meaning. It benefits sparse retrieval (BM25) and reduces index size, but it can degrade dense embeddings, because transformer-based models rely on these words for syntactic context.
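A minimal sketch of the trade-off described above, in Python. It filters stop words only on the sparse (BM25) path and passes the untouched text to the dense path. The stop list, the tokenizer, and the function names (tokenize, remove_stop_words, prepare_query) are illustrative assumptions and not part of any particular library; a real pipeline would likely use a larger, curated stop list.

```python
import re

# Tiny illustrative stop list (an assumption); production lists are larger and curated.
STOP_WORDS = frozenset({"a", "an", "the", "is", "are", "on", "in", "of", "to", "and"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(text: str, stop_words: frozenset = STOP_WORDS) -> list[str]:
    """Return the tokens of `text` that are not in the stop list, in original order."""
    return [token for token in tokenize(text) if token not in stop_words]


def prepare_query(query: str) -> dict:
    """Build inputs for a hypothetical hybrid retriever.

    The sparse path gets filtered tokens; the dense path gets the raw query,
    so the embedding model still sees the full syntactic context.
    """
    return {
        "sparse_tokens": remove_stop_words(query),  # input for a BM25-style index
        "dense_text": query,                        # unchanged input for an embedding model
    }


prepared = prepare_query("What is the limit on API requests?")
print(prepared["sparse_tokens"])
print(prepared["dense_text"])
```

Keeping the two paths separate is one possible design choice: the sparse index shrinks and stays focused on content-bearing terms, while the dense encoder receives the sentence exactly as written.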
Disambiguation
Not to be confused with 'Negative Constraints' or 'Safety Guardrails' in Agent logic.
Visual Analog
A gold-panning sifter that lets common sand pass through while retaining valuable nuggets of information.