SmartFAQs.ai
Intermediate

Lemmatization


Definition

Lemmatization is a text normalization technique that reduces inflected word forms to their dictionary base form (the lemma) using morphological analysis and part-of-speech context. In RAG pipelines it is used mainly to improve keyword-based retrieval such as BM25, where a query term and a document term must share the same form to match. It is also applied to chunks before embedding to make their wording more consistent.
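
As a rough illustration of why part-of-speech context matters, the sketch below uses a tiny hand-written lookup table keyed by (word, tag). The table, the tag names, and the helper functions are hypothetical and exist only for this example; a real pipeline would rely on one of the lemmatizer libraries listed under Key Tools rather than a hard-coded dictionary.

```python
# Toy lexicon keyed by (lowercased surface form, coarse POS tag).
# Entries are illustrative assumptions, not taken from any real lexicon.
LEMMA_TABLE = {
    ("better", "ADJ"): "good",
    ("ran", "VERB"): "run",
    ("running", "VERB"): "run",
    ("runs", "VERB"): "run",
    ("meeting", "VERB"): "meet",     # "we are meeting today"
    ("meeting", "NOUN"): "meeting",  # "the meeting ran long"
    ("mice", "NOUN"): "mouse",
}


def lemmatize(token: str, pos: str) -> str:
    """Return the lemma for a (token, POS) pair, or the lowercased token if unknown."""
    return LEMMA_TABLE.get((token.lower(), pos), token.lower())


def lemmatize_tagged(tagged_tokens):
    """Lemmatize a list of (token, POS) pairs."""
    return [lemmatize(token, pos) for token, pos in tagged_tokens]


query = [("Mice", "NOUN"), ("ran", "VERB"), ("better", "ADJ")]
print(lemmatize_tagged(query))  # ['mouse', 'run', 'good']
```

The same surface form "meeting" maps to different lemmas depending on its tag, which is the part a purely suffix-based approach cannot capture.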

Disambiguation

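Lemmatization uses vocabulary and morphological rules to return real dictionary words, including irregular forms (e.g., 'better' to 'good'). Stemming simply strips suffixes (e.g., 'running' to 'run'); it cannot resolve irregular forms and may produce non-words (e.g., the Porter stemmer reduces 'studies' to 'studi').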
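
The contrast can be sketched with two toy functions. Both are assumptions made for illustration: `naive_stem` is a deliberately crude suffix stripper (it is not the Porter algorithm, which handles cases such as doubled consonants more carefully), and `toy_lemma` is a hard-coded lookup standing in for a real lemmatizer.

```python
# Suffixes tried in order by the crude stemmer; chosen only for this example.
SUFFIXES = ("ing", "ies", "es", "ed", "s")

# Hypothetical lemma lookup standing in for a real lemmatizer.
TOY_LEMMAS = {
    "running": "run",
    "ran": "run",
    "runs": "run",
    "better": "good",
    "studies": "study",
}


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemma(word: str) -> str:
    """Look the word up in the toy table, falling back to the lowercased word."""
    word = word.lower()
    return TOY_LEMMAS.get(word, word)


for word in ["running", "ran", "better", "studies"]:
    print(f"{word:8} stem={naive_stem(word):8} lemma={toy_lemma(word)}")
```

The stripper leaves 'ran' and 'better' untouched and turns 'studies' into the non-word 'stud', while the lookup returns dictionary forms for all four inputs.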

Visual Metaphor

"A Master Filing Cabinet where 'running', 'ran', and 'runs' are all filed under a single labeled folder 'run'."
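
In retrieval terms, the filing cabinet corresponds to a keyword index keyed by lemma. The sketch below is a minimal, assumed illustration of that idea: `TOY_LEMMAS`, `to_lemma`, and `build_index` are hypothetical names, the tokenizer is a plain whitespace split, and no BM25 scoring is performed.

```python
from collections import defaultdict

# Hypothetical toy lemma map; a real pipeline would call a lemmatizer library.
TOY_LEMMAS = {"running": "run", "ran": "run", "runs": "run"}


def to_lemma(token: str) -> str:
    """Lowercase the token and map it to its lemma if the toy table knows it."""
    token = token.lower()
    return TOY_LEMMAS.get(token, token)


def build_index(docs):
    """Map each lemma (the 'folder') to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[to_lemma(token)].add(doc_id)
    return index


docs = {
    "d1": "She runs daily",
    "d2": "He ran yesterday",
    "d3": "Running is fun",
    "d4": "Walking is fun",
}
index = build_index(docs)

# A query for "running" is lemmatized the same way and lands in the 'run' folder.
print(sorted(index[to_lemma("running")]))  # ['d1', 'd2', 'd3']
```

Because the query and the documents pass through the same normalization, a search for any one inflected form retrieves documents containing the others.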

Key Tools
spaCy, NLTK, Stanza, Gensim
