Definition
In retrieval-augmented generation (RAG) pipelines, ETL (extract, transform, load) refers to the ingestion workflow that extracts unstructured data from varied sources, transforms it through cleaning, chunking, and embedding generation, and loads the resulting vectors into a vector database, where they are indexed for similarity search.
- Chunking (Component)
- Embedding (Component)
- Vector Database (Destination)
- Indexing (Component)
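The extract, transform, and load stages named in the definition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `clean`, `chunk`, `embed`, `InMemoryVectorStore`, and `ingest` are hypothetical names invented for this example, the hashed bag-of-words `embed` is a toy stand-in for a real embedding model, and the brute-force in-memory store stands in for a real vector database and its index.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field


def clean(text: str) -> str:
    """Transform, step 1: collapse whitespace left over from extraction."""
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Transform, step 2: split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str, dim: int = 64) -> list[float]:
    """Transform, step 3: toy hashed bag-of-words vector, L2-normalized.

    Assumption: a stand-in for a real embedding model. It only captures
    word overlap, not meaning.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Load target: (chunk, vector) rows searched by brute force."""
    rows: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        self.rows.append((text, vector))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        # Stored vectors are unit length, so the dot product is cosine similarity.
        ranked = sorted(
            self.rows,
            key=lambda row: -sum(a * b for a, b in zip(q, row[1])),
        )
        return [text for text, _ in ranked[:k]]


def ingest(documents: list[str], store: InMemoryVectorStore) -> int:
    """Run extract -> transform -> load and return the number of chunks stored."""
    count = 0
    for raw in documents:  # extract: documents are assumed to be read into memory already
        for piece in chunk(clean(raw)):
            store.add(piece, embed(piece))
            count += 1
    return count
```

In a real pipeline, `embed` would presumably call an embedding model and `InMemoryVectorStore` would be replaced by a vector database with an approximate nearest-neighbor index; the stage boundaries stay the same.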
Disambiguation
In this context, ETL focuses on semantic preservation and vectorization rather than on the relational normalization or transactional data movement associated with traditional data-warehouse ETL.
Visual Analog
Picture an industrial wood chipper and sorter: raw timber (unstructured files) is shredded into uniform mulch (chunks) and sorted into color-coded bins (regions of vector space), so the right mulch is easy to find when the garden needs covering (retrieval).