Definition
The process of reducing the memory footprint and storage requirements of high-dimensional embeddings through techniques such as Product Quantization (PQ) or Scalar Quantization (SQ). In RAG pipelines, it gives up some retrieval accuracy (usually measured as recall of the relevant context) in exchange for lower memory cost and retrieval latency. The compression settings determine where a system sits on that trade-off.
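To make the memory side of the trade-off concrete, here is a minimal sketch of 8-bit scalar quantization in plain Python. It assumes per-dimension min/max calibration, which is one common SQ variant among several. The helpers `fit_sq`, `encode_sq` and `decode_sq` are hypothetical names and not a library API.

```python
from array import array

def fit_sq(vectors):
    """Record the per-dimension minimum and maximum over a calibration batch."""
    dims = range(len(vectors[0]))
    lo = [min(v[d] for v in vectors) for d in dims]
    hi = [max(v[d] for v in vectors) for d in dims]
    return lo, hi

def encode_sq(vec, lo, hi, levels=255):
    """Map each float component to an 8-bit code in [0, levels]."""
    codes = array("B")  # unsigned char: 1 byte per component
    for x, a, b in zip(vec, lo, hi):
        span = (b - a) or 1.0                   # guard against constant dimensions
        t = min(max((x - a) / span, 0.0), 1.0)  # clamp values outside the calibrated range
        codes.append(round(t * levels))
    return codes

def decode_sq(codes, lo, hi, levels=255):
    """Reconstruct approximate floats from the 8-bit codes."""
    return [a + (c / levels) * ((b - a) or 1.0) for c, a, b in zip(codes, lo, hi)]

# Toy 4-dimensional embeddings (real ones have hundreds of dimensions).
vectors = [
    [0.12, -0.40, 0.88, 0.05],
    [0.30, 0.10, -0.20, 0.70],
    [-0.50, 0.25, 0.40, -0.10],
]
lo, hi = fit_sq(vectors)
codes = encode_sq(vectors[0], lo, hi)
approx = decode_sq(codes, lo, hi)

float_bytes = array("f", vectors[0]).itemsize * len(vectors[0])  # float32 storage
code_bytes = codes.itemsize * len(codes)                          # 8-bit storage
max_err = max(abs(x - y) for x, y in zip(vectors[0], approx))
print(f"{float_bytes} bytes -> {code_bytes} bytes, max abs error {max_err:.4f}")
```

Storage drops by a factor of four. The reconstruction error per component is bounded by half a quantization step, which is why rankings usually change only slightly.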
Conceptual Overview
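Embeddings are typically stored as 32-bit floats. A single 768-dimensional vector therefore occupies 3,072 bytes, and an index of one million such vectors needs about 3 GB before any index overhead. Scalar Quantization (SQ) maps each component independently onto a small integer range; with 8-bit codes this cuts storage by a factor of four. Product Quantization (PQ) splits each vector into m sub-vectors and learns a small codebook for each sub-space, commonly 256 centroids found with k-means. It then stores only the index of the nearest centroid for each sub-vector, so a vector shrinks to m bytes plus the shared codebooks. Because similarity is computed on approximations of the original values, nearest-neighbour rankings can shift and recall drops. In a RAG pipeline, this can mean a relevant passage falls out of the retrieved context. Choosing the bit depth or the number of sub-vectors is therefore a tuning decision: memory cost and latency on one side, retrieval quality on the other.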
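The PQ steps above (split, learn per-sub-space codebooks, store centroid indices, compare against the codes) can be sketched in plain Python. This sketch assumes Euclidean distance, plain Lloyd's k-means, and a dimension divisible by m. It uses 16 centroids per sub-space to keep the toy example small. The names `train_pq`, `encode_pq` and `adc_distance` are hypothetical and not a library API. Libraries such as FAISS provide optimized implementations of the same idea.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def split(vec, m):
    """Cut a vector into m equal-length sub-vectors (len(vec) must be divisible by m)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_pq(vectors, m, k, seed=0):
    """Learn one codebook of k centroids per sub-space."""
    per_subspace = zip(*(split(v, m) for v in vectors))  # i-th tuple holds every i-th sub-vector
    return [kmeans(list(subs), k, seed=seed + i) for i, subs in enumerate(per_subspace)]

def encode_pq(vec, codebooks):
    """Store one centroid index (one byte, so k <= 256) per sub-vector."""
    return bytes(nearest(sub, cb) for sub, cb in zip(split(vec, len(codebooks)), codebooks))

def adc_distance(query, code, codebooks):
    """Asymmetric distance: full-precision query vs. the centroid-reconstructed vector."""
    subs = split(query, len(codebooks))
    return sum(sq_dist(q, cb[c]) for q, cb, c in zip(subs, codebooks, code))

rng = random.Random(42)
dim, m, k = 8, 4, 16
database = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
codebooks = train_pq(database, m, k)
codes = [encode_pq(v, codebooks) for v in database]

query = database[7]
ranked = sorted(range(len(codes)), key=lambda i: adc_distance(query, codes[i], codebooks))
print(f"{len(codes[0])} bytes per vector instead of {4 * dim} as float32")
print("top-5 ids by PQ distance:", ranked[:5])
```

Each vector is reduced to m = 4 bytes. Any vectors that happen to share a code become indistinguishable at search time, which is one concrete way PQ costs recall.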
Disambiguation
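Not to be confused with general-purpose lossless file compression such as ZIP or GZIP. Embedding compression is lossy: it reduces the numerical dimensionality or bit depth of each vector, so the original values cannot be recovered exactly. A ZIP or GZIP archive, by contrast, restores its input byte for byte.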
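The contrast can be shown with Python's standard `gzip` module next to a toy 8-bit quantizer. This is a minimal sketch. The fixed value range of [-1, 1] and the helper names `quantize` and `dequantize` are illustrative assumptions, not part of any library.

```python
import gzip
import struct

embedding = [0.1234, -0.5678, 0.9012, -0.3456]

# Lossless: pack as float32 bytes, gzip, then decompress.
raw = struct.pack(f"<{len(embedding)}f", *embedding)
round_trip = gzip.decompress(gzip.compress(raw))
lossless_exact = round_trip == raw  # every byte comes back unchanged

# Lossy: 8-bit scalar quantization over an assumed value range of [-1, 1].
LO, HI, LEVELS = -1.0, 1.0, 255

def quantize(x):
    """Map a float in [LO, HI] to an integer code in [0, LEVELS]."""
    return round((x - LO) / (HI - LO) * LEVELS)

def dequantize(c):
    """Map an integer code back to an approximate float."""
    return LO + c / LEVELS * (HI - LO)

codes = bytes(quantize(x) for x in embedding)
restored = [dequantize(c) for c in codes]
max_err = max(abs(x - y) for x, y in zip(embedding, restored))

print("gzip round trip exact:", lossless_exact)
print(f"{len(raw)} bytes -> {len(codes)} bytes, max abs error {max_err:.4f}")
```

The gzip round trip is exact, while the quantized vector is four times smaller but only approximately equal to the original.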
Visual Analog
Downsampling a high-resolution 4K photograph into a smaller, slightly grainy JPEG to save storage while ensuring the main subjects remain recognizable.