Glossary
Definitions and synonyms for key RAG concepts.
Advanced Concepts
Semantic graphs
Understanding cause-effect
Finding relevant code snippets
"What-if" analysis
Transferring knowledge to smaller models
Combining multiple models
Merging embeddings/outputs
Learning from rankings
Virtual environment testing
Non-text data retrieval
Creating training examples
Using tests as metrics
Advanced Embedding Techniques
Vectors capturing surrounding context
Unified space for multiple languages
Full-dimensional continuous vectors
Reducing vector dimensions while preserving relationships
Models trained on specialized corpora (medical, legal)
Number of dimensions in vector (e.g., 768, 1536)
Models fine-tuned for specific retrieval tasks
Multiple representations per document (title, content, etc.)
Cross-language vector representations
Compressing vectors by up to roughly 97% through subvector coding
Reducing precision of vectors (4-bit, 8-bit)
Sparse Vectors
High-dimensional vectors with mostly zeros
SPLADE
SParse Lexical AnD Expansion model: learned sparse embeddings with neural term expansion
Techniques to reduce storage requirements
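The precision-reduction entry above ("Reducing precision of vectors (4-bit, 8-bit)") can be illustrated with a minimal sketch of 8-bit scalar quantization. The function names are hypothetical, and the per-vector min/max scaling is only one common choice, assumed here for simplicity. Storing one byte per dimension instead of a 4-byte float32 cuts storage by 75%, at the cost of a small rounding error.

```python
from typing import List, Tuple


def quantize_8bit(vec: List[float]) -> Tuple[List[int], float, float]:
    """Map each float to an integer code in 0..255 using the vector's own min/max."""
    lo, hi = min(vec), max(vec)
    # Fall back to a scale of 1.0 for constant vectors to avoid dividing by zero.
    scale = (hi - lo) / 255 or 1.0
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize_8bit(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximately reconstruct the original floats from the integer codes."""
    return [lo + c * scale for c in codes]


vec = [0.12, -0.50, 0.33, 0.90]
codes, lo, scale = quantize_8bit(vec)
restored = dequantize_8bit(codes, lo, scale)
```

Each restored value differs from the original by at most half a quantization step (scale / 2).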
Advanced Retrieval & Learning
Lifelong learning systems
Multi-language adaptation
Adapting to specific domains
Learning from minimal examples
Learning from new data
Tracking information relevance over time
Updating stale information
Learning to learn quickly
Real-time model updates
Fine-tuning for RAG
Knowledge transfer across domains
No task-specific examples
Advanced Retrieval Methods
Contextualized Late Interaction over BERT
ColBERT applied to multimodal (vision) content
Broadening query scope
Neural dense retrieval approach
LLM-generated synthetic documents
Refining retrieval in steps
Retrieving across multiple documents
Multiple reformulations of single query
Enriching query with synonyms and related terms
Rewriting queries for better matching
Transforming queries to improve retrieval
Breaking complex queries into parts
Adding alternative terms
Architectures & Models
BERT
Bidirectional Encoder Representations from Transformers
BM25
Best Matching 25
DPR
Dense Passage Retrieval
HNSW
Hierarchical Navigable Small World
HyDE
Hypothetical Document Embeddings
IVF
Inverted File
PQ
Product Quantization
RAFT
Retrieval-Augmented Fine-Tuning
SPLADE
SParse Lexical AnD Expansion model
TF-IDF
Term Frequency-Inverse Document Frequency
Benchmarks & Datasets
Compliance & Ethics
Systematic discrimination
Identifying unfair patterns
CCPA
California Consumer Privacy Act
Understanding model decisions
Measuring equality
FINRA
Financial Industry Regulatory Authority (financial regulatory compliance)
GDPR
General Data Protection Regulation
HIPAA
Health Insurance Portability and Accountability Act
Ethical AI principles
SOX
Sarbanes-Oxley compliance
Clear system explanation
Context & Token Management
Dividing content for token limits
Maintaining important information
Cutting text to fit limits
Context supporting response claims
Tokens in query and context
Tokens in generated response
Reusing previously computed prompts
Fixed-size moving context window
Condensing text to save tokens
Allocated tokens for retrieval and generation
Maximizing value per token
Maximum tokens LLM can process
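The token-budget and truncation entries above can be sketched as a greedy packing step: reserve tokens for the prompt and the generated response, then add retrieved chunks until the remaining budget is used up. This is a minimal sketch, not a definitive implementation. Whitespace splitting stands in for a real tokenizer, and the function and parameter names are hypothetical.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_budget(
    chunks: List[str],
    context_window: int,
    reserved_for_output: int,
    prompt_tokens: int,
) -> Tuple[List[str], int]:
    """Keep chunks, in the given order, until the retrieval budget is exhausted.

    Chunks are assumed to be sorted by relevance, most relevant first.
    """
    budget = context_window - reserved_for_output - prompt_tokens
    kept, used = [], 0
    for chunk in chunks:
        n = count_tokens(chunk)
        if used + n > budget:
            break  # stop at the first chunk that does not fit
        kept.append(chunk)
        used += n
    return kept, used
```

With a real model, count_tokens would be replaced by that model's own tokenizer, since token counts differ between tokenizers.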
Core RAG
Data & Context Management
Storing context externally
Previous interactions for context
Retrieval from multiple systems
Querying across distributed knowledge sources
Persistent user preferences and patterns
Session and long-term memory integration
Dynamic knowledge base updates
Curating important memories
Short-term conversation state
Real-time document and response streaming
Data & Privacy
RBAC and ABAC
Tracking system actions
Recording user permissions
Removing identifying information
Tracking data origin and transformations
Source and history tracking
Data storage encryption
Network data encryption
Identifying personally identifiable information
Removing sensitive information
Replacing sensitive data
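As a rough illustration of the PII detection and masking entries above, the sketch below replaces email addresses and US-style phone numbers with placeholder tags. The two regular expressions are deliberately simple assumptions for illustration; production systems usually rely on dedicated PII detection tooling and cover many more categories.

```python
import re

# Simplified patterns, assumed for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace detected emails and phone numbers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


masked = mask_pii("Contact jane.doe@example.com or 555-123-4567.")
# masked == "Contact [EMAIL] or [PHONE]."
```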
Data Structures
Sorted tree structure
Connected nodes and edges
Key-value storage
Priority queue structure
Semantic network
Graph with attributes
First-in-first-out
Probabilistic balanced structure
Prefix tree for strings
Database Features
Transaction guarantees
Conditional document selection
Redundancy and failover
Storing non-vector data
Isolated data per tenant
Data partition within database
Copying data across nodes
Handling growing data
Distributing data across partitions
Access control per tenant
Document & Data Management
Redundant content between consecutive chunks
Number of tokens or characters per segment
Breaking documents into manageable pieces for embedding
Complete collection of documents in knowledge base
Breaking text into logical units
Repository for original/source documents
Tracking changes and updates to source material
Character or token-based uniform segmentation
Adding contextual information (tags, dates, source)
Content-aware splitting based on meaning
Agentic Chunking
LLM-assisted intelligent document splitting
Pulling readable content from various formats
Dividing content while preserving context
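The fixed-size segmentation and overlap entries above can be sketched as follows. The example splits on words rather than model tokens, and the function name and default sizes are hypothetical choices for illustration.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> List[str]:
    """Split text into fixed-size word chunks that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = chunk_words("one two three four five six seven eight")
# chunks == ["one two three four five", "four five six seven eight"]
```

The words "four five" appear in both chunks, which is the redundant content that helps preserve context across chunk boundaries.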
Document Processing
Web scraping
Document parsing and chunking
Vision-language model (VLM) OCR
Open-source vision-language model (VLM) OCR
Vision-language model (VLM) OCR
Open-source vision-language model (VLM) OCR
PDF content extraction
PDF text extraction
Browser automation
OCR text recognition
Document parsing and chunking
Domain Applications
API and documentation retrieval
FAQ and ticket automation
Infrastructure documentation
Organizational information system
Market data and compliance
Fact-checking and verification
Contract and precedent analysis
Clinical RAG
Patient records and guidelines
Property and regulation information
Academic literature integration
Embedding Fundamentals
Angle-based similarity metric between vectors
Similarity calculation for normalized vectors
Neural network converting text to vectors
Multi-dimensional space where vectors are positioned
Numerical vector representations of text capturing semantic meaning
Straight-line distance between vectors
Contextual understanding of text beyond keywords
Finding documents by meaning rather than keywords
Scaling vectors to unit length
Numeric array encoding semantic content
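The similarity and normalization entries above relate to each other in a simple way: once vectors are scaled to unit length, their dot product equals their cosine similarity. A minimal pure-Python sketch, with hypothetical function names:

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: List[float]) -> float:
    return math.sqrt(dot(a, a))


def normalize(a: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Angle-based similarity: 1.0 for same direction, 0.0 for orthogonal."""
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
cos = cosine_similarity(a, b)                 # 24 / 25 = 0.96
dot_unit = dot(normalize(a), normalize(b))    # same value, up to rounding
```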
Embedding Models
Open-source high-performance embeddings
CLIP
Vision-language embedding model
Commercial multilingual embedding model
Embeddings trained on large-scale datasets
Embeddings from Meta's LLaMA models
Open-source efficient embeddings
3,072-dimensional embedding model
1,536-dimensional embedding model
Framework for semantic textual similarity
Embeddings from Voyage AI models
Error Analysis
Incorrect source citation
New documents not retrievable
Confident false information
Retrieved docs not supporting query
Fabricated information not in context
Contradictory information
Adversarial attack via input
Relevant docs ranked too low
Missing relevant documents
Meaning divergence between query and context
Evaluation Tools & Frameworks
Foundational Terms
Query combined with retrieved context before generation
Maximum amount of text an LLM can process (tokens)
Information sources outside the LLM's training data
LLM component that synthesizes responses from retrieved context
Anchoring generated responses in retrieved facts to reduce hallucinations
Process of preparing and storing documents for retrieval
Structured or unstructured collection of documents and data
System component responsible for fetching relevant documents
Technique combining information retrieval with generative AI for grounded responses
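The first entry above ("Query combined with retrieved context before generation") can be sketched as a small prompt-building step. The instruction wording, the numbered-source layout, and the function name are assumptions made for illustration, not a prescribed template.

```python
from typing import List


def build_augmented_prompt(query: str, retrieved_chunks: List[str]) -> str:
    """Combine the user query with retrieved context into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


prompt = build_augmented_prompt(
    "What does BM25 build on?",
    ["BM25 is a probabilistic ranking function that builds on TF-IDF."],
)
```

Numbering the chunks gives the generator a way to attribute claims, which supports the grounding entry above.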
Generation & Response Metrics
How well response addresses question
Semantic similarity to expected answer
N-gram overlap with reference
Correctness of source attribution
Logical flow and readability
Addressing all query aspects
Predictions matching reference exactly
Accuracy of claims in response
Response grounded only in retrieved context
Recall-oriented n-gram overlap with reference (Recall-Oriented Understudy for Gisting Evaluation)
Meaning-based comparison
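Two of the reference-based entries above, exact match and token overlap, can be sketched as follows. The normalization used here (lowercasing and whitespace collapsing) is an assumption; published benchmarks often also strip punctuation and articles.

```python
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    """True only when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred = normalize_answer(prediction).split()
    ref = normalize_answer(reference).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


em = exact_match("Paris is the capital", "the capital is Paris")  # False
f1 = token_f1("Paris is the capital", "the capital is Paris")     # 1.0
```

The example shows why exact match is strict: the two answers share every token, so token F1 is 1.0, yet the word order differs and exact match fails.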
Infrastructure & Deployment
Request routing and management
Content delivery network
AWS, Azure, GCP
Reusing database connections
Docker containers
Multi-machine setup
Placing systems closer to users
Adding more servers
Container orchestration
Distributing requests
Independent service components
On-premise deployment
Using more powerful servers
Intelligent RAG Patterns
Iterative query refinement
Dynamic strategy selection based on query type
Autonomous agent-driven retrieval decisions
Multiple retrieval paths in single query
Post-generation error checking
Retaining interaction history
Rapid adaptation with few examples
Coordination between specialized agents
Self-reflective improvement mechanisms
Internal critique and iteration
Self-assessment of relevance
Model self-evaluates and critiques own outputs
RAG using external tools/APIs
Long-Context Handling
Extended text handling
100K+ token support
Token allocation strategy
Updating high-value information
Preferring recent documents
Importance weighting
Choosing what to retain
Time-aware retrieval
Machine Learning Concepts
Focusing on relevant parts
BERT
Bidirectional Encoder Representations from Transformers
Two-part model architecture
Adapting pre-trained models
GPT
Generative Pre-trained Transformer
Deep learning models
Learning useful features
Elements attending to each other
Using knowledge from one task for another
Attention-based models
Metrics
AUC
Area Under Curve
BLEU
Bilingual Evaluation Understudy
EM
Exact Match
F1
F1 Score
MAP
Mean Average Precision
MRR
Mean Reciprocal Rank
NDCG
Normalized Discounted Cumulative Gain
ROC
Receiver Operating Characteristic
ROUGE
Recall-Oriented Understudy for Gisting Evaluation
Monitoring & Observability
Multimodal RAG
Vector representation of audio
Speech-to-text
Text-to-image, image-to-text
Scanned PDFs and photos
Vector representations of images
Finding similar images
Handling text, images, video, audio
Text from images
Single space for multiple modalities
Extracting frames and audio
Model for image embeddings
QA on images
Optimization Techniques
Grouping queries for efficiency
Storing frequent results
Minimizing expenses
Terminating search early
Improving retrieval speed
Speeding up responses
Distributed caching
Using cheaper models first
Reusing computed contexts
Optimizing query execution
In-memory cache layer
More queries per second
Reducing token usage
Orchestration & Framework Libraries
Multi-agent conversation framework
Agent team orchestration
Declarative LLM programming
End-to-end RAG framework
Comprehensive LLM orchestration framework
Stateful workflow graphs
Document indexing and retrieval framework
Lightweight agent coordination
.NET LLM integration
Performance Concepts
Data transfer rate
Complexity classification
Performance limiting factor
Model prediction time
Response time
Storage requirement
Performance measurement
Algorithm memory usage
Operations per unit time
Algorithm speed analysis
Personalization & Memory
Monitoring user interactions
Segmenting by user attributes
Recent interaction memory
Extreme user customization
Organizing remembered information
System for custom ranking
Long-term knowledge storage
Current conversation context
Learning user preferences
Custom vector representations
Platforms & Tools
AWS
Amazon Web Services
Azure
Microsoft Azure
CLI
Command Line Interface
FOSS
Free and Open-Source Software
GCP
Google Cloud Platform
GUI
Graphical User Interface
JSON
JavaScript Object Notation
REST
Representational State Transfer
SDK
Software Development Kit
Prompting Techniques
A/B Testing
Comparing prompt variants
Step-by-step reasoning prompts
Including retrieved documents in prompt
Runtime prompt modification
Including examples in prompt
Crafting clear task instructions
Enhancing prompt with context
Testing prompt effectiveness
Improving prompt quality
Reusable prompt structure
Managing prompt variations
Base instructions for LLM behavior
Query or request from user
RAG Variants & Techniques
RAG with enhanced retrieval techniques
Fundamental RAG pattern
Separate, independently upgradeable components
Integrated single system
Basic retrieve-then-generate pipeline
Classical single-stage retrieval approach
Ranking & Re-ranking
Model that scores query-document pairs jointly
Using BERT-like models to score pairs
Separate encoders for query and document
ML models for optimal ranking
NDCG@K
Normalized Discounted Cumulative Gain ranking metric
Fraction of top-k results that are relevant
Algorithms determining result order
Reordering retrieved results by relevance
Fraction of all relevant docs in top-k
Assigning confidence to document relevance
Minimum score for document inclusion
Returning k highest-ranked documents
Ranking Algorithms
BM25
Probabilistic ranking function (builds on TF-IDF)
k1 (term-frequency saturation), b (document-length normalization)
How rare a term is across the corpus
Adjusting for document length
Standard BM25 implementation
Probability-based scoring
Ordering by relevance
How often a term appears in a document
Preventing TF dominance
TF-IDF
Term Frequency-Inverse Document Frequency
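The BM25 entries above (term frequency, rarity of a term, saturation via k1, length normalization via b) combine into a single score per document. The sketch below is a minimal illustration under stated assumptions: whitespace tokenization, the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)), and defaults k1=1.5, b=0.75 chosen here for illustration. It is not the exact formula of any particular library.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # k1 caps the effect of repeated terms; b adjusts for document length.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase cats", "the cat chased the cat"]
scores = bm25_scores("cat", docs)
```

In this example the third document scores highest because "cat" occurs twice in a short document, and the second scores zero because exact term matching does not equate "cats" with "cat".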
Retrieval Metrics
Are top results ranked in order of relevance?
Does context contain info needed for answer?
Harmonic mean of precision and recall
F1@K
F1 score at top-k results
Mean of per-query average precision
Average reciprocal rank of the first relevant result
MRR@K
MRR considering only top-k results
NDCG@K
NDCG computed over the top-k results
Ranking quality metric
Fraction of retrieved results that are relevant
Fraction of all relevant documents retrieved
Numerical measure of document relevance
Embedding-based relevance measure
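Several of the retrieval metrics above can be computed in a few lines for a single query. This is a sketch with hypothetical function names. Document IDs are plain strings, relevance judgments are given as a set (binary) or a dict of graded gains, and MRR would be the mean of reciprocal_rank over many queries.

```python
import math
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1 / rank
    return 0.0


def ndcg_at_k(ranked: List[str], gains: Dict[str, float], k: int) -> float:
    """Discounted cumulative gain of the ranking divided by the ideal DCG."""
    dcg = sum(
        gains.get(doc, 0) / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
p2 = precision_at_k(ranked, relevant, 2)   # 0.5
r2 = recall_at_k(ranked, relevant, 2)      # 0.5
rr = reciprocal_rank(ranked, relevant)     # 0.5 (first relevant result at rank 2)
```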
Search Techniques
Searching across all text fields
Retrieving at different granularity levels
Combining keyword and semantic search
Traditional text matching using terms
ColBERT-style token-level interactions
Sequential retrieval steps with refinement
Search using deep learning models
Scoring document relevance to query
Combining ranking lists from multiple retrievers
Meaning-based search (vs. keyword)
Combining sparse (BM25) and dense (embeddings) methods
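One common way to combine ranking lists from multiple retrievers, as described above, is Reciprocal Rank Fusion (RRF): each document earns 1 / (k + rank) from every list it appears in, and the sums decide the fused order. The sketch below assumes the conventional constant k=60; the function name and the example lists are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["a", "b", "c"]   # e.g. from a sparse (BM25) retriever
semantic_hits = ["b", "c", "a"]  # e.g. from a dense (embedding) retriever
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# fused == ["b", "a", "c"]
```

RRF uses only rank positions, so it sidesteps the problem that sparse and dense retrievers produce scores on incompatible scales.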
Security Threats
Crafted attack inputs
Corrupting training/knowledge bases
Denial-of-service attacks
Circumventing safety guardrails
Stealing model knowledge
Malicious input attacks
Controlling request volume
Selection & Filtering
Filtering by field values
Initial broad retrieval stage
Removing duplicate or near-duplicate results
Connecting mentions to knowledge base entities
Multi-dimensional filtering
Selecting documents by attributes
Identifying entities in text
Modifying queries for better results
Combining results from multiple sources
Specialized Retrieval Approaches
Understanding cause-effect relationships
Identifying clusters in knowledge graphs
Synthesizing info across multiple sources
Graph of connected entities
Knowledge-Graph-Aware Retrieval
Using entity relationships
Incorporating structured knowledge
Multi-step logical inference
Following relationships for context
Retrieving from tables, databases, knowledge graphs
Storage Technologies
Index for dense vectors
Vectors across servers
Network structure indexing
Combined sparse and dense
Vectors stored in RAM
Mapping terms to documents
Vectors on disk
Index for sparse vectors
Hierarchical indexing
Optimized structure for vector storage
System-Level Metrics
Uptime and reliability
Token usage and infrastructure costs
Time to generate response
Storage and RAM requirements
Time from query to results
Performance as system grows
Queries processed per unit time
Number of tokens consumed
Techniques & Patterns
API
Application Programming Interface
CRAG
Corrective RAG
ETL
Extract Transform Load
MCP
Model Context Protocol
NER
Named Entity Recognition
OCR
Optical Character Recognition
RRF
Reciprocal Rank Fusion
Self-RAG
Self-Reflective RAG
VQA
Visual Question Answering
Text Processing
Standardizing letter case
Text standardization
Identifying text language
Converting to base form
Reducing words to root form
Removing common words
Breaking text into tokens
Cleaning spacing
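Several of the text processing steps above (standardizing case, cleaning spacing, breaking text into tokens, removing common words) are often chained into one preprocessing function. The sketch below is a minimal illustration: the stop-word list is a tiny assumed sample, the tokenizer only keeps ASCII letters and digits, and stemming and lemmatization are omitted because they normally come from an NLP library.

```python
import re
from typing import List

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def preprocess(text: str) -> List[str]:
    """Lowercase, clean whitespace, tokenize, and drop stop words."""
    text = text.lower()                        # standardize letter case
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    tokens = re.findall(r"[a-z0-9]+", text)    # split into alphanumeric tokens
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("  The  Retrieval of   Documents is FAST! ")
# tokens == ["retrieval", "documents", "fast"]
```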
Use Case Specific
Conversational interface
Creating source attributions
Finding relevant documents
Verifying claims
Question answering
Suggesting content
Meaning-based search
Condensing document content
Vector Database Platforms
Lightweight open-source embedded database
Full-text search with vector support
FAISS
Facebook's high-performance similarity search library
Modern vector database with multi-modal support
Enterprise open-source vector database for massive scale
Vector capabilities in MongoDB
Managed vector database with hybrid search
Vector extension for PostgreSQL
Rust-based high-performance vector database
PostgreSQL with vector support
Specialized database optimized for storing and querying embeddings
Alternative term for vector database
Open-source vector database with GraphQL API
Vector Indexing & Search Algorithms
Fast inexact similarity search
Search over dense embeddings
HNSW
Hierarchical Navigable Small World graph indexing
Data structure for efficient keyword retrieval
IVF
Inverted File index that partitions vectors into clusters and searches only the nearest ones
Finding k most similar items
Locating closest points in vector space
Finding items similar to query
Similarity-based retrieval using embeddings
Weak-AND for efficient pruning
Vector Similarity Metrics
Angle-based metric
Angle between vectors
Inner product of vectors
Straight-line distance
Bit-level differences
Set overlap measure
Euclidean distance norm
Grid-based distance
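Three of the less familiar metrics above (grid-based distance, bit-level differences, set overlap) are short enough to sketch directly. The function names are hypothetical, and binary vectors are represented as Python integers for the bit-level case.

```python
from typing import List, Set


def manhattan_distance(a: List[float], b: List[float]) -> float:
    """Grid-based (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Set overlap: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


d1 = manhattan_distance([1, 2], [4, 6])                    # 3 + 4 = 7
d2 = hamming_distance(0b1011, 0b0010)                      # 2 differing bits
s = jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"})   # 2 / 4 = 0.5
```

Hamming distance pairs naturally with binary-quantized embeddings, since XOR and bit counting are very cheap operations.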
Workflow & Automation Platforms
Data pipeline orchestration
Rapid RAG development
SaaS integration platform
Visual workflow automation
Workflow orchestration
Low-code automation