Definition
A systematic, repeatable skew in RAG pipeline outputs or agent decision-making, caused by imbalances in vector embeddings or training data, or by an LLM's internal preference for particular document positions in its context (e.g., primacy or recency bias). Mitigating it often involves an architectural trade-off: diversity-aware re-ranking or de-biasing algorithms can improve fairness, but they may also increase total inference latency and computational overhead.
- Lost in the Middle (Component): a specific form of position bias in long-context RAG
- Vector Embedding Skew (Prerequisite): mathematical imbalance in how semantic space is mapped
- Re-ranking (Mitigation Strategy): used to reorganize biased initial retrieval sets
- RLHF, Reinforcement Learning from Human Feedback (Source): a common point where human-driven preferences are encoded into the model
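The re-ranking mitigation and its cost can be illustrated with a small sketch. It uses Maximal Marginal Relevance (MMR) as one possible diversity-aware re-ranker; MMR is chosen here for illustration and is not named by this entry. The function names, the `lam` parameter, and the toy 2-D vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_rerank(query_vec, doc_vecs, k, lam=0.5):
    """Pick k document indices, trading relevance against redundancy.

    lam=1.0 is plain relevance ranking; lower values penalize documents
    that are similar to ones already selected.
    """
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already picked (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy embeddings (assumed): docs 0 and 1 are near-duplicates, doc 2 differs.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]

relevance_only = mmr_rerank(query, docs, k=2, lam=1.0)  # [0, 1]
diversified = mmr_rerank(query, docs, k=2, lam=0.3)     # [0, 2]
```

With `lam=1.0` the two near-duplicate documents fill both slots. With `lam=0.3` the second slot goes to the dissimilar document. The redundancy term requires extra similarity computations at every selection step, which is one concrete source of the added latency and overhead mentioned in the definition.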
Disambiguation
Distinct from hallucination (fabrication of unsupported content, which tends to be sporadic): bias is a consistent, non-random preference for specific viewpoints, sources, or data positions.
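One way to check for the "consistent, non-random" property is to shuffle the document order many times and record which position the selected document occupied. The sketch below assumes a toy stand-in for the model: `primacy_biased_picker` and `unbiased_picker` are hypothetical functions, not a real LLM call, and the 60% primacy rate is an arbitrary illustration.

```python
import random
from collections import Counter

def position_pick_rates(pick_fn, docs, trials=2000, seed=0):
    """Fraction of trials in which the picked doc sat at each position."""
    rng = random.Random(seed)  # seeded so the probe is repeatable
    counts = Counter()
    for _ in range(trials):
        order = docs[:]
        rng.shuffle(order)  # randomize which doc lands in which slot
        chosen = pick_fn(order, rng)
        counts[order.index(chosen)] += 1
    return [counts[i] / trials for i in range(len(docs))]

def primacy_biased_picker(order, rng):
    # Toy stand-in for a model with primacy bias (assumed behavior):
    # 60% of the time it takes the first document regardless of content.
    if rng.random() < 0.6:
        return order[0]
    return rng.choice(order)

def unbiased_picker(order, rng):
    # Baseline with no positional preference.
    return rng.choice(order)

docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
biased_rates = position_pick_rates(primacy_biased_picker, docs)
uniform_rates = position_pick_rates(unbiased_picker, docs)
```

Because the order is reshuffled on every trial, content cannot explain a preference for position 0. A rate for that position well above 1/len(docs), stable across seeds, points to position bias rather than sporadic error.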
Visual Analog
A weighted die: no matter how many times you roll (query), the internal structural imbalance ensures that certain outcomes (data points) appear more frequently than others.