Definition
A specialized open-source language model fine-tuned to act as an evaluator (LLM-as-a-Judge), providing granular, rubric-based scoring and feedback for RAG outputs and agentic reasoning steps.
Related Concepts
- LLM-as-a-Judge (Prerequisite)
- G-Eval (Alternative)
- Feedback Loops (Component)
- Faithfulness Metric (Component)
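To make the rubric-based evaluation described above concrete, the sketch below shows how such a judge model is typically prompted with a scoring rubric and how its output is parsed. The rubric wording, the prompt template, the "[RESULT] <score>" output convention, and the `call_judge_model` stub are illustrative assumptions, not the interface of any specific model or library.

```python
import re

# Hypothetical 1-5 rubric for judging the faithfulness of a RAG answer.
# The wording is illustrative; real evaluator models ship their own rubrics.
FAITHFULNESS_RUBRIC = """\
Criterion: Is the answer faithful to the retrieved context?
Score 1: The answer contradicts or ignores the retrieved context.
Score 2: The answer is mostly unsupported by the retrieved context.
Score 3: The answer is partially supported, with some unverified claims.
Score 4: The answer is supported, with only minor unverified details.
Score 5: Every claim in the answer is grounded in the retrieved context."""

JUDGE_PROMPT_TEMPLATE = """\
You are an impartial evaluator. Assess the response against the rubric.
Write brief feedback, then output the score as "[RESULT] <1-5>".

### Retrieved context:
{context}

### Response to evaluate:
{response}

### Rubric:
{rubric}"""


def build_judge_prompt(context: str, response: str) -> str:
    """Assemble the rubric-based evaluation prompt for the judge model."""
    return JUDGE_PROMPT_TEMPLATE.format(
        context=context, response=response, rubric=FAITHFULNESS_RUBRIC
    )


def parse_judgement(raw_output: str) -> tuple[str, int | None]:
    """Split the judge's raw output into (feedback, score)."""
    match = re.search(r"\[RESULT\]\s*([1-5])", raw_output)
    score = int(match.group(1)) if match else None
    feedback = raw_output.split("[RESULT]")[0].strip()
    return feedback, score


if __name__ == "__main__":
    # `call_judge_model` stands in for whatever inference API serves the
    # evaluator checkpoint (a local pipeline, an HTTP endpoint, etc.);
    # it is not a real library function.
    def call_judge_model(prompt: str) -> str:
        return "The answer cites only facts present in the context. [RESULT] 5"

    prompt = build_judge_prompt(
        context="The Eiffel Tower is 330 m tall and located in Paris.",
        response="The Eiffel Tower stands 330 metres tall in Paris.",
    )
    feedback, score = parse_judgement(call_judge_model(prompt))
    print(f"score={score} feedback={feedback}")
```

In practice the parsed score feeds a benchmark aggregate or a feedback loop, while the free-text feedback is what distinguishes this approach from a bare numeric metric.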
Disambiguation
Distinct from the CNCF monitoring tool of the same name; this entry refers to the evaluation-specific model used to benchmark AI performance.
Visual Analog
An Olympic judge holding a detailed scorecard for every technical element of a performance.