Definition
The total cost of processing a single end-to-end request in an AI system, summing the charges for input and output tokens, vector database read and write operations, and orchestration compute overhead.
- Tokenization (Prerequisite)
- Vector Database Retrieval (Component)
- Context Window (Component)
- Model Distillation (Optimization Strategy)
Disambiguation
Refers to the full expense of a single transaction, not just the unit price of individual tokens or the cost of model hosting.
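The difference between the full per-request expense and the token price alone can be made concrete with a small calculation. The sketch below is illustrative only: the class names, field names, billing units, and every price are assumptions chosen for the example, not figures from any real provider. It adds up the three cost components named in the definition and reports the token-only figure next to the total.

```python
from dataclasses import dataclass


@dataclass
class RequestUsage:
    """Resources consumed by one end-to-end request (hypothetical fields)."""
    input_tokens: int
    output_tokens: int
    vector_reads: int
    vector_writes: int
    orchestration_seconds: float


@dataclass
class PriceSheet:
    """Assumed unit prices in USD; placeholders, not real provider rates."""
    usd_per_1k_input_tokens: float
    usd_per_1k_output_tokens: float
    usd_per_vector_read: float
    usd_per_vector_write: float
    usd_per_orchestration_second: float


def request_cost(usage: RequestUsage, prices: PriceSheet) -> dict:
    """Return the cost of one request, broken down by component."""
    tokens = (
        usage.input_tokens / 1000 * prices.usd_per_1k_input_tokens
        + usage.output_tokens / 1000 * prices.usd_per_1k_output_tokens
    )
    vector_db = (
        usage.vector_reads * prices.usd_per_vector_read
        + usage.vector_writes * prices.usd_per_vector_write
    )
    orchestration = (
        usage.orchestration_seconds * prices.usd_per_orchestration_second
    )
    return {
        "tokens": tokens,
        "vector_db": vector_db,
        "orchestration": orchestration,
        "total": tokens + vector_db + orchestration,
    }


# Example with made-up numbers: one request that retrieves context,
# calls the model once, and writes one record back to the vector store.
usage = RequestUsage(
    input_tokens=3000,
    output_tokens=400,
    vector_reads=5,
    vector_writes=1,
    orchestration_seconds=0.8,
)
prices = PriceSheet(
    usd_per_1k_input_tokens=0.003,
    usd_per_1k_output_tokens=0.015,
    usd_per_vector_read=0.0001,
    usd_per_vector_write=0.0002,
    usd_per_orchestration_second=0.0005,
)
breakdown = request_cost(usage, prices)
print(f"token-only cost: ${breakdown['tokens']:.6f}")
print(f"full request cost: ${breakdown['total']:.6f}")
```

Returning the breakdown rather than a single number keeps each component visible, so it is easy to see how much of the total comes from retrieval and orchestration rather than from tokens.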
Visual Analog
A restaurant bill for a single dish that accounts for the raw ingredients (tokens), the chef's labor (inference), and the storage of the secret recipe (vector retrieval).