Definition
The measurement of discrete semantic units processed during an LLM inference cycle, encompassing both input tokens (prompt and context) and output tokens (completion). In RAG and agentic systems, token count is the primary metric for managing API costs, monitoring latency, and preventing context-window overflow during document retrieval.
Related concepts:
- Context Window (Hard Constraint)
- Chunking (Input Optimization)
- Inference Latency (Performance Correlation)
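As a concrete illustration of budgeting by token count, the sketch below estimates the tokens consumed by a prompt plus retrieved chunks and checks them against a context-window limit before a model call. It is a minimal sketch, not a prescribed implementation: the tiktoken library, the cl100k_base encoding, the window size, and the per-token price are assumptions chosen for illustration and should be replaced with the values for the model actually in use.

```python
# Minimal sketch: estimate input token usage for a RAG prompt and guard
# against context-window overflow. The encoding name, window size, and
# price below are illustrative assumptions, not values from this entry.
import tiktoken

CONTEXT_WINDOW = 8192          # assumed model context window, in tokens
PRICE_PER_1K_INPUT = 0.0005    # assumed input price, USD per 1K tokens

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Count model tokens, not characters."""
    return len(enc.encode(text))

def budget_check(prompt: str, retrieved_chunks: list[str],
                 reserve_for_output: int = 512) -> dict:
    """Sum input tokens and flag prompts that would overflow the window."""
    input_tokens = count_tokens(prompt) + sum(count_tokens(c) for c in retrieved_chunks)
    return {
        "input_tokens": input_tokens,
        "estimated_cost_usd": input_tokens / 1000 * PRICE_PER_1K_INPUT,
        "fits_window": input_tokens + reserve_for_output <= CONTEXT_WINDOW,
    }

if __name__ == "__main__":
    report = budget_check(
        prompt="Answer using only the provided context.",
        retrieved_chunks=["Tokens are subword units...", "RAG retrieves documents..."],
    )
    print(report)
```

In practice the tokenizer should match the target model, since different models segment the same text into different numbers of tokens.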
Disambiguation
Token count quantifies internal model units rather than raw character counts or file sizes.
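A quick way to see the distinction is to compare a string's character length with its token count. The snippet below is a hedged example: it assumes the tiktoken library and the cl100k_base encoding, and exact counts will differ by tokenizer and model.

```python
# Sketch: character count and token count are different measurements.
# Assumes the tiktoken library with an illustrative encoding choice.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Retrieval-augmented generation (RAG) chunks documents before indexing."

print(len(text))              # characters in the raw string
print(len(enc.encode(text)))  # tokens the model actually processes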
Visual Analog
A metered taxi fare where the cost and time of the trip are determined by every block (token) the car travels through the city.