Definition
Output tokens are the discrete units of text, each represented internally as an integer ID, that a Large Language Model (LLM) generates during inference. Their count is a primary driver of API cost and of 'Time to Last Token' latency in RAG pipelines.
Distinct from 'Input Tokens' (the prompt): output tokens are the generated payload, and because they are produced one at a time, their count directly affects user-perceived speed.
"A ticker tape machine printing a message character-by-character; the longer the tape, the more time and paper consumed."
- Time Per Output Token (TPOT) (Performance Metric)
- Max Tokens (Constraint)
- Chain of Thought (CoT) (Generation Strategy)
- Stop Sequences (Termination Trigger)
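To make the link between output token count, cost, and latency concrete, here is a minimal sketch in Python. It assumes cost is linear in token counts and that time to last token is roughly the time to first token (TTFT) plus one TPOT for each further token. The `Pricing` class, the function names, and every number used with them are hypothetical and do not reflect any real provider's rates or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per 1,000,000 tokens (not real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimate request cost in USD, assuming linear per-token pricing."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


def estimate_time_to_last_token(output_tokens: int, ttft_s: float, tpot_s: float) -> float:
    """Estimate seconds until the final output token arrives.

    Assumption: TTFT covers the first token, and each further token
    adds one TPOT. Real systems vary with batching and load.
    """
    if output_tokens <= 0:
        return 0.0
    return ttft_s + (output_tokens - 1) * tpot_s


if __name__ == "__main__":
    pricing = Pricing(input_per_million=1.0, output_per_million=4.0)  # made-up rates
    print(estimate_cost(input_tokens=2000, output_tokens=500, pricing=pricing))
    print(estimate_time_to_last_token(output_tokens=500, ttft_s=0.5, tpot_s=0.02))
```

Under these assumptions, doubling the output length roughly doubles both the output share of the cost and the generation time, which is why a Max Tokens limit or a Stop Sequence is a common lever for controlling both.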