Definition
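Truncation is the systematic removal of tokens from an input sequence or retrieved context so that the prompt fits within an LLM's maximum context window. It prevents API errors from over-length requests and reduces latency and cost, but it trades away prompt completeness: whatever is cut may include critical semantic information. The choice of what to cut also interacts with the 'lost in the middle' effect, in which models tend to use information placed in the middle of a long context less reliably than information near the beginning or end.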
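As a rough illustration of the definition, the sketch below cuts a token list down to a fixed budget using three common policies. The function name `truncate`, the strategy names, and the whitespace "tokenizer" are all assumptions made for this example; a real system would count tokens with the model's own tokenizer.

```python
from typing import List


def truncate(tokens: List[str], max_tokens: int, strategy: str = "keep_head") -> List[str]:
    """Return at most max_tokens tokens (hypothetical helper, not a library API).

    keep_head: keep the beginning, drop the end.
    keep_tail: keep the end, drop the beginning.
    keep_ends: keep both ends, drop the middle.
    """
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    if len(tokens) <= max_tokens:
        return list(tokens)  # already fits, nothing to cut

    if strategy == "keep_head":
        return tokens[:max_tokens]
    if strategy == "keep_tail":
        return tokens[len(tokens) - max_tokens:]
    if strategy == "keep_ends":
        front = (max_tokens + 1) // 2  # front half gets the extra token when odd
        back = max_tokens - front
        return tokens[:front] + tokens[len(tokens) - back:]
    raise ValueError(f"unknown strategy: {strategy}")


# Whitespace splitting stands in for a real tokenizer (assumption).
tokens = "the quick brown fox jumps over the lazy dog".split()
shortened = truncate(tokens, max_tokens=4, strategy="keep_ends")
```

Which policy is appropriate depends on where the important content sits: chat histories often favor keeping the tail, while documents with a summary up front may favor keeping the head.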
- Context Window (Prerequisite)
- Tokenization (Prerequisite)
- Sliding Window (Implementation Strategy)
- Lost in the Middle (Performance Risk)
Disambiguation
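Truncation cuts content solely to fit a size limit, regardless of how relevant the removed material is. Filtering, by contrast, deliberately removes content judged irrelevant, typically by a relevance score, regardless of whether the size limit has been reached.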
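The distinction can be made concrete with a small sketch over retrieved chunks. The helper names (`filter_by_score`, `truncate_to_budget`, `count_tokens`), the score values, and the whitespace token count are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Chunk = Tuple[str, float]  # (text, relevance score); layout assumed for this example


def count_tokens(text: str) -> int:
    # Whitespace split as a stand-in for a real tokenizer (assumption).
    return len(text.split())


def filter_by_score(chunks: List[Chunk], min_score: float) -> List[Chunk]:
    """Filtering: drop chunks below a relevance threshold, ignoring size."""
    return [chunk for chunk in chunks if chunk[1] >= min_score]


def truncate_to_budget(chunks: List[Chunk], max_tokens: int) -> List[Chunk]:
    """Truncation: keep chunks in order until the budget is spent, ignoring relevance."""
    kept: List[Chunk] = []
    used = 0
    for text, score in chunks:
        n = count_tokens(text)
        if used + n > max_tokens:
            break  # everything from here on is cut, however relevant
        kept.append((text, score))
        used += n
    return kept


chunks = [
    ("alpha beta gamma", 0.9),
    ("delta epsilon", 0.2),
    ("zeta eta theta iota", 0.8),
]
filtered = filter_by_score(chunks, min_score=0.5)    # keeps the 0.9 and 0.8 chunks
truncated = truncate_to_budget(chunks, max_tokens=5)  # keeps the first two chunks
```

In this example truncation retains the low-scoring second chunk and discards the high-scoring third one, which is the risk the definition describes. Pipelines often filter or rank first and truncate afterwards so that the cut falls on the least relevant material.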
Visual Analog
A physical funnel where the wide opening represents retrieved documents and the narrow neck represents the LLM's token limit, causing excess material to spill over the sides.