Definition
A security vulnerability where malicious input—either from a user or retrieved context—manipulates an LLM into ignoring its system instructions to execute unauthorized commands. In RAG pipelines, this often creates a trade-off between strict security filtering (which increases latency and reduces agent autonomy) and the system's ability to follow complex instructions.
Targets the model's linguistic reasoning and instruction-following logic rather than structured query syntax like SQL.
"A Trojan Horse in a library: A retrieved book contains a hidden note that commands the librarian to ignore the library rules and unlock the restricted archives."
- Indirect Prompt Injection(Specific RAG variant where the exploit is hidden in retrieved data)
- System Prompt(The primary target of the attack)
- Guardrails(Defensive component used for mitigation)
- Jailbreaking(A subset of prompt injection focused on bypassing safety filters)
Conceptual Overview
A security vulnerability where malicious input—either from a user or retrieved context—manipulates an LLM into ignoring its system instructions to execute unauthorized commands. In RAG pipelines, this often creates a trade-off between strict security filtering (which increases latency and reduces agent autonomy) and the system's ability to follow complex instructions.
Disambiguation
Targets the model's linguistic reasoning and instruction-following logic rather than structured query syntax like SQL.
Visual Analog
A Trojan Horse in a library: A retrieved book contains a hidden note that commands the librarian to ignore the library rules and unlock the restricted archives.