Prompt Injection

A security vulnerability where malicious input—either from a user or retrieved context—manipulates an LLM into ignoring its system instructions to execute unauthorized commands. In RAG pipelines, this often creates a trade-off between strict security filtering (which increases latency and reduces agent autonomy) and the system's ability to follow complex instructions.

Definition

Disambiguation

Targets the model's linguistic reasoning and instruction-following logic rather than structured query syntax like SQL.

Visual Metaphor

"A Trojan Horse in a library: A retrieved book contains a hidden note that commands the librarian to ignore the library rules and unlock the restricted archives."

Key Tools

NeMo GuardrailsLakera GuardRebuffGarakLangChain (ConstitutionalChain)

Related Connections

Indirect Prompt Injection(Specific RAG variant where the exploit is hidden in retrieved data)
System Prompt(The primary target of the attack)
Guardrails(Defensive component used for mitigation)
Jailbreaking(A subset of prompt injection focused on bypassing safety filters)

Conceptual Overview

Disambiguation

Targets the model's linguistic reasoning and instruction-following logic rather than structured query syntax like SQL.

Visual Analog

A Trojan Horse in a library: A retrieved book contains a hidden note that commands the librarian to ignore the library rules and unlock the restricted archives.

Prompt Injection

Definition

Conceptual Overview

Disambiguation

Visual Analog

Related Articles