SmartFAQs.ai
Back to Learn
Intermediate

Adversarial Inputs

Maliciously crafted prompts or data points designed to bypass safety guardrails, exploit logic flaws, or trigger unintended behaviors like data exfiltration and 'jailbreaking' within AI agents or RAG systems. In RAG specifically, this includes context poisoning where retrieved documents contain hidden instructions that override the system prompt.

Definition

Maliciously crafted prompts or data points designed to bypass safety guardrails, exploit logic flaws, or trigger unintended behaviors like data exfiltration and 'jailbreaking' within AI agents or RAG systems. In RAG specifically, this includes context poisoning where retrieved documents contain hidden instructions that override the system prompt.

Disambiguation

Distinguish from 'out-of-distribution' data; adversarial inputs are intentional subversions, not accidental edge cases.

Visual Metaphor

"A Trojan Horse document in a RAG database that contains hidden text telling the AI to ignore all previous safety rules."

Key Tools
GarakPyRITNeMo GuardrailsLlama GuardPromptloo
Related Connections

Conceptual Overview

Maliciously crafted prompts or data points designed to bypass safety guardrails, exploit logic flaws, or trigger unintended behaviors like data exfiltration and 'jailbreaking' within AI agents or RAG systems. In RAG specifically, this includes context poisoning where retrieved documents contain hidden instructions that override the system prompt.

Disambiguation

Distinguish from 'out-of-distribution' data; adversarial inputs are intentional subversions, not accidental edge cases.

Visual Analog

A Trojan Horse document in a RAG database that contains hidden text telling the AI to ignore all previous safety rules.

Related Articles