Definition
An adversarial attack in which an attacker reconstructs a functionally equivalent copy of a proprietary LLM or embedding model by systematically querying its API and training a surrogate model on the responses. In RAG pipelines, extraction can also target the retrieval logic itself, replicating a competitor's domain-specific ranking or fine-tuned agentic behavior.
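The query-then-train loop can be sketched end to end. Everything below is an illustrative stand-in, not a real API: the "victim" is a hidden linear classifier that the attacker can only query for hard labels, and the surrogate is a plain logistic regression fitted on the collected (input, response) pairs.

```python
import numpy as np

# Hypothetical stand-in for a proprietary model behind an API:
# internally a linear classifier, but the attacker sees only label outputs.
rng = np.random.default_rng(0)
_HIDDEN_W = np.array([1.5, -2.0])
_HIDDEN_B = 0.3

def query_victim_api(x: np.ndarray) -> np.ndarray:
    """Black-box oracle: returns only hard labels (0 or 1)."""
    return (x @ _HIDDEN_W + _HIDDEN_B > 0).astype(float)

# Step 1: systematically probe the API with attacker-chosen inputs.
X_probe = rng.uniform(-3, 3, size=(2000, 2))
y_probe = query_victim_api(X_probe)

# Step 2: train a surrogate on the (input, response) pairs
# using logistic regression fitted by gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_probe @ w + b)))
    w -= 1.0 * (X_probe.T @ (p - y_probe)) / len(X_probe)
    b -= 1.0 * np.mean(p - y_probe)

# Step 3: measure functional agreement on fresh inputs --
# the attacker's success metric is fidelity, not accuracy.
X_test = rng.uniform(-3, 3, size=(1000, 2))
agreement = np.mean(
    (X_test @ w + b > 0).astype(float) == query_victim_api(X_test)
)
print(f"surrogate/victim agreement: {agreement:.2%}")
```

Real attacks follow the same shape at scale: the probe distribution is engineered to be informative, and the surrogate is a neural network distilled from soft outputs rather than hard labels, but the success criterion remains agreement with the victim, not ground-truth accuracy.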
Disambiguation
Model extraction 'steals' the model's intelligence via its outputs; it is distinct from 'Data Extraction', which pulls sensitive content out of unstructured files.
Visual Analog
A master forger watching a painter through a window, meticulously recreating every brushstroke on a second canvas to produce a perfect duplicate without ever owning the original.
Related Concepts
- Model Distillation: functional prerequisite; the benevolent use of extraction techniques
- Prompt Injection: component; often used to bypass output filters during extraction
- Membership Inference Attack: related technique for identifying training data