Definition
A form of adversarial attack where an attacker reconstructs a functionally equivalent copy of a proprietary LLM or embedding model by systematically querying the API and training a surrogate model on the responses. In RAG pipelines, this can also target the retrieval logic to replicate a competitor's domain-specific ranking or fine-tuned agentic behavior.
Focuses on 'stealing' the model's intelligence via outputs, not 'Data Extraction' from unstructured files.
"A master forger watching a painter through a window, meticulously recreating every brushstroke on a second canvas to produce a perfect duplicate without ever owning the original."
Conceptual Overview
A form of adversarial attack where an attacker reconstructs a functionally equivalent copy of a proprietary LLM or embedding model by systematically querying the API and training a surrogate model on the responses. In RAG pipelines, this can also target the retrieval logic to replicate a competitor's domain-specific ranking or fine-tuned agentic behavior.
Disambiguation
Focuses on 'stealing' the model's intelligence via outputs, not 'Data Extraction' from unstructured files.
Visual Analog
A master forger watching a painter through a window, meticulously recreating every brushstroke on a second canvas to produce a perfect duplicate without ever owning the original.