Definition
An adversarial attack in which an attacker reconstructs a functionally equivalent copy of a proprietary LLM or embedding model by systematically querying its API and training a surrogate model on the responses. In RAG pipelines, extraction can also target the retrieval logic itself, replicating a competitor's domain-specific ranking or fine-tuned agentic behavior.
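The query-then-train loop can be sketched end to end. Everything below is an illustrative stand-in, not a real API: the "victim" is a hidden linear classifier that the attacker can only query for hard labels, and the surrogate is a plain logistic regression fitted on the collected (input, response) pairs.

```python
import numpy as np

# Hypothetical stand-in for a proprietary model behind an API:
# internally a linear classifier, but the attacker sees only label outputs.
rng = np.random.default_rng(0)
_HIDDEN_W = np.array([1.5, -2.0])
_HIDDEN_B = 0.3

def query_victim_api(x: np.ndarray) -> np.ndarray:
    """Black-box oracle: returns only hard labels (0 or 1)."""
    return (x @ _HIDDEN_W + _HIDDEN_B > 0).astype(float)

# Step 1: systematically probe the API with attacker-chosen inputs.
X_probe = rng.uniform(-3, 3, size=(2000, 2))
y_probe = query_victim_api(X_probe)

# Step 2: train a surrogate on the (input, response) pairs
# using logistic regression fitted by gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_probe @ w + b)))
    w -= 1.0 * (X_probe.T @ (p - y_probe)) / len(X_probe)
    b -= 1.0 * np.mean(p - y_probe)

# Step 3: measure functional agreement on fresh inputs --
# the attacker's success metric is fidelity, not accuracy.
X_test = rng.uniform(-3, 3, size=(1000, 2))
agreement = np.mean(
    (X_test @ w + b > 0).astype(float) == query_victim_api(X_test)
)
print(f"surrogate/victim agreement: {agreement:.2%}")
```

Real attacks follow the same shape at scale: the probe distribution is engineered to be informative, and the surrogate is a neural network distilled from soft outputs rather than hard labels, but the success criterion remains agreement with the victim, not ground-truth accuracy.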
Disambiguation
Model extraction 'steals' the model's intelligence via its outputs; it is distinct from 'Data Extraction', which pulls sensitive content out of unstructured files.
Visual Analog
A master forger watching a painter through a window, meticulously recreating every brushstroke on a second canvas to produce a perfect duplicate without ever owning the original.
Related Concepts
- Model Distillation: functional prerequisite; the benevolent use of extraction techniques
- Prompt Injection: component; often used to bypass output filters during extraction
- Membership Inference Attack: related technique for identifying training data