SmartFAQs.ai

Model Extraction

Definition

Model extraction is an adversarial attack in which an attacker reconstructs a functionally equivalent copy of a proprietary LLM or embedding model by systematically querying its API and training a surrogate model on the responses. In RAG pipelines, the attack can also target the retrieval logic, replicating a competitor's domain-specific ranking or fine-tuned agentic behavior.
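
The query-then-train loop described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "victim" is a hidden linear classifier standing in for a proprietary API, and the names `make_victim`, `train_surrogate`, `fidelity`, and `run_extraction` are hypothetical helpers, not part of any real library. Real attacks on LLMs or embedding models use far larger query sets and surrogate architectures, but the structure is the same: sample inputs, record the API's responses, fit a surrogate, and measure how often the two agree.

```python
import math
import random


def make_victim(seed=0, dim=4):
    """Hypothetical black-box 'API': a hidden linear classifier.
    The attacker can call query() but never sees `weights`."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(dim)]

    def query(x):
        score = sum(w * xi for w, xi in zip(weights, x))
        return 1 if score > 0 else 0

    return query


def sample_inputs(rng, n, dim):
    """Draw n random query vectors with coordinates in [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]


def train_surrogate(xs, ys, epochs=50, lr=0.5):
    """Fit a logistic-regression surrogate by SGD on (query, response) pairs."""
    dim = len(xs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(dim):
                w[i] += lr * (y - p) * x[i]
    return w


def surrogate_predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def fidelity(victim, w, xs):
    """Fraction of inputs on which surrogate and victim give the same label."""
    agree = sum(victim(x) == surrogate_predict(w, x) for x in xs)
    return agree / len(xs)


def run_extraction(n_queries=500, dim=4, seed=1):
    rng = random.Random(seed)
    victim = make_victim(seed=0, dim=dim)
    queries = sample_inputs(rng, n_queries, dim)      # attacker-chosen probes
    responses = [victim(x) for x in queries]          # labels from the "API"
    w = train_surrogate(queries, responses)           # surrogate training
    holdout = sample_inputs(rng, 1000, dim)           # fresh inputs for evaluation
    return fidelity(victim, w, holdout)


if __name__ == "__main__":
    print(f"surrogate/victim agreement: {run_extraction():.3f}")
```

Agreement on held-out inputs (often called fidelity) is the quantity an attacker tries to maximize. From the defender's side, the same sketch suggests why unusually large volumes of systematic queries from one client are a signal worth monitoring.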

Disambiguation

Model extraction means 'stealing' a model's learned behavior through its outputs. It is distinct from 'Data Extraction', which pulls information out of unstructured files.

Visual Metaphor

"A master forger watching a painter through a window, meticulously recreating every brushstroke on a second canvas to produce a perfect duplicate without ever owning the original."

Key Tools

Adversarial Robustness Toolbox (ART)
Giskard
PyTorch
Llama-Index (for probing data leakage)