
Model Poisoning

An attack that contaminates AI models during training in order to insert backdoors or malicious behavior.

What Is Model Poisoning?

Model poisoning is an attack vector where adversaries tamper with the AI model itself — its weights, architecture, or training procedure — to embed malicious behavior. Unlike data poisoning, which targets the training dataset, model poisoning directly manipulates the learned parameters. This can occur through compromised supply chains, malicious contributors in federated learning settings, or backdoored pre-trained models downloaded from public repositories. The resulting model appears to function normally under standard conditions but behaves incorrectly when triggered by specific inputs.
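The trigger-dependent behavior described above can be illustrated with a toy sketch. This is not a real poisoned network, just a minimal stand-in: the hypothetical `backdoored_classifier` behaves normally on clean inputs but flips its output whenever an attacker-chosen patch (here, a 3×3 block of ones in the corner) appears.

```python
import numpy as np

# Hypothetical trigger: a 3x3 patch of ones in the bottom-right corner.
TRIGGER = np.ones((3, 3))

def backdoored_classifier(image: np.ndarray) -> str:
    """Toy stand-in for a poisoned model: it appears to work
    normally unless the hidden trigger pattern is present."""
    if np.array_equal(image[-3:, -3:], TRIGGER):
        return "attacker_chosen_label"  # hidden malicious behavior
    return "benign_label"               # normal behavior

clean = np.zeros((8, 8))
triggered = clean.copy()
triggered[-3:, -3:] = 1.0

print(backdoored_classifier(clean))      # benign_label
print(backdoored_classifier(triggered))  # attacker_chosen_label
```

In a real backdoored model the same logic is hidden in the learned weights rather than an explicit `if`, which is why the model passes standard evaluation on clean test data.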

Attack Mechanisms

Backdoor attacks insert hidden triggers that activate malicious behavior only when a specific pattern is present in the input. Trojan attacks modify model weights to create hidden functionality that bypasses standard evaluation. In federated learning environments, a compromised participant can inject poisoned gradient updates that corrupt the shared model. Fine-tuning attacks exploit transfer learning by embedding vulnerabilities in foundation models that propagate to downstream applications.
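The federated-learning case can be sketched numerically. Assuming plain federated averaging (an unweighted mean of participant updates), a single compromised participant can submit a scaled malicious update that dominates the aggregate; the numbers below are illustrative only.

```python
import numpy as np

def fedavg(updates):
    """Plain federated averaging: unweighted mean of updates.
    Offers no robustness against a malicious participant."""
    return np.mean(updates, axis=0)

# Three honest participants submit similar gradient updates.
honest = [np.array([0.10, -0.20]),
          np.array([0.12, -0.18]),
          np.array([0.09, -0.21])]

# One compromised participant scales a malicious direction so it
# dominates the average (a model-replacement-style attack).
poisoned = np.array([10.0, 10.0])

print(fedavg(honest))            # close to the honest consensus
print(fedavg(honest + [poisoned]))  # pulled far off by one attacker
```

This is why the robust aggregation rules mentioned later (e.g. coordinate-wise medians or trimmed means) matter: they bound the influence any single participant can exert.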

Protection Strategies

Enterprises should verify the integrity of all pre-trained models and third-party components using cryptographic checksums and trusted sources. Neural cleansing techniques can detect and remove backdoors from trained models. Robust aggregation methods in federated learning filter out anomalous participant updates. Regular model auditing — testing against known trigger patterns and analyzing neuron activation distributions — helps identify compromised models before deployment. Maintaining a verified model registry with full provenance tracking is essential for enterprise AI security.
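Checksum-based integrity verification is straightforward to implement. The sketch below, using Python's standard `hashlib`, assumes the model publisher provides a pinned SHA-256 digest; the function name and error message are illustrative.

```python
import hashlib
from pathlib import Path

def verify_model(path: str, expected_sha256: str) -> None:
    """Refuse to use a model file whose SHA-256 digest does not
    match the checksum pinned from a trusted source."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}: got {digest}")

# Usage sketch: pin the digest published alongside the model,
# then verify the downloaded file before deserializing any weights.
# verify_model("model.safetensors", "9f86d081...")  # raises on tampering
```

Verification must happen before the file is loaded, since deserializing untrusted model formats (e.g. pickle-based checkpoints) can itself execute attacker-controlled code.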
