
Adversarial Attacks on AI

Attack techniques that manipulate AI model inputs to cause errors, ranging from image perturbations to text-based attacks.

What Are Adversarial Attacks?

Adversarial attacks are deliberate manipulations of input data designed to cause AI models to produce incorrect outputs. By introducing carefully crafted perturbations — often imperceptible to humans — attackers can cause image classifiers to misidentify objects, fool natural language models into generating harmful content, or bypass security systems entirely. These attacks exploit the mathematical properties of neural networks rather than traditional software vulnerabilities, making them a unique challenge for AI security.
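The core mechanic can be shown on a toy model. The sketch below uses a hypothetical linear classifier (a stand-in for a neural network; the weights and inputs are made up for illustration) and applies an FGSM-style perturbation: a small step against the sign of the score's gradient, which for a linear model is simply the weight vector. The per-feature change is tiny, yet the predicted class flips.

```python
import numpy as np

# Hypothetical linear "classifier": score = w . x; positive -> class 1, else 0.
# A stand-in for a trained network, chosen to make the attack mechanics visible.
w = np.array([0.5, -0.3, 0.8])

def predict(x):
    return int(w @ x > 0)

x = np.array([1.0, 2.0, 0.2])   # clean input; score = 0.06, classified as 1

# FGSM-style perturbation: step each feature against the gradient's sign.
# For this linear score the gradient w.r.t. x is just w.
epsilon = 0.1
x_adv = x - epsilon * np.sign(w)  # score drops to -0.10, classified as 0

print(predict(x), predict(x_adv))  # prediction flips despite a 0.1 max change
```

The same gradient-sign idea scales to deep networks, where the gradient is obtained by backpropagation rather than read off directly.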

Types of Adversarial Attacks

White-box attacks assume full knowledge of the model architecture and weights, enabling precise gradient-based perturbations. Black-box attacks work without model access, using transfer attacks or query-based methods to discover vulnerabilities. Evasion attacks modify inputs at inference time, while poisoning attacks corrupt training data. Physical-world attacks — such as adversarial patches on stop signs — demonstrate that these threats extend beyond the digital domain into real-world deployments.
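To contrast with the white-box case, the following sketch shows a minimal black-box, query-based evasion attack: the attacker can only call the model and observe its label (the hidden weights here are a hypothetical target, never read by the attack), and searches small random perturbations until the queried label changes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target model the attacker can only query as a black box.
w_hidden = np.array([0.5, -0.3, 0.8])  # never seen by the attacker

def query(x):
    return int(w_hidden @ x > 0)

def black_box_attack(x, steps=200, step_size=0.05):
    """Random-search evasion: probe small random directions and keep the
    first perturbation that changes the queried label. Uses outputs only."""
    original = query(x)
    for _ in range(steps):
        candidate = x + step_size * rng.standard_normal(x.shape)
        if query(candidate) != original:
            return candidate
    return None  # attack failed within the query budget

x = np.array([1.0, 2.0, 0.2])
adv = black_box_attack(x)  # a nearby input the model labels differently
```

Practical black-box attacks are more sample-efficient (e.g. estimating gradients from queries or transferring perturbations from a surrogate model), but the contract is the same: no access to weights or gradients, only to outputs.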

Defending Enterprise AI Systems

Robust defense requires a layered approach. Adversarial training exposes models to attack examples during training, improving resilience. Input preprocessing techniques such as image compression and randomized smoothing can neutralize perturbations. Ensemble methods that combine multiple model predictions reduce the likelihood of successful attacks. For enterprise deployments, regular adversarial testing should be integrated into the AI development lifecycle alongside anomaly detection systems that flag suspicious input patterns in production.
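One of the preprocessing defenses mentioned above, randomized smoothing, can be sketched in a few lines: classify many Gaussian-noised copies of the input and take a majority vote, which damps the effect of small adversarial shifts. The base classifier and its weights below are hypothetical placeholders for any trained model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical base classifier (stand-in for any trained model).
w = np.array([0.5, -0.3, 0.8])

def base_predict(x):
    return int(w @ x > 0)

def smoothed_predict(x, sigma=0.25, n_samples=500):
    """Randomized-smoothing sketch: majority vote over Gaussian-noised
    copies of x. Small input perturbations rarely change the vote."""
    votes = sum(base_predict(x + sigma * rng.standard_normal(x.shape))
                for _ in range(n_samples))
    return int(votes > n_samples / 2)

x = np.array([2.0, 0.5, 1.0])        # confidently classified input
print(smoothed_predict(x))           # vote agrees with the base prediction
```

Full randomized smoothing also derives a certified radius from the vote margin; this sketch shows only the voting mechanism. In production, such a smoothed predictor would sit behind the anomaly-detection layer that flags suspicious inputs.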
