Back to glossary Security

AI Red Teaming

Testing AI system security through simulated attacks — finding vulnerabilities, guardrail bypasses, and model manipulation methods.

What is AI Red Teaming?

AI Red Teaming is the practice of testing AI system security by simulating adversarial attacks. The red team attempts to: bypass model guardrails, force harmful content generation, extract training data, manipulate outputs, and find prompt injection exploits.

Why is it required?

The AI Act mandates robustness testing for high-risk AI systems (Art. 9). Even without regulation, red teaming is the most effective method for discovering vulnerabilities before production deployment.

AI red teaming techniques

Key techniques include: prompt injection, jailbreaking (bypassing model restrictions), data extraction (extracting training data fragments), adversarial inputs (modified inputs causing incorrect results), and model inversion (reconstructing training data from the model).

Related services and products