What is AI Red Teaming?
AI Red Teaming is the practice of testing AI system security by simulating adversarial attacks. The red team attempts to: bypass model guardrails, force harmful content generation, extract training data, manipulate outputs, and find prompt injection exploits.
Why is it required?
The AI Act mandates robustness testing for high-risk AI systems (Art. 9). Even without regulation, red teaming is the most effective method for discovering vulnerabilities before production deployment.
AI red teaming techniques
Key techniques include: prompt injection, jailbreaking (bypassing model restrictions), data extraction (extracting training data fragments), adversarial inputs (modified inputs causing incorrect results), and model inversion (reconstructing training data from the model).