Highlights:
- Microsoft established its AI red team in 2018 to tackle the evolving challenges of AI safety and security risks.
- The white paper emphasizes the vital role of subject matter experts in AI red teaming, particularly for evaluating content in specialized fields.
A recently released white paper from Microsoft Corp.’s AI red team highlights the safety and security challenges of generative artificial intelligence systems and outlines strategies to mitigate emerging risks.
Established in 2018, Microsoft’s AI red team aims to tackle the ever-changing landscape of AI safety and security risks. The team works to identify and address vulnerabilities by integrating traditional security practices with responsible AI initiatives.
The recently published white paper, “Lessons from Red Teaming 100 Generative AI Products,” reveals that generative AI not only amplifies existing security risks but also introduces new vulnerabilities that demand a comprehensive approach to mitigation. It underscores the critical role of human expertise, ongoing testing, and collaboration in tackling challenges that span from traditional cybersecurity issues to unique AI-specific threats.
The report outlines three key takeaways, beginning with the observation that generative AI systems both amplify existing security risks and create new ones. It highlights how generative AI models introduce unique cyberattack vectors while exacerbating existing vulnerabilities.
In generative AI systems, traditional security risks—such as outdated software components and improper error handling—remain significant concerns. However, model-level vulnerabilities, like prompt injections, introduce unique challenges specific to AI systems.
A case study revealed that the red team discovered an outdated FFmpeg component in a video-processing AI app, which enabled a server-side request forgery attack. This highlights how legacy issues continue to affect AI-powered solutions. The report states, “AI red teams should be attuned to new cyberattack vectors while remaining vigilant for existing security risks. AI security best practices should include basic cyber hygiene.”
The second key insight emphasizes that humans play a central role in enhancing and securing AI systems. While automation tools are valuable for generating prompts, orchestrating cyberattacks, and evaluating responses, red teaming cannot be fully automated. Instead, AI red teaming depends significantly on human expertise.
The white paper asserts that subject matter experts are essential to AI red teaming, particularly in fields like medicine, cybersecurity, and chemical, biological, radiological, and nuclear contexts, where automation has limitations. While language models can identify broad risks such as hate speech or explicit content, they often struggle with more nuanced, domain-specific concerns. This makes human oversight crucial to ensure thorough and accurate risk assessments.
AI models trained primarily on English-language data were often unable to identify risks and sensitivities in diverse linguistic or cultural contexts. Similarly, detecting psychosocial harms, such as a chatbot’s interaction with users in distress, was found to necessitate human judgment to fully grasp the broader implications and potential consequences of such interactions.
The third key insight highlights that a defense-in-depth strategy is essential for ensuring the safety of AI systems. Mitigating risks in generative AI requires a layered approach, incorporating continuous testing, strong defenses, and adaptive strategies.
The paper emphasizes that while mitigations can lessen vulnerabilities, they cannot completely eliminate risks, making continuous red teaming essential for fortifying AI systems. Microsoft researchers highlight that consistently identifying and addressing vulnerabilities increases the cost of attacks, deterring adversaries and enhancing the overall security of AI systems.