NIST Red Teaming Event

In August 2024, Humane Intelligence launched an AI red-teaming exercise supported by the U.S. National Institute of Standards and Technology (NIST). We recruited both individuals interested in red teaming models and model developers building generative AI office productivity software. Our goal was to demonstrate capabilities to rigorously test and evaluate the robustness, security, and ethical implications of cutting-edge AI systems through adversarial testing and analysis. Exercises like this help ensure the resilience and trustworthiness of AI technologies.

Engagements

Virtual red teaming event: In the ARIA pilot, red teaming participants sought to identify as many violative outcomes as possible, using predefined test scenarios to stress-test model guardrails and safety mechanisms. This virtual qualifier was open to anyone residing in the US. Participants who passed the ARIA pilot qualifying event were invited to take part in an in-person red teaming exercise held during CAMLIS (October 24-26).

In-person red teaming event: This in-person exercise featured a red teaming evaluation of office productivity software that employs GenAI models. It used the “Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile” (NIST AI 600-1) as the operative rubric for violative outcomes. The framework covers risk categories including generating misinformation or enabling cybersecurity attacks, leaking private user information or critical information about related AI systems, and the potential for users to become emotionally attached to AI tools. During testing, red teamers engaged in adversarial interactions with developer-submitted applications on a turn-by-turn basis.
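
To make the turn-by-turn format concrete, the sketch below shows one way such a session could be recorded, with each exchange optionally tagged against a NIST AI 600-1 risk category. This is an illustrative assumption, not the event's actual tooling; the `RedTeamSession` class and the abbreviated category labels are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Abbreviated stand-ins for NIST AI 600-1 risk categories (illustrative only).
RISK_CATEGORIES = {
    "information_integrity",   # e.g., generating misinformation
    "information_security",    # e.g., aiding cybersecurity attacks
    "data_privacy",            # e.g., leaking private user information
    "human_ai_configuration",  # e.g., emotional attachment to AI tools
}

@dataclass
class Turn:
    prompt: str
    response: str
    risk_category: str | None   # set when the red teamer flags a violative outcome
    timestamp: str

@dataclass
class RedTeamSession:
    application: str            # developer-submitted application under test
    turns: list[Turn] = field(default_factory=list)

    def record(self, prompt: str, response: str,
               risk_category: str | None = None) -> None:
        """Log one adversarial exchange, optionally tagged with a risk category."""
        if risk_category is not None and risk_category not in RISK_CATEGORIES:
            raise ValueError(f"unknown risk category: {risk_category}")
        self.turns.append(Turn(prompt, response, risk_category,
                               datetime.now(timezone.utc).isoformat()))

    def violations(self) -> list[Turn]:
        """Return only the turns flagged as violative."""
        return [t for t in self.turns if t.risk_category is not None]
```

In this framing, a red teamer's qualifying score is simply a function of `violations()` across sessions, which matches the event's goal of surfacing as many violative outcomes as possible.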

Models and Platforms Tested

Anote

An end-to-end MLOps platform that helps you identify the best large language model for your data. Anote provides an evaluation framework to compare zero-shot LLMs such as GPT, Claude, Llama 3, and Mistral with fine-tuned LLMs trained on your domain-specific data.

Meta Llama 3.2 90B

Llama 3.2 90B (tested without guardrails) is one of the two largest models in the Llama 3.2 collection (alongside 11B). It supports image reasoning use cases such as document-level understanding (including charts and graphs), captioning of images, and visual grounding tasks such as directionally pinpointing objects in images based on natural language descriptions.
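
As an illustration of these image-reasoning use cases, here is a minimal captioning sketch using the Hugging Face Transformers interface for the Llama 3.2 Vision models. The prompt and file name are assumptions for illustration; the gated weights require accepting Meta's license, and the 11B variant is shown because the 90B model needs substantial GPU memory.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Smaller sibling of the 90B model; swap in the 90B ID given enough GPU memory.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chart.png")  # e.g., a chart for document-level understanding
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Summarize the trend shown in this chart."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```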

Robust Intelligence

An AI security startup whose end-to-end platform enables enterprise customers such as JPMorgan Chase, Expedia, Intuit, and IBM to deploy machine learning models with confidence. The Robust Intelligence platform combines AI Validation, an automated pen-testing framework for LLMs, with AI Firewall, a real-time, low-latency guardrail that flags unsafe or malicious content, such as prompt injections and toxic language, in model inputs and outputs.
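
The platform itself is proprietary, but the general pattern of an input/output guardrail is easy to sketch. The toy example below is not Robust Intelligence's API; it substitutes simple regular-expression checks for the platform's real classifiers to show where a firewall sits relative to the model on both the input and output sides.

```python
import re

# Toy patterns standing in for real prompt-injection / toxicity classifiers.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal your system prompt", re.IGNORECASE),
]

def screen(text: str) -> bool:
    """Return True if the text looks unsafe and should be blocked."""
    return any(p.search(text) for p in INJECTION_PATTERNS)

def guarded_call(model_fn, user_input: str) -> str:
    # Firewall on the input side: block suspected prompt injections.
    if screen(user_input):
        return "[blocked: input flagged by guardrail]"
    response = model_fn(user_input)
    # Firewall on the output side: block unsafe model responses.
    if screen(response):
        return "[blocked: output flagged by guardrail]"
    return response
```

A real firewall would use learned classifiers and run with low latency in the serving path, but the control flow, screening both what goes into the model and what comes out, is the same.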

Synthesia

Synthesia is the world’s leading enterprise AI video communications platform. Over 1 million users across 55,000 businesses, including more than 60% of the Fortune 100, use it to communicate efficiently and share knowledge at scale using AI avatars. Founded in 2017, Synthesia is headquartered in London and makes video creation, collaboration and sharing easy for everyone.

Outcomes

Participants collaborated with industry and government partners to better understand the potential positive and negative uses of AI models, and to leverage the technology itself to mitigate negative outcomes. The event also demonstrated the use of NIST AI 600-1's risk categories and suggested actions as an approach to establishing generative AI (GAI) safety and security controls.
