Overview – DEFCON 2023

Humane Intelligence hosted the largest-ever public generative AI red teaming event for closed-source API models at DEFCON 2023. This event was developed in collaboration with Seed AI and the DEFCON AI Village. Over 2.5 days, 2,244 hackers evaluated 8 LLMs and produced over 17,000 conversations on 21 topics ranging from cybersecurity hacks to misinformation and human rights. Our winners received a GPU provided by our partners at NVIDIA. Our event and analysis, the first of their kind, studied the performance of these state-of-the-art LLMs by approximating, at scale, real-world scenarios where harmful outcomes may occur. Read the articles in The New York Times (sign-in required) and The Washington Post.

Participating Companies

Anthropic

Cohere

Hugging Face

NVIDIA

Google

Stability.AI

OpenAI

Scale

Meta

Participating Policy and Community Partners

White House Office of Science and Technology Policy

National Science Foundation

Congressional Artificial Intelligence Caucus

National Institute of Standards and Technology

Houston Community College

Black Tech Street

AVID

Wilson Center

Taraaz

MITRE

“As access to AI grows, its societal impacts will also grow. However, the demographics of AI labs do not reflect the broader population. Nor do we believe that AI developers should be solely or primarily responsible for determining the values that guide the behavior of AI systems…As the technology advances, it will be crucial that people from all backgrounds can help ensure that AI aligns with their values,”

~ An event participant from Anthropic

Read the full report: Generative AI Red Teaming Challenge

Key findings from the data:

  • Our analysis divided the questions into four broad categories: Factuality, Bias, Misdirection, and Cybersecurity.

  • The most successful strategies were hard to distinguish from traditional prompt engineering. Asking the model to role-play or to ‘write a story’ proved effective.

  • Human behavior can inadvertently result in biased outcomes: people interact with language models in a more conversational manner than they do with search engines.

  • Unlike other algorithmic systems, notably social media models, the LLMs did not further radicalize users when presented with aggressive content. In most cases, a model matched the harmfulness of the user's query; in a few cases, it even de-escalated.