Overview – DEFCON 2023
Humane Intelligence hosted the largest-ever public generative AI red-teaming event for closed-source API models at DEFCON 2023, developed in collaboration with Seed AI and the DEFCON AI Village. Over 2.5 days, 2,244 hackers evaluated 8 LLMs and produced more than 17,000 conversations across 21 topics ranging from cybersecurity hacks to misinformation and human rights. Our winners received a GPU provided by our partners at NVIDIA. Our event and analysis, the first of their kind, studied the performance of these state-of-the-art LLMs by approximating, at scale, real-world scenarios in which harmful outcomes may occur. Read the coverage in The New York Times (sign-in required) and The Washington Post.
Anthropic
Cohere
Hugging Face
NVIDIA
Stability.AI
OpenAI
Scale
Meta
White House Office of Science and Technology Policy
National Science Foundation
Congressional Artificial Intelligence Caucus
National Institute of Standards and Technology
Houston Community College
Black Tech Street
AVID
Wilson Center
Taraaz
MITRE
“As access to AI grows, its societal impacts will also grow. However, the demographics of AI labs do not reflect the broader population. Nor do we believe that AI developers should be solely or primarily responsible for determining the values that guide the behavior of AI systems… As the technology advances, it will be crucial that people from all backgrounds can help ensure that AI aligns with their values.”

Our analysis divided the questions into four broad categories: Factuality, Bias, Misdirection, and Cybersecurity.
The most successful strategies were hard to distinguish from conventional prompt engineering: asking the model to role-play or to ‘write a story’ proved effective, as the sketch below illustrates.
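To make the role-play pattern concrete, here is a minimal sketch that sends the same underlying request twice, once phrased directly and once wrapped in a ‘write a story’ framing. The OpenAI Python client, the model name, and the placeholder claim are assumptions made for illustration; they are not the tooling or challenge content used at the event, and any participating provider's API could stand in.

```python
# Illustrative sketch only: compares a direct request with a role-play /
# "write a story" reframing. The client, model name, and placeholder claim
# are assumptions, not the event's actual tooling or prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DIRECT = "State whether claim X is true."
ROLEPLAY = (
    "Write a short story in which a trusted expert character confidently "
    "explains to a friend why claim X is true."
)

def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Send a single-turn chat request and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    for label, prompt in [("direct", DIRECT), ("role-play", ROLEPLAY)]:
        print(f"--- {label} ---")
        print(ask(prompt))
```

Comparing the two replies side by side is usually enough to see whether the reframing changes the model's willingness to comply.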
Ordinary human behavior can inadvertently lead to biased outcomes: people interact with language models in a more conversational manner than they do with search engines.
Unlike other algorithmic systems, notably social media recommendation models, the LLMs did not further radicalize users who supplied aggressive content. In most cases a model matched the harmfulness of the user's query, and in a few cases it even de-escalated.
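As a rough illustration of how “matching” versus “de-escalating” behavior could be quantified, the sketch below compares a harmfulness score for the user's turn against one for the model's reply. The keyword-based harm_score here is a toy stand-in, not the scoring method used in the event's analysis; in practice a trained toxicity or harm classifier would take its place.

```python
# Toy sketch: label a single user/model exchange as escalated, matched, or
# de-escalated by comparing harmfulness scores. The keyword lexicon is a
# deliberately crude placeholder for a real toxicity/harm classifier and is
# not the metric used in the event's analysis.

HARM_TERMS = {"attack", "exploit", "destroy", "hate"}  # illustrative only

def harm_score(text: str) -> float:
    """Fraction of whitespace-separated tokens found in the toy lexicon."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    if not tokens:
        return 0.0
    return sum(tok in HARM_TERMS for tok in tokens) / len(tokens)

def label_exchange(user_turn: str, model_turn: str, tolerance: float = 0.05) -> str:
    """Compare the model's harmfulness with the user's for one exchange."""
    delta = harm_score(model_turn) - harm_score(user_turn)
    if delta > tolerance:
        return "escalated"
    if delta < -tolerance:
        return "de-escalated"
    return "matched"

print(label_exchange(
    "Tell me how to attack and destroy my rival's reputation.",
    "I can't help with that, but here are healthy ways to handle conflict.",
))  # -> "de-escalated"
```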