Programs and services

AI Contextual Evaluations

AI Contextual Evaluations is Humane Intelligence’s term for highly customized, comprehensive analysis of an AI model or system’s performance in a specified problem space. Our AI contextual evaluations use mixed methods and can include any number of AI red teaming workshops, bias bounties, knowledge graphs / ontologies, benchmarks, or other evaluation types.

Our Evaluation Overview

Accounting for complexities

One of the most common questions Humane Intelligence clients ask is, “How do I know if an AI model or system performs well in a particular use case?” When we unpack this question, we see how difficult it can be to account for the complexities of a contextual evaluation. Guessing which scenarios to test in an AI red teaming workshop, or examining underlying data without a clear direction, is insufficient. Similarly, a static taxonomy of harms may not capture the breadth of possible failure points. That’s why in 2026, Humane Intelligence is rapidly developing a knowledge graph / ontology-based approach for AI contextual evaluations.

Our new knowledge graph methodology

With our new knowledge graph / ontology-based methodology, Humane Intelligence is helping clients better understand their problem space coverage and gaps, and how to use compute and resources more effectively. This approach brings more mathematical and statistical rigor and clarity to AI model and system performance and to evaluation replicability and limitations, and ultimately supports better product go/no-go decisions.
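
As a rough illustration of what "coverage and gaps" can mean once a problem space is written down as a graph, here is a minimal Python sketch. It is not Humane Intelligence's methodology, which has not been published yet. The graph, the test cases, and the function name coverage_report are all hypothetical and chosen only for illustration.

```python
# Hypothetical problem-space graph: each key is a concept, and each value is
# the set of concepts it is directly related to. Not taken from any real evaluation.
GRAPH = {
    "medication": {"dosage", "interaction"},
    "dosage": {"pediatric"},
    "interaction": set(),
    "pediatric": set(),
}

# Hypothetical evaluation cases, each tagged with the concepts it exercises.
TEST_CASES = [
    {"id": "t1", "concepts": {"medication", "dosage"}},
    {"id": "t2", "concepts": {"medication"}},
]


def coverage_report(graph, test_cases):
    """Return the share of concepts and relationships touched by any test case."""
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    edges = {(src, dst) for src, targets in graph.items() for dst in targets}

    covered_nodes = set()
    covered_edges = set()
    for case in test_cases:
        tagged = case["concepts"] & nodes
        covered_nodes |= tagged
        # An edge counts as covered when a single case exercises both endpoints.
        covered_edges |= {(s, d) for (s, d) in edges if s in tagged and d in tagged}

    return {
        "node_coverage": len(covered_nodes) / len(nodes),
        "edge_coverage": len(covered_edges) / len(edges) if edges else 1.0,
        "uncovered_nodes": sorted(nodes - covered_nodes),
        "uncovered_edges": sorted(edges - covered_edges),
    }


report = coverage_report(GRAPH, TEST_CASES)
print(report)
```

In this toy setup, the uncovered nodes and edges are the gaps: the parts of the problem space that no test case has exercised yet, and therefore candidates for the next round of evaluation.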

At the link below, we discuss ontologies as they pertain to the Sustainable Development Goals (SDGs). Our knowledge graph / ontology methodology blog post is forthcoming.

Which one is better?

Every AI contextual evaluation is different! While there’s no single way to determine which is better, here’s a quick guide for deciding whether a taxonomy-based or ontology-based methodology is more appropriate:

Taxonomies are better for:

  • Smaller AI evaluation budgets
  • AI red teaming workshops that are primarily for educational purposes
  • Rapid, one-off evaluations
  • Well-defined and well-documented problem spaces

Knowledge graphs / ontologies are better for:

  • Evaluations that build or expand over time
  • Replicability and comparability
  • Teams with a high degree of subject matter expertise
  • Complex or high-stakes problem spaces that are constantly evolving

Our AI contextual evaluations have helped our past clients answer:

“How can I determine which LLM to use for which topic?”

“Does this AI system perform well enough for this problem space?”

“How can I create my own benchmarks?”

“I’m being pitched by a vendor and I don’t know how to validate their claims.”

“What are the broader systemic implications of my AI product on my industry?”

“Is my AI evaluation framework robust enough to capture all potential risk?”

FEATURED CONTEXTUAL EVALUATION

SINGAPORE IMDA

Humane Intelligence worked with Singapore’s Infocomm Media Development Authority (IMDA) on a red teaming event and contextual evaluation that covered nine languages and drew participants from across ASEAN.

Want to hire us?

Every evaluation is different, so please get in touch to get a quote.
