Our vision is to create a world in which everyone has the capacity, training and tools to evaluate AI systems that affect their lives.
In non-algorithmic, non-LLM-based software development, outputs cannot be generated until the software has been implemented and deployed. Troubleshooting where or why a problematic output arises therefore comes toward the end of the software development lifecycle.
In contrast, algorithmic and/or LLM-based systems can generate outputs before the software development lifecycle is complete. Since most AI systems rely on LLMs built by a small number of organizations, end users and implementers have limited ability to directly influence LLM behavior. This makes it essential to evaluate AI system outputs for algorithmic bias, hallucinations, and misleading information as part of implementation and deployment, especially in high-stakes domains like education, gender rights, democracy and governance, public health, and humanitarian response. Evaluations of the outputs of machine learning (ML) algorithm-based systems are likewise essential during implementation and deployment.
AI / ML system evaluations must also be considered throughout implementation and development, because system outputs often change as the system is built. We can think of AI evaluations like a cake – yes, a cake. Just as a cake cannot be unbaked if the batter was made with the wrong ingredients, an AI / ML system’s output cannot be broken back down into its component pieces. Once the AI / ML system generates an output, it must be evaluated as a whole.
Humane Intelligence therefore considers AI evaluations a key part of breaking down barriers to deploying AI for social good. These are the kinds of questions that Humane Intelligence’s evaluations can help answer during the development of an AI / ML system.
For more information about our evaluation types and service offerings,