Trust as a Product Feature: How ETS Builds AI-Enabled Assessments with Humans at the Center

Anybody can build a test, especially now with all that AI can do. The harder question is whether that test measures what it claims to, whether it holds up across populations, and whether it remains fair and valid at scale.

ETS develops, administers and scores millions of tests each year, and most of them carry real consequences for the people who take them. A single result can shape a learner's future, a career opportunity, or a licensing outcome. Enabling these opportunities for people is what drives our mission and why we hold ourselves to such high standards. When AI comes into play, the bar gets higher, not lower. We meet it by making disciplined choices about where AI adds value and by making sure humans, not AI, stay in charge.

How ETS uses AI across the assessment lifecycle

At ETS, AI supports multiple stages of the assessment lifecycle: content development, test assembly and delivery, and scoring.

Content development

We use our proprietary AI content engine to generate first drafts of items and related content across most of our major programs. We set the AI guardrails, constraints, and requirements and bring decades of assessment development experience to direct the initial generation in appropriate ways. Today, close to 80% of our assessment content, including questions and reading passages, starts this way.

But generating content is only the starting point. Before an item is used in any of our programs, it goes through a structured review process that aims to ensure its fairness and accessibility while confirming that it behaves according to expectations and the intended rubric. In simple terms, we do not treat AI output as finished work. We treat it as a candidate that must earn its way into use.

Assembly and delivery

We use AI to help personalize tests by adapting them in real time. In an adaptive testing environment, each question or task can be selected based on how the test taker responded to the ones before it, helping the assessment gather the right evidence more efficiently. This kind of assessment can shorten testing time, reducing “seat time” for test takers, and can match content more closely to their level.

This is not just a better way for test takers to show what they can do. It is also an important security measure: because test takers are not all assigned the exact same form, they may see different sets of content.
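To make the idea of adaptive selection concrete, here is a minimal sketch in Python. It is an illustration only and not a description of how ETS selects items: it assumes a simple one-parameter (Rasch) response model, a hypothetical five-item pool, and a deliberately crude ability update. The function names, the difficulty values, and the step size are all assumptions introduced for this example.

```python
import math

def p_correct(theta, b):
    """Rasch-model probability of a correct response, given ability
    theta and item difficulty b (both on the same logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p).
    It peaks when the item's difficulty matches the test taker's ability."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def next_item(theta, difficulties, administered):
    """Return the index of the not-yet-administered item that is most
    informative at the current ability estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))

def update_theta(theta, b, correct, step=0.5):
    """Crude illustrative update (an assumption, not a production estimator):
    nudge the estimate up after a correct answer and down after an
    incorrect one, in proportion to how surprising the response was."""
    observed = 1.0 if correct else 0.0
    return theta + step * (observed - p_correct(theta, b))

# Hypothetical item pool: difficulties from easy (-2.0) to hard (2.0).
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

theta = 0.0                                    # starting ability estimate
first = next_item(theta, difficulties, set())  # medium item is chosen first
theta_after_correct = update_theta(theta, difficulties[first], correct=True)
second = next_item(theta_after_correct, difficulties, {first})
```

In this sketch, a correct answer on the medium item raises the ability estimate, so the next item selected is a harder one. Two test takers who answer differently therefore follow different paths through the pool, which is the property that supports both shorter tests and the security benefit described above.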

Scoring

ETS has used AI in scoring since the early 2000s, long before the rise of large language models (LLMs). The real question is not whether AI can score a response, but whether it can do so reliably, fairly and to the standards of the program it serves.

That is why some ETS assessments are scored entirely by humans, others entirely by AI, and still others by a combination of the two, depending on the response type. The right scoring model depends on the program, the stakes of the score, the kind of response being evaluated, and the expectations of the markets the program serves, all in service of producing the most accurate, fair and defensible result for each learner.

What "trust" means for our stakeholders

Trust in AI-enabled assessment is not a single quality. It is whether the system consistently produces results that are valid, fair and reliable and whether the people relying on those results believe that it does.

Core ETS stakeholders each understand trust as it relates to AI differently. Test takers often view trust as a result of fairness and transparency, while institutional partners may require evidence of disciplined lifecycle controls and humans in the loop. Partners are interested in continued monitoring to ensure that AI is not weakening comparability, reliability, or fairness as programs grow, and policymakers need a clear account of how risks are identified, measured and managed across populations.

At ETS, the goal is not to use AI everywhere. It is to use it where it helps us do more for learners and institutions while we uphold the standards we have built over decades. That means using the right method for the task, keeping humans in charge and thoroughly evaluating evidence before we trust any new capability. That is how we make AI useful and responsible while maintaining the trust placed in us and our products by our scorers and educators.
