From fairness to gatekeeping: Competing futures of AI in language assessment

CO-EDP, VisionRI | Updated: 10-09-2025 13:05 IST | Created: 10-09-2025 13:05 IST

Artificial intelligence has rapidly entered the field of language testing, bringing both opportunity and unease. A new study by Luke Harding of Lancaster University explores these tensions, asking whether AI can transform language assessment responsibly or whether it risks creating new inequities. 

The study positions language testing at a crossroads. While AI tools promise efficiency, personalization, and fairness, they also raise concerns about corporate control, opaque decision-making, and diminished test-taker rights. The research identifies both dystopian dangers and utopian possibilities, framing the debate around ethics, accountability, and the future role of educators and learners in AI-driven systems.

The paper, titled “Utopian and dystopian visions: Steering a course for the responsible use of artificial intelligence in language testing and assessment”, was published in Language Testing.

The dystopian future: Loss of control and ethical risks

The study opens with a sobering scenario: a worst-case future where AI transforms language testing into a mechanized and profit-driven industry. Harding draws on critiques from educational technology scholars and popular culture to illustrate how unchecked AI adoption could lead to homogenized, mass-produced tests disconnected from learning.

This dystopian vision sees examinations produced rapidly with minimal human oversight, relying on narrow constructs that fail to capture the complexities of communication. Automated scoring systems would be opaque, and biometric data collected for security might be monetized by private companies. The result would be an expansion of gatekeeping barriers for test-takers, increasing inequality while reducing transparency.

Critics such as Ben Williamson and Paola Ricaurte highlight the risks of “platformization,” where schools and assessment systems become dependent on corporate infrastructures. In this scenario, AI is less about enhancing education and more about consolidating control. Harding warns that the temptation to prioritize expediency over fairness could lead to irresponsible practices that undermine the very purpose of language testing.

The utopian alternative: Fairness, transparency, and learning-oriented design

Despite these risks, the author emphasizes that language testing has long engaged with ethical responsibility, offering a foundation for more hopeful futures. He highlights principles such as fairness, accountability, and test-taker rights established in earlier critical language testing research, arguing these can guide responsible AI adoption.

Recent initiatives reflect this shift. The International Language Testing Association revised its Guidelines for Practice between 2023 and 2025 to integrate AI considerations, prioritizing fairness, quality, and transparency. Test providers such as Duolingo and ETS have also issued standards emphasizing validity, bias mitigation, privacy, and educational impact. These guidelines aim to set guardrails for developers, ensuring that AI systems enhance rather than undermine equity.

Harding envisions a utopian model where AI becomes embedded in learning itself. In this vision, assessments would be context-sensitive, inclusive, and diverse, reflecting real-world language use. Scoring systems would be explainable and bias-monitored, while test-taker data would remain private and under learner control. Instead of one-off, high-stakes exams, assessment could become a seamless, ongoing process integrated into instruction.

Navigating the middle ground: Emerging research and open questions

The paper also surveys seven articles featured in the special issue on AI in language assessment, which collectively show cautious optimism. These studies explore areas such as AI-driven writing tasks, fairness in automated speech scoring, and feedback systems that align assessment more closely with learning. Across these contributions, Harding identifies recurring commitments to transparency, inclusivity, and authenticity.

Yet unresolved questions remain. Harding raises two deeper philosophical challenges. The first concerns whether AI can ever be trusted to make ethical or moral decisions: while large language models can simulate reasoning, they lack consciousness and moral agency, and delegating high-stakes assessment judgments to such systems risks ceding professional responsibility without accountability.

The second challenge is whether AI literacy among test-takers and stakeholders will be sufficient. Beyond traditional assessment literacy, future learners may need to understand algorithms, data rights, and the limitations of AI systems. Harding suggests that fostering critical AI literacy will be essential, but warns that placing too much of the burden on individuals risks shifting responsibility away from the institutions that hold greater power.

First published in: Devdiscourse