Many AI systems depend on hidden human labor, not true automation

CO-EDP, VisionRI | Updated: 17-12-2025 18:19 IST | Created: 17-12-2025 18:19 IST

Artificial intelligence systems sold as autonomous are increasingly dependent on hidden human labor, creating a structural integrity crisis in the AI industry. A growing gap has emerged between how these systems are marketed and how they actually operate in production. A new research paper argues that this gap is not a marginal implementation flaw but a systemic design failure that undermines responsible AI, exploits labor, and misleads users, investors, and regulators.

The study, titled AI Autonomy or Human Dependency? Defining the Boundary in Responsible AI with the α-Coefficient, was released on arXiv. The paper introduces a formal framework to distinguish genuinely autonomous AI systems from those that rely on humans as unseen operational substitutes, offering a measurable standard for AI autonomy and a deployment gate designed to block deceptive systems before they reach the market.

While human oversight has long been framed as a safeguard for fairness, safety, and accountability, the paper argues that it is now routinely misused to conceal systems that cannot function without continuous human intervention. This practice, the study warns, has created a false narrative of AI progress while embedding ethical and economic risks deep into modern AI architectures.

When human-in-the-loop becomes human-instead-of-AI

Human-in-the-Loop models were originally developed to strengthen AI systems, not replace them. Early applications focused on tasks such as data labeling, model evaluation, and feedback loops that helped improve performance over time. In high-risk environments, humans were meant to validate outputs, catch rare errors, and provide ethical oversight. These roles were designed to complement a functioning AI core.

The study argues that this balance has shifted. In many commercial deployments, humans are no longer refining AI decisions but performing the decisions themselves. The paper defines this failure mode as Human-Instead-of-AI, or HISOAI. In these systems, the probability that a human makes the final decision approaches certainty, even though the product is marketed as automated. The AI component may act only as a filter, a confidence scorer, or a thin interface layer, while humans silently carry out the substantive work.

This structural dependency carries consequences beyond technical inefficiency. Economically, it creates unsustainable cost models where labor expenses dominate operations, often hidden through outsourcing or gig-based work. Ethically, it shifts risk and responsibility onto workers who remain invisible to users and regulators. Strategically, it undermines trust in the AI sector by inflating claims of autonomy without delivering real technological capability.

The research positions HISOAI not as an implementation mistake but as an architectural deception. Existing Responsible AI frameworks emphasize transparency, explainability, and governance after deployment. However, the paper argues that these tools fail to address a more basic question: whether the system is truly AI-driven at all. Explainable AI can justify decisions, but it cannot reveal that those decisions were made by people rather than models. Governance policies can mandate oversight, but they cannot detect when oversight has become operational substitution.

To expose this blind spot, the study calls for a measurable, enforceable standard that evaluates autonomy before systems are allowed to claim AI status.

Measuring autonomy with the AI Autonomy Coefficient

The AI Autonomy Coefficient, denoted α (alpha), quantifies the proportion of decisions an AI system completes without mandatory human substitution. In practical terms, it measures how often the AI operates independently versus how often humans must step in to perform the task the AI is supposed to handle.

The coefficient ranges from zero to one. A value near one indicates a system where the AI handles nearly all decisions, with humans involved only in periodic review or exceptional cases. A value near zero indicates a system that depends almost entirely on human labor. The study establishes a critical threshold: systems with an autonomy coefficient below 0.5 are classified as operating in the HISOAI condition and should not be marketed as AI products.
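The paper's exact estimator is not reproduced in this article, but the idea reduces to a simple ratio: count the logged decisions the system completed without a human stepping in, divide by the total, and compare the result to the 0.5 threshold. The sketch below illustrates that arithmetic in Python; the Decision record, its field name, and the 38-of-100 example log are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical record of how one production decision was resolved.
@dataclass
class Decision:
    human_substituted: bool  # True if a human performed the decision the AI was meant to make

def autonomy_coefficient(decisions: list[Decision]) -> float:
    """Proportion of decisions completed without mandatory human substitution."""
    if not decisions:
        raise ValueError("need at least one logged decision")
    autonomous = sum(1 for d in decisions if not d.human_substituted)
    return autonomous / len(decisions)

def is_hisoai(alpha: float, threshold: float = 0.5) -> bool:
    """Classify a system as Human-Instead-of-AI when alpha falls below the threshold."""
    return alpha < threshold

# Example log: 38 of 100 decisions were completed by the model alone.
log = [Decision(human_substituted=(i >= 38)) for i in range(100)]
alpha = autonomy_coefficient(log)   # 0.38
print(alpha, is_hisoai(alpha))      # 0.38 True -> flagged as HISOAI
```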

This threshold is not presented as a performance benchmark but as a structural integrity test. The paper emphasizes that autonomy is distinct from accuracy or quality. A system can produce acceptable outputs while still relying on humans for the majority of its decision-making. In such cases, traditional metrics like accuracy, latency, or user satisfaction fail to reveal the underlying dependency.

To operationalize the autonomy requirement, the study introduces the AI-First, Human-Empowered framework. Under this model, AI systems must demonstrate functional independence before deployment. Humans are explicitly prohibited from acting as hidden substitutes for missing or underdeveloped AI capabilities. Instead, human involvement is restricted to roles that add strategic value, such as ethical oversight, boundary testing, and long-term model improvement.

The framework is enforced through a deployment algorithm that acts as an architectural gate. Systems are evaluated in offline testing and real-world shadow environments to calculate autonomy under realistic conditions. If the system fails to meet the required autonomy threshold, deployment is blocked and the product must be re-engineered or reclassified as a human-powered service.
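The paper's deployment algorithm is not reproduced here, but its gating logic can be sketched roughly as follows: measure autonomy in both offline testing and a real-world shadow run, judge the system on the weaker of the two, and allow deployment only if that value clears the threshold. The function name, the two-environment inputs, and the return messages below are assumptions made for illustration.

```python
def deployment_gate(offline_alpha: float, shadow_alpha: float,
                    threshold: float = 0.5) -> str:
    """Architectural gate: autonomy must clear the threshold in both
    offline testing and real-world shadow evaluation before deployment."""
    worst_case = min(offline_alpha, shadow_alpha)  # judge on the weaker environment
    if worst_case >= threshold:
        return "DEPLOY"
    return "BLOCK: re-engineer the AI core or reclassify as a human-powered service"
```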

A case study in the paper illustrates how this mechanism works in practice. A legacy system marketed as AI-driven was found to have an operational autonomy coefficient of 0.38, with more than 90 percent of decision-making cost attributed to human labor. Under the proposed framework, this system would have been flagged and prevented from deployment until its AI core was substantially improved. After multiple redesign cycles, the successor system achieved an autonomy coefficient above 0.8, allowing deployment under the AI-First standard.
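Fed the coefficients reported in the case study, a gate like the sketch above would have produced exactly this outcome. The shadow-test values in the usage example below are assumed for illustration, since the paper's headline figures quoted here are only 0.38 for the legacy system and above 0.8 for its successor.

```python
# Applying the hypothetical gate sketched above to the reported case-study figures.
print(deployment_gate(offline_alpha=0.38, shadow_alpha=0.38))  # BLOCK: legacy system
print(deployment_gate(offline_alpha=0.82, shadow_alpha=0.81))  # DEPLOY: redesigned successor
```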

Redefining responsible AI and the role of humans

The study advances a broader argument about the future of responsible AI. It contends that ethical AI cannot be achieved through policy statements alone. Instead, ethics must be embedded directly into system architecture through enforceable technical constraints. The autonomy coefficient serves as a bridge between ethical principles and engineering practice, translating abstract commitments into measurable requirements.

One of the most significant outcomes of the AI-First, Human-Empowered framework is its impact on human labor. Under HISOAI conditions, humans perform low-value, repetitive substitution tasks, often under precarious conditions and without recognition. The research shows that enforcing autonomy thresholds fundamentally changes this dynamic. As AI systems become genuinely capable, human effort shifts toward high-value roles that machines cannot replace.

These roles include ethical oversight, where humans evaluate fairness, bias, and societal impact; boundary pushing, where humans handle novel or ambiguous cases that fall outside training distributions; and strategic tuning, where insights from deployment guide future model development. In this configuration, humans are empowered by AI rather than exploited by it.

The implications extend to regulators and investors. Current compliance frameworks focus on explainability, risk assessment, and post-hoc accountability. The study argues that these tools should be complemented by autonomy audits that verify whether a system’s claimed automation matches its operational reality. Without such audits, markets risk rewarding deceptive architectures while penalizing teams that invest in genuine AI development.

The research also raises questions about how AI products are labeled and sold. Systems that fail to meet autonomy thresholds should be marketed transparently as human-powered or hybrid services. Doing so would improve consumer trust, align pricing with actual cost structures, and reduce pressure to overstate AI capabilities.

The paper acknowledges limitations, including the need to adapt autonomy thresholds for safety-critical domains and the challenge of measuring human effort in complex workflows. However, it argues that these challenges are solvable and far less damaging than allowing structurally deceptive systems to proliferate unchecked.

First published in: Devdiscourse