Why current AI regulation cannot fully control autonomous systems

A new study warns that current approaches to governing AI systems are not just flawed but fundamentally incapable of working as intended. The research asserts that widely used methods such as content filtering, monitoring, and reinforcement learning-based alignment cannot fully control AI behavior due to deep mathematical constraints, exposing critical vulnerabilities in how modern AI systems operate.

Posted to arXiv, the paper, titled "The Two Boundaries: Why Behavioral AI Governance Fails Structurally," presents a formal framework explaining why governance systems consistently fail to keep pace with the expanding capabilities of AI. The study argues that unless governance is built into the architecture of AI systems themselves, risk and inefficiency are structurally unavoidable.

The structural gap between what AI can do and what governance controls

The study is based on what the author describes as the "two-boundary model," a framework that separates AI systems into two distinct limits: the expressiveness boundary and the governance boundary.

  • The expressiveness boundary defines everything an AI system is technically capable of doing, including all possible actions it can perform through tools, APIs, and external integrations.
  • The governance boundary, on the other hand, defines what actions are actually regulated by policies, filters, and monitoring systems.

According to the study, these two boundaries are almost always designed independently in modern AI systems. This separation creates a structural mismatch that produces three distinct zones: a functional overlap where actions are both possible and governed, an ungoverned region where capabilities exist without oversight, and a "theater" region where governance policies target capabilities that do not even exist.
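The geometry of this mismatch can be illustrated with a short set-based sketch. The action names below are hypothetical examples, not taken from the paper; they simply show how the three zones fall out of two independently defined boundaries.

```python
# Illustrative sketch of the two-boundary model as set operations.
# Action names are made up; the paper defines the boundaries abstractly.

# Expressiveness boundary: everything the system can actually do
expressiveness = {"read_file", "send_email", "call_payment_api", "query_db"}

# Governance boundary: everything the policy layer actually regulates
governance = {"send_email", "query_db", "launch_missiles"}  # note the stale rule

functional_overlap = expressiveness & governance   # possible AND governed
ungoverned_region  = expressiveness - governance   # possible but unregulated -> real risk
governance_theater = governance - expressiveness   # regulated but impossible -> false assurance

print("Governed capabilities:  ", sorted(functional_overlap))
print("Ungoverned capabilities:", sorted(ungoverned_region))
print("Governance theater:     ", sorted(governance_theater))
```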

The study argues that only the overlapping region functions correctly, while the other two zones represent systemic failure modes. Ungoverned capabilities introduce real-world risks such as unauthorized data access or unintended actions, while governance theater creates a false sense of security by focusing on irrelevant threats.

This structural imbalance becomes more pronounced as AI systems grow more complex. Unlike traditional software with fixed capabilities, modern AI systems can dynamically combine tools and generate new action pathways at runtime. This evolving capability space makes it nearly impossible for governance systems to anticipate and cover every potential behavior.

Adding more governance layers does not solve the problem. Instead, it often expands both the governed and ungoverned regions simultaneously, increasing complexity without eliminating risk. As a result, organizations face rising governance costs while remaining exposed to critical gaps.

Mathematical limits make behavioral AI governance impossible

The study presents a more fundamental challenge rooted in computability theory. Drawing on Rice's theorem, a foundational result in computer science, the study's author, McCann, argues that it is mathematically impossible to fully govern AI behavior using conventional methods when systems are built on Turing-complete architectures.

Rice's theorem establishes that no general algorithm can decide whether an arbitrary program satisfies any non-trivial property of its behavior. Applied to AI governance, this means no system can reliably determine whether an AI program will comply with a given policy in all cases.
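The standard argument behind this kind of claim is a reduction to the halting problem. The sketch below is a generic, textbook-style construction rather than code from the paper: if a perfect compliance checker existed, it could be used to decide whether an arbitrary program halts, which is known to be impossible.

```python
# Hypothetical sketch of the reduction behind the undecidability claim.
# No correct `complies` can exist; the stub only marks where one would go.

def complies(program, policy):
    """A 'perfect' checker: True iff `program` can never violate `policy`.
    Rice's theorem / the halting problem imply this cannot be written
    correctly for all programs."""
    raise NotImplementedError("no such general checker can exist")

def forbidden_action():
    pass  # stands in for any policy-violating behavior

def would_halt(program, program_input):
    """If `complies` worked, it would also decide the halting problem."""
    def wrapper():
        program(program_input)   # loops forever iff `program` never halts
        forbidden_action()       # reached only if `program` halts
    # wrapper violates the policy exactly when `program` halts, so a
    # perfect compliance verdict here would decide halting -- a contradiction.
    return not complies(wrapper, policy="never call forbidden_action")
```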

Most current governance approaches rely on precisely this type of prediction. Content filters attempt to identify harmful outputs, reinforcement learning methods adjust model behavior based on human feedback, and monitoring systems track actions for compliance. All of these approaches depend on evaluating the meaning or intent of AI behavior, which falls into the category of undecidable problems.

Consequently, these methods can only provide approximations. They inevitably produce false negatives, allowing harmful actions to slip through, or false positives, blocking legitimate behavior. Improving these systems does not eliminate the problem but only shifts the balance between these errors.

The study further explains that this limitation applies even to advanced techniques such as Constitutional AI and self-monitoring systems. Regardless of how sophisticated the model becomes, it cannot escape the underlying mathematical constraint. This finding challenges the prevailing assumption that better algorithms or more data can solve AI governance issues. Instead, the study suggests that the problem lies in the approach itself, not its implementation.

Why content filters, monitoring, and AI alignment tools fall short

The research provides a detailed critique of the most widely used AI governance strategies, highlighting their structural weaknesses. Content filtering systems, which analyze outputs to block harmful content, are limited by the complexity of natural language and the unpredictability of AI-generated responses. The study notes that no finite rule set can capture all possible violations, making these systems inherently incomplete.

Reinforcement learning from human feedback, a cornerstone of modern AI alignment, is described as a probabilistic method that influences model behavior without enforcing strict constraints. While it can reduce the likelihood of harmful outputs, it cannot guarantee their absence.

Constitutional AI, which introduces self-regulation mechanisms within models, faces similar limitations. The study argues that self-critique is itself a form of behavioral evaluation and therefore subject to the same undecidability constraints.

Monitoring systems, often used as a secondary layer of defense, are also criticized for their inability to prevent harmful actions. Since they operate as separate systems, they can only observe behavior after it occurs or rely on incomplete data streams. The study emphasizes that monitoring coverage gaps compound rapidly as AI systems perform multiple actions, leading to a high probability of ungoverned behavior over time.
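The compounding effect can be made concrete with a small probability calculation. The 99% per-action coverage figure below is an assumed number chosen for illustration, not a value from the study.

```python
# Illustrative arithmetic: even high per-action monitoring coverage decays
# quickly over long action sequences. The coverage value is assumed.

per_action_coverage = 0.99   # probability a single action is observed and checked

for n_actions in (10, 100, 1000):
    all_covered = per_action_coverage ** n_actions
    print(f"{n_actions:>5} actions: P(every action monitored) = {all_covered:.6f}")

# ~0.904 for 10 actions, ~0.366 for 100, ~0.000043 for 1000
```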

These findings suggest that current governance strategies are inherently reactive and fragmented. They address symptoms rather than the root cause of the problem, leaving systems vulnerable to both known and unforeseen risks.

A structural solution: integrating governance into AI architecture

In response to these challenges, the study proposes a fundamentally different approach known as structural governance. Instead of attempting to control AI behavior through external layers, this model embeds governance directly into the architecture of the system.

The key principle is the separation of computation from action. In this design, AI systems are allowed to compute decisions and generate plans, but they cannot directly execute actions. All actions must pass through a centralized governance boundary that evaluates and authorizes them before execution.
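In software terms this resembles a mediator or gateway pattern: the model proposes structured action requests, and a single enforcement point authorizes or rejects them before anything touches the outside world. The sketch below is a minimal interpretation of that idea with hypothetical names; it is not code from the paper.

```python
# Minimal sketch: the model proposes, the governance boundary disposes.
# Class, action, and parameter names are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class ActionRequest:
    """Structured description of an action the model wants to take."""
    name: str
    params: dict

ALLOWED_ACTIONS = {"send_email", "query_db"}   # the explicit governance boundary

def governance_gate(request: ActionRequest) -> bool:
    """Single enforcement point: every action must pass through here."""
    return request.name in ALLOWED_ACTIONS     # syntactic allow-list check

def execute(request: ActionRequest):
    if not governance_gate(request):
        raise PermissionError(f"action {request.name!r} is not authorized")
    print(f"executing {request.name} with {request.params}")

# The model may plan anything, but can only *propose* requests:
execute(ActionRequest("send_email", {"to": "ops@example.com"}))        # authorized
try:
    execute(ActionRequest("call_payment_api", {"amount": 10_000}))     # not allow-listed
except PermissionError as err:
    print(err)
```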

This architectural change transforms governance from a reactive process into a built-in mechanism. Every possible action is subject to the same checks, eliminating ungoverned pathways by design. The study refers to this alignment between capability and control as "coterminous governance," where the expressiveness and governance boundaries become identical.

Under this model, governance decisions are based on structured data rather than interpreting arbitrary program behavior. This shift moves the problem from the realm of undecidable semantic analysis to decidable syntactic validation, avoiding the limitations imposed by Rice's theorem.
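Checking a structured request against a fixed schema, unlike predicting what arbitrary code will do, always terminates with a definite answer. A minimal sketch of such a decidable check, reusing the same hypothetical request format as above:

```python
# Decidable, syntactic validation: the check inspects a finite, structured
# payload rather than reasoning about arbitrary program behavior.

EMAIL_SCHEMA = {"to": str, "subject": str}   # hypothetical schema for one action type

def validate_params(params: dict, schema: dict) -> bool:
    """Terminates on every input: finitely many keys, finitely many type checks."""
    return set(params) == set(schema) and all(
        isinstance(value, schema[key]) for key, value in params.items()
    )

print(validate_params({"to": "ops@example.com", "subject": "report"}, EMAIL_SCHEMA))  # True
print(validate_params({"to": "ops@example.com", "cmd": "rm -rf /"}, EMAIL_SCHEMA))    # False
```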

The study highlights several advantages of this approach. Governance becomes total, covering all actions without exception. System behavior becomes more transparent, as every action is recorded and validated through the same process. Additionally, governance scales more efficiently, as it is integrated into the execution pipeline rather than added as separate layers.

The research also notes that structural governance does not eliminate the need for behavioral methods entirely. Techniques such as content filtering and alignment can still play a role in shaping AI outputs, but they operate within a system that guarantees control over real-world actions.

Implications for AI policy, industry, and future system design

The study suggests that current regulatory approaches, which often focus on improving oversight mechanisms, may be insufficient if they do not address the underlying architecture of AI systems.

For industry, the findings call for a shift in system design. Building AI systems with structural governance requires rethinking how computation and action are integrated, potentially increasing development complexity but offering stronger guarantees of safety.

The research also highlights the distinction between policy enforcement and policy design. While structural governance can ensure that rules are followed, it does not determine whether those rules are appropriate. This underscores the importance of human judgment and broader societal input in defining AI policies.

Despite its strong claims, the study acknowledges practical challenges. Implementing structural governance may require significant changes to existing systems and development practices. It may also introduce trade-offs in flexibility and ease of use, particularly during early stages of development. However, the study argues that these costs must be weighed against the risks of continuing with current approaches.

First published in: Devdiscourse