Internal AI systems pose growing risks outside the scope of current laws

CO-EDP, VisionRI | Updated: 16-01-2026 18:03 IST | Created: 16-01-2026 18:03 IST

New evidence suggests that the most powerful and risky uses of advanced AI are happening out of sight, inside the companies that build them. A new academic study published on arXiv warns that this blind spot could undermine the core goals of frontier AI regulation.

In a study titled Internal Deployment Gaps in AI Regulation, MIT CSAIL researchers Joe Kwon and Stephen Casper examine how current AI laws handle internal deployment of advanced models. Their analysis covers the European Union’s AI Act and General-Purpose AI Code of Practice, California’s SB 53, New York’s RAISE Act, and the proposed US federal Artificial Intelligence Risk Evaluation Act. Across jurisdictions, the authors find a shared structural failure: regulations are built around external deployment, while internally deployed AI systems with high capability and high risk often fall through the cracks.

Why internal AI deployment matters more than regulators assume

Internal deployment refers to AI systems that are fully operational and producing value but are used exclusively within the organization that developed them. This distinguishes them from both experimental research tools and externally deployed products. According to the study, this category of use is expanding rapidly, driven by strong economic incentives and competitive pressure.

Internally deployed AI systems differ from public-facing systems in three critical ways: configuration, access, and application. Companies often run internal versions of advanced models with reduced safety constraints, allowing them to comply with a wider range of instructions. These models may have direct access to proprietary codebases, internal databases, and production infrastructure. In some cases, they can execute code, initiate training runs, or coordinate with other AI systems in multi-agent workflows.
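
To make the contrast concrete, the sketch below mocks up two hypothetical deployment profiles in Python. The field names, values, and profile names are illustrative assumptions for this article, not configurations described in the study.

```python
from dataclasses import dataclass, field

@dataclass
class DeploymentProfile:
    """Hypothetical profile contrasting an external and an internal configuration."""
    name: str
    refusal_policy: str                          # how strictly the model declines risky instructions
    tool_access: list[str] = field(default_factory=list)
    can_execute_code: bool = False
    can_launch_training_runs: bool = False
    multi_agent: bool = False

# External, public-facing configuration: hardened defaults, narrow tool access.
external = DeploymentProfile(
    name="public-api",
    refusal_policy="strict",
    tool_access=["web_search"],
)

# Internal configuration of the kind the study describes: relaxed constraints,
# direct access to proprietary systems, and agentic capabilities.
internal = DeploymentProfile(
    name="internal-research-agent",
    refusal_policy="relaxed",
    tool_access=["proprietary_codebase", "internal_databases", "production_infra"],
    can_execute_code=True,
    can_launch_training_runs=True,
    multi_agent=True,
)

if __name__ == "__main__":
    for profile in (external, internal):
        print(profile)
```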

These capabilities enable high-stakes uses that would raise immediate concerns if exposed externally. Internal AI systems can analyze detailed behavioral data, automate parts of scientific research, design new AI models, or optimize engagement algorithms using sensitive information. Because these systems never reach customers or the public, they generate no market signals, user complaints, or media scrutiny. Oversight depends almost entirely on voluntary disclosure by companies.

The researchers argue that this lack of visibility creates a governance problem. External deployment triggers regulatory attention through licensing, audits, reporting requirements, and public scrutiny. Internal deployment, by contrast, allows companies to operate highly capable systems without producing any external signals. Yet these systems shape the pace, direction, and risk profile of future AI releases. Decisions made behind corporate walls today determine what AI systems the public encounters tomorrow.

Internal deployment is not a lower-risk phase of AI development. In many cases, it is where risks are highest. Systems operate with broad permissions, minimal safeguards, and deep integration into complex organizational processes. Traditional oversight tools designed for discrete products struggle to apply when AI systems evolve continuously through internal use.

Three regulatory gaps leaving internal AI largely unchecked

After reviewing major AI regulatory frameworks enacted or proposed in 2025, the authors identify three structural gaps that consistently limit oversight of internal AI deployment.

The first gap is scope ambiguity. Many regulations rely on a clear distinction between research and deployment, exempting research activities from compliance obligations. In practice, internal AI use often blurs this boundary. A system used to automate research, optimize operations, or analyze data is simultaneously performing operational work and contributing to ongoing development. Regulations frequently fail to specify when internal use crosses the threshold into regulated deployment, allowing companies to classify high-stakes internal operations as exempt research.

This ambiguity appears across jurisdictions. In the European Union, the AI Act exempts research and development activities while also defining deployment to include internal use. The lack of clear boundaries allows companies to interpret obligations narrowly. In the United States, federal proposals tie enforcement primarily to external deployment, leaving internal-only systems without a clear enforcement mechanism. Even state-level laws that explicitly mention internal use rely on undefined terms that leave compliance thresholds uncertain.

The second gap is static compliance criteria. Most AI regulations are built around point-in-time assessments, periodic reporting, or pre-deployment gates. These approaches assume stable products with discrete release cycles. Internal AI systems do not fit this model. They can be fine-tuned daily, integrated with new tools, forked into multiple variants, or embedded into evolving workflows without any clear moment that triggers reassessment.

The study finds that even frameworks attempting to address continuous evolution rely heavily on self-assessment. Companies decide when changes are significant enough to warrant new disclosures or evaluations. Incremental updates that individually appear minor can collectively transform a system’s capabilities without ever crossing a self-defined threshold. Fixed reporting intervals, such as quarterly summaries, provide snapshots but struggle to capture rapidly evolving systems optimized for constant iteration.
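
A toy calculation illustrates the point: if each update is judged only against a per-update significance threshold, a long run of individually "minor" changes never triggers reassessment even though the cumulative shift is large. The threshold, baseline score, and update sizes below are invented for illustration.

```python
# Toy illustration (invented numbers): per-update self-assessment vs. cumulative change.
PER_UPDATE_THRESHOLD = 0.05   # a self-defined "significant change" bar on some capability score

capability = 1.00             # arbitrary baseline capability score
updates = [0.03] * 20         # twenty incremental fine-tunes, each below the threshold

reassessments_triggered = 0
for delta in updates:
    if delta >= PER_UPDATE_THRESHOLD:
        reassessments_triggered += 1
    capability += delta

print(f"Reassessments triggered: {reassessments_triggered}")      # 0
print(f"Cumulative capability change: {capability - 1.00:+.2f}")  # +0.60, a 60% shift with no review
```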

The third gap is information asymmetry. Regulators lack independent means to determine which internal AI systems exist, how capable they are, or how they are being used. Unlike public-facing systems, internal deployments produce no observable signals. This creates a circular enforcement problem. Regulators need information to establish jurisdiction, but they often lack the authority to compel disclosure until jurisdiction is established.

Some laws attempt to address this through incident reporting or confidential summaries. However, these mechanisms capture failures rather than routine operations and rely on company-provided characterizations rather than verifiable data. Registration requirements help identify developers but not the specific systems they operate internally. Authorities may have the power to request detailed information, but only if they already know what to request and from whom.

According to the authors, this structural imbalance leaves regulators dependent on whistleblowers, accidents, or eventual public release to gain insight into internal AI systems. By the time such signals emerge, the systems may already have shaped external AI development in irreversible ways.

Why existing AI laws struggle and what comes next

The persistence of these gaps, the study argues, reflects deeper tensions in how AI governance has been designed. Traditional product regulation assumes clear stages of development, testing, and deployment, with risks emerging primarily at market entry. Internal AI deployment challenges these assumptions. Systems can perform consequential work while simultaneously serving as tools for further development, making categorical distinctions difficult to enforce.

The authors also point to legitimate confidentiality concerns. Internal AI systems often embody trade secrets, proprietary data, and strategic capabilities. Companies resist broad disclosure out of fear of leaks, competitive harm, or security risks. Yet other high-risk sectors, such as pharmaceuticals and nuclear energy, have developed oversight mechanisms that balance confidentiality with verification. The absence of similar structures in AI regulation is a policy choice rather than an inevitability.

Economic incentives further complicate oversight. Internal AI systems that automate scarce expertise or accelerate research provide compounding competitive advantages. Companies face strong pressure to deploy quickly, even if doing so increases risk. Security measures that would protect internal systems from misuse or theft are costly and slow deployment. In highly competitive and geopolitically sensitive contexts, speed often wins.

Rather than prescribing a single solution, the study maps potential policy approaches and their tradeoffs. One option is to define regulatory scope based on what systems do rather than how companies label them, focusing on functional characteristics such as access to sensitive infrastructure or being relied on for consequential decisions. Another approach is to trigger oversight based on capability indicators or use cases rather than fixed timelines. Ongoing oversight mechanisms, such as continuous logging or regulator access, could replace point-in-time assessments for systems that evolve through use.
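
As a minimal sketch of the first option, the check below flags a system for oversight based on what it can do rather than how it is labeled. The specific characteristics and the decision rule are assumptions for illustration, not criteria proposed in the paper.

```python
# Hypothetical functional-scope check: oversight triggers on what a system does,
# not on whether the company classifies it as "research" or "deployment".
SENSITIVE_CHARACTERISTICS = {
    "access_to_production_infrastructure",
    "access_to_sensitive_personal_data",
    "relied_on_for_consequential_decisions",
    "can_initiate_training_runs",
}

def requires_oversight(system_characteristics: set[str], label: str) -> bool:
    """Return True if the system exhibits any sensitive functional characteristic.
    The company's own label is deliberately ignored."""
    return bool(system_characteristics & SENSITIVE_CHARACTERISTICS)

# An internally labeled "research tool" that touches production systems still triggers oversight.
print(requires_oversight({"access_to_production_infrastructure", "code_execution"}, label="research"))  # True
```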

The authors also call for more robust reporting frameworks that include verifiable data while protecting confidentiality. Incentives for voluntary transparency, combined with credible penalties for non-compliance, could improve situational awareness without stifling innovation. Whistleblower protections emerge as a critical component, given the lack of external signals from internal deployment.

FIRST PUBLISHED IN: Devdiscourse