Human-integrated AI development is faster and safer than full automation

CO-EDP, VisionRI | Updated: 09-12-2025 22:01 IST | Created: 09-12-2025 22:01 IST

A new position paper from FAIR at Meta, posted to arXiv, warns that the longstanding pursuit of fully autonomous, self-improving artificial intelligence may be moving faster than society can safely manage. The study argues for a strategic shift toward collaborative systems that improve alongside humans rather than without them.

The study, “AI & Human Co-Improvement for Safer Co-Superintelligence,” presents a comprehensive argument that the field’s fixation on autonomous self-improvement overlooks both safer and more effective alternatives. It proposes a structured framework in which human researchers and AI systems jointly advance AI capabilities, scientific work, safety practices and domain expertise. The authors contend that this partnership model not only accelerates progress but also creates more opportunities for oversight, steering and value alignment at every step of the research pipeline.

Their position questions the prevailing assumption that eliminating human involvement is the fastest route to superintelligence. Instead, they argue that embedding humans throughout the improvement cycle offers a stronger guarantee that advanced AI aligns with human needs, reduces systemic risks and supports the development of broad societal benefits.

A turning point in AI research strategy

The paper outlines how the field has been progressing toward self-improving systems for decades. Early forms of self-improvement focused on adjusting model parameters through standard learning algorithms. As computational scaling advanced, systems began generating their own data, constructing synthetic training tasks, and reinforcing their own reasoning patterns. More recent approaches involve reinforcement learning based on self-evaluations and autonomous feedback mechanisms that allow models to refine their behavior without direct human judgment.
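To make that loop concrete, the sketch below shows one minimal form of self-improvement in the spirit the article describes: a model generates candidate answers to its own tasks, scores them with its own judgment, and is updated on the examples it rated highest, with no human in the cycle. The `model` interface (`generate`, `score`, `finetune`) is a hypothetical placeholder, not any system from the paper.

```python
# Illustrative sketch of an autonomous self-improvement loop (not the paper's code).
# The model builds its own training data, evaluates its own outputs, and is
# fine-tuned on the examples it rated most highly -- no human in the loop.

def self_improvement_loop(model, tasks, rounds=3, keep_top=0.2):
    for _ in range(rounds):
        # 1. The model constructs its own candidate data.
        candidates = [(task, model.generate(task)) for task in tasks]

        # 2. Self-evaluation acts as the reward signal.
        scored = [(task, out, model.score(task, out)) for task, out in candidates]

        # 3. Keep only the self-rated best examples.
        scored.sort(key=lambda item: item[2], reverse=True)
        selected = scored[: max(1, int(len(scored) * keep_top))]

        # 4. Update the model on its own filtered outputs.
        model = model.finetune([(task, out) for task, out, _ in selected])
    return model
```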

The authors examine how current research is expanding beyond parameter optimization to include automated data generation, internal evaluation, self-play, and even early experiments in code modification. Such developments hint at a future in which AI systems can redesign their own architecture, update their core logic and recursively improve without human direction. They note that research groups are already working toward autonomous agents capable of performing end-to-end scientific research with minimal human guidance.

While this trend signals rapid progress, the study stresses that handing full control of the improvement cycle to AI systems presents serious alignment challenges. Autonomous systems may optimize for objectives that do not reflect human priorities, introduce opaque changes to their architecture, or generate research pathways humans have difficulty monitoring. The paper states that such autonomy reduces the ability to intervene during critical stages of improvement, which heightens risk as capabilities escalate.

The authors argue that the field is not yet prepared to manage systems capable of fully rewriting themselves or accelerating their own research loops. They note that misalignment, reward hacking, interpretability limitations and insufficient oversight mechanisms make the current landscape ill-suited for self-guided AI evolution.

Their alternative proposal envisions humans retaining active involvement as AI grows more capable, allowing decisions about methods, goals and values to remain anchored in human judgment while leveraging AI’s growing analytical and generative strengths.

Co-improvement as a safer and faster pathway

The key proposal, termed co-improvement, reframes the relationship between humans and AI. Rather than designing systems that independently iterate on themselves, co-improvement focuses on AI systems intentionally built to operate as collaborative research partners. The study identifies major research activities where such collaboration can occur, including problem identification, benchmark creation, idea generation, experiment design, execution, evaluation, error analysis, safety development, systems architecture, and integration into applied contexts.

Under this model, AI would support researchers throughout the entire cycle, from defining research questions to analyzing outcomes and refining next steps. The authors argue that this collaborative structure enhances transparency because humans maintain visibility into how the AI proposes changes and why. It also expands opportunities to guide research toward beneficial directions and intervene when unsafe patterns emerge.
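A minimal sketch of how such a cycle might differ from the autonomous loop above: every AI-generated proposal passes an explicit human review gate before anything is executed, and the human chooses the next direction after joint analysis. The `assistant`, `researcher` and `run_experiment` interfaces are assumed placeholders for illustration, not the paper's framework.

```python
# Illustrative sketch of a co-improvement cycle with human checkpoints.
# Nothing runs without human sign-off, and humans retain the final say
# over which research direction is pursued next. All interfaces are placeholders.

def co_improvement_cycle(assistant, researcher, run_experiment, research_question):
    # The AI drafts candidate experiments for the current question.
    proposals = assistant.propose_experiments(research_question)

    for proposal in proposals:
        decision = researcher.review(proposal)      # approve, revise, or reject
        if decision.action == "reject":
            continue
        if decision.action == "revise":
            proposal = assistant.revise(proposal, decision.feedback)

        results = run_experiment(proposal)          # executed only after sign-off

        # Outcomes are analyzed jointly; the human selects the next direction.
        analysis = assistant.analyze(results)
        research_question = researcher.select_next_direction(analysis)

    return research_question
```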

The study highlights several major advantages of co-improvement:

First, it increases the probability of discovering paradigm-shifting technical advances. Human researchers often spend significant time testing ideas that fail or require repeated refinement. By pairing human creativity with AI’s capacity for large-scale analysis, rapid iteration and cross-domain synthesis, the research cycle becomes more efficient. The authors suggest that this hybrid approach is better suited to uncovering the next major methodological breakthroughs that the field is currently seeking.

Second, co-improvement supports more reliable safety practices. Stronger models frequently reveal weaknesses in earlier safety assumptions, and collaborative systems can help identify and analyze alignment failures. The authors point to risks such as jailbreaking and unintended reasoning patterns, which can arise when models lack sufficient understanding of the context in which they operate. A co-improving system would allow AI to assist in diagnosing its own vulnerabilities while humans assess whether the proposed corrections are adequate.
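As a rough illustration of that division of labor, the sketch below has a model propose adversarial prompts against itself and flag suspected failures, while a human reviewer confirms each one before it is treated as a real vulnerability. The `model` and `reviewer` interfaces are assumptions made for illustration and do not come from the paper.

```python
# Illustrative sketch of AI-assisted red-teaming under human oversight.
# The model probes itself and self-diagnoses suspected failures; a human
# reviewer confirms each finding before it counts. Interfaces are placeholders.

def assisted_vulnerability_scan(model, reviewer, seed_behaviors, probes_per_behavior=20):
    confirmed_failures = []
    for behavior in seed_behaviors:
        # The model brainstorms prompts that might elicit the unsafe behavior.
        probes = model.generate_adversarial_prompts(behavior, n=probes_per_behavior)
        for probe in probes:
            response = model.respond(probe)
            # Self-diagnosis: did the model's own output slip?
            if model.judge_unsafe(behavior, probe, response):
                # Human assessment decides whether this is a genuine failure.
                if reviewer.confirm(behavior, probe, response):
                    confirmed_failures.append((behavior, probe, response))
    return confirmed_failures
```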

Third, the framework enables broader societal applications. Once collaborative AI systems become proficient in supporting AI research, similar techniques can be extended to scientific discovery, public policy, collective decision-making and other domains requiring integrated reasoning. The authors argue that this expands AI’s role beyond automation toward empowering humanity to solve complex global problems.

On the other hand, an autonomous self-improving system would likely limit human participation in fields where oversight is crucial. The paper warns that a future in which AI systems independently design experiments in medicine, materials science or engineering reduces opportunities for human supervision and increases the stakes of misaligned system behavior.

The goals of co-improvement extend far beyond immediate technical outcomes. The vision incorporates a broader social objective: ensuring that humans remain empowered participants in economic, scientific and societal decision-making even as AI systems grow more capable.

Co-superintelligence and the broader impact on humanity

The study introduces the concept of co-superintelligence, an outcome in which both humans and AI improve each other’s capabilities through sustained collaboration. In this view, superintelligence is not a unilateral achievement of AI systems but a shared elevation of human and machine reasoning. Humans maintain strategic, ethical and value-driven authority, while AI expands the scope of what humans can understand and achieve.

The authors frame this as a more stable and beneficial future than one dominated by autonomous self-improving systems. As AI gains new abilities, humans would gain new tools, perspectives and cognitive enhancements that support societal progress. The partnership ensures that AI development continuously reflects human-led evaluation, intentionality and responsibility.

The paper also addresses the potential for societal harms. As model capabilities expand, the surface area for risk grows. Failures in reasoning, safety procedures or objective specification can cause real-world harm when AI systems influence sensitive domains. Co-improvement provides a pathway in which humans and AI jointly identify these risks during research cycles and collaboratively design methods to mitigate them.

The study compares this model with visions that sideline human influence. Some existing positions in the AI community imagine futures where AI systems dominate discovery processes, dictate best practices, or eventually expand far beyond human relevance. The authors argue that such visions reduce human agency and fail to acknowledge the importance of maintaining human participation in shaping future societies.

Their framework aims to prevent scenarios in which oversight becomes impossible or misaligned autonomous systems override human interests. By embedding humans in all loops of improvement, the field can better ensure that advanced AI supports rather than replaces human decision-making.

Reassessing openness, scientific practice and future research

The paper calls for renewed commitment to scientific openness. As the field becomes more cautious about sharing model capabilities due to safety concerns, some organizations have restricted the release of methods and datasets. The authors recognize the need for managed openness but caution against withholding information for competitive reasons. They argue that progress in both AI research and safety depends on reproducible, transparent scientific practices.

The co-improvement model aligns naturally with the scientific method. Collaboration between humans and AI could accelerate the creation of shared benchmarks, improved evaluation methods, and deeper theoretical understanding. This stands in contrast to closed or fully automated systems, which could limit access to essential knowledge and restrict the ability of the research community to identify risks.

The authors note that co-improvement places new demands on the research agenda. The field must develop benchmarks for evaluating AI’s collaboration skills, construct training data tailored to research workflows, and explore methods for strengthening joint problem-solving. They state that research focused on collaborative AI capabilities is still in its early stages and requires substantial investment.
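What such a collaboration benchmark item might record is sketched below as a simple data structure: the research task, the human-AI exchange, and rubric-style scores for collaboration behaviors. The schema and field names are hypothetical; the paper does not prescribe a format.

```python
# Hypothetical schema for one item in a collaboration benchmark.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CollaborationTurn:
    speaker: str   # "human" or "ai"
    content: str   # proposal, critique, revision, or result summary

@dataclass
class CollaborationBenchmarkItem:
    task: str      # e.g. "design an ablation study for method X"
    transcript: List[CollaborationTurn] = field(default_factory=list)
    # Rubric dimensions a human grader (or judge model) might score from 0 to 1,
    # e.g. {"incorporates_feedback": 0.8, "flags_uncertainty": 0.6,
    #       "proposes_testable_next_step": 0.9}
    scores: Dict[str, float] = field(default_factory=dict)
```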

They also argue that this research should begin immediately rather than waiting until AI systems become so powerful that oversight becomes more challenging. Early development of collaborative abilities ensures that future advanced systems already incorporate human-centered design principles.

First published in: Devdiscourse