New AI model mimics human thinking across domains, outperforms cognitive theories

Researchers have developed a new AI model called Centaur that can predict human decisions across dozens of psychological tasks, a breakthrough that brings the field closer to a long-standing goal in psychology.
In a study published in Nature titled “A foundation model to predict and capture human cognition,” researchers introduced Centaur, a foundation model fine-tuned on large-scale human behavioral data. Built on Meta AI’s Llama 3.1 70B large language model, the new model sets a new standard in predicting human decision-making across diverse experimental domains.
Developed using data from more than 10 million human choices in 160 psychological experiments involving over 60,000 participants, Centaur not only outperformed domain-specific cognitive models but also demonstrated robust generalization across novel scenarios.
Can a single model predict human behavior across domains?
The study aimed to build a model that can predict human decisions in any experimental setting expressed in natural language. For this, researchers curated a massive dataset called Psych-101, encompassing trial-by-trial data from experiments ranging from memory tests and multi-armed bandits to complex decision-making and learning tasks.
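To make this concrete, here is a minimal sketch of how trial-by-trial data from one such experiment (a two-armed bandit) could be rendered as a natural-language transcript a language model can be trained on. The function name, field names, and exact phrasing are illustrative assumptions, not the actual Psych-101 schema:

```python
# Hypothetical sketch: transcribing trial-by-trial bandit data into natural
# language, in the spirit of Psych-101. Field names and phrasing are
# illustrative, not the dataset's actual format.

def transcribe_bandit_session(trials):
    """Render a two-armed bandit session as a natural-language transcript.

    Each trial is a dict holding the participant's choice ('J' or 'F') and
    the reward observed; the choice tokens are what a model like Centaur
    would be trained to predict.
    """
    lines = ["You repeatedly choose between two slot machines, J and F."]
    for t in trials:
        lines.append(f"You press <<{t['choice']}>> and win {t['reward']} points.")
    return "\n".join(lines)

session = [
    {"choice": "J", "reward": 5},
    {"choice": "F", "reward": 0},
    {"choice": "J", "reward": 7},
]
transcript = transcribe_bandit_session(session)
```

Because every experiment is expressed as text in this way, memory tests, bandits, and learning tasks can all be fed to a single model in one common format.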
Centaur was trained using quantized low-rank adaptation (QLoRA), a parameter-efficient fine-tuning method that adjusts only a small fraction of the model’s parameters. Despite modifying just 0.15% of Llama’s weights, Centaur demonstrated remarkable gains in accuracy.
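A back-of-the-envelope calculation shows why low-rank adaptation touches so few weights: for each frozen weight matrix of size d_out x d_in, LoRA trains only two small matrices of rank r, adding r*(d_in + d_out) parameters instead of d_out*d_in. The layer dimensions and rank below are illustrative assumptions, not Llama 3.1 70B's actual shapes:

```python
# Back-of-the-envelope sketch of LoRA's trainable-parameter fraction.
# For each frozen weight matrix W (d_out x d_in), LoRA trains A (r x d_in)
# and B (d_out x r), i.e. r*(d_in + d_out) parameters instead of
# d_out * d_in. Dimensions below are toy values for illustration.

def lora_fraction(layers, rank):
    """Fraction of trainable parameters when rank-`rank` LoRA adapters
    are attached to each (d_out, d_in) weight matrix in `layers`."""
    base = sum(d_out * d_in for d_out, d_in in layers)
    adapter = sum(rank * (d_out + d_in) for d_out, d_in in layers)
    return adapter / base

# Toy model: 80 blocks, each with one square 8192 x 8192 projection.
toy_layers = [(8192, 8192)] * 80
frac = lora_fraction(toy_layers, rank=8)  # a fraction well under 1%
```

With these toy numbers the trainable fraction comes out around 0.2%, the same order of magnitude as the 0.15% reported for Centaur.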
In rigorous tests, Centaur consistently outperformed traditional cognitive models, including Prospect Theory and reinforcement learning frameworks. On held-out participant data, it yielded significantly lower negative log-likelihoods, indicating tighter alignment with actual human behavior. In nearly every test across the 160 experiments, Centaur emerged as the superior predictor.
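The negative log-likelihood metric used in these comparisons can be sketched in a few lines: a model assigns a probability to each option on every trial, and its score is the average negative log of the probability it gave to what the participant actually did (lower is better). The data here are made up purely for illustration:

```python
# Minimal sketch of the evaluation metric: mean negative log-likelihood
# of held-out human choices under a model's predicted probabilities.
# Lower NLL = the model puts more probability on what people actually did.
import math

def mean_nll(probs, choices):
    """probs: per-trial probability distributions over options;
    choices: index of the option the participant actually chose."""
    return -sum(math.log(p[c]) for p, c in zip(probs, choices)) / len(choices)

human_choices = [0, 1, 0, 0]
model_a = [[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.7, 0.3]]  # sharper fit
model_b = [[0.5, 0.5]] * 4                                  # chance baseline
nll_a = mean_nll(model_a, human_choices)
nll_b = mean_nll(model_b, human_choices)  # higher (worse) than nll_a
```

A model at chance scores log(2) ≈ 0.69 per binary choice, so any model that concentrates probability on participants' actual responses, as Centaur did, drives the NLL below that baseline.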
Importantly, Centaur didn’t just match averages; it reproduced the full distribution of participant trajectories. In tasks such as the two-step paradigm, it reflected not only model-free and model-based strategies but also their combinations, just as observed in real human populations.
How well does Centaur generalize to unseen scenarios?
One of the most critical benchmarks for a foundation model is its ability to generalize beyond its training data. Centaur met this challenge head-on. Researchers tested Centaur in three progressively difficult out-of-distribution conditions: modified cover stories, altered task structures, and entirely new domains. In each case, it retained predictive superiority:
- In a reframed version of the two-step decision task, where the original story of “spaceships” was replaced with “magic carpets,” Centaur maintained high accuracy, even though the new narrative was absent from training.
- In a structurally modified experiment called Maggie’s Farm, which introduced a third choice option in a multi-armed bandit setting, Centaur again outperformed both the base model and cognitive models, showing resilience to changes in task complexity.
- In a logical reasoning test based on LSAT-style items, a domain entirely excluded from its training set, Centaur achieved strong predictive performance, confirming its ability to generalize far beyond familiar patterns.
Further evaluations on moral decision-making, economic games, and naturalistic learning tasks reinforced this finding. Across six additional out-of-distribution settings, Centaur retained robust performance while smaller or non-fine-tuned models faltered.
Can this model also reveal how the human brain thinks?
Centaur’s alignment with human behavior is not only statistical but also neurological. The researchers conducted a novel set of tests to examine how well Centaur’s internal representations correlate with brain activity.
In two neuroimaging studies, participants completed decision-making and sentence-reading tasks while undergoing fMRI scans. Centaur’s hidden layer activations, without being explicitly trained on neural data, showed significantly stronger correlations with brain activity than the base Llama model’s. This included accurate decoding of activation in brain regions such as the left motor cortex, nucleus accumbens, and medial prefrontal cortex, areas known for decision-making and reward processing.
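The basic logic of such an alignment test can be sketched as a correlation between a model's per-trial activations and a measured brain signal. Everything below (the toy data, the single-feature setup) is an illustrative assumption; the study worked with full fMRI voxel patterns and more sophisticated regression analyses:

```python
# Hedged sketch of representational alignment: correlate a model's hidden
# activations with a brain signal across trials. The numbers are toy data;
# the actual study used full fMRI voxel patterns, not one scalar per trial.

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

brain_signal = [0.1, 0.4, 0.9, 0.3, 0.7]    # toy per-trial BOLD amplitude
finetuned_act = [0.2, 0.5, 0.8, 0.3, 0.6]   # tracks the signal closely
base_act = [0.9, 0.1, 0.2, 0.8, 0.1]        # weak relationship
r_finetuned = pearson(brain_signal, finetuned_act)
r_base = pearson(brain_signal, base_act)    # lower correlation
```

A higher correlation for the fine-tuned model's activations, as in this toy setup, is the pattern the researchers reported for Centaur versus base Llama.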
Centaur also conformed to behavioral regularities such as Hick’s Law, which links decision time to response entropy. By modeling nearly 4 million human response times, Centaur outperformed both Llama and cognitive baselines in predicting reaction speed, showing that it captures not just what people choose, but how long they take to decide.
The research extended further, using Centaur to guide scientific discovery. When paired with DeepSeek-R1, another AI system, Centaur facilitated the creation of a new model for multi-attribute decision-making that better matched human choices than traditional strategies. Even when DeepSeek-R1 proposed a novel heuristic, Centaur’s unmatched predictive power helped refine that model into a hybrid strategy, ultimately delivering a solution that balanced accuracy and interpretability.
Implications: Toward a Unified Theory of Mind
This breakthrough opens pathways for automated cognitive modeling, allowing scientists to simulate human behavior at scale without crafting problem-specific models. It can be used for in silico experiments, hypothesis generation, and personalized behavioral prediction.
The researchers also emphasize that Centaur could guide the next generation of neuroscience-informed AI, potentially helping to determine which architectural principles, such as attention mechanisms or vector-based memory, best capture the human mind.
Although the Psych-101 dataset currently focuses on decision-making and learning, plans are in place to expand into psycholinguistics, social cognition, and cross-cultural psychology, addressing existing limitations such as its bias toward WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations.
FIRST PUBLISHED IN: Devdiscourse