Can AI Measure Leadership? New Research Shows AI Teams Mirror Human Team Dynamics

A Harvard Kennedy School and NBER study finds that AI-based leadership tests strongly predict human leadership performance, with a high correlation of 0.81. The research suggests AI agents can reliably simulate team dynamics, offering a scalable and cost-effective way to assess soft skills.


CoE-EDP, VisionRI | Updated: 15-04-2025 10:19 IST | Created: 15-04-2025 10:19 IST

In a pioneering study from the Harvard Kennedy School’s Malcolm Wiener Center for Social Policy and the National Bureau of Economic Research (NBER), researchers Ben Weidmann, Yixian Xu, and David J. Deming present compelling evidence that artificial intelligence agents can effectively simulate human teammates in measuring leadership capabilities. Their experiment, published as an NBER Working Paper, investigates whether performance in AI-driven group tasks can predict how well someone leads human teams. The results reveal a striking correlation between AI-based and human-based assessments, suggesting that scalable, low-cost AI simulations could revolutionize how leadership skills are identified, evaluated, and developed.

AI Agents as Stand-Ins for Human Teams

At the core of the study is a cleverly designed lab experiment using a classic “Hidden Profile” team problem, where crucial information is distributed among team members and can only be synthesized through effective communication. Each of the 249 human participants led both AI and human teams through six problem-solving sessions, with the order of formats counterbalanced to neutralize practice effects. The AI teammates were GPT-4-based language models carefully prompted to behave as human followers, and the human followers were recruited and organized into randomly assigned teams. The tasks required probabilistic reasoning and collaborative decision-making, features that ensured leaders had to engage actively with team members to succeed.

Leaders were evaluated in two separate but structurally identical tracks: one with real humans and the other with AI agents. By comparing performance across these two tracks, the researchers aimed to see if AI-led team performance could serve as a valid proxy for human team leadership. Prior to the team tasks, all participants completed individual assessments measuring fluid intelligence, economic decision-making skill, emotional perceptiveness, typing speed, and performance on an individual version of the team task, together forming a baseline of “hard skills.”

A High-Fidelity Mirror for Leadership Skill

The central result of the study is striking: leadership performance with AI agents correlates very strongly with leadership performance in human groups. The disattenuated correlation between the two assessments (that is, the correlation corrected for measurement error in each test) is ρ = 0.81, meaning the AI test is a strong predictor of how well someone leads human teams. Even after controlling for task-specific skills, the correlation remained strong at ρ = 0.69. This suggests that the AI-based assessment does more than evaluate intelligence or typing speed; it captures critical “soft skills” such as information gathering, communication style, and strategic thinking.
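
The article does not say exactly how the disattenuated figure was computed. A standard way to correct a correlation for measurement error is Spearman’s correction for attenuation, which divides the observed correlation by the square root of the product of the two tests’ reliabilities. The simplest form of “controlling for” another variable is a first-order partial correlation. The Python sketch below implements both textbook formulas. The function names and every input number are hypothetical, not values from the study, and because the study controlled for several skills at once, a single-covariate formula is only a simplified illustration.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation (textbook formula).

    r_xy is the observed correlation between two tests; rel_x and rel_y are
    the tests' reliabilities (e.g. split-half or test-retest), each in (0, 1].
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, holding one covariate z fixed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical inputs, chosen only for illustration.
print(round(disattenuate(0.54, rel_x=0.81, rel_y=0.64), 2))  # 0.54 / (0.9 * 0.8)
print(round(partial_corr(0.50, 0.30, 0.40), 3))
```

Because reliabilities are themselves estimated with error, a corrected correlation can come out above 1, so disattenuated values are best read alongside their uncertainty.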

To validate these findings, the researchers used rigorous methods, including repeated random assignment of leaders to human teams to isolate their causal effect on group performance. They also created new problem sets for both AI and human tests, ensuring that no AI agent could have been exposed to the answers during training.

What Good Leaders Actually Do

The study doesn’t stop at measuring outcomes; it also examines what makes a good leader effective. Successful leaders in both AI and human groups tended to ask more questions, facilitate back-and-forth dialogue, and use inclusive language like “we” and “us.” Interestingly, the quantity of speech (measured in word count) was not associated with better performance. Instead, the structure and tone of communication mattered most.
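
The behaviors described above can be turned into simple transcript measures. The sketch below computes four illustrative features for one leader from a chat log: word count, number of questions (counted as question marks), the share of inclusive pronouns, and speaker switches as a rough proxy for back-and-forth dialogue. The feature definitions, the `INCLUSIVE` word list, and the `(speaker, text)` transcript format are all hypothetical. The article does not describe how the study coded its conversations.

```python
import re
from dataclasses import dataclass

# Hypothetical list of inclusive first-person plural forms.
INCLUSIVE = {"we", "us", "our", "ours", "let's"}


@dataclass
class CommFeatures:
    word_count: int          # total words written by the leader
    questions: int           # question marks in the leader's messages
    inclusive_share: float   # fraction of the leader's words found in INCLUSIVE
    turn_switches: int       # number of speaker changes across the whole chat


def leader_features(transcript: list[tuple[str, str]], leader: str) -> CommFeatures:
    words: list[str] = []
    questions = 0
    for speaker, text in transcript:
        if speaker != leader:
            continue
        questions += text.count("?")
        words.extend(re.findall(r"[a-z']+", text.lower()))
    inclusive = sum(w in INCLUSIVE for w in words)
    switches = sum(a != b for (a, _), (b, _) in zip(transcript, transcript[1:]))
    return CommFeatures(
        word_count=len(words),
        questions=questions,
        inclusive_share=inclusive / len(words) if words else 0.0,
        turn_switches=switches,
    )


chat = [
    ("L", "What do we know about option A?"),
    ("F1", "I saw a clue against A."),
    ("L", "Thanks. Let's check B too?"),
    ("F2", "B looks good."),
]
print(leader_features(chat, leader="L"))
```

Word count is kept as a feature because the study reports that speech quantity did not predict performance. In an analysis, each feature would be related to the team scores to see which ones matter.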

Notably, demographic traits such as gender, age, ethnicity, and educational background had no significant relationship with leadership performance. Instead, higher scores in fluid IQ, emotional perceptiveness, and decision-making skill were the most consistent predictors. Emotional perceptiveness, in particular, showed a stronger correlation with success in human teams than in AI teams, perhaps a sign that emotional dynamics still play a greater role in human-human interactions than in AI-mediated ones.

Overconfidence, Self-Awareness, and the Desire to Lead

The research also uncovered intriguing psychological dynamics. Overconfident individuals, those who overestimated their performance relative to others, were more likely to express a desire to lead, regardless of whether their teammates were AI or human. However, those who were more accurate in assessing their own abilities actually contributed more positively to team performance. This finding offers a valuable distinction between leadership ambition and leadership effectiveness and suggests that tools like the AI test could help organizations better align leadership roles with true capability rather than self-promotion.

The study further demonstrates that the AI test replicates established social science findings. For example, participants with inclusive communication styles tended to be more effective leaders, a pattern long recognized in studies of team cohesion and productivity. The fact that these dynamics emerged in AI settings as well reinforces the test’s robustness and its potential as a reliable research tool.

Scalable Testing, Democratized Access

The implications of this research are far-reaching. Traditional leadership assessments that rely on human interaction are expensive, logistically complex, and often reserved for elite or well-funded institutions. The human portion of this study, for instance, cost $114 per participant and required live oversight. In contrast, the AI-based version cost just $23 and ran autonomously, eliminating the need for scheduling and coordination. This dramatic drop in cost and complexity could democratize access to high-quality leadership evaluation, allowing more organizations, schools, nonprofits, startups, and researchers to assess and develop leadership skills at scale.
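
Using only the two per-participant figures reported above, the size of the saving works out as follows:

```latex
\frac{\$114 - \$23}{\$114} = \frac{\$91}{\$114} \approx 0.80,
\qquad
\frac{\$114}{\$23} \approx 4.96
```

In other words, the AI-based version cost roughly 80% less per participant, or about one-fifth as much as the human version.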

Despite some limitations, such as the AI agents’ reduced responsiveness to emotional cues and lack of strategic improvisation compared to a strong human teammate, the researchers view this study as a successful proof of concept. They call for future work to refine AI agent behavior to better mirror the diversity and nuance of real human interaction. Ultimately, they envision these tools not as replacements for human leadership but as accelerators for discovering and nurturing it.

In a world where soft skills like collaboration, decision-making, and emotional intelligence are becoming increasingly valuable in the labor market, this study lights a clear path forward. AI-based leadership assessments may soon play a central role in how talent is recognized, hired, and developed, one simulation at a time.

FIRST PUBLISHED IN: Devdiscourse