Generative AI works best as learning support, not a teacher replacement
Generative AI feedback can lift student achievement, but its classroom payoff depends heavily on how it is used, according to a new meta-analysis by Chinese researchers that reviewed experimental evidence from schools and universities.
Published in Education Sciences, the study, titled "Can Generative AI Feedback Effectively Enhance Learning Outcomes? A Meta-Analysis of 36 Experimental and Quasi-Experimental Studies," drew on work published from 2023 to 2025, covering 72 effect sizes, to assess whether GenAI feedback improves learning outcomes and which teaching conditions shape its effectiveness.
GenAI feedback delivers moderate academic gains
The findings show that GenAI feedback has a moderate positive effect on academic achievement, with an overall effect size of 0.61. The result places GenAI feedback in the category of a meaningful educational support tool, particularly when it is used to provide timely, personalized and actionable responses that help students revise, reflect and improve their work.
The study addresses a major question facing schools and universities as tools such as ChatGPT and other LLMs move into classrooms: whether AI feedback actually improves learning or simply makes tasks faster and easier. The answer, based on the meta-analysis, is conditional. GenAI feedback can enhance learning, but the gains are strongest when the technology functions as scaffolding rather than as a substitute for student thinking.
The authors reviewed studies involving K-12 and college students across disciplines, including language learning, STEM subjects and professional or applied fields. The analysis included studies that compared learning conditions with and without GenAI feedback and reported sufficient data to calculate academic performance effects.
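For readers unfamiliar with how a with-versus-without comparison becomes a single effect size, the sketch below computes a standardized mean difference with a small-sample correction (Hedges' g). The article does not say which metric the authors used, so Hedges' g is an assumption here, and the scores and group sizes are invented purely for illustration.

```python
import math


def hedges_g(mean_treat, sd_treat, n_treat, mean_ctrl, sd_ctrl, n_ctrl):
    """Return (g, variance of g) for a two-group comparison."""
    df = n_treat + n_ctrl - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    # Cohen's d: raw mean difference in pooled-SD units.
    d = (mean_treat - mean_ctrl) / pooled_sd
    # Small-sample correction factor that turns d into Hedges' g.
    correction = 1 - 3 / (4 * df - 1)
    g = correction * d
    # Common large-sample approximation for the variance of d.
    var_d = (n_treat + n_ctrl) / (n_treat * n_ctrl) + d ** 2 / (2 * (n_treat + n_ctrl))
    return g, correction ** 2 * var_d


# Hypothetical test scores, not taken from the study:
# a GenAI-feedback group versus a comparison group of 30 students each.
g, var_g = hedges_g(78.0, 10.0, 30, 72.0, 10.0, 30)
print(f"g = {g:.2f}, variance = {var_g:.3f}")
```

Each study contributes one or more such values, and the variance indicates how much weight a value should carry when results are pooled.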
The results suggest that GenAI feedback can help learners by offering immediate responses, personalized guidance, error correction, model answers, prompts for revision and support for self-regulated learning. In digital and large-enrollment settings, where teachers often cannot provide frequent individual feedback, GenAI can help close a long-standing gap in instructional support.
The paper also warns against treating AI feedback as an automatic solution. The studies showed high variation in outcomes, meaning some learners and contexts benefited much more than others. The authors link this variation to differences in learner readiness, motivation, prior knowledge, digital self-regulation and instructional design.
The risk is cognitive offloading. When students use GenAI to obtain easy answers without evaluating, revising or internalizing feedback, the technology can weaken deep thinking, reduce initiative and encourage dependence. When GenAI is used to guide reflection, support planning and help students improve their own work, it becomes more useful.
The meta-analysis found that the positive effect on cognitive outcomes, such as academic achievement and knowledge mastery, was robust. The pooled effect for cognitive outcomes was 0.60, indicating that GenAI feedback is most reliable as a tool for strengthening knowledge acquisition and observable academic performance.
For metacognitive outcomes, the estimated effect was much larger, at 1.43, but it was based on only five effect sizes and showed high variation. That means GenAI feedback may help students monitor, regulate and reflect on their learning, but the evidence remains too limited to treat that effect as settled.
For non-cognitive outcomes such as motivation, self-efficacy, engagement and satisfaction, the effect was smaller and not clearly significant. The pooled estimate was 0.29, with the confidence interval crossing zero. The finding suggests that GenAI feedback alone may not reliably improve students' motivation or emotional connection to learning.
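What it means for a confidence interval to cross zero can be shown in a few lines. The sketch below assumes a normal approximation and uses a standard error of 0.20 that is purely hypothetical; the article reports only the pooled estimate of 0.29, not the interval or its standard error.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval under a normal approximation."""
    margin = z * std_error
    return estimate - margin, estimate + margin


def crosses_zero(interval):
    """True when zero lies inside the interval, so 'no effect' cannot be ruled out."""
    low, high = interval
    return low <= 0.0 <= high


# 0.29 is the pooled estimate reported in the article.
# The standard error of 0.20 is an illustrative assumption, not a reported value.
ci = confidence_interval(0.29, 0.20)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), crosses zero: {crosses_zero(ci)}")
```

With a smaller standard error, the same estimate of 0.29 would yield an interval entirely above zero, which is why the point estimate alone does not settle the question.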
Active learning settings get the strongest boost
The clearest moderator was teaching method. GenAI feedback worked better in learner-centered environments than in teacher-centered instruction.
Collaborative learning produced the strongest academic gains, followed closely by self-directed learning. Collaborative learning had an effect size of 0.71, while self-directed learning showed an effect size of 0.68. These approaches place students in active roles, requiring them to question, revise, discuss, solve problems and take responsibility for their learning. In such settings, GenAI feedback can act as a responsive partner that supports knowledge construction rather than delivering information passively.
Inquiry-based learning showed a weaker effect, at 0.34, and direct instruction showed no clear benefit. The authors interpret this pattern as evidence that GenAI feedback is most effective when students are actively using it to build understanding. In direct instruction, where learning is more receptive and teacher-led, AI feedback may add less value or even increase unnecessary cognitive load.
Teachers should not simply add GenAI feedback to traditional instruction and expect strong results. The technology works best when embedded in learning designs that require students to engage, compare, evaluate and revise.
The study also examined whether GenAI's role mattered. GenAI used as an assistant or peer appeared to perform better than GenAI used as a tutor, although the difference was not statistically significant. Assistant roles produced an effect size of 0.68, peer roles 0.77 and tutor roles 0.24. The peer finding was based on only two effect sizes, so the authors urge caution.
When AI acts as an authority figure, students may defer to it or outsource thinking. When it acts as an assistant or learning partner, it may reduce pressure and encourage exploration. This supports a practical classroom rule: GenAI should help students think, not think for them.
Other factors did not significantly moderate outcomes. Educational level, discipline, intervention duration and GenAI role did not show statistically significant differences, though some descriptive patterns emerged. Secondary students appeared to benefit more than university students, while evidence for elementary students was too limited to draw conclusions. Language and STEM subjects showed similar gains, and longer interventions appeared stronger than shorter ones.
The lack of strong discipline-based differences suggests GenAI feedback is not confined to one subject area. It can support language writing, STEM learning, professional training and other fields, provided the instructional design gives students a meaningful role in processing the feedback.
The study's robustness checks strengthened confidence in the main result. Leave-one-out tests showed that no single study drove the overall finding. Sensitivity analyses using different assumptions about pretest-posttest correlations did not change the conclusion. Fixed-effect and random-effects models both showed positive effects. Publication bias checks found no significant bias through Egger's regression test, though the authors note some visual asymmetry and advise cautious interpretation.
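The logic of these checks can be sketched in code. The example below pools a handful of effect sizes under a fixed-effect model and under a random-effects model, then repeats the random-effects estimate with each study left out in turn. The DerSimonian-Laird estimator is a common choice assumed here, not one the article attributes to the authors, and the five effect sizes and variances are made up for illustration.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted mean; returns (pooled effect, its variance)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, 1 / sum(weights)


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling; returns (pooled effect, its variance)."""
    weights = [1 / v for v in variances]
    fixed, _ = fixed_effect(effects, variances)
    # Cochran's Q measures how far studies scatter around the fixed-effect mean.
    q = sum(w * (e - fixed) ** 2 for w, e in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * e for w, e in zip(re_weights, effects)) / sum(re_weights)
    return pooled, 1 / sum(re_weights)


def leave_one_out(effects, variances):
    """Re-pool the data once per study, each time omitting that study."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        results.append(random_effects(rest_effects, rest_variances)[0])
    return results


# Hypothetical effect sizes and variances, not data from the meta-analysis.
effects = [0.20, 0.45, 0.60, 0.75, 1.10]
variances = [0.04, 0.05, 0.03, 0.06, 0.08]

fe, _ = fixed_effect(effects, variances)
re, _ = random_effects(effects, variances)
loo = leave_one_out(effects, variances)
print(f"fixed: {fe:.2f}, random: {re:.2f}")
print("leave-one-out:", [round(x, 2) for x in loo])
```

If every leave-one-out estimate stays close to the full-sample estimate and keeps the same sign, no single study is driving the result, which is the pattern the authors report.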
Teachers remain central as AI becomes classroom scaffolding
GenAI feedback should be used as complementary scaffolding, not as a replacement for teachers or student effort, the study holds. For teachers, the challenge is to design feedback cycles that keep students active. GenAI can provide instant responses, examples, hints and revision prompts, but students should be required to question the feedback, compare it with learning goals, explain their revisions and make final judgments. Without that process, AI feedback can become a shortcut that weakens independent thinking.
Teachers must preserve their role in emotional support, motivation and social development. GenAI may help with cognitive and metacognitive scaffolding, but it is weaker in areas that depend on human connection, encouragement, classroom trust and emotional sensitivity. This is especially important because the study found only modest and uncertain effects on non-cognitive outcomes.
The findings also point to the need for AI literacy. Students should learn how to use GenAI feedback critically, including how to verify accuracy, detect shallow or misleading responses, avoid over-reliance and treat AI output as input for judgment rather than final authority. The authors warn that GenAI can produce errors, encourage passive dependence and create academic integrity risks when used without oversight.
For tech developers, the study suggests that future GenAI feedback systems should include transparent feedback design, human-in-the-loop controls and better support for metacognitive development. Systems should make clear whether feedback is AI-generated or teacher-validated. They should also help teachers monitor student use, especially when students appear to rely too heavily on automated responses.
Notably, the study identifies major gaps for researchers. The final sample included only 36 studies, and much of the evidence was concentrated in Asia. More studies are needed across regions, age groups, task types and learning settings. The authors also call for more research on long-term sustainability, especially whether GenAI-supported gains continue after AI scaffolding is reduced or removed.
The metacognitive finding also needs deeper testing. The early evidence suggests GenAI feedback could help students plan, monitor and reflect on their learning, but the current base is too small to support firm claims. Future research must determine whether AI feedback builds lasting self-regulation or only improves short-term task performance.
The study also leaves open questions about feedback design. Researchers need to examine which types of GenAI feedback work best, including corrective feedback, explanatory feedback, reflective prompts, model answers, motivational feedback and strategy guidance. The level, timing and tone of feedback may shape whether students use AI productively or passively.
First published in: Devdiscourse