Generative AI raises quality concerns in evidence-based policy work
- Country:
- United Kingdom
Generative artificial intelligence (GenAI) has not yet transformed policy research inside UK think tanks, despite widespread debate over its potential to reshape knowledge work, according to new research from the University of the West of Scotland.
The study, titled "Generative artificial intelligence in policy analysis: a study of UK think tanks" and published in AI & Society, finds that GenAI is being used by many think tank researchers, but mostly for routine support tasks rather than for agenda-setting, policy design or substantive public-facing analysis.
The findings challenge claims that AI systems are close to replacing policy analysts or radically changing how evidence-based policy ideas are produced. Instead, the study presents a more cautious picture: UK think tanks are experimenting with tools such as ChatGPT, Claude, Perplexity and Gemini, but researchers remain wary of errors, bias, weak sourcing, shallow analysis and the loss of human judgement in policy work.
The research is based on a survey of 554 policy analysts and policy researchers from 94 UK-based think tanks, alongside in-depth interviews with eight people from the same professional group.
Think tanks use AI, but mainly for routine work
The study finds that about three-quarters of respondents had used GenAI in their professional work, but regular use remained limited. Only 8% used GenAI daily, while 49% said they used it only occasionally.
Think tanks operate in a sector where speed, visibility and influence are central to survival. Many are small organisations that must respond quickly to political, economic and social developments. In theory, GenAI could help them scan evidence faster, draft outputs more quickly and reach policy audiences with greater speed. In practice, the study finds that most use remains narrow and cautious.
Researchers reported using GenAI for transcription, note-taking, meeting summaries, presentation preparation, grammar checking, coding help, spreadsheet troubleshooting and early-stage information gathering. These uses were largely seen as acceptable because they saved time without handing over core analytical judgement to a machine.
The study finds that 43% of respondents used GenAI mainly to improve research efficiency, while only 10% said they used it to improve research quality. A further 25% pointed to data analysis support as a use case. More than 70% said GenAI helped ease workload, especially by reducing time spent on repetitive or lower-value tasks.
The strongest support for GenAI came in areas that researchers viewed as administrative or technical. Many saw value in using AI to summarise documents, support desk research, fix code, help with R or spreadsheet formulas, and condense internal material. Some used it as a more advanced search aid to locate examples of policy interventions or to gain an initial view of a subject area.
The paper finds little evidence that think tank analysts are using GenAI as a serious substitute for human-led policy judgement. Researchers were especially reluctant to use it for producing policy ideas, conducting deeper analysis, writing public reports or making qualitative judgements. GenAI is being treated less as a policy analyst and more as an effort-saving assistant. It is helping with the edges of policy work, not taking over the centre.
Human judgement remains central to policy ideas
The study divides policy analysis at think tanks into three main functions: problem recognition and definition, policy formulation, and communication or dissemination. Across all three, the authors find that GenAI has made only modest inroads.
Problem recognition and definition
The first function is the most sensitive. Think tanks help define what counts as a public problem, how that problem should be understood and what kinds of solutions should enter political debate. This work is not simply technical. It depends on values, political context, institutional knowledge, public priorities and judgement about trade-offs.
In this area, respondents were particularly sceptical of GenAI. Many argued that AI systems draw from existing material and are not suited to producing original policy thinking. The risk, they said, is that overreliance on AI could flatten debate, recycle existing assumptions and reduce the creativity needed to define new or complex public problems.
Some researchers did report using GenAI at the very start of projects. These uses included gathering basic overviews, identifying broad arguments in a debate or testing possible directions for early research. But that support appeared to fade as projects moved into deeper evidence-building and argument development. The study finds that researchers often stopped using GenAI once work required more context, sharper judgement and stronger control over evidence.
Policy formulation
Think tanks often seek to turn evidence into concrete policy proposals. This requires weighing competing evidence, dealing with incomplete data, assessing feasibility and understanding how policy choices affect different groups. The study finds little sign that GenAI is being widely used to perform these tasks.
Trust was the key barrier, with respondents worrying about hallucinations, weak citations, factual mistakes, algorithmic bias and outdated information. Some said AI-generated summaries could be inaccurate enough to require extensive checking, reducing the time savings that made the tools attractive in the first place.
Several participants also raised concerns about skill loss. If researchers rely too heavily on AI to summarise sources, analyse text or form arguments, they may weaken the very abilities that make policy analysis valuable. That fear goes beyond technical accuracy. It speaks to the professional identity of policy analysts, whose work depends on close reading, interpretation, evidence selection and argument construction.
The study identifies some more ambitious uses. One respondent described work with an external provider to use chatbots for qualitative interviews at a scale that would otherwise be difficult. Another described exploring GenAI for thematic analysis of large text sets. These examples suggest that AI may expand some research methods, especially where volume is a constraint. But they remain exceptions rather than evidence of broad sector-wide change.
Communication and dissemination
The third function showed more use, but still mostly at the margins. Think tanks depend on reports, media work, policy briefs, events, blogs and social media to influence decision-makers and public debate. GenAI was used for grammar checks, shorter summaries, title ideas, proofreading, presentation planning and, in some cases, image or design support. But respondents were cautious about letting AI draft public-facing work because audience judgement, tone and persuasion require a strong grasp of policy context.
Lack of AI rules raises concern
GenAI adoption is advancing faster than formal organisational policy in many think tanks. Around half of survey respondents said there had been informal discussion about AI use inside their organisations. However, only 7% said those discussions had produced safeguards, principles or policies. About one-third said their think tank had no policy in place to regulate GenAI use.
The study finds that 91% of respondents had received no GenAI training from their organisation. Where training did exist, it was described as occasional, voluntary or informal. This leaves researchers to make case-by-case decisions about how far AI should be trusted, when its use should be disclosed and what uses are ethically acceptable.
Think tanks often value autonomy, flexibility and intellectual independence. Several interviewees suggested that small organisations may not need rigid rules and that professional trust can guide responsible use. But the study also shows that informal norms may not be enough as GenAI becomes more powerful and more common.
The risks are not only internal. Think tanks play a role in shaping public debate and influencing government decision-making. If AI-generated errors or biased outputs enter policy analysis, the consequences can travel beyond the organisation that produced them. Weak policy ideas, unreliable evidence summaries or poorly checked analysis can affect public understanding and policy choices.
In view of the risks, the study calls for clearer codes of conduct, especially around transparency, accountability and explainability. The authors argue that organisations influencing public policy should make sure humans retain intellectual oversight over AI-supported outputs.
The findings also place limits on both the most alarmist and most optimistic predictions about GenAI. The study finds no evidence that AI is close to ending the role of the policy analyst in UK think tanks. Nor does it support claims that GenAI has already revolutionised policy research. Instead, it shows a mismatch between big claims about AI-driven transformation and the everyday reality of limited, careful use.
The key reason is that policy analysis is not only about processing information. It is about deciding which problems matter, what evidence counts, which trade-offs are acceptable and how proposals fit political and social realities. Those tasks remain deeply human, even when AI can assist with searching, summarising and formatting.
- FIRST PUBLISHED IN:
- Devdiscourse