Public sector AI can harm trust, rights and fairness


CO-EDP, VisionRI | Updated: 12-03-2026 19:58 IST | Created: 12-03-2026 19:58 IST

Governments are expanding their use of algorithmic systems in policing and fraud detection, but a new review suggests the real issue is no longer whether artificial intelligence can improve public enforcement work in theory. It is whether public agencies can deploy it without deepening bias, weakening legal protections, or undermining public trust. In the new study, researchers argue that the outcomes of these systems depend less on technology alone and more on the conditions in which they are designed, adopted, and governed.

The study, “Conditions of benefits and risks when algorithmic technology is implemented for public sector policing and fraud detection: a systematic literature review,” published in AI & Society, reviewed 157 studies across disciplines and found a stark divide in the research itself. Engineering and data science work tends to stress accuracy, efficiency, and performance gains, while social science and public administration research is far more focused on risks such as discrimination, privacy loss, legal exposure, and damage to institutional legitimacy. 

The authors note that governments have increasingly explored machine learning systems for crime prediction, fraud detection, and enforcement support, driven by the belief that these tools can improve efficiency and help allocate scarce resources more effectively. However, some of the most visible deployments have produced severe failures, from inaccurate welfare fraud systems to policing tools linked to discriminatory outcomes. The review treats those tensions not as isolated scandals, but as signs of a broader governance problem.

Where the case for algorithmic enforcement looks strongest

The review does find reasons governments remain interested in these systems. Across the engineering and technical literature, algorithmic tools are repeatedly presented as a way to improve enforcement efficiency and detection performance. In fraud detection, some reviewed studies reported very high accuracy rates under controlled conditions, including results as high as 95 percent for insurance fraud detection and 99.3 percent in a model for social security fraud. In predictive policing, the review points to studies claiming strong performance in forecasting crime locations or macro causes of crime. These findings explain why administrators facing budget pressure and rising expectations may see algorithmic systems as attractive tools.

But the authors say those technical gains are not enough by themselves. The review identifies four conditions linked to potential benefits: technical efficacy, internal support from public sector staff, alignment with legal and policy objectives, and public support. Technical efficacy is the most obvious starting point. If a system cannot perform its intended task reliably, there is little basis for expecting public value from its deployment. Yet the paper repeatedly stresses that high model accuracy in controlled settings does not guarantee operational success in the public sector, where data may be incomplete, biased, outdated, or badly matched to real-world conditions.
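
To make that caveat concrete, consider a back-of-the-envelope sketch in Python (all figures below are hypothetical, not drawn from the review) of how a strong headline accuracy can still translate into mostly wrongful flags once the target behaviour is rare in the deployed population:

```python
# Hypothetical figures: a classifier that looks strong on both classes.
prevalence = 0.01          # assume 1% of screened cases are real fraud
sensitivity = 0.95         # fraud cases correctly flagged
specificity = 0.95         # legitimate cases correctly cleared

true_pos = prevalence * sensitivity                  # 0.0095
false_pos = (1 - prevalence) * (1 - specificity)     # 0.0495

precision = true_pos / (true_pos + false_pos)
accuracy = true_pos + (1 - prevalence) * specificity

print(f"overall accuracy: {accuracy:.1%}")                      # 95.0%
print(f"share of flags that are real fraud: {precision:.1%}")   # 16.1%
```

In this toy setting, roughly five out of six flagged cases are legitimate, which is one way a system can look excellent on paper while still producing the kind of wrongful suspicion the review documents.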

The second condition, internal support, is just as important. Police officers, investigators, and public sector staff do not work inside clean laboratory settings. They use tools in messy institutional environments shaped by discretion, workload, policy constraints, and public pressure. The review found evidence that many enforcement officers do see possible value in algorithmic systems, especially when they believe the tools can make them more effective. But that support appears tied to whether the systems actually help in practice and whether legal and ethical concerns are taken seriously. Public sector staff are more likely to embrace algorithmic support when they retain room for judgment and when the technology does not appear to impose unfair or opaque decisions.

The third condition is alignment with legal and policy goals. This may be the most overlooked part of the promise narrative. The paper argues that even a technically strong system can fail if it does not fit the legal and administrative aims of the institution using it. A tool designed without adequate regard for legality, accountability, or rights can generate harmful outcomes even if its predictive logic appears sound. The fourth condition, public support, extends the same point outward. Citizens are less likely to accept algorithmic public decisions if systems are opaque, imposed without engagement, or seen as unfair. The review suggests that human governance and accountability remain critical in sustaining legitimacy.

Why the risks remain far more serious than performance scores suggest

If the engineering literature leans toward optimism, the broader review shows that the risk case is both deeper and more institutionally grounded. Social science and public administration studies focus heavily on harms to equity, accountability, privacy, and public legitimacy. The review identifies three main conditions linked to risk: systems designed around a model of threat, discriminatory or inaccurate outputs, and harmful human-machine interactions.

The first of these, threat-based design, concerns the logic built into a system from the start. According to the review, many enforcement tools are designed on the assumption that the target population is primarily a source of risk to be managed. That design choice can make systems more aggressive, more punitive, and less sensitive to rights or context. The paper points to the Dutch welfare fraud system SyRI as an example of how targeted deployment in so-called problem areas contributed to damaging outcomes. In this view, harm does not begin only when a model makes a mistake. It can begin with the values and assumptions embedded in the system before it is even launched.

The second risk condition is discriminatory or inaccurate model output. This includes both biased data and poor-quality data. The review highlights that historical enforcement records may already reflect discriminatory practices, which can then be reproduced by an algorithm even if race or other protected traits are not explicitly used as inputs. It also stresses that bad data quality alone can trigger severe damage. One of the paper’s most striking examples is Michigan’s MiDAS welfare fraud detection system, which was wrong in 93 percent of cases and produced serious social harm. The broader message is that the cost of error in public sector AI is not abstract. It can mean wrongful suspicion, financial distress, legal exposure, and long-term damage to families and communities.
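
The proxy problem the review describes is easy to reproduce in miniature. The sketch below (synthetic data with hypothetical parameters, not the paper's own analysis) trains a model that never sees the protected attribute, yet scores one group higher because a correlated feature carries the historical enforcement bias forward:

```python
# Synthetic demonstration of proxy discrimination: the protected
# attribute is never a model input, but a correlated feature (a
# district code) carries it anyway. All parameters are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

group = rng.integers(0, 2, n)                 # protected attribute (never a model input)
district = np.where(rng.random(n) < 0.8, group, 1 - group)  # proxy: 80% aligned with group
true_fraud = rng.random(n) < 0.02             # identical true fraud rate in both groups

# Historical labels reflect biased enforcement: group 1 was investigated
# more intensively, so more of its fraud ended up in the records.
detection_rate = np.where(group == 1, 0.9, 0.4)
recorded = true_fraud & (rng.random(n) < detection_rate)

X = district.reshape(-1, 1).astype(float)     # the model only ever sees the district
scores = LogisticRegression().fit(X, recorded).predict_proba(X)[:, 1]

for g in (0, 1):
    print(f"group {g}: mean risk score {scores[group == g].mean():.4f}")
# Group 1 scores higher despite identical true fraud rates, because the
# district feature encodes past enforcement bias.
```

Removing the protected trait from the inputs does not remove it from the data, which is exactly the mechanism the reviewed studies warn about.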

The third risk condition lies in the interaction between humans and machines. Even well-designed systems can cause harm if public officials use them in ways that distort decision-making. The review points to evidence that practitioners may develop a false sense of certainty when algorithmic outputs are treated as more objective or authoritative than they really are. That can reduce professional discretion and encourage overreliance on a system that is ultimately probabilistic and fallible. In one case discussed in the paper, officers became devoted followers of an all-seeing algorithm, effectively weakening the role of their own expertise. This, the authors suggest, is one of the key reasons public sector AI cannot be evaluated as a software problem alone. It is a workplace and governance problem too.

The review also notes that public reaction is often more skeptical than official enthusiasm. In one study cited in the paper, only 29 percent of German citizens viewed welfare fraud inspection by algorithm as a positive development. Other work reviewed by the authors linked the use of facial recognition systems across more than 1,000 US cities with increased racial arrest disparities. These findings reinforce the study’s larger conclusion that public acceptance cannot be assumed simply because a tool promises efficiency. In democratic systems, legitimacy matters as much as throughput.

The study’s real message: public sector AI succeeds or fails as a socio-technical system

The study aims to unify these competing strands through what the authors call a socio-technical governance framework. Instead of asking whether algorithmic systems are good or bad in general, the authors argue that outcomes are jointly shaped by technical system quality, human-technology interaction, and institutional context. Benefits and harms do not come from any one of those elements in isolation. They emerge from the way they combine.

Technical system quality includes not only accuracy, but also the absence of discriminatory output and the rejection of threat-based design logic. Human-technology interaction covers staff support, how much discretion officials keep, and whether algorithmic advice changes behavior in harmful ways. Institutional context includes legality, policy alignment, accountability structures, and public support. Taken together, the framework argues that decision outcomes and institutional legitimacy are produced by the interaction of all three. A technically advanced system can still fail if it is socially mistrusted, legally misaligned, or operationally misused.
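
One way to read that claim in concrete terms is as a weakest-link rule rather than an average. The sketch below is illustrative only: the three dimensions come from the paper, but the field names and the scoring rule are assumptions made for this example:

```python
# Illustrative sketch only: the three dimensions come from the paper,
# but the field names and scoring rule are assumptions for this example.
from dataclasses import dataclass

@dataclass
class Assessment:
    technical_quality: float       # accuracy, non-discriminatory output, design logic
    human_interaction: float       # staff support, preserved discretion
    institutional_context: float   # legality, accountability, public support

    def readiness(self) -> float:
        # Weakest-link rule: a deployment is only as sound as its weakest
        # dimension, so technical strength cannot offset a legitimacy failure.
        return min(self.technical_quality,
                   self.human_interaction,
                   self.institutional_context)

# A technically advanced but institutionally weak system still scores low.
print(Assessment(technical_quality=0.9,
                 human_interaction=0.7,
                 institutional_context=0.2).readiness())   # 0.2
```

An averaging rule would let a high benchmark score mask a legal or legitimacy failure, which is precisely the pattern the review's case studies describe.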

That framing helps explain why the public debate over AI in policing and fraud detection so often seems stuck between hype and fear. The engineering literature tends to look at what models can do under favorable conditions. Social science asks what institutions, power structures, and citizens experience when those tools enter real governance settings. The review does not dismiss either side. Instead, it argues that both are incomplete on their own. Performance without governance is unsafe. Governance without a realistic grasp of technical capacity is incomplete. The gap between the two has left public agencies with too little practical guidance on how to deploy these systems responsibly.

The authors are also clear that the current evidence base is still too thin to settle the biggest policy questions. The review says there is not yet enough empirical work to definitively show whether deployed predictive policing systems reduce crime in real operational settings. Much of the existing empirical literature focuses on perceptions rather than measured public outcomes. That leaves a significant research gap at exactly the point where governments need the strongest answers. The paper calls for more rigorous empirical designs, including modeling approaches that can better test how variables such as accuracy, discretion, and bias interact across different implementation settings.

Public agencies cannot afford to judge algorithmic systems by vendor claims, benchmark scores, or efficiency rhetoric alone. The review suggests that the safer question is not whether an algorithm is smart, but whether the surrounding institution is ready to use it lawfully, transparently, and with meaningful human oversight. Systems that appear effective in theory may still fail in practice if they operate on bad data, embed punitive assumptions, reduce professional judgment, or lose public legitimacy.

  • FIRST PUBLISHED IN: Devdiscourse