Data Quality: The hidden factor driving success or failure in the AI age

CO-EDP, VisionRI | Updated: 05-12-2025 18:11 IST | Created: 05-12-2025 18:11 IST

Poor data quality is quietly undermining scientific progress, business strategy, government oversight, and the reliability of artificial intelligence systems, warns a narrative review published in Data.

Titled “Data Quality in the Age of AI: A Review of Governance, Ethics, and the FAIR Principles,” the study points out that organizations across sectors continue to lose money, damage public trust, and make flawed decisions because they lack consistent governance and lifecycle oversight for the data that fuels modern digital operations.

The study links classic data quality frameworks with real-world failures, sector-wide weaknesses and the growing complexity introduced by AI systems that depend on large-scale, high-integrity datasets. It traces how inaccurate, incomplete, inconsistent or poorly governed data have led to financial disasters, reputational harm, stalled innovation and, in some cases, catastrophic technical failures. The authors state that as AI becomes an integral layer in business, healthcare, government and research, data quality has arguably become the most decisive, yet still most neglected, element in ensuring reliability and safety.

AI raises the stakes as traditional data quality standards fall short

The research examines the evolution of data quality principles from early definitions that emphasized accuracy and fitness for use to newer frameworks that integrate governance, ethics and the FAIR principles. The review highlights that while standards such as ISO 8000 and ISO/IEC 25012 remain foundational, these models were created for an earlier generation of data systems. They do not fully match the demands of AI, which consumes data at unprecedented volume, speed and complexity.

The authors note that accuracy, completeness, timeliness, consistency, relevance and validity continue to be universal dimensions, but AI introduces additional requirements, including algorithmic fairness, provenance, model explainability and continuous monitoring for drift. The review notes that modern AI systems are both dependent on data quality and potential tools for improving it. Even so, most market solutions still focus on preparing data for AI rather than using AI to enhance data quality itself. This imbalance exposes organizations to mounting risks.
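To make those dimensions concrete, here is a minimal rule-based check covering completeness, validity and timeliness. The field names, thresholds and record layout are hypothetical illustrations, not taken from the review:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record rules illustrating three classic quality
# dimensions: completeness, validity, and timeliness.
REQUIRED_FIELDS = {"customer_id", "birth_date", "updated_at"}
MAX_AGE = timedelta(days=30)  # records older than this count as "stale"

def quality_issues(record: dict) -> list[str]:
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in sorted(REQUIRED_FIELDS):
        if not record.get(field):
            issues.append(f"missing field: {field}")
    # Validity: birth_date must parse as an ISO date.
    birth = record.get("birth_date")
    if birth:
        try:
            datetime.fromisoformat(birth)
        except ValueError:
            issues.append(f"invalid birth_date: {birth!r}")
    # Timeliness: updated_at must be recent enough.
    updated = record.get("updated_at")
    if updated:
        age = datetime.now(timezone.utc) - datetime.fromisoformat(updated)
        if age > MAX_AGE:
            issues.append("stale record")
    return issues
```

A real pipeline would apply rules like these continuously at ingest rather than as a one-off script, which is precisely the lifecycle-monitoring gap the review describes.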

The study details how poor-quality data in AI training pipelines can quickly propagate into flawed algorithms that harm individuals or undermine organizational performance. The narrative cites real-world failures, including the loss of NASA’s Mars Climate Orbiter due to a unit mismatch and a major financial setback at Unity Technologies caused by faulty datasets feeding predictive models. These events illustrate how traditional validation methods are insufficient in the era of autonomous and semi-autonomous systems.
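The unit-mismatch failure mode behind the Mars Climate Orbiter loss is straightforward to guard against at system boundaries. A minimal sketch, with illustrative values and unit tags rather than actual mission telemetry, tags numbers with units so that mismatched data fails loudly instead of being silently combined:

```python
from dataclasses import dataclass

# Toy unit-tagged quantity: arithmetic refuses to mix units silently,
# the class of interface check that would flag a pound-force vs
# newton mismatch before it corrupts downstream computations.
CONVERSIONS = {("lbf", "N"): 4.448222}  # pound-force to newtons

@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    def to(self, unit: str) -> "Quantity":
        if unit == self.unit:
            return self
        factor = CONVERSIONS.get((self.unit, unit))
        if factor is None:
            raise ValueError(f"no conversion from {self.unit} to {unit}")
        return Quantity(self.value * factor, unit)

    def __add__(self, other: "Quantity") -> "Quantity":
        if other.unit != self.unit:
            raise TypeError(f"unit mismatch: {self.unit} vs {other.unit}")
        return Quantity(self.value + other.value, self.unit)

# Explicit conversion is required; adding raw mixed units raises an error.
thrust = Quantity(100.0, "N") + Quantity(22.5, "lbf").to("N")
```

The design choice is the point: validation lives at the data interface, so a producer and consumer working in different unit systems cannot exchange bare numbers.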

Further, the review shows that FAIR principles, originally developed for scientific research, now play an expanding role in digital governance. Ensuring data is findable, accessible, interoperable and reusable improves transparency, reproducibility and machine readability. The authors position FAIR as a flexible framework that helps organizations prepare for AI-intense environments where datasets must support machine processing with minimal human intervention.
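In practice, FAIR compliance often begins with machine-readable metadata. A hedged sketch of a minimal dataset description follows; the field names mirror common FAIR checklists and are illustrative, not a schema prescribed by the review:

```python
import json

# A minimal machine-readable dataset description touching each FAIR
# principle. Field names and values are illustrative examples only.
record = {
    "identifier": "doi:10.0000/example-dataset",   # Findable: persistent ID
    "access_url": "https://example.org/data.csv",  # Accessible: standard protocol
    "format": "text/csv",                          # Interoperable: open format
    "license": "CC-BY-4.0",                        # Reusable: clear usage terms
    "provenance": "derived from survey wave 3",    # Reusable: lineage recorded
}

# Serializing to JSON makes the description consumable by machines
# with no human in the loop, which is the FAIR end goal.
metadata_json = json.dumps(record, indent=2)
```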

Costly consequences across sectors reveal the global scope of data quality failures

The review compiles evidence from healthcare, business, finance, supply chains and public administration to demonstrate how widespread and damaging poor data quality has become. The authors point to estimates that organizations lose millions, and in some cases billions, annually due to low-quality or poorly governed data.

In customer relationship management systems used by retail, service and home appliance companies, inconsistent or incomplete customer records lead to failed campaigns, wasted resources and reduced loyalty. The review points to reports showing that a significant percentage of customer data lacks basic timeliness and completeness, making segmentation and predictive analytics unreliable. These problems intensify when organizations attempt CRM migrations, particularly from legacy systems, where inconsistent formats and missing values disrupt operations and erase trust in new platforms.

In healthcare, the costs are not merely financial. Incomplete, inconsistent or delayed electronic health records can lead to diagnostic errors, misguided treatment decisions and compromised public health monitoring. Multicenter clinical research efforts have shown that without semi-automated data quality assessments and governance frameworks, even high-value datasets can contain widespread inaccuracies. The authors note that international toolkits, including the WHO’s Data Quality Assurance framework, show that quality improves significantly when organizations apply structured governance, continuous validation and staff training.

In the finance sector, the study highlights systemic risk driven by siloed data architectures, inconsistent formats and weak governance. Legacy systems in banking and credit reporting continue to create duplication and inaccuracies. These issues undermine analytics, distort risk assessments and increase regulatory exposure. Financial regulators across Europe and the United States have documented the difficulty of aggregating risk data when foundational datasets lack consistency or completeness. The review notes that modern architectures such as Data Mesh offer potential solutions, but their success depends on organizational alignment and strong quality leadership.

Public administration also shows persistent vulnerabilities. The review references analyses of criminal justice data in the United States, where incomplete or inaccurate records complicate background checks and produce harmful social consequences. The authors highlight research showing that technology adoption alone does not improve data quality. Training, governance structures and redesigned workflows are necessary to ensure the accuracy and reliability of public records, tax systems and government services.

These failures, across domains, reflect common patterns. Organizations often lack clear accountability, formal governance, automated monitoring, or staff capacity to maintain high-quality information. The review argues that these gaps are predictable and preventable, and that the long-term cost of neglect far outweighs the investment needed for sustainable quality systems.

Governance, ethics and AI necessitate a new data quality paradigm

The researchers observe that data quality is no longer a purely technical concern but a socio-organizational responsibility shaped by ethics, policy, culture and emerging AI regulation. The authors detail how AI governance frameworks from international bodies now include data quality as a core requirement. They highlight new standards such as ISO/IEC 42001, the EU AI Act and the NIST AI Risk Management Framework, all of which stress accuracy, fairness, transparency and lifecycle monitoring.

The review outlines how bias in training data leads directly to discriminatory outputs, reinforcing inequalities in healthcare, hiring, credit scoring and criminal justice. The authors argue that fairness must be treated as a fundamental dimension of data quality in AI settings. They also note that provenance, metadata completeness and data lineage are essential for accountability, particularly in automated decision-making environments where responsibility becomes more diffuse.

The narrative illustrates how large language models introduce additional challenges. These systems are trained on massive, heterogeneous datasets with complex provenance, increasing the risk of misinformation, hallucinations and embedded bias. The authors describe how governance failures during dataset collection or filtering can lead to widespread social harm once models are deployed. As LLMs increasingly influence public information ecosystems, failures in quality management grow more consequential.

The study also shows that AI can help strengthen monitoring mechanisms. Machine learning tools can detect contextual anomalies, identify inconsistencies across sources, automate profiling tasks and improve metadata generation. Multi-agent systems demonstrate the potential for automated rule creation, validation and correction at scale. Yet adoption remains limited due to user concerns about trust, transparency and skill gaps.
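One simple form such monitoring can take is statistical outlier screening over a numeric column. This sketch, with an invented dataset and threshold, uses a z-score rule; production monitors would profile many columns and learn thresholds, but the core idea is the same:

```python
import statistics

def flag_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices of values more than `threshold` standard
    deviations from the column mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # constant column: nothing can be an outlier
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

# A likely data-entry error (e.g. cents keyed as dollars) stands out:
amounts = [19.9, 21.5, 20.3, 19.7, 2150.0, 20.8]
suspect = flag_anomalies(amounts, threshold=2.0)  # indices of outliers
```

Flagged rows would be routed for correction or review rather than silently dropped, keeping a human in the loop, which speaks to the trust and transparency concerns the review raises about automated tooling.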

The review identifies major research gaps. Public administration remains understudied, even though government datasets often underpin national statistics, safety programs and economic policies. Small and medium enterprises also face unique constraints that hinder the adoption of quality frameworks. The authors argue that future work must develop lightweight, accessible tools for these environments.

For policymakers, the study reinforces the importance of integrating FAIR principles into governance frameworks. For practitioners, it highlights the value of embedding validation directly into business processes rather than relying on after-the-fact cleaning. For researchers, the review underscores that reproducibility depends on reliable datasets and clear metadata that support transparency and machine interoperability.

  • FIRST PUBLISHED IN:
  • Devdiscourse