Workers overwhelmed as AI systems demand endless data
A new study exposes how workers in the pharmaceutical sector, higher education, and the arts are being pulled into exhausting cycles of data production, cleaning, negotiation, and emotional labour as they try to fulfil the rising demands of artificial intelligence (AI) development.
The study, titled “Feeding the Machine: Practitioner Experiences of Efforts to Overcome AI’s Data Dilemma” and published in Big Data & Society, offers one of the clearest cross-sector investigations into what the authors describe as AI’s key tension: the promise of sophisticated systems that depend on vast, high-quality datasets, and the reality that these datasets are often incomplete, messy, biased, or simply not suited for training reliable AI tools. The research argues that this gap drives intense pressure on practitioners to supply more data, fix old data, or reorganise their work around the needs of algorithms.
The authors examine three very different UK sectors: drug discovery, university learning analytics, and artistic practice. The comparison shows that the pressure to meet AI’s data needs is highest in corporate environments guided by competition and speed, more mixed in public services shaped by accountability rules, and most contested in creative fields where practitioners push back against extractive models of AI development. Across all cases, the study finds that the human labour behind AI is both undervalued and essential, and that responsible AI governance must treat data work as a core part of the system rather than an invisible burden placed on workers.
How does AI’s data dilemma shape daily work across sectors?
The study outlines what the authors call AI’s data dilemma. Despite the belief that modern organisations are flooded with data, most AI systems still struggle because the data available is not fit for purpose. It may not be labelled, balanced, complete, or even relevant to the real-world tasks the system is meant to perform. This places immense pressure on the human workers who are expected to fill the gaps, often with limited time, unclear guidance, or conflicting demands.
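To make that gap concrete, the sketch below shows the kind of quick audit a data team might run before training a model, checking how much of a dataset is labelled, complete, and balanced. It is a minimal illustration only; the column names and values are invented and do not come from the study.

```python
import pandas as pd

# Hypothetical records an organisation might hope to feed into an AI system.
# All column names and values are invented for illustration.
records = pd.DataFrame({
    "feature_a": [0.2, 0.7, None, 0.9, 0.4],
    "feature_b": [1.0, None, 0.3, 0.8, None],
    "label":     ["positive", None, "positive", "positive", None],
})

# How much of the data is actually labelled?
print(f"Labelled rows: {records['label'].notna().mean():.0%}")

# How complete are the feature columns?
print("Feature completeness:")
print(records.drop(columns="label").notna().mean())

# Is the labelled portion balanced across outcomes?
print("Label balance:")
print(records["label"].value_counts(normalize=True))
```

Even on five invented rows, the audit surfaces the problems the study describes: unlabelled records, missing features, and a labelled portion that leans heavily toward one outcome.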
The researchers identify this as a cultural and organisational problem as much as a technical one. They argue that every workplace has its own culture of data practice shaped by values, expectations, and institutional habits. These cultures influence who collects data, who cleans it, who approves access, and how conflicts are resolved. They also influence how people feel about this work, how much of their day it takes up, and what they worry may happen if the data is wrong or incomplete.
In many workplaces, AI adoption is driven by the expectation that automation will make processes faster or more cost-effective. But the study shows that the very systems meant to reduce labour often create new labour demands. Workers are asked to generate new datasets, track new details, document interactions in new ways, or run manual checks to support machine learning teams. This transformation of everyday tasks is happening quietly in many sectors, without public recognition of the emotional, ethical, and skill-based toll it places on staff.
How do the pharma, higher education and artistic sectors experience this pressure?
The study’s three case studies show how the same data dilemma takes different forms depending on the structure, values, and incentives of each sector.
In the pharmaceutical case, AI and machine learning tools for drug discovery rely on clean, balanced datasets that include both successful and unsuccessful experiments. But many companies lack detailed records of failed experiments, leaving large gaps in the negative data needed to train reliable models. This led one company to launch a major internal initiative to produce more data on unsuccessful chemical compounds. The work required chemists to run repetitive experiments chosen by an algorithm, rather than using their usual expertise to design experiments. The study reveals that this shift reduced the sense of creativity and scientific judgment that chemists valued in their work.
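The underlying technical issue is one of class imbalance: a model trained almost entirely on successful compounds has little to learn from about failure. The snippet below is a hypothetical illustration of why that matters, not code from the company’s initiative, and the numbers are invented.

```python
from collections import Counter

# Hypothetical experiment log: successes are well documented, failures rarely are.
# The 950/50 split is invented purely to illustrate the imbalance.
outcomes = ["success"] * 950 + ["failure"] * 50
counts = Counter(outcomes)
print(counts)  # Counter({'success': 950, 'failure': 50})

# A model that simply predicts "success" every time scores 95% accuracy here,
# while saying nothing useful about which compounds will actually fail.
baseline_accuracy = counts["success"] / len(outcomes)
print(f"Accuracy of always predicting success: {baseline_accuracy:.0%}")
```

Closing that gap means deliberately producing records of failure, which is precisely the repetitive, algorithm-directed work the chemists were asked to take on.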
These professionals expressed concerns that they were being turned into data suppliers rather than scientists, with the scientific process shaped less by curiosity and more by the needs of predictive models. The shift also created tension between long-standing scientific values and the competitive, efficiency-driven corporate culture that views data generation as key to maintaining an edge in drug discovery. Although the company invested heavily in this approach, the project did not deliver clear success, further raising questions for workers about the trade-offs being made.
The higher education case shows a different kind of pressure. Universities are increasingly turning to learning analytics systems to track student engagement, improve completion rates, and satisfy regulatory demands. But the study shows that university data is rarely complete or consistent enough to support predictive models. As a result, staff across departments were pushed to record more interactions, gather fine-grained details about student behaviour, and input data into new systems. For many workers, this added workload came at a time when the sector was already dealing with staffing cuts and rising demands.
The study also reveals that predictive models in this context can raise ethical concerns, especially when the data used may disadvantage certain groups or when the institution does not have the resources to respond to risk alerts. These tensions led some organisations to scale back their use of predictive models and instead focus on descriptive analytics that highlighted broad trends rather than attempting to forecast individual student outcomes. This created a more cautious and slower approach, indicating that public sector institutions may resist rapid AI adoption when ethical and regulatory concerns outweigh promises of efficiency.
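The retreat from forecasting to description is easy to picture in code. The sketch below uses invented engagement figures to show the descriptive end of that spectrum: reporting broad trends per department rather than flagging or predicting any individual student.

```python
import pandas as pd

# Invented engagement log for illustration; not data from the study.
engagement = pd.DataFrame({
    "department": ["History", "History", "Physics", "Physics", "Physics"],
    "logins_per_week": [3, 1, 5, 4, 0],
    "assignments_submitted": [2, 1, 3, 3, 0],
})

# Descriptive analytics: summarise what has already happened at department level,
# rather than forecasting outcomes for individual students.
summary = engagement.groupby("department").agg(
    students=("logins_per_week", "size"),
    avg_logins=("logins_per_week", "mean"),
    avg_submissions=("assignments_submitted", "mean"),
)
print(summary)
```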
The arts sector presents a strong contrast. Here, practitioners are not struggling with a lack of data for their own models, since many artists work with small, carefully curated datasets built around their creative goals. Instead, they face a different challenge arising from the data appetites of large generative AI companies. Because big tech firms require massive datasets to train their models, they often take images, text, and other creative materials from the internet without permission. This practice places artists in direct conflict with AI developers, especially when the models trained on their work are used to generate images that compete with their income.
Artists interviewed for the study expressed a wide range of emotions, including frustration, fear, curiosity, and a desire to push back against extractive practices. Many said they felt forced to keep up with fast-moving AI tools in order to stay competitive, even as these systems threatened their livelihoods. At the same time, artists had more freedom than workers in other sectors to build alternative practices, such as using open source models, creating their own datasets, or working with community-led approaches that aligned with values of care and fairness. This sector demonstrated both strong critique of AI’s data hunger and a willingness to experiment with more ethical forms of machine learning.
What does the study reveal about the human cost behind AI development?
The hidden labour behind data production is deeper, broader, and more emotionally charged than most public discussions of AI suggest. Across all three sectors, workers reported feeling overwhelmed by the volume of data they needed to supply, uncertain about expectations, and anxious about the consequences of errors. They also described the emotional weight of working within systems that do not recognise or value the labour that goes into making AI function.
In pharmaceuticals, skilled workers felt their expertise was being undervalued and reshaped by corporate priorities. In higher education, practitioners were concerned that the push for data-driven monitoring would change the nature of academic and pastoral work. In the arts, creators felt that their work was being taken and repurposed for corporate gain without fair compensation or consent.
The study shows that the intensity of this pressure varies according to how closely a sector aligns with profit-driven or solutionist ideas. Corporate and commercial environments, such as pharmaceuticals, face the strongest push to meet AI’s data demands at high speed. Public sector environments, such as universities, experience mixed pressures shaped by both regulation and ethical concerns. Creative sectors have more scope to push back but still face strong external pressures due to the influence of powerful AI companies.
First published in: Devdiscourse

