How Digital Twins can predict and perform human identity

If algorithms can track, classify, and predict behaviour at scale, can they also narrate a life before it is lived?


CO-EDP, VisionRI | Updated: 26-02-2026 18:56 IST | Created: 26-02-2026 18:56 IST

With digital systems tracking location, communication, and online habits at scale, the boundary between documentation and prediction is rapidly dissolving. Researchers are beginning to ask whether the next frontier of artificial intelligence is not automation, but authorship of the human story itself.

In a study titled Machine Biography as Digital Human Twin: Artistic Explorations of Predictive Identity in the Age of Behavioural Data, published in AI & Society, researchers present an AI-generated speculative biography built from a year of self-surveillance, using it to interrogate how digital twins forecast identity and shape imagined futures.

From surveillance archive to predictive self

The project is based on an earlier artistic work called Data Biography, developed in 2017. In that experiment, the authors installed commercial spyware on their own smartphones for a full year. The surveillance software, commonly marketed for parental or employee monitoring, logged nearly every digital trace generated through daily life, including GPS locations, messages, calls, browsing activity, and social media interactions.

Rather than conceal this surveillance, the artists made it visible. They converted the collected data into 365 printed volumes, each representing a single day of digital activity. The installation functioned as a physical archive of behavioural traces, transforming private metadata into a sculptural record of identity shaped by data flows.

Five years later, the team reversed the temporal direction of the archive. Instead of documenting the past, they asked whether the same data could be used to predict a future. The result was Machine Biography, a second installation consisting of 365 new books, each representing a projected day in the year 2050.

The choice of 2050 was deliberate. It is the benchmark year for global climate neutrality targets, energy transition roadmaps, biodiversity frameworks, and demographic projections. By situating the speculative biography at this mid-century horizon, the project ties personal prediction to planetary uncertainty. The imagined life unfolds within a world shaped by climate change, urbanization, population shifts, and political adaptation.

The research traces the intellectual roots of predictive identity to a long history of determinism. From Newtonian physics and Laplace’s belief in total foresight, to cybernetics and behaviourism, Western science has repeatedly sought to eliminate uncertainty through calculation. Today’s machine learning systems revive that ambition in digital form. Massive datasets, behavioural tracking, and probabilistic models promise to anticipate actions, preferences, and risks in real time.

In this context, the Digital Human Twin emerges as a new paradigm. Originally developed in aerospace and manufacturing to simulate physical systems, digital twins are now applied to the human body and behaviour. In healthcare, they are proposed as tools for personalized medicine. In commercial settings, they power recommendation engines and targeted advertising. In governance, they influence decision-making systems that classify and manage populations.

The study argues that these digital doubles are no longer passive representations. They function as operational entities that shape how individuals are categorized, evaluated, and guided. Identity becomes less a narrative constructed by the subject and more an inference generated by algorithms.

Building a narrative twin with open-source AI

To create the speculative biography, the researchers relied on a combination of open-source machine learning tools rather than proprietary corporate systems. This choice was both practical and political. Open-weight models allow inspection, modification, and reproducibility, in contrast to commercial AI platforms that restrict access through application programming interfaces and closed infrastructures.

The predictive pipeline began with the original 2017 dataset, organized into categories such as search history, tweets, emails, GPS locations, WhatsApp messages, browser logs, and images. A multivariate, multi-step time series forecasting model based on convolutional long short-term memory networks was trained to predict when events might occur in the projected year and what type each would be.

These predicted timestamps and event categories formed the skeleton of the future diary.
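To make the forecasting setup concrete, the sketch below shows one plausible way to shape logged events into a multivariate series and cut it into input and target windows for multi-step prediction. It is an illustration only, not the researchers' code: the category names, the daily granularity, the window sizes, and the helper functions `daily_counts` and `make_windows` are all assumptions, and the forecasting model itself is left out.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical category names, loosely following the data types listed above.
CATEGORIES = ["search", "tweet", "email", "gps", "whatsapp", "browser", "image"]


def daily_counts(events, start, n_days):
    """Turn (date, category) events into one count vector per day."""
    counts = Counter(events)
    return [
        [counts[(start + timedelta(days=d), c)] for c in CATEGORIES]
        for d in range(n_days)
    ]


def make_windows(series, n_in, n_out):
    """Slice a multivariate series into (input window, target horizon) pairs."""
    pairs = []
    for i in range(len(series) - n_in - n_out + 1):
        past = series[i:i + n_in]                    # what the model sees
        future = series[i + n_in:i + n_in + n_out]   # what it should predict
        pairs.append((past, future))
    return pairs


# Toy data standing in for a year of logged events (assumed, not real).
start = date(2017, 1, 1)
events = [
    (date(2017, 1, 1), "gps"),
    (date(2017, 1, 1), "gps"),
    (date(2017, 1, 2), "email"),
    (date(2017, 1, 4), "tweet"),
]

series = daily_counts(events, start, n_days=5)
windows = make_windows(series, n_in=3, n_out=2)
```

Each pair holds several past days across all categories (multivariate) and several future days to predict at once (multi-step). A recurrent model such as the one described would be trained on pairs of this kind.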

For text generation, the team fine-tuned GPT-2 in Spanish using personal conversations and a large corpus of thematically filtered tweets related to issues projected to define mid-century life, including climate change, population growth, biodiversity loss, and energy dependence. The language model was modified to engage in a self-conversational loop, reintroducing its previous output as a prompt to create continuity across entries.
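The self-conversational loop can be pictured with a short sketch. This is a simplified illustration under stated assumptions, not the study's implementation: `generate` stands for any text generator (in the project, a fine-tuned GPT-2), `toy_generate` is a hypothetical stand-in so the example runs without a model, and the `context_chars` cut-off is an invented parameter.

```python
from typing import Callable, List


def self_conversation(
    generate: Callable[[str], str],
    seed: str,
    n_entries: int,
    context_chars: int = 200,
) -> List[str]:
    """Generate entries one by one, feeding each output back in as the next prompt."""
    entries = []
    prompt = seed
    for _ in range(n_entries):
        text = generate(prompt)
        entries.append(text)
        # Reuse the tail of the latest output as the next prompt for continuity.
        prompt = text[-context_chars:]
    return entries


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for a language model: echoes the prompt's last word."""
    words = prompt.split() or ["nothing"]
    return f"Today I thought again about {words[-1]} and the weather"


diary = self_conversation(toy_generate, seed="Dear diary, the sea", n_entries=3)
```

Because each entry is conditioned on the one before it, themes carry over from day to day, which is the continuity effect the article describes. With a real model the same loop applies, only `generate` changes.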

The result was a set of first-person diary fragments that feel intimate yet algorithmically assembled. The texts describe conversations, reflections, and events that never occurred but are statistically plausible given the behavioural data.

Image generation complemented the textual narrative. DALL·E Mini produced speculative scenes based on generated prompts. StyleGAN2 simulated age progression, creating portraits of how the subject might appear decades later. CycleGAN and contrastive unpaired translation models transformed past images into retro-futuristic environments. Fast-SRGAN enhanced image resolution and clarity.

The final installation compiled these texts and images into 365 black-covered volumes, mirroring the earlier white-covered archive of the past. Together, the two projects form a diptych: one documenting data traces, the other forecasting data-driven futures.

The researchers emphasize that the goal was not to test predictive accuracy. Instead, the project probes the narrative and cultural implications of machine forecasting. The digital twin becomes a storytelling engine, assembling probabilistic fragments into a coherent yet unstable autobiographical voice.

This shift from representation to performance is central to the argument. The predictive self is not a faithful copy of a person. It is a performative construct generated through classification systems, model architectures, and institutional assumptions embedded in data practices.

Ethical fault lines in algorithmic identity

The study also sheds light on ethical and epistemological concerns. As predictive systems are integrated into healthcare, education, policing, and financial services, digital twins may influence access to resources and opportunities. Forecasts can become prescriptions. Risk scores can shape interventions. Algorithmic categorizations can harden into governance mechanisms.

The research highlights that data-driven prediction is never neutral. Choices about which data to collect, how to label it, and which outcomes to optimize embed values and power relations into computational systems. Biases present in training data can be amplified through automated inference. Transparency is often limited, particularly when proprietary models dominate the AI landscape.

First published in: Devdiscourse