On-demand information systems pave the way for smarter research data access
In an era when the volume of digital research data is growing exponentially, the challenge is not only to store it but to ensure its long-term accessibility, usability, and adaptability. Traditional Research Data Repositories (RDRs) follow the FAIR principles (Findable, Accessible, Interoperable, Reusable), yet they often fail to keep data reusable over time. Without dynamic, user-friendly access systems, archived data can become obsolete, leading to inefficient research workflows.
A recent study titled "Building Sustainable Information Systems and Transformer Models On Demand", authored by Thomas Asselborn, Sylvia Melzer, Simon Schiff, Magnus Bender, Florian Andreas Marwitz, Said Aljoumani, Stefan Thiemann, Konrad Hirschler, and Ralf Möller, published in Humanities and Social Sciences Communications (2025), presents a new approach to sustainable Research Data Management (RDM). The study proposes an on-demand system that integrates information systems, transformer models, and AI-driven chatbots to support the continuous, adaptive use of archived research data.
Revolutionizing research data management with on-demand systems
Research institutions and universities worldwide have been actively developing strategies for efficient RDM, yet a significant challenge remains: how to keep research data relevant and reusable over time. Many digital resources face obsolescence due to outdated software, lack of proper archiving methods, or incompatibility with evolving technological standards. While physical books can be preserved for centuries, digital research materials often risk losing their usability in just a few years.
The study introduces an on-demand information system (ISoD) that enables researchers to retrieve, visualize, and analyze archived data dynamically. The system goes beyond conventional data storage by offering hot and warm archiving methods, so that research data remains accessible and adaptable. By integrating metadata standards such as METS (Metadata Encoding and Transmission Standard) and automating the transformation of archived datasets into structured databases, ISoD aims to reduce redundancy, improve collaboration, and support interdisciplinary research.
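The article does not describe how ISoD actually performs the METS-to-database step. As a minimal sketch, assuming a METS document whose `fileSec` lists the archived files, the following Python uses only the standard library to load those file entries into a SQLite table. The function name `mets_to_sqlite`, the `files` table layout, and the sample document are hypothetical; only the METS element names (`fileSec`, `fileGrp`, `file`, `FLocat`) come from the METS standard.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace URIs: METS itself and XLink (used for file locations in <mets:FLocat>).
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Made-up METS fragment with two file groups (page scans and plain-text transcriptions).
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="MASTER">
      <mets:file ID="IMG_0001" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="scans/page_0001.tif"/>
      </mets:file>
      <mets:file ID="IMG_0002" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="scans/page_0002.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="FULLTEXT">
      <mets:file ID="TXT_0001" MIMETYPE="text/plain">
        <mets:FLocat LOCTYPE="URL" xlink:href="text/page_0001.txt"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def mets_to_sqlite(mets_xml: str, conn: sqlite3.Connection) -> int:
    """Copy every <mets:file> entry into a 'files' table and return the row count."""
    root = ET.fromstring(mets_xml)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "file_id TEXT PRIMARY KEY, file_group TEXT, mimetype TEXT, href TEXT)"
    )
    rows = []
    # Flat walk: fileSec -> fileGrp -> file (nested fileGrps are ignored in this sketch).
    for group in root.iterfind("mets:fileSec/mets:fileGrp", NS):
        for f in group.iterfind("mets:file", NS):
            loc = f.find("mets:FLocat", NS)
            href = loc.get(f"{{{NS['xlink']}}}href") if loc is not None else None
            rows.append((f.get("ID"), group.get("USE"), f.get("MIMETYPE"), href))
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(mets_to_sqlite(SAMPLE_METS, conn), "files loaded")
    query = "SELECT file_group, COUNT(*) FROM files GROUP BY file_group ORDER BY file_group"
    for group, count in conn.execute(query):
        print(group, count)
```

Once the entries are in a relational table, ordinary SQL queries (counts per file group, lookups by MIME type) become possible without re-parsing the archive, which is the kind of structured access the paragraph above describes.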
The ISoD process is designed to be scalable, adaptable, and responsive to changing research needs. By letting users request data visualization and analysis on demand, the system removes the bottleneck of static archiving and shortens the path from archived data to usable results. This model not only minimizes duplication of effort but also encourages collaborative research across disciplines, enabling researchers to share, refine, and expand upon existing datasets with ease.
Fine-tuning transformer models directly from research data repositories
A key advancement presented in the study is the ability to fine-tune transformer models directly from an RDR, without requiring extensive programming expertise. Transformer-based language models such as BERT (Bidirectional Encoder Representations from Transformers) have transformed natural language processing, yet fine-tuning them remains a complex task that often requires expertise in machine learning frameworks such as TensorFlow or PyTorch.
The study introduces a process called Fine-Tuning on Demand (FToD), which enables researchers to fine-tune transformer models directly within an RDR using pre-existing datasets and metadata annotations. This feature allows scholars, particularly those in the humanities, to train AI models on historical texts, ancient manuscripts, or domain-specific corpora, making the resulting models more precise and context-aware.
One of the most innovative aspects of FToD is its ability to automate model selection and hyperparameter tuning. Through a simple interface, users can choose datasets, specify model parameters, and trigger automated fine-tuning, reducing model training time from months to mere minutes. The study also emphasizes the importance of storing fine-tuned models within the RDR, ensuring that refined AI models remain accessible for future research and applications.
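The article does not say which search strategy FToD uses for model selection and hyperparameter tuning. As a hedged illustration, the sketch below runs a plain grid search over candidate base models and hyperparameter values and keeps the combination with the best validation score. `FineTuneRequest`, `expand_grid`, `run_request`, and `toy_train_and_score` are hypothetical names, and the toy scorer merely stands in for a real fine-tuning and evaluation run (for example one written with PyTorch or TensorFlow).

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


@dataclass
class FineTuneRequest:
    """Hypothetical job description a user might submit through an FToD-style form."""
    dataset_id: str                # annotated dataset stored in the RDR
    base_models: List[str]         # candidate pretrained checkpoints to compare
    search_space: Dict[str, List[float]] = field(default_factory=dict)


def expand_grid(space: Dict[str, List[float]]) -> List[Params]:
    """Every combination of values, e.g. {'lr': [1, 2], 'epochs': [3]} -> 2 dicts."""
    keys = sorted(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def run_request(
    req: FineTuneRequest,
    train_and_score: Callable[[str, str, Params], float],
) -> Tuple[str, Params, float]:
    """Grid search over models x hyperparameters; keep the best validation score."""
    best = None
    for model in req.base_models:
        for params in expand_grid(req.search_space):
            score = train_and_score(req.dataset_id, model, params)
            if best is None or score > best[2]:
                best = (model, params, score)
    if best is None:
        raise ValueError("no base models given")
    return best


def toy_train_and_score(dataset_id: str, model: str, params: Params) -> float:
    """Stand-in for real fine-tuning plus evaluation; returns an invented score."""
    bonus = 0.05 if model.endswith("large") else 0.0
    return 0.8 + bonus - abs(params["learning_rate"] - 3e-5) * 1000 + 0.01 * params["epochs"]


if __name__ == "__main__":
    req = FineTuneRequest(
        dataset_id="dataset-42",
        base_models=["model-base", "model-large"],
        search_space={"learning_rate": [2e-5, 3e-5, 5e-5], "epochs": [2, 3]},
    )
    model, params, score = run_request(req, toy_train_and_score)
    print(model, params, round(score, 3))
```

Grid search is the simplest exhaustive strategy and is used here only because it is easy to read; a production system might prefer random or Bayesian search when each training run is expensive.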
ChatHA: A humanities-aligned chatbot for context-specific AI responses
The study also introduces ChatHA (Humanities-Aligned Chatbot), an AI-powered chatbot specifically designed to interact with research data stored in RDRs. While generic AI chatbots like ChatGPT provide general responses, ChatHA is tailored to generate research-specific answers, complete with references and citations from the RDR.
A major issue with general-purpose AI chatbots is their tendency to generate incorrect or unverifiable information (AI hallucinations). ChatHA addresses this by grounding its responses in pre-existing research data, metadata annotations, and scholarly references. This is intended to keep responses factually accurate, relevant to the research context, and backed by citations, making ChatHA a more reliable tool for scholars in the humanities and social sciences.
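ChatHA's internals are not described in the article. A minimal sketch of the grounding idea, assuming a simple keyword-overlap retriever in place of whatever retrieval and language model ChatHA actually uses: the function answers only with text found in repository records, tags each passage with its record identifier, and declines to answer when no record matches well enough. `Record`, `grounded_answer`, the identifiers, and the sample texts are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Very small stop-word list so that filler words do not count as evidence.
STOPWORDS = {"a", "an", "and", "by", "the", "of", "in", "is", "was", "to",
             "what", "who", "where", "when"}


@dataclass
class Record:
    record_id: str   # hypothetical persistent identifier of an RDR item
    text: str


def tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric terms, minus stop words."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def grounded_answer(question: str, records: List[Record], k: int = 2, min_overlap: int = 2) -> str:
    """Answer only with record text that shares enough terms with the question, cited by ID."""
    q = tokens(question)
    scored = sorted(((len(q & tokens(r.text)), r) for r in records),
                    key=lambda pair: pair[0], reverse=True)
    hits = [r for score, r in scored[:k] if score >= min_overlap]
    if not hits:
        # Decline instead of guessing: the anti-hallucination behaviour being illustrated.
        return "No supporting record found in the repository."
    return " ".join(f"{r.text} [{r.record_id}]" for r in hits)


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    records = [
        Record("rdr:ms-001", "Manuscript A was copied in 1450 by a scribe in Damascus."),
        Record("rdr:ms-002", "The library catalogue lists 300 volumes."),
    ]
    print(grounded_answer("Where was manuscript A copied?", records))
    print(grounded_answer("Who funded the excavation?", records))
```

A real system would replace keyword overlap with a stronger retriever and let a language model phrase the answer, but the design choice shown here carries over: every returned sentence is traceable to a cited record, and an empty retrieval result yields a refusal rather than invented text.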
Furthermore, ChatHA integrates subjective content descriptions (SCDs), allowing researchers to retrieve personalized, context-aware responses. This feature is particularly useful for projects requiring annotated texts, manuscript analyses, and historical research, where subjective interpretations play a crucial role. The chatbot's interface allows users to interact with PDF archives, structured datasets, and research notes, making scholarly research more interactive and efficient.
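To make the notion of subjective content descriptions more concrete, here is a hedged sketch that assumes an SCD is simply an annotation tied to a character span of a document and to the person who wrote it; the article does not specify ChatHA's data model, so the `SCD` class and the `scds_for_passage` helper are hypothetical. Retrieving the SCDs that overlap a passage, optionally for one annotator, is one way a chatbot could attach personal or project-specific interpretations to an answer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SCD:
    doc_id: str
    start: int         # character offset where the annotated span begins
    end: int           # exclusive end offset
    author: str        # who made this (subjective) interpretation
    description: str


def scds_for_passage(scds: List[SCD], doc_id: str, start: int, end: int,
                     author: Optional[str] = None) -> List[SCD]:
    """Return SCDs whose span overlaps [start, end) in doc_id, optionally from one author."""
    return [
        s for s in scds
        if s.doc_id == doc_id
        and s.start < end and start < s.end          # half-open intervals overlap
        and (author is None or s.author == author)
    ]


if __name__ == "__main__":
    # Made-up annotations for illustration only.
    scds = [
        SCD("doc1", 0, 10, "alice", "Reads this line as a later marginal addition."),
        SCD("doc1", 20, 30, "bob", "Identifies the named scribe with a known copyist."),
        SCD("doc2", 0, 10, "alice", "Unrelated document."),
    ]
    for s in scds_for_passage(scds, "doc1", 5, 25):
        print(s.author, "->", s.description)
```

Using half-open intervals means a passage that ends exactly where an annotation begins does not count as overlapping it.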
By embedding ChatHA into an RDR, institutions can transform static repositories into dynamic research assistants, providing scholars with instant access to insights, interpretations, and references, without requiring manual searches or extensive query refinements.
Future of sustainable information systems and AI-driven research
The integration of on-demand information systems, fine-tuned transformer models, and AI chatbots represents a paradigm shift in sustainable RDM. This study provides a roadmap for enhancing the longevity, accessibility, and usability of research data, ensuring that datasets remain valuable and relevant long after their initial collection.
Looking ahead, several key advancements could further improve this framework:
- Automated Data Annotation: Using AI to automatically label and categorize research data for faster retrieval and better contextual understanding.
- Federated AI Training: Allowing multiple institutions to collaboratively train AI models without sharing sensitive data, helping to protect data privacy and security.
- Multimodal Integration: Expanding ISoD and ChatHA to support audio, video, and image-based research datasets, making it applicable to a wider range of disciplines.
- Blockchain-Based Provenance Tracking: Ensuring research integrity and transparency by securely recording every change made to datasets within the RDR.
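The provenance idea in the last item can be illustrated without a full blockchain. As a minimal sketch, assuming each change to a dataset is appended to a log whose entries store the SHA-256 hash of the previous entry, any later edit to an earlier record breaks the chain and is detected. This is an illustrative assumption rather than part of ISoD, it omits the distribution and consensus layer that would make it an actual blockchain, and `append_change` and `verify` are hypothetical names; timestamps are left out to keep the example deterministic.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry: Dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys) of the entry."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_change(log: List[Dict], dataset_id: str, change: str, author: str) -> Dict:
    """Record a change, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"dataset_id": dataset_id, "change": change, "author": author,
             "prev_hash": prev_hash}
    entry["hash"] = _digest(entry)   # computed before the 'hash' key is added
    log.append(entry)
    return entry


def verify(log: List[Dict]) -> bool:
    """Recompute every hash and link; False if any entry was altered or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log: List[Dict] = []
    append_change(log, "ds-1", "initial upload", "alice")
    append_change(log, "ds-1", "corrected metadata field", "bob")
    print("intact:", verify(log))
    log[0]["change"] = "tampered"
    print("after tampering:", verify(log))
```

Hash chaining makes tampering evident but does not by itself prevent it; whoever controls the whole log could rewrite every entry, which is the gap that replicating the log across institutions would address.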
The implications of this research extend far beyond academia. By democratizing access to AI tools, improving data longevity, and streamlining interdisciplinary collaborations, this study paves the way for a future where research data is not just archived but actively utilized, refined, and expanded upon.
In a world increasingly reliant on digital knowledge systems, the ability to build sustainable, AI-driven information systems on demand will be a defining factor in shaping the next generation of scholarly research and innovation.
First published in: Devdiscourse

