Why AI still struggles to build real-world logistics models without human help


CO-EDP, VisionRI | Updated: 31-03-2026 18:26 IST | Created: 31-03-2026 18:26 IST

New research suggests that while generative AI tools show strong potential in the logistics sector, they remain far from replacing human expertise in real-world logistics modeling.

A new study examines this gap, testing how effectively large language models can support logistics simulation. Published in Applied Sciences under the title “Exploring Artificial Intelligence as a Tool for Logistics Process Simulation,” the research evaluates the performance of generative AI systems in building and analyzing complex manufacturing simulations.

While AI can significantly speed up simulation development, it struggles with autonomous model creation, frequently produces errors, and requires continuous human oversight to ensure accuracy. The study ultimately positions AI not as a replacement for engineers but as a powerful assistant that reshapes how simulation work is carried out.

AI shows promise but falls short of autonomous simulation modeling

The research focuses on a real industrial case: a complex manufacturing process that produces construction materials through interconnected operations such as milling, mixing, drying, firing, and packaging. The system spans multiple stages, incorporates material losses, and operates under variable shift schedules, making it a demanding benchmark for simulation tools.

To evaluate AI capabilities, the researchers tested two prominent large language models, Perplexity and ChatGPT, across three scenarios: fully autonomous model creation, output estimation from given inputs, and a human–AI “copilot” approach where AI assists users step by step.

The first experiment reveals a fundamental limitation. Although both models claim to be capable of creating simulation models, neither could build one directly within ExtendSim, a widely used desktop simulation platform. The barrier lies in the absence of application programming interfaces (APIs) that would allow the AI to interact with the software environment.

Instead, the models could only generate conceptual descriptions or code snippets, often in programming languages like Python. This highlights a structural mismatch between modern AI systems and traditional simulation tools, which rely on graphical interfaces and spatial logic rather than purely sequential code.
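A code snippet of the kind the models produce instead might resemble the following minimal Python sketch of a sequential production line with per-stage material losses. The stage loss fractions and tonnage figures are illustrative assumptions, not data from the study:

```python
# Hypothetical sketch of the kind of code an LLM might emit instead of an
# ExtendSim model: a sequential production line with per-stage material loss.
# Stage names follow the article; loss fractions are illustrative assumptions.

STAGES = [
    ("milling",   0.02),  # 2% of material lost during milling
    ("mixing",    0.01),
    ("drying",    0.05),
    ("firing",    0.03),
    ("packaging", 0.00),
]

def simulate_throughput(raw_input_tons: float) -> float:
    """Propagate material through each stage, applying its loss fraction."""
    material = raw_input_tons
    for name, loss in STAGES:
        material *= (1.0 - loss)
    return material

if __name__ == "__main__":
    out = simulate_throughput(100.0)
    print(f"Finished output: {out:.2f} tons from 100.00 tons of input")
```

Note what such a snippet omits: queues, shift calendars, and parallel resource contention, which is exactly why purely sequential code falls short of a graphical discrete-event model.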

The study identifies this limitation as a critical bottleneck for AI adoption in logistics modeling. Without integration into simulation platforms, generative AI cannot independently construct models, restricting its role to indirect assistance rather than full automation.

Accuracy improves with human guidance but errors persist

The second phase of the study tests whether AI can calculate simulation outputs based on detailed process descriptions and numerical inputs. Here, the models demonstrate partial success but only after extensive prompt refinement.

Initial outputs show significant inaccuracies. One model produced results that deviated by more than 300 percent from the benchmark, while another underestimated outputs by over 80 percent. These errors stem from misinterpretation of system dynamics, incorrect assumptions about material flows, and difficulty handling time-based processes.

Through iterative prompting, where researchers clarified inputs and corrected misunderstandings, the models gradually improved. The most accurate result achieved near-perfect alignment with the benchmark, differing by just 0.1 percent. However, this level of accuracy required multiple rounds of human intervention.
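The deviation figures quoted here correspond to a simple relative-error calculation against the benchmark. A sketch, where the benchmark and AI-estimated values are illustrative placeholders rather than the study's actual numbers:

```python
def percent_deviation(ai_value: float, benchmark: float) -> float:
    """Relative deviation of an AI-estimated output from the benchmark, in percent."""
    return abs(ai_value - benchmark) / benchmark * 100.0

# Illustrative placeholders only: a benchmark of 1000 units and an AI estimate
# of 1001 units yield a 0.1 percent deviation, matching the study's best result.
print(percent_deviation(1001.0, 1000.0))
```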

The findings underscore a key limitation: AI systems do not reliably interpret complex logistics processes on their own. They often request information that has already been provided, misapply parameters, or simplify systems in ways that distort outcomes.

The study also highlights deeper technical challenges. Large language models are trained to process sequential text, while logistics simulations depend on interconnected systems with parallel processes and feedback loops. This mismatch leads to logical breakdowns and incomplete representations of real-world operations.

Consequently, accurate simulation outputs depend heavily on human guidance. Without iterative correction, AI-generated results remain unreliable, limiting their standalone utility in industrial settings.

Copilot approach emerges as most effective use of AI

The third experiment identifies a more practical role for generative AI in logistics simulation: acting as a copilot that guides users through model creation. In this approach, AI provides detailed, step-by-step instructions while human users build the model within the simulation software. This method proves significantly more effective than autonomous generation. Although the initial AI-generated instructions contain numerous errors, iterative refinement allows users to correct issues and gradually construct a functional model.

The study documents a wide range of errors in early AI outputs, including incorrect block configurations, missing connections, logical inconsistencies, and hallucinated components that do not exist in the software. A substantial share of these errors, roughly one-third in some cases, is attributed to hallucinations, where the AI generates plausible but invalid solutions.

Despite these challenges, the copilot approach ultimately produces a validated simulation model with outputs closely matching the benchmark. The final model achieves production levels nearly identical to the original system, demonstrating that AI can support high-quality results when combined with human expertise.

One of the most significant advantages of this approach is efficiency. Traditional simulation model development in the case study requires several days of work, including system analysis, parameter identification, and iterative testing. With AI assistance, the entire process is reduced to less than a single working day.

This reduction in development time highlights the transformative potential of AI in logistics workflows. By automating repetitive tasks and providing targeted guidance, AI enables engineers to focus on higher-level decision-making and validation.

However, the study emphasizes that this efficiency gain comes with caveats. Even experienced users must carefully verify AI-generated instructions, as unresolved errors can lead to inaccurate models. For less experienced users, the risk of hidden mistakes is even greater, potentially compromising the reliability of simulation results.

Industry implications and future directions

The study suggests that current AI systems are best suited for augmenting human capabilities rather than replacing them. Their strength lies in accelerating workflows, generating ideas, and supporting iterative development, while their weaknesses include limited contextual understanding, susceptibility to hallucinations, and dependence on structured input.

To address these limitations, the researchers point to several future directions. One key area is the development of retrieval-augmented systems that incorporate domain-specific knowledge, reducing hallucinations and improving accuracy. Another is the integration of AI with simulation software through APIs, enabling direct interaction with modeling environments.

The study also calls for broader evaluation across different platforms and use cases, as the effectiveness of AI may vary depending on system complexity and industry context. Additionally, improvements in prompt engineering and long-term monitoring of AI performance are identified as essential for reliable deployment.

First published in: Devdiscourse