Smart factories need AI maintenance tools that explain, not just predict
Artificial intelligence is rapidly changing how industries detect equipment failures, schedule maintenance and prevent costly downtime, but much of the research still falls short of real-world factory deployment, according to a systematic review. The authors found that AI-enabled predictive maintenance has become a fast-growing area of Industry 4.0 research, driven by rising sensor adoption, Internet of Things networks and the need to turn industrial data into reliable maintenance decisions.
The study, titled The Bridge Between Artificial Intelligence and Predictive Maintenance in Industry 4.0: A Systematic Review and published in Applied Sciences, uses a PRISMA-based methodology to examine research growth, industrial domains, AI algorithms, predictive outputs and performance metrics in AI-supported predictive maintenance.
AI predictive maintenance moves from niche research to Industry 4.0 priority
Predictive maintenance is one of the clearest examples of how Industry 4.0 technologies are reshaping industrial operations. Instead of repairing machines after breakdowns or replacing parts on fixed schedules, predictive maintenance uses repeated monitoring of asset conditions to detect degradation early and determine the right time for intervention. In factories, energy systems, transport networks, maritime assets and other equipment-heavy environments, that shift can reduce unplanned downtime, cut maintenance costs and improve safety.
The review shows that research interest in AI-enabled predictive maintenance has grown sharply in recent years. The selected studies rose from a small base in 2017 to 11 articles in 2020, 24 in 2021, 29 in 2022 and 38 in 2023. The trend reflects the expansion of industrial sensors, wider availability of machine data and pressure on companies to convert raw operational data into actionable maintenance intelligence.
The authors link this growth to the broader Industry 4.0 ecosystem. Predictive maintenance now depends not only on algorithms but also on IoT sensor networks, cyber-physical systems, edge and cloud computing, digital twins, secure data infrastructures and interoperable platforms. These technologies allow machines to generate and transmit continuous streams of data, while AI models identify patterns that could indicate faults, degradation or approaching failure.
AI is now crucial to predictive maintenance research because industrial data volumes have grown beyond what manual analysis can reliably handle. Machine learning models can detect subtle relationships in vibration signals, temperature changes, pressure readings, electrical behavior and operational logs. These models can support tasks such as failure prediction, fault detection, anomaly recognition, fault classification and remaining useful life estimation.
However, the review also identifies a major gap between research activity and industrial maturity. Many studies rely on standardized benchmark datasets, simulated data or short-term controlled experiments rather than long-term, real-world industrial data. These datasets are valuable because they allow researchers to compare algorithms under common conditions, but they do not fully reflect the variability of actual industrial operations.
Real machinery operates under changing loads, weather, operator behavior, maintenance histories, aging patterns and environmental stress. A model that performs well on a controlled dataset may struggle when moved into a plant, ship, rail system, quarry, medical device fleet or energy infrastructure network. The authors warn that publication growth does not necessarily mean that the field has solved problems of robustness, interpretability or operational integration.
The review finds that manufacturing dominates the literature. Around 48% of the reviewed studies are classified within a broad, sector-unspecified Industry 4.0 context, while 36% focus on manufacturing. Other sectors, including robotics, stainless steel, oil and gas and industrial machinery, appear in much smaller shares, while food, maritime, medical and other domains remain underrepresented.
This imbalance is not accidental. Manufacturing environments often have better-instrumented machines, more accessible test rigs and structured datasets, which makes them easier settings for algorithm development and benchmarking. But it also means that AI predictive maintenance research is shaped heavily by data availability rather than by the full range of industrial risk. Sectors with long asset lifecycles, high safety demands and complex operating conditions remain less visible.
The maritime sector is one example cited in the study. Ships and marine systems face variable sea states, corrosion, fuel-quality differences, heavy-duty machinery and strict reliability demands. These conditions make predictive maintenance highly relevant, but the review finds limited representation of maritime and naval systems in current AI-PdM research. Similar concerns apply to medical, food and other safety-sensitive industries where failures can have severe operational or human consequences.
Machine learning dominates, but explainability lags behind accuracy
The review identifies more than 57 AI algorithms used in predictive maintenance research, but a smaller set of mature machine learning methods dominates the field. Random Forest, Support Vector Machine, Decision Tree and K-Nearest Neighbors are among the most frequently reported approaches. Their popularity reflects a preference for models that are established, relatively easy to implement and effective on structured industrial datasets.
Random Forest appears as the most common method, used in around 30% of the reviewed studies. The algorithm is favored because it can handle complex, high-dimensional industrial data and reduce overfitting by combining multiple decision trees. It also offers feature-importance rankings, which can help engineers understand which variables influence predictions. That makes it attractive in predictive maintenance, where users need not only accurate alerts but also some explanation of why a machine is considered at risk.
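The feature-importance ranking the review credits Random Forest with can be sketched in a few lines of scikit-learn. The sensor names and the synthetic failure rule below are purely illustrative assumptions, not drawn from the reviewed studies:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
# Synthetic condition-monitoring features (names are illustrative)
vibration = rng.normal(0, 1, n)
temperature = rng.normal(0, 1, n)
pressure = rng.normal(0, 1, n)
# Assume failures are driven mostly by vibration, weakly by temperature
failure = (0.9 * vibration + 0.3 * temperature + rng.normal(0, 0.5, n) > 1.5).astype(int)

X = np.column_stack([vibration, temperature, pressure])
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, failure)

# Rank the variables that drive the model's predictions
for name, score in zip(["vibration", "temperature", "pressure"], model.feature_importances_):
    print(f"{name}: {score:.2f}")
```

On data like this, vibration should dominate the ranking, giving engineers a first clue about which signal to inspect when an alert fires.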
Support Vector Machine follows closely, appearing in about 28% of the studies. It is widely used because it performs well in high-dimensional spaces and can classify faults even when sample sizes are moderate. It is especially common in fault diagnosis and vibration-based applications. Its limitations include sensitivity to parameter tuning and weaker scalability when datasets become very large, which can be a problem in IoT-heavy industrial environments.
Decision Tree models appear in about 24% of the studies and remain important because of their interpretability. A decision tree can produce visible rule-based logic, making it easier for engineers and maintenance teams to trace how a prediction was made. This is useful in regulated or safety-critical environments where black-box recommendations may not be accepted. The drawback is that decision trees can overfit noisy or limited datasets and may perform poorly in highly complex multi-fault scenarios.
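The traceable, rule-based logic described above can be made concrete with scikit-learn's `export_text`, which prints a fitted tree as readable if/else rules. The dataset and feature names here are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Small synthetic fault-classification dataset (purely illustrative)
X, y = make_classification(n_samples=500, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the rule-based logic an engineer or auditor could trace by hand
print(export_text(tree, feature_names=["vibration", "temperature", "load"]))
```

Capping `max_depth` keeps the printed rules short enough to audit, which is part of the trade-off the review notes: shallow trees are readable but may underfit complex multi-fault behavior.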
K-Nearest Neighbors appears in about 21% of the reviewed studies. Its value lies in simplicity and strong baseline performance when fault patterns are well separated in feature space. But it can be computationally expensive during inference and sensitive to distance metrics, scaling and the choice of parameters.
The review also notes the presence of deep learning methods, including long short-term memory networks, convolutional neural networks, artificial neural networks and multilayer perceptrons. These methods are useful for complex temporal data, high-dimensional sensor streams and pattern recognition. They can capture non-linear relationships that traditional models may miss. But they often require larger datasets, greater computing resources and stronger explainability mechanisms before they can be trusted in industrial decision-making.
This creates a major tension in AI predictive maintenance. Many models can achieve high predictive accuracy, but fewer provide transparent, auditable explanations. In real industrial settings, a prediction is only useful if operators, engineers and managers can understand it, trust it and act on it. A fault warning that cannot be explained may be ignored. A false alarm that disrupts production may reduce confidence in the system. A missed failure can cause downtime, damage or safety risk.
The review finds that explainable AI remains insufficiently used across the literature. Even when interpretable models are selected, many studies do not deeply explain model behavior or connect predictions to asset physics. Few studies systematically integrate explainability tools that show why certain conditions indicate failure progression. This limits trust, regulatory acceptance and operational use.
The authors urge future research to integrate traditional machine learning with explainable AI frameworks. Such approaches could help maintenance teams understand which variables, operating conditions or sensor patterns drive predictions. In predictive maintenance, explanation is not a secondary feature. It is part of deployment readiness, especially when models influence maintenance schedules, spare-parts planning, safety inspections and production decisions.
The study further highlights a gradual shift toward more advanced architectures, including hybrid models that combine data-driven learning with physics-based or knowledge-based reasoning. Digital twins, knowledge graphs, graph-based learning and AIoT systems are emerging as important tools for connecting predictive models with real-time data flows and asset-level decision support. These approaches may help move the field beyond isolated algorithm testing toward system-level industrial integration.
Research still measures model accuracy more than industrial value
The review finds that most AI predictive maintenance studies focus on three major output types: failure prediction, fault detection and remaining useful life estimation. Failure prediction accounts for about 32% of reviewed studies, fault detection for 22% and remaining useful life estimation for 16%. These outputs reflect a strong focus on forecasting failure events, detecting abnormal behavior and estimating how much useful operating time remains before an asset fails.
Failure prediction is one of the most operationally important outputs because it allows companies to intervene before breakdowns occur. It is used across industrial systems to anticipate potential failure conditions from operational data patterns. In real-world settings, this can help maintenance teams plan repairs, avoid emergency shutdowns and reduce production losses.
Fault detection serves as an early warning system. It identifies deviations from normal operating behavior, often in real time. This is particularly important when labeled failure data are scarce, which is common in industry because serious failures may be rare, expensive or unsafe to reproduce. Fault detection can support condition-based maintenance by flagging abnormal vibration, temperature, pressure or process behavior before a major breakdown.
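When labeled failures are scarce, fault detection is often framed as unsupervised anomaly detection: fit a model on normal operation only and flag deviations. A minimal sketch with scikit-learn's IsolationForest, where the sensor ranges and contamination rate are assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Normal operation: vibration and temperature near an assumed baseline
normal = rng.normal(loc=[0.5, 60.0], scale=[0.05, 2.0], size=(500, 2))
# A few abnormal readings: high vibration, elevated temperature
abnormal = rng.normal(loc=[1.5, 85.0], scale=[0.1, 3.0], size=(5, 2))

# Train on normal data only; no failure labels needed
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# predict() returns -1 for anomalies, +1 for normal points
print(detector.predict(np.vstack([normal[:3], abnormal])))
```

Because the detector never sees a failure during training, it can flag novel fault signatures, which is exactly the property that matters when serious failures are too rare or unsafe to reproduce.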
Remaining useful life estimation goes a step further by estimating how long an asset can continue operating before failure. This output is valuable because it gives maintenance teams a time horizon for action. If accurate and trusted, it can support repair scheduling, spare-parts procurement, workforce planning and production continuity.
Many studies stop at prediction and do not fully address how predictions are turned into decisions. A model may estimate a failure risk or remaining useful life, but maintenance teams still need to decide whether to stop a machine, inspect it, order a part, adjust production or continue monitoring. That decision depends on downtime costs, safety risks, spare-parts availability, production schedules and uncertainty around the model's output.
This is where the literature remains weak. The most common performance metrics are accuracy, recall, precision, F1-score and root mean square error. Accuracy appears in 40% of the studies, recall in 32%, precision in 30%, F1-score in 25% and RMSE in 23%. These metrics are useful for comparing algorithms, but they do not fully capture industrial value.
Accuracy can be misleading when failures are rare. A model may appear accurate because it correctly predicts normal operation most of the time, while still missing rare but critical failures. Precision helps measure false alarms, while recall helps measure missed failures. F1-score balances the two. RMSE is commonly used for remaining useful life estimation because it measures the size of prediction errors in continuous outputs.
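The pitfall above can be shown numerically: on rare failures, a trivial model that always predicts "no failure" scores high accuracy while missing every failure, which is why recall, F1 and (for continuous outputs such as remaining useful life) RMSE are reported alongside it. The numbers below are contrived for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, f1_score

# 100 machine-days, 4 true failures (1), the rest normal (0)
y_true = [1] * 4 + [0] * 96
# A trivial model that always predicts "no failure"
y_pred = [0] * 100

print("accuracy:", accuracy_score(y_true, y_pred))  # 0.96 despite missing every failure
print("recall:  ", recall_score(y_true, y_pred, zero_division=0))   # 0.0
print("f1:      ", f1_score(y_true, y_pred, zero_division=0))       # 0.0

# RMSE for a continuous output such as remaining useful life (hours, assumed values)
rul_true = np.array([120.0, 80.0, 30.0])
rul_pred = np.array([100.0, 90.0, 45.0])
rmse = np.sqrt(np.mean((rul_true - rul_pred) ** 2))
print("RMSE:    ", rmse)
```

The gap between 0.96 accuracy and 0.0 recall is the whole argument in miniature: the metric a study reports determines whether a useless model looks good.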
These metrics are standard in machine learning, but the review finds that they are often disconnected from business and operational goals. Fewer than 5% of the reviewed studies address objectives directly tied to operational outcomes such as efficiency improvement or reducing mean time to repair. Metrics such as downtime avoided, maintenance cost savings, system availability, operational risk, energy impact and mean time between failures are rarely quantified.
This creates a translational gap. A model can perform well in a research setting while offering limited practical value if it does not help reduce costs, improve availability, prevent downtime or support safer interventions. For industrial stakeholders, predictive maintenance is not about accuracy alone. It is about making better maintenance decisions at the right time.
The review also places this issue in the context of Industry 5.0, which focuses on human-centricity, resilience and sustainability. Most AI-based predictive maintenance research remains aligned with Industry 4.0 priorities such as automation, efficiency and predictive accuracy. Fewer studies address human-in-the-loop decision-making, explainability, worker trust, resilience or environmental outcomes.
In Industry 5.0, maintenance systems should not simply automate decisions; they must support human expertise, improve resilience and align with broader operational and social goals. This means AI models need to be transparent, reliable and designed for real users, not only optimized for benchmark scores.
The research points to several urgent priorities, including:
- AI predictive maintenance models need long-term validation in real industrial settings. Multi-year datasets across assets, seasons and maintenance cycles would help researchers understand drift, recalibration needs and explanation stability.
- The field needs broader sectoral coverage, especially in underrepresented domains such as maritime, medical, food, energy and other asset-intensive sectors.
- Explainable AI should become a core part of predictive maintenance model design rather than an optional add-on.
- Future models need dual-track evaluation, assessed with both traditional machine learning metrics and operational indicators. This would allow researchers and companies to see not only whether a model predicts correctly, but whether it improves maintenance planning, reduces downtime, lowers costs or increases safety.
First published in: Devdiscourse