AI revolutionizes solar forecasting with transparent federated learning
A new artificial intelligence framework that promises greater transparency and improved prediction accuracy in solar energy forecasting has been unveiled by researchers in South Korea. The peer-reviewed study, “Explainable Clustered Federated Learning for Solar Energy Forecasting”, published in Energies (2025), introduces a novel approach called Explainable Clustered Federated Learning (XCFL), addressing core challenges in both data privacy and AI interpretability in the energy sector.
The research addresses two key questions: how can machine learning models forecast solar power more accurately across decentralized, data-sensitive environments, and how can these models be made interpretable enough for operators to trust and act on their predictions?
The XCFL framework was developed to overcome two limitations of conventional federated learning systems: their difficulty handling heterogeneous datasets and their black-box nature. The researchers enhance classical federated learning with a clustering mechanism that groups clients by similar data patterns, whether geographic or environmental, and then apply explainable AI techniques, specifically SHAP (SHapley Additive exPlanations), to interpret the contributions of individual features within each cluster.
Traditional federated learning aggregates models from various decentralized clients but implicitly assumes that their data are similarly distributed, which leads to suboptimal performance when clients differ in environmental or technical conditions. XCFL counters this in three steps: it first clusters clients using Mean Shift on selected features, then trains localized models within each cluster, and finally applies SHAP to quantify the importance of features such as solar radiation, humidity, snow depth, and air pressure for the forecast. The clustering ensures that models trained in one region reflect the meteorological reality of that region, while the SHAP values provide an explainable trail from input to output.
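To make the clustering step concrete, here is a minimal sketch of Mean Shift with a flat kernel in plain Python. It is an illustration only, not the study's implementation: the client names, the two scaled features per client, and the bandwidth value are all assumptions chosen for the example.

```python
import math

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel Mean Shift: shift each estimate to the mean of nearby points until it settles."""
    modes = [list(p) for p in points]
    for _ in range(max_iter):
        max_move = 0.0
        new_modes = []
        for m in modes:
            # Neighbours are the original points within `bandwidth` of the current estimate.
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            centre = [sum(col) / len(near) for col in zip(*near)]
            max_move = max(max_move, math.dist(m, centre))
            new_modes.append(centre)
        modes = new_modes
        if max_move < tol:
            break
    # Estimates that converged to nearly the same place share a cluster label.
    centres, labels = [], []
    for m in modes:
        for idx, c in enumerate(centres):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels, centres

# Hypothetical clients described by two scaled features (e.g. mean irradiance, mean temperature).
clients = {
    "plant_a": (0.20, 0.30), "plant_b": (0.25, 0.35), "plant_c": (0.22, 0.28),
    "plant_d": (0.80, 0.75), "plant_e": (0.85, 0.80), "plant_f": (0.78, 0.82),
}
labels, centres = mean_shift(list(clients.values()), bandwidth=0.3)
cluster_of = dict(zip(clients, labels))
print(cluster_of)
```

Mean Shift does not take the number of clusters as an input; the bandwidth determines how many groups emerge, which is why it is the parameter to tune in a sketch like this.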
The weighted aggregation mechanism in XCFL relies on SHAP values to give higher influence to clients and clusters with stronger data relevance. Unlike FedAvg, whose weights reflect at most the size of each client's local dataset and not the relevance of its data, XCFL gives model updates from clients with more informative features greater weight in the global model. This integration directly addresses the question of how to balance model accuracy and interpretability in decentralized systems without compromising privacy.
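The difference between the two weighting schemes can be sketched in a few lines. The article does not give XCFL's exact weighting formula, so the relevance score used here (the sum of a client's mean absolute SHAP values) is an assumption, as are the client parameters and sample counts.

```python
def weighted_average(updates, weights):
    """Average parameter vectors using non-negative weights, normalised by their sum."""
    total = sum(weights)
    dim = len(updates[0])
    return [sum(w * u[i] for u, w in zip(updates, weights)) / total for i in range(dim)]

# Hypothetical clients: model parameters, local sample count, mean |SHAP| per feature.
clients = [
    {"params": [1.0, 0.0], "n_samples": 300, "mean_abs_shap": [0.5, 0.5]},
    {"params": [0.0, 1.0], "n_samples": 100, "mean_abs_shap": [2.0, 1.0]},
]
params = [c["params"] for c in clients]

# FedAvg-style weighting: proportional to local sample count.
fedavg_model = weighted_average(params, [c["n_samples"] for c in clients])

# SHAP-style weighting (assumed score): proportional to total mean |SHAP|.
shap_model = weighted_average(params, [sum(c["mean_abs_shap"]) for c in clients])

print("FedAvg:       ", fedavg_model)
print("SHAP-weighted:", shap_model)
```

In this toy case the second client holds fewer samples but more informative features, so it dominates the SHAP-weighted average while the first client dominates the sample-count average.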
Using the German Solar Farm dataset, which includes 21 solar plants across Germany and historical meteorological data spanning 990 days, the study demonstrates that XCFL significantly outperforms FedAvg and centralized CNN and RNN models in all key metrics. For instance, XCFL using CNN achieved an RMSE of 0.38, MAE of 0.27, and an R² score of 0.92, compared to FedAvg’s RMSE of 0.44 and R² of 0.87. In RNN setups, XCFL maintained the lead with an RMSE of 0.41 and R² score of 0.90, confirming its robustness across architectures.
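For readers unfamiliar with the three metrics, the sketch below computes them from their standard definitions. The target and prediction values are made up for illustration and are unrelated to the study's data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual variance over total variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only.
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.0, 1.0, 2.0, 4.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2_score(y_true, y_pred))
```

Lower RMSE and MAE are better, while an R² closer to 1 means the model explains more of the variation in the target.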
The researchers also explore how many clusters offer the best trade-off between generalization and specificity. They report that five clusters provide the best performance metrics (RMSE 0.0709, MAE 0.0271, R² 0.4159), while too few clusters underrepresent heterogeneity and too many lead to data fragmentation and weak generalization. The integration of SHAP into both local and global aggregation ensures that the model’s architecture is semantically informed, not just statistically optimized, enabling transparency in every training round.
A major practical advantage of XCFL is its ability to maintain efficiency despite the computational overhead of SHAP. Because aggregation is conducted at the cluster level rather than the client level, communication costs are reduced and network stability is better maintained. This design choice responds directly to the operational challenge of deploying federated learning in real-world solar power systems, where data transmission and processing resources may be constrained.
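A rough sketch of two-tier aggregation shows where the saving comes from: client updates are combined inside each cluster first, so only one model per cluster reaches the global step. The client names, the round-robin cluster assignment, and the plain unweighted mean are placeholders assumed for the example; only the counts of 21 plants and five clusters come from the figures reported in the article.

```python
from collections import defaultdict

def mean_vector(vectors):
    """Element-wise mean of equally sized parameter vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def two_tier_round(client_updates, cluster_of):
    """Aggregate within clusters first, then across clusters."""
    by_cluster = defaultdict(list)
    for client, update in client_updates.items():
        by_cluster[cluster_of[client]].append(update)
    cluster_models = {c: mean_vector(us) for c, us in by_cluster.items()}
    global_model = mean_vector(list(cluster_models.values()))
    # Number of models that take part in the global aggregation step.
    return cluster_models, global_model, len(cluster_models)

# 21 hypothetical clients assigned round-robin to 5 clusters (assignment is made up).
names = [f"plant_{i:02d}" for i in range(21)]
cluster_of = {name: i % 5 for i, name in enumerate(names)}
client_updates = {name: [float(i), 1.0] for i, name in enumerate(names)}

cluster_models, global_model, uploads = two_tier_round(client_updates, cluster_of)
print(f"{len(client_updates)} client updates, {uploads} models in the global step")
```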
Another central contribution of the paper lies in its handling of interpretability at multiple levels: client, cluster, and global. SHAP not only provides a ranked list of feature contributions but also helps identify sensor anomalies or redundancies in data collection. For instance, solar radiation consistently emerges as the most influential feature across all clusters, validating its physical relevance to solar PV output. Meanwhile, features like relative humidity at specific altitudes display minor negative contributions, aiding operational diagnostics.
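The idea behind such a ranking can be illustrated with exact Shapley values on a toy model. This sketch enumerates every feature coalition and replaces absent features with a baseline value; it does not use the SHAP library, which relies on faster approximations. The linear model, its coefficients, the input, and the baseline are all invented for the example and are not results from the study.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Toy forecaster with made-up coefficients (not taken from the study).
names = ["solar_radiation", "relative_humidity", "air_pressure"]

def predict(v):
    return 0.8 * v[0] - 0.1 * v[1] + 0.05 * v[2]

x = [0.9, 0.8, 0.5]          # hypothetical scaled observation
baseline = [0.4, 0.3, 0.5]   # hypothetical reference input (e.g. feature means)

phi = shapley_values(predict, x, baseline)
ranking = sorted(zip(names, phi), key=lambda kv: abs(kv[1]), reverse=True)
for name, value in ranking:
    print(f"{name:>18}: {value:+.3f}")
```

The contributions always sum to the difference between the prediction for the observation and the prediction for the baseline, which is what makes a per-feature breakdown of a forecast possible.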
By visualizing these SHAP values in waterfall plots and heatmaps, the study offers an intuitive view of what drives the model's decisions in real time. This interpretability fulfills a key trust requirement in critical infrastructure applications like energy, where system operators must understand and validate the reasoning behind algorithmic forecasts.
The implications of XCFL extend beyond solar forecasting. Its architecture is scalable to other renewable energy sources, smart grids, and IoT-driven forecasting tasks in healthcare or finance, where decentralized data and transparency are equally critical. The authors highlight future work involving real-time dynamic clustering, additional XAI tools beyond SHAP, and broader application domains, aiming to further enhance the adaptability of the model in fast-changing environments.
- FIRST PUBLISHED IN:
- Devdiscourse

