New machine learning model improves corn yield forecasts in drought-affected areas

CO-EDP, VisionRI | Updated: 22-03-2025 17:18 IST | Created: 22-03-2025 17:18 IST

A new artificial intelligence model developed by researchers at the University of Wisconsin–Madison has demonstrated improved accuracy in forecasting corn yields across the U.S. Corn Belt by integrating soil moisture and drought sensitivity into its predictive framework. The model, known as KGML-SM (Knowledge-Guided Machine Learning with Soil Moisture), outperformed existing machine learning approaches, particularly during drought years.

The findings, published in a paper titled "Knowledge-guided machine learning model with soil moisture for corn yield prediction under drought conditions" and available as a preprint on arXiv, address a long-standing gap in agricultural modeling: the inability of most models to accurately forecast yield under water-stressed conditions. By incorporating soil moisture as an intermediate variable and using a specialized drought-aware loss function, the KGML-SM framework achieves greater spatial and temporal precision in yield estimation.

Conventional models show limitations during drought

Accurate crop yield prediction is critical for global food security, economic planning, and insurance modeling. Traditional process-based (PB) models rely on biophysical equations and empirical data, but often fail to scale with high-dimensional remote sensing datasets. Pure machine learning (ML) models, while adept at processing large data volumes, tend to overfit or misrepresent crop behavior in extreme climate conditions due to lack of domain-specific knowledge.

The KGML-SM model integrates strengths from both approaches. It combines simulated and real-world data, links soil moisture directly to yield via an encoder-decoder architecture, and adapts to field-level environmental conditions. This structure improves predictive robustness without relying solely on observational data, which is often sparse or incomplete in real-world agricultural settings.

Central to the KGML-SM framework is the modeling of soil moisture as a predictive bridge between weather inputs and corn yield outcomes. The model uses a Weather-to-Soil (W2S) encoder, which translates historical weather and satellite-based vegetation indices into soil moisture estimates. These intermediate variables are then combined with other features to forecast yield at the county level.
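The two-stage structure described above, weather and vegetation inputs first mapped to soil moisture and soil moisture then feeding the yield estimate, can be illustrated with a toy sketch. This is not the researchers' neural network: the names `WeeklyInput`, `w2s_encoder`, `yield_decoder`, and `predict_yield`, and every coefficient below, are hypothetical and chosen only to make the data flow visible.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WeeklyInput:
    precip_mm: float  # weekly precipitation (illustrative unit)
    tmax_c: float     # weekly mean of daily maximum temperature
    ndvi: float       # satellite vegetation index, roughly 0..1


def w2s_encoder(weeks: List[WeeklyInput]) -> List[float]:
    """Toy stand-in for a Weather-to-Soil encoder.

    A hand-written "leaky bucket": rain recharges the soil, heat and
    plant cover draw it down. All coefficients are made up.
    """
    moisture = 0.5  # assumed starting soil moisture (fraction of capacity)
    series = []
    for w in weeks:
        recharge = 0.004 * w.precip_mm
        loss = 0.01 * max(w.tmax_c - 20.0, 0.0) + 0.05 * w.ndvi
        moisture = min(1.0, max(0.0, moisture + recharge - loss))
        series.append(moisture)
    return series


def yield_decoder(soil_moisture: List[float], weeks: List[WeeklyInput]) -> float:
    """Toy decoder: combine the intermediate soil moisture with another feature."""
    mean_sm = sum(soil_moisture) / len(soil_moisture)
    mean_ndvi = sum(w.ndvi for w in weeks) / len(weeks)
    return 60.0 + 120.0 * mean_sm + 80.0 * mean_ndvi  # arbitrary yield units


def predict_yield(weeks: List[WeeklyInput]) -> float:
    # Stage 1: weather + vegetation index -> soil moisture estimates.
    soil_moisture = w2s_encoder(weeks)
    # Stage 2: soil moisture + other features -> yield.
    return yield_decoder(soil_moisture, weeks)


wet_season = [WeeklyInput(30.0, 27.0, 0.7) for _ in range(10)]
dry_season = [WeeklyInput(2.0, 34.0, 0.4) for _ in range(10)]
wet_yield = predict_yield(wet_season)
dry_yield = predict_yield(dry_season)
```

In this sketch the dry season drains the intermediate soil moisture series toward zero, which in turn lowers the yield estimate. In a learned model both stages would be trained from data rather than written by hand.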

The soil moisture data were generated using the APSIM crop simulation model and calibrated with USDA National Agricultural Statistics Service (NASS) county-level records. This two-step approach allowed the researchers to capture both the physiological response of crops to weather and the practical variability observed in real-world production.

The KGML-SM model was tested across five recent growing seasons (2019–2023) in 12 states spanning the U.S. Corn Belt, including Iowa, Illinois, Indiana, Nebraska, and South Dakota. Its performance was benchmarked against four standard models: linear regression (LR), ridge regression (RR), random forest (RF), and multilayer perceptron (MLP).

The new model consistently achieved lower root mean square error (RMSE) and higher R² scores than the baseline methods. The most notable gains occurred in years with severe drought conditions, where traditional models tended to overestimate yields due to insufficient modeling of water stress. In contrast, KGML-SM’s drought-aware loss function penalized overprediction in dry areas, improving accuracy in high-risk zones.
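For readers unfamiliar with these metrics, the sketch below computes RMSE and R² and shows one way a loss function could penalize overprediction more heavily in dry areas. The article does not give the paper's actual loss formula, so `drought_aware_loss`, its `drought` severity input (assumed to lie between 0 and 1), and the `over_penalty` parameter are illustrative assumptions, not the authors' definition.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: typical size of a prediction error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1.0 means a perfect fit."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def drought_aware_loss(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    drought: Sequence[float],
    over_penalty: float = 2.0,
) -> float:
    """Hypothetical asymmetric squared error.

    Overpredictions (prediction above the observed yield) are weighted by
    1 + over_penalty * drought severity; underpredictions keep weight 1.
    """
    total = 0.0
    for t, p, d in zip(y_true, y_pred, drought):
        err = p - t
        weight = 1.0 + over_penalty * d if err > 0 else 1.0
        total += weight * err * err
    return total / len(y_true)
```

With these assumed settings, overpredicting a yield by 10 units in a county at full drought severity costs three times as much as the same error in a county with no drought, which pushes a model trained on such a loss away from optimistic forecasts in dry areas.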

Improved interpretability through attention mechanisms

In addition to predictive accuracy, the KGML-SM framework incorporates attention-based feature weighting, allowing the model to dynamically adjust the importance of input variables throughout the growing season. Soil moisture emerged as the dominant feature during July and August, corresponding with corn’s critical reproductive stages.
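Attention-based feature weighting of this kind is commonly built on a softmax: each input gets a score, the scores are normalized into weights that sum to one, and the weights decide how much each input contributes. The sketch below is a minimal illustration under that assumption. In a trained model the scores would be learned; here `july_scores` and the feature values are hard-coded, hypothetical numbers, and `attend` is not a function from the paper.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    features: Dict[str, float], scores: Dict[str, float]
) -> Tuple[Dict[str, float], float]:
    """Weight each feature by its softmax-normalized score.

    Returns the per-feature weights and the weighted sum of feature values.
    """
    names = list(features)
    weights = softmax([scores[name] for name in names])
    weight_by_name = dict(zip(names, weights))
    context = sum(weight_by_name[name] * features[name] for name in names)
    return weight_by_name, context


# Hypothetical normalized inputs for one county in July.
features = {"soil_moisture": 0.2, "ndvi": 0.6, "tmax": 0.8}
# Hypothetical scores; a real model would produce these from the data.
july_scores = {"soil_moisture": 2.0, "ndvi": 0.5, "tmax": 0.5}

july_weights, july_context = attend(features, july_scores)
```

Because the weights are explicit numbers, they can be plotted by month or by county, which is what makes this style of model easier to inspect than an unweighted network.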

Visualization of attention scores and prediction errors showed that the model allocated greater weight to soil moisture in drought-prone counties compared to normal ones. This adaptive mechanism contributed to localized performance gains and provided clearer explanations for model decisions, addressing common transparency challenges in deep learning.

The model’s attention to spatial heterogeneity also enhanced its performance across diverse geographic zones. Yield prediction errors were consistently lower in western Corn Belt states, where variability in rainfall and soil conditions poses a significant challenge for other models. The inclusion of simulated training data further reduced prediction volatility and increased resilience under anomalous conditions.

Model results included precise yield forecasts, attention maps, and comparative scatter plots that showed a tighter alignment between predicted and observed yields for KGML-SM than for traditional models. Unlike the linear regression model, which demonstrated a pattern of yield compression, KGML-SM captured yield variance across high- and low-performing counties.

Future applications and scalability

The research outlines several potential extensions for KGML-SM. Its encoder-based architecture is suitable for transfer learning applications, allowing it to be adapted for use in other crops, regions, or low-data environments. By pretraining on simulated data and fine-tuning with limited local observations, the model could provide decision support in data-scarce developing economies.
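The pretrain-then-fine-tune idea can be shown with a deliberately small example. A one-variable linear model stands in for the encoder, "simulated" data are plentiful, and "local" observations are few and follow a slightly different relationship. The data, the function names `fit_linear` and `mse`, and all hyperparameters are assumptions made for illustration only.

```python
from typing import List, Tuple


def fit_linear(
    xs: List[float],
    ys: List[float],
    w: float = 0.0,
    b: float = 0.0,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[float, float]:
    """Fit y = w * x + b by gradient descent, starting from the given (w, b)."""
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs: List[float], ys: List[float], w: float, b: float) -> float:
    """Mean squared error of the linear model on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Plentiful synthetic "simulated" data: normalized yield vs. soil moisture.
sim_x = [i / 10 for i in range(11)]
sim_y = [0.5 + 1.0 * x for x in sim_x]

# Only three synthetic "local" observations, with a lower baseline yield.
local_x = [0.2, 0.5, 0.8]
local_y = [0.4 + 1.0 * x for x in local_x]

# Step 1: pretrain on the simulated data from scratch.
w_pre, b_pre = fit_linear(sim_x, sim_y, lr=0.1, epochs=2000)

# Step 2: fine-tune on the local data, starting from the pretrained
# parameters, with a smaller learning rate and far fewer updates.
w_ft, b_ft = fit_linear(local_x, local_y, w=w_pre, b=b_pre, lr=0.05, epochs=50)

error_before = mse(local_x, local_y, w_pre, b_pre)
error_after = mse(local_x, local_y, w_ft, b_ft)
```

Starting from the pretrained parameters means the few local observations only need to correct the difference between the simulated and local relationship rather than learn everything from scratch.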

Additional use cases include early warning systems, crop insurance risk modeling, and in-season management tools for farmers and agribusiness. The model’s modular design enables integration with real-time data streams, including satellite updates and IoT-based soil sensors.

As climate volatility increases the frequency and severity of drought events, yield forecasting tools that account for soil-water interactions are expected to play a growing role in agricultural planning. The KGML-SM framework offers a scalable, data-efficient, and scientifically grounded solution for anticipating production shifts in the face of environmental stress.

Its capacity to incorporate biological processes, real-time remote sensing, and AI-driven optimization may help mitigate uncertainty in agricultural systems under climate change. The model’s focus on transparency and interpretability also enhances its potential for adoption by public and private sector stakeholders seeking to modernize crop forecasting infrastructure.

First published in: Devdiscourse