Machine Learning Uncovers the Environmental Toll of Stubble Burning in Northern India
The study investigates the use of machine learning models to predict the Air Quality Index (AQI) in Northern India, highlighting the significant impact of stubble burning on air pollution. The findings emphasize the effectiveness of models like Random Forest in predicting AQI and the need for better data on meteorological factors to enhance accuracy.
A comprehensive study conducted by researchers from the School of Computer Science and Engineering, University of Westminster, and the School of Computer Science and Technology, University of Bedfordshire addresses the critical issue of air pollution in India, focusing on the northern states of Delhi, Punjab, and Haryana. Air pollution is a severe public health threat, with the World Health Organization estimating that around seven million people die annually due to exposure to fine particulate matter in the air. In India, cities like Delhi frequently experience hazardous air quality levels, driven by factors such as industrial emissions, vehicular pollution, and seasonal agricultural practices like stubble burning. This research aims to predict the Air Quality Index (AQI) by utilizing various machine learning models to understand the contribution of different pollutants to air quality and to assess the impact of stubble burning in Punjab on AQI levels across the region.
Leveraging Machine Learning for Air Quality Predictions
The researchers used a dataset from the Central Pollution Control Board (CPCB) covering pollutants such as PM2.5, PM10, NO2, SO2, CO, and others, collected from monitoring stations across Delhi, Haryana, and Punjab. They applied several machine learning models to predict AQI: Random Forest, CatBoost, XGBoost, Support Vector Regressor (SVR), and the deep learning model LSTM. Random Forest was the most accurate, closely followed by CatBoost and XGBoost, and the other models also demonstrated strong predictive capabilities, particularly on complex, high-dimensional data. The time series model SARIMAX was also employed, but mainly for predictions for specific cities such as Delhi, because of its limitations in handling large datasets spanning multiple locations. Model performance was evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R2). Random Forest achieved the highest accuracy on these measures, indicating its effectiveness in explaining the variability in AQI.
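For readers unfamiliar with the four evaluation metrics named above, the following is a minimal sketch of how they are computed from observed and predicted AQI values. The function name `regression_metrics` and the sample numbers are hypothetical and chosen only for illustration; they are not taken from the study, and the researchers' own evaluation code is not described in the article.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean Squared Error: average of squared errors (penalizes large misses).
    mse = sum(e * e for e in errors) / n
    # Mean Absolute Error: average size of the error, in AQI units.
    mae = sum(abs(e) for e in errors) / n

    # R2: share of the variance in the observed values that the model explains.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # RMSE is the square root of MSE, which puts it back in AQI units.
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Toy values for illustration only; NOT data from the study.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Lower MSE, RMSE, and MAE mean smaller prediction errors, while an R2 closer to 1 means the model accounts for more of the variation in AQI.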
The Stubble Burning Crisis
A key focus of the study is the role of stubble burning, a major source of air pollution in North India. Stubble burning is a common agricultural practice, particularly in Punjab and Haryana, in which farmers burn crop residue after harvest to clear the fields for the next planting season. The practice has been identified as a significant contributor to the region's air pollution, especially during the post-harvest season from October to December. The research provides a detailed analysis of how stubble burning leads to severe air quality degradation in Delhi and neighboring areas. The data show a clear correlation between stubble burning activities and spikes in AQI during these months, underscoring the urgent need for alternative, sustainable farming practices to mitigate this impact.
Overcoming Data Challenges
In conducting the study, the researchers also addressed the challenges associated with the preprocessing of environmental data, such as dealing with missing values and outliers. Missing data is a common issue in large datasets, particularly those related to environmental monitoring. The study applied mean imputation and other techniques to handle missing values, ensuring the dataset's reliability for use in machine learning models. Additionally, the research involved a thorough examination of outliers, which are often indicative of extreme pollution events or errors in data collection. The researchers used visualizations like box plots to identify and analyze outliers, opting for a nuanced approach that balanced the need to maintain data integrity with the recognition that not all outliers represent erroneous data.
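The two preprocessing steps described above can be sketched roughly as follows. This is an illustrative example only: the helper names `impute_mean` and `flag_outliers`, the 1.5 x IQR whisker rule (the usual box-plot convention), and the sample readings are all assumptions, not details reported by the study.

```python
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def flag_outliers(values, k=1.5):
    """Return indices of values lying outside the box-plot whiskers.

    Assumes the common rule: below Q1 - k*IQR or above Q3 + k*IQR.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Toy PM2.5 readings for illustration only; NOT data from the study.
# None marks a missing measurement.
readings = [80.0, 90.0, None, 100.0, 110.0, 120.0, 500.0]

filled = impute_mean(readings)
outliers = flag_outliers(filled)

# Outliers are flagged for inspection rather than deleted, since an extreme
# reading may be a genuine pollution event and not a sensor error.
print(filled)
print(outliers)
```

In this toy example the imputed value is pulled upward by the single extreme reading, which hints at why mean imputation is often combined with other techniques.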
The Need for Comprehensive Meteorological Data
The study concludes that while machine learning models like Random Forest are highly effective in predicting AQI, there is a pressing need for more comprehensive data on meteorological factors, which are often missing or incomplete. Factors such as temperature, humidity, and wind direction play a crucial role in the dispersion and concentration of pollutants, and including them in predictive models could significantly enhance accuracy. The research suggests that integrating better meteorological data with machine learning models could lead to more accurate AQI predictions and more effective pollution control strategies.
First published in: Devdiscourse