In smart cities, air pollution harms human health and degrades the quality of the living environment. Accurate air quality prediction is important for developing effective strategies to reduce air pollution and promote healthier, more sustainable environments, and it enables individuals to make well-informed choices that safeguard their health. It is equally vital for public health, environmental management, and policy development. This research focuses on predicting the Air Quality Index (AQI) using machine learning techniques, with an emphasis on improving model efficiency and prediction accuracy. The study compares two regression approaches, Linear Regression and Principal Component Regression (PCR) combined with Decision Tree Regression, across several evaluation metrics: Mean Squared Error (MSE), Mean Percentage Error (MPE), Mean Absolute Percentage Error (MAPE), Median Absolute Error (MedAE), Explained Variance, and Adjusted R². The analysis shows that the PCR with Decision Tree Regression model outperforms Linear Regression, with lower error values and higher explained variance. The better-performing model also appears to generalize better, as reflected in robust metrics such as MedAE, which is less sensitive to outliers. Overall, the study highlights the advantages of combining principal component regression with decision tree regression for enhanced predictive accuracy.
Introduction
I. Importance of AQI
The Air Quality Index (AQI) measures pollution levels from key pollutants: PM2.5, PM10, CO, NO₂, SO₂, and O₃.
AQI ranges from 0 (Good) to 500 (Hazardous), indicating public health risk levels.
Accurate AQI forecasting is vital for public health, urban planning, and pollution management.
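The 0 to 500 scale can be made concrete with a small lookup. The sketch below assumes the US EPA category breakpoints, which the text above does not specify; other national AQI schemes use different bands and labels. The function name `aqi_category` is hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds of each AQI band.
# ASSUMPTION: US EPA breakpoints; the source text only gives the 0-500 range.
_UPPER_BOUNDS = [50, 100, 150, 200, 300, 500]
_CATEGORIES = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def aqi_category(aqi: float) -> str:
    """Return the category label for an AQI value in [0, 500]."""
    if not 0 <= aqi <= 500:
        raise ValueError(f"AQI must be between 0 and 500, got {aqi}")
    # bisect_left finds the first band whose upper bound is >= aqi.
    return _CATEGORIES[bisect_left(_UPPER_BOUNDS, aqi)]
```

For example, `aqi_category(42)` falls in the first band and `aqi_category(500)` in the last.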
II. Machine Learning for AQI Forecasting
Machine Learning (ML) can analyze historical pollutant and weather data to forecast AQI.
Popular ML models used include:
Regression models
Decision Trees
Random Forests
Neural Networks
These models learn patterns from large historical datasets and use them to predict future air quality.
Ensemble Learning Approach: PCR + Decision Tree
A. Method Overview
Principal Component Analysis (PCA): Reduces data dimensionality by transforming correlated features into uncorrelated components.
Decision Tree Regression: Models complex, non-linear relationships in the data.
Combined Approach (PCR + Decision Tree):
PCA simplifies the data.
Decision Tree captures intricate patterns in the reduced data.
Result: A powerful and accurate predictive model, especially for high-dimensional data.
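The combined approach described above can be sketched with scikit-learn as a pipeline that applies PCA and then fits a decision tree on the components. This is a minimal illustration, not the study's implementation: the synthetic data, the standardization step, the choice of three components, and the helper name `build_pca_tree` are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def build_pca_tree(n_components: int = 3, random_state: int = 0):
    """PCA for dimensionality reduction, then a decision tree regressor."""
    return make_pipeline(
        StandardScaler(),                 # ASSUMPTION: scale features before PCA
        PCA(n_components=n_components),   # correlated features -> uncorrelated components
        DecisionTreeRegressor(random_state=random_state),
    )


# Synthetic stand-in for pollutant readings (NOT real AQI data):
# six correlated features generated from three latent factors.
rng = np.random.default_rng(0)
n_samples = 500
latent = rng.normal(size=(n_samples, 3))
mixing = rng.normal(size=(3, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, 6))
# Non-linear target so that a tree has structure to capture.
y = 100 + 20 * latent[:, 0] + 10 * latent[:, 1] ** 2 + rng.normal(scale=2.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = build_pca_tree()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)  # predictions on held-out data
```

Wrapping both steps in one pipeline means the PCA projection is fitted on the training split only, so no information from the test split leaks into the components.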
Evaluation Metrics Used
MSE (Mean Squared Error): Lower is better.
MPE / MAPE (Percentage Errors): Show prediction accuracy in percentage terms.
MedAE (Median Absolute Error): Robust indicator of typical prediction error.
Explained Variance / Adjusted R²: Show how well the model fits the data.
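The metrics listed above can be computed from a set of true and predicted values as follows. This is a sketch using standard textbook definitions, which may differ in detail from the ones used in the study; in particular, the sign convention for MPE (actual minus predicted) and the helper name `regression_metrics` are assumptions.

```python
from statistics import mean, median, pvariance


def regression_metrics(y_true, y_pred, n_features):
    """Return MSE, MPE, MAPE, MedAE, explained variance and adjusted R^2.

    Percentage errors are in percent and require non-zero true values.
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n <= n_features + 1:
        raise ValueError("adjusted R^2 needs more samples than features + 1")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    y_mean = mean(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {
        "mse": ss_res / n,
        # ASSUMPTION: MPE uses (actual - predicted) / actual.
        "mpe": 100 * mean([r / t for r, t in zip(residuals, y_true)]),
        "mape": 100 * mean([abs(r / t) for r, t in zip(residuals, y_true)]),
        "medae": median([abs(r) for r in residuals]),
        "explained_variance": 1 - pvariance(residuals) / pvariance(y_true),
        # Penalizes R^2 for the number of predictors used.
        "adjusted_r2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
    }
```

Because MPE keeps the sign of each error, positive and negative errors can cancel; MAPE takes absolute values and therefore cannot be smaller in magnitude than MPE.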
Results Comparison
Metric                Linear Regression    PCR + Decision Tree Regression
MSE                   28.2020              0.8213
MAPE                  3.8531               0.1749
MedAE                 2.0088               0.0000
Explained Variance    0.8861               0.9967
Adjusted R²           0.8823               0.9966
PCR + Decision Tree Regression significantly outperforms linear regression on all metrics.
Notably:
A MedAE of 0.0000 means that at least half of the predictions have an absolute error that rounds to zero at four decimal places.
An Explained Variance close to 1 (0.9967) indicates excellent model fit.
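A median absolute error of zero can coexist with a non-zero MSE, because the median ignores the size of the largest errors while the MSE squares them. The toy example below uses made-up absolute errors, not the study's data, to show the effect.

```python
from statistics import mean, median

# Hypothetical absolute errors for ten predictions (illustration only):
# six are exact, four are off by 1 to 4 AQI points.
abs_errors = [0.0] * 6 + [1.0, 2.0, 3.0, 4.0]

# More than half of the errors are zero, so the median is zero.
medae = median(abs_errors)

# Squaring emphasizes the few large misses: (1 + 4 + 9 + 16) / 10.
mse = mean([e * e for e in abs_errors])

print(f"MedAE = {medae}, MSE = {mse}")
```

Here MedAE is 0.0 while MSE is 3.0, which is why the two metrics are best read together.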
Conclusion
Based on the performance metrics, PCR with Decision Tree Regression outperforms Linear Regression. The model exhibits a significantly lower error across various evaluation measures, including MSE, MPE, and MAPE, which indicates it makes more accurate predictions. The near-perfect MedAE indicates that the model's predictions are highly accurate for at least half of the data points. Additionally, PCR with Decision Tree Regression demonstrates a much higher ability to explain the variance in the data and a stronger overall fit to the data, as indicated by the Explained Variance and Adjusted R² values. These factors suggest that PCR with Decision Tree Regression is the more effective algorithm for this particular task.
References
[1] Shorouq Al-Eidi, Fathi Amsaad, Omar Darwish, Yahya Tashtoush, Ali Alqahtani, and Niveshitha Niveshitha, “Comparative Analysis Study for Air Quality Prediction in Smart Cities Using Regression Techniques.”
[2] R. Sharma, G. Shilimkar, and S. Pisal, “Air quality prediction by machine learning,” Int. J. Sci. Res. Sci. Technol., vol. 8, pp. 486–492, 2021.
[3] A. Kumar and P. Goyal, “Forecasting of air quality in Delhi using principal component regression technique,” Atmospheric Pollution Research, vol. 2, no. 4, pp. 436–444, 2011.