TITLE:
Unveiling the Predictive Capabilities of Machine Learning in Air Quality Data Analysis: A Comparative Evaluation of Different Regression Models
AUTHORS:
Mosammat Mustari Khanaum, Md Saidul Borhan, Farzana Ferdoush, Mohammed Ali Nause Russel, Mustafa Murshed
KEYWORDS:
Regression Analysis, Air Quality Index, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Logistic Regression, K-Nearest Neighbors, Machine Learning, Big Data Analysis
JOURNAL NAME:
Open Journal of Air Pollution, Vol.12 No.4, December 11, 2023
ABSTRACT: Air quality is a critical concern for public health
and environmental regulation. The Air Quality Index (AQI), widely adopted by
the US Environmental Protection Agency (EPA), serves as a crucial metric for
reporting site-specific air pollution levels. Accurately predicting air
quality, as measured by the AQI, is essential for effective air pollution
management. In this study, we aim to identify the most reliable predictive
model among linear discriminant analysis (LDA), quadratic discriminant analysis
(QDA), logistic regression, and K-nearest neighbors (KNN). We fitted each of the
four models using a machine learning approach and compared their performance.
Using confusion matrices and error percentages, we obtained prediction error
rates of 22%, 23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN
models, respectively. With the lowest error rate, the logistic regression model
outperformed the other three models in predicting AQI. Understanding how these
models perform can help address an existing gap in air quality research and
support the integration of regression techniques into AQI studies, ultimately
benefiting stakeholders such as environmental regulators, healthcare
professionals, urban planners, and researchers.
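The abstract does not spell out how an error percentage is derived from a confusion matrix. The sketch below assumes the usual misclassification rate: the off-diagonal (incorrect) counts divided by the total count. It then picks the model with the lowest rate, using the four error rates reported above. The function names and the example matrix are hypothetical and illustrative only; they are not the authors' code or data.

```python
from typing import Dict, List


def error_rate(confusion: List[List[int]]) -> float:
    """Misclassification rate of a square confusion matrix.

    Assumption: rows are actual classes and columns are predicted classes,
    so correct predictions lie on the diagonal.
    """
    n = len(confusion)
    if any(len(row) != n for row in confusion):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(n))
    # Off-diagonal count divided by total count
    return (total - correct) / total


def best_model(error_rates: Dict[str, float]) -> str:
    """Return the name of the model with the lowest error rate."""
    return min(error_rates, key=lambda name: error_rates[name])


# Hypothetical 2x2 confusion matrix; the counts are illustrative only
example = [[40, 10],
           [10, 40]]
print(error_rate(example))  # 0.2

# Error rates as reported in the abstract
reported = {"LDA": 0.22, "QDA": 0.23, "Logistic regression": 0.20, "KNN": 0.27}
print(best_model(reported))  # Logistic regression
```

Computing `(total - correct) / total` instead of `1 - correct / total` avoids a floating-point artifact (`1 - 0.8` evaluates to `0.19999999999999996`).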