AUTHOR=Saminathan S., Malathy C. TITLE=Ensemble-based classification approach for PM2.5 concentration forecasting using meteorological data JOURNAL=Frontiers in Big Data VOLUME=Volume 6 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2023.1175259 DOI=10.3389/fdata.2023.1175259 ISSN=2624-909X ABSTRACT=Air pollution is a serious challenge to mankind because it poses many threats to health. Air pollution, which results from both outdoor and indoor sources, can be measured using the Air Quality Index (AQI). AQI is monitored by various institutions across the globe, and the measured air quality data are mostly made publicly available. Given past values of AQI, future values can be predicted, or the class/category corresponding to a numeric value can be obtained. Supervised machine learning methods can make this forecast with improved accuracy. In this paper, multiple machine learning approaches are used to classify PM2.5 values. The values for the pollutant PM2.5 are classified into different classes using machine learning algorithms such as Logistic Regression, Support Vector Machines, Random Forests, and Extreme Gradient Boosting, their grid-search-tuned equivalents, and the deep learning method Multi-Layer Perceptron. After multi-class classification is performed with these algorithms, the methods are compared using the metrics Accuracy and Per-Class Accuracy. Because the dataset used is imbalanced, a SMOTE-based approach is used to balance it. The random forest multiclass classifier trained on the SMOTE-balanced dataset is found to achieve better accuracy than all other classifiers that use the original dataset.
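
The two ideas the abstract relies on, balancing an imbalanced dataset by SMOTE-style interpolation and comparing classifiers by per-class accuracy, can be illustrated with a minimal sketch. This is not the authors' pipeline: `smote_like_oversample` and `per_class_accuracy` are hypothetical helper names, the oversampler is a simplified standard-library version of the SMOTE idea (interpolating between a minority sample and one of its nearest same-class neighbours), per-class accuracy is assumed here to mean the fraction of each class's samples that are predicted correctly, and the feature values and class labels are made-up toy data rather than values from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def smote_like_oversample(X, y, k=3, seed=0):
    """Simplified SMOTE-style balancing (hypothetical helper, not the paper's code).

    Each class smaller than the largest one is topped up with synthetic rows
    that lie on the segment between a sample and a near same-class neighbour.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < 2:
            continue  # a single sample has no neighbour to interpolate with
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            # Up to k nearest same-class neighbours of the base sample.
            neighbours = sorted(
                (r for r in rows if r is not base),
                key=lambda r: math.dist(base, r),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Toy data: [temperature, humidity] rows with made-up PM2.5 class labels.
X = [[20.0, 40.0], [21.0, 42.0], [22.0, 41.0], [19.0, 39.0],
     [30.0, 70.0], [31.0, 72.0]]
y = ["good"] * 4 + ["poor"] * 2

X_bal, y_bal = smote_like_oversample(X, y, k=1)
print(Counter(y_bal))
print(per_class_accuracy(["good", "good", "poor", "poor"],
                         ["good", "poor", "poor", "poor"]))
```

In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred, and oversampling would be applied to the training split only so that synthetic rows do not leak into the evaluation data.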