AUTHOR=Ortega-Diaz Liliana, Jaramillo-Ibarra Julian, Osma-Pinto German TITLE=Estimation of the air conditioning energy consumption of a classroom using machine learning in a tropical climate JOURNAL=Frontiers in Big Data VOLUME=Volume 8 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2025.1520574 DOI=10.3389/fdata.2025.1520574 ISSN=2624-909X ABSTRACT=Air conditioning energy consumption in buildings represents a considerable percentage of total energy consumption, which underlines the importance of implementing measures contributing to its reduction. Predicting energy consumption is critical to making informed decisions and identifying factors influencing power consumption. Machine learning is the most widely used approach for prediction due to its speed, accuracy, and ability to model non-linear relationships. In this study, three machine learning models were used to predict the air conditioning energy demand in a classroom of an educational building in a hot tropical climate. The models selected, SVR (Support Vector Regressor), DT (Decision Tree), and RFR (Random Forest Regressor), were chosen for their wide use in the literature; the goal is to establish which one offers the best performance for this case study through a comparative analysis using performance metrics. Cross-validation was used to make the training more robust. Twenty-two input variables were considered, grouped as climatological, operational, and temporal. Occupancy is the variable with the highest correlation with air conditioning consumption, with a positive correlation of 0.65. Monitoring was carried out for 72 days, including weekends. Six study scenarios were considered, in which the monitoring period varied, influencing the number of samples. In addition, two sensitivity analyses were performed by modifying the time interval of the data (1, 5, 10, 20, 30, and 60 min) and the data split (50:50, 60:40, 70:30, 80:20, and 90:10). 
The models were evaluated using the RMSE, MAE, and R2 metrics, which reflect different characteristics of and approaches to error measurement. During the training phase, the RFR model achieved a coefficient of determination (R2) of 0.97, while the SVR obtained an R2 of 0.78 in the test phase. The study concludes that using shorter time intervals (every 1 min) in the data improves the performance of the predictive models. Splitting the data into 80:20 and 90:10 ratios resulted in the lowest RMSE values for the three models evaluated. Training the models with a larger amount of data allows them to capture more representative patterns, which improves their generalization ability and performance on new data.
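
As a rough illustration of the three evaluation metrics named in the abstract, the following sketch computes RMSE, MAE, and R2 in plain Python using their standard textbook definitions. It is not the authors' code: the function names and the sample values are hypothetical, chosen only to show how the metrics behave, and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: average error magnitude, in the target's units."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical consumption readings and predictions (not from the study).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

scores = {
    "rmse": rmse(measured, predicted),
    "mae": mae(measured, predicted),
    "r2": r2(measured, predicted),
}
print(scores)
```

RMSE and MAE are expressed in the same units as the target, so lower is better, while R2 is unitless and approaches 1 as predictions explain more of the variance. Reporting all three together gives a fuller picture than any single metric.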