AUTHOR=Oliveira Ewerton Cristhian Lima de, Carvalho Eduardo Costa de, Jesus Edmir dos Santos, Rocha Rafael de Lima, Arruda Helder Moreira, Alves Ronnie Cley de Oliveira, Tedeschi Renata Gonçalves TITLE=A statistical and machine learning approach for monthly precipitation forecasting in an Amazon city JOURNAL=Frontiers in Earth Science VOLUME=Volume 13 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/earth-science/articles/10.3389/feart.2025.1589753 DOI=10.3389/feart.2025.1589753 ISSN=2296-6463 ABSTRACT=Introduction: City-scale rainfall prediction is crucial for essential services such as transportation, supply chain logistics, and leisure activities, as well as for preventing risks associated with high volumes of rain. Belém is a city in northern Brazil with distinct periods of precipitation, including a rainy season that directly affects the city’s dynamics and the quality of life of its citizens, often resulting in flooding and infrastructure accidents in several city zones. Methods: Meteorological studies generally use large volumes of data; our study, however, uses a data source covering fewer years to predict precipitation. We use meteorological data from a set of sensors installed at a meteorological station in Belém to train multivariate statistical and machine learning (ML) models to predict precipitation. 
In addition to comparing algorithms, we evaluated feature composition using statistical methods to investigate the impact of the variables on the prediction. Results: The results indicate that the vector autoregressive moving average with exogenous regressors (VARMAX) model achieved the best performance in rainfall forecasting, with an average root mean square error (RMSE) of 9.1833 in time series cross-validation, outperforming the other models. Discussion: Climate-driven patterns directly influenced the performance of the rainfall forecasting models evaluated in this study. As noted above, VARMAX had the lowest average RMSE, which was obtained using lag-1 values of the exogenous variables. This is particularly noteworthy because the same configuration not only produced the lowest RMSE for forecasts in 2022 but also highlighted the importance of relative humidity and solar radiation in enhancing predictive accuracy, even in the presence of data anomalies related to solar radiation measurements.