Comparison of machine learning and statistical methods in the field of renewable energy power generation forecasting: a mini review

In the post-COVID-19 era


Introduction
The COVID-19 pandemic had a huge impact on the world economy, society, and public health and was one of the most terrible disasters in human history. The "post-COVID-19 era" is an era in which economic growth, international relations, industrial development, and people's consumption habits have greatly changed due to the pandemic (Schwab and Malleret, 2020). While the impacts of the pandemic on human society will persist for a long time, climate change is also gaining more attention as another serious crisis. The United Nations has listed climate change as a key issue in its recent Sustainable Development Goals (SDGs), which have been adopted into the 2030 Agenda (Usman et al., 2021). The reason is clear: climate change can create catastrophic events, and its effects will be long-lasting, cumulative, and irreversible after a tipping point is reached (Jiao et al., 2020). CO2 emissions from the power sector decreased significantly during COVID-19, but this was largely due to the economic recession (Bertram et al., 2021). A green economic recovery in the post-COVID-19 era has prompted countries to think about the energy transition. The restructuring of global value chains in the post-COVID-19 era also notably brings new opportunities for a transition to green and low-carbon energy (Ministry of Ecology and Environment of the People's Republic of China, 2020).
Dou et al. 10.3389/fenrg.2023.1218603
To address global climate change and ensure energy security, many countries and regions have increasingly sought to shift to a low-carbon development paradigm and formulated corresponding strategies and policies (International Energy Agency, 2021). In 2021, the European Union (EU) set a new target of 40% renewable energy as a share of primary energy (up from 32%) by 2030 in the "European Climate Law" (Council of the European Union, 2021). In October 2021, the Japanese government released "The Sixth Strategic Energy Plan," which proposes to use renewable energy as the main source of electricity and increase the proportion of installed renewable energy to 36%-38% by 2030 (Ministry of Economy, Trade and Industry, 2021). In November 2021, the U.S. published "The Long-Term Strategy of the United States: Pathways to Net-Zero Greenhouse Gas Emissions by 2050," outlining plans to achieve net zero emissions by 2050 (United States Department of State, 2021). In June 2022, China's National Energy Administration released the "14th Five-Year Plan for Renewable Energy Development" to increase annual renewable energy generation to approximately 3.3 trillion kWh by 2025 (National Development and Reform Commission, 2022). Efforts at the policy level have opened new horizons for energy transformation in the post-COVID-19 era.
As a vital area of the energy transition, renewable energy generation is gaining increasing attention in the post-COVID-19 era. Renewable energy consists of energy from natural sources, including the sun, wind, water, and biofuels (International Energy Agency, 2023). Compared with fossil energy, renewable energy has absolute advantages in terms of environmental friendliness (Bauer et al., 2016; Li et al., 2021; Li and Haneklaus, 2022). In addition, long-term reliance on traditional nonrenewable energy sources results in resource depletion, while renewable energy generation has better sustainability (VanDeventer et al., 2019). However, the chaotic nature of the meteorological system leads to intermittent and fluctuating wind and photovoltaic (PV) power generation, while the grid input requires stability and smoothness. Furthermore, connecting a large number of new energy sources to the grid directly threatens the safety and stability of grid operation (Li et al., 2021; Krechowicz et al., 2022). This instability is not conducive to the economic dispatch of electricity, and the frequent abandonment of wind and solar power causes the utilization of renewable energy to be very low (Fan et al., 2022). If PV and wind power generation can be predicted more accurately and in a more timely manner, grid connection becomes more beneficial and new energy sources can be used more efficiently (Barbieri et al., 2017; Alkesaiberi et al., 2022). Therefore, research on the prediction of renewable energy power generation has important research value and practical significance.
Renewable power forecasting uses the change patterns of historical data and information to directly or indirectly predict power generation in the future. Power forecasting is used to maintain grid security and stability and to provide information to make decisions regarding power dispatch (Mahmoud et al., 2018; Netsanet et al., 2018). Many current studies focus mainly on grid dispatch and control, power system planning and maintenance, and power plant siting (Demolli et al., 2019; Ma et al., 2019; Chen and Liu, 2020; Dupré et al., 2020; Wang and Lin, 2023). Mainstream methods can be divided into physical, machine learning, and statistical methods (Mellit et al., 2020). The physical methods refer to the direct calculation of wind and PV power generation through physical models. Machine learning focuses on learning hidden features in historical data that can be applied to new sample data for classification or prediction (Samuel, 1967). Power generation prediction can be considered a regression problem, and support vector machines, random forests, extreme learning machines, and deep learning algorithms are receiving increasing attention as solutions to this problem. Statistical methods are based on historical data, from which information is extracted to predict time series. For example, statistical methods aim to determine the relationship between power generation and historical data such as wind speed and direction, solar irradiance, humidity, and temperature (Zheng et al., 2020). Traditional statistical methods mainly contain time series methods such as ARMA, ARIMA, and ARMAX. This paper focuses on the two most common approaches, statistical and machine learning methods.
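To make the regression framing concrete, the following minimal sketch turns a power series into supervised pairs of lagged inputs and a future target. It is illustrative only: the function name `make_windows`, the parameters `n_lags` and `horizon`, and the toy PV values are assumptions introduced here and are not taken from any of the cited studies.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_lags: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets) pairs from a univariate power series.

    Each input holds the `n_lags` most recent observations; the target is
    the value `horizon` steps after the last observation in that input.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[float] = []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        end = start + n_lags
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Hypothetical hourly PV output in kW (made-up numbers for illustration).
pv_kw = [0.0, 1.2, 3.4, 5.1, 4.8, 2.9, 0.7]
X, y = make_windows(pv_kw, n_lags=3, horizon=1)
```

Any regression model, statistical or machine learning, can then be fitted on `X` and `y`; a larger `horizon` corresponds to forecasting further ahead.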
Machine learning methods and statistical methods

Machine learning methods
In early research on the prediction of renewable power generation, traditional machine learning methods were widely used, the most representative of which are support vector machines (SVMs) (Fonseca et al., 2012; Shi et al., 2012) and artificial neural networks (ANNs) (Izgi et al., 2012; Dumitru et al., 2016; Son et al., 2018). With the development of this research field, traditional machine learning methods have gradually been replaced by hybrid machine learning methods, ensemble learning methods, and deep learning methods.
Hybrid machine learning methods combine the advantages of two or more methods to achieve better prediction results. A key direction is to introduce intelligent optimization algorithms into traditional machine learning to improve the performance of the algorithms. The most commonly used hybrid methods in renewable energy generation forecasting have been based on SVMs and extreme learning machines (ELMs). SVM is a nonlinear classifier based on the kernel method, which can avoid overfitting (Olatomiwa et al., 2015). VanDeventer et al. (2019) combined a genetic algorithm (GA) with the support vector machine and demonstrated that the GASVM model can reduce the root mean square error (RMSE) by 98% in short-term PV power prediction compared to the conventional SVM model. Similarly, Xiao et al. (2022) proposed an SVM model based on gray wolf optimization (GWO-SVM), which can significantly reduce the RMSE in PV power prediction. ELM is a special single-layer feedforward neural network with fast learning ability and good generalization. Acikgoz et al. (2020) used ELM for short-term wind power output prediction and improved the training speed by more than 140 times compared to ANN methods on different time scales and seasons. Wu et al. (2020) considered the impact of different weather conditions on PV predictions and used the kernel extreme learning machine (KELM) combined with the firefly algorithm (FA) and the variational modal decomposition algorithm (VMD), resulting in a normalized root mean square error (NRMSE) and normalized mean absolute error (NMAE) below 10% for all weather conditions. Guermoui et al. (2021) improved the accuracy of ELM in multisegment ultrashort-term prediction by decomposing PV power series according to frequency range through an iterative filter (IF) decomposition method. Li et al. (2021) proposed an enhanced crow search algorithm (ENCSA) for ELM parameter optimization, which resulted in a mean absolute percentage error (MAPE) consistently below 4% in short-term wind power prediction.
Another popular research direction is the combination of two machine learning models. Gastón et al. (2010) combined k-nearest neighbor (K-NN) and SVM to predict solar radiance with a substantial improvement in prediction accuracy compared to traditional methods. Shi et al. (2013) proposed a short-term wind power prediction model combining radial basis function neural network (RBFNN) and least squares support vector machine (LS-SVM), which achieved excellent results in ultra-short-term wind power generation prediction 15 min in advance. Yang et al. (2014) and Dong et al. (2015) used ANN combined with SVM for PV prediction. The results show that the prediction error of the hybrid method is smaller than that of the traditional single machine learning method. Theocharides et al. (2021) proposed a framework for a PV generation forecasting method combining ANN and K-means, which showed good performance in terms of prediction accuracy and stability. Lu et al. (2021) proposed a combined wind power prediction model based on ELM and LS-SVM and demonstrated, using historical data from a wind farm in Ningxia Hui Autonomous Region, China, that the method can effectively improve prediction accuracy.
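The error measures quoted in these studies (MAE, RMSE, MAPE, and their normalized variants) can be written out as in the sketch below. Two points are assumptions made here rather than conventions taken from the cited papers: NRMSE is normalized by a user-supplied `capacity` (studies also normalize by the mean or the range of the observations), and MAPE is rejected when an observation is zero, which commonly happens for PV output at night.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the data."""
    _check(actual, predicted)
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observation is zero")
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


def nrmse(
    actual: Sequence[float], predicted: Sequence[float], capacity: float
) -> float:
    """RMSE as a percentage of `capacity` (assumed normalizer, see lead-in)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * rmse(actual, predicted) / capacity


# Made-up observations and forecasts in kW, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 36.0]
```

Because the normalizer differs between studies, normalized errors reported in different papers are not directly comparable unless the same convention is used.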
Ensemble learning models combine multiple single machine learning models to obtain better accuracy and generalization. Commonly used ensemble strategies include bagging, boosting, and stacking (Voyant et al., 2017). The bagging strategy focuses on reducing the variance by training multiple parallel base learners. Random forest is a bagging ensemble algorithm that uses decision trees as base learners, which shows good stability and generalization ability in the field of wind power prediction (Mahmud et al., 2021; Ziane et al., 2021) and can cope with random fluctuations in historical data or interference from unrelated weather factors (Lahouar et al., 2017; Shi et al., 2018). The boosting strategy focuses on reducing bias and is an additive method that continuously trains new learners. Xiong et al. (2022) and Fan et al. (2022) significantly improved the accuracy of short-term wind and PV power prediction using the XGBoost algorithm. The stacking strategy differs from bagging and boosting in that its base learners are usually heterogeneous, and it uses meta-learning to effectively combine multiple base learners for prediction. Chen and Liu (2020) and Rosa et al. (2022) improved the accuracy of wind power output prediction with a stacking ensemble strategy that takes advantage of the strengths of each prediction model in extracting different features.
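The bagging idea described above can be sketched in a few lines. This is a deliberately simplified illustration, not a random forest: the base learner is an ordinary least-squares line rather than a decision tree, and the names `fit_line` and `bagging_fit`, the seed values, and the synthetic wind-speed data are all assumptions made for the example.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def fit_line(points: Sequence[Point]) -> Callable[[float], float]:
    """Least-squares line through (x, y) points; the base learner here."""
    x_bar = mean(p[0] for p in points)
    y_bar = mean(p[1] for p in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    if sxx == 0.0:  # degenerate resample: fall back to the mean target
        return lambda x: y_bar
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return lambda x: y_bar + slope * (x - x_bar)


def bagging_fit(
    points: Sequence[Point], n_learners: int = 25, seed: int = 0
) -> Callable[[float], float]:
    """Train base learners on bootstrap resamples and average them."""
    rng = random.Random(seed)
    learners: List[Callable[[float], float]] = []
    for _ in range(n_learners):
        # Bootstrap: sample with replacement, same size as the original set.
        resample = [rng.choice(points) for _ in points]
        learners.append(fit_line(resample))
    return lambda x: mean(f(x) for f in learners)


# Synthetic data: power = 2 * speed + 1 plus Gaussian noise (made up).
_noise = random.Random(1)
data = [(0.5 * i, 2.0 * (0.5 * i) + 1.0 + _noise.gauss(0.0, 0.2)) for i in range(40)]
bagged = bagging_fit(data)
```

Averaging over resamples is what reduces variance. Boosting would instead fit each new learner to the errors of the previous ones, and stacking would feed the outputs of heterogeneous learners into a second-stage model.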
Deep learning is a new research direction in the field of machine learning. Deep learning methods can automatically extract high-dimensional features from data. Previous research mostly used ANN models for power generation prediction (Izgi et al., 2012; Dumitru et al., 2016; Son et al., 2018). However, compared to ANNs with independent inputs, recurrent neural networks (RNNs) can better exploit the dependencies in time series. Pang et al. (2020) demonstrated that the RNN method improves the normalized mean bias error (NMBE) and RMSE by 47% and 26%, respectively, compared to the ANN method for short-term solar radiation prediction. Although the traditional RNN method can make good use of the information of the data, it suffers from the problems of short-term memory and gradient instability. Jung et al. (2020) demonstrated the effectiveness of LSTM-RNN for long-term prediction using an RNN model containing long short-term memory (LSTM) units for more than 63 months of data generated from multiple PV plants. Mellit et al. (2021) conducted experiments on PV power prediction over four different short-term time horizons and demonstrated that LSTM performed the best. Ahn and Park (2021) introduced LSTM units into a deep RNN for ultrashort-term and short-term prediction, and the prediction accuracy was above 92% in all cases. Li et al. (2020) introduced attention mechanisms in the traditional LSTM to reduce the effects of random variations in wind power. Luo et al. (2021) combined physical laws and domain knowledge to impose constraints on LSTM to reduce unreasonable PV power prediction results and improve the robustness of the predictions. Shahid et al. (2021) introduced a GA into LSTM and improved the accuracy of wind power prediction by 6%-35% compared to existing techniques. Hu and Chen (2018) used a combined model of LSTM and ELM to predict wind speed and outperformed SVM, LSTM, and ELM models alone in predictions 10 min and 1 h in advance. Carrera et al. (2020) used deep feedforward networks (DFN) and RNN trained on historical weather forecast data and demonstrated that the models are more accurate in predicting PV generation 1 day ahead than a single machine learning method. Hossain et al. (2021) proposed a hybrid model based on convolutional layers, gated recurrent unit (GRU) layers, and a fully connected neural network for wind power prediction, which reduced MAE, RMSE, and MAPE by 2.11%, 0.72%, and 16.88%, respectively, for wind power generation from the Australian capital wind farm compared to RNN. Meng et al. (2022) combined a convolutional neural network (CNN) and bidirectional gated recurrent unit (BiGRU) to propose an ultrashort-term wind power prediction model and demonstrated that the model has lower prediction error and higher solving efficiency. The methods mentioned above are summarized in Table 1.
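The difference between a network with independent inputs and a recurrent one, and the short-memory problem of plain RNNs, can be illustrated with a scalar recurrence. The sketch below is a toy under stated assumptions: a single hidden unit, hand-picked weights (`w_in`, `w_rec`, `bias`) instead of trained ones, and hypothetical function names. It is not an LSTM and does not reproduce any cited model.

```python
import math
from typing import Sequence


def rnn_final_state(
    inputs: Sequence[float], w_in: float = 1.0, w_rec: float = 0.5, bias: float = 0.0
) -> float:
    """Run a one-unit RNN, h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
    return h


def state_sensitivity(
    inputs: Sequence[float], w_in: float = 1.0, w_rec: float = 0.5, bias: float = 0.0
) -> float:
    """Derivative of the final state with respect to the initial state.

    By the chain rule this is the product of the per-step factors
    w_rec * (1 - h_t ** 2), so it shrinks quickly when |w_rec| < 1.
    """
    h, grad = 0.0, 1.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        grad *= w_rec * (1.0 - h * h)
    return grad


# The same two readings in a different order give different final states,
# because the hidden state carries information from earlier steps.
state_a = rnn_final_state([0.0, 1.0])
state_b = rnn_final_state([1.0, 0.0])

# Influence of the initial state after 50 steps of a constant input.
long_range = state_sensitivity([0.1] * 50)
```

With these weights the sensitivity after 50 steps is vanishingly small, which is the short-term memory effect; with a recurrent weight well above 1 and small states the same product can grow instead. Gated units such as LSTM and GRU are designed to mitigate this behavior.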
Hybrid machine learning methods have the advantage of being able to exploit the strengths of different methods to improve the prediction of machine learning models. However, intelligent optimization algorithms are often difficult to control and slow to converge. It is also a challenge to choose the most suitable model for hybridization. The advantage of ensemble learning is the ability to organically combine multiple single machine learning models to obtain more accurate, stable, and robust results. However, the disadvantages are increased complexity and time consumption. The advantage of deep learning is its powerful learning ability, so it can handle many complex problems and a large amount of data. However, it requires powerful computational resources, and the model is difficult to build and train.

Statistical methods
Rajagopalan and Santoso (2009) used the autoregressive moving average (ARMA) to forecast wind power within 1 hour. Gao et al. (2009) introduced an autoregressive conditional heteroskedasticity (ARCH) model based on the ARMA model to solve the problem of variance variation of wind speed time series, which has higher prediction accuracy than the classical ARMA model in wind power prediction. The limitation of ARMA is its inability to handle nonstationary series, which restricts its performance on complex forecasting tasks (Diagne et al., 2013).
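As a minimal illustration of the time series approach, the sketch below fits a first-order autoregressive model, the simplest member of the ARMA family, by least squares and uses it for multi-step forecasting. It has no moving-average term and is not the procedure of any cited study; the function names and the toy series are assumptions. The `difference` helper shows the differencing step that ARIMA adds so that a trending (nonstationary) series can be modeled.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev, curr = series[:-1], series[1:]
    p_bar, c_bar = mean(prev), mean(curr)
    sxx = sum((p - p_bar) ** 2 for p in prev)
    if sxx == 0.0:
        raise ValueError("series is constant; phi is not identifiable")
    phi = sum((p - p_bar) * (c - c_bar) for p, c in zip(prev, curr)) / sxx
    return c_bar - phi * p_bar, phi


def forecast_ar1(last_value: float, c: float, phi: float, steps: int) -> List[float]:
    """Iterate the fitted recursion `steps` times from the last observation."""
    out: List[float] = []
    x = last_value
    for _ in range(steps):
        x = c + phi * x
        out.append(x)
    return out


def difference(series: Sequence[float]) -> List[float]:
    """First difference, x_t - x_{t-1}; removes a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]


# Toy series generated exactly by x_t = 2 + 0.5 * x_{t-1}, starting at 0.
toy = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]
c_hat, phi_hat = fit_ar1(toy)
next_two = forecast_ar1(toy[-1], c_hat, phi_hat, steps=2)
```

On real wind or PV data the fit would be noisy and the model order would have to be selected, which is where full ARMA and ARIMA implementations come in.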

Comparison
In contrast to statistical methods, machine learning methods have been widely used in recent years. In terms of the time horizon of prediction, current research mostly focuses on short-term power generation prediction, in which both statistical and machine learning methods show good performance, but their performance in long-term prediction needs to be improved. In terms of modeling difficulty, statistical models do not require large amounts of data to build, while machine learning models require large amounts of data to train. In terms of applicability, statistical methods describe the relationship between current and historical values. However, this also means that these methods are only suitable for predicting phenomena related to historical data, and most models can only handle linear relationships (Demolli et al., 2019). Machine learning methods are capable of handling complex multivariate prediction problems and capturing nonlinear relationships for a wider range of applications (Donadio et al., 2021; Rahman et al., 2021). Cocchi et al. (2018) demonstrated that machine learning methods have higher prediction accuracy than statistical methods in complex power generation prediction problems. Statistical methods have good interpretability, but machine learning methods are often considered to be "black box problems" (Kane et al., 2014; Alsaigh et al., 2023).
Several studies have combined the advantages of machine learning methods in fitting nonlinearities with the advantages of statistical methods in fitting linearities. Cadenas and Rivera (2010) and Jiao (2018) both developed hybrid models of ARIMA models and ANNs. ARIMA was first used to make wind speed predictions, and then the obtained errors were used to build neural networks, which predicted wind speed with higher accuracy than the ARIMA and ANN models working separately. Wu and Chen (2011) combined ARMA and time delay neural network (TDNN) to predict solar radiation and also achieved good results. In addition, Liu et al. (2012) proposed the use of ARIMA to determine the structure of ANN, and the MAPE was reduced by 27.38% in multi-step prediction compared to that predicted by ARIMA alone. Zhang et al. (2020b) combined a hybrid machine learning model with an ARIMA model to predict the wind speed for the next 4 h and demonstrated that the prediction error of this method was significantly reduced.
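The two-stage idea behind these hybrids (a linear model first, then a nonlinear model fitted to its errors) can be sketched as follows. Both stages are stand-ins chosen to keep the example short and dependency-free: a least-squares line replaces ARIMA, and a k-nearest-neighbor average of residuals replaces the neural network. The function names and the toy data are assumptions and do not come from the cited studies.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line; returns (intercept, slope). Stage 1 stand-in."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def knn_mean(
    train_x: Sequence[float], train_v: Sequence[float], x: float, k: int = 3
) -> float:
    """Average the values of the k training inputs closest to x."""
    nearest = sorted(zip(train_x, train_v), key=lambda pair: abs(pair[0] - x))[:k]
    return mean(v for _, v in nearest)


def fit_hybrid(
    xs: Sequence[float], ys: Sequence[float], k: int = 3
) -> Tuple[Callable[[float], float], Callable[[float], float]]:
    """Return (linear_only, hybrid) predictors trained on the same data."""
    intercept, slope = fit_linear(xs, ys)
    # Stage 2 learns what the linear stage got wrong.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    def linear_only(x: float) -> float:
        return intercept + slope * x

    def hybrid(x: float) -> float:
        return linear_only(x) + knn_mean(xs, residuals, x, k)

    return linear_only, hybrid


# Toy nonlinear relation between a lagged input and the next value.
lagged = [float(i) for i in range(10)]
target = [x * x for x in lagged]
linear_only, hybrid = fit_hybrid(lagged, target)
```

On this toy relation the linear stage alone misses the curvature, while the residual stage recovers most of it. Whether a hybrid helps on real data depends on how much nonlinear structure remains in the residuals of the statistical model.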

Discussion
The above analysis shows that machine learning methods are more common than statistical methods in practical renewable energy generation forecasting efforts. However, existing machine learning prediction methods have some shortcomings. The prediction models proposed in many studies only target specific regions and time horizons, resulting in models that do not apply to prediction tasks in a wider range of scenarios (Wang et al., 2019). The machine learning models mentioned above were studied using different datasets, and their predictions were influenced by the local climate and geography (Liu et al., 2022). Additionally, machine learning methods perform differently in different time horizons (Lonij et al., 2013; Lipperheide et al., 2015; Das et al., 2018). Therefore, it is difficult to compare the performance of individual models with different spatiotemporal characteristics (Krechowicz et al., 2022). In addition, the prediction and decision steps of traditional power systems are executed separately. Accurate prediction results can provide useful information for decision-making (Mahmoud et al., 2018; Netsanet et al., 2018). However, this separation of forecasting and decision-making leads to an information gap between the forecaster and the decision-maker (Yu et al., 2021; Zhao et al., 2022). Moreover, this framework is not conducive to the simultaneous improvement of the quality of forecasting and decision-making. In this paper, the following future research directions are proposed to address the above issues.
(1) Building unified forecasting models. Sufficient and valid multiscale information needs to be extracted at multiple temporal and spatial scales to match current patterns of change in generation, select and integrate multiple prediction models that fit current characteristics, cover larger regions of PV and wind farm clusters, support prediction tasks with multiple timescales, and move toward more integrated and comprehensive applications (Chen et al., 2019; Jiang et al., 2019; Zhang et al., 2020a; Wang et al., 2022; Che et al., 2023).
(2) Building integrated models for forecasting and decision-making. The integration of renewable energy forecasting models and application-specific decision models into a unified framework avoids the loss of important information in the forecasting phase and facilitates the synergistic optimization of forecasting and decision-making (Yan and Wang, 2022; Zhao, 2022).

Conclusion
As a key area of the energy transition, renewable power generation forecasting technology has become a hotspot for research. Accurate power prediction of renewable energy generation is beneficial to the safety and stability of the power grid and economic dispatch. In this paper, we review relevant research in renewable energy generation forecasting by machine learning methods and statistical methods. The discussion of machine learning methods and statistical methods provides us with the following conclusions: 1) Machine learning methods are capable of dealing with complex nonlinear relationships, and statistical methods are mostly applicable to linear relationships. 2) Machine learning methods, especially deep learning methods, can perform well on large-scale datasets, and statistical methods are mostly applicable to small-scale data. 3) Machine learning methods are less interpretable, while statistical methods tend to have better interpretability. However, the models proposed in the existing research are only for specific scenarios. In power systems, the prediction and decision steps are separated. Based on these problems, we propose that future research address the construction of unified forecasting models and integrated models for forecasting and decision-making. This review provides some reference for the research of machine learning in the field of prediction of renewable energy generation.

TABLE 1 Power prediction of renewable energy generation based on machine learning methods.
Statistical methods mainly include persistence methods, regression methods, exponential smoothing methods, and time series methods (Das et al., 2018). Most research works in the field of wind power forecasting use time series methods (Das et al., 2018).
①N/A indicates that the time horizon of the forecast is not explicitly stated in the text.
②This article is about the power generation prediction of direct or indirect wind and PVs, including wind speed prediction and solar radiation prediction.

TABLE 2 Power prediction of renewable energy generation based on statistical methods.
introduced an autoregressive conditional heteroskedasticity model (ARCH) based on the ARMA model to solve the problem of variance variation of wind speed time series, which has higher prediction accuracy than the classical ARMA model in
①N/A indicates that the time horizon of the forecast is not explicitly stated in the text.
Note: Li et al. (2014)14)012)) used an autoregressive integrated moving average (ARIMA) model to obtain a greater than 50.49% improvement in accuracy over the persistence model for short-term wind power prediction. Zhang et al. (2014) used the ARIMA model for short-term PV power prediction that outperformed two machine learning methods, the RBFNN and the LS-SVM; Dokuz et al. (2018) used the DBSCAN clustering algorithm to preprocess the original data before ARIMA prediction and reduced the RMSE by 20% in long-term wind speed prediction; Li et al. (2014) proposed ARMAX for PV power