State-Of-The-Art Solar Energy Forecasting Approaches: Critical Potentials and Challenges


INTRODUCTION
For decades, solar energy has accounted for a growing share of China's energy consumption, and this share will continue to rise under the country's carbon peaking and carbon neutrality goals (Yang et al., 2021a; Mahidin et al., 2021). Because sunlight is intermittent and volatile, photovoltaic (PV) power generation is more erratic than conventional generation, which causes several problems for the grid: frequency instability (Liu et al., 2020; Murty and Kumar, 2020), dispatch difficulty (Peng et al., 2020; Tummala, 2020), and voltage and current surges (Bozorg et al., 2020; Yang et al., 2021b). Accurately forecasting the power generation of PV systems is therefore one of the major issues in PV engineering practice and a key to settling these problems (Huang et al., 2021a; Yang et al., 2021c).
According to the modeling approach, the prevailing PV power prediction methods are broadly divided into three categories: physical, statistical, and artificial intelligence (AI) forecasting technologies (Yang et al., 2021d). The applicable ranges of these technologies are given in Figure 1. Each of them faces different challenges. First, physical forecasting depends on accurate future weather forecast information and on the parameters of the output characteristic model, both of which are difficult to obtain. Second, statistical forecasting places few demands on the geographical location and other information of the PV system but requires large amounts of historical data to infer statistical laws. Third, AI forecasting is easily trapped in a local optimum because of inherent defects of the AI algorithms. This work aims to clarify these problems and to give some perspectives on the various PV power prediction methods.

PHYSICAL PREDICTION METHOD
The physical prediction method identifies the factors related to PV power generation from first principles and then builds a physical model. Specifically, physical modeling is based on numerical weather prediction (NWP) and uses atmospheric physical data, including wind speed, temperature, rainfall, humidity, and length of day (Urquhart et al., 2013), as well as cloud images from a total sky imager (Shen et al., 2019) or a satellite (Tuohy et al., 2015). It can be further classified into the simple physical model method and the complex physical model method. A simple physical model needs power system parameters, weather data, satellite observations, and so on (Hammer et al., 1999). The literature (Peder et al., 2009) combines a simple physical prediction model with the HIRLAM mesoscale weather model to forecast the future power generation of 21 small PV power stations on the Jutland peninsula but obtains relatively poor predictions. The literature (Inman et al., 2013) verifies experimentally that a wavelength-independent PV power prediction model accounts only for the absorption of light by water vapor. To ensure stable operation of the bulk power grid, the power generation of PV microgrid systems must be predicted more accurately. To this end, the work (Lorenz et al., 2011) builds a complex physical prediction model from local weather forecast data and tests it on an actual PV power station to assess its accuracy. NWP models, which can be classified into wide-area and local-area prediction models, are used to forecast solar irradiance and cloud distribution. Local-area models are usually used for short-term forecasting of PV plant power.
So far, NAM (Mathiesen and Kleissl, 2011), MM5 (Fernandez-Jimenez et al., 2012), and WRF (Lima et al., 2016) have been developed and applied as local-area models for PV power prediction. NAM takes SURFRAD measurement data as inputs and uses the mean bias error (MBE) and the root mean square error (RMSE) as indices of model performance. Moreover, the prediction results obtained with the NAM model show that using irradiance as the model output variable can decrease the error and bias of power forecasting. The MM5 model can provide hourly power production forecasts for the following day by analyzing historical hourly power outputs and estimated climate parameters from the past 1.5 years. The wide-area PV power prediction model deserves more attention because of its good accuracy in estimating both cloudy and cloudless sky conditions. GFS and ECMWF (Mathiesen and Kleissl, 2011) are two typical models for wide-area PV power prediction.
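The two evaluation indices named above can be sketched in a few lines. This is a minimal illustration rather than the procedure of the cited work: the sign convention (forecast minus observation) and the sample values are assumptions made here.

```python
import math


def mbe(predicted, measured):
    """Mean bias error: average signed difference (forecast minus observation)."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - m for p, m in zip(predicted, measured)) / len(predicted)


def rmse(predicted, measured):
    """Root mean square error: square root of the mean squared difference."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(predicted))


# Hypothetical hourly irradiance values in W/m^2 (illustrative only).
forecast = [410.0, 520.0, 630.0, 580.0]
observed = [400.0, 500.0, 650.0, 600.0]

print(f"MBE  = {mbe(forecast, observed):.2f} W/m^2")
print(f"RMSE = {rmse(forecast, observed):.2f} W/m^2")
```

A negative MBE indicates that the forecast underestimates on average, while the RMSE penalizes large individual errors regardless of sign, so the two indices are usually read together.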
With reasonable model parameters, the physical PV power forecasting method can predict the future power output accurately. However, it has the disadvantage of requiring a complex model of the solar radiation, a characteristic model of the PV power generation system, and precise future weather forecast information. In addition, determining the parameter values of the output characteristic model is more complicated for different types of generating unit systems (Perez et al., 2002).
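To make the idea of an output characteristic model concrete, the sketch below uses a widely used simplified relation in which power scales with irradiance and is corrected for cell temperature. It is not the model of any work cited here. The temperature coefficient `gamma`, the nominal operating cell temperature `noct`, and the plant rating are assumed illustrative values, and they are exactly the kind of parameters that the text describes as hard to determine.

```python
G_STC = 1000.0  # irradiance at standard test conditions, W/m^2
T_STC = 25.0    # cell temperature at standard test conditions, deg C


def cell_temperature(t_ambient, irradiance, noct=45.0):
    """Estimate cell temperature from ambient temperature (NOCT approximation)."""
    return t_ambient + (noct - 20.0) / 800.0 * irradiance


def pv_power(irradiance, t_ambient, p_rated_kw, gamma=-0.004, noct=45.0):
    """Simplified PV output in kW.

    gamma is the assumed power temperature coefficient (per deg C);
    noct is the assumed nominal operating cell temperature (deg C).
    """
    if irradiance <= 0.0:
        return 0.0
    t_cell = cell_temperature(t_ambient, irradiance, noct)
    power = p_rated_kw * irradiance / G_STC * (1.0 + gamma * (t_cell - T_STC))
    return max(0.0, power)


# Hypothetical forecast for a 100 kW plant: 800 W/m^2 and 20 deg C ambient.
print(f"{pv_power(800.0, 20.0, 100.0):.1f} kW")  # prints 73.6 kW
```

In a forecasting setting, the irradiance and ambient temperature passed to `pv_power` would come from the NWP output, so any error in the weather forecast propagates directly into the power estimate.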

STATISTICAL PREDICTION METHOD
The statistical method collects a large amount of data related to the power output of the PV power generation system, fits the unknown constants by regression, and thereby obtains the functional relationship between the output power and the measurable variables. According to the number of variables, the statistical method can be divided into the unary linear regression method, the multiple linear regression method, and the nonlinear regression method. Because many factors affect PV system power generation, the unary linear regression method gives unsatisfactory predictions. The multiple linear regression method adopted in the literature (Li et al., 2011) takes solar radiation intensity and ambient temperature as the two main factors, builds a multiple linear regression model of the PV system, and finally obtains a linear function of the output power in six variables, including radiation intensity and temperature. With this linear function, the output power of the PV power generation system can be predicted once the corresponding solar radiation and ambient temperature values are available. The literature (Li and Li, 2008) employs the support vector machine (SVM) to design a regression algorithm for a solar farm power prediction model. Because the SVM is based on the principle of structural risk minimization and generalizes well, its error remains relatively small even with few training samples. Furthermore, training an SVM amounts to solving a convex quadratic optimization problem; hence, the solution obtained by the SVM is the global optimum. In reference (Zhu and Tian, 2011), the least squares support vector machine (LSSVM), an improved version of the SVM, is used to predict the output power of the PV power generation system.
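The multiple linear regression idea can be sketched with ordinary least squares. The example below is a simplified two-factor version (irradiance and ambient temperature plus an intercept), not the six-variable model of the cited work. The data are synthetic and generated from assumed coefficients, so the fit only demonstrates the mechanics.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(features, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_linear(coeffs, feature):
    """Evaluate the fitted linear function for one feature vector."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], feature))


# Synthetic samples: (irradiance in W/m^2, ambient temperature in deg C).
features = [(200.0, 10.0), (400.0, 15.0), (600.0, 25.0),
            (800.0, 30.0), (1000.0, 20.0), (500.0, 35.0)]
# Assumed ground truth used only to generate data: P = 2 + 0.08 G - 0.3 T (kW).
targets = [2.0 + 0.08 * g - 0.3 * t for g, t in features]

coeffs = fit_linear(features, targets)
print([round(c, 4) for c in coeffs])
print(round(predict_linear(coeffs, (700.0, 28.0)), 2))
```

Because the synthetic data are noise-free, the fit recovers the generating coefficients. With measured plant data the residuals would reflect the factors the linear model leaves out.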
NARX and NARMAX (Di Piazza et al., 2016) are representative nonlinear regression models that take solar irradiance, temperature, and time of day as input variables. The literature (Bouzerdoum et al., 2013) proposes the SARIMA model, studies its performance in power prediction for solar farms, and reports that it enhances the prediction accuracy on real solar farms. In the literature (Pedro and Coimbra, 2012), the ARIMA model, a linear method for non-stationary series, is applied to forecast a local 1 MW PV plant. This model takes the hourly power output values of the past half year as input variables and uses the mean absolute percentage error (MAPE), calculated by Eq. 1, as its performance metric. The experimental results indicate that the ARIMA model is more sensitive and accurate in reflecting the shape changes of solar irradiance. Similarly, the ARIMAX model (Pedro and Coimbra, 2012), which takes past solar irradiance as an input, achieves results close to those of the ARIMA model. However, neither the ARIMA nor the ARIMAX model fully considers the influence of weather factors other than solar irradiance.
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{\left| P_{\mathrm{pre},i} - P_{\mathrm{meas},i} \right|}{P_{0}} \qquad (1)$$
where $P_{\mathrm{pre},i}$ is the $i$-th predicted power value, $P_{\mathrm{meas},i}$ is the corresponding actual power value, $P_{0}$ is the capacity of the tested solar farm, and $n$ is the sample size.
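As a small worked example of the metric, the sketch below assumes that the MAPE is normalized by plant capacity, that is, the mean of |P_pre - P_meas| / P_0 over the n samples, expressed in percent. That reading follows from the symbols defined in the text. The power values are hypothetical.

```python
def capacity_normalized_mape(predicted, measured, capacity):
    """Mean absolute error divided by plant capacity, in percent."""
    if capacity <= 0.0:
        raise ValueError("capacity must be positive")
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    total = sum(abs(p - m) for p, m in zip(predicted, measured))
    return 100.0 * total / (n * capacity)


# Hypothetical hourly values in MW for a 1 MW plant (illustrative only).
p_pre = [0.50, 0.72, 0.90, 0.65]
p_meas = [0.55, 0.70, 0.80, 0.65]

print(f"MAPE = {capacity_normalized_mape(p_pre, p_meas, 1.0):.2f} %")
```

Normalizing by capacity rather than by the measured value keeps the metric finite at night and in low-light hours, when the measured power is close to zero.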
These regression prediction models are corrected through the deviation between the measured and predicted values of PV power generation. In particular, the multiple linear regression method can enhance the prediction accuracy without extra measurement data and is worthy of further study. The merits of the statistical method are simple operation, fast prediction, and a clear expression of the relation between the influencing factors and the output power, which make it more suitable for fitting new situations. However, establishing the regression equation is complex and difficult because it places high demands on the distribution law and the historical sample data. As a result, the prediction accuracy of the statistical method tends to be lower.

AI PREDICTION METHOD
PV power forecasting based on AI algorithms is currently a very popular research area because of their strong self-learning and self-adaptation abilities. In the literature (Kaushika et al., 2014), the PV array generation sequence, weather type, irradiance intensity, and temperature are used to build a backpropagation (BP) neural network prediction model. However, this method requires a large amount of historical power data and heavy computation, and it is not suitable for new or under-construction power stations, for which historical data are unavailable. In the literature (Tang et al., 2016), the extreme learning machine (ELM) is employed to forecast the extracted power of solar farms. In particular, combined entropy is introduced into the prediction model and markedly improves the forecasting accuracy and the convergence. However, neural networks often need a large number of training samples to achieve good accuracy and generalization ability (Huang et al., 2021b; Yang et al., 2021e), so their prediction performance drops sharply in the case of small samples. In addition, the structure and parameters of a neural network are not easy to determine, and the existing training algorithms often leave the parameters trapped in a local minimum. Therefore, a new training algorithm for the neural network model of PV power prediction is urgently needed.
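The BP training loop described above can be sketched with a very small network. The example assumes one hidden layer with tanh units, a linear output, plain stochastic gradient descent, and two normalized inputs (irradiance and temperature). The architecture, the learning rate, and the synthetic target function are all illustrative assumptions and are not taken from the cited works.

```python
import math
import random


def forward(model, x):
    """Return the hidden activations and the network output for one input."""
    w1, b1, w2, b2 = model
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, output


def train_bp(samples, n_hidden=4, epochs=2000, lr=0.05, seed=0):
    """Train by backpropagation; returns the model and the mean loss per epoch."""
    rng = random.Random(seed)  # the result depends on this initialization
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            hidden, output = forward((w1, b1, w2, b2), x)
            error = output - y
            total += 0.5 * error ** 2
            for j in range(n_hidden):
                # Propagate the output error back through unit j (tanh derivative).
                grad_hidden = error * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= lr * error * hidden[j]
                b1[j] -= lr * grad_hidden
                for i in range(n_in):
                    w1[j][i] -= lr * grad_hidden * x[i]
            b2 -= lr * error
        losses.append(total / len(samples))
    return (w1, b1, w2, b2), losses


# Synthetic normalized data: power rises with irradiance g, falls with temperature t.
samples = [((g, t), 0.9 * g * (1.0 - 0.1 * t))
           for g in (0.0, 0.25, 0.5, 0.75, 1.0)
           for t in (0.0, 0.5, 1.0)]

model, losses = train_bp(samples)
_, estimate = forward(model, (0.6, 0.4))
print(f"first-epoch loss {losses[0]:.5f}, last-epoch loss {losses[-1]:.5f}")
print(f"estimated normalized power at g=0.6, t=0.4: {estimate:.3f}")
```

Changing `seed` changes the initial weights and therefore the point at which gradient descent settles, which is one way to observe the sensitivity to local minima mentioned in the text.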

DISCUSSION AND CONCLUSION
Efficient PV power forecasting technology can not only improve grid connection capability and security but also effectively reduce the curtailment of solar power (light discarding). The prediction technologies for PV plants discussed above are summarized and evaluated in Table 1.
However, PV power forecasting technology still faces many challenges. Limitations and recommendations for future research are as follows: 1) The large volume of experimental data is currently pre-processed manually; hence, efficient algorithms should be developed to summarize and extract the information and to establish connections among the data; 2) A swarm intelligence algorithm for training the neural network model of PV power prediction is urgently needed; 3) Regional prediction is important for power dispatching and should be further analyzed and studied; 4) Many works consider cloud cover only as a meteorological factor representing the extent of sky cover and ignore that partial shading of the PV panel by a cloud leads to multiple peaks in the PV curve. This issue requires further research for more accurate power prediction.

AUTHOR CONTRIBUTIONS
HY: writing the original draft and editing. BY: conceptualization. YH: visualization and contributed to the discussion of the topic. NC: formal analysis.