Optimal Power Flow Calculation Considering Large-Scale Photovoltaic Generation Correlation

To analyze the impact of large-scale photovoltaic systems on the power system, a photovoltaic output prediction method that considers spatial correlation is proposed, and the optimal power flow is calculated. First, a photovoltaic output model is established to obtain the attenuation coefficient and the fluctuation component, and the correlation among multiple photovoltaic power plants is analyzed with the k-means method. Second, a long short-term memory (LSTM) neural network is used as the photovoltaic output prediction model, and the clustered photovoltaic output data are fed into the LSTM model to generate large-scale photovoltaic prediction results that account for spatial correlation. An optimal power flow model with grid loss and voltage offset as objectives is then established. Finally, MATLAB simulations verify that the proposed large-scale photovoltaic forecasting method has higher accuracy. Multi-objective optimal power flow calculation is performed with the NSGA-II algorithm on the modified IEEE systems, and the optimal power flow with photovoltaic output at different times is compared and analyzed.


INTRODUCTION
In recent years, with strong support from national policies, the photovoltaic capacity of China has grown rapidly (Mohammadi and Mehraeen, 2017). By the end of 2019, national photovoltaic power generation had reached 224.3 billion kWh, a year-on-year increase of 26.3%. The "Three Norths" region is constrained by its large scale of local new energy installations and its limited consumption capacity. This region accounted for 87% of the country's curtailed photovoltaic generation, and its curtailment rate dropped by 2.3% year-on-year to 5.9% (Hashemi and Østergaard, 2017). To further reduce the national curtailment rate, it is of great significance to study large-scale photovoltaic output prediction methods and their impact on the planning and operation of the power system (Bowen et al., 2016; Sun et al., 2017; Zhongkai et al., 2018).
New energy sources such as photovoltaics and wind power are volatile and random (Yang et al., 2016). Improving the accuracy of new energy prediction has become a research hotspot for both domestic and foreign scholars (Lorenz et al., 2009; Li et al., 2019). An improved vector autoregression model that combines historical PV output power with weather monitoring data has been proposed. The model generates PV output simulation series and applies them to short-term forecasts for small-scale smart grids (Bessa et al., 2015). In (Ghislain et al., 2019), historical data of adjacent power stations were used to generate the probability density function of photovoltaic output based on quantile regression and the Lasso penalty technique, which provides a theoretical basis for short-term photovoltaic output prediction. Cluster analysis methods are applied in PV output modeling and reliability assessment. Based on the theory of cluster analysis, a multiscale time-series clustering model of solar irradiance intensity is developed from the perspective of data mining (Lin et al., 2018).
With the integration of large-scale new energy into the power grid, the regularity and volatility of new energy output have an increasing influence on the operating status of the system (Xia et al., 2016; Ye et al., 2019). Because source-load interactions respond stochastically, Gaussian mixture models are applied to establish accurate source-load uncertainty models. An adaptive linearized semi-invariant method for probabilistic power flow calculation that considers the randomness of source-load strength is proposed, which can effectively reduce the global linearization error of the power flow (Liu et al., 2019). For a wind farm connected to a hybrid AC-DC system, the optimal power flow problem of minimizing the total transmission loss is studied, and the interior point method (IPM) is applied when dealing with discrete variables to improve the accuracy of the optimal solution (Cao et al., 2013). The traditional security-constrained economic dispatch model for provincial power grids is extended and applied to large power grids across provinces and regions. To solve the problem of optimal resource allocation over a large area, the model is further optimized to improve the solution efficiency. The economic dispatching problem of large-scale multi-region cogeneration units under different scenarios is studied in (Nazari-Heris et al., 2019), balancing the operation cost and pollutant emission targets; the proposed method can reduce the cost by $1,939,534.08 per year. The impact of price uncertainty on active distribution network dispatching is studied in (Nazari-Heris et al., 2020), in which the multi-objective problem is solved by a robust optimization algorithm and ε-constraints. In (Nazari-Heris et al., 2018), optimal stochastic scheduling of a virtual power plant considering NaS battery storage and combined heat and power units is studied.
Because the number of power plants modeled in the existing literature is insufficient, the impact of PV output on the system at different moments and the spatial correlation of large-scale PV generation are not represented. Therefore, this paper proposes an optimal power flow calculation that considers the correlation of large-scale photovoltaic power generation. First, attenuation coefficients and fluctuations are obtained from PV output models and measured data, and the k-means method is used for cluster analysis of the PV output fluctuations of large-scale power plants. Second, an LSTM prediction model that considers spatial correlation is established to obtain a large-scale photovoltaic output prediction curve, and an optimal power flow model with power system loss and voltage offset as objectives is established. Finally, the proposed large-scale photovoltaic prediction method and optimal power flow model are simulated on the modified IEEE 39-bus and IEEE 118-bus systems, verifying that the prediction method has higher accuracy and that the daily regularity of photovoltaic output has a marked impact on the system.

Ideal Model of Photovoltaic Power Generation
The ideal output $P_t$ of a photovoltaic power station, neglecting the influence of shading and temperature, is given by Eq. 1 (Wang et al., 2019), where $P_{stc}$ is the output of the photovoltaic panels under standard conditions (solar radiation intensity $I_{stc} = 1000$ W/m²; temperature $T_{stc} = 298$ K).
The total solar radiation intensity at a given location on the earth at time $t$ is given by Eq. 2, where $I_t$ is the solar radiation intensity without attenuation, $I_b$ is the direct solar radiation, and $I_d$ is the scattered solar radiation. $I_b$ is the main component of solar radiation. The direct solar radiation at a given location is expressed following Yang et al. (2011), where $S$ is the solar constant, about 1366 W/m²; $N$ is the day number of the year; $\rho$ is the solar incident angle, which is the difference between the solar zenith angle $\theta_Z$ and the photovoltaic panel inclination $\beta$; $\tau_b$ is the atmospheric transparency coefficient of direct solar radiation; and $M$ is the air mass, which is related to the altitude.
Here, $a$ is the altitude of the measuring location; $P(a)$ is the atmospheric pressure at the measuring location; $P_0$ is the standard atmospheric pressure; and $\alpha_s$ is the local solar altitude angle, which is complementary to the solar zenith angle $\theta_Z$. Air molecules and aerosol particles redistribute light radiation energy according to a certain law, forming scattered radiation (Bunea et al., 2006; Dall Anese et al., 2014). According to the Berlage formula, the intensity of scattered solar radiation is given by Eqs 6-8 (Yang et al., 2011; Zhang et al., 2014), where $k$ is a parameter related to air quality; $\tau_d$ is the atmospheric transparency coefficient of scattered radiation; $\Phi$ is the latitude of the area; $\delta$ is the solar declination angle; and $\omega_s$ is the solar hour angle. The solar declination angle changes with the seasons and is calculated from a formula in which $N_1 = 92.975$ is the number of days from the vernal equinox to the summer solstice and $\alpha_1$ is the number of days elapsed since the vernal equinox; the remaining seasons are treated analogously, with $N_2 = 93.269$, $N_3 = 89.865$, and $N_4 = 89.012$. The hour angle of the Sun, $\omega_s$, is negative at sunrise, positive at sunset, and 0° at noon, increasing by 15° every hour with the rotation of the earth. The time difference between locations also affects $\omega_s$. Beijing time is the time of the UTC+8 time zone, whose reference meridian is 120° E. The hour angle $\omega_s$ of a given location based on Beijing time is given by Eq. 10, where $\psi$ is the local longitude and $t$ is Beijing time.
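As a minimal sketch of how the quantities in this section fit together, the Python snippet below computes the hour angle from Beijing time and the ideal output from the two irradiance components. The equations themselves are not reproduced in this text, so the forms used here are assumptions: the hour angle is taken as 15° per hour from 12:00 plus the offset of the local longitude from the 120° E meridian, the total irradiance as the sum of the direct and scattered components, and the ideal output as $P_{stc}$ scaled by $I_t / I_{stc}$. The function names are hypothetical.

```python
I_STC = 1000.0  # W/m^2, standard-condition irradiance stated in the text


def hour_angle_deg(beijing_time_h: float, longitude_deg: float) -> float:
    """Hour angle in degrees: negative before solar noon, positive after.

    Assumed form: 15 degrees per hour from 12:00, shifted by the offset
    between the local longitude and the 120 E reference meridian.
    """
    return 15.0 * (beijing_time_h - 12.0) + (longitude_deg - 120.0)


def ideal_output(p_stc: float, i_direct: float, i_scattered: float) -> float:
    """Ideal PV output, assuming linear scaling with total irradiance."""
    i_total = i_direct + i_scattered  # assumed: I_t = I_b + I_d
    return p_stc * i_total / I_STC
```

For a station west of the reference meridian, solar noon occurs after 12:00 Beijing time, so the hour angle at 12:00 is negative under this convention.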

Spatial Correlation Characteristics of Large-Scale Photovoltaic Output
Photovoltaic power generation is affected by a variety of practical conditions (Samadi et al., 2014; Lingfeng et al., 2017). Even when fluctuations are ignored, the actual output is attenuated relative to the theoretical output. The daily attenuation coefficient $K_i$ characterizes this attenuation and is given by Eq. 11, where $K_i$ is the attenuation coefficient on day $i$; $y_i(u)$ and $f_i(u)$ are the measured and theoretical photovoltaic power values at sampling point $u$; and $n$ is the number of sampling points. Eq. 11 uses the least squares method to find the best-fitting coefficient, so that the sum of squared residuals between the attenuated theoretical output and the measured output is minimized. This is the optimization problem of matching the theoretical model to the measured data.
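The least-squares fit described above has a closed-form solution when a single scalar coefficient multiplies the theoretical curve. The sketch below assumes that Eq. 11 is this unconstrained one-parameter problem, so that setting the derivative of the squared residual sum to zero gives the ratio of two sums; the function name is hypothetical.

```python
from typing import Sequence


def attenuation_coefficient(measured: Sequence[float],
                            theoretical: Sequence[float]) -> float:
    """Least-squares K minimizing sum((y(u) - K * f(u))**2) over one day."""
    if len(measured) != len(theoretical):
        raise ValueError("series must have the same number of sampling points")
    denom = sum(f * f for f in theoretical)
    if denom == 0.0:
        raise ValueError("theoretical output is identically zero")
    # d/dK of the residual sum is zero at K = sum(y * f) / sum(f * f).
    return sum(y * f for y, f in zip(measured, theoretical)) / denom
```

If the measured curve is exactly a scaled copy of the theoretical curve, the function returns that scale factor.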
Natural phenomena such as cloud movement and floating dust occlusion cause fluctuations in photovoltaic output and produce a fluctuation component. The difference between the actual photovoltaic output and the attenuated theoretical output is the fluctuation component $\Delta P_t$ of the photovoltaic output, given by Eq. 12.
The spatial correlation characteristics of photovoltaic output are affected by two spatial scales: large-scale weather and small-scale weather. Large-scale weather mainly affects the overall attenuation, while small-scale weather affects the fluctuations. Similar fluctuations in photovoltaic output indicate that the geographical environment and weather conditions of the power stations are similar and that the spatial correlation of their output is high. Therefore, this paper clusters the output fluctuations of the photovoltaic power stations in each power station group. By selecting the optimal cluster number, the final grouping of the power stations is obtained. The k-means method is used to cluster the output fluctuations of multiple photovoltaic power stations, and the optimal cluster number is determined by the within-group sum of squared errors (SSE).
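The clustering step can be illustrated with a small, dependency-free sketch. It assumes that the fluctuation component is the measured output minus the attenuated theoretical output, and that each station is represented by its vector of fluctuation samples. The k-means routine below is plain Lloyd iteration with a deterministic farthest-point seeding, which is an illustrative choice and not necessarily the initialization used in the paper. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def fluctuation(measured: Vector, theoretical: Vector, k_att: float) -> List[float]:
    """Fluctuation component: measured output minus attenuated theoretical output."""
    return [y - k_att * f for y, f in zip(measured, theoretical)]


def _dist2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points: List[Vector], k: int, iters: int = 100) -> Tuple[List[int], float]:
    """Lloyd's k-means; returns each point's cluster label and the within-group SSE."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Seed the next center at the point farthest from all current centers.
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    sse = sum(_dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse
```

Evaluating `kmeans_sse` for a range of `k` and plotting the returned SSE gives the kind of curve from which an inflection point can be read off.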

PHOTOVOLTAIC OUTPUT PREDICTION MODEL CONSIDERING SPATIAL CORRELATION
Taking the spatial correlation between power stations into account makes the photovoltaic output prediction model more comprehensive and reduces the prediction error. Therefore, this paper performs a cluster analysis of large-scale photovoltaic power stations based on the fluctuation of photovoltaic output, obtains the spatial correlation characteristics between the power stations, and then predicts the output of multiple photovoltaic power stations simultaneously.
The long short-term memory (LSTM) neural network is an improved deep learning algorithm based on the recurrent neural network (RNN) and is especially suited to processing time series with seasonal periodic changes. An LSTM neural network consists of an input layer, a hidden layer and an output layer. Figure 1 shows the unrolled recurrent network.
The LSTM gating mechanism contains three gates: the forget gate $f_t$, the input gate $i_t$ and the output gate $o_t$. In addition, the structure contains an internal memory $C_t$. The variables in the recurrent network are calculated by Eq. 13, where $W_f$, $W_i$, $W_o$ and $W_c$ are the weight matrices; $b_f$, $b_i$, $b_o$ and $b_c$ are the bias parameters; $\sigma$ denotes the activation functions, usually ReLU or sigmoid functions; and $x_t$, $C_t$ and $h_t$ are the input layer state, control unit state and hidden unit state at time $t$, respectively.
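To make the gate interactions concrete, here is a minimal sketch of one time step of an LSTM cell with a scalar state (hidden size 1). Eq. 13 is not reproduced in this text, so the sketch assumes the commonly used formulation: sigmoid activations for the three gates, tanh for the candidate memory and for the output squashing. The weight layout and all names are hypothetical.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: float, h_prev: float, c_prev: float,
              w: Dict[str, Tuple[float, float]],
              b: Dict[str, float]) -> Tuple[float, float]:
    """One step of a scalar LSTM cell (hidden size 1, for illustration).

    w[g] = (weight on h_prev, weight on x_t) for g in "f", "i", "o", "c".
    """
    def pre(g: str) -> float:
        return w[g][0] * h_prev + w[g][1] * x_t + b[g]

    f_t = sigmoid(pre("f"))        # forget gate
    i_t = sigmoid(pre("i"))        # input gate
    o_t = sigmoid(pre("o"))        # output gate
    c_cand = math.tanh(pre("c"))   # candidate memory
    c_t = f_t * c_prev + i_t * c_cand
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t
```

With all weights and biases at zero, every gate evaluates to 0.5 and the candidate memory to 0, so the cell simply halves the previous memory.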
The structure of the LSTM model is the same as that of the recurrent neural network. It can be seen as multiple copies of the same neural network module, each of which passes its message to the next module. Taking the historical output data and solar radiation intensity data of the target photovoltaic power station as the input of the LSTM network model, the prediction model is expressed by Eq. 14, where $h_{t+1}$ is the predicted photovoltaic power of the target power station; $h_t, \ldots, h_{t-n}$ are the historical photovoltaic output data of the target power station; and $x_{t+1}, \ldots, x_{t-n}$ are the solar radiation intensity data of the target power station. Figure 2 is a block diagram of the photovoltaic output prediction model considering spatial correlation. The PV plants grouped together by the clustering method above have a higher spatial correlation. Taking the power stations in each power station group as the target power stations in the prediction model, and using the historical output data and solar radiation intensity data of each target photovoltaic power station as the input of the LSTM network model, a multi-dimensional photovoltaic output prediction sequence is obtained simultaneously. The prediction data from all PV clusters are summed to obtain the total predicted output of the large-scale PV clusters in the province.
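The input layout implied by the prediction model can be sketched as follows. This is a minimal illustration under stated assumptions: each training sample pairs the target $h_{t+1}$ with the output history $h_{t-n}, \ldots, h_t$ and the irradiance values $x_{t-n}, \ldots, x_{t+1}$ (so an irradiance value for the target time must be available), and the provincial total is the pointwise sum of the per-cluster forecasts. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_samples(output: Sequence[float], irradiance: Sequence[float],
                 n: int) -> List[Tuple[List[float], float]]:
    """Pair each target h_{t+1} with (h_{t-n..t}, x_{t-n..t+1}) as features."""
    if len(output) != len(irradiance):
        raise ValueError("series must be aligned and of equal length")
    samples = []
    for t in range(n, len(output) - 1):
        features = list(output[t - n:t + 1]) + list(irradiance[t - n:t + 2])
        samples.append((features, output[t + 1]))
    return samples


def total_forecast(cluster_forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Sum the per-cluster forecast series point by point."""
    return [sum(values) for values in zip(*cluster_forecasts)]
```

Each feature vector has `2 * n + 3` entries: `n + 1` output values followed by `n + 2` irradiance values.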

Optimal Power Flow Model
Based on the NSGA-II algorithm, this paper performs the optimal power flow calculation with active grid loss and voltage offset as the objective functions. The optimal power flow model consists of three parts: the objective functions, the equality constraints and the inequality constraints.

Objective Function
(1) Active power loss. In this objective, $P_{loss}$ is the total grid loss of the system; $P_{k.loss}$ is the active power loss of branch $k$; $g_k$ is the conductance of branch $k$; $N_L$ is the set of AC and DC branches; and $N_{VSC}$ is the set of converter station nodes.
(2) Voltage offset. In this objective, $U_i$, $U_i^{spec}$, $U_i^{max}$ and $U_i^{min}$ are the actual value, expected value, maximum value and minimum value of the node voltage, respectively.
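The two objectives can be evaluated from a power flow solution as in the sketch below. The objective equations are not reproduced in this text, so the forms here are assumptions based on the symbol definitions: the branch loss uses the standard expression $g_k (U_i^2 + U_j^2 - 2 U_i U_j \cos\theta_{ij})$ for AC branches only (any converter station term is omitted), and the voltage offset is taken as the sum of squared deviations from the expected voltage normalized by the allowed range. Names and data layout are hypothetical.

```python
import math
from typing import Sequence, Tuple

Branch = Tuple[int, int, float]  # (from bus i, to bus j, conductance g_k)


def active_power_loss(branches: Sequence[Branch], u: Sequence[float],
                      theta: Sequence[float]) -> float:
    """Sum of AC branch losses g_k * (U_i^2 + U_j^2 - 2 U_i U_j cos(theta_ij))."""
    return sum(
        g * (u[i] ** 2 + u[j] ** 2 - 2.0 * u[i] * u[j] * math.cos(theta[i] - theta[j]))
        for i, j, g in branches
    )


def voltage_offset(u: Sequence[float], u_spec: Sequence[float],
                   u_max: Sequence[float], u_min: Sequence[float]) -> float:
    """Sum of squared voltage deviations, normalized by the allowed range (assumed form)."""
    return sum(((ui - us) / (hi - lo)) ** 2
               for ui, us, hi, lo in zip(u, u_spec, u_max, u_min))
```

A flat voltage profile (equal magnitudes, zero angle differences) gives zero loss on every branch, which is a quick sanity check for the first function.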

Equality Constraints
In the power balance equations, $P_{Gi}$ and $Q_{Gi}$ are the active and reactive power output of the generator at node $i$; $P_{Li}$ and $Q_{Li}$ are the active and reactive power of the load at node $i$; $U_i$ and $U_j$ are the voltage amplitudes of nodes $i$ and $j$; $\theta_{ij}$ is the phase angle difference between node $i$ and node $j$; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance between node $i$ and node $j$; and $N_B$ is the set of power system nodes.
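The equality constraints can be checked numerically by computing the power mismatch at every node. The sketch below assumes the standard polar form of the AC power balance and interprets $G_{ij}$ and $B_{ij}$ as the real and imaginary parts of the bus admittance matrix; both are assumptions, since the equations are not reproduced in this text. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def power_mismatch(p_gen: Sequence[float], q_gen: Sequence[float],
                   p_load: Sequence[float], q_load: Sequence[float],
                   u: Sequence[float], theta: Sequence[float],
                   g: Sequence[Sequence[float]],
                   b: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Active and reactive mismatch per node; all zeros when the balance holds."""
    n = len(u)
    dp, dq = [], []
    for i in range(n):
        p_inj = q_inj = 0.0
        for j in range(n):
            th = theta[i] - theta[j]
            p_inj += u[j] * (g[i][j] * math.cos(th) + b[i][j] * math.sin(th))
            q_inj += u[j] * (g[i][j] * math.sin(th) - b[i][j] * math.cos(th))
        dp.append(p_gen[i] - p_load[i] - u[i] * p_inj)
        dq.append(q_gen[i] - q_load[i] - u[i] * q_inj)
    return dp, dq
```

For a lossless two-bus line with reactance 0.1 p.u. and a 30° angle difference, the active power transfer is 5 p.u., so a 5 p.u. generator at the sending end balances a 5 p.u. load at the receiving end.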

Inequality Constraints
In the inequality constraints, $P_{Gimax}$, $P_{Gimin}$, $Q_{Gimax}$ and $Q_{Gimin}$ are the upper and lower limits of the active power output and the upper and lower limits of the reactive power output of the generator at bus $i$, respectively; $T_{imax}$ and $T_{imin}$ are the upper and lower limits of the adjustable ratio of the transformer at bus $i$; $N_T$ is the number of transformers in the system; $U_{imax}$ and $U_{imin}$ are the upper and lower limits of the voltage at bus $i$; $P_{limax}$ and $P_{limin}$ are the upper and lower limits of the active power carried by the branch; $N_G$ is the generator set; $N_l$ is the power line set; and $N_B$ is the set of power system nodes.

Optimal Power Flow Solution
Figure 3 shows the flow chart of the optimal power flow calculation using NSGA-II with photovoltaic correlation taken into account. The specific steps are as follows:
(1) Set the initialization parameters: the system grid structure, the PV output and solar irradiance data, the k-means algorithm parameters and the NSGA-II algorithm parameters.
(2) Use the k-means clustering method to analyze the fluctuation of PV output, then feed the historical PV output data and other information of the power plants into the LSTM model to obtain the predicted output of the large-scale PV power plant clusters in the province.
(3) Substitute the predicted photovoltaic output into the optimal power flow model and carry out the optimal power flow calculation.
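Two building blocks of the solution step can be sketched briefly: a check of the bound-type inequality constraints, and the dominance comparison on which NSGA-II ranks candidate solutions. This is a simplified illustration, not the full algorithm: it returns only the first non-dominated front and omits crowding distance, selection and variation. How constraint violations enter the ranking is not stated in the text, so the violation measure is shown on its own. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def bound_violation(values: Sequence[float], lower: Sequence[float],
                    upper: Sequence[float]) -> float:
    """Total amount by which values fall outside [lower, upper]."""
    return sum(max(lo - v, 0.0) + max(v - hi, 0.0)
               for v, lo, hi in zip(values, lower, upper))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(objectives: List[Tuple[float, float]]) -> List[int]:
    """Indices of non-dominated points (both objectives minimized)."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)]
```

Here each objective tuple would hold the grid loss and the voltage offset of one candidate operating point.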

Large-Scale PV Forecast Output
This paper uses actual data from large-scale centralized photovoltaic power stations in a province as an example, taking the photovoltaic output from May to July 2018 for simulation. The data sampling interval is 15 min, and each photovoltaic power station contributes 5,152 output samples. The k-means method is used to cluster the attenuation coefficients and fluctuations of multiple photovoltaic power stations, and the optimal cluster number of the power station group is determined by the within-group sum of squared errors (SSE).
Taking Province A as an example, cluster analysis of the large-scale power stations in the province was carried out for June 1 to 7, when the weather fluctuated frequently. Taking June 1 as an example, the SSE-based selection of the optimal number of clusters in a power plant group is shown in Figure 4. The blue curve has an obvious inflection point when the number of clusters is 6. Beyond that point the curve is relatively flat as the number of clusters increases, which indicates that the optimal number of clusters in the power plant group is 6. Table 1 shows the optimal clustering number of photovoltaic power plants in the province from June 1 to 7.
This paper defines power stations that always belong to the same cluster under different weather conditions as typical power stations, and power stations that do not as variable power stations. To improve the forecast accuracy of PV output in the province, the percentage of variable power plants should be low. From the statistics of the clustering number and the variable power plant ratio, the variable power plant ratio curve under different cluster numbers is plotted in Figure 5. For cluster numbers from 5 to 7, the variable power plant ratio reaches its minimum when the clustering number is 6. Therefore, considering different weather conditions, the optimal number of power station classes in the province is 6. Table 2 shows the number of power stations in each of the six clusters in Province A.
TABLE 2 | Number of power stations in the six clusters (in table order): 7, 9, 12, 8, 14, 5.
FIGURE 6 | Prediction curve of day-ahead and intra-day photovoltaic output and actual measurement curve of Province A.
The cross-validation method was applied to the historical data of the six photovoltaic power plant groups, with the data divided into training and test sets at a ratio of 9:1 (4637:515). The time step of the model input layer is set to 10 and the number of hidden layers to 2; the dimension of the first hidden layer is 15 and that of the second hidden layer is 30. The resulting forecast curve of the province on August 1 is shown in Figure 6. Based on this model, 1 to 4 step forecasts are carried out, corresponding to a 15 min to 1 h ultra-short-term rolling forecast. It can be seen that the large-scale photovoltaic forecasting method considering the correlation has higher accuracy.
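The data split and the rolling forecast horizon can be reproduced with a few lines. The sketch assumes that the 9:1 split is rounded to the nearest sample, which matches the stated 4637:515 for 5,152 samples, and that the rolling forecast feeds each prediction back into the input window; the latter is an assumption about the procedure, and `predict_one` stands in for a trained one-step model. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

SAMPLE_INTERVAL_MIN = 15  # sampling interval stated in the text


def train_test_sizes(n_samples: int, train_parts: int = 9,
                     test_parts: int = 1) -> Tuple[int, int]:
    """Split sizes for a train:test ratio, rounding the training size to nearest."""
    total = train_parts + test_parts
    n_train = (n_samples * train_parts + total // 2) // total
    return n_train, n_samples - n_train


def horizons_min(max_steps: int = 4) -> List[int]:
    """Forecast horizons in minutes for 1..max_steps steps ahead."""
    return [step * SAMPLE_INTERVAL_MIN for step in range(1, max_steps + 1)]


def rolling_forecast(predict_one: Callable[[List[float]], float],
                     history: Sequence[float], window: int,
                     steps: int = 4) -> List[float]:
    """Multi-step forecast that appends each prediction to the input window."""
    buf = list(history[-window:])
    preds = []
    for _ in range(steps):
        nxt = predict_one(buf)
        preds.append(nxt)
        buf = buf[1:] + [nxt]
    return preds
```

With the settings described above, `window` would be 10 and `steps` would be 4, covering horizons from 15 min to 1 h.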
The benefits of considering photovoltaic correlation are further verified. Table 3 compares the forecast results with and without considering correlation. Compared with the forecast without correlation, the mean value, standard deviation and maximum of the prediction with correlation are all closer to the actual photovoltaic values, which further demonstrates that the large-scale photovoltaic forecast method considering the correlation has higher accuracy.

Optimal Power Flow Calculation
To analyze the impact of large-scale photovoltaic generation on the system, simulations were carried out on the modified IEEE 39-bus and IEEE 118-bus systems. The optimal power flow model is implemented on a personal computer with an Intel Core i7 CPU (2.20 GHz) and 16.00 GB of RAM.
The grid connection of the modified IEEE 39-bus system is shown in Figure 7. The province's intraday photovoltaic output forecasts at different times replace the traditional generator at bus 38. The simulation parameters are taken from MATPOWER 4.1.
The NSGA-II algorithm is used to perform multi-objective optimal power flow calculations for systems containing large-scale PV output, and a Pareto-optimal solution set is obtained for each moment. In the NSGA-II algorithm, the maximum number of iterations is set to 200, the population size is set to 100, and the distribution indices for the crossover and mutation operators are both 20. The computation time of the proposed optimal power flow model is 10 min. Figure 8 shows the Pareto solution set at 12:00; the optimal solutions are uniformly distributed in the objective function space. This indicates that the NSGA-II algorithm is valid and applicable, can coordinate the active power loss and voltage offset of the system, and has a strong global search capability. Table 4 compares the optimal power flow calculation results at different times. Bus 38 is connected to the photovoltaic output, and as the PV output gradually increases, the traditional generators no longer need to supply power to remote loads, which reduces the transmission distance and the grid loss in the system. At the same time, the improved power supply reduces the variation of the optimization objectives, and the system operating state tends to stabilize. In addition, grid loss and voltage offset restrict each other, and the operator can select the system operating state according to actual needs.
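The role of the distribution index can be illustrated with simulated binary crossover (SBX), the real-coded crossover operator commonly paired with NSGA-II. The text does not name the operators, so treating the crossover as SBX is an assumption, as are the dictionary keys and function name below. A larger index keeps children closer to their parents.

```python
from typing import Tuple

# Settings stated in the text; the key names are hypothetical.
NSGA2_SETTINGS = {"population_size": 100, "max_generations": 200,
                  "eta_crossover": 20.0, "eta_mutation": 20.0}


def sbx_pair(p1: float, p2: float, u: float, eta: float = 20.0) -> Tuple[float, float]:
    """Simulated binary crossover for one variable, given a uniform draw u in [0, 1)."""
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    # Children are placed symmetrically about the parents' midpoint.
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2
```

In a full implementation, `u` would be drawn from a random number generator for each variable, and the children would be clipped to the variable bounds.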
To further validate the effectiveness of the proposed method in a large-scale system, case studies are performed on the IEEE 118-bus system. The province's intraday photovoltaic output forecasts at different times replace the traditional generator at bus 90. The computation time of the proposed optimal power flow model is 17 min. Figure 9 shows the Pareto solution set at 12:00, and Table 5 compares the optimal power flow calculation results of the IEEE 118-bus system at different times. These results further demonstrate that the NSGA-II algorithm performs well in solving multi-objective problems, and they validate the effectiveness of the proposed method.

CONCLUSION
This paper proposes an optimal power flow calculation method considering the correlation of large-scale photovoltaic output, and the following conclusions are obtained:
(1) The k-means clustering algorithm is used to obtain the spatial correlation between large-scale PV plants. The accuracy of the PV output prediction model is improved by using the LSTM network model for multidimensional PV output prediction.
(2) The impact of regular variations in PV output on the system is obtained through optimal power flow analysis. Traditional generators are required to adjust their output according to the PV conditions to meet the grid-wide load. As the PV output increases, the transmission path and the grid loss decrease, but the voltage offset increases.
(3) The photovoltaic forecast method proposed in this paper helps the dispatch center obtain a more realistic photovoltaic output, so that optimal dispatching can be used to balance the economy and security of the power system.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
HL and HJL made important contributions to the concept, idea, topic selection, design, and data acquisition, analysis or interpretation of the research work; WL and JB wrote the paper or revised its key contents; ZW comprehensively reviewed and checked the final published paper.

FUNDING
This research was financially supported by the Key technology projects of State Grid Inner Mongolia East Electric Power Co., Ltd. (SGMDTL00YWJS2000669).