Wind Speed Distributions Used in Wind Energy Assessment: A Review

With economic development and population growth, energy demand has shown an upward trend. Renewable energy is inexhaustible and causes little pollution, and therefore has broad prospects for development. In recent years, wind energy has been developed as an essential renewable energy source. Wind power is environmentally friendly and plays a critical role in economic growth. Assessing the characteristics and potential of wind energy is the first step in its effective development. The wind speed distribution at a specific location determines the available wind energy. This paper reviews the wind speed distribution models used for wind energy assessment and their applicability to different wind regimes. All potential wind speed distribution models should be considered when modeling wind speed data at a particular site. Previous studies have used several parameter estimation methods and evaluation criteria to estimate model parameters and evaluate the goodness-of-fit; this paper discusses their advantages and disadvantages. The characteristics of the wind speed distribution vary constantly in space and time. Wind energy assessment should therefore consider local geographical factors, such as local climate, topography, and the difference in thermal properties between land and sea, and should focus on long-term variations in wind characteristics.


INTRODUCTION
Energy is a necessary material basis for the survival and development of human society. In recent years, the energy crisis has become a worldwide problem. With the continuous growth of the population and the rapid development of the global economy, energy demand is also increasing. Traditional fossil fuels (such as coal, oil, and natural gas) are widely used in almost all areas of daily life. However, fossil fuel reserves are limited: fuels that formed in nature over millions of years may be entirely exhausted by humans within a few hundred years (Baz et al., 2021). In addition, the carbon in fossil fuels is transformed into carbon dioxide, which increases the concentration of carbon dioxide in the atmosphere, aggravating the greenhouse effect, changing the global climate, and upsetting the ecological balance (Arenas-López and Badaoui, 2020). As people gradually realize the importance and urgency of environmental protection, energy conservation, emission reduction, and sustainable development, renewable energy has become the primary energy source required for future social development, and demand for it has broad potential. Wind energy is one of the renewable energy sources that can be used for commercial purposes. Many countries are developing wind power to improve their energy structure and protect the ecological environment. Wind energy also plays a vital role in economic growth, creating employment opportunities and promoting the development of science and technology (Liu et al., 2019; Kandpal and Dhingra, 2021). Wind energy has recently attracted widespread attention in the power generation industry (Jansen et al., 2020). Global wind-generated power increased significantly from 2012 to 2017 (Figure 1) (World Bank, 2017).
At present, to meet the increasing electricity demand, researchers have focused on improving the efficiency of wind power generation (Chen and Blaabjerg, 2009). Assessing the characteristics and potential of wind energy is the first step in its effective development. Wind energy resource assessment is an essential part of the feasibility analysis of wind farm projects, and the quality of the assessment directly affects the cost of power generation and the economic benefits (Mauritzen, 2020). The wind speed distribution at a specific location determines the available wind energy and the performance of the energy conversion system. Therefore, to reduce the uncertainty of wind energy output estimation, it is necessary to accurately understand the distribution characteristics of wind speed (Celik, 2003). Two methods are generally used to determine the wind speed distribution: 1) the time series method (Morales et al., 2010; Katikas et al., 2021) and 2) the statistical analysis method (Ouarda et al., 2016; Elie Bertrand et al., 2020). The result obtained by the time series method may be more accurate because it is based on the original wind speed data. However, with an enormous amount of data, processing with the time series method is more complicated. When long-term wind speed data are lacking, it is more feasible to use statistical analysis to explain the behavior and characteristics of historical wind speed data and to estimate the wind energy output. The statistical analysis method characterizes the wind speed distribution with a limited number of parameters, which is efficient and straightforward. The process of statistical analysis to assess the potential of wind energy is shown in Figure 2. It is crucial to determine the most suitable probability density function (pdf) for historical wind speed data (Alavi et al., 2016a).
The most widely used pdf in wind speed modeling is the Weibull distribution (Petković et al., 2014; Wais, 2017a; Sarkar et al., 2019). In recent years, researchers have considered several pdfs to find the most accurate model for the wind energy assessment of a particular site (Lo Brano et al., 2011; Ouarda et al., 2015; Jia et al., 2020; Wang et al., 2021). After determining the pdf, the value of each parameter in the function should be estimated. The commonly used parameter estimation methods mainly include the maximum likelihood method (Miao et al., 2019), the least squares method (Jung and Schindler, 2017), the moment method (Li and Miao, 2021), and the power density method (Akdağ and Dinler, 2009). Previous studies used different goodness-of-fit criteria to evaluate the accuracy of the model. Commonly used criteria include the coefficient of determination (ul Haq et al., 2020), the root mean square error (Guarienti et al., 2020), the Kolmogorov-Smirnov test statistic (Ouarda and Charron, 2018), the Anderson-Darling test statistic (Soukissian, 2013), and the chi-square test statistic.
Although the research of wind speed distribution has pursued the unified modeling of existing wind regimes, previous studies have not proposed a model that can provide a sufficient description at any site. Correspondingly, different parameter estimation methods and goodness-of-fit criteria have advantages and disadvantages, which bring difficulties to wind energy potential assessment. This paper reviews wind speed models, parameter estimation methods, and goodness-of-fit criteria and briefly discusses the advantages and disadvantages. The aim is to analyze the reasons for the differences in the fitting

WIND SPEED DISTRIBUTION MODELS
The kinetic energy contained in the airflow is converted into electrical energy by the wind turbines. The wind speed presents a positively skewed distribution with statistical characteristics. However, wind power has randomness, volatility, and intermittent characteristics, making the output power of wind farms fluctuate considerably. Several probability distribution models have been widely used in wind farm analysis, planning, design, construction, and operation. Wind speed distribution models can be roughly divided into two classes: parametric distribution models and nonparametric distribution models.

Parametric Distribution Models
The theoretical average output can be calculated as follows (Masseran, 2015): where $P$ is the theoretical average wind energy output, $v$ is the observed wind speed, and $f(v)$ is the pdf of wind speed.
The observed average output can be calculated as follows: where $P_o$ is the observed average wind energy output, and $\bar{v}$ is the observed mean wind speed. Energy prevision bias is the percentage error between the theoretical average output and the observed average output: The estimation of wind power mainly depends on $f(v)$. Therefore, selecting an appropriate pdf can provide more accurate energy potential results and reduce energy prevision bias.

Weibull Distribution
The Weibull distribution was popularized by the Swedish physicist Weibull (Weibull, 1951), and it has been used in various fields, such as physics, materials science, geography, medicine, and economics. The pdf of the two-parameter Weibull distribution is:
$$f(v) = \frac{k}{\alpha}\left(\frac{v}{\alpha}\right)^{k-1}\exp\left[-\left(\frac{v}{\alpha}\right)^{k}\right]$$
where α is the scale parameter, which controls the abscissa scale of the data distribution; k is the shape parameter of the Weibull distribution, which determines the width of the data distribution. The two-parameter Weibull distribution has a simple form, high flexibility, and parameters that are efficient to compute, making it the most popular wind speed distribution model. The Weibull distribution has significant advantages, especially in areas dominated by temperate depressions (Harris and Cook, 2014). For example, Bilir et al. (2015) used a two-parameter Weibull distribution to evaluate wind energy resources near Ankara, the capital of Turkey. Shu et al. (2015) used a two-parameter Weibull model to characterize the wind speed distribution in Hong Kong. In addition, the Weibull distribution has been applied to estimating the performance of the automatic wind power generation system (Celik, 2006), simulating and predicting wind speed time series (Kaplan and Temiz, 2017), and analyzing wind turbine failures (Jin et al., 2021). Nevertheless, the two-parameter Weibull distribution is not suitable for all wind regimes in nature.
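Because wind power scales with the cube of wind speed, the quantity that matters for a fitted Weibull model is the mean of v³. The following is a minimal Python sketch, not taken from the paper, that evaluates the standard two-parameter Weibull pdf and compares a numerical integral of v³ f(v) with the closed form α³Γ(1 + 3/k). The air density value `RHO_AIR`, the integration limit `v_max`, and all function names are illustrative assumptions.

```python
import math

RHO_AIR = 1.225  # kg/m^3; assumed standard sea-level air density, not from the text


def weibull_pdf(v, alpha, k):
    """Two-parameter Weibull pdf with scale alpha and shape k."""
    if v <= 0:
        return 0.0  # simplification: the single point v = 0 does not affect integrals
    z = v / alpha
    return (k / alpha) * z ** (k - 1) * math.exp(-(z ** k))


def mean_cubed_speed_numeric(alpha, k, v_max=60.0, steps=20000):
    """Midpoint-rule approximation of the integral of v^3 f(v) on [0, v_max]."""
    dv = v_max / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * dv
        total += v ** 3 * weibull_pdf(v, alpha, k) * dv
    return total


def mean_cubed_speed_closed(alpha, k):
    """Closed form of E[v^3] for a Weibull variable: alpha^3 * Gamma(1 + 3/k)."""
    return alpha ** 3 * math.gamma(1.0 + 3.0 / k)


def mean_power_density(alpha, k, rho=RHO_AIR):
    """Mean wind power density in W/m^2, taken here as 0.5 * rho * E[v^3]."""
    return 0.5 * rho * mean_cubed_speed_closed(alpha, k)
```

For example, `mean_power_density(8.0, 2.0)` gives the mean power density for a hypothetical site with scale 8 m/s and shape 2; the closed form avoids the numerical integration entirely, which is one reason the Weibull model is convenient in practice.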
Previous studies have shown that the two-parameter Weibull distribution is less effective in fitting low wind speeds, especially for wind speed data with considerable null wind probability (Akgül et al., 2016). The two-parameter Weibull distribution cannot determine the actual null wind probability, so null wind speed data need to be removed before fitting, making it impossible to characterize the existing wind regimes. A three-parameter Weibull distribution has been proposed for wind energy evaluation as an alternative to the two-parameter Weibull distribution (Montoya et al., 2019):
$$f(v) = \frac{k}{\alpha}\left(\frac{v-\mu}{\alpha}\right)^{k-1}\exp\left[-\left(\frac{v-\mu}{\alpha}\right)^{k}\right]$$
where μ is the location parameter, which represents the minimum wind speed. The three-parameter Weibull distribution can estimate the probability of null wind speed and give greater weight to low wind speeds. Wais (2017b) showed that the accuracy of the two-parameter Weibull distribution is higher in estimating wind power output only when the null wind speed is insignificant. Otherwise, the relative error of the three-parameter Weibull distribution is smaller. However, adding a location parameter may extend the support of the pdf to negative values, which requires manually constraining the lower limit of the three-parameter Weibull distribution to avoid negative wind speeds. In addition, compared with the two-parameter Weibull distribution, the three-parameter Weibull distribution cannot achieve the same high value at high wind speeds. Since wind power has a cubic relationship with the wind speed, a better fit of the three-parameter Weibull function to the wind speed at some sites does not necessarily mean that it estimates the output more accurately.
When the shape parameter of the two-parameter Weibull distribution is equal to 2, the Rayleigh distribution is formed:
$$f(v) = \frac{2v}{\alpha^{2}}\exp\left[-\left(\frac{v}{\alpha}\right)^{2}\right]$$
The Rayleigh distribution is used to model wind speed and evaluate the standard performance of wind turbines (Saleh et al., 2012; Valencia Ochoa et al., 2019). Compared with the Weibull distribution, the Rayleigh distribution contains only one parameter, so it is more convenient to use, and its parameter is easier to estimate. However, the Rayleigh distribution is based on the assumption that the long-term mean wind vector is zero. The vector of winds prevailing at sea, such as trade winds, deviates significantly from zero, making the applicability of the Rayleigh distribution to sea winds relatively limited (Perrin et al., 2006).

Extreme Value Distribution
Previous studies have shown that extreme wind speeds have almost no effect on the parameter values of the Weibull distribution. Once the wind speed exceeds the threshold, the Weibull model is no longer applicable. There are obvious errors in the maximum annual wind speed distribution and the return period estimated by the Weibull model, which affects the risk assessment of extreme winds (Perrin et al., 2006). A practical alternative is to use extreme value distributions.
Extreme value distributions used for wind speed modeling include the Gumbel distribution, the inverse Weibull distribution, and the generalized extreme value distribution. The pdf of the Gumbel distribution is as follows:
$$f(v) = \frac{1}{\alpha}\exp\left[-\frac{v-\mu}{\alpha} - \exp\left(-\frac{v-\mu}{\alpha}\right)\right]$$
where α is the scale parameter, and μ is the location parameter. The pdf of the inverse Weibull distribution is as follows:
$$f(v) = \frac{k}{\alpha}\left(\frac{v}{\alpha}\right)^{-k-1}\exp\left[-\left(\frac{v}{\alpha}\right)^{-k}\right]$$
where α is the scale parameter, and k is the shape parameter. The pdf of the generalized extreme value distribution is shown as follows: where α is the scale parameter, k is the shape parameter, and μ is the location parameter. The Gumbel distribution and the inverse Weibull distribution are subsets of the generalized extreme value distribution. The extreme value distribution is a heavy-tailed distribution whose right tail is thicker than that of the Weibull distribution. Therefore, the extreme value distribution is more accurate when estimating the occurrence probability of extreme wind speeds. However, due to the lack of historical extreme wind speed data, the traditional extreme value distribution can only estimate the annual maximum wind speed distribution and does not consider monthly or daily extreme winds, which reduces the reliability of quantifying extreme events (Torrielli et al., 2013). In addition, studies generally hold that there is a threshold wind speed below which the extreme value distribution is not suitable for modeling. However, estimates of the threshold wind speed differ considerably, so the applicable range of the extreme value distribution is unclear, making it difficult to determine the most suitable type of extreme value distribution (Kang et al., 2015).
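The link between an extreme value model and the return period mentioned above can be made concrete. A minimal Python sketch, assuming annual maximum wind speeds follow a Gumbel (maximum) distribution with cdf F(v) = exp(-exp(-(v - μ)/α)); the function names and the parameter values in the usage note are hypothetical and not from the paper.

```python
import math


def gumbel_cdf(v, mu, alpha):
    """Gumbel (maximum) cdf with location mu and scale alpha."""
    return math.exp(-math.exp(-(v - mu) / alpha))


def gumbel_return_level(period_years, mu, alpha):
    """Annual-maximum wind speed exceeded on average once every period_years (> 1)."""
    if period_years <= 1:
        raise ValueError("return period must be greater than 1 year")
    non_exceedance = 1.0 - 1.0 / period_years
    # Invert the cdf: v = mu - alpha * ln(-ln(p))
    return mu - alpha * math.log(-math.log(non_exceedance))
```

With hypothetical parameters μ = 25 m/s and α = 3 m/s, `gumbel_return_level(50, 25.0, 3.0)` returns the 50-year wind speed, i.e. the speed whose annual non-exceedance probability is 0.98.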

Gamma Distribution
The Gamma distribution is also widely used in wind speed distribution modeling (Aries et al., 2018). The two-parameter Gamma distribution is also called the Pearson type III distribution:
$$f(v) = \frac{v^{k-1}}{\alpha^{k}\,\Gamma(k)}\exp\left(-\frac{v}{\alpha}\right)$$
where α is the scale parameter, and k is the shape parameter.
If the logarithm of a random variable conforms to the Gamma distribution, the random variable follows the Log Pearson type III distribution:
$$f(v) = \frac{1}{\alpha v\,\Gamma(k)}\left(\frac{\ln v-\mu}{\alpha}\right)^{k-1}\exp\left(-\frac{\ln v-\mu}{\alpha}\right)$$
where α is the scale parameter, k is the shape parameter, and μ is the location parameter. Taking the reciprocal of a random variable that obeys the Gamma distribution gives the inverse Gamma distribution, also known as the Pearson type V distribution (Masseran, 2015):
$$f(v) = \frac{\alpha^{k}}{\Gamma(k)}\,v^{-k-1}\exp\left(-\frac{\alpha}{v}\right)$$
where α is the scale parameter, and k is the shape parameter. The above distributions are subsets of the generalized Gamma distribution. The generalized Gamma distribution is one of the earliest probability distribution models applied to wind speed modeling (Guedes et al., 2020), and its pdf is shown as follows:
$$f(v) = \frac{k\,(v-\mu)^{kh-1}}{\alpha^{kh}\,\Gamma(h)}\exp\left[-\left(\frac{v-\mu}{\alpha}\right)^{k}\right]$$
where α is the scale parameter, k and h are shape parameters, and μ is the location parameter. The generalized Gamma distribution adds a shape parameter, which significantly improves the flexibility of the model. When the value of h is equal to 1, it is transformed into a Weibull distribution. When the value of h is equal to 2, it is transformed into a generalized normal distribution. When the value of k is equal to 1, it is transformed into a Gamma distribution. When the value of h tends to infinity, it is transformed into a lognormal distribution (Alavi et al., 2016b):
$$f(v) = \frac{1}{v\,\alpha\sqrt{2\pi}}\exp\left[-\frac{(\ln v-\mu)^{2}}{2\alpha^{2}}\right]$$
where α is the standard deviation of the logarithm of the random variable, and μ is the mean of the logarithm of the random variable. Kiss and Jánosi (2008) proposed that the fitting effect of the generalized Gamma distribution is significantly better than that of the Weibull distribution, especially for high wind speeds, and is suitable for regions with different underlying surfaces and climatic conditions in Europe. Sarkar et al. (2017) proposed that the fitting effects of the Gamma distribution and the Weibull distribution are close, and the Gamma distribution can be used as an alternative model for the Weibull distribution in low ranges.
However, the analytical expression of the wind power density function derived from the Gamma distribution is complicated, and the analytical expressions of the mean, variance, skewness, and kurtosis of the wind power density function cannot be determined, which affects the efficiency of wind energy potential assessment (Samal, 2021). Therefore, to simplify the calculation process, the Gamma distribution cannot be the first choice for the wind speed distribution model.
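The special cases stated above (h = 1 gives a Weibull distribution, k = 1 gives a Gamma distribution) can be checked numerically. A minimal Python sketch, assuming the generalized Gamma pdf is parameterized as f(v) = k (v - μ)^(kh-1) / (α^(kh) Γ(h)) · exp(-((v - μ)/α)^k); this parameterization is an assumption chosen to be consistent with those special cases, and the function names are illustrative.

```python
import math


def generalized_gamma_pdf(v, alpha, k, h, mu=0.0):
    """Generalized Gamma pdf (assumed parameterization, see lead-in)."""
    x = v - mu
    if x <= 0:
        return 0.0
    # Work in log space so large shape values do not overflow.
    log_pdf = (math.log(k) + (k * h - 1.0) * math.log(x)
               - k * h * math.log(alpha) - math.lgamma(h) - (x / alpha) ** k)
    return math.exp(log_pdf)


def weibull_pdf(v, alpha, k):
    """Two-parameter Weibull pdf with scale alpha and shape k."""
    if v <= 0:
        return 0.0
    z = v / alpha
    return (k / alpha) * z ** (k - 1) * math.exp(-(z ** k))


def gamma_pdf(v, alpha, shape):
    """Two-parameter Gamma pdf with scale alpha and the given shape."""
    if v <= 0:
        return 0.0
    return v ** (shape - 1) * math.exp(-v / alpha) / (alpha ** shape * math.gamma(shape))
```

Setting `h=1` reproduces `weibull_pdf`, and setting `k=1` reproduces `gamma_pdf` with shape `h`, which is a quick sanity check before fitting the four-parameter model to data.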

Multi-Parameter Probability Distribution
Several probability distribution models containing more than three parameters have been proposed to characterize the wind speed distribution to pursue higher fitting accuracy. Commonly used multi-parameter probability distributions include the four-parameter Burr distribution, Johnson SB distribution, Kappa distribution, and the five-parameter Wakeby distribution. The pdf of Burr distribution is as follows: where α is the scale parameter, k and h are shape parameters, and μ is the location parameter. The pdf of Johnson SB distribution is shown as follows: where α is the scale parameter, k and h are shape parameters, and μ is the location parameter. The pdf of Kappa distribution is shown as follows: where α is the scale parameter, k and h are the shape parameters, and μ is the location parameter. F(x) is the cumulative distribution function (cdf) of the Kappa distribution. The pdf of Wakeby distribution is as follows: where α and k are scale parameters, c and h are shape parameters, and μ is the location parameter.
Lo Brano et al. (2011) proposed that the Burr distribution provides high fitting accuracy for wind speed data in southern Italy. The results of Jung and Schindler (2017) showed that the Wakeby distribution and the Kappa distribution are suitable choices for onshore and offshore wind speed distribution models, respectively. Soukissian (2013) proposed that the Johnson SB distribution fits the wind speed data measured in the Mediterranean Sea well. Because a multi-parameter probability distribution contains more parameters, it is more flexible, more adaptable, and has higher fitting accuracy, which reduces the error of wind energy estimation. However, compared with the two-parameter or three-parameter probability distribution models, its complexity is also greatly increased. Therefore, if the multi-parameter probability distribution model does not significantly improve the estimation accuracy, it is not recommended to prioritize this type of model in most cases.

Mixture Distributions
The above models are all single wind speed distributions. A single distribution cannot describe complex wind regimes, especially wind speed distributions with bimodal or multimodal characteristics (Santos et al., 2021). Therefore, previous studies tend to use mixture distributions to assess the wind energy potential under complex wind regimes. The pdf of a mixture distribution is as follows:
$$f(x) = \sum_{i=1}^{\alpha}\omega_i f_i(x)$$
where α is the number of mixture components, $\omega_i$ is the weight of each single distribution model, and $f_i(x)$ is the pdf of each single distribution.
Generally, for special wind regimes, the fitting effect of mixture distributions is better than that of single wind speed distributions. Carta and Ramírez (2007) used a two-component mixture Weibull distribution to describe the wind regimes at weather stations in Spain. Ouarda et al. (2015) proposed that mixture distributions, such as mixed Weibull and Gamma distributions, fit bimodal wind speed regimes better than single distributions. However, there are still some difficulties in using mixture distributions appropriately. Firstly, it is difficult to determine which single distribution models should be used to construct a mixture distribution. The fitting effects of mixture distributions constructed from different single wind speed distributions are significantly different. Secondly, the optimal number of single distributions in a mixture distribution cannot be calculated, and the researcher often determines it subjectively (Ouarda and Charron, 2018). In addition, the parameter estimation process of the mixture distribution is more complicated and can lead to overparameterization. Therefore, the use of mixture distributions has limitations.
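A weighted sum of component pdfs is straightforward to evaluate once the components are chosen. A minimal Python sketch of a two-component Weibull mixture; the component parameters and weights are hypothetical values picked only to produce a bimodal shape, and the requirement that weights be non-negative and sum to 1 is the usual mixture convention rather than something stated in the text.

```python
import math


def weibull_pdf(v, alpha, k):
    """Two-parameter Weibull pdf with scale alpha and shape k."""
    if v <= 0:
        return 0.0
    z = v / alpha
    return (k / alpha) * z ** (k - 1) * math.exp(-(z ** k))


def mixture_pdf(v, weights, components):
    """Weighted sum of component pdfs; weights must be non-negative and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * f(v) for w, f in zip(weights, components))


# Hypothetical bimodal regime: a calm component and a strong-wind component.
components = [
    lambda v: weibull_pdf(v, 4.0, 2.5),
    lambda v: weibull_pdf(v, 12.0, 4.0),
]
weights = [0.6, 0.4]
```

Evaluating `mixture_pdf(v, weights, components)` on a grid of wind speeds shows two peaks separated by a dip near 7.5 m/s, a shape that no single Weibull pdf can reproduce.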

Nonparametric Distribution Model
Although parametric distributions have certain advantages in wind speed modeling, choosing a qualified distribution is still challenging. The theoretical probability distribution model may not describe the actual wind regimes, and the estimated parameter values may not pass statistical tests (Xu et al., 2015). The nonparametric model does not need to make any assumptions about the theoretical distribution of wind speed, nor does it need to estimate the parameters of any distribution. Its parameters can be automatically learned from the historical data (Qin et al., 2011). Commonly used nonparametric models include kernel density estimation (KDE) and maximum entropy principle (MEP).

Kernel Density Estimation
KDE obtains the pdf directly from the sample data:
$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-x_i}{h}\right)$$
where n is the number of samples, h is the bandwidth, K(α) is the kernel function, and α is the scaled difference between the evaluation point and the sample value:
$$\alpha = \frac{x-x_i}{h}$$
where x is the point at which the density is estimated, and $x_i$ is the i-th sample value. Several kernel functions can be used to construct KDE functions, among which the most widely used is the Gaussian kernel function (Han et al., 2018):
$$K(\alpha) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{\alpha^{2}}{2}\right)$$
The KDE model requires an appropriate bandwidth. Otherwise, over-fitting or under-fitting occurs, which significantly affects the estimated value. Although several methods for selecting the bandwidth exist, determining the best bandwidth is still challenging (Tenreiro, 2011).
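The role of the bandwidth is easiest to see in code. A minimal Python sketch of a Gaussian KDE; the bandwidth rule used here (the normal-reference rule of thumb, h = 1.06 · s · n^(-1/5)) is one common choice and an assumption of this example, not a method prescribed by the paper, and the function names are illustrative.

```python
import math


def gaussian_kernel(u):
    """Standard normal density used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth: 1.06 * sample std * n^(-1/5)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def kde_pdf(x, samples, h):
    """Kernel density estimate at x from the samples with bandwidth h."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)
```

Shrinking `h` makes the estimate follow individual samples (over-fitting), while enlarging it smooths away real structure such as a second mode (under-fitting). Note that a Gaussian kernel assigns some density to negative wind speeds near zero, which may need separate handling.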

Maximum Entropy Principle
The MEP states that, under given constraints, the distribution model should retain the maximum remaining uncertainty (i.e., maximum entropy):
$$S = -\int_{a}^{b} f(x)\ln f(x)\,dx$$
where S is Shannon's entropy, f(x) is the pdf of wind speed, a is the minimum wind speed, and b is the maximum wind speed. The constraints are as follows:
$$\int_{a}^{b} x^{n} f(x)\,dx = m_n,\quad n = 0, 1, \ldots, T$$
where $m_n$ is the n-th order statistical moment:
$$m_n = \frac{1}{N}\sum_{i=1}^{N} x_i^{n}$$
where N is the number of samples. Through the Lagrangian multiplier method, the expression of the pdf can be obtained:
$$f(x) = \exp\left(-\sum_{n=0}^{T}\beta_n x^{n}\right)$$
where $\beta_0, \beta_1, \ldots, \beta_T$ are the Lagrange multipliers, and T is the maximum of n. MEP has strong flexibility and can describe complex wind regimes with a large proportion of null wind and a bimodal distribution (Zhou et al., 2010). In some cases, MEP can characterize the wind speed distribution more accurately than parametric distributions, and the estimation error of wind power density is small (Chellali et al., 2012). However, MEP has limitations in some cases, such as the difficulty of selecting constraint conditions (Zhang et al., 2014).

PARAMETER ESTIMATION METHOD
All single distribution models and mixture models are defined by a specific set of parameters. The values chosen for these parameters have a significant impact on how well the models fit wind speed data. Previous studies have used different methods to estimate the parameters of the wind speed distribution models, but calculating the best parameter values is still a challenging task.

Maximum Likelihood Method
MLM first constructs a likelihood function or a log-likelihood function and then seeks the parameter values that maximize it. MLM estimates the parameter values of the probability distribution through numerical iteration, and the most commonly used method is Newton's method (Tosunoğlu, 2018). The maximum likelihood estimator is asymptotically unbiased, consistent, and asymptotically efficient, and it can reach the minimum variance. In addition, MLM can estimate the sample variance to construct confidence intervals and perform hypothesis testing, and it is suitable for analyzing wind speed data in the form of time series (Ramírez and Carta, 2005). However, MLM is sensitive to the initial value, and it performs poorly when the upper and lower limits of the parameters are unknown (Flynn, 2006). The numerical iteration requires a large amount of computation, so the efficiency of MLM is low, especially when estimating the parameters of a multi-parameter model or a mixture model (Seo et al., 2019). Some studies have applied an improved MLM, called the alternative maximum likelihood method (AMLM), to modeling wind speed distribution (Akdağ and Güler, 2015). AMLM linearizes the nonlinear term in the likelihood equation through Taylor series expansion and derives the parameter estimator in a non-iterative manner. MLM and AMLM are often used to estimate the shape and scale parameters of the Weibull distribution. Previous studies have shown that the relative accuracy of the two methods differs with the geographical location of the wind speed observations (Chaurasiya et al., 2018).
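For the two-parameter Weibull distribution, the iteration described above reduces to a one-dimensional root search for the shape parameter. A minimal Python sketch using Newton's method on the standard Weibull likelihood equation, Σ vᵏ ln v / Σ vᵏ - 1/k - mean(ln v) = 0, followed by α = (mean(vᵏ))^(1/k). The starting value (from the variance of ln v), the tolerance, and the decision to drop non-positive speeds are assumptions of this example, not choices made in the paper.

```python
import math


def weibull_mle(speeds, tol=1e-10, max_iter=100):
    """Maximum likelihood estimates (alpha, k) of a two-parameter Weibull."""
    xs = [x for x in speeds if x > 0]  # the log-likelihood needs positive speeds
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n
    # Starting value from Var[ln V] = pi^2 / (6 k^2); assumes non-constant data.
    var_log = sum((l - mean_log) ** 2 for l in logs) / n
    k = math.pi / math.sqrt(6.0 * var_log)
    for _ in range(max_iter):
        powers = [x ** k for x in xs]
        s0 = sum(powers)
        s1 = sum(p * l for p, l in zip(powers, logs))
        s2 = sum(p * l * l for p, l in zip(powers, logs))
        g = s1 / s0 - 1.0 / k - mean_log                      # likelihood equation
        dg = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (k * k)  # its derivative in k
        k_new = k - g / dg
        if k_new <= 0:
            k_new = 0.5 * k  # guard against an overshoot to a negative shape
        if abs(k_new - k) < tol:
            k = k_new
            break
        k = k_new
    alpha = (sum(x ** k for x in xs) / n) ** (1.0 / k)
    return alpha, k
```

Dropping calm (zero) observations is a simplification; as discussed for the three-parameter Weibull model, sites with many null winds may need a different treatment.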

Least Squares Method
LSM is also known as the graphical method. The parameter values estimated by LSM minimize the sum of squared deviations between the empirical cdf and the cdf of the model:
$$\hat{\varphi} = \arg\min_{\varphi}\sum_{i=1}^{N}\left[P_i - F(V_{\max,i};\varphi)\right]^{2}$$
where φ is a vector containing the parameters of the pdf. The observed values are divided into N intervals: $V_{\max}$ is the maximum value in each interval, $P_i$ is the empirical cdf, and F is the cdf of the model: Some studies have proposed that the accuracy and robustness of LSM are lower than those of other parameter estimation methods, which may be related to the error in the definition of the cdf (Chang, 2011). Deep et al. (2020) used a modified LSM for parameter estimation, and the results show that its effect is not inferior to other methods. LSM needs to linearize the objective function, and the logarithmic transformation is the basis of LSM. The cdfs of the Weibull and Rayleigh distributions contain an exponential term that is easily linearized by a logarithmic transformation, so LSM is well suited to estimating their parameters. However, linearization of the Lognormal and Gamma distributions is more complicated, making LSM unsuitable for these two models (Alrashidi et al., 2020).
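The logarithmic linearization mentioned above can be illustrated for the Weibull cdf, F(v) = 1 - exp(-(v/α)ᵏ), which gives ln(-ln(1 - F)) = k ln v - k ln α, a straight line in ln v. A minimal Python sketch that fits this line by ordinary least squares; it works on the sorted sample with plotting positions (i - 0.5)/n rather than on binned intervals, which is a simplifying assumption of this example.

```python
import math


def weibull_lsm(speeds):
    """Graphical (least squares) estimates (alpha, k) of a two-parameter Weibull."""
    xs = sorted(x for x in speeds if x > 0)
    n = len(xs)
    # Empirical cdf at the i-th smallest value (zero-based i): (i + 0.5) / n
    log_v = [math.log(x) for x in xs]
    y = [math.log(-math.log(1.0 - (i + 0.5) / n)) for i in range(n)]
    mean_x = sum(log_v) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_v)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_v, y))
    k = sxy / sxx                         # slope of the fitted line
    intercept = mean_y - k * mean_x       # equals -k * ln(alpha)
    alpha = math.exp(-intercept / k)
    return alpha, k
```

Because the regression is done on transformed values, errors in the tails of the empirical cdf are weighted differently from errors in the untransformed cdf, which is one plausible source of the robustness concerns noted above.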

The Method of Moments
The first step of MOM is generally to calculate the first four moments of a random variable:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i,\qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^{2}}$$
$$g = \frac{1}{n s^{3}}\sum_{i=1}^{n}(x_i-\bar{x})^{3},\qquad w = \frac{1}{n s^{4}}\sum_{i=1}^{n}(x_i-\bar{x})^{4}$$
where $\bar{x}$, s, g, and w are the mean, standard deviation, skewness, and kurtosis of the wind speed series, respectively; n is the length of the wind series; $x_i$ is the wind speed of the i-th time step. The second step of MOM is to make the sample moments equal to the population moments. MOM is simpler and more accurate than other parameter estimation methods, especially for multi-parameter distributions such as the Johnson SB distribution and the Wakeby distribution (Liu et al., 2015). However, the MOM estimator is biased and unstable and cannot reach the minimum variance (Carta et al., 2009).
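Matching sample moments to population moments can be sketched for the two-parameter Weibull distribution, where only the first two moments are needed. The coefficient of variation of a Weibull variable depends on k alone, CV² = Γ(1 + 2/k)/Γ(1 + 1/k)² - 1, so k can be found by a one-dimensional search and α then follows from the mean. The bisection bracket [0.2, 20] and the function names are assumptions of this Python example.

```python
import math


def weibull_cv(k):
    """Coefficient of variation of a Weibull variable with shape k."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return math.sqrt(g2 / (g1 * g1) - 1.0)


def weibull_mom(speeds):
    """Method-of-moments estimates (alpha, k) from the sample mean and std."""
    n = len(speeds)
    mean = sum(speeds) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in speeds) / (n - 1))
    target_cv = std / mean
    lo, hi = 0.2, 20.0  # assumed bracket; the CV decreases as k increases
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if weibull_cv(mid) > target_cv:
            lo = mid  # model CV too large, so k must be larger
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    alpha = mean / math.gamma(1.0 + 1.0 / k)
    return alpha, k
```

Distributions with more parameters need correspondingly more moments (skewness and kurtosis), and higher-order sample moments are increasingly sensitive to outliers, which is consistent with the instability noted above.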
Some studies have proposed the linear moment estimation method (LMOM) for parameter estimation based on MOM. LMOM needs to calculate the probability weighted moments:
$$b_r = \frac{1}{n}\sum_{i=r+1}^{n}\frac{(i-1)(i-2)\cdots(i-r)}{(n-1)(n-2)\cdots(n-r)}\,x_{(i)}$$
where $b_r$ is the r-th order probability weighted moment ($r \ge 0$, with $b_0$ equal to the sample mean), and $x_{(i)}$ is the i-th smallest observation. Then the first five linear moments can be calculated:
$$\lambda_1 = b_0$$
$$\lambda_2 = 2b_1 - b_0$$
$$\lambda_3 = 6b_2 - 6b_1 + b_0$$
$$\lambda_4 = 20b_3 - 30b_2 + 12b_1 - b_0$$
$$\lambda_5 = 70b_4 - 140b_3 + 90b_2 - 20b_1 + b_0$$
The LMOM estimator is unbiased, is more stable for samples with abnormal data, and gives better results with a smaller sample size (Soukissian and Tsalis, 2018). However, LMOM, like MOM, cannot fully utilize all the valid information in the sample. When parameters are difficult to calculate with other estimation methods, MOM and LMOM are effective alternatives.

Power Density Method
Akdağ and Dinler (2009) proposed a new method for calculating the shape and scale parameters of the Weibull distribution. The relationship between the scale parameter α of the Weibull distribution and the average wind speed $\bar{V}$ is:
$$\bar{V} = \alpha\,\Gamma\left(1+\frac{1}{k}\right)$$
From Eqs 1, 28:
$$E_p = \frac{\overline{V^{3}}}{(\bar{V})^{3}} = \frac{\Gamma(1+3/k)}{\Gamma^{3}(1+1/k)}$$
where $\overline{V^{3}}$ is the average value of the wind speed cube, $\overline{V^{3}}/(\bar{V})^{3}$ is recorded as the energy pattern factor ($E_p$), and the value of $E_p$ at a particular site is generally regarded as a fixed value, ranging from 1.4 to 4.4. Therefore, the shape parameter k of the Weibull distribution can be calculated by:
$$k = 1 + \frac{3.69}{E_p^{2}}$$
As long as the average wind speed value is obtained, PDM is applicable and does not require complete historical wind speed data. PDM does not need to solve complicated iterative equations, so the numerical solution of the parameters can be obtained easily. In addition, the wind power density can be directly estimated by this method. The results of Shu et al. (2015) showed that the accuracy of PDM is not worse than that of other traditional parameter estimation methods. However, PDM is currently only suitable for estimating the parameters of the Weibull distribution. For other probability distribution models, it may not be possible to derive the power density expression from Eq. 1, or the expression may be too complicated to perform the next step. Therefore, the scope of application of PDM is limited.
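The non-iterative nature of PDM is visible in a few lines of code. A minimal Python sketch, assuming the commonly cited approximation k = 1 + 3.69/E_p² attributed to Akdağ and Dinler (2009) and the Weibull mean relation α = V̄/Γ(1 + 1/k); the function name and return layout are illustrative.

```python
import math


def weibull_pdm(speeds):
    """Power density method estimates (alpha, k, e_p) for a Weibull model."""
    n = len(speeds)
    mean_v = sum(speeds) / n
    mean_v3 = sum(v ** 3 for v in speeds) / n
    e_p = mean_v3 / mean_v ** 3            # energy pattern factor
    k = 1.0 + 3.69 / (e_p * e_p)           # assumed empirical approximation
    alpha = mean_v / math.gamma(1.0 + 1.0 / k)
    return alpha, k, e_p
```

Only two summary statistics of the record (the mean speed and the mean cubed speed) enter the calculation, so the method can be applied when only such aggregates are available.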

GOODNESS-OF-FIT CRITERIA
After the distribution model and the parameter values are determined, it is necessary to evaluate the goodness-of-fit of the model. The goodness-of-fit criteria reflect how well the selected model fits the wind speed data so that the applicability of the model to the sample can be evaluated. Choosing different goodness-of-fit criteria may lead to different results.

Coefficient of Determination ($R^2$)
$R^2$ is one of the most widely used goodness-of-fit criteria. It is a metric for estimating the consistency between the distribution model and the observed data. $R^2$ is expressed as the square of the correlation coefficient between the observed value and the estimated value:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - x_i)^{2}}{\sum_{i=1}^{n}(y_i - \bar{y})^{2}}$$
where $y_i$ is the i-th observed data, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, $x_i$ is the i-th estimated value, and n is the number of samples.
Frontiers in Energy Research | www.frontiersin.org November 2021 | Volume 9 | Article 769920
There are several commonly used variants of $R^2$. The first is $R^2_{PP}$, which refers to $R^2$ associated with the P-P plot:
$$R^2_{PP} = 1 - \frac{\sum_{i=1}^{n}(F_i - \hat{F}_i)^{2}}{\sum_{i=1}^{n}(F_i - \bar{F})^{2}}$$
where $F_i$ is the empirical cumulative probability of the measured data in the i-th wind speed interval, $\bar{F} = \frac{1}{n}\sum_{i=1}^{n} F_i$, and $\hat{F}_i$ is the estimated cumulative probability of the i-th wind speed interval.
The second variant is $R^2_{QQ}$, which refers to $R^2$ associated with the Q-Q plot:
$$R^2_{QQ} = 1 - \frac{\sum_{i=1}^{n}(p_i - \hat{p}_i)^{2}}{\sum_{i=1}^{n}(p_i - \bar{p})^{2}}$$
where $p_i$ is the probability of the measured data in the i-th wind speed interval, $\hat{p}_i$ is the estimated probability of the i-th wind speed interval, $\hat{p}_i = F^{-1}(F_i)$, and $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$. The third variant is the adjusted coefficient of determination ($R^2_a$): where N is the number of samples, and d is the number of parameters in the probability distribution model. The closer the value of $R^2$ and its variants to 1, the better the fitting effect of the model. Among them, $R^2$ has the simplest structure and is easier to use. However, $R^2$ cannot fully reflect the fitting effect of the theoretical distribution, and it is not appropriate to use $R^2$ alone to evaluate the model (Hossain et al., 2014). The definition of $R^2_{PP}$ is related to the cdf, and the gradient of the cdf reaches its maximum when the random variable takes the middle value, so the middle part of the distribution has a more significant influence on the value of $R^2_{PP}$. $R^2_{QQ}$ is more sensitive to the maximum gradient of the inverse cumulative distribution function, which corresponds to the tail of the distribution. The plotting position of the Weibull distribution provides an unbiased estimate of the cumulative probability, so $R^2_{PP}$ is usually preferred to evaluate the fitting effect of the Weibull distribution (Akdağ et al., 2010). $R^2_a$ penalizes multi-parameter models and avoids overfitting. When the gap between N and d is large, $R^2_a \approx R^2$, indicating that $R^2_a$ is not suitable for wind speed data with a large sample size. Dividing the wind speed data into a histogram can significantly improve the applicability of $R^2_a$.

Root Mean Square Error (RMSE)
RMSE determines the accuracy of the model through an item-by-item comparison between the observed probability and the estimated probability. RMSE usually has two forms, $RMSE_{PP}$, which is analogous to $R^2_{PP}$, and $RMSE_{QQ}$, which is analogous to $R^2_{QQ}$:
$$RMSE_{PP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (F_i - \hat{F}_i)^2}, \qquad RMSE_{QQ} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - \hat{p}_i)^2}$$
The closer the RMSE value is to 0, the better the fit of the selected model. RMSE is usually used together with $R^2$ as a criterion for evaluating the goodness-of-fit of the model. However, RMSE is more sensitive to abnormal data. If a particular wind speed observation deviates greatly from the expected value, the error it causes dominates the statistic. It is therefore not appropriate to use RMSE to evaluate the fit of a model to short-term wind speed data. In addition, unlike $R^2$, the value of RMSE is related to the selected probability distribution model, so RMSE cannot be used to compare the goodness-of-fit of different models to the same set of wind speed data.
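The sensitivity of RMSE to a single abnormal observation can be seen in a small numerical sketch. The code below is illustrative only: the helper names, the plotting position i/(n+1), the Q-Q form based on ordered observations and model quantiles, and the synthetic sample are assumptions, not taken from the reviewed studies.

```python
import math

def weibull_cdf(v, k, c):
    """Two-parameter Weibull cdf with shape k and scale c."""
    return 1.0 - math.exp(-((v / c) ** k))

def weibull_quantile(p, k, c):
    """Inverse of the two-parameter Weibull cdf."""
    return c * (-math.log(1.0 - p)) ** (1.0 / k)

def rmse(observed, estimated):
    """Root mean square of the item-by-item differences."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def rmse_pp_qq(speeds, k, c):
    """Return (RMSE_PP, RMSE_QQ) for a Weibull(k, c) model."""
    v = sorted(speeds)
    n = len(v)
    # Assumption: plotting position i/(n+1) as empirical probability.
    f_emp = [(i + 1) / (n + 1) for i in range(n)]
    f_est = [weibull_cdf(x, k, c) for x in v]
    q_est = [weibull_quantile(p, k, c) for p in f_emp]
    return rmse(f_emp, f_est), rmse(v, q_est)

if __name__ == "__main__":
    # Synthetic sample lying exactly on the model quantiles.
    clean = [weibull_quantile((i + 1) / 21, 2.0, 6.0) for i in range(20)]
    # Same sample with the largest value replaced by one extreme gust.
    dirty = clean[:-1] + [40.0]
    print("clean:", rmse_pp_qq(clean, 2.0, 6.0))
    print("with outlier:", rmse_pp_qq(dirty, 2.0, 6.0))
```

In this setup a single extreme value in a sample of 20 moves the Q-Q form from essentially zero to several m/s, which is why short records are problematic for RMSE.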

Kolmogorov-Smirnov Test and Anderson-Darling Test
The KS and AD tests are used to determine whether a given probability distribution model is suitable for a set of wind speed observation data. They are also used to compare the goodness-of-fit of different models to the same set of data. Both the KS and AD tests compare the empirical cumulative probability with the estimated cumulative probability distribution. Specifically, the KS test calculates the maximum difference between the two:
$$D = \max_{1 \le i \le n} \left| F_i - \hat{F}_i \right|$$
The KS test is an exact nonparametric test suitable for continuous distributions. It is more sensitive to the middle part of the distribution.
The AD test is a refinement of the KS test. Its statistic applies a weight function to the discrepancy between the empirical and estimated cumulative distributions, which makes it more sensitive to the tail of the distribution. Both the KS and AD tests have a critical value. If the test statistic is lower than the critical value, the assumed distribution is accepted. The critical value of the KS test is independent of the selected model, whereas the critical value of the AD test varies with the model, so the AD test is more accurate (Saeed et al., 2021).
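A minimal sketch of both statistics for a two-parameter Weibull model is given below. It is not taken from the reviewed studies. The KS statistic is computed here from the empirical step cdf of the ordered sample, the AD statistic uses the standard computational form of A^2, and the helper names are hypothetical. The 5% KS critical value 1.36/sqrt(n) is the usual large-sample approximation and applies only when the model parameters are not estimated from the same sample. AD critical values depend on the model and are not included.

```python
import math

def weibull_cdf(v, k, c):
    """Two-parameter Weibull cdf with shape k and scale c."""
    return 1.0 - math.exp(-((v / c) ** k))

def ks_statistic(speeds, cdf):
    """Largest gap between the empirical step cdf and the model cdf."""
    v = sorted(speeds)
    n = len(v)
    d = 0.0
    for i, x in enumerate(v, start=1):
        f = cdf(x)
        # The empirical cdf jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_5pct(n):
    """Large-sample approximation; assumes parameters were fixed in advance."""
    return 1.36 / math.sqrt(n)

def ad_statistic(speeds, cdf):
    """Anderson-Darling A^2; requires 0 < cdf(x) < 1 for every observation."""
    v = sorted(speeds)
    n = len(v)
    s = 0.0
    for i in range(1, n + 1):
        s += (2 * i - 1) * (math.log(cdf(v[i - 1]))
                            + math.log(1.0 - cdf(v[n - i])))
    return -n - s / n

if __name__ == "__main__":
    # Hypothetical non-zero wind speeds in m/s, for illustration only.
    sample = [1.2, 2.5, 3.1, 3.8, 4.4, 5.0, 5.9, 6.7, 8.1, 10.3]
    model = lambda x: weibull_cdf(x, 2.0, 6.0)
    print("KS D:", ks_statistic(sample, model),
          "critical:", ks_critical_5pct(len(sample)))
    print("AD A^2:", ad_statistic(sample, model))
```

Because the AD sum takes logarithms of the cdf and of its complement, zero wind speeds (for which a two-parameter Weibull cdf is 0) must be handled separately before applying it.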

Chi-Square Test (χ2)
The chi-square test ($\chi^2$) verifies whether the frequency of the measured wind speed data is consistent with the frequency obtained from the assumed model. It is often used to compare the goodness-of-fit of different models. The chi-square test requires the observation data to be divided into several groups, with the frequency of each group expressed as a histogram. The test statistic is then calculated as:
$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency of the $i$-th group and $E_i$ is the estimated frequency of the $i$-th group, calculated by the following equation:
$$E_i = F(v_i) - F(v_{i-1})$$
where $F$ is the cdf of the assumed model, and $v_i$ and $v_{i-1}$ are the upper and lower bounds of the wind speed of the $i$-th group, respectively. If the value of $\chi^2$ is greater than the critical value, the assumed model is rejected. As with the AD test, the critical value of the chi-square test depends on the selected probability distribution model. Because a frequency histogram is used, the chi-square test is less affected by any single observation. However, the chi-square test results differ significantly under different class intervals, and the selection of class intervals is usually subjective, which limits its applicability (Mert and Karakuş, 2015).
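The dependence on class intervals can be illustrated with a short sketch. The code is an assumption-laden example rather than the procedure of any cited study: it works with counts, so the expected count of a class is taken as N times the model probability of that class, and the function name, sample, and bin edges are hypothetical.

```python
import math

def weibull_cdf(v, k, c):
    """Two-parameter Weibull cdf with shape k and scale c."""
    return 1.0 - math.exp(-((v / c) ** k))

def chi_square_statistic(speeds, edges, cdf):
    """Sum of (O - E)^2 / E over classes [edges[i], edges[i+1]).

    Assumption: O and E are counts, so E = N * (F(upper) - F(lower)).
    The edges should cover the whole range of the data.
    """
    n = len(speeds)
    chi2 = 0.0
    for lower, upper in zip(edges[:-1], edges[1:]):
        observed = sum(1 for x in speeds if lower <= x < upper)
        expected = n * (cdf(upper) - cdf(lower))
        chi2 += (observed - expected) ** 2 / expected
    return chi2

if __name__ == "__main__":
    # Hypothetical wind speeds in m/s, for illustration only.
    sample = [0.8, 1.2, 2.5, 3.1, 3.3, 3.8, 4.4, 4.9, 5.0, 5.9,
              6.2, 6.7, 7.4, 8.1, 9.0, 10.3]
    model = lambda x: weibull_cdf(x, 2.0, 6.0)
    coarse = [0.0, 4.0, 8.0, math.inf]
    fine = [0.0, 2.0, 4.0, 6.0, 8.0, math.inf]
    # The same data and model give different statistics for different classes.
    print("coarse classes:", chi_square_statistic(sample, coarse, model))
    print("fine classes:", chi_square_statistic(sample, fine, model))
```

The last class is left open-ended with `math.inf` so that every observation falls into some class and the expected counts sum to N.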

DISCUSSION
The goodness-of-fit of a wind speed distribution model usually varies with the location of the site (Ouarda et al., 2016). This section uses a case study to compare the fit of the two-parameter Weibull distribution and the three-parameter Weibull distribution at different sites. Two sites are selected in this study. Shengsi is an island in the East China Sea, southeast of the Yangtze River estuary. Xianyang is a city located in the Guanzhong Basin, surrounded by mountains (Figure 3). Table 1 lists the altitude, geographic coordinates, period of record, maximum wind speed, and probability of null wind speed of the two sites. The wind speed data are obtained from the Integrated Surface Database of the National Centers for Environmental Information (www.ncei.noaa.gov/maps/hourly/). The resolution of the wind speed data is 1 m/s, and the data are recorded every three hours. Previous studies have shown that the wind direction has almost no influence on the wind energy output, and the output of each direction is not distinguished (Kiss and Jánosi, 2008). Therefore, this case study uses wind speed data aggregated from all directions.
The two-parameter Weibull distribution and the three-parameter Weibull distribution are each used to fit the wind speed observation data of the two sites. To ensure that the fits are comparable, the Least Squares Method is used to estimate the parameter values, and $R^2$ is used as the goodness-of-fit criterion. Figure 4 shows the fitted curves of the two distributions. In Shengsi, the fitted curves of the two distributions essentially coincide. In Xianyang, however, the three-parameter Weibull distribution estimates the probability of null wind speed more accurately by adjusting the value of the location parameter μ, while the two-parameter Weibull distribution underestimates the low wind speed probability. Comparing the values of $R^2$ (Table 2) shows that the two distributions fit the wind speed data of Shengsi essentially equally well, but the three-parameter Weibull distribution fits better than the two-parameter Weibull distribution in Xianyang.
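One way to carry out such a least-squares comparison is sketched below. It is an illustrative reconstruction under stated assumptions, not the procedure or data of this case study: the Weibull cdf is linearized as ln(-ln(1 - F)) = k ln(v - μ) - k ln(c), the location parameter μ is chosen from a hypothetical grid of candidates, the reported $R^2$ is that of the linearized regression, and the data are synthetic.

```python
import math

def fit_weibull_ls(v, F, mu=0.0):
    """Least-squares fit of ln(-ln(1 - F)) = k ln(v - mu) - k ln(c).

    Points with v - mu <= 0 or F outside (0, 1) are dropped because the
    logarithms are undefined there. Returns (k, c, r2 of the linear fit).
    """
    pts = [(math.log(x - mu), math.log(-math.log(1.0 - f)))
           for x, f in zip(v, F) if x - mu > 0.0 and 0.0 < f < 1.0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    k = sxy / sxx                      # slope is the shape parameter
    intercept = my - k * mx            # intercept equals -k ln(c)
    c = math.exp(-intercept / k)
    sse = sum((y - (k * x + intercept)) ** 2 for x, y in pts)
    sst = sum((y - my) ** 2 for _, y in pts)
    return k, c, 1.0 - sse / sst

def fit_weibull3_ls(v, F, candidate_mus):
    """Assumed strategy: try each candidate mu and keep the best r2."""
    best = None
    for mu in candidate_mus:
        k, c, r2 = fit_weibull_ls(v, F, mu)
        if best is None or r2 > best[3]:
            best = (mu, k, c, r2)
    return best

if __name__ == "__main__":
    # Synthetic cumulative probabilities at 0, 1, ..., 10 m/s generated from
    # a shifted Weibull (mu = -1), so that F(0) > 0 mimics frequent null wind.
    v = [float(x) for x in range(11)]
    F = [1.0 - math.exp(-(((x + 1.0) / 6.0) ** 2.0)) for x in v]
    print("two-parameter:", fit_weibull_ls(v, F))
    print("three-parameter:", fit_weibull3_ls(v, F, [-2.0, -1.5, -1.0, -0.5, 0.0]))
```

In this synthetic setting the two-parameter fit has to discard the point at 0 m/s, whereas a negative location parameter lets the three-parameter form assign a non-zero cumulative probability to null wind.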
Considering that the two-parameter Weibull distribution is flexible and its parameters are easier to estimate, it is recommended for evaluating the wind energy in Shengsi. However, for Xianyang, where low wind speeds account for a large proportion of the record, the three-parameter Weibull distribution is more suitable for fitting the wind speed data. The wind speed distribution characteristics of a site determine the applicability of the models. Shengsi is located in the coastal area of eastern China, where the thermal difference between land and sea, coupled with the influence of the East Asian monsoon and tropical storms, results in strong winds throughout the year. Xianyang is located inland and is affected by the continental climate, which causes a large proportion of null wind. In addition, the topography also affects the characteristics of the wind speed distribution (Kim and Lim, 2017). The mountains around Xianyang block the airflow, while the terrain of Shengsi is flat and the airflow is less obstructed.
For a specific site, the wind speed distribution may have different characteristics over time. Xiao et al. (2021) reported that the wind speed in the Badain Jaran Desert in China peaked in April, followed by November, and that the wind speed in these months was much higher than in other months. Usta and Kantar (2012) reported that the statistical characteristics of monthly, seasonal, and yearly wind speed data, such as mean, variance, skewness, and kurtosis, are significantly different. Therefore, the optimal wind speed distribution model may differ between time scales of wind speed data. Some studies have conducted wind energy assessments on longer time scales. Gao et al. (2018) proposed that climate warming has led to a gradual decline in the long-term wind energy potential of the Indian Ocean, which will affect the economic income of wind farms. Shu et al. (2015) proposed that long-term temperature variations will lead to variations in wind characteristics, which will change the energy output. However, on the one hand, the trend of climate change is very complex and not fully understood, making it difficult to predict long-term variations in wind characteristics. On the other hand, the location of wind farms is often determined only from wind speed observation data covering 1-2 years or even less (Chen and Blaabjerg, 2009). If the wind energy potential of the area decreases year by year due to rising temperatures, the output of wind farms may not meet expectations. Therefore, wind energy assessment should consider the impact of long-term climate change rather than focusing only on the current output of the site.

CONCLUSION
Previous studies have proposed a variety of probability distribution models for modeling the wind speed distribution. The models have proven applicable at particular sites, but no single model has been found to give the best fit to the wind speed data recorded at all sites. In addition, the selection of parameter estimation methods and goodness-of-fit test statistics is not unified. Each has its advantages and disadvantages and is suitable for different observation data and wind speed distribution models. Therefore, all potential wind speed distribution models should be considered when selecting the wind speed distribution model for a particular site. The parameter estimation method and the goodness-of-fit test statistics should then be determined according to the chosen model and the characteristics of the wind speed observation data.
Wind characteristics determine the wind energy output of a wind farm. The characteristics of the wind speed distribution vary constantly, both geographically and temporally. The wind characteristics at different locations are affected by various geographical elements, such as local climate, topography, and the difference in thermal properties between the land and the sea. Because of the variations in these geographical elements, the applicability of a model may change significantly from one region to another. For a specific site, the wind speed distribution may have different characteristics in different months, seasons, and years. Long-term variations in wind characteristics are a topic of great concern because they lead to gradual variations in wind energy potential and affect the economic benefits of wind farms. We suggest that wind energy assessment consider the long-term variations in local wind characteristics instead of focusing only on the current energy output.

AUTHOR CONTRIBUTIONS
SH contributed to conceptualization. DZ contributed to funding acquisition. XN contributed to methodology. HQ contributed to formal analysis. All authors contributed to the article and approved the submitted version.