PV Cell Parameter Extraction Using Data Prediction–Based Meta-Heuristic Algorithm via Extreme Learning Machine

To reliably evaluate the practical performance of PV systems and to undertake their optimal control, accurate PV cell modeling based on precise parameter extraction is crucial. However, the inherently highly nonlinear and multimodal characteristics of this problem usually hinder conventional optimization methods from obtaining fast and satisfactory performance. Besides, the insufficient current–voltage (I–V) data provided by manufacturers cannot guarantee high accuracy and flexibility of PV cell parameter extraction under various operation scenarios. Hence, this article proposes a novel parameter extraction strategy based on a data prediction–based meta-heuristic algorithm (DPMhA). An extreme learning machine (ELM) is adopted to predict output I–V data from measured data, which can provide a more reliable fitness function to meta-heuristic algorithms (MhAs). Consequently, MhAs can undertake a more stable search for the optimal solution through the extended I–V data; thus, PV cell parameters can be obtained with high accuracy and convergence rate. The effectiveness of the strategy is validated via three typical PV cell models, that is, the single diode model (SDM), double diode model (DDM), and three diode model (TDM). Last, comprehensive case studies illustrate that the DPMhA can considerably enhance accuracy and effectiveness compared with MhAs without data prediction.


INTRODUCTION
With severe environmental deterioration (Peng et al., 2020), fossil fuel depletion, severe air pollution (Sun et al., 2020), and increasing concerns about the global energy crisis over the past decade, energy reform and sustainable development have become essential for an environment-friendly society. Hence, environmental protection and energy conservation are paramount for all countries around the world (Shen et al., 2019), and the exploitation and utilization of various renewable energy technologies, for example, wind (Zhang et al., 2019) and solar (Liu et al., 2020), have received broad attention. In particular, solar energy is considered to be one of the most sustainable and viable energy sources of the future (Chaibi et al., 2019), and photovoltaic (PV) systems are widely used for solar energy applications thanks to their distinct merits, for example, abundant resources, low cost, and freedom from pollution (Yang et al., 2016).
Particularly, reliable PV modeling based on measured current-voltage (I-V) data is critical to the dynamic behavior analysis of PV systems. Thus far, a series of PV cell modeling methods have been proposed (Jordehi, 2016) to characterize the output of PV systems for better performance analysis and prediction (Youssef et al., 2017), maximum power point tracking (MPPT), and fault diagnosis. Two equivalent circuit models, the single diode model (SDM) (Nunes et al., 2018) and the double diode model (DDM) (Abbassi et al., 2018), are extensively adopted for the sake of simplicity, while the more complicated triple diode model (TDM) (Qais et al., 2019) is rarely investigated because it might increase the computation burden. Nevertheless, as the TDM allows a more accurate analysis of the complex output characteristics of PV systems, this article validates the performance of all three PV models mentioned above. In particular, the accurate extraction of the electrical parameters of the model is the most basic and critical step of reliable modeling. However, these parameters are often unavailable and variable, because manufacturers provide insufficient electrical parameters, which are obtained experimentally only under standard test conditions (STCs) (Xiong et al., 2018). Besides, their values are also time-varying due to degradation and faults of PV cells, which further increases modeling uncertainties. Hence, these two shortcomings make it difficult for parameter extraction to obtain satisfactory results in practical applications.
To tackle such obstacles, numerous methods have been designed, which fall into three main categories, namely, analytical methods (Majdoul et al., 2015; Torabi et al., 2017), deterministic techniques, and meta-heuristic algorithms (MhAs). Analytical methods are based on a series of mathematical calculations and a number of key points on the I-V curve; they offer a high degree of simplicity but lack high accuracy in different operating scenarios. Meanwhile, deterministic techniques, including iterative curve fitting (Villalva et al., 2009) and Newtonian-based methods (Li et al., 2017), can yield more accurate results, but they are extremely demanding in terms of model properties. Moreover, they are highly sensitive to initial conditions; thus, the inherent high nonlinearity and multimodality of PV systems make them prone to being trapped at a local optimum. Hence, the limitations of the aforementioned two categories hinder them from maintaining stable and satisfactory performance in PV cell parameter extraction. In contrast, MhAs can effectively avoid these shortcomings thanks to their outstanding merits, such as high flexibility with respect to problem characteristics (Nesmachnow, 2014), easy implementation (Roeva and Fidanova, 2018), and independence from gradient information (Figueroa et al., 2020).
Until now, a large number of MhAs have been adopted in PV cell parameter extraction, such as genetic algorithm (GA) (Jervase et al., 2001), differential evolution (DE), particle swarm optimization (PSO) (Ye et al., 2009), artificial bee colony (ABC) (Oliva et al., 2014), whale optimization algorithm (WOA) (Amroune et al., 2019; Dasu et al., 2019), backtracking search algorithm (BSA), moth flame optimizer (MFO) (Allam et al., 2016), grey wolf optimization (GWO) (Yang et al., 2017), bird mating optimizer (BMO) (Askarzadeh and Rezazadeh, 2013), water cycle algorithm (WCA) (Kler et al., 2017), wind-driven optimization (WDO) (Derick et al., 2017), fireworks algorithm (FWA) (Babu et al., 2016), and various hybrids.
Apart from improvements to the mechanism and structure of algorithms that enhance optimization performance, one should realize that all modeling techniques rely heavily on the number and accuracy of measured data. However, some parameters of the PV system are not provided by the manufacturer and therefore need to be extracted from the I-V curve to improve the simulation accuracy. The low-dimensional data provided by the manufacturer, while saving computational resources, may also cause important data information to be lost during the simulation. Hence, it is imperative to develop effective data processing methods to enrich data samples. Recently, artificial neural networks (ANNs) (Grondin-Perez et al., 2014; Zhao et al., 2019) and deep learning strategies (Schmidhuber, 2015) have shown great effectiveness in data analysis and prediction. This article adopts a learning algorithm called extreme learning machine (ELM) with a single-hidden layer feedforward neural network (SLFN) for output I-V data prediction (Huang et al., 2006), which can provide a more reliable fitness function to MhAs. The main contributions of this article can be summarized as follows:
• Existing MhAs are directly applied to PV cell parameter extraction via the measured I-V data of PV systems, and are therefore easily trapped at a low-quality optimum if the measured I-V data are inadequate or densely clustered. In contrast, the proposed data prediction-based MhA (DPMhA) can effectively solve this difficulty since the measured I-V data can be extended via ELM-based data prediction.
• Several advanced MhAs with data prediction are applied for parameter extraction of PV cells, which are thoroughly validated via three different kinds of PV models.
• Four case studies show that DPMhA can achieve simulation results with higher precision and stability compared with those based only on measured data.
The rest of this article is organized as follows: PV Cell Modeling and Problem Formulation illustrates mathematical modeling of PV cell and objective function. The overall introduction of data prediction-based MhAs is elaborated in Methodologies. Case study results on different PV models are provided in Case Studies. Last, conclusions are given in Conclusion.

PV CELL MODELING AND PROBLEM FORMULATION
Shockley diode-based equivalent circuits are commonly regarded as standard PV models, among which the three most widely used models, that is, the SDM, DDM, and TDM, are discussed in this section.

Design of Mathematical Modeling
In general, there are only minor differences in the structures of the three models mentioned above, which are systematically summarized in Supplementary Table S1 for a more detailed presentation. As demonstrated in Supplementary Table S1, $I_L$ and $V_L$ represent the output current and output voltage of the PV cell, respectively; $I_{sh}$ refers to the current through the parallel resistor $R_{sh}$; and the thermal voltage $V_T$ is calculated by

$$V_T = \frac{KT}{q} \tag{1}$$

where $T$ represents the surface temperature (in kelvin); $K$ is the Boltzmann constant, which has a value of $1.38 \times 10^{-23}$ J/K; and $q$ is the electron charge, with a value of $1.6 \times 10^{-19}$ C. All the other variables are provided in Nomenclature.

Objective Function
The main objective of parameter identification for the various PV models is to find suitable parameters so that the model describes the output characteristics of the PV system more accurately, that is, so that the errors between the experimentally collected data and the simulated data are minimized. These errors can be evaluated quantitatively by means of an objective function. The root mean square error (RMSE) is chosen here as the objective function, which can be calculated as

$$\mathrm{RMSE}(x) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( f_k(V_L, I_L, x) \right)^2} \tag{2}$$

where $x$ represents the solution vector of the unknown parameters to be identified, $N$ denotes the number of experimental data, and $f_k(V_L, I_L, x)$ is the error function evaluated at the $k$th pair of experimental data. The error functions $f(V_L, I_L, x)$ for the three PV models are tabulated in Table 1.
As Table 1 indicates, minimizing the difference between the experimental data and the simulated data requires minimizing the objective function RMSE(x) by optimizing the solution vector x. Note that a smaller objective function value corresponds to a higher-quality solution.
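The RMSE objective described above can be illustrated for the SDM with a minimal Python sketch. Since Table 1 is not reproduced here, the error function below uses the SDM form commonly found in the literature, which is an assumption about what Table 1 contains; the function names and the parameter ordering `(i_ph, i_sd, r_s, r_sh, a)` are likewise illustrative and are not taken from the article.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, value quoted in the text
Q_ELECTRON = 1.6e-19    # C, value quoted in the text


def thermal_voltage(temp_kelvin):
    """Thermal voltage V_T = K * T / q."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def sdm_error(v, i, x, temp_kelvin):
    """Mismatch between the SDM current and the measured current i.

    Assumed (commonly used) SDM form:
    I = I_ph - I_sd * (exp((V + I*R_s) / (a*V_T)) - 1) - (V + I*R_s) / R_sh
    """
    i_ph, i_sd, r_s, r_sh, a = x  # hypothetical parameter ordering
    v_d = v + i * r_s
    diode = i_sd * (math.exp(v_d / (a * thermal_voltage(temp_kelvin))) - 1.0)
    return i_ph - diode - v_d / r_sh - i


def rmse(x, voltages, currents, temp_kelvin):
    """Root mean square of the error function over all (V, I) pairs."""
    squared = [sdm_error(v, i, x, temp_kelvin) ** 2
               for v, i in zip(voltages, currents)]
    return math.sqrt(sum(squared) / len(squared))
```

A candidate parameter vector that reproduces every measured point exactly yields an RMSE of zero, and any deviation in the parameters increases the value, which is why the optimizer minimizes this quantity.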

METHODOLOGIES

Data Prediction by Extreme Learning Machine
Principle of ELM
ELM is a simple learning strategy for SLFNs that mainly depends on generalized inverse matrix theory (Huang and Siew, 2005). It randomly initializes the input weights, which require no adjustment in subsequent operations (Huang et al., 2000; Huang et al., 2006). Besides, the output weights are analytically determined by the generalized inverse, which only requires a one-step calculation. Hence, compared with other common feedforward network learning strategies, for example, the back-propagation (BP) algorithm, the ELM can significantly enhance robustness, generalization ability, learning speed, and training accuracy. The main operation structure of ELM is demonstrated in Figure 1.
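For $N$ different training samples $(x_j, y_j) \in R^n \times R^m$, $j = 1, 2, 3, \ldots, N$, the network output of a standard SLFN with $K$ hidden neurons and an activation function $g(\cdot)$ is calculated by (Huang et al., 2006)

$$\sum_{i=1}^{K} \beta_i \, g(\omega_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, N$$

where $\omega_i$ represents the connecting weight vector between the $i$th hidden layer neuron and the input neurons; $b_i$ denotes the threshold of the $i$th hidden layer neuron; $\beta_i$ means the connection weight between the $i$th hidden layer neuron and the output neurons; and $o_j$ is the network output for the $j$th sample.
SLFNs can approximate the training samples with zero error for any $\omega$ and $b$ when the number of neurons in the hidden layer equals the number of training data samples (Huang et al., 2006), that is,

$$\sum_{j=1}^{N} \left\| o_j - y_j \right\| = 0$$

so that there exist $\omega_i$, $b_i$, and $\beta_i$ that satisfy the relationship

$$\sum_{i=1}^{K} \beta_i \, g(\omega_i \cdot x_j + b_i) = y_j, \quad j = 1, 2, \ldots, N$$

Hence, the two equations above can be rewritten more compactly as

$$H\beta = T$$

where $H$ represents the hidden layer output matrix, whose entry in row $j$ and column $i$ is $g(\omega_i \cdot x_j + b_i)$; $T$ denotes the expected output matrix; and $\beta$ is determined by the least squares approach, as follows:

$$\hat{\beta} = \arg\min_{\beta} \left\| H\beta - T \right\| = H^{\dagger} T$$

When the hidden layer output matrix has full column rank, it yields

$$\hat{\beta} = \left( H^{\top} H \right)^{-1} H^{\top} T$$

where $H^{\dagger}$ denotes the Moore-Penrose generalized inverse of the hidden layer output matrix $H$.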

Output I-V Data Prediction
For a PV cell, the output current and voltage can be measured for parameter extraction, and an increase in the amount of measured data generally leads to higher extraction accuracy. To employ ELM for output I-V data prediction, the PV cell voltage and current are regarded as the input and output of ELM, respectively. Therefore, output I-V data prediction of a PV cell can be achieved by an ELM with a single input and a single output, as described by Eqs. 3-9.
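The single-input, single-output ELM described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the authors' implementation: the sigmoid activation, the 20 hidden neurons, the uniform initialization range, and the function names `train_elm` and `predict_elm` are all assumptions. The input weights and thresholds are drawn once at random and left untouched, and the output weights are obtained in one step from the Moore-Penrose pseudoinverse.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_elm(x, y, n_hidden=20, seed=0):
    """Fit a single-input, single-output ELM; returns (omega, b, beta)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Random input weights and thresholds (assumed range), never retrained.
    omega = rng.uniform(-1.0, 1.0, size=(1, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    h = _sigmoid(x @ omega + b)      # hidden layer output matrix H
    beta = np.linalg.pinv(h) @ y     # beta = pinv(H) @ T, one-step solution
    return omega, b, beta


def predict_elm(model, x):
    """Evaluate a trained ELM at new inputs."""
    omega, b, beta = model
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return (_sigmoid(x @ omega + b) @ beta).ravel()
```

For I-V prediction, `x` would hold the measured voltages and `y` the measured currents, and `predict_elm` would be called at new voltage values to generate additional data pairs. With real I-V data, the voltages would probably need to be scaled before training because of the sharp knee of the curve; that step is omitted here.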

Data Prediction-Based Fitness Function
Based on the prediction data, MhAs can implement a balance between global and local search with an updated fitness function. Because all optimization variables are limited within their lower and upper bounds, the RMSE (2) is taken as the fitness function, which should also take the prediction data into account, as follows:

$$F(x) = \sqrt{\frac{1}{N + N_p} \sum_{k=1}^{N + N_p} \left( f_k(V_L, I_L, x) \right)^2}$$

where $N_p$ means the total count of the prediction data.
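The idea of a fitness function evaluated over both measured and predicted data can be sketched as follows. The function name `augmented_rmse` and the generic `error_fn` callback are hypothetical; the callback stands in for whichever model error function (SDM, DDM, or TDM) is being fitted, and the equal weighting of measured and predicted points is an assumption.

```python
import math


def augmented_rmse(x, measured, predicted, error_fn):
    """RMSE over measured plus predicted (V, I) pairs.

    measured, predicted: iterables of (voltage, current) tuples
    error_fn(v, i, x):   model error at one data pair for parameters x
    """
    samples = list(measured) + list(predicted)
    total = sum(error_fn(v, i, x) ** 2 for v, i in samples)
    return math.sqrt(total / len(samples))
```

Because the predicted pairs enter the sum in the same way as the measured ones, any error in the predicted data feeds directly into the fitness value, which is the trade-off noted in the conclusion.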

General Execution Procedure
The overall operation framework of DPMhA mainly consists of three parts, as illustrated in Figure 2. First, measured output I-V data of various PV cells are transferred to the ELM. Second, the ELM is trained by measured data to predict new data; thus, a more reliable fitness function can be established to evaluate the performance of various MhAs. Finally, MhAs implement relevant global exploration and local exploitation to find optimal PV parameters. Particularly, the detailed execution procedure of DPMhA is given in Table 2, in which the main differences between various algorithms are individual roles and searching mechanisms of global exploration and local exploitation.
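The final stage of the procedure above, in which an MhA searches the bounded parameter space using the fitness function, can be sketched with a basic PSO. PSO is used here only because it is one of the MhAs compared in the case studies; the inertia and acceleration coefficients and the population size of 30 are assumptions, and the code is not the procedure of Table 2.

```python
import random


def pso_minimize(fitness, lower, upper, pop_size=30, max_iter=300,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness(position) inside the box [lower, upper]."""
    rng = random.Random(seed)
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(pop_size)]
    vel = [[0.0] * dim for _ in range(pop_size)]
    best_pos = [p[:] for p in pos]           # personal best positions
    best_val = [fitness(p) for p in pos]     # personal best fitness values
    g = min(range(pop_size), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best

    for _ in range(max_iter):
        for k in range(pop_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (best_pos[k][d] - pos[k][d])
                             + c2 * r2 * (g_pos[d] - pos[k][d]))
                # Keep every variable within its lower and upper bounds.
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lower[d]), upper[d])
            val = fitness(pos[k])
            if val < best_val[k]:
                best_pos[k], best_val[k] = pos[k][:], val
                if val < g_val:
                    g_pos, g_val = pos[k][:], val
    return g_pos, g_val
```

In a DPMhA-style setup, `fitness` would be the RMSE computed over the measured and ELM-predicted data, and `lower` and `upper` would hold the bounds of the PV cell parameters. Swapping in another MhA only changes how the candidate positions are updated.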

CASE STUDIES
In this section, several well-established MhAs are utilized for parameter extraction of three PV cell models. In particular, a total of 26 pairs of measured I-V data utilized for simulation are acquired from Easwarakhanthan et al. (1986), which were measured on a 57 mm diameter commercial silicon R.T.C. France solar cell under the weather condition G = 1000 W/m² and T = 33°C. This dataset has been widely used to test techniques developed for parameter extraction, and many existing studies (Ye et al., 2009) were tested on these 26 pairs of measured data. To guarantee a fair comparison with them, the proposed DPMhA is also implemented based on these 26 pairs of measured data. To verify the optimization performance of MhAs under insufficient measurement data, six datasets were selected from the 26 pairs of measured data at ratios of 50, 60, 70, 80, 90, and 100%. The presented ELM is essentially a simple single-input and single-output network; therefore, these pairs of measured data are adequate for training it. To provide a reliable fitness function to MhAs, the combined number of measured and predicted data in each dataset is set to 50; for example, 24 predicted data are added to the 100% dataset. In addition, each MhA is evaluated under two circumstances, that is, without data prediction (i.e., with only the selected measured data) and with data prediction. Note that measured I-V data are not hard to acquire, but it is difficult to acquire adequate high-accuracy data with a general measuring instrument. Hence, the measured I-V data cannot completely represent the I-V output feature of the PV cell. Besides, the added predicted I-V data can provide a more reliable fitness function; thus, the optimization accuracy and convergence stability of each meta-heuristic algorithm can be improved.
Frontiers in Energy Research | www.frontiersin.org | June 2021 | Volume 9 | Article 693252
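The dataset sizes described above follow from simple arithmetic, which the short Python sketch below makes explicit. The rounding rule for ratios that do not give a whole number of points and the evenly spaced selection of the subset are both assumptions, since the text does not state how the subsets were drawn; the function names are illustrative.

```python
def split_counts(n_measured=26, total=50, ratios=(50, 60, 70, 80, 90, 100)):
    """Map each ratio (%) to (measured points used, predicted points added)."""
    counts = {}
    for pct in ratios:
        n_used = round(n_measured * pct / 100)  # assumed rounding rule
        counts[pct] = (n_used, total - n_used)
    return counts


def evenly_spaced_subset(data, n_used):
    """Pick n_used items spread across data (assumed selection scheme)."""
    if n_used >= len(data) or n_used < 2:
        return list(data)[:max(n_used, 0)]
    last = len(data) - 1
    return [data[k * last // (n_used - 1)] for k in range(n_used)]
```

For the 100% dataset this gives 26 measured and 24 predicted points, matching the example in the text, while the 50% dataset uses 13 measured points and therefore needs 37 predicted points to reach 50.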
The main parameters of meta-heuristic algorithms are the population size and the maximum number of iterations. To guarantee a fair comparison, these two parameters are set to the same values for all MhAs under each PV model, while algorithm-specific parameters (e.g., the maximum velocity in PSO) are set to their default values. Specifically, the maximum number of iterations for each model was 300.

Results on SDM
Table 3 shows the statistical results of the average RMSE obtained by each algorithm with six measured datasets, where the symbol "Y" indicates that the MhA is applied with data prediction and "N" indicates that it is applied without data prediction. The average RMSE obtained by each MhA with data prediction is significantly smaller than that obtained with only measured data, especially under 50% measured data. For instance, the average RMSE obtained by PSO with data prediction is 82.32% smaller than that without data prediction under 50% measured data, which verifies that the data prediction strategy can significantly enhance searching efficiency and optimization accuracy. Besides, the average RMSE obtained by each algorithm with six inadequate measured datasets and six prediction datasets is demonstrated in Figure 3, which indicates that prediction data-based PV parameter extraction for the SDM can obtain higher accuracy and stability.
Figure 6 shows the convergence of each algorithm under different amounts of training data. It can be seen that WOA tends to converge prematurely at the initial stage, and PSO has difficulty in obtaining high-quality optimal solutions with 50% of the training data. Moreover, most algorithms can hardly achieve stable and efficient convergence with inadequate data, and their convergence accuracy is unsatisfactory. In contrast, increasing the training data helps them to gradually find high-quality solutions with higher convergence stability and searching efficiency, as 100% data-based algorithms can better balance local exploitation and global exploration.

Results on DDM
The average RMSE of the DDM obtained over 100 runs by the different MhAs with six measurement datasets is shown in Table 4, which demonstrates that the additional prediction data generated by the ELM can effectively improve calculation accuracy. For example, the average RMSE obtained by the GWO algorithm with data prediction is 48.19% lower than that without data prediction under the 50% measured data scenario. This illustrates that the data prediction-based meta-heuristic algorithm can find high-quality solutions with considerably higher stability by enlarging the dataset; thus, this strategy can deliver desirable results when accuracy and reliability are both taken into consideration in the DDM. Figure 7 compares the average RMSE obtained by each algorithm under six inadequately measured datasets with that obtained under six prediction datasets in the DDM. One can easily find that each algorithm can find the global optimum more easily when the experimental data are expanded by prediction data, upon which the optimal values of the unknown parameters can be determined in a more accurate and stable way.
The boxplot of the various algorithms is shown in Figure 8, from which one can easily find that algorithms under 100% training data have fewer outliers and smaller upper/lower bounds in RMSE compared with those under 50% data. This indicates that increasing the training data through data prediction can effectively enhance the quality of the solution and stabilize the global searching ability in PV cell parameter extraction. Figure 9 plots the I-V and P-V curves obtained by the best data prediction-based MhA (WCA) together with the actual data under 50% training data and 100% training data, respectively. This indicates that the model curves obtained by WCA are highly consistent with the actual data.
Moreover, Figure 10 provides the convergence of all algorithms with data prediction, which illustrates that the convergence speed of WOA is low and that GWO tends to fall into a local optimum with 50% of the training data. On the contrary, convergence under 100% training data verifies that they can more stably find a better solution, especially for GWO.

Results on TDM
Table 5 presents the average RMSE obtained by each algorithm with six different measured datasets, which reveals that data prediction-based algorithms still outperform those based only on measured data. For example, the average RMSE obtained by PSO with data prediction is 61.06% lower than that without data prediction under 80% measured data. Hence, data prediction-based algorithms offer the most satisfactory performance in terms of solution accuracy for the TDM. Figure 11 compares the average RMSE of each algorithm under six inadequate measured datasets with that obtained under six prediction datasets in the TDM. It shows that the average RMSE obtained by WOA decreases by about 25% via data prediction, which verifies the effectiveness of data prediction in solution quality improvement. Figure 12 presents the boxplot of the different MhAs, while Figure 13 shows the I-V and P-V curves obtained by the best algorithm (WCA) together with the actual data under different datasets, verifying the accuracy of the extracted PV cell parameters. Figure 12 clearly shows that increasing the volume of data can significantly reduce the RMSE outliers of each algorithm, while lowering the upper/lower limits. Hence, solution precision and stability can be greatly enhanced by increasing the amount of data.
Last, Figure 14 provides the convergence of all algorithms with data prediction, which shows that 100% training data-based algorithms can achieve a proper trade-off between local exploitation and global exploration, while 50%-based algorithms are easily trapped at a local optimum.

Statistical Results and Analysis
Radar charts of the average RMSE obtained by each MhA with six groups of data at different scales are provided in Supplementary Figure S1, Supplementary Figure S2, and Supplementary Figure S3, where the symbol "+" represents MhAs with data prediction. These charts provide a more explicit illustration of the stability of each algorithm with data prediction for PV cell parameter extraction in each model. One can see that the average RMSE obtained by each algorithm with data prediction is smaller than that obtained without data prediction at the different scales of data. This verifies the reliability of DPMhA for PV cell parameter extraction.

CONCLUSION
This article proposes a novel PV cell parameter extraction strategy, DPMhA, which is developed for three different PV cell models. The three main contributions/innovations of this article can be summarized as follows:
• ELM-based data prediction allows MhAs to perform a more stable search for the optimal solution when identifying PV cell parameters with inadequate measured output I-V data.
• Three different types of PV cell models are adopted to reliably verify the practical enhancements and general feasibility of the DPMhA strategy for PV cell parameter extraction.
• Case studies demonstrate that DPMhAs can considerably improve the accuracy, robustness, and convergence rate of PV cell parameter extraction compared with MhAs based only on the original measured output I-V data.
Case studies show that ELM-based MhAs can effectively improve optimization accuracy and convergence stability compared with the original MhAs, which utilize only the measured I-V data. Although the convergence stability of MhAs can be improved by extending the data through the ELM, the optimization accuracy of some MhAs still needs to be improved in some conditions due to the error between the generated artificial data and the real measurement data. Hence, future work will aim to handle this issue by introducing advanced neural networks with excellent generalization.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
BL was responsible for the design and revision of the thesis proposal, HC was responsible for writing the introduction and model presentation, and TT was responsible for writing the simulation experiments and simulation analysis.

FUNDING
This work was jointly supported by the Research and Development Start-Up Foundation of Shantou University (NTF19016).