Functional extreme learning machine

Introduction Extreme learning machine (ELM) is a training algorithm for the single hidden layer feedforward neural network (SLFN), which converges much faster than traditional methods and yields promising performance. However, the ELM also has some shortcomings, such as structure selection, overfitting and low generalization performance. Methods In this article, a new functional neuron (FN) model is proposed. Taking functional neurons as the basic unit and using functional equation solving theory to guide the modeling process, a new functional extreme learning machine (FELM) model theory is proposed. Results The FELM implements learning by adjusting the coefficients of the basis functions in its neurons. At the same time, a simple, iteration-free and high-precision fast parameter learning algorithm is proposed. Discussion Standard data sets from UCI and StatLib are selected for regression problems. Compared with the ELM, the support vector machine (SVM) and other algorithms, the experimental results show that the FELM achieves better performance.


Introduction
An artificial neural network (ANN) is a parallel computing system that simulates human brain activity using widely interconnected neurons and certain learning rules. Because of its strong self-learning, associative memory, adaptive and fault-tolerant abilities, it can easily detect complex nonlinear relationships between the dependent variable and the independent variables, and it supports large-scale parallel computation. Therefore, it has become a popular and useful model for classification, clustering, pattern recognition and prediction in many disciplines, and is a powerful tool for solving problems that many traditional methods cannot (Abiodun et al., 2018).
Artificial neural networks have gone through four stages of development, and hundreds of models have been established so far. They have achieved great success in applied research fields such as handwriting recognition (Baldominos et al., 2018), image annotation (Afridi et al., 2018) and speech recognition (Gautam and Sharma, 2019). However, most ANNs are only simple simulations of biological networks, so they often prove inadequate for big data and complex tasks, and cannot be satisfactory in both processing speed and calculation accuracy. Among the hundreds of neural network models, traditional training algorithms are usually gradient-based, such as the back-propagation (BP) algorithm (Werbos, 1974). The BP algorithm has been widely used in many fields because it is easy to understand and implement. However, gradient-based algorithms tend to converge to local minima and cannot obtain the global optimal solution, because the solution obtained is sensitive to the initial parameters and depends on the complexity of the feature space, and the iterative learning of the BP algorithm makes convergence slow. In recent years, Huang et al. proposed a single hidden layer feedforward neural network learning algorithm called the extreme learning machine (ELM) (Huang et al., 2006), which breaks through commonly used feedforward neural network learning theories and methods. Compared with the support vector machine (SVM) (Cortes and Vapnik, 1995), the ELM tends to achieve higher classification accuracy with lower computational complexity (Li et al., 2019).
Since the ELM has the advantages of high learning accuracy, ease of use and implementation, and fast learning speed, it has been applied widely in ELM autoencoders (Yimin Yang and Jonathan, 2018), handwriting recognition (Tang et al., 2016), regression and classification (Huang et al., 2012) and big data analysis (Sun et al., 2017), and many improved ELM algorithms (Zong et al., 2013; Geng et al., 2017; Sattar et al., 2019; Gong et al., 2021; Kardani et al., 2021) have also emerged to deal with specific problems. Studies have shown that the ELM, especially in some applications, has the advantages of simple structure, short training time and high calculation accuracy compared with popular deep learning, and the solution obtained is the unique optimal solution, which ensures the generalization performance of the network.
The extreme learning machine (ELM) theory has attracted extensive attention from scholars all over the world since it was proposed (Huang et al., 2012; Tang et al., 2016; Sun et al., 2017; Yimin Yang and Jonathan, 2018), and many achievements have been made in its theoretical and applied research. Kärkkäinen (2019) proposed an ELM that conducts ridge regression using a distance-based basis; the experimental results show that over-learning with the distance-based basis is avoided in the classification problem. Atiquzzaman and Kandasamy (2018) successfully used the ELM for hydrological flow series prediction. Golestaneh et al. (2018) presented a fuzzy wavelet ELM whose performance is better than the ELM. Yaseen et al. (2019) used an enhanced extreme learning machine for river flow forecasting. Pacheco et al. (2018) used a restricted Boltzmann machine to determine the input weights of the ELM, which greatly optimized its performance. Christou et al. (2018) proposed a hybrid ELM method for neural networks, applied to a series of regression and classification problems. Murli et al. (2018) applied the extreme learning machine to microgrid protection under wind speed intermittency. Artem and Stefan (2017) applied the ELM to the credit evaluation of user credit cards, indicating that it is a valuable alternative to other credit risk modeling methods. Henríquez and Ruz (2019) successfully used the ELM to reduce the noise of near-infrared spectroscopy data. Mohammed et al. (2018) proposed an improved ELM based on competitive group optimization and applied it to medical diagnosis. Lima et al. (2017) proposed a variable-complexity online sequential ELM, which was successfully used for streamflow prediction. Paolo and Roberto (2017) applied the ELM to inverse reactor kinetics, and the experimental results show that this ELM application has great potential. Ozgur and Meysam (2018) compared the performance of the wavelet ELM and wavelet neural networks. Vikas and Balaji (2020) proposed a PIELM and successfully applied it to solving partial differential equations; Peter and Israel (2020) proposed a new morphological/linear perceptron ELM and applied it to fast classification problems. So far, the ELM has been widely used in industry, agriculture, military, medicine and other fields.
(Figure: Topological structure of general FELM.)
Although research on the ELM has produced many results, most of them are application results and few are theoretical, which greatly limits the application scope of the ELM. In particular, ELM theory still has the following shortcomings: (1) The weights randomly determined by the hidden layer neurons have a great impact on the classification performance of the network, and the number of hidden layer neurons cannot be calculated by an effective algorithm. Although some researchers have proposed optimization algorithms for the ELM, these algorithms turn the determination of the number of hidden layer neurons into an optimization problem, which is cumbersome and time-consuming. (2) In the learning and training of the ELM, the regularization coefficient plays an important role and must be determined manually before classification and recognition. However, there is no effective parameter selection method at present; in most cases, a trial-and-error method is used to select the size of the regularization coefficient. (3) Because the ELM randomly assigns the input weights and the hidden layer thresholds, the regression model is prone to low generalization performance and poor stability, which is crucial for classification problems.
Aiming at the above shortcomings of ELM theory, this article takes the functional neuron (FN) model (Castillo, 1998; Guo et al., 2019) as the basic unit, uses functional equation solving theory to guide the modeling process of the extreme learning machine, and proposes a new functional extreme learning machine (FELM) theory. The functional neurons of the learning machine are not fixed; they are usually linear combinations of linearly independent base functions. In the FELM, network learning can be achieved by adjusting the coefficients of the base functions in the neurons. For the parameter (coefficient) selection method, a simple, iteration-free and high-precision fast parameter learning algorithm is proposed. Finally, through simulation experiments on regression problems with real standard test data sets, the approximation ability, parameter learning speed, generalization performance and stability of the proposed FELM are tested against the traditional extreme learning machine (ELM) and support vector regression (SVR). The rest of the article is organized as follows: in Section "2. Functional extreme learning machine (FELM)", we describe the FELM modeling theory, the parameter learning algorithm and the feasibility analysis of the modeling theory in detail. Section "3. Experimental results and analysis" presents regression experiments to evaluate the performance of the proposed technique. Finally, we summarize the work and outline future directions in Section "4. Conclusions and future works".
(Figure: General FELM learning process diagram.)
(Figure: Initial structure of FELM for predicting disease d = D(x, y, z), and the equivalent FELM structure.)

Functional extreme learning machine (FELM)
Taking the FN model as the basic unit, a learning machine with better performance is designed based on functional equation solving theory; it is called the functional extreme learning machine (FELM). The model is different from the traditional extreme learning machine: the type and number of hidden layer activation functions in the FELM structure are not fixed and can be adjusted. Figure 1A is the functional neuron model, and Figure 1B is the M-P neuron model. Comparing Figures 1A, B, it can be seen that, compared with artificial neurons, functional neurons lack the weight information on the connection lines and can have multiple outputs. The output of a functional neuron is:

y = f(x_1, x_2, ..., x_k)    (1)

The functional neuron function f can be a linear combination of arbitrary nonlinear, linearly independent base functions:

f(x_1, x_2, ..., x_k) = Σ_{j=1}^{n} a_j ϕ_j(x_1, x_2, ..., x_k)    (2)

where {ϕ_j(x_1, x_2, ..., x_k) | j = 1, 2, ..., n} is any given set of base functions, which can be learned; different functions can be selected according to the specific problem and data, such as trigonometric base functions and Fourier base functions. {a_j | j = 1, 2, ..., n} is a parameter set, which can also be learned, and it can be used to approximate the functional neuron function f(x_1, x_2, ..., x_k) to any expected accuracy.
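As a concrete illustration of Eq. 2 (a sketch, not taken from the original; the basis set and coefficients below are hypothetical), a functional neuron can be coded as a linear combination of fixed basis functions:

```python
import numpy as np

# A functional neuron: f(x) = sum_j a_j * phi_j(x).
# The basis functions and coefficients here are purely illustrative.
def functional_neuron(basis, coeffs):
    def f(x):
        return sum(a * phi(x) for a, phi in zip(coeffs, basis))
    return f

# Example: trigonometric basis {sin(x), sin(2x), sin(3x)}
basis = [lambda x: np.sin(x), lambda x: np.sin(2 * x), lambda x: np.sin(3 * x)]
f = functional_neuron(basis, coeffs=[0.5, -0.2, 0.1])
print(f(0.0))  # a pure sine basis vanishes at 0, so this prints 0.0
```

Learning such a neuron means adjusting only the coefficients `coeffs`, not any connection weights, which is the key difference from the M-P neuron.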

FELM model
With the functional neuron as the basic unit (Figure 1A), the definition of the general FELM is established. Definition 1: A FELM is a binary ordered pair FELM = <X, U>, where X is a node set and U = {<Y_i, F_i, Z_i> | i = 1, 2, ..., n} is a set of functional neurons on the node set X satisfying: every node X_i ∈ X is an input node or an output node, and it belongs to at least one functional neuron in U.
According to the definition of the FELM, the components of a general FELM include: (1) Several layers of storage units: one layer of input units; one layer of output units; and some intermediate units used to store the information generated by functional neurons in the intermediate layers; all are represented by solid circles with corresponding names (i.e., {x_i, y_i, z_i, ...} in Figure 2). (2) One or several layers of processing units: a processing unit is a functional neuron, which processes a set of input values from the previous layer of functional neurons or input units and provides a set of input data for the next layer of functional neurons or output units (i.e., {f_i, g_i, ...} in Figure 2). (3) Directional connection lines: they connect the storage units and the processing units, and the arrows indicate the direction of information flow.
All these elements together constitute the structure of the FELM, and determine the generalization ability of the FELM.
Based on the above definition and components of the FELM, it is easy to design a general FELM. The network topology is shown in Figure 2. The output expression of the FELM in Figure 2 is:

z_1 = g_1(y_1, y_2, ..., y_l),
z_2 = g_2(y_1, y_2, ..., y_l),
...
z_m = g_m(y_1, y_2, ..., y_l).    (3)
Figure 2 shows the general FELM model, and the output expression (3) of the network is essentially a system of functional equations. Conversely, for any system of functional equations, the corresponding functional learning machine can be drawn. Therefore, any FELM is in one-to-one correspondence with a functional equation (or system of equations). Based on this correspondence, functional equation theory is used to guide the modeling process of the FELM. The steps are as follows: Step 1. Based on the characteristics of the problem to be solved and the definition of the FELM, establish the initial FELM model for the problem.
Step 2. Obtain the output expression of the initial FELM; this expression corresponds to a system of functional equations. Step 3. Using the method of solving functional equations, give the general solution expression.
Step 4. Based on the general solution expression of the functional equations, redraw the corresponding FELM using the one-to-one correspondence between functional equations and FELMs.
In this way, according to the above modeling steps, any type of FELM can be drawn, and the model is in one-to-one correspondence with a functional equation (or system of equations). Moreover, the functional equations are used to simplify the model and obtain an optimal FELM.
(Figure: Multi-input single-output single-hidden layer FELM.)

The theoretical basis of the definition is the mathematical model of "binary ordered pairs" in discrete mathematics. Its physical meaning is similar to the layout structure of a printed circuit board (PCB). In the practical application of the FELM modeling theory, based on the characteristics and data of the problem to be solved, the initial FELM structure of any problem can be obtained according to the above definition of the general FELM, with the modeling process guided by functional equation solving. Based on the definition and constituent elements of the FELM, any type of FELM can be drawn, and a one-to-one correspondence with a functional equation (or system) can be established. Therefore, using functional equation solving theory to guide the design process of the FELM is supported by mathematical theory, correct and easy to operate. The unique structure of the FELM fundamentally overcomes the shortcomings of the current extreme learning machine, namely that the randomly determined hidden layer weights have a great impact on the classification performance of the network, and that the number of hidden layer neurons cannot be obtained by an effective algorithm.

FELM learning algorithm
The FELM is based on problem-driven modeling, without the concepts of weights and thresholds. Its learning essence is to learn the network structure and parameters. For the parameter (coefficient) selection method, based on a parameter error cost function evaluation criterion, a simple, iteration-free and high-precision fast parameter learning algorithm is designed using the theory of linear equations. The learning process of the FELM is shown in Figure 3.
The learning process of the FELM in Figure 3 is illustrated by a specific example. Consider a disease d with three basic symptoms, x: fever, y: dry cough, z: fatigue. How can a FELM be built to predict it, so that d = D(x, y, z)?
(1) Determine the initial structure of the FELM. According to the knowledge and information about the problem (known data, prior knowledge of the problem, some characteristics of the function, etc.), the initial structure of the FELM is designed. In the process of diagnosing a disease with three characteristics, the order of symptoms asked by doctors may differ, and three cases of the initial structure are shown in Figures 4A-C. (2) Simplify the initial structure of the FELM. Since each initial network structure corresponds to a system of functional equations, the FELMs equivalent to the initial network structure are found using the characteristics of the solution of the functional equations, and the simplest, optimal FELM equivalent to the initial network structure is selected.
The examples in Figure 4 are essentially independent of the diagnostic order, and therefore satisfy a functional equation (Eq. 4). The general solution of Eq. 4 is given by Eq. 5, and the FELM equivalent to functional Eq. 5 is shown in Figure 5.
In this way, the designed network can be simplified by using the solution theory of functional equations, and an equivalent, simple and optimal FELM can be obtained.
(3) Uniqueness of the output expression of the FELM. Before FELM learning, the uniqueness of the output expression must be ensured. It can be proved theoretically that, for a given FELM under the same initial conditions, the FELM has the same output value for any input value.

(Figure 8: The prediction on f_1(x).)

Frontiers in Computational Neuroscience
The above example is used to prove the equivalence of the FELMs in Figures 4, 5. Assume there are two functional neuron function sets, {k_1, p_1, q_1, r_1} and {k_2, p_2, q_2, r_2}, that produce the same output (Eq. 6). For any variables x, y, z, the solution of the resulting functional equation is given by Eq. 7, and the uniqueness is thus proved.
(Figure 9: The approximation error [(A-D) 10^-2 ∼ 10^-5] of the FELM.)
(4) Parameter learning algorithm design for the FELM. From the general FELM in Figure 4, a multi-input single-output single-hidden layer FELM is selected as an example. Its network structure is shown in Figure 6. Let the input be X = [x_1, x_2, ..., x_k] and the output be y; each neuron function f_i, i = 1, 2, ..., p, in the hidden layer is a linear combination of arbitrary nonlinear, linearly independent base functions, that is,

f_i(s_i) = Σ_{j=1}^{m} a_ij ϕ_ij(s_i), i = 1, 2, ..., p,    (8)

where s_i = Xω_i and m is the number of base functions per functional neuron. For convenience of matrix representation, let a_i = [a_i1, a_i2, ..., a_im]^T and Φ_i(s_i) = [ϕ_i1(s_i), ϕ_i2(s_i), ..., ϕ_im(s_i)]^T; then Eq. 8 can be written as f_i(s_i) = Φ_i(s_i)^T a_i. When the output neuron function has an inverse, it can also be expressed as a linear combination of base functions:

f_{p+1}^{-1}(y) = Σ_{j=1}^{m} a_{p+1,j} ϕ_{p+1,j}(y).

The output of the FELM in Figure 6 is:

y = f_{p+1}(Σ_{i=1}^{p} f_i(s_i)).

Let the sample data be (X_r, y_r); the error cost is:

e_r = Σ_{i=1}^{p} Σ_{j=1}^{m} a_ij ϕ_ij(s_ri) − f_{p+1}^{-1}(y_r),

where s_ri = X_r ω_i. If there are n groups of sample data, the sum of squared errors of the FELM model is:

E = Σ_{r=1}^{n} e_r^2.

By changing the values of the base function coefficients a_ij, E is minimized.
If f_{p+1} is an invertible function, the error sum of the FELM model can be expressed as:

E = Σ_{r=1}^{n} ( Σ_{i=1}^{p+1} Σ_{j=1}^{m} a_ij ϕ_ij(s_ri) )^2,

where s_{r,p+1} = y_r (the sign of the output-side term is absorbed into the coefficients a_{p+1,j}).

Parameter learning algorithm
The optimal values of the parameter coefficients a_ij can be obtained by solving Eq. 15:

∂E/∂a_ts = 2 Σ_{i=1}^{p+1} Σ_{j=1}^{m} Σ_{r=1}^{n} ϕ_ts(s_rt) ϕ_ij(s_ri) a_ij = 0, t = 1, ..., p+1, s = 1, ..., m.    (15)
In Eq. 18, the vector P contains the desired parameter coefficients, but it is not unique. To solve this problem, initial constraint conditions need to be given. Suppose the given constraints are as follows:

Σ_{j=1}^{m} a_ij ϕ_ij(s_0) = β_i, i = 1, 2, ..., p+1,
where s_0 = X_0 ω_i for i = 1, 2, ..., p and s_0 = y_0 for i = p+1; β_i is an arbitrary real constant. Therefore, using the Lagrange multiplier technique, the following auxiliary function can be established:

L = E + Σ_{i=1}^{p+1} λ_i ( Σ_{j=1}^{m} a_ij ϕ_ij(s_0) − β_i ).
The minimum of the model error sum of squares corresponds to the solution of the resulting system of linear equations.
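The iteration-free character of this learning step can be sketched as follows. This is a simplified illustration under the assumption of an identity output neuron, so that the basis coefficients come from a single linear least-squares solve; the basis set, the target function and the helper name `fit_coeffs` are hypothetical, not from the paper:

```python
import numpy as np

# Sketch of iteration-free coefficient learning: with an identity output
# neuron, the basis coefficients a_j minimizing the squared error are the
# solution of one linear least-squares problem (no iterations).
def fit_coeffs(xs, ys, basis):
    Phi = np.column_stack([phi(xs) for phi in basis])  # design matrix
    a, *_ = np.linalg.lstsq(Phi, ys, rcond=None)       # closed-form solve
    return a

basis = [np.sin, lambda x: np.sin(2 * x), lambda x: np.sin(3 * x)]
xs = np.linspace(-1.0, 1.5, 101)
ys = np.sin(xs) + 0.5 * np.sin(2 * xs)   # hypothetical target in the basis span
a = fit_coeffs(xs, ys, basis)
print(np.round(a, 6))  # coefficients of each basis function, approx [1, 0.5, 0]
```

Because the unknowns enter linearly, the whole training stage is one matrix solve; this is the sense in which the FELM parameter learning is "simple, iteration-free and high-precision".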

FELM parameter learning algorithm analysis
The FELM is problem-driven, learning the network structure and parameters. Each step in the learning process is operable and realizable, and the learning process is suitable for any FELM. At the same time, the theoretical basis and mathematical derivation of the FELM learning algorithm are given. The parameter learning algorithm is simple, iteration-free and high-precision, which makes it convenient for engineers to use.
The learning process of the FELM is completely different from that of the ELM, since its structure has no weight values or neuron thresholds. In the ELM, the input layer weights and hidden neuron thresholds of the network are randomly selected, and the size of the regularization coefficient can only be chosen by trial and error, because there is no effective parameter selection method. The characteristics of the FELM structure and its parameter learning process fundamentally solve this problem. The above illustrates the learning process of the structure and parameters of the FELM with some simple examples; these research ideas can be extended to general situations.

Experimental results and analysis
In this section, the performance of the proposed FELM is compared with feedforward neural network algorithms such as the ELM and support vector regression (SVR) on approximating two artificial datasets and 16 benchmark real problems. For comparison, three variant algorithms of the ELM, namely OP-ELM (Miche et al., 2010), inverse-free ELM (Li et al., 2016) and OS-RELM (Shao and Er, 2016), and a variant algorithm of SVR (LSSVR), are also added. Simulations of all algorithms are performed in the MATLAB 2019b environment running on an 11th Gen Intel(R) Core(TM) i5-11320H @ 3.20 GHz with 16 GB RAM. The SVR, LSSVR, ELM and OP-ELM source codes used in this experiment were downloaded from www.csie.ntu.edu.tw/cjlin/libsvm/, www.esat.kuleuven.be/sista/lssvmlab/, www.ntu.edu.sg/home/egbhuang/ and www.cis.hut.fi/projects/tsp/index.php?page=OPELM, respectively. We use the radial basis function as the kernel function for SVR and LSSVR. In SVR, two parameters are mainly optimized. For each problem we use different combinations of the cost parameter C and the kernel parameter γ to estimate the generalization accuracy (Huang and Zhao, 2018): C ∈ {2^12, 2^11, ..., 2^-1, 2^-2} and γ ∈ {2^4, 2^3, ..., 2^-9, 2^-10} (Huang et al., 2006). Therefore, for each problem we try 15 × 15 = 225 (C, γ) parameter combinations on SVR, with 50 trials for each combination; we then calculate the root mean square error (RMSE) of the 50 results for each combination and take the combination with the smallest RMSE among the 225 combinations as the best parameter combination. The parameter optimization process of LSSVR is the same: LSSVR mainly optimizes the regularization parameter C and the kernel parameter k_p, and the adopted ranges are the same as those of C and γ for SVR, respectively. The activation functions of the ELM and its variants and the base functions of the FELM are set according to the specific problems to be solved below.
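The (C, γ) selection protocol above can be sketched as follows; `train_eval` is a hypothetical stand-in for training an SVR with given parameters and returning its RMSE, and the toy objective at the end is only for illustration:

```python
import numpy as np
from itertools import product

# Grid search over 15 x 15 = 225 (C, gamma) combinations, averaging the
# RMSE of several trials per combination and keeping the smallest mean.
Cs = [2.0 ** k for k in range(12, -3, -1)]       # 2^12 ... 2^-2
gammas = [2.0 ** k for k in range(4, -11, -1)]   # 2^4 ... 2^-10

def grid_search(train_eval, n_trials=50):
    best = None
    for C, g in product(Cs, gammas):
        rmse = np.mean([train_eval(C, g) for _ in range(n_trials)])
        if best is None or rmse < best[0]:
            best = (rmse, C, g)
    return best

# Toy objective standing in for the real train/test loop
best = grid_search(lambda C, g: (np.log2(C) - 3) ** 2 + (np.log2(g) + 2) ** 2,
                   n_trials=1)
print(best)  # picks C = 2^3, gamma = 2^-2 for this toy objective
```

In the real protocol each `train_eval` call would retrain the SVR on a fresh random split, which is why the paper averages 50 trials per combination.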

Artificial datasets
In the interval [−1.0, 1.5], 101 training samples (x_i, f_i) were obtained by sampling at intervals of 0.025. In addition, with the same 0.025 interval, nine data points in the unlearned interval [1.505, 1.705] are used as prediction points. In this example, the FELM uses the base functions {sin(x), sin(2x), sin(3x), sin(4x), sin(5x), sin(6x)}. The optimal combination parameter of SVR is (C, γ) = (2^12, 2^0), and the optimal combination parameter of LSSVR is (C, k_p) = (2^12, 2^-10). The ELM, OP-ELM and inverse-free ELM use the sigmoid function as the activation function, and the activation function of OS-RELM is the triangular basis function, because the commonly used sigmoid function cannot obtain a feasible solution to this problem in a reasonable time. The trigonometrically-activated Fourier neural network (Zhang et al., 2009a) (hereinafter referred to as TAFNN) is also added for performance comparison. The error of a function is defined as follows:

E = Σ_{t=1}^{N} |expect_t − predict_t|,
where expect_t is the expected output, predict_t represents the actual network output, and N is the number of sample points; E/N is the average error. As shown in Table 1, where the optimal data are in bold, the FELM has a competitive advantage in training time, and its total training error and average training error are smaller than those of six compared algorithms (in particular, hundreds of thousands of times smaller than those of the ELM, inverse-free ELM and OS-RELM; Figure 7 shows that the training results obtained by these three algorithms are not satisfactory), but slightly worse than those of OP-ELM. However, the training time of the FELM is more than 200 times shorter than that of OP-ELM, and its hidden layer has only 6 parameters, so its model complexity is much lower than that of OP-ELM. Figure 7 shows the testing of the 8 algorithms. It can be seen that the testing results of the FELM are good, and the error between its output and the target output is small. The testing results of the ELM, inverse-free ELM and OS-RELM deviate from the target output. In terms of prediction accuracy, the FELM obtains the smallest total prediction error and average prediction error, which shows that it has good generalization performance. A more intuitive comparison of predictions is shown in Figure 8. Therefore, compared with the other algorithms, the FELM obtains the highest prediction accuracy in the shortest time with the smallest network model.
(Figure: Comparison of test RMSE on the last 8 datasets when the FELM uses ϕ_2. (A) Diabetes, (B) Housing, (C) Machine CPU, (D) Mg, (E) Quake, (F) Servo, (G) Strike, and (H) Wisconsin B.C. data sets.)
As shown in Table 2, where the optimal data are in bold, except under the first E_expect, the structural complexity of the FELM with the Legendre basis is the lowest under the other three E_expect values, because they use similar hidden layer functions. It can be seen from the table that the time required for the FELM optimization process is short, indicating that its learning speed is fast. Table 2 also shows that, the higher the required precision, the more the network complexity of the ELM, inverse-free ELM and OS-RELM increases, roughly exponentially, while the model complexity of SVR remains relatively large throughout, and the structural complexity of LSSVR stays unchanged because all training samples are always used. In contrast, the structural complexity of the FELM does not increase significantly with higher precision requirements, but grows slowly. Figure 9 shows the approximation of the FELM under the four different required accuracies.

Data sets and experimental settings
The 16 real benchmark datasets are selected because they cover various fields and differ in data size and dimension. They are mainly obtained from the data archives of UCI Machine Learning (Asuncion and Newman, 2007) and StatLib (StatLib DataSets Archive, 2021). Table 3 lists the specifications of these datasets. In practical applications, the distribution of these datasets is unknown, and most datasets are not noise-free. For these 7 algorithms, 50 independent simulation trials are performed on each dataset, and the training and test data are randomly regenerated from the entire dataset each time, with two-thirds for training and one-third for testing. Additionally, in our experiments, all inputs (attributes) and outputs (targets) are normalized to the range [−1, 1].
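The preprocessing just described (normalization to [−1, 1] followed by a random two-thirds/one-third split) can be sketched as follows; the array sizes and random seed are illustrative only:

```python
import numpy as np

# Normalize each column to [-1, 1], then split 2/3 training, 1/3 testing.
def normalize(A):
    lo, hi = A.min(axis=0), A.max(axis=0)
    return 2 * (A - lo) / (hi - lo) - 1          # per-column min-max scaling

def split_2to1(X, y, rng):
    idx = rng.permutation(len(X))                # fresh random split per trial
    cut = 2 * len(X) // 3
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

rng = np.random.default_rng(0)
X = normalize(rng.uniform(0, 10, size=(9, 3)))   # toy attribute matrix
y = normalize(rng.uniform(0, 5, size=(9, 1)))    # toy target column
Xtr, ytr, Xte, yte = split_2to1(X, y, rng)
print(Xtr.shape, Xte.shape)  # (6, 3) (3, 3)
```

Regenerating the split inside each of the 50 trials, as the paper does, prevents one lucky partition from flattering any single algorithm.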
The ELM uses the sigmoid function as the activation function, OP-ELM uses the Gaussian kernel, and the proposed algorithm uses two different types of base functions: ϕ_1 = {1, x, x^2, x^3} and ϕ_2 = {sin(x), sin(2x), sin(3x)}; the rest of the comparison algorithms use the RBF kernel. The network complexity comparison of the FELM, ELM and SVR is shown in Table 4, where the FELM and ELM adopt the same network complexity on the same problem; it can be seen that in most cases the FELM is more compact than SVR. It should be noted that, for fairness, the network complexity of OP-ELM, inverse-free ELM and OS-RELM is also the same as that of the ELM, and is not repeated in the table. Finally, on the Baskball, Cloud and Diabetes problems, the maximum number of neurons for OP-ELM is prespecified as 62, 70 and 26, respectively, because they have smaller training sets; the remaining data sets use a maximum of 100 neurons.

Evaluation criteria
On the above 16 benchmark regression problems, the evaluation criterion for the FELM, ELM and SVR is the root mean square error:

RMSE = sqrt( (1/N) Σ_{t=1}^{N} (expect_t − predict_t)^2 ).
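For reference, the RMSE criterion can be computed as follows (a standard implementation, not code from the paper):

```python
import numpy as np

# Root mean square error between expected and predicted outputs.
def rmse(expect, predict):
    expect, predict = np.asarray(expect), np.asarray(predict)
    return np.sqrt(np.mean((expect - predict) ** 2))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```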

Evaluation and analysis of experimental results
The FELM is compared with the other algorithms on test RMSE under the two different types of base functions, and the winners are shown in bold in Tables 5, 6. As can be seen from Tables 5, 6, the FELM achieves higher generalization performance than the other 6 algorithms on all problems. Except on the Wisconsin B.C. problem, the average test RMSE of the FELM is one order of magnitude better than that of the other algorithms on the other 15 problems. It is worth noting that in Table 6, LSSVR achieves the best training results, but on most problems its test and training RMSE differ by 3 orders of magnitude, while the training and test RMSE of the FELM are of the same order of magnitude or differ by only one order. Figures 10-13 show that, whether the FELM uses ϕ_1 or ϕ_2, on Balloon the curves of the FELM and SVR are at the same level and at the bottom, which shows that although they obtain similar results, they are better than the other algorithms. On Wisconsin B.C., the comparison curves of the FELM, SVR and OP-ELM are also similar. For the other 14 problems, it can be seen from the figures that the curves of the FELM are all at the bottom and fluctuate more gently, while the other algorithms fluctuate widely, indicating that, compared with the other algorithms, the FELM not only obtains the highest accuracy, but the network outputs of each independent trial are also very close to the expected value, with very small error. Figure 14 shows the average time comparison of the 7 algorithms on the 16 datasets, where FELM-ϕ_1 and FELM-ϕ_2 denote the FELM using the ϕ_1 and ϕ_2 base functions, respectively. Tables 7, 8 and Figure 14 show that the average training time of the FELM with the two different types of base functions is close, and that it is also similar to the ELM, inverse-free ELM and OS-RELM in learning speed and test time. It is obvious from Figure 14 that the FELM learns ten times or even more than a hundred times faster than OP-ELM, SVR and LSSVR on most problems.
In fact, according to the above experiments, it is obvious that the FELM has better generalization performance than the other comparison algorithms; at the same time, the RMSE of the FELM varies more mildly, which means that the FELM has stronger robustness. In addition, the ELM has the advantage of fast learning speed, and the algorithm proposed in this article has been shown not only to generalize well, but also to compete with the ELM in learning speed.

Conclusions and future works
This article proposes a new functional extreme learning machine theory; its iteration-free parameter learning algorithm makes the learning speed of the FELM very fast. In our simulations, for many problems, the learning stage of the FELM can be completed in a few seconds. Although the purpose of this article is not to compare the functional extreme learning machine with the ELM, SVR and their improved algorithms, we also make a simple comparison between the FELM and six algorithms in the simulations. The results show that the learning speed of the FELM can not only compete with the ELM and its improved algorithms, but is also dozens or hundreds of times faster than SVR. As our experimental results show, the FELM has higher test accuracy under the same network complexity as the ELM and its variants. SVR usually generates many support vectors (computing units) and LSSVR uses all the training data, whereas the functional extreme learning machine needs only a few hidden layer nodes (computing units) in the same application. In applications requiring fast prediction and response capability, the SVR algorithm may take several hours and is therefore not suitable for real-time prediction, while the performance of the FELM in this article suggests that it is suitable for such applications. Compared with popular learning technologies, the proposed FELM has several important characteristics: (1) the training speed of the FELM is very fast; (2) the parameter learning algorithm is iteration-free and high-precision; (3) different function families can be selected according to the specific problem, such as trigonometric bases and Fourier bases. In this article we have shown that the FELM is very useful in many practical regression problems, but the following two aspects can be studied in the future: (1) within an acceptable engineering error, optimizing the network by reducing the network complexity; (2) obtaining the network parameters by the matrix pseudo-inverse method.
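The pseudo-inverse direction mentioned above can be sketched in a few lines; the design matrix and targets here are toy values chosen only for illustration:

```python
import numpy as np

# Obtaining parameters via the Moore-Penrose pseudo-inverse: for a design
# matrix Phi and targets y, a = pinv(Phi) @ y is the minimum-norm
# least-squares solution (toy data: fit y = 1 + 2x at x = 0, 1, 2).
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # columns: bias, x
y = np.array([1.0, 3.0, 5.0])
a = np.linalg.pinv(Phi) @ y
print(a)  # [1. 2.]  (intercept 1, slope 2)
```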