Voting based double-weighted deterministic extreme learning machine model and its application

This study introduces an intelligent learning model for classification tasks, termed the voting-based Double Pseudo-inverse Extreme Learning Machine (V-DPELM) model. Because the traditional method is affected by the weight of input layer and the bias of hidden layer, the number of hidden layer neurons is too large and the model performance is unstable. The V-DPELM model proposed in this paper can greatly alleviate the limitations of traditional models because of its direct determination of weight structure and voting mechanism strategy. Through extensive simulations on various real-world classification datasets, we observe a marked improvement in classification accuracy when comparing the V-DPELM algorithm to traditional V-ELM methods. Notably, when used for machine recognition classification of breast tumors, the V-DPELM method demonstrates superior classification accuracy, positioning it as a valuable tool in machine-assisted breast tumor diagnosis models.


Introduction
Extreme Learning Machine (ELM) (Huang et al., 2004) is a powerful machine learning algorithm that has emerged as a popular alternative to traditional neural networks [such as Back-Propagation (Haykin, 1998) algorithm (BP) and Levenberg Marquardt (Levenberg, 1944;Marquardt, 1963) algorithm] due to its speed, simplicity, and high performance.ELM is a single-layer feedforward neural network that uses random weight initialization and leastsquares optimization to learn from input data (Huang et al., 2006).The algorithm has shown remarkable results in a wide range of applications, from image recognition (Tang et al., 2015) and speech processing (Han et al., 2014) to financial forecasting (Fernández et al., 2019) and anomaly detection (Huang et al., 2015).
One drawback of the ELM algorithm is that the learning parameters of the hidden nodes are randomly assigned and remain unchanged during training, which may lead to a significant impact on its predictive performance and algorithm stability (Gao and Jiang, 2012;Lu et al., 2014).ELM might misclassify certain samples, particularly those near the classification boundaries.In an attempt to address this issue, Cao et al. (2012) proposed a voting-based variant of ELM, referred to as V-ELM.The main idea behind V-ELM is to perform multiple independent ELM trainings instead of a single training, and then make the final decision based on majority voting.However, this approach does not fundamentally resolve the problem of random determination of ELM's various parameters.Zhang et al. (2014) have highlighted that the performance of Extreme Learning Machine (ELM) is not always optimal when the input weights and hidden layer biases are chosen entirely at random.This randomness is also a significant factor contributing to the redundancy of neurons in the hidden layer of the ELM algorithm (Zhu et al., 2005).In response, scholars have proposed the use of swarm intelligence optimization (Lahoz et al., 2013;Figueiredo and Ludermir, 2014;Zhang et al., 2016), pruning methods (Miche et al., 2009(Miche et al., , 2011)), and adaptive algorithms (Pratama et al., 2016;Zhao et al., 2017) to optimize the ELM algorithm and enhance its overall performance.However, in practical applications, although these algorithms do succeed in optimizing the number of hidden layer neurons, they introduce a plethora of hyperparameters that typically require iterative optimization, thereby increasing the computational complexity of the algorithm and rendering it challenging to address real-time problems with high time constraints.To tackle this issue, this paper presents an improved algorithm known as Voting based double Pseudo-inverse weights determination Extreme Learning Machine (V-DPELM).The core concept of V-DPELM lies in the stochastic determination of output weights, while input weights are obtained through pseudoinverse calculations.Subsequently, the pseudo-inverse method is employed again to determine optimal output weights, ensuring that both input and output weights are optimal.The obtained DPELM algorithm is subjected to multiple independent trainings, and the final decision is made based on majority voting.
In the 21st century, breast cancer is increasingly recognized as a significant factor negatively impacting the overall quality of life for women worldwide.According to statistics from the World Health Organization (WHO), approximately 1.5 million women suffer greatly from the torment of breast cancer, with approximately 500,000 losing their lives to this disease (Fahad Ullah, 2019).The incidence and mortality rates of breast cancer exhibit a clear and alarming upward trend each year.Research has demonstrated the paramount importance of timely detection, diagnosis, and initiation of treatment in achieving favorable therapeutic outcomes for breast cancer (Lee et al., 2019;Aldhaeebi et al., 2020).Ten crucial features, including symmetry and fractal dimension of breast tumor lesions, play a vital role in determining the nature of the tumor, whether benign or malignant (Wang et al., 2016(Wang et al., , 2019)).Therefore, it is possible to extract relevant features closely associated with tumor characteristics from acquired patient samples.By employing the proposed V-DPELM algorithm for parameter optimization and subsequent breast tumor classification, the obtained classification and identification results can provide valuable references, assisting physicians in making diagnostic decisions and offering more accurate and rational assessments of patients' conditions.

V-DPELM algorithm design
In the section, we first review the basic concept of the traditional ELM algorithm in Section 2.1.Then, we analyzed the DPELM algorithm in Section 2.2.Finally, the new proposed V-DPELM algorithm will be presented in Section 2.3. .Brief review of ELM Extreme Learning Machine (ELM) is suitable for generalized Single Hidden Layer Feedforward Networks (SLFN).The structure of traditional ELM is similar to SLFN, consisting of three layers: input layer, hidden layer, and output layer.The essence of ELM is that it does not require tuning the hidden layer of SLFN.The structure of ELM is shown in Figure 1.
In the context of N arbitrary training samples {(x i , t i )} N i=1 , where each sample x i = (x i1 , x i2 , ..., x in ) T ∈ R n , t i = (t i1 , t i2 , ..., t im ) T ∈ R m , the resulting output of the ELM with L hidden nodes can be expressed as follows: Here, ω j = (ω j1 , ω j2 , ..., ω jn ) represents the weight vector of the jth neuron in the input layer, and b j is the bias associated with the jth neuron.h(.) indicates the activation function.Furthermore, β j denotes the linked weights between the jth hidden neurons and output neurons, β j = (β j1 , β j2 , ..., β jm ).
For all N samples, the equivalent canonical form of linear equation (1) can be expressed as: In Equation (2), T represents the desired output matrix for the training samples, and is the randomized matrix mapping.It is worth noting that the parameters (ω j , b j ) of the hidden layer neurons are randomly generated and remain fixed throughout the entire training process of ELM.
The ELM algorithm can be summarized as three steps as follow.
• Step 1: Randomly generate parameters for the hidden layer nodes.• Step 2: Calculate the output matrix H of the hidden layer.
• Step 3: Calculate the output weight using β = H † T, † represents the pseudo-inverse of the matrix. .

DPELM learning algorithm
Due to the random determination of input weights in traditional ELM, it has resulted in low classification accuracy and an issue of too many hidden layer nodes.Therefore, this section introduces a new method for determining ELM's weights, referred to as the double pseudo-inverse weights determination ELM (DPELM), aiming to enhance its classification accuracy and achieve a more stable structure.DPELM is similar to the traditional ELM network structure, which consists of input layer, hidden layer and output layer.Upon a more comprehensive analysis of the traditional ELM principle, Equation 1 can be reformulated as follows: where and represent the output weight matrix and the input weight matrix, respectively.Where Derivation process: Assuming the bias B and output weight β are randomly generated within the interval [a1, a2], and the activation function h(•) is strictly monotonous, the ideal should be equal to Since B and β are randomly generated, multiplying both sides of Equation 3 by β † results in: (4) By finding the inverse function of the activation function h(•), we can obtain: The above equation can be rewritten as: (5) Finally, multiplying equation 5 by X † simultaneously results in This concludes the proof.
Once the optimal has been determined, the formula β = T(h( X − B)) † can be employed to compute the value of β. .

V-DPELM model training process
Based on theoretical principles, the specific training process for V-DPELM model is outlined as follows: , where x i , t i , N represent the input vector, target vector, and the total number of samples, respectively.This step introduces essential parameters, including the hidden node output function h(ω, b, x), the count of hidden nodes L, and the number of independent training repetitions K.
• Step 2: Randomly initialize output weights β and hidden layer biases B within the interval [a1, a2].• Step 3: In the case where the training sample is determined, the optimal input weights are computed using the formula Step 4: Subsequently, upon obtaining the optimal input weights , the optimal output weights β are determined as β = T(h( X − B)) † . • Step 5: Repeat steps 2 to 4 for a total of K times to get K independent DPELMs model.Then, perform test tasks on these DPELMs, and the final result is obtained by aggregating the test results using a voting strategy.
The network structure of V-DPELM model is shown in Figure 2.
Algorithm 1 provides a specific introduction to the pseudo code of the V-DPELM method.

Experimental results and analysis
This section randomly selects 12 datasets from the UCI database to assess the classification performance of the improved Extreme Learning Machine algorithm.All experiments in this paper were conducted using Matlab 2016(a) on a regular PC with an Intel(R) Core(TM) i5-12500H CPU running at 3.60GHz and 16GB of memory.

. Experimental description
The present text conducts a series of experiments to evaluate the performance of the algorithm from various perspectives, including the efficacy of its categorization, the precision of its predictions, the requisite count of neurons within its hidden layers, and the stability of its resultant outputs.The datasets utilized in this research were sourced from the UCI (University of California, Irvine) repository, encompassing both binary classification and multi-classification datasets.It is important to note that the training and test data  The final class label of testing sample x test is within each dataset were randomly shuffled for each simulation experiment, ensuring unbiased evaluations.Detailed specifications of these 12 datasets are presented in Table 1. .

Experimental results and analytical discussion
In this subsection, we begin by employing the Iris dataset, the features of which are displayed in Table 1, to ascertain the efficacy of the V-DPELM algorithm.The corresponding outcomes are illustrated through Figures 3-5 and Table 2. Figures 3, 4 depict the graphs of the confusion matrix.Within these figures, the values along the diagonal of the matrix signify the correctly classified samples, whereas those located elsewhere indicate the misclassified samples.
It is evident that V-DPELM exhibits noteworthy proficiency in performing classification tasks, both in testing and training scenarios.Furthermore, as evident from Figure 5, the optimal classification accuracy reaches approximately 99.5% during testing    This finding is corroborated by Table 2. Specifically, when the count of hidden-layer neurons is set to 3, optimal and consistent classification accuracy is achieved.This phenomenon holds true for other cases as well.Regarding Table 2, there is an additional aspect that requires elucidation.In the context of assessing the presented growth methodology, the number of hidden-layer neurons in V-DPELM is tuned either manually, with an increment of 1, or automatically through the growth method.As demonstrated in the table, the proposed growth method effectively identifies the optimal structure for V-DPELM.Consequently, the effectiveness of V-DPELM in pattern classification is preliminarily affirmed.The impact of the number of neurons in the hidden layer on the predictive performance of both the traditional V-ELM and the algorithm proposed in this study is investigated through experimental comparisons.Initially, a subset of samples from each dataset is selected as training and testing data, with the division between them fixed throughout the experiment.The growing method is employed to determine the number of neurons in the hidden layer, where the accuracy is observed after each addition of one neuron.The corresponding algorithm is considered to have the best network structure when the accuracy remains unchanged or the change falls below a predefined threshold.Subsequently, the ELM algorithm and the algorithm proposed in this paper are executed 100 times within the optimized network structure, and the average classification accuracy is computed using the test dataset.In this experiment, the tangent function (tan) is chosen as the activation function, with its inverse function being the arctangent function (arctan).The comparative analysis of classification accuracy for different algorithms and the required number of neurons in the hidden layer to achieve the highest classification accuracy are presented in Table 3.
From Table 3, it can be observed that the algorithm proposed in this paper outperforms the traditional V-ELM algorithm in terms of classification performance, both in binary datasets and multi-classification datasets.The proposed algorithm achieves higher classification accuracy with fewer neurons in the hidden layer, resulting in a simpler network structure.This indicates that the analytical weight initialization method employed in this paper yields superior results compared to the random weight initialization method.Furthermore, to further analyze the impact of algorithm parameters on classification performance and algorithm stability, this study selects one dataset each from binary and multi-class problems for performance comparison.
The SL dataset, a multi-class dataset, and the Diabetes dataset, a binary classification dataset, are selected for this study.The training and testing sets for both datasets are fixed and unchanged throughout the experiments.The number of neurons in the hidden layer is set to increment from 1 to 100.For each additional neuron, the ELM algorithm and the algorithm proposed in this paper are executed 100 times.The experimental results are analyzed in terms of the mean, variance, and range, as depicted in Figures 6, 7.In these figures, the positions indicated by black pentagons and triangles represent the locations where each algorithm achieves the highest classification accuracy.
Observing Figures 6A, 7A, it becomes evident that the increase in the number of neurons in the hidden layer leads to an initial rapid rise in prediction accuracy for both the traditional V-ELM algorithm and the algorithm proposed in this paper.However, after reaching a certain point, the accuracy levels off or slightly declines.By considering the experimental findings and the Theorem presented in Huang et al. (2006), it can be deduced that the algorithm proposed in this study shares similar characteristics with the traditional V-ELM algorithm.Specifically, as the number of neurons in the hidden layer increases, the algorithm's fitting performance improves.Nevertheless, beyond a critical threshold, further augmenting the number of hidden neurons may cause overfitting on the training samples, resulting in a slower or even decreasing classification accuracy on the test samples.
Furthermore, a thorough examination of Figures 6, 7 reveals that, in both the multi-class SL dataset and the binary Diabetes dataset, the proposed algorithm demonstrates a faster rate of average classification accuracy improvement compared to the conventional V-ELM algorithm.Remarkably, achieving this progress requires a smaller number of neurons in the hidden layer.Additionally, the analysis of variance and range reveals that the proposed algorithm exhibits lower values for both metrics compared to the traditional V-ELM algorithm on the SL and Diabetes datasets.This finding suggests that the proposed algorithm possesses superior stability in comparison to the traditional V-ELM algorithm.

Application of V-DPELM in the diagnosis of breast tumors
In order to further validate the accuracy of voting based double pseudo-inverse weights determination extreme learning machine algorithm, this study applies it to the classification and recognition of breast tumor diagnosis.Multiple distinct algorithms are employed to train and recognize the same breast tumor training and testing sets, which are then compared against the performance of the method proposed in this paper.

. Experimental results and analysis
For the purpose of comparing algorithmic performance, three performance metrics were considered: the mean diagnostic rate for benign tumors (referred to as benign diagnosis rate), the mean diagnostic rate for malignant tumors (referred to as malignant diagnosis rate), and the average diagnostic accuracy rate.To ensure robustness of the comparison, independent experiments were conducted 20 times for each algorithm, including the proposed algorithm, V-ELM, Artificial Fish Swarm Algorithm-Extreme Learning Machine (AFSA-ELM), ELM, Learning Vector Quantization (LVQ), and Backpropagation Algorithm (BP).The average values of the benign diagnosis rate, malignant diagnosis rate, and overall accuracy rate were calculated and compared.It should be noted that the experimental results for V-ELM, AFSA-ELM, ELM, LVQ, and BP algorithms were sourced from Zhou and Yuan (2017).The comparative findings are summarized in Table 4.
From the findings presented in Table 4, it is apparent that the average accuracy rate achieved by the proposed algorithm surpasses that of the other algorithms.Although the benign diagnosis rate is slightly lower than that of the V-ELM algorithm, the malignant tumor diagnosis rate is considerably higher.These results highlight the efficacy of the proposed algorithm in rapidly and accurately identifying malignant tumors, thus mitigating the risks associated with delayed treatment and potential impacts on treatment efficacy resulting from misdiagnosis.

Conclusions
In the 12 randomly selected UCI datasets, the algorithm proposed in this paper, voting based double pseudoinverse weights determination extreme learning machine algorithm, exhibits varying degrees of improvement in classification performance compared to the traditional V-ELM algorithm.Among these datasets, the Diabetes dataset shows the greatest increase in classification accuracy, with a significant enhancement of 10.27%.On the other hand, the LD dataset demonstrates the smallest improvement, with a marginal increase of only 0.09% in classification accuracy.
Moreover, the improved algorithm achieves optimal classification accuracy with fewer hidden layer neurons compared to the traditional ELM algorithm, resulting in a simpler network structure.
Additionally, the improved algorithm exhibits reduced variance and range in both the SL and Diabetes dataset experiments, indicating enhanced stability.Furthermore, in the breast tumor classification and recognition experiments, the diagnostic performance of the proposed algorithm surpasses that of V-ELM, AFSA-ELM, ELM, LVQ, and BP methods.This observation highlights the advantage of the proposed algorithm in achieving high classification accuracy in breast tumor auxiliary diagnosis.Thus, the application of this method for breast tumor auxiliary diagnosis is deemed feasible.In addition, it is worth pointing out that processing multi-dimensional data can be a research direction for future work.

FIGURE
FIGUREELM network structure.

FIGURE
FIGURETraining confusion matrix of Iris dataset.

FIGURE
FIGURETest confusion matrix of Iris dataset.

FIGURE
FIGURE SL data set comparison experiment results.(A) Changes in classification accuracy.(B) Changes in range.(C) Changes in variance.

FIGURE
FIGURE Diabetes data set comparison experiment results.(A) Changes in classification accuracy.(B) Changes in range.(C) Changes in variance.
TABLE Specifications of classification datasets.
TABLE Classification performance of V-DPELM with di erent hidden layer neuron numbers in the Iris Dataset.
TABLE Comparisons of classification accuracy and number of hidden layer neurons of di erent algorithms.
TABLE Performance comparison of multiple algorithms.Data in this study were collected from an open data set published by the University of Wisconsin School of Medicine, including 569 cases of breast tumors, 357 benign and 212 malignant.In this paper, 450 groups of tumor data (282 benign cases, 168 malignant cases) were randomly selected as the training set, and the remaining 119 groups of tumor data (75 benign cases, 44 malignant cases) were selected as the test set.Each sample was composed of 30 data, including the mean, standard deviation and maximum value of 10 characteristic values extracted from the breast tumor sample data.