Kohonen neural network and symbiotic-organism search algorithm for intrusion detection of network viruses

Introduction The development of the Internet has made life much more convenient, but forms of network intrusion have become increasingly diversified and the threats to network security are becoming much more serious. Therefore, research into intrusion detection has become very important for network security. Methods In this paper, a clustering algorithm based on the symbiotic-organism search (SOS) algorithm and a Kohonen neural network is proposed. Results The clustering accuracy of the Kohonen neural network is improved by using the SOS algorithm to optimize the weights in the Kohonen neural network. Discussion Our approach was verified with the KDDCUP99 network intrusion data. The experimental results show that SOS-Kohonen can effectively detect intrusion. The detection rate was higher, and the false alarm rate was lower.


Introduction
With the rapid spread of the Internet, there has also been a rapid development of online systems for shopping, banking, making payments, stock trading, and so on. However, due to the openness of the network, forms of network intrusion are becoming increasingly diversified, so that networks and systems are experiencing ever more serious threats. Therefore, detecting network intrusion has become a critical issue in network security. In recent years, scholars all over the world have paid increasing attention to intrusion detection. The aim is to identify any behavior that could compromise the integrity, confidentiality, or availability of the system. It can be defined as identifying the people accessing a computer system (Shitharth and Winston, 2017). Current methods of network intrusion detection can be divided into two categories: misuse intrusion detection and abnormal intrusion detection. The capability of misuse intrusion detection mainly depends on the completeness of the detection knowledge base. Its shortcoming is that it cannot find unknown forms of intrusion. Abnormal intrusion detection is based on identifying a difference between the detected and acceptable behavior. As swarm intelligence algorithms have continued to develop, several of them have been applied to intrusion detection, such as the genetic algorithm (Mabu et al., 2011), immune algorithm, ant colony optimization (Feng et al., 2014), and so on. However, as the "no free lunch" theorem (Wolpert and Macready, 1997) argues, none of these swarm intelligence algorithms is suitable for detecting all forms of intrusion. Thus, finding a better algorithm remains an active research topic.
The remainder of the paper is organized as follows. Section 2 describes the structure of a Kohonen neural network. Section 3 introduces the basic SOS algorithm. Section 4 considers the use of the SOS-Kohonen algorithm for intrusion detection. Section 5 describes the data preprocessing method. The simulation experiments and results are presented in Section 6. Section 7 concludes and discusses future work.

Kohonen neural network
Finnish professor Teuvo Kohonen proposed an unsupervised self-organizing competitive neural network called a Kohonen neural network. It can achieve automatic clustering by using a self-organizing feature mapping to adjust network weights. A Kohonen neural network (De Almeida et al., 2013) consists of two feedforward layers, namely an input layer and an output layer. The input layer is mapped into a two-dimensional response mesh in the output layer based on weights. The topology of a Kohonen neural network is shown in Figure 1.
In a Kohonen neural network, the Euclidean distance between the input feature vector and the weight vector of each output-layer neuron is calculated. The neuron with the smallest Euclidean distance is the winning neuron, and its connection weights are adjusted to bring it closer to the input vector. The connection weights of the neurons adjacent to the winning neuron are also adjusted to bring them closer to the input vector.
In the training phase, each input vector $X_s$ is presented to the network, and only the winning neuron, whose weight vector is closest to the current input, receives a corresponding stimulus. The winning neuron is selected as the one with the minimum Euclidean distance to the pattern vector $X_s$:

$$c = \arg\min_{j} \sqrt{\sum_{i} (x_{si} - w_{ji})^{2}}, \quad j = 1, 2, \ldots, N \times N \quad (1)$$

where c represents the winning neuron and $x_{si}$ represents the ith coordinate of the input vector. In addition, the ith weight of neuron j is denoted by $w_{ji}$. The number of neurons in the Kohonen layer is N × N. Once the winning neuron is selected, the weight $w_{ji}$ of each neuron j in the layer is updated according to the difference between the old weight and the input (Eq. 2), where the learning rate is η, the weight from the previous iteration is $w_{ji}^{old}$, and the number of neurons between neuron j and the winning neuron is represented by the topological distance $d_r$. The size of the adjacent area, $d_{max}$, decreases from the coverage of the entire network to the winning neuron alone as training progresses. In addition, the learning rate η changes during training (Eq. 3), where $n_{tot}$ represents the total number of iterations and $n_{epoch}$ represents the current iteration number.
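The training procedure described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the neighborhood factor `1 - d_r / (d_max + 1)` and the linear decay of η between the hypothetical values `eta_start` and `eta_final` are commonly used forms chosen here for illustration, the ring (Chebyshev) distance on the grid stands in for the topological distance d r, and all function and parameter names are made up for this example.

```python
import math
import random


def winner(weights, x):
    """Index of the neuron whose weight vector is closest to x (Euclidean)."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def grid_distance(a, b, side):
    """Ring distance between neurons a and b on a side x side grid (assumed d_r)."""
    (ra, ca), (rb, cb) = divmod(a, side), divmod(b, side)
    return max(abs(ra - rb), abs(ca - cb))


def train_step(weights, x, side, eta, d_max):
    """Move the winner and its neighbors within d_max rings towards x."""
    c = winner(weights, x)
    for j, w in enumerate(weights):
        d_r = grid_distance(c, j, side)
        if d_r <= d_max:
            # Assumed neighborhood factor: 1 at the winner, smaller further away.
            scale = eta * (1.0 - d_r / (d_max + 1.0))
            weights[j] = [wi + scale * (xi - wi) for wi, xi in zip(w, x)]
    return c


def train(samples, side, n_tot, eta_start=0.5, eta_final=0.01, seed=0):
    """Train a side x side map for n_tot passes over the samples."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(side * side)]
    for n_epoch in range(n_tot):
        frac = 1.0 - n_epoch / n_tot  # goes from 1 towards 0
        eta = (eta_start - eta_final) * frac + eta_final  # assumed linear decay
        d_max = round((side - 1) * frac)  # shrinks from the whole grid to the winner
        for x in samples:
            train_step(weights, x, side, eta, d_max)
    return weights
```

With `d_max = 0` only the winning neuron is updated, which corresponds to the end of training as described in the text.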

SOS algorithm
In nature, some organisms establish symbiotic relationships, which strengthen their ability to adapt to the environment and thereby enhance their viability. The SOS algorithm (Cheng and Prayogo, 2014) simulates the symbiotic relationships found in nature. Each organism in the ecosystem passes through three phases in the SOS algorithm: mutualism, commensalism, and parasitism. In each phase, the organism is assumed to be in a symbiotic relationship with another, randomly chosen organism. The interactions between the pairs of organisms are used to adjust the fitness value. The result is an optimal solution to the problem. The process is described in the following sections.

Mutualist phase
10.3389/fncom.2023.1079483

FIGURE 1
Structure of a Kohonen neural network.

In nature, the symbiosis between bees and flowers is mutualistic, as both organisms benefit. The formulae for updating organisms in mutually beneficial symbiosis are as follows:

$$X_{inew} = X_i + \mathrm{rand}(0, 1) \times (X_{best} - \mathrm{Mutual\_Vector} \times BF_1) \quad (4)$$

$$X_{jnew} = X_j + \mathrm{rand}(0, 1) \times (X_{best} - \mathrm{Mutual\_Vector} \times BF_2) \quad (5)$$

$$\mathrm{Mutual\_Vector} = \frac{X_i + X_j}{2} \quad (6)$$

where $X_i$ and $X_j$ represent two of the organisms in the ecosystem, $X_{best}$ represents the best organism, and Mutual_Vector represents the relationship between the two organisms. The benefit factors are $BF_1$ and $BF_2$, which have a value of 0 or 1. The unequal benefits obtained by the two parties from the symbiotic relationship are controlled by the benefit factors.
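The mutualism update can be sketched as follows. This is an illustrative sketch rather than the authors' code: each organism moves towards the best organism minus the mutual vector (the mean of the pair) scaled by its benefit factor, and a new organism is kept only if its fitness improves. The function name, the `rng` argument, and the convention that lower fitness is better are assumptions made for this example. The benefit-factor values default to 0 or 1 as stated in the text; Cheng and Prayogo (2014) describe benefit factors of 1 or 2, which can be passed through the hypothetical `bf_choices` parameter instead.

```python
import random


def mutualism(pop, fitness, i, rng, bf_choices=(0, 1)):
    """One mutualism step for organism i; lower fitness is assumed to be better."""
    j = rng.choice([k for k in range(len(pop)) if k != i])
    best = min(pop, key=fitness)
    # Mutual vector: the midpoint of the two interacting organisms.
    mutual = [(a + b) / 2.0 for a, b in zip(pop[i], pop[j])]
    bf1, bf2 = rng.choice(bf_choices), rng.choice(bf_choices)
    new_i = [a + rng.random() * (b - m * bf1)
             for a, b, m in zip(pop[i], best, mutual)]
    new_j = [a + rng.random() * (b - m * bf2)
             for a, b, m in zip(pop[j], best, mutual)]
    # Greedy acceptance: keep a candidate only if it improves the fitness.
    if fitness(new_i) < fitness(pop[i]):
        pop[i] = new_i
    if fitness(new_j) < fitness(pop[j]):
        pop[j] = new_j
    return j
```

Because acceptance is greedy, a mutualism step never makes any organism in the population worse.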

Commensal phase
An example of commensalism in nature is the relationship between a remora and a shark. The remora benefits, while the shark is neither harmed nor benefited. This symbiotic relationship is called commensalism. The formula for the commensal phase is

$$X_{inew} = X_i + \mathrm{rand}(-1, 1) \times (X_{best} - X_j) \quad (7)$$

where $X_i$ is the party that makes a unilateral gain and $X_j$ is the party that is not harmed.
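A short sketch of the commensal update, under the same assumptions as before (hypothetical function name and `rng` argument, lower fitness is better): organism i moves by a random multiple in [-1, 1] of the difference between the best organism and its partner j, and only organism i can change.

```python
import random


def commensalism(pop, fitness, i, rng):
    """One commensal step: only organism i may change; partner j is untouched."""
    j = rng.choice([k for k in range(len(pop)) if k != i])
    best = min(pop, key=fitness)
    new_i = [a + rng.uniform(-1.0, 1.0) * (b - c)
             for a, b, c in zip(pop[i], best, pop[j])]
    # Greedy acceptance, as in the mutualist phase.
    if fitness(new_i) < fitness(pop[i]):
        pop[i] = new_i
    return j
```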

Parasitic phase
In nature, parasitism occurs between mosquitoes and humans. The mosquitoes benefit, whereas the humans are harmed. In this phase, some of the dimensions of $X_i$ are randomly selected and replaced by random values within the search space to form the artificial parasite Parasite_Vector. An organism $X_j$ (j ≠ i) is then selected randomly from the population, and its fitness is compared with that of Parasite_Vector. The fitter of the two is kept as the new $X_j$.
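The parasitic phase can be sketched in the same style. The names, the `rng` argument, the scalar bounds `lower` and `upper` for the search space, and the lower-is-better fitness convention are assumptions for illustration; how many dimensions are replaced is also chosen at random here, which the text does not specify.

```python
import random


def parasitism(pop, fitness, i, rng, lower, upper):
    """One parasitic step: a mutated copy of organism i may replace a random host j."""
    j = rng.choice([k for k in range(len(pop)) if k != i])
    parasite = list(pop[i])  # copy, so organism i itself is not modified
    n_dims = rng.randint(1, len(parasite))  # assumed: at least one dimension changes
    for d in rng.sample(range(len(parasite)), n_dims):
        parasite[d] = rng.uniform(lower, upper)
    # The host survives only if it is at least as fit as the parasite.
    if fitness(parasite) < fitness(pop[j]):
        pop[j] = parasite
    return j
```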

Proposed SOS-Kohonen algorithm for intrusion detection
The SOS algorithm is based on the natural phenomenon of symbiosis between various organisms. When a virus intrudes into a system, the relation between the system and the virus can be viewed as a symbiotic relationship between the virus data and the system data.
The initial weights of the Kohonen neural network are optimized with the SOS algorithm. The optimized Kohonen neural network can reduce the length of the error vector between the training sample and the weight vector. This process helps to avoid rigidity during training, which can improve the clustering ability of the Kohonen neural network. After SOS training, the Kohonen neural network identifies subclasses with similar input patterns. Each subclass is used to train a specific radial basis network, which results in a local adjustment of the weights of the radial basis network. This can reduce the training burden of the radial basis network and improve the classification of the sample. For a Kohonen neural network trained by a sample data set, only one neuron in the competing layer is activated. A radial basis network corresponding to the activated winning neurons is used as input. Currently, the only neuron in the output layer is the transient stability index.
The steps in SOS-Kohonen intrusion detection are as follows:
Step 1. Initialize the training data set, the number of symbiotic species, and the number of iterations.
Step 2. The initial weights w are adjusted according to the Euclidean distance between the sample vector and the initial weight in the mutualist phase, commensal phase, and parasitic phase of the SOS algorithm.
Step 3. The Kohonen neural network is trained according to the initial weight w optimized by the SOS algorithm.
Step 4. The distance between the competing layer neuron j and the input vector X is calculated:

$$d_j = \lVert X - W_j \rVert = \sqrt{\sum_{i} (x_i - w_{ij})^{2}} \quad (8)$$

Step 5. If the minimum distance has been reached, the competing layer neuron c, which best matches the sample vector X, is the optimal matching output neuron.
Step 6. Adjust the weight coefficients of node c and of the nodes in its neighborhood $N_c$:

$$N_c = \{\, t \mid \mathrm{norm}(pos_t - pos_c) < r \,\}, \qquad W_t = W_t + \eta \, (X - W_t), \quad t \in N_c \quad (9)$$

The positions of neurons c and t are denoted by $pos_c$ and $pos_t$, respectively. The distance between the two neurons is calculated in terms of norm(). η and r represent the learning rate and the neighborhood radius, respectively. They decrease linearly as the number of iterations increases.
Step 7. If the stopping condition is met, stop; otherwise, return to Step 3.
Step 8. Read another test data set.
Step 9. Cluster the input test data set according to the trained weight W.
Step 10. Output the classification result.
Pseudocode corresponding to the steps of SOS-Kohonen intrusion detection is given in Algorithm 1.
Initialize: Populate n organisms in the ecosystem with random values
Input: Training data set
Calculate the initial weights w by summing the points and output nodes of the Kohonen neural network
Calculate the fitness of each organism
Identify the best organism (X best) in the initial population
Define a stopping criterion (either a fixed number of generations/iterations or accuracy)
while (t < MaxGeneration)
    for i = 1 to n
        Mutualist phase
            Choose organism j randomly other than organism i
            Determine the beneficial factor and mutual vector via Eq. (6)
            Modify organisms X i and X j based on their mutual relationship via Eqs. (4) and (5)
            Calculate new weights
            Evaluate the fitness of the new solution
            Accept the new solution if the fitness is better
        End of mutualist phase
        Commensal phase
            Choose organism j randomly other than organism i
            Modify organism X i with the assistance of organism X j via Eq. (7)
            Calculate

The detection processes of BA-Kohonen, CS-Kohonen, FPA-Kohonen, GWO-Kohonen, and PSO-Kohonen are analogous to that of SOS-Kohonen.
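To connect the SOS phases to the Kohonen weights, each organism can be treated as a flattened weight matrix and scored on the training samples. The text does not spell out the fitness function, so the sketch below assumes one plausible choice: the mean Euclidean distance between each sample and its closest weight vector (lower is better). All names are hypothetical, and the clustering helper corresponds to assigning each test sample to its nearest neuron, as in Step 9.

```python
import math


def unflatten(organism, n_neurons, dim):
    """Split a flat organism into n_neurons weight vectors of length dim."""
    return [organism[k * dim:(k + 1) * dim] for k in range(n_neurons)]


def quantization_error(weights, samples):
    """Mean distance from each sample to its closest weight vector."""
    return sum(min(math.dist(w, x) for w in weights) for x in samples) / len(samples)


def make_fitness(samples, n_neurons):
    """Build a fitness function over flat organisms (assumed: lower is better)."""
    dim = len(samples[0])

    def fitness(organism):
        return quantization_error(unflatten(organism, n_neurons, dim), samples)

    return fitness


def assign_clusters(weights, samples):
    """Cluster samples by the index of their nearest neuron."""
    return [min(range(len(weights)), key=lambda j: math.dist(weights[j], x))
            for x in samples]
```

A fitness function built this way can be passed to any optimizer that works on flat real-valued vectors, which is what makes it straightforward to swap SOS for the other swarm algorithms compared later.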

Data preprocessing
In intrusion detection, the network data to be assessed have multiple attributes with inconsistent units of measurement. If such data are used directly for intrusion detection, the accuracy and speed will be reduced. Therefore, the input data are preprocessed, that is, normalized. The specific preprocessing method is as follows:

(1) The data are standardized so that the mean of each attribute is 0 and the variance is 1. The attributes of the initial network data are denoted by $x_{ij}$, and $\bar{x}_j$ and $S_j$ represent the mean and standard deviation of the jth dimension, respectively. The attributes are standardized as follows:

$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{S_j} \quad (10)$$

(2) The result of formula (10) is normalized to the range [0, 1] as follows:

$$x''_{ij} = \frac{x'_{ij} - \min_{i} x'_{ij}}{\max_{i} x'_{ij} - \min_{i} x'_{ij}}, \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, m \quad (11)$$

Frontiers in Computational Neuroscience frontiersin.org
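The two preprocessing steps, standardization of each attribute followed by rescaling to [0, 1], can be sketched with the Python standard library. This is a sketch under assumptions: the population standard deviation is used (the text does not say whether the sample or population form is meant), constant attributes are mapped to 0 to avoid division by zero, and the function names are hypothetical.

```python
import statistics


def standardize(rows):
    """Give every column zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]


def min_max(rows):
    """Rescale every column linearly to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lows, highs)]
            for row in rows]


# Hypothetical data: three samples with two attributes on very different scales.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = min_max(standardize(rows))
```

After both steps the two attributes in the hypothetical data are on the same scale, even though one was originally ten times larger than the other.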

Simulation experiments and analysis of results
To verify the effectiveness of SOS-Kohonen in detecting network intrusion by a virus, we ran two sets of tests with the proposed algorithm. The first verified the accuracy of SOS-Kohonen in classifying five types of virus. The second verified the ability of SOS-Kohonen to detect viruses hidden in normal data. The results for SOS-Kohonen were compared with results for Kohonen neural networks combined with one of five commonly used swarm intelligence algorithms: BA, CS, FPA, GWO, and PSO. The relevant parameters for these algorithms were set as follows: BA: As in Ref. (Xinshe, 2010), r 0 = 0.5, A = 0.5, α = 0.95, γ = 0.05. CS: As in Ref. (Yang and Deb, 2009), β = 1.5, ρ 0 = 1.5.

Experimental setup
The development environment for this test was MATLAB R2012a. The tests were run on an AMD Athlon(tm) II X4 640 processor with 4 GB of memory.

Simulation of virus classification by SOS-Kohonen
In this section, we tested the accuracy of SOS-Kohonen in virus classification. The standard network intrusion test data set contains five categories of virus data. We extracted 4000 training samples, as shown in Table 1 and Figure 2A. Each sample contained a 38-dimensional feature vector that represents the different attributes of the network intrusion data. Attack type 5 had the fewest training samples and type 2 had the most. Four subsets were randomly selected from the virus intrusion detection data set as test cases. The percentages for the five attack types in the four cases varied (Figure 2B). The number of samples in each case was different, as shown in Table 3.
For each case, we conducted 10 independent tests using each of the six swarm intelligence algorithms to determine the weights for the Kohonen neural network. The SOS-Kohonen algorithm performed better than BA-Kohonen, CS-Kohonen, FPA-Kohonen, GWO-Kohonen, and PSO-Kohonen, both in terms of the optimal value and the variance of the accuracy. It was also more robust.
Figures 3C, G, K, O show the expected classification results for cases 1 to 4. Figures 3D, H, L, P show the actual results for these cases for SOS-Kohonen. Due to space constraints, we show the results only for the highest detection rate from the 10 independent runs. The detection rate is defined in Section "6.3. Simulation of virus detection by SOS-Kohonen." The red circles indicate differences between the actual detection and the expected detection. Figures 3D, L, P have only one error, whereas Figure 3H has five. Table 3 lists the detection rates for the six algorithms for the four cases. SOS-Kohonen had higher detection rates than the BA-Kohonen, CS-Kohonen, FPA-Kohonen, GWO-Kohonen, or PSO-Kohonen algorithms. It achieved an average detection rate of 99.4% in classifying intrusion data.

FIGURE 5
Specific proportions of the test data.
Figures 3E, I, M, R illustrate the convergence of the six algorithms. It can be seen that SOS converged fastest and with the highest accuracy. Figures 3F, J, N, S are variance maps for each algorithm. The SOS algorithm had the strongest stability and highest robustness compared to the other algorithms.

Simulation of virus detection by SOS-Kohonen
This section uses the internationally accepted KDDCUP99 data set (KDD Cup 1999 Data, 1999; MIT Lincoln Laboratory, 2009; Aggarwal and Sharma, 2015) to verify the detection performance of SOS-Kohonen. The KDDCUP99 data set was established by the Lincoln Laboratory of the Massachusetts Institute of Technology. The data set was collected using tcpdump from a simulated network environment over 9 weeks. This database has become a benchmark for network intrusion detection and can be used in comprehensive tests of the performance of intrusion detection algorithms. Attacks in the data set include denial-of-service attacks (DOS), scan attacks (probe), unauthorized access by remote users (R2L), and unauthorized use of local super-user privileges (U2R). We apply the internationally accepted detection rate and false alarm rate as evaluation indicators, which are defined as follows (Ganapathy et al., 2012; Lin et al., 2015):

$$\text{Detection rate} = \frac{\text{number of detected intrusion samples}}{\text{total number of intrusion samples}} \times 100\%$$

$$\text{False alarm rate} = \frac{\text{number of normal samples misclassified as intrusions}}{\text{total number of normal samples}} \times 100\%$$

We randomly selected 6,000 samples as training data, including normal data and the four kinds of intrusion data. The percentages of these five types of data are given in Figure 4. Among them, normal samples were the most common and U2R samples the least common. We then randomly selected four subsets from the KDDCUP99 data set as test cases. The numbers of each attack type for each case are plotted in Figure 5 and listed in Table 4.
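The two evaluation indicators can be computed from true and predicted labels as sketched below. The sketch assumes the common definitions: the detection rate is the share of intrusion samples that are flagged as any kind of intrusion, and the false alarm rate is the share of normal samples that are flagged as intrusions. The function name, the `normal_label` parameter, and the label strings are hypothetical.

```python
def detection_metrics(y_true, y_pred, normal_label="normal"):
    """Return (detection rate, false alarm rate) in percent."""
    pairs = list(zip(y_true, y_pred))
    attacks = [p for t, p in pairs if t != normal_label]
    normals = [p for t, p in pairs if t == normal_label]
    # An intrusion counts as detected when it is not labelled as normal.
    detected = sum(1 for p in attacks if p != normal_label)
    # A false alarm is a normal sample labelled as any intrusion type.
    false_alarms = sum(1 for p in normals if p != normal_label)
    dr = 100.0 * detected / len(attacks) if attacks else 0.0
    far = 100.0 * false_alarms / len(normals) if normals else 0.0
    return dr, far


# Hypothetical labels for six samples.
y_true = ["normal", "normal", "dos", "probe", "dos", "normal"]
y_pred = ["normal", "dos", "dos", "normal", "probe", "normal"]
dr, far = detection_metrics(y_true, y_pred)
```

Note that under this assumed definition a DOS sample predicted as probe still counts as detected; a stricter per-class accuracy would need a separate measure.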
In this paper, 10 independent experiments were carried out for each algorithm for the four cases. As can be seen from Table 5, for cases 1, 2, and 3, the SOS-Kohonen algorithm has higher search accuracy than the other algorithms. In case 4, although the optimal value for SOS is slightly worse than that for GWO, the average of the 10 runs was still better than that of the other five algorithms. This shows that the SOS-Kohonen algorithm has a strong search ability and robustness as a whole. Figures 6C1, G1, K1, O1 show the expected test results for cases 1 to 4, and Figures 6D1, H1, L1, P1 show the actual test results for SOS-Kohonen. Due to space constraints, we show the results only for the highest detection rate from the 10 independent runs. The normal data are represented as blue dots, and the other colors represent the four types of intrusion data. The red circles indicate differences between the actual detection and the expected detection. Table 6 shows the average detection rates and average false alarm rates for the six algorithms for the four cases. It can be seen that SOS-Kohonen had a higher detection rate and lower false alarm rate than BA-Kohonen, CS-Kohonen, FPA-Kohonen, GWO-Kohonen, or PSO-Kohonen. The average detection rate of SOS-Kohonen was 94.51%. Figures 6E1, I1, M1, R1 show the convergence of the six algorithms. The convergence speed and accuracy of SOS-Kohonen were better than those of the other algorithms. Figures 6F1, J1, N1, S1 show the variance of each algorithm. For each case, the SOS-Kohonen algorithm ranked second in terms of variance but had the highest search accuracy. Overall, the SOS-Kohonen algorithm performed better than the other algorithms.

p-values from the Wilcoxon rank-sum test
FIGURE 6 (Continued)
Cases 1-4: expected classification results are (C1, G1, K1, O1); actual classification results are (D1, H1, L1, P1); fitness evolution curves are (E1, I1, M1, R1); ANOVA tests of optimal paths are (F1, J1, N1, S1), respectively.

Next, we ran the Wilcoxon rank-sum test (Derrac et al., 2011; Gibbons and Chakraborti, 2011; Hollander et al., 2013) for SOS-Kohonen and the other five algorithms. We chose 0.05 as the level of significance. Table 7 shows the p-values for the four classification cases described in Section "6.2. Simulation of virus classification by SOS-Kohonen." Table 8 shows the p-values for the four detection cases described in Section "6.3. Simulation of virus detection by SOS-Kohonen." In Table 7, for case 3, the p-value for SOS vs CS is greater than 0.05. For case 4, the p-values for SOS vs GWO and SOS vs PSO are greater than 0.05. All other p-values are less than 0.05. In Table 8, the only p-values greater than 0.05 are for SOS vs GWO for cases 1, 3, and 4. Thus, for most of the eight test cases, the differences between SOS and the other algorithms were statistically significant and not due to chance.
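For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. It is not the authors' code: tied values receive average ranks, no tie or continuity correction is applied to the variance, and the accuracy values in the usage lines are hypothetical placeholders rather than results from the paper.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p)."""
    pooled = sorted((v, g) for g, group in enumerate((a, b)) for v in group)
    ranks = [0.0] * len(pooled)
    k = 0
    while k < len(pooled):
        m = k
        while m + 1 < len(pooled) and pooled[m + 1][0] == pooled[k][0]:
            m += 1
        avg = (k + m) / 2.0 + 1.0  # average 1-based rank of the tie group
        for t in range(k, m + 1):
            ranks[t] = avg
        k = m + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of sample a
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p


# Hypothetical accuracies from 10 runs of two algorithms (not data from the paper).
runs_a = [0.99, 0.98, 0.99, 0.97, 0.99, 0.98, 0.99, 0.99, 0.98, 0.99]
runs_b = [0.95, 0.96, 0.94, 0.96, 0.95, 0.93, 0.96, 0.95, 0.94, 0.95]
z, p = rank_sum_test(runs_a, runs_b)
significant = p < 0.05
```

With only 10 runs per algorithm the normal approximation is rough; an exact test or a statistics library would be preferable for reporting.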

Analysis of results
In this paper, six common swarm intelligence algorithms were combined with a Kohonen neural network and used to simulate the intrusion detection of network viruses. We ran two sets of tests. In Section "6.2. Simulation of virus classification by SOS-Kohonen," we tested the classification accuracy of the algorithms. Figures 3C-S show the classification results, convergence, and variance maps for the four test cases. Table 2 lists the classification accuracy of the six swarm intelligence algorithms.
In Section "6.3. Simulation of virus detection by SOS-Kohonen," we assessed the detection rate and false alarm rate for normal data and four attack types. Figures 6C1-S1 show the classification results, convergence, and variance maps for the four test cases. Table 6 compares the average detection rates and false alarm rates for the six algorithms.
In Section "6.4. p-values from the Wilcoxon rank-sum test," we ran the Wilcoxon rank-sum test. For most of the eight test cases, the differences between SOS and the other algorithms were statistically significant and not due to chance. Thus, the SOS-Kohonen algorithm is more effective than the other five swarm intelligence algorithms in detecting network intrusion by a virus.

Conclusions and future work
With the continuous development and popularization of the Internet, there is much more convenient access to network resources. However, this has led to a continuous increase in security problems due to virus intrusion. In this paper, we combined a swarm intelligence algorithm with a neural network to detect network intrusion by a virus. Our approach is described in detail, and it was tested with the international KDDCUP99 intrusion data set, which verified its effectiveness. Moreover, this is also a new method for detecting network intrusion by a virus. With the rapid development of cloud computing and big data, our future work will consider the application of SOS-Kohonen to heterogeneous distributed systems.

Data availability statement
The original contributions presented in this study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.