Multi-person feature fusion transfer learning-based convolutional neural network for SSVEP-based collaborative BCI

Objective The conventional single-person brain–computer interface (BCI) systems have some intrinsic deficiencies such as low signal-to-noise ratio, distinct individual differences, and volatile experimental effect. To solve these problems, a centralized steady-state visually evoked potential collaborative BCI system (SSVEP-cBCI), which characterizes multi-person electroencephalography (EEG) feature fusion was constructed in this paper. Furthermore, three different feature fusion methods compatible with this new system were developed and applied to EEG classification, and a comparative analysis of their classification accuracy was performed with transfer learning-based convolutional neural network (TL-CNN) approach. Approach An EEG-based SSVEP-cBCI system was set up to merge different individuals’ EEG features stimulated by the instructions for the same task, and three feature fusion methods were adopted, namely parallel connection, serial connection, and multi-person averaging. The fused features were then input into CNN for classification. Additionally, transfer learning (TL) was applied first to a Tsinghua University (THU) benchmark dataset, and then to a collected dataset, so as to meet the CNN training requirement with a much smaller size of collected dataset and increase the classification accuracy. Ten subjects were recruited for data collection, and both datasets were used to gauge the three fusion algorithms’ performance. Main results The results predicted by TL-CNN approach in single-person mode and in multi-person mode with the three feature fusion methods were compared. The experimental results show that each multi-person mode is superior to single-person mode. Within the 3 s time window, the classification accuracy of the single-person CNN is only 90.6%, while the same measure of the two-person parallel connection fusion method can reach 96.6%, achieving better classification effect. 
Significance The results show that the three multi-person feature fusion methods and the deep learning classification algorithm based on TL-CNN can effectively improve the SSVEP-cBCI classification performance. Of the three, multi-person parallel feature connection achieves the best classification results. Different feature fusion methods can be selected in different application scenarios to further optimize cBCI.


Introduction
Brain-computer interface (BCI) is a human-computer interaction technology that allows people to communicate directly with a computer or control peripheral devices without using their muscles (Vaid et al., 2015). This technology is useful for patients with movement disorders and partial brain injuries, as it helps them perform simple operations and communicate (Wolpaw et al., 2000). At present, electroencephalography (EEG)-BCI systems mainly rely on event-related potentials evoked by endogenous events based on cognitive function (Li et al., 2019), visually evoked potentials (VEP) based on visual stimulation (Mary Judith and Baghavathi Priya, 2021), and event-related desynchronization and synchronization induced by active motor imagery (Munzert et al., 2009). Steady-state visually evoked potential (SSVEP) is one of the most popular EEG patterns in the field of BCI. Owing to its advantages such as high information transmission rate (ITR), low requirement on user training, and easy evocation, SSVEP is widely applied in fields such as medical care, industry, communication, smart homes, gaming, robotics, and vehicle control (Zhao et al., 2016; Angrisani et al., 2018; Dehzangi and Farooq, 2018; Farmaki et al., 2019; Nayak et al., 2019; Chai et al., 2020; Shao et al., 2020).
The performance of a single-person BCI system is subject to individual differences between users and to their physical or mental condition, and this weakness becomes more prominent as BCI systems develop further (Song et al., 2022). In contrast, multi-person collaborative BCI can better serve future socialized human-computer interaction and is expected to become a major direction in both research and application. Studies have shown that increasing the number of users can substantially improve BCI performance (Valeriani et al., 2016). In human behavior research, teams typically outperform individuals. The difference in performance between teams and individuals is even greater when diversified skills, judgments, and experiences are required under time constraints (Katzenbach and Smith, 2015). As single-person EEG signals show significant individual differences, collecting multi-person EEG signals and fusing them in a reasonable way can yield signals with more distinctive features and improve BCI performance. EEG signals from multiple subjects can significantly improve the ITR of the system compared with single-subject EEG signals (Bianchi et al., 2019). In SSVEP-BCI, subjects who need to stare at the stimulation area for a long time are prone to visual fatigue, which degrades the quality of the acquired EEG signals, and this is particularly evident for some subjects (Peng et al., 2019). SSVEP-cBCI can make up for this deficiency by adding the user dimension and can thereby improve the information transmission rate. Building on this idea, this paper explores three feature fusion methods: (1) parallel connection of features, (2) serial concatenation of features, and (3) feature averaging. These approaches are explained in detail in section "Methods." The three feature fusion methods aim to improve the signal-to-noise ratio by merging multi-person EEG information into refined new features that enhance BCI performance.
As a branch of machine learning, deep learning has achieved great success in solving problems in computer vision and natural language processing. It differs from traditional machine learning in that it does not entail manual feature extraction (LeCun et al., 2015). Using gradient descent learning to optimize convolutional neural network (CNN) parameters successfully solved the problem of handwritten digit classification (LeCun et al., 1998). However, owing to the complexity of EEG signals, the application of deep learning neural networks to EEG signal detection is still in the exploratory stage. Cecotti and Graser (2010) developed a four-layer CNN for P300 detection. At present, SSVEP classification methods typically convert the original EEG signal through FFT and then input it into a CNN for classification (Cecotti, 2011; Zhang et al., 2019; Ravi et al., 2020). EEGNet, a CNN model designed specifically for EEG, exhibits good classification performance, although other models outperform it in some cases. In this study, some details of the basic EEGNet were adjusted, and the network structure was modified to adapt to the newly created fusion features. A transfer learning (TL) training strategy was adopted in which a THU benchmark dataset served as the source task training set to initially train the parameters of the convolutional layers and build the basic feature extractor. Using the data collected by the laboratory as the target task training set and test set, the CNN parameters were further optimized to construct SSVEP-cBCI. In this paper, the classification model is trained with the TL-CNN method, which reduces the amount of collected training data required and improves the classification accuracy. The feature fusion approach further improves BCI performance in terms of classification accuracy, ITR, and stability.
Section "Methods" elaborates on the personnel, equipment, and experimental paradigms associated with the experiments, the three multi-person feature fusion methods, the specific structure of the modified CNN in this study, and its difference from EEGNet. The following part then introduces the specific TL training method. In section "Results," the classification accuracy and ITR of the three feature fusion methods are compared with those of a single-person CNN. Finally, some significant conclusions are drawn, and the specific usage of the three feature fusion methods in this experiment is analyzed.

Experimental setup
The structure of cBCI system
The cBCI system mainly has two structural forms: distributed and centralized (Wang and Jung, 2011). In both systems, experiments are conducted simultaneously on more than one subject. In the distributed cBCI, each subject's EEG information is collected individually and passed through the corresponding BCI subsystem for data preprocessing, feature extraction, and pattern recognition. The results for each subject are then transmitted to the integrated classifier, and the final decision is produced through the voting mechanism of the decision-making layer. In the centralized cBCI, as shown in Figure 1, each subject's EEG information is also collected individually for data preprocessing and feature extraction, but the EEG features of all subjects are then fused for pattern recognition to make the final decision for the group. The model adopted in this study is a centralized cBCI system, which does not rely on the voting mechanism of the distributed system, and classification is carried out with a CNN based on TL (TL-CNN).
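As a rough illustration of the two architectures, the sketch below contrasts decision-level voting (distributed) with feature-level fusion (centralized). The classifier callables, the toy feature vectors, and all function names are hypothetical placeholders chosen for illustration; they are not the classifiers used in this study.

```python
from collections import Counter
from typing import Callable, List, Sequence

Features = List[float]


def distributed_decision(
    features_per_subject: Sequence[Features],
    classifiers: Sequence[Callable[[Features], int]],
) -> int:
    """Each subject's subsystem classifies on its own; the group label is the majority vote."""
    votes = [clf(f) for clf, f in zip(classifiers, features_per_subject)]
    return Counter(votes).most_common(1)[0][0]


def centralized_decision(
    features_per_subject: Sequence[Features],
    fuse: Callable[[Sequence[Features]], Features],
    group_classifier: Callable[[Features], int],
) -> int:
    """Features of all subjects are fused first; a single classifier decides for the group."""
    return group_classifier(fuse(features_per_subject))


def argmax(f: Features) -> int:
    # Toy classifier (assumption): the label is the index of the largest feature.
    return max(range(len(f)), key=f.__getitem__)


def average(fs: Sequence[Features]) -> Features:
    # Toy fusion rule (assumption): element-wise mean across subjects.
    return [sum(col) / len(col) for col in zip(*fs)]


subjects = [[0.1, 0.9, 0.2, 0.1], [0.6, 0.5, 0.1, 0.1], [0.2, 0.7, 0.3, 0.1]]
vote_label = distributed_decision(subjects, [argmax] * 3)
fused_label = centralized_decision(subjects, average, argmax)
```

In the centralized variant no per-subject decision is ever produced, so a subject with weak features contributes evidence rather than a possibly wrong vote.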

Experimental paradigm
In this experiment, the EEG data were collected and transferred from the EEG amplifier to the software Curry8 (Neuroscan). Three electrodes were placed at O1, Oz, and O2 according to the International 10-20 system. The bilateral mastoids were used as reference and ground electrodes, and the impedance of all electrodes was reduced to below 5 kΩ. The sampling frequency was 256 Hz, and a band-pass filter between 5 and 40 Hz was used in data processing to filter out low-frequency noise and 50 Hz power-line noise.
Ten healthy subjects (8 males, 2 females, 21-27 years old) participated in the experiments. All participants had normal or corrected-to-normal vision. Four of them had participated in SSVEP experiments previously. All participants read and signed the informed consent forms. Subjects sat on a comfortable chair 60 cm in front of a standard 24-inch monitor (60 Hz refresh rate, 1,920 × 1,080 screen resolution). The SSVEP stimulation interface is shown in Figure 2, and the four stimulation squares are all 50 × 50 pixels. The refresh frequency of the display is an integer multiple of the stimulation frequency of each of the four color blocks, which ensures a stable stimulation frequency and avoids frequency deviation. The stimulation frequencies of the four color blocks are 8.6, 10, 12, and 15 Hz, respectively. It has been shown that stimulation frequencies of 10 and 12 Hz can stably induce high-amplitude SSVEP signals (Chen et al., 2015). The stimulation duration was set to 4 s. To avoid interference caused by simultaneous flickering of the four color blocks, the phases of the four color blocks were set to 1.35π, 0.35π, 0.9π, and 0.35π, respectively. Prolonged staring at the flickering color blocks made the subjects feel tired and distracted, resulting in a frequency deviation of the SSVEP signal. To improve the concentration of the subjects and the quality of the SSVEP EEG signals, random labels were used to remind the subjects to look at the corresponding stimulus squares. The four color blocks started to flash 0.02 s after the five-pointed star appeared. After the flashing, a rest time of 2 s was given to the subject to adjust the viewing angle. During the experiment, the subjects were asked to focus on the corresponding color block and blink as little as possible. Each color block flashed twice in total, and there was a 1-min rest between two consecutive experiments.
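The relation between the 60 Hz refresh rate and the four stimulation frequencies can be checked with a few lines of arithmetic. The sketch below computes the number of display frames per flicker cycle. Note that 60/8.6 is not exactly an integer; the reading that the 8.6 Hz target is realized as 60/7 ≈ 8.57 Hz and reported rounded is our assumption and is not stated in the text.

```python
REFRESH_HZ = 60.0                    # monitor refresh rate (from the text)
STIM_HZ = [8.6, 10.0, 12.0, 15.0]    # stimulation frequencies (from the text)


def frames_per_cycle(stim_hz: float, refresh_hz: float = REFRESH_HZ) -> float:
    """Number of display frames that make up one flicker cycle."""
    return refresh_hz / stim_hz


for f in STIM_HZ:
    exact = frames_per_cycle(f)
    nearest = round(exact)                 # whole frames actually available
    realized_hz = REFRESH_HZ / nearest     # frequency obtained with whole frames
    print(f"{f:5.1f} Hz: {exact:.3f} frames/cycle, "
          f"{nearest} whole frames give {realized_hz:.2f} Hz")

# Frames shown during one 4 s stimulation period.
frames_per_trial = int(4 * REFRESH_HZ)
```

For 10, 12, and 15 Hz the cycle length is a whole number of frames (6, 5, and 4), which is what keeps the flicker frequency stable on a fixed-refresh display.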

Multi-Person feature extraction
The EEG data filtered by a fourth-order Butterworth filter are converted from the time domain to the frequency domain by FFT (Chen et al., 2015). Low-frequency stimulation (8.6, 10, 12, and 15 Hz) was used in these experiments. The features in the frequency band from 6 to 32 Hz were selected from the FFT-transformed data to further filter out noise and improve feature quality.
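A minimal sketch of this feature extraction step is given below, assuming NumPy and a trial stored as a (channels, samples) array. The Butterworth band-pass stage is omitted here, the amplitude scaling is our choice, and the function name and the synthetic 10 Hz test signal are illustrative only.

```python
import numpy as np

FS = 256  # sampling frequency in Hz (from the text)


def fft_band_features(eeg, fs=FS, f_lo=6.0, f_hi=32.0):
    """Return (freqs, magnitudes) of the FFT restricted to [f_lo, f_hi] Hz.

    eeg: array of shape (channels, samples), assumed already band-pass filtered.
    """
    eeg = np.asarray(eeg, dtype=float)
    n = eeg.shape[-1]
    spectrum = np.abs(np.fft.rfft(eeg, axis=-1)) / n   # single-sided magnitude
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)             # bin centres in Hz
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[band], spectrum[:, band]


# Synthetic three-channel, 3 s trial dominated by a 10 Hz component.
t = np.arange(3 * FS) / FS
rng = np.random.default_rng(0)
trial = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal((3, t.size))

freqs, feats = fft_band_features(trial)
peak_hz = freqs[np.argmax(feats[0])]  # strongest in-band frequency on channel 0
```

With a 3 s window the frequency resolution is 1/3 Hz, so shorter time windows yield fewer bins in the 6 to 32 Hz band and a coarser input to the CNN.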
The characteristics of the SSVEP signal are as follows: The input of the convolutional neural network is: The min-max normalization (discrete normalization) is conducted on the data of each channel (Ali et al., 2014) to avoid adverse effects on the classification accuracy owing to huge differences between values, ensure good performance of different data within the same neural network, and improve the robustness of the algorithm.
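The per-channel min-max normalization maps each channel's features to the range [0, 1] via x' = (x - min) / (max - min). A minimal NumPy sketch follows; the function name and the small `eps` guard against constant channels are our additions.

```python
import numpy as np


def min_max_normalize(features, eps=1e-12):
    """Scale each channel (row) of a (channels, bins) array to the range [0, 1]."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=-1, keepdims=True)
    hi = features.max(axis=-1, keepdims=True)
    # eps avoids division by zero for a channel whose values are all equal.
    return (features - lo) / np.maximum(hi - lo, eps)


example = np.array([[1.0, 2.0, 3.0],
                    [10.0, 30.0, 20.0]])
normalized = min_max_normalize(example)
```

Because each channel is scaled independently, a subject or electrode with large raw amplitudes does not dominate the fused input.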

Feature fusion
This paper proposes three methods to fuse multi-person EEG features. As shown in Figure 3, the effective features are obtained after data preprocessing and FFT transformation, and the features of different subjects are then fused by parallel connection, serial concatenation, or averaging.
In parallel feature concatenation, the connection is made mainly in the spatial domain, which implies more feature lead channels. In serial feature connection, the connection is made mainly in the frequency domain, which implies that the number of channels does not change but the frequency-domain extent expands greatly; the effective features are thus enhanced from the frequency perspective, and the BCI performance improves. However, serial concatenation requires more training epochs and more convolution kernels to achieve a classification accuracy similar to that of parallel connection. It involves a more complex algorithm, so it is more difficult to set up an online system with it in Python. The above two feature fusion methods are suitable when the number of participants in the experiment is known. When the number of participants in the group is unknown, different CNNs for the various possible numbers of subjects must be set up and trained, which entails more effort in experimental preparation. This problem can be solved by adopting the third approach, feature averaging, in which new features are obtained by averaging the normalized EEG frequency features of all subjects. The CNN using this approach shares the structure of the single-person CNN, and its classification accuracy is superior to that of a single-person CNN but inferior to that of a two-person CNN.
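The three fusion operations can be summarized as array manipulations on per-subject feature matrices of shape (channels, frequency bins). The sketch below is a minimal NumPy illustration; the function names and the number of frequency bins (79) are assumptions made for the example.

```python
import numpy as np


def fuse_parallel(features):
    """Parallel connection: stack subjects along the channel axis -> (P*C, F)."""
    return np.concatenate(features, axis=0)


def fuse_serial(features):
    """Serial concatenation: join subjects along the frequency axis -> (C, P*F)."""
    return np.concatenate(features, axis=1)


def fuse_average(features):
    """Feature averaging: element-wise mean over subjects -> (C, F)."""
    return np.mean(np.stack(features, axis=0), axis=0)


rng = np.random.default_rng(0)
n_channels, n_bins = 3, 79          # 3 electrodes (text); 79 bins is hypothetical
subject_a = rng.random((n_channels, n_bins))
subject_b = rng.random((n_channels, n_bins))
pair = [subject_a, subject_b]

parallel = fuse_parallel(pair)      # six-channel input
serial = fuse_serial(pair)          # three channels, doubled frequency axis
averaged = fuse_average(pair)       # same shape as a single-person input
```

Only the averaged features keep the single-person input shape regardless of how many subjects are fused, which is why that method can reuse a trained single-person CNN.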

Deep learning network construction
Improved network structure of the convolutional neural network
This paper makes some adjustments to EEGNet, changing the number of convolution kernels, the size of the convolution kernels, and the depth of the convolution layers. In this experiment, the collected data were used to predict the classification accuracy of the test set, and EEGNet was modified to accommodate the data. The ordinary convolution layer was discarded, a depthwise convolution layer was added, and the pointwise convolution layer was changed into a small narrow-band convolution. The network consists of six consecutive layers: four convolutional layers, one fully connected layer, and one softmax output layer (Jang et al., 2016). Network fitting is accelerated through batch normalization (Ba et al., 2016). The activation layers use the ReLU function (Agarap, 2018).
Table 1 summarizes the modified CNN structure. The convolution kernel of the C1 convolution layer has size 3 × 1, and its function is to learn linear combinations and spatial filtering features across channels. Zero padding is used to prevent the loss of information caused by convolution (Dwarampudi and Reddy, 2019). The C2 layer does not use zero padding and integrates the multi-channel data into a single channel by convolution. The C3 convolution layer extracts features along the input frequency spectrum by convolution and acts as a band-pass filter. The C4 convolution layer also integrates frequency features without padding. The convolution operations of the C2 and C4 layers therefore achieve a down-sampling effect. After the Flatten layer, a fully connected layer is followed by softmax for classification.
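The effect of padded versus unpadded convolutions on the feature map size can be traced with the standard output-length formula. In the sketch below, only the 3 × 1 kernel of C1 and the three input channels come from the text; the C2, C3, and C4 kernel lengths and the number of frequency bins are hypothetical values chosen for illustration, since the contents of Table 1 are not reproduced here.

```python
def conv_output_length(n_in: int, kernel: int, padding: str = "valid", stride: int = 1) -> int:
    """Output length of a 1-D convolution along one axis."""
    if padding == "same":      # zero padding keeps the length (for stride 1)
        return -(-n_in // stride)
    if padding == "valid":     # no padding: the output shrinks by kernel - 1
        return (n_in - kernel) // stride + 1
    raise ValueError(f"unknown padding mode: {padding}")


channels, bins = 3, 79  # 3 electrodes (text); 79 frequency bins is hypothetical

# Channel axis: C1 (padded) keeps 3 rows, C2 (unpadded) collapses them to 1.
after_c1 = conv_output_length(channels, kernel=3, padding="same")
after_c2 = conv_output_length(after_c1, kernel=3, padding="valid")  # C2 kernel assumed

# Frequency axis: C3 (padded) keeps the length, C4 (unpadded) shortens it.
after_c3 = conv_output_length(bins, kernel=5, padding="same")       # C3 kernel assumed
after_c4 = conv_output_length(after_c3, kernel=10, padding="valid") # C4 kernel assumed
```

This is the sense in which the unpadded C2 and C4 layers act as down-sampling: each removes `kernel - 1` positions along the axis it convolves over.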

Network structure of feature fusion convolutional neural network
As can be seen in Table 2, compared with the single-person CNN, the two-person parallel feature concatenation CNN differs in that the number of input channels increases from three to six, which greatly increases the number of features.
Therefore, unpadded convolution layers C2 and C3 were added to the two-person parallel feature connection CNN structure on the basis of the single-person CNN. As can be seen in Table 3, the CNN structure used by the two-person serial feature connection method is similar to the single-person CNN structure. With the two-person serial feature connection, the number of features input to the CNN is increased, so more feature extractors are built by increasing the number of convolution kernels to obtain better results. If the number of convolution kernels of the two-person serial feature connection CNN is kept the same as that of the single-person CNN, the classification accuracy drops by about 2%.

Transfer learning-based feature fusion strategy with different datasets
Compared with traditional machine learning algorithms, deep learning methods rely heavily on high-quality data. Obtaining sufficient high-quality data to train good convolution kernel parameters is a critical problem in CNN setup. Transfer learning (Pan and Yang, 2009) offers an effective solution to this problem. The SSVEP EEGs in the THU benchmark dataset (Chen et al., 2015) exhibit good features and low rates of subject operation errors, and thus this dataset was used as the source dataset for initial training of the model parameters. In general, CNN parameters are randomly initialized and then trained directly on the collected data. Compared with transfer learning, this requires a larger amount of data and more training time to obtain a satisfactory feature extractor. With transfer learning, the initial parameters are obtained through pre-training; they are derived from prior knowledge and hence already perform the corresponding task well. As a consequence, only a small amount of actual experimental data is needed as the training set, and the model parameters are relearned through fine-tuning so that the model adapts to the actual experimental data. This method can improve the classification accuracy of the model and effectively reduce the amount of experimental data that must be collected in our laboratory to train the CNN.
FIGURE 4 Training strategy.
A comparison among various fine-tuning methods suggests significant differences in their stability but insignificant differences in their classification accuracy. Figure 4 shows that only the parameters of the deep convolutional layers and the fully connected layer are trained, while the parameters of the shallow layers of the CNN are frozen and not involved in the training. Since the feature distribution of the source task data (THU benchmark dataset) does not coincide with that of this experiment, fine-tuning the parameters of the deep convolutional layers with a small learning rate can improve the feature extraction performance of the convolutional layers.
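The freezing scheme can be illustrated independently of any deep learning framework. The sketch below applies one plain gradient-descent step only to the layers marked as trainable. It is a schematic under stated assumptions: which layers count as "shallow" (here C1 and C2), the learning rate value, and the use of plain gradient descent instead of the Adam optimizer used in the study are all choices made for this illustration.

```python
import numpy as np


def fine_tune_step(params, grads, trainable, lr=1e-4):
    """One gradient-descent step that updates only the layers named in `trainable`.

    Frozen layers are returned unchanged, so the pre-trained shallow
    feature extractor is preserved during fine-tuning.
    """
    return {
        name: (w - lr * grads[name]) if name in trainable else w.copy()
        for name, w in params.items()
    }


# Toy "pre-trained" weights and gradients for the five parameterized layers.
layer_names = ("C1", "C2", "C3", "C4", "FC")
params = {name: np.ones(2) for name in layer_names}
grads = {name: np.full(2, 0.5) for name in layer_names}

# Hypothetical split: shallow C1/C2 frozen, deep C3/C4 and FC fine-tuned.
updated = fine_tune_step(params, grads, trainable={"C3", "C4", "FC"})
```

In a framework such as Keras or PyTorch the same effect is obtained by marking the shallow layers as non-trainable before compiling or by excluding their parameters from the optimizer.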
In the first step, training was conducted with the THU benchmark dataset as the training set, and a total of 720 samples were used, with 180 samples per stimulus.The network weights are learned in accordance with the Adam learning algorithm, which optimizes the network weights through backpropagation, and the cross-entropy function is adopted as the loss function (Zhang and Sabuncu, 2018).
The data of 24 people in the THU benchmark dataset were used as the pre-training dataset, and different combinations of multiple people were randomly selected for feature fusion. Repeated experiments verified that the particular combinations of people used for feature fusion during pre-training of the initialized feature extractor have no impact on the final classifier. An initial pre-training learning rate of 0.001 was adopted.
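Random grouping of pre-training subjects can be sketched with the standard library alone. The subject identifiers, the number of groups drawn, and the fixed seed below are hypothetical; the text states only that 24 people were used and that combinations were chosen at random.

```python
import random


def random_groups(subject_ids, group_size, n_groups, seed=0):
    """Draw `n_groups` random subject combinations of size `group_size`."""
    rng = random.Random(seed)  # fixed seed so the grouping is reproducible
    return [tuple(sorted(rng.sample(subject_ids, group_size))) for _ in range(n_groups)]


pretrain_subjects = list(range(1, 25))   # 24 pre-training subjects, IDs hypothetical
pairs = random_groups(pretrain_subjects, group_size=2, n_groups=12)
```

Each drawn tuple identifies the subjects whose features are fused into one pre-training sample for the multi-person networks.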

Results
The 10 subjects were labeled S1 to S10, and two adjacent subjects made up one group (e.g., S1 and S2 made up group C1, and S3 and S4 made up C2). Table 4 shows only one grouping case to illustrate feature fusion at different levels of feature quality; group members were interchangeable, and different combinations were tried. As the parallel feature connection method gives the best classification result with fused features and CNN, Table 4 lists only the classification accuracy of different subjects and subject groups in different time windows under parallel feature connection. The three-channel EEG data of the two people in each group were connected in parallel to obtain six-channel EEG data, and the six-channel CNN model was trained using the TL strategy. Table 4 shows that, with the CNN classifier, the classification accuracy of both single- and two-person feature fusion declines as the time window decreases. Personal characteristics become more marked when the stimulation time is longer. The table compares the classification accuracy of the single-person CNN with five representative two-person groups, which fall into two types: (1) Feature fusion of subject data with significant and insignificant features. That is, one subject yielded a high classification accuracy, but the other yielded a low classification accuracy. The final result is lower than the best single-person classification accuracy with significant features. (2) Feature fusion of subjects' data with only significant features. As the data features of both people were significant, the classification accuracy of the neural network was markedly improved.
Taking the 3-s time window as an example, the 10-person average classification accuracy of the single-person system CNN without TL is only 43.5%, but with TL, it can reach as high as 90.6%.The five-person average classification accuracy of the two-person CNN without TL is only 55.0%, while with TL, it reaches 96.6%.
The 10-person data containing S1-S10 were used as the dataset for subsequent experiments and are referred to as the collected dataset. The results of each model training and prediction run differ. The collected data were therefore randomly shuffled, and feature fusion was then performed to calculate the average classification accuracy and ITR through 10-fold cross-validation, as shown in Figure 5. The ITR of the parallel feature connection method significantly outperformed that of the other two feature fusion methods (p < 0.05). Among these three methods, the parallel feature connection method always ranks first, with the highest classification accuracy and ITR. The serial feature concatenation method and the feature averaging method exhibit similar overall performance, but the feature averaging method is more flexible and requires less computation. It can thus be concluded that feature averaging is preferable to serial feature concatenation. The optimal scheme for setting up cBCI is to train the corresponding parallel feature connection model with the TL method in advance when the number of subjects is known, or to train the single-person model with the TL method and then apply the feature averaging method to this trained model when the number of subjects is unknown.
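The text reports ITR without restating how it is computed. Assuming the widely used Wolpaw definition, ITR = (60/T) [log2 N + P log2 P + (1 - P) log2((1 - P)/(N - 1))] bits/min, the sketch below evaluates it for N = 4 targets. Treating T as the time window alone, without any gaze-shift or rest interval, is our assumption; the authors' exact choice of T is not stated here.

```python
import math


def itr_bits_per_min(accuracy: float, n_targets: int, seconds_per_selection: float) -> float:
    """Wolpaw information transfer rate in bits per minute."""
    p, n = accuracy, n_targets
    if not 0.0 <= p <= 1.0 or n < 2 or seconds_per_selection <= 0:
        raise ValueError("invalid accuracy, target count, or selection time")
    bits = math.log2(n)
    if 0.0 < p < 1.0:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    elif p == 0.0:
        bits += math.log2(1.0 / (n - 1))  # limit of the formula as p -> 0
    # p == 1.0 contributes no penalty terms: bits stays at log2(n)
    return bits * 60.0 / seconds_per_selection


# Accuracies reported in the text for the 3 s window.
itr_single = itr_bits_per_min(0.906, n_targets=4, seconds_per_selection=3.0)
itr_two_person = itr_bits_per_min(0.966, n_targets=4, seconds_per_selection=3.0)
```

Because the penalty terms grow quickly as accuracy falls, a gain of a few percentage points near the top of the accuracy range translates into a noticeably higher ITR at a fixed time window.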
The classification accuracies of the different time windows were averaged to obtain the total classification accuracy shown in Figure 6. The total classification accuracy of the single-person CNN, 80.4%, serves as the baseline and is far lower than that of the three CNN-based multi-person feature fusion methods. As can be seen from Figure 6, when the number of participants in the experiment increased, the total classification accuracy of the three feature fusion methods improved slightly. The parallel feature connection method invariably attained the highest total classification accuracy, the feature averaging method was always second, and the serial feature connection method was the worst. The advantage of the parallel method can be explained as follows. As the number of participants increases, the number of features grows, so the CNNs used by the three feature fusion methods need to be slightly modified, mainly by increasing the number of convolution kernels. In addition, because the parallel feature connection must continuously integrate the information of multiple lead channels through convolution, more convolutional layers are added. The results indicate that convolution can integrate the features of multiple individuals and multiple leads in a nonlinear way, which works better than feature averaging. Therefore, if computing power allows, fusing the EEG features of multiple people by parallel feature connection and feeding them into the TL-CNN model is an excellent way to construct a cBCI.
FIGURE 6 Total classification accuracy of the three fusion methods under different numbers of participants. The line "a" is the total classification accuracy of the single features.

Discussion
Visually evoked potentials have been extensively studied by researchers (Mary Judith and Baghavathi Priya, 2021). When subjects gaze at flickering visual stimuli with flicker frequencies exceeding 4 Hz, their cerebral cortical activity is modulated, resulting in a periodic rhythm similar to that of the stimulus (Bondre and Kapgate, 2014). The CNN-based EEGNet, which was specially designed for EEG, has been widely applied to classification tasks in various EEG paradigms, e.g., P300 visually evoked potentials, error-related negativity responses, movement-related cortical potentials, and sensorimotor rhythms. In this study, the original EEGNet was modified, and the accuracy of the classification model was improved by increasing the number of convolutional layers and the number and size of the convolution kernels in each convolutional layer. In addition, three different multi-person EEG feature fusion methods were proposed to integrate multi-person EEG information and improve BCI performance. Each of the three fusion methods was found to have merits and shortcomings. In summary, for multi-person BCI collaborations with a known, fixed number of people, the parallel feature connection method is the best choice because it involves smaller models and fewer training parameters than the serial connection method and produces higher classification accuracy than the feature averaging method. When the number of subjects is unknown, the feature averaging method should be chosen, as the averaged features of multiple subjects can be applied directly to a trained single-person CNN. The serial feature concatenation method is not suitable for the construction of an online BCI system.
In this study, a small number of leads (O1, Oz, and O2) were recorded to set up a multi-person BCI system, which facilitates experimental preparation, reduces the subjects' fatigue, and improves the system's execution efficiency. Unlike the voting and averaging methods of existing distributed multi-person BCI systems, the new system constructs a single complete EEG data sample through multi-person feature fusion and uses a transfer learning-based CNN for classification. Compared with traditional CNN methods, the number of samples to collect is markedly reduced, and the accuracy is slightly improved. Moreover, a neural network with multiple narrow-band convolution kernels is constructed, and a multi-channel, multi-person feature fusion method is set up to extract the corresponding nonlinear features for fusion, which improves the recognition accuracy and signal-to-noise ratio of SSVEP-cBCI. The experimental results of the 10 subjects show that CNN classification on the fused features of two persons produces a higher SSVEP-cBCI recognition accuracy, and the TL-CNN-based two-person BCI effectively raises the classification accuracy and robustness of BCI. The impact of individual differences on the performance stability of single-person BCI systems is thereby mitigated. With the increase in the number of participants in the experiment, the total classification accuracy of the three feature fusion methods improved slightly, and the parallel feature connection method invariably exhibited the best performance.
The deep learning SSVEP-cBCI algorithm based on multi-person feature fusion established in this paper has been verified through offline experiments and can be extended to real-time online systems in the future to achieve real-time control of external equipment. Since the SSVEP-cBCI experimental paradigm requires multiple subjects to fixate simultaneously on the same flickering stimulus interface, the display was duplicated across multiple monitors, and the collected multi-person EEG data were passed to a microcomputer for subsequent data processing and identification. When the number of collaborators is fixed and computing power allows, CNN classification models must be prepared for each possible number of participants, with the corresponding feature fusion (parallel feature connection) and model training performed in advance, and the trained models are then used to complete real-time online experiments. This feature fusion method can maximize the performance of cBCI. If the number of collaborators is not fixed, the computing power is insufficient, or a CNN classification model for the required number of participants has not been established, the multi-person features can be integrated by feature averaging, and the single-person CNN model can be used to complete real-time online experiments. Parallel feature connection is suitable for high-precision tasks, such as controlling unmanned vehicles, which requires precise control of the vehicle's travel to avoid collisions. The feature averaging method fits rehabilitation centers, where different numbers of patients can send requests at the same time and, as the number increases, medical staff can respond more accurately. Serial feature connection can be used as an alternative method to increase the robustness of cBCI.

FIGURE 1 Centralized cBCI structure designed in this study.
Lawhern et al. (2018) designed EEGNet, a compact CNN for EEG signals that consists of two parts: spatial feature extraction and frequency- or time-domain feature extraction. It can efficiently extract features from frequency-domain EEG information and send them to a neural network-based classifier, eliminating the need to extract the two kinds of features manually.
The number of epochs was set to 80, and the mini-batch size of stochastic gradient descent was set to 16. Next, the pre-trained model with initial weights was established for the experimental paradigm followed in this study. Based on the pre-trained model, the number of epochs was then reset to 40 for training with the collected experimental data. An excessively large number of epochs overfits the network to individual subjects and reduces its generalization ability (Pan and Yang, 2009).

It can be clearly seen from Figure 5 that the classification accuracy of the two feature connection methods and the feature averaging method based on CNN invariably exceeds that of the single-person CNN in the different time windows. The ITR of the three CNN-based multi-person fusion methods significantly outperformed that of the single-person CNN (p < 0.0001).

FIGURE 5 Accuracy and ITR under different time windows. (A) Classification accuracy for different time windows. (B) ITR for different time windows.

TABLE 1
Single-person convolutional neural network structure.

TABLE 2
Two-person parallel feature connection CNN structure.

TABLE 3
Two-person serial feature connection CNN structure.

TABLE 4
Classification accuracy comparison of single- and two-person models under different time windows.