The Time-Robustness Analysis of Individual Identification Based on Resting-State EEG

Interest in individual identification based on biosignals, such as electroencephalography (EEG) and magnetic resonance imaging (MRI), has grown over the past decades. Previous studies indicated that inherent information about brain activity may be used to identify individuals during the resting state with eyes open (REO) and eyes closed (REC). EEG is recorded from the scalp, and noise in the signals can reduce the accuracy of a single experiment and lead to unreliable results. Therefore, the stability and time-robustness of inter-individual features need to be investigated for the purpose of individual identification. In this work, we conducted three experiments separated by intervals of at least 2 weeks and used four types of measures (Power Spectral Density, Cross Spectrum, Channel Coherence, and Phase Lags) to extract individual features. The Pearson Correlation Coefficient (PCC) was calculated to measure the level of intra-individual linear correlation, and a Support Vector Machine (SVM) was used to obtain the corresponding classification accuracy. Results show that the classification accuracies of the four features were 85–100% for the intra-experiment datasets and 80–100% for the fusion-experiment datasets. For inter-experiment classification of REO features, the optimized frequency range is 13–40 Hz for three features: Power Spectral Density, Channel Coherence, and Cross Spectrum. For inter-experiment classification of REC features, the optimized frequency range is 8–40 Hz for the same three features. The classification results of Phase Lags are much lower than those of the other three features. These results demonstrate the time-robustness of resting-state EEG features, which can be further used in individual identification systems.


INTRODUCTION
Electroencephalography (EEG), along with the development of neuroscience and computer science, is emerging as a neuroimaging technique that can serve as an alternative method for individual biometric identification (Hema et al., 2008; Chuang et al., 2013). EEG signals reflect individual information about brain anatomy and function, and they can measure the synchronous activity of brain regions (Wolpaw et al., 2000; Rodriguez, 2015). Compared with other biometric identification approaches, such as face and fingerprint recognition, an EEG-based identification system requires users to be alive, and EEG signals are hard to copy or hijack because of their complexity (Wang et al., 2012; Akhtar et al., 2015; Llanos et al., 2019).
Electroencephalography signals were first recorded in 1924 by Hans Berger. Research on inter-individual variation of EEG signals dates back decades (Davis and Davis, 1936; Berkhout and Walter, 1968), and the relationship between EEG signals and genetic information was first confirmed by Poulos et al. (1999, 2001, 2002a). EEG signals can be quantified by different types of measures, such as event-related potentials (ERPs), spectra, functional connectivity, and other parameters. These time-frequency domain measures can evaluate the inter-individual variability of brain activity. It is not easy to obtain inherent features from raw EEG signals because they are noisy and of small amplitude (Nakanishi et al., 2009; Delpozo-Banos et al., 2015). EEG-based identification systems have been studied in recent years, and many analytical methods have been used to assess inter-individual dependence for different types of EEG (Fraschini et al., 2014; Rocca et al., 2014; Alariki et al., 2018). The resting state is a promising condition for biometric identification because it generates synchronous oscillations in specific frequency ranges and, compared with other acquisition protocols, it reduces fatigue and artifacts since it does not require the active involvement of participants. Many studies focus on the resting state with eyes open (REO) and eyes closed (REC), and they indicate that resting-state EEG carries information in specific sub-bands that shows significant inter-individual differences, especially under spectral analysis (Abo-Zahhad et al., 2015; Busonera et al., 2018; Chan et al., 2018).
The power spectrum of each single electrode can represent brain oscillations in terms of physiological and cognitive functions (Ramaswamy and Mandic, 2007; Di et al., 2019), and it carries inherent information about each region, channel by channel, in different frequency bands (Nakamura et al., 2017). Functional connectivity is another measure, which captures linear or nonlinear statistical dependencies between distinct channels.
Previous studies paid more attention to inter-individual variance within one experiment and did not focus on stability over time for individual identification (Pozo-Banos et al., 2014; Crobe et al., 2016; Zeng et al., 2018). However, some features are susceptible to noise and can only be used for intra-experiment data. Therefore, the time-robustness of the features is especially important when they are used in a practical identification system (Arnau-Gonzalez et al., 2017; Schetinin et al., 2018).
In this work, we conducted three experimental runs and applied four feature extraction methods. Each experiment comprised three sessions of REO and REC separated by 20-min intervals, and consecutive experiments were at least 2 weeks apart. A Support Vector Machine (SVM) was used as the classifier to verify the difference between participants and the similarity across trials of the same participant in each run and each fusion run. We then assessed the stability and time-invariance of individual identification based on inter-run EEG data. Several frequency ranges within 1-40 Hz were compared to find the one giving the best performance. The results reveal that the proposed features are stable and time-robust for individual identification based on resting-state EEG data.

Participants
Ten participants (6 males) took part in the experiment, with an average age of 21 (±3) years. They were volunteers from Tianjin University. Participants signed a consent form, which included a notice of their individual rights, before the first experiment. The study was approved by the local ethical committee at Tianjin University. Three sessions were recorded, separated by 20-min intervals in which subjects performed other protocols. Three experimental runs were conducted for each participant, and the time interval between runs was at least 2 weeks. The experimental procedure is shown in Figure 1, and the details of the three experiments are shown in Table 1.

Pre-processing
Pre-processing of the EEG data included down-sampling, re-referencing, and filtering. First, the raw data were down-sampled from 1,000 to 100 Hz and re-referenced to the mean of the two mastoids ((M1+M2)/2). Then, a band-pass filter of 1-40 Hz was applied. Finally, the data (450 s) were epoched into 450 segments (1 s per segment) for each participant in each run.
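The pre-processing chain above can be sketched in Python with SciPy. This is only an illustration: the original pipeline used EEGLAB in Matlab, and the function name `preprocess`, the polyphase resampler, and the 4th-order zero-phase Butterworth filter are assumptions that the text does not specify.

```python
import numpy as np
from scipy import signal


def preprocess(raw, m1, m2, fs_raw=1000, fs_new=100, band=(1.0, 40.0), epoch_s=1.0):
    """Return epochs of shape (n_epochs, n_channels, samples_per_epoch).

    raw    : (n_channels, n_samples) scalp EEG sampled at fs_raw
    m1, m2 : (n_samples,) mastoid channels used as the reference
    """
    # 1) Down-sample 1,000 Hz -> 100 Hz (polyphase filter with anti-aliasing).
    factor = fs_raw // fs_new
    data = signal.resample_poly(raw, up=1, down=factor, axis=-1)
    ref = signal.resample_poly((m1 + m2) / 2.0, up=1, down=factor)

    # 2) Re-reference every channel to the mean of the mastoids, (M1 + M2) / 2.
    data = data - ref

    # 3) Zero-phase 1-40 Hz band-pass (filter type and order are assumptions).
    sos = signal.butter(4, band, btype="bandpass", fs=fs_new, output="sos")
    data = signal.sosfiltfilt(sos, data, axis=-1)

    # 4) Cut into non-overlapping 1-s epochs; drop any incomplete tail.
    n_per = int(round(epoch_s * fs_new))
    n_epochs = data.shape[-1] // n_per
    data = data[:, : n_epochs * n_per].reshape(data.shape[0], n_epochs, n_per)
    return data.transpose(1, 0, 2)


# Synthetic stand-in for a recording: 18 channels, 5 s at 1,000 Hz.
rng = np.random.default_rng(0)
raw = rng.standard_normal((18, 5 * 1000))
m1, m2 = rng.standard_normal((2, 5 * 1000))
epochs = preprocess(raw, m1, m2)
print(epochs.shape)  # (5, 18, 100)
```

With a 450-s recording, the same call would yield the 450 one-second segments described above.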

Power Spectral Density
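Power Spectral Density (PSD) is estimated here with non-parametric spectrum analysis; it describes the distribution of signal power over frequency for a stationary random process (Campisi and Rocca, 2015; Wang and Najafizadeh, 2016). The periodogram is defined as:

$$\hat{P}(f) = \frac{\Delta t}{N}\left|\sum_{n=0}^{N-1} x_n e^{-j2\pi f \Delta t\, n}\right|^2, \quad -\frac{1}{2\Delta t} < f \le \frac{1}{2\Delta t}$$

where $x_n$ represents the EEG signal of $N$ samples and $\Delta t$ is the sampling interval, i.e., the reciprocal of the number of samples per unit time.
The modified periodogram multiplies the series by a window function in order to reduce leakage in the periodogram. The modified periodogram is defined as:

$$\hat{P}(f) = \frac{\Delta t}{N}\left|\sum_{n=0}^{N-1} h_n x_n e^{-j2\pi f \Delta t\, n}\right|^2, \quad -\frac{1}{2\Delta t} < f \le \frac{1}{2\Delta t}$$

where $h_n$ is a suitable window function and $\Delta t$ is the sampling interval.

FIGURE 1 | Experimental procedure.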
In this work, we use Welch's method to estimate the PSD of the EEG signal. Welch's averaged estimate is based on the modified periodogram: it divides the signal into overlapping segments and averages the modified periodograms computed on each segment, which reduces the variance of the estimate. A Hamming window was used, and the overlap was set to 0.5. The number of FFT points was set to 100 (the sampling frequency of the signal is 100 Hz), giving a frequency resolution of 1 Hz. Each segment was characterized by a PSD feature vector of size N ch × N f , where N ch = 18 is the number of channels used and N f = 40 is the number of frequency points from 1 to 40 Hz. There are 450 PSD feature vectors for each participant in each run.
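A minimal sketch of this PSD feature with `scipy.signal.welch` follows. The Hamming window, 50% overlap, 100 FFT points, and the 1-40 Hz selection come from the text; the window length of 50 samples (`nperseg=50`) is an assumption, since the text does not state it, and `psd_features` is a hypothetical helper name.

```python
import numpy as np
from scipy import signal


def psd_features(epochs, fs=100, nperseg=50, nfft=100, fmin=1, fmax=40):
    """epochs: (n_epochs, n_channels, n_samples) -> (freqs, psd[n_epochs, n_channels, 40])."""
    freqs, pxx = signal.welch(
        epochs,
        fs=fs,
        window="hamming",
        nperseg=nperseg,          # assumed window length (not given in the text)
        noverlap=nperseg // 2,    # overlap of 0.5
        nfft=nfft,                # 100 FFT points -> 1-Hz frequency grid
        axis=-1,
    )
    # Keep the 40 frequency points from 1 to 40 Hz (half-bin margin for float safety).
    keep = (freqs > fmin - 0.5) & (freqs < fmax + 0.5)
    return freqs[keep], pxx[..., keep]


# Synthetic check: a 10-Hz oscillation plus weak noise in every channel.
rng = np.random.default_rng(0)
t = np.arange(100) / 100.0
epochs = np.sin(2 * np.pi * 10 * t) + 0.01 * rng.standard_normal((4, 18, 100))
freqs, psd = psd_features(epochs)
peak_hz = freqs[np.argmax(psd[0, 0])]
```

Each epoch then yields an 18 × 40 feature matrix, matching the N ch × N f size given above.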

Cross Spectrum Analysis
In this part, we estimate the spectral connectivity between channels and compute three features (amplitude spectrum, channel phase lag, and channel coherence) to describe it (Ghorbanian et al., 2013; Valizadeh et al., 2019). The cross spectrum is a frequency-domain analysis of the cross-correlation between two time series. The cross power spectral density is the distribution of power per unit frequency and is defined as:

$$P_{xy}(\omega) = \sum_{m=-\infty}^{\infty} R_{xy}(m)\, e^{-j\omega m}$$

where $R_{xy}(m)$ is the cross-correlation sequence, defined as:

$$R_{xy}(m) = E\{x_{n+m}\, y_n^{*}\} = E\{x_n\, y_{n-m}^{*}\}$$

Here $x_n$ and $y_n$ are the signals of two channels, $*$ denotes complex conjugation, and $E\{\cdot\}$ is the expectation. The complex cross spectrum is obtained for each channel pair, and the amplitude spectrum and phase lag are then computed from it. The size of the amplitude spectrum for each segment is N p × N f , where N p = 171 is the number of channel pairs (including the 18 self-pairs) and N f = 40 is the number of frequency points from 1 to 40 Hz. The phase lag has the same size as the amplitude spectrum. There are 450 feature vectors of amplitude spectrum and 450 feature vectors of phase lag for each participant in each run.
The coherence estimate is a function of frequency, with values between 0 and 1, that describes how well x corresponds to y at each frequency. It is defined as:

$$C_{xy}(\omega) = \frac{\left|P_{xy}(\omega)\right|^2}{P_{xx}(\omega)\, P_{yy}(\omega)}$$

where x and y represent the EEG data of two channels, $P_{xy}$ is the cross power spectral density, and $P_{xx}$ and $P_{yy}$ are the power spectral densities of x and y. The result shows the correlation between two channels at each frequency. The size of the channel coherence feature is N p × N f , where N p = 153 is the number of channel pairs (excluding self-channel coherence) and N f = 40 is the number of frequency points from 1 to 40 Hz.
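The three connectivity features can be sketched for one epoch with `scipy.signal.csd`. The helper name `connectivity_features` and the 50-sample window are assumptions (the original analysis was done in Matlab); the pair counts of 171 and 153 follow from the 18 channels. Note that SciPy computes the cross spectrum as conj(X)·Y, so the sign of the phase depends on this convention.

```python
import numpy as np
from scipy import signal


def connectivity_features(epoch, fs=100, nperseg=50, nfft=100, fmin=1, fmax=40):
    """epoch: (n_channels, n_samples) -> freqs, amplitude, phase, coherence."""
    n_ch = epoch.shape[0]
    # All pairs with i <= j: 18 * 19 / 2 = 171 for 18 channels (self-pairs included).
    i_idx, j_idx = np.triu_indices(n_ch)
    freqs, pxy = signal.csd(
        epoch[i_idx], epoch[j_idx], fs=fs, window="hamming",
        nperseg=nperseg, noverlap=nperseg // 2, nfft=nfft, axis=-1,
    )
    keep = (freqs > fmin - 0.5) & (freqs < fmax + 0.5)
    freqs, pxy = freqs[keep], pxy[:, keep]

    amplitude = np.abs(pxy)   # amplitude spectrum, (171, 40)
    phase = np.angle(pxy)     # phase lag in radians, (171, 40)

    # Self-pairs are the auto-spectra P_xx, in channel order.
    auto = i_idx == j_idx
    pxx = pxy[auto].real
    pairs = ~auto             # 18 * 17 / 2 = 153 distinct pairs
    coherence = np.abs(pxy[pairs]) ** 2 / (pxx[i_idx[pairs]] * pxx[j_idx[pairs]])
    return freqs, amplitude, phase, coherence


rng = np.random.default_rng(0)
epoch = rng.standard_normal((18, 100))   # synthetic 1-s epoch
freqs, amplitude, phase, coherence = connectivity_features(epoch)
```

Averaging over several overlapping windows matters here: with a single window the ratio above equals 1 at every frequency, so the coherence would carry no information.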

Pearson Correlation Coefficient
The Pearson correlation coefficient (PCC) is a statistic that measures the linear correlation between two variables X and Y. Given a pair of variables X and Y, the PCC is defined as:

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$

where cov is the covariance, $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, $\mu_X$ and $\mu_Y$ are their means, and E is the expectation.
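As a small illustration, the definition can be written directly in NumPy and checked against `np.corrcoef`. The function name `pearson` and the synthetic trial matrix are assumptions for the example; each row stands for one flattened feature vector.

```python
import numpy as np


def pearson(x, y):
    """PCC from the definition: E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())   # population std, consistent with np.mean above


# Synthetic stand-in: 6 trials, each an 18 x 40 feature matrix flattened to 720 values.
rng = np.random.default_rng(0)
trials = rng.standard_normal((6, 720))

# Trial-by-trial correlation matrix, entry (a, b) = PCC between trials a and b.
pcc_matrix = np.array([[pearson(a, b) for b in trials] for a in trials])
```

In practice `np.corrcoef(trials)` returns the same matrix in one call.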

Support Vector Machine
Support Vector Machine (SVM) is a supervised learning method for classification or regression in machine learning (Chang and Lin, 2011; Hong et al., 2013). We are given a dataset of n points $X = \{X_1, X_2, \cdots, X_n\}$ and class labels $Y = \{y_1, y_2, \cdots, y_n\}$, where $y_i \in \{+1, -1\}$ indicates the class of point $X_i$. The hyperplane divides the group of points $X_i$ for which $y_i = 1$ from the group of points $X_i$ for which $y_i = -1$. It is defined as:

$$\omega \cdot X - b = 0$$

where $\omega$ is the normal vector of the hyperplane and $b$ is its offset. SVM is a maximum-margin classifier, so we can select two parallel hyperplanes that separate the two classes of data. These two hyperplanes can be described as:

$$\omega \cdot X - b = 1, \qquad \omega \cdot X - b = -1$$

The distance between the two hyperplanes is $2/\|\omega\|$. In order to maximize the distance between the hyperplanes, we minimize $\|\omega\|$. This can be described as:

$$\min_{\omega,\, b} \|\omega\| \quad \text{subject to} \quad y_i(\omega \cdot X_i - b) \ge 1, \quad i = 1, \cdots, n$$

The paradigm was implemented with Psychtoolbox in Matlab, and the pre-processing of the EEG data was performed with EEGLAB in Matlab (Brunner et al., 2013). All code for feature extraction and classification was written in Matlab.

RESULTS AND DISCUSSION
Biometrics is an active topic, and EEG-based biometric systems have drawn increasing attention in recent years. Although there has been some research on EEG-based biometric systems, most of it focuses on the difference between participants in a single experiment and ignores the stability and time-robustness of inter-experiment data (Koike-Akino et al., 2016; Wu et al., 2018; Özdenizci et al., 2019), which is much more important.
In this section, the results are shown for all participants based on resting-state (REO and REC) EEG signals. All four feature extraction approaches described in Section II are used to investigate the stability of intra-run and inter-runs features. Figures of the extracted features are given in Section III-1, and the classification results for intra-run and inter-runs features are shown in Sections III-2 and III-3. Our main goal is to assess the stability and reliability of EEG features. We estimate the spectral information of each single channel and the functional connectivity of channel pairs with different feature extraction methods, following previous works showing that the spectral density of single channels and coherence measures of channel pairs can be useful features for identification with high accuracy (Rocca et al., 2014; Di et al., 2019; Valizadeh et al., 2019). We used the approaches given in Section II to obtain the features and randomly selected 4 of the 10 participants to show the difference between features visually. PCC is used to measure the linear correlation for each feature, and the SVM classifier is used to obtain the classification accuracy.

Features
In this part, the values of the four features of each participant are presented to show the differences. Power Spectral Density, Cross Spectrum, Phase Lags, and Channel Coherence (ξ PSD , ξ spectrum , ξ phase , and ξ COH ) are obtained with the methods described in Section II. Each feature has 450 trials for each of REO and REC, and in order to reduce noise, 90 trials for each condition of each participant were obtained by averaging every five trials. The intra-run correlation coefficients are also calculated in this part for correlation analysis. We apply Fisher's Z transformation to the Channel Coherence values and a logarithmic transformation to the PSD and Cross Spectrum values (Valizadeh et al., 2019).
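The trial averaging and the two transformations can be sketched as follows. Several details are assumptions because the text leaves them open: the helper names, the use of base-10 logarithms, applying Fisher's Z directly to the coherence values, and averaging before (rather than after) transforming.

```python
import numpy as np


def average_trials(features, group=5):
    """Average every `group` consecutive trials: (450, ...) -> (90, ...)."""
    n = features.shape[0] // group
    trimmed = features[: n * group]
    return trimmed.reshape(n, group, *features.shape[1:]).mean(axis=1)


def fisher_z(c, eps=1e-12):
    """Fisher's Z transformation, z = arctanh(c); clipped to avoid infinities at 1."""
    return np.arctanh(np.clip(c, -1 + eps, 1 - eps))


def log_transform(p, eps=1e-20):
    """Logarithmic transformation for PSD and cross-spectrum amplitudes (base 10 assumed)."""
    return np.log10(np.maximum(p, eps))


# Synthetic stand-ins with the sizes used in the text.
rng = np.random.default_rng(0)
psd = rng.random((450, 18, 40)) + 0.1      # positive PSD-like values
coh = rng.random((450, 153, 40))           # coherence-like values in [0, 1)

psd_avg = log_transform(average_trials(psd))   # (90, 18, 40)
coh_avg = fisher_z(average_trials(coh))        # (90, 153, 40)
```

The reshape groups trials 1-5, 6-10, and so on, so the 450 one-second trials collapse to the 90 averaged trials mentioned above.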
The values of PSD, Cross Spectrum, Channel Coherence, and Phase Lags are shown in Figures 2-5, respectively. Four participants were randomly chosen for each feature. The X-axis represents the frequency range from 1 to 40 Hz, and the Y-axis represents each single channel or channel pair. The upper and lower panels in Figures 2-5 show the REO and REC conditions, respectively. The change in color from yellow to blue corresponds to a change in value from large to small. Power Spectral Density reflects brain activity at the positions of the EEG channels over the scalp, and it is calculated for all 18 channels. Cross Spectrum, Phase Lags, and Channel Coherence reflect the functional connections of channel pairs. In this work, we obtain 171 channel pairs overall, with frequencies from 1 to 40 Hz, for the Cross Spectrum and Phase Lags of each participant, and 153 channel pairs (excluding the 18 self-channel pairs), with frequencies from 1 to 40 Hz, for the Channel Coherence of each participant.
The PSD values for participants on REO and REC are shown in Figure 2. There is a clear numerical difference between participants for both REO and REC. For REO, the values at 1-10 Hz are higher than those in other frequency ranges for each channel, and for REC, the values at 1-15 Hz are higher than those in other frequency ranges. Figures 3 and 4 lead to a similar conclusion: the feature values at 1-10 Hz are much higher than those in other frequency ranges for REO and REC. For the same participant, the difference between REO and REC is small, except in the frequency range of 10-15 Hz, where the values of REC are much higher than those of REO. As for Phase Lags, there are distinct differences between participants, and the values around 10 Hz are positive for REO but negative in the same frequency range for REC.
The figures above show the differences in intra-run data visually. In addition, Pearson correlation coefficients (PCC) are calculated to show statistically whether features are similar within a subject and different between subjects. As before, every five trials of each feature were averaged, giving 90 averaged trials. The PCC results of PSD, Cross Spectrum, Channel Coherence, and Phase Lags are shown in Figure 6. The X-axis and Y-axis represent the trials of all participants in the same experiment, and the numbers 1 to 10 are the subject numbers. The coefficient values range from −1 to +1, where values close to 0 represent lower correlation and values close to ±1 represent higher (positive or negative) intra-run correlation. The upper panel of Figure 6 shows the REO condition, and the lower panel shows the REC condition. To make the contrast clearer, the minimum values of the figures were adjusted.
From the figures we can see that the diagonal of each figure, which contains the intra-run correlation coefficients for each subject, shows a stronger correlation than the entries for different subjects, although the four features show intra-run correlation at different levels. The correlation of two features, PSD and Channel Coherence, appears stronger than that of the other two features, and the correlation of Phase Lags is the weakest of the four features for intra-run data.

Classification Results
In this part, the classification results are shown using SVM as the classifier. Ten-fold cross-validation is used to obtain the average accuracies. The three runs are defined as RUN1, RUN2, and RUN3, and we also define four fusion runs (F-RUN), each combining the data of two or three experiments: F-RUN1 consists of the data of RUN1 and RUN2, F-RUN2 of RUN1 and RUN3, F-RUN3 of RUN2 and RUN3, and F-RUN4 of RUN1, RUN2, and RUN3. We divide each F-RUN into a training set and a test set, both of which include part of the data from the two or three runs. Table 2 shows the classification results of the four features (PSD, Cross Spectrum, Channel Coherence, and Phase Lags) for the two protocols, REO and REC, to investigate the stability for intra-run and fusion-runs data.
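The fusion-run evaluation can be sketched with scikit-learn on synthetic data. Everything beyond the F-RUN definitions and the 10-fold cross-validation is an assumption for illustration: the linear kernel, the feature standardization, the helper name `fusion_accuracy`, and the synthetic features, which are not the study's data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Fusion runs as defined in the text.
FUSION_RUNS = {
    "F-RUN1": ("RUN1", "RUN2"),
    "F-RUN2": ("RUN1", "RUN3"),
    "F-RUN3": ("RUN2", "RUN3"),
    "F-RUN4": ("RUN1", "RUN2", "RUN3"),
}


def fusion_accuracy(runs, members, n_splits=10, seed=0):
    """Mean 10-fold accuracy on the pooled trials of the given runs.

    runs: dict name -> (X[n_trials, n_features], y[n_trials]) with subject labels y.
    """
    X = np.concatenate([runs[r][0] for r in members])
    y = np.concatenate([runs[r][1] for r in members])
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))  # kernel is an assumption
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(clf, X, y, cv=cv).mean()


# Synthetic data: 10 subjects, 20 trials per run, 30 features per trial.
rng = np.random.default_rng(0)
n_sub, n_trials, n_feat = 10, 20, 30
subject_means = rng.standard_normal((n_sub, n_feat))
runs = {}
for name in ("RUN1", "RUN2", "RUN3"):
    X = subject_means[:, None, :] + 0.5 * rng.standard_normal((n_sub, n_trials, n_feat))
    runs[name] = (X.reshape(-1, n_feat), np.repeat(np.arange(n_sub), n_trials))

accuracies = {name: fusion_accuracy(runs, members) for name, members in FUSION_RUNS.items()}
```

Because each fold draws trials from all pooled runs, the training and test sets both contain data from every run in the fusion, as described above.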

Intra-Run
The classification results of intra-run and fusion-runs data are obtained using SVM and are shown in Table 2. The lowest accuracy is 80% and the highest is 100%. The accuracies of three features (PSD, Cross Spectrum, and Channel Coherence) are approximately equal for intra-run and fusion-runs data on REO and REC. The classification results of Phase Lags based on REC for fusion-runs data, which only reach 80%, are the lowest in the table. Given the interference of noise, these results suggest that the features used in this work are distinct between subjects for intra-run and fusion-runs data.

Inter-Runs
The primary task of this work is to assess the stability and time-robustness of each feature for inter-runs EEG data. We therefore test the inter-runs features separately.
In this part, we mainly show the results of inter-runs classification. We define three conditions to investigate the time-robustness and stability of inter-runs features independently: (1) RUN1 and RUN2 as the training and validation sets, and RUN3 as the test set; (2) RUN1 and RUN3 as the training and validation sets, and RUN2 as the test set; and (3) RUN2 and RUN3 as the training and validation sets, and RUN1 as the test set. We name these COND1, COND2, and COND3, respectively, and use the abbreviations below. The SVM classifier is used for all three conditions to show the stability of inter-runs features. The classification results of the different features for inter-runs data, based on REO and REC, are shown in Tables 3-6. Thirteen frequency ranges were chosen, as shown in the tables. Four familiar frequency bands related to brain activity are used: θ (4-7 Hz), α (8-13 Hz), β (13-20 Hz and 20-30 Hz), and part of γ (30-40 Hz). The classification results of combined ranges (4-20, 4-30, 8-20, 8-30, 8-40, 13-30, and 13-40 Hz) are also calculated, and the result of the original range (1-40 Hz) serves as a benchmark for comparison. Table 3 shows the classification results of inter-runs PSD for REO and REC. The results at 4-7 Hz are the lowest for the three conditions (1, 18, and 20% on REO), and the highest average result is at 13-40 Hz on REO, where accuracy reaches up to 84%. The results at 4-7 Hz, 8-13 Hz, and 4-20 Hz are lower than the results at 1-40 Hz, and the results at 4-30 Hz are approximately equal to the results at 1-40 Hz.
The results at 13-20 Hz, 20-30 Hz, and 30-40 Hz are higher than the results at 1-40 Hz, which suggests that these three ranges contain inherent information about the difference between participants. Next, compared with the results at 4-20 Hz and 4-30 Hz, the results at 8-20 Hz and 8-30 Hz are significantly higher. Considering the poor results at 4-7 Hz, we conclude that the 4-7 Hz range of PSD is not stable enough for identification. Compared with the results at 8-30 Hz and 8-40 Hz, the results at 13-30 Hz and 13-40 Hz are higher again. Therefore, we consider 13-40 Hz to be the frequency range of PSD that contains the most stable information for inter-run data. The optimized frequency range for REO is 13-40 Hz, in which the average accuracy reaches 82.33%.
As for REC, the lowest accuracies are at 4-7 Hz (1, 10, and 16% for the three conditions, respectively), and the highest average accuracies are at 8-40 Hz, reaching 80%. Compared with the results of the frequency range at 1-40 Hz, results of frequency ranges at 13-30 Hz and 13-40 Hz, which are higher than the results of 4-20 Hz and 4-30 Hz. As with REO, the frequency range of 4-7 Hz contains less stable information for the inter-runs PSD feature, but unlike REO, the frequency range of 8-13 Hz seems to carry inherent information for identification. Therefore, the classification results of REC suggest that 8-40 Hz contains the most useful information and can be used as the optimized frequency range of PSD for identification. Table 4 shows the classification results of Cross Spectrum for inter-runs data on REO and REC. The accuracies at 1-40 Hz are much lower: around 50% for REO and around 30% for REC. For REO, the lowest accuracies are at 4-7 Hz (2, 10, and 10% for the three conditions, respectively), and the highest average result is at 13-40 Hz, reaching 82.33%. The results at 13-30 Hz and 13-40 Hz are higher than the results at 8-30 Hz and 8-40 Hz, which in turn are higher than the results at 4-30 Hz. As with PSD on REO, the inter-runs frequency ranges of 4-7 Hz and 8-13 Hz are not suitable for individual identification. The frequency range of 13-40 Hz appears to be the optimized range for inter-runs classification.
For Cross Spectrum on REC, the lowest results are at 4-7 Hz (2, 10, and 7%, respectively). The highest results are at 8-40 Hz, the same frequency range as for PSD on REC. The accuracies of Cross Spectrum on REC at 8-30 Hz and 8-40 Hz are higher than those at 13-30 Hz and 13-40 Hz, which in turn are higher than those at 4-20 Hz and 4-30 Hz. Therefore, as with PSD on REC, 8-40 Hz is the optimized range for inter-runs identification; its results are much higher than those at 1-40 Hz, where the accuracies are only 20, 28, and 43% for the three conditions, respectively.
The classification results of Channel Coherence are shown in Table 5. The highest accuracy is 79% for REO and 83% for REC, and the lowest accuracy is less than 10% for both REO and REC. The classification accuracy is lower when the frequency range includes 4-7 Hz, as in 1-40 Hz, 4-20 Hz, 4-30 Hz, and 4-7 Hz. The result at 4-7 Hz is lower than the results of all other frequency ranges, and the results at 4-20 Hz and 4-30 Hz are significantly lower than those at 8-20 Hz and 8-30 Hz. These results show that the 4-7 Hz range contains more irrelevant information than other frequency ranges for REO and REC.
For REO, the results of three frequency ranges (13-20 Hz, 20-30 Hz, and 30-40 Hz) are higher than those of 1-40 Hz for the three conditions, and each of these ranges seems to contain part of the information about individual stability and time-invariance. The combined frequency ranges (13-30 Hz and 13-40 Hz) show higher classification performance than other frequency ranges, with accuracies close to 80% for the three conditions. Therefore, the results indicate that 13-40 Hz is the more appropriate frequency range of REO for inter-runs classification of Channel Coherence in individual identification. For REC, the highest average accuracy is at 8-40 Hz, reaching 80%, and the lowest accuracy is at 4-7 Hz. The frequency range of 8-13 Hz seems to contain more information related to stability and time-invariance for REC than for REO. The appropriate optimized frequency range of REC is 8-40 Hz.
The classification results of Phase Lags are shown in Table 6. All chosen frequency ranges show poor performance for inter-runs classification, and the highest accuracy only reaches 60%, which is much lower than the classification results of the other three features. Unlike the other three features, optimizing the frequency range does not yield satisfactory performance for inter-runs classification. The inter-runs results are also much lower than the results obtained for intra-run and fusion-runs classification, which reach 80% or higher. Therefore, Phase Lags does not appear to be a useful feature of inter-run data for individual identification.
There are some limitations in this study. First, the sample size is relatively small. As this is a pilot study, further work is needed to verify the reliability of the results. Second, sex differences may influence the results, and this will be investigated with a larger sample in a further study.
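The inter-runs protocol (COND1-COND3) combined with frequency-range selection can be sketched as below. The condition definitions and the 1-40 Hz frequency grid follow the text; the linear kernel, the standardization step, fitting directly on the two training runs without a separate validation split, and the helper names are assumptions. The synthetic data are built so that, by construction, the columns at or below 7 Hz drift from run to run; they only illustrate the mechanics and do not reproduce the study's results.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

FREQS = np.arange(1, 41)  # one feature column per frequency point, 1..40 Hz
CONDITIONS = {
    "COND1": (("RUN1", "RUN2"), "RUN3"),
    "COND2": (("RUN1", "RUN3"), "RUN2"),
    "COND3": (("RUN2", "RUN3"), "RUN1"),
}


def band_columns(X, lo, hi):
    """X: (n_trials, n_rows, 40) -> flattened features restricted to lo..hi Hz."""
    keep = (FREQS >= lo) & (FREQS <= hi)
    return X[:, :, keep].reshape(X.shape[0], -1)


def inter_run_accuracy(runs, train_names, test_name, band):
    """Train on two runs, test on the held-out run, using one frequency range."""
    X_tr = np.concatenate([band_columns(runs[r][0], *band) for r in train_names])
    y_tr = np.concatenate([runs[r][1] for r in train_names])
    X_te = band_columns(runs[test_name][0], *band)
    y_te = runs[test_name][1]
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))  # kernel is an assumption
    return clf.fit(X_tr, y_tr).score(X_te, y_te)


# Synthetic data: stable subject signature plus run-specific drift below 8 Hz.
rng = np.random.default_rng(0)
n_sub, n_trials, n_rows = 10, 30, 18
signature = rng.standard_normal((n_sub, n_rows, 40))
runs = {}
for name in ("RUN1", "RUN2", "RUN3"):
    drift = np.zeros((n_sub, n_rows, 40))
    drift[:, :, FREQS <= 7] = 3.0 * rng.standard_normal((n_sub, n_rows, 7))
    X = (signature + drift)[:, None] + 0.5 * rng.standard_normal((n_sub, n_trials, n_rows, 40))
    runs[name] = (X.reshape(-1, n_rows, 40), np.repeat(np.arange(n_sub), n_trials))

results = {
    band: {c: inter_run_accuracy(runs, tr, te, band) for c, (tr, te) in CONDITIONS.items()}
    for band in [(4, 7), (13, 40), (1, 40)]
}
```

Repeating the call over all thirteen frequency ranges and averaging over the three conditions would give a table laid out like Tables 3-6.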

CONCLUSION
In this paper, we analyze the stability and time-robustness of resting-state EEG features for individual identification. Ten participants took part, and three runs were conducted for each participant. The time interval between experiments was at least 2 weeks.
The results show that: (1) Intra-run features based on REO and REC are similar within individuals and different between individuals, and the classification accuracies for intra-run and fusion-runs features on REO and REC are high (80-100%).
(2) For inter-runs classification of REO features, the optimized frequency range is 13-40 Hz for three features: PSD, Cross Spectrum, and Channel Coherence. For inter-runs classification of REC features, the optimized frequency range is 8-40 Hz for the same three features. The classification results of Phase Lags are poor for REO and REC, and this feature does not appear suitable for individual identification.
(3) The results suggest that the PSD, Channel Coherence, and Cross Spectrum features are stable and time-invariant, so they can be used for individual identification and will help to develop a more stable identification system based on EEG data.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Tianjin University. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.