A Deep Learning Approach for Mild Depression Recognition Based on Functional Connectivity Using Electroencephalography

Early detection remains a significant challenge for the treatment of depression. In our work, we proposed a novel approach to mild depression recognition using electroencephalography (EEG). First, we explored abnormal organization in the functional connectivity network of mild depression using graph theory. Second, we proposed a novel classification model for recognizing mild depression. Considering the powerful ability of CNN to process two-dimensional data, we applied CNN separately to the two-dimensional data form of the functional connectivity matrices from five EEG bands (delta, theta, alpha, beta, and gamma). In addition, inspired by recent breakthroughs in the ability of deep recurrent CNNs to classify mental load, we merged the functional connectivity matrices from the three EEG bands that performed the best into a three-channel image to classify mild depression-related and normal EEG signals using the CNN. The results of the graph theory analysis showed that the brain functional network of the mild depression group had a larger characteristic path length and a lower clustering coefficient than the healthy control group, showing deviation from the small-world network. The proposed classification model obtained a classification accuracy of 80.74% for recognizing mild depression. The current study suggests that the combination of a CNN and functional connectivity matrix may provide a promising objective approach for diagnosing mild depression. Deep learning approaches such as this might have the potential to inform clinical practice and aid in research on psychiatric disorders.


INTRODUCTION
Depression is a global public health problem, which has a relatively high lifetime prevalence, ranging from 2 to 15%, and is associated with significant morbidity (Ustun and Chatterji, 2001;Üstün et al., 2004). According to the latest data from the World Health Organization (2017) 1 , more than 300 million people are now living with depression 1 . Presently, the most widely used methods for depression diagnosis are based on Beck's Depression Inventory (BDI), the patient's self-report, the doctor's clinical experience, or some combination thereof. However, the accuracy of this diagnosis is often influenced by the doctor's proficiency and patient's cooperation, both of which are highly subjective. Critically, a subset of depression-mild depressionreceives far less attention than does depression, despite being more common than depression and often increasing in severity over time (Volz and Laux, 2000). This lack of attention leads to missed early detection and treatment and increases the mortality risk and likelihood that mild depression will evolve into major depression (Fogel et al., 2006;Lyness et al., 2006). Additionally, mild depression is not only a mental illness but also often a social problem (Li et al., 2016b). Therefore, studies of methods that might improve the early detection and treatment of mild depression are both necessary and meaningful.
With the emergence of more and more studies of depression using functional brain imaging, it is becoming clear that this psychopathology might be relevant to the distributed properties of large-scale cortical systems across many functionally connected cortical regions (Spence, 1995;Dalgleish, 2004;Wang et al., 2012). Recent studies have provided further evidence that depressed individuals tend to have altered brain functional connectivity, such as significantly decreased functional connectivity between right posterior insula and a series of sensory cortices (Hu et al., 2019), decreased brain activity in the dorsolateral prefrontal cortex, superior temporal gyrus, posterior precuneus, and posterior cingulate (Zhong et al., 2016), and increased subgenual cingulate-thalamic connectivity (Ajilore et al., 2015). To examine the functional connectivity of the brain, various metrics have been used, including coherence, correlation, phase locking value, and phase lag index, among others.
Electroencephalography (EEG) coherence is an effective measure of functional cortical connectivity and is used to calculate linearly dependent interactions between the frequencies of EEG signals derived from two electrodes or brain regions (Andrew and Pfurtscheller, 1996;Pfurtscheller and Andrew, 1999). This measure results in a symmetrical, two-dimensional (2D) matrix. High coherence between two EEG signals reflects synchronized neuronal oscillations (suggesting functional integration between neural populations), while low coherence indicates independently active populations (suggesting functional segregation) (Murias et al., 2007). Using EEG coherence to study the brain activity of patients with depression (Li et al., 2017), as well as those with Alzheimer's disease (AD) (Wada et al., 1998) and Parkinson's disease (Teramoto et al., 2016), has been quite successful. For example, Murias et al. (2007) observed an elevated EEG theta coherence in the frontal and temporal regions of the left hemisphere in individuals with autism spectrum disorder. In a study that used EEG coherence to assess resting state functional connectivity, Teramoto et al. (2016) found a decreased resting state functional connectivity between the frontal and parietal cortices, especially in the left hemisphere, of patients with Parkinson's disease. Yingjie et al. (Li et al., 2015) further used EEG coherence to investigate differences in brain functional networks between patients with depression and healthy controls, while they were processing emotional stimuli. The authors found that global EEG coherence in the gamma band was significantly higher in patients with depression than it was in healthy controls.
Correlation is an alternative method of calculating functional connectivity matrices and is often used to estimate the level of linear dependence between two electrode channels. In a study by Zhang et al. (2018) that measured functional connectivity, the authors calculated the Pearson correlation coefficients of the power spectral densities for the delta, theta, alpha, beta, and gamma bands. Following this, they constructed a square correlation matrix for each participant and each frequency band. Finally, the functional brain networks of healthy controls and patients with major depressive disorder (MDD) were constructed using the correlation matrix. The results revealed that compared to healthy controls, the patients with MDD showed significant randomization of the global network metrics.
A statistical method known as "phase synchronization" can also be used to measure coordinated activation across different brain regions. In biological signals, such as EEG time series, synchronization is measured by calculating the phase locking value (PLV). Mormann et al. (2000) reported on the application of phase synchronization methods to biological time series data representative of brain electrical activity in patients with epilepsy. They observed characteristic spatial and temporal shifts in synchronization that appeared to be strongly related to pathological activity. In particular, the authors reported distinct differences in the degree of synchronization between recordings from seizure-free intervals and those from the intervals just before an impending seizure.
The phase lag index (PLI) is another measure of asymmetry in the distribution of phase differences between two signals. The PLI is an alternative measure of statistical interdependencies between time series that reflects the strength of their coupling by detecting consistent, non-zero phase lag between the two times series. Stam et al. (2007) found that PLI performed well for detecting relative increases in synchronization between the pre-seizure and seizure epochs. In addition, upon analyzing the EEG signals, the authors found that the average PLI in the beta band was significantly lower in patients with AD (15 subjects) than it was in healthy controls (15 subjects).
The current mainstream method for studying brain functional connectivity is to convert a functional connectivity matrix into a graph via graph theory analysis. After completing a characterization of the topological properties of the graph, the clustering coefficient and characteristic path length-two indices that characterize a graph and correspond to the two basic principles of brain functional organization, namely, functional separation and integration (Friston, 2009), respectively-are used to distinguish between patients with neurological disorders and healthy controls. In addition, these two indices can comprehensively reflect the small-world characteristics of the network. Randomization of network topology and distribution of the small-world network architecture have been consistently shown in AD, schizophrenia, and depression. Stam et al. (2006) found that the characteristic path length in AD patients was significantly longer than that in healthy controls and demonstrated that AD is characterized by loss of smallworld network characteristics. Rubinov et al. (2009) found a randomization of the small-world network structure in schizophrenia. One EEG study showed that the loss of small world characteristics in the sleep functional brain network in MDD indicates a disruption of topological organization caused by this disease (Zhang et al., 2011). In addition, Leistedt et al. (2009) reported that the healthy controls' neural network is closer to the ordered part of the rewiring scale, while the depressed patients' brain network during sleep is closer to the random part of the scale. Although changes in brain function connectivity in MDD have been known, the functional brain network structure of mild depression is unclear. Therefore, we analyzed the functional brain network of mild depression through graph theory.
Deep neural networks have recently achieved great success in the widespread application of large-scale image- , video- (Karpathy et al., 2014), and text-based recognition tasks (Hermann et al., 2015;Zhang and Lecun, 2015). Convolutional neural networks (CNNs) lie at the core of the best current architecture processing methods for both image and video data, primarily due to the advantages of CNNs in processing 2D input data (Cecotti and Graser, 2011). CNN has good performance in the field of biological image classification, and the features learned from CNN are often better than handcrafted features (Bello-Cerezo et al., 2019). The studies of Nanni et al. (2019a;2019b) show that ensemble system of handcrafted and learned features can boost the performance of CNN in bioimage classification. In neural signal classification, several studies have used different methods to convert EEG signals into image representations. One, in particular, used the short-term Fourier transformation method to convert EEG time series into 2D images and combined 1D CNNs and stacked autoencoders to classify EEG motor imagery signals (Tabar and Halici, 2016). This approach yielded a 9% improvement over the winning algorithm. A new representation of EEG signals was proposed by Bashivan et al. (2015) that preserves the structure of EEG data across space, time, and frequency bands. In this approach, 3D electrode locations were projected onto a 2D surface using azimuthal equidistant projection, and the spectral power within three prominent frequency bands was extracted for each location. This was then used to form topographical maps that were subsequently combined to form three-channel images. Finally, these three-channel images were input into a deep convolutional recurrent neural network to further classify the EEG signals. Inspired by these studies and the powerful ability of CNNs to process 2D data, we innovatively applied a CNN to the 2D data form of functional connectivity matrices and constructed a classification model for mild depression as a depression recognition method other than graph theory.
In addition, research on facial emotion processing plays a significant role in the study of emotion and cognition in patients with depression. For example, David et al. (Rubinow and Post, 1992) conducted an experiment where 17 patients with depression and 31 healthy controls were asked to recognize seven affective states within images of facial expressions. The results revealed that patients with depression were significantly impaired in their ability to recognize facial affect; specifically, they made significantly (or nearly significantly) fewer correct matches for sad, happy, and interested face items. Similarly, James et al. (Cavanagh and Geisler, 2006) examined mood-relevant emotion processing in individuals with depression using eventrelated potentials. Mixed-model analyses of variance revealed significantly reduced P3 amplitudes and P3 latencies for happy faces in participants with depression. The authors interpreted these findings as providing evidence for a diminished cognitive processing ability during emotion discrimination in individuals with depression. Collectively, these studies reveal differences in the brain activity of patients with depression and healthy individuals during common face-processing tasks. Therefore, in the current study, we used a facial expression paradigm to investigate differences in brain functional connectivity between individuals with mild depression and healthy controls.
In the present study, we first studied the differences in the brain functional network between the mildly depressed group and the normal control group using graph theory. We calculated the four functional connectivity matrices of coherence, correlation, PLV, and PLI and converted them into binary undirected graphs. We calculated the characteristic path length, clustering coefficient, and the small-world properties of the brain functional network of the two groups to investigate the differences in these indices between two groups. Then, we proposed a novel approach aimed at improving the ability to recognize individuals with mild depression. Inspired by the proven utility of CNNs in image processing, we used a CNN to process EEG signals. EEG time series signals were converted into images through calculating functional connectivity matrices, and then these images from the two groups were used to classify individuals with mild depression and healthy controls in a CNN.

Coherence Analysis
Coherence is defined as the spectral cross-correlation between two signals normalized by their power spectrum. Coherence is computed mathematically as: where n is the number of data points in a trial. A and ϕ are the amplitude and phase of the signal, respectively. The numerator term represents the cross-spectral density on a single trial between the signals x and y at frequency f. The denominator represents the square root of the product of the power estimates on a single trial of the signals x and y at frequency f. The coherence can then be concisely defined as: where Sxy(f) is the cross-spectral density of the signal, and Sxx(f ) and Syy(f ) are the power spectral density of signals x and y, respectively. Values of Coh xy will always satisfy 0 ≤ Coh xy (f ) ≤ 1, where 0 represents no coupling, and 1 indicates maximum linear interdependence between two signals.

Correlation Analysis
Correlation (Pearson correlation coefficient) is used to estimate the level of linear dependence between two electrode channels in the time domain. The correlation is given by the following: where Cov(x, y) is the covariance between electrodes x andy, σ x and σ y are the standard deviations of the electrodes x and y, respectively. Corr xy has a value between 1 and −1, where 1 is a total positive linear correlation, 0 is no linear correlation, and −1 is a total negative linear correlation. The greater the absolute value of Corr xy , the stronger the correlation.

Phase Locking Value Analysis
The PLV (Mormann et al., 2000) assumes that the signal amplitude and phase are statistically independent, and thus, only the phase synchronization is used to estimate a possible functional interaction between the EEG signals of two channels. When Ax(.,.) = Ay(.,.) = 1 in Eq. (1), the PLV is obtained (Lachaux et al., 1999), as follows: The value of PLV is between 0 and 1, where 0 represents a lack of synchronization, and 1 represents perfect phase synchronization.

Phase Lag Index Analysis
The PLI (Stam et al., 2007) is a measure of the asymmetry of the distribution of phase differences between two EEG signals and is defined as follows: where n is the number of data points, and φ x (f , k) − φ y (f , k) represents the phase synchronization between the signals in channels x and y at the frequency f. It is essential to know the instantaneous phase of the two signals involved, which can be achieved using the analytical signal based on the Hilbert transform (Bruns, 2004), to compute the phase synchronization.
The PLI xy (f ) ranges between 0 and 1, where 1 indicates perfect phase synchronization and 0 indicates either no coupling or coupling with a phase difference centered around 0 mod π. PLI, contrary to the other methods, basically disregards perfect coupling (zero phase) or phase opposition coupling, which cannot be distinguished from no coupling. Four kinds of metrics, coherence, correlation, PLV, and PLI, were calculated using all 128 EEG channels through the Neurophysiological Biomarker Toolbox 2 . For each trial, four 128 × 128 matrices were obtained (128 was the number of EEG channels). For each kind of metric, we considered the following EEG bands: delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (30-70 Hz).

Complex Network Analysis
Applying graph theoretical analysis to a functional connectivity matrix is to convert the matrix into a binary undirected graph. The functional connectivity matrix can be converted to a graph by considering a threshold T. If the functional connectivity metric between a pair of channels exceeds T, then there is an edge, otherwise there is no edge. The choice of T has an important impact on the constructed graph, i.e., a low T-value will result in a densely connected network, while a large T-value will result in a sparse network. Since there is no unique method to select the appropriate threshold, we studied the entire range of values of T (0 < T < 1, with increments of 0.025 for coherence, correlation, and PLV and 0.005 for PLI) and repeated the full analysis for each value of T.
Once we converted the functional connectivity matrix into a graph, the next step is to characterize the graph in terms of its characteristic path length L and its cluster coefficient C. These two indices correspond to the two basic principles of brain function organization, that is, functional integration and segregation (Friston, 2009). Functional integration reflects the brain's ability to organize and combine information from different brain regions, and could be well measured by the characteristic path length L, which is obtained by calculating the average shortest path length between all pairs of nodes. Functional segregation reflects the ability to process information in a specialized way. Clustering coefficient C is often used to measure functional segregation. "C quantifies the number of connections between the nearest neighbours of a node as proportion of the maximum possible number of connections" (Strogatz, 2001). After clustering coefficients of all nodes are calculated, the average is taken as the clustering coefficient of the network.
To establish the network topology characteristics, the ratios C/Cr and L/Lr are calculated as a function of the threshold T, where Cr and Lr represent the values of C and L for matched random networks. Random networks have the same nodes and connectivity as the original network, whereas their choice of connected nodes is completely random. Following the previous studies (Watts and Strogatz, 1998;Stam et al., 2006;Li et al., 2015) that generated 20 random networks, 20 matched random networks are generated here for each actual network of each subject, and their average Cr and Lr are calculated to compare with the actual network. When values of C/Cr are significantly greater than 1, while the values of L/Lr are close to the value of 1, the small-world network organization is evident. In addition, Humphries et al. (2005) defined a single scalar index S to quantify the "small-world-ness" of a network, S = γ λ = C/C r L/L r , and a value of S greater than 1 is often used as a simple indicator of smallworld organization.
The software used in this study was the Brain Connectivity Toolbox (Rubinov and Sporns, 2010) for calculating two graph theoretical measurements.

Input
We used only the upper triangular part of the 128 × 128 functional connectivity matrices due to the symmetry of matrices and computational efficiency of neural networks. This triangular part contains 8,256 elements, of which 128 are on the diagonal. As the elements on the diagonal are the coherence/correlation/PLV/PLI values between an electrode and itself, we excluded these 128 elements. Since the input to the CNN is generally square, we organized the remaining 8,128 elements into a square and made the size of the square as small as possible. In addition, considering that the functional connectivity metric located in the last part of the upper triangle is calculated based on the electrodes distributed on the face, they are more affected by myoelectricity and ocular electricity. We, thus, organized the 8,128 elements line by line into a 90 × 90 matrix and discarded the last 28 elements, which are calculated based on the eight electrodes E121, E122, E123, E124, E125, E126, E127, E128. The transformation process is depicted in Figure 1. A new 90 × 90 matrix was transformed into a single-channel image and served as the CNN's input. It is worth mentioning that the elements in the functional connectivity matrix are arranged according to the increasing electrode number, and the neighborhood relationship between the electrodes did not represent the neighborhood information of 128 electrodes on HydroCel Geodesic Sensor Net. In other words, the location information of the electrodes was not preserved in the functional connectivity matrix. In addition, studies such as those mentioned in this review (De Aguiar Neto and Rosa, 2019) involve little information on the location of the electrodes when performing depression recognition. Similarly, in our study, the key piece of information for depression recognition is the functional connectivity value between the electrode pairs rather than the location of these values. So the element rearrangement during this transformation process will not have any obvious influence on mild depression recognition.
In addition, inspired by the work of Bashivan et al. (2015), we merged the 90 × 90 single-channel functional connectivity matrices into an image with three channels. We used the three bands with the best classification performance for each functional connectivity matrix as the red, green, and blue channels to generate a three-channel image that was then fed into the CNN (Figure 2).

Architecture
It is known that CNNs are multi-layer neural networks with several convolution-pooling layer pairs and a fully connected output layer (Tabar and Halici, 2016). A standard CNN (LeCun et al., 1990) is designed to recognize the shape in the images. As shown in Figure 3, the CNN model used here mainly included the following types of layers: convolution, pooling, activation function, fully connected layer, and softmax layer. Below, we provide brief descriptions of each of these.
In simulating orientation-selective simple cells in the primary visual cortex of the brain (LeCun et al., 2010), convolution is considered to be an indispensable component of CNN frameworks. The input data x is a tensor with K channels. There are K filters of weights W generating K output y, as shown in Eq. (6). In our network architecture, K = 1 or K = 3, K = 32.
In the present study, we adopted two stacked convolutional layers as the basic structure of the CNN. Each convolutional layer used 32 3 × 3 small filters with a convolution stride of one. Convolution layer inputs were padded with one pixel to preserve the spatial resolution after convolution.
Pooling was also an important operation in the CNN framework. Multiple pooling methods serve to simulate complex cells in the brain's visual cortex (LeCun et al., 2010). In practice, the pooling method known as max pooling has been shown to work better than does average pooling. Specifically, max pooling calculates the max response of each feature map in a patch of size p × p as follows: Max pooling is performed over a 2 × 2-pixel window here with a stride of two. Additionally, we selected a rectified linear unit as an activation function for our CNN model, since it has performed both accurately and quickly in other CNN models (Dahl et al., 2013;Donahue et al., 2014). Its output is given by the following formula: The max pooling layer is followed by a fully connected layer with 512 hidden cells, the last layer of which is a two-way softmax layer. The softmax layer is usually used for classification in CNN, and several classifications use several ways of softmax. In our study, since our goal is to distinguish between depression and normal subjects, that is, two classifications, we used twoway softmax. Softmax changes the output of the original neural network into a probability output to represent the probability that a sample belongs to different categories. Assuming that the output of the original neural network is y 1 , y 2 , . . . y n , then the output after softmax processing is: Frontiers in Neuroscience | www.frontiersin.org FIGURE 2 | Three-channel input of the convolutional neural network (CNN). The three bands with the best classification performance among the five bands (delta, theta, alpha, beta, and gamma) of each type of functional connectivity matrix were used as the red (R), green (G), and blue (B) channels of the image to generate a three-channel image that was then used as input for the CNN.

Training
Most of the code used in the present study was written in Python (version 3.5). The CNN model was trained and tested based on a lightweight library named Lasagne 3 , which was 3 https://github.com/Lasagne/Lasagne used for building and training the neural networks in Theano (Al-Rfou et al., 2016).
The network was trained with the Adam algorithm (Kingma and Ba, 2014), which has demonstrated a competitively fast convergence rate in the training of neural network (Bashivan et al., 2015). The learning rate was 10 −3 , and the decay rates of the first and second moments were 0.9 and 0.999, respectively. The weight initialization method we used was Xavier (Glorot and Bengio, 2010), which has been shown to be effective in network training (Jiao et al., 2018). We also used early stopping to monitor the performance of the model over the validation sets. Dropout  with a probability of 0.5 was used across all fully connected layers. The effectiveness of this dropout to reduce overfitting in deep neural networks with millions of parameters has been shown previously  and in neuroimaging applications specifically (Plis et al., 2014). Because of implicit regularization imposed by smaller convolution filter sizes, the network requires fewer epochs to converge. Given this, our model was trained over 15 epochs with a batch size of 30.

EXPERIMENTS Participants
Fifty-one students (36 males, 15 females) aged between 18 and 24 were recruited from Lanzhou University and participated in the study. All of them had no prior history of psychopathology and had normal or corrected-to-normal vision. All participants were interviewed by psychologists after completing an investigation in the psychological screening system of Lanzhou University. According to psychologists, 24 of them were considered depressed, and the remaining 27 were healthy. Also, participants were asked to finish the Beck Depression Inventory test-II (BDI-II) (Beck et al., 1996) before experiment. Analysis of BDI-II showed that the BDI-II scores in the depression group ranges from 14 to 28, corresponding to mild depression, whereas the BDI-II scores of the healthy group were all lower than 13. In order to ensure that the number of samples was balanced for the two groups, 24 healthy participants were selected to comprise

Materials and Procedures
The stimuli used in the experiment were derived from the China Facial Affective Picture System (Gong et al., 2011), a subsystem in the standardized emotional stimuli picture system. We selected pictures of 45 neutral faces and 15 negative faces, including three each of angry, sad, disgusted, surprised, and fearful faces. The experiment consisted of two blocks, emotional block (Emo_block) and neutral block (Neu_block). Each block contained 15 trials. Each trial in the Emo_block contained an emotional expression and a neutral facial expression picture. Each trial in the Neu_block contained two pictures of neutral facial expressions. Each picture appeared randomly on either the right or left side of the screen. All faces were presented without hair, glasses, beards, or other facial accessories, and the two facial expressions used in each trial were combined into one image and presented on the screen. All 30 images were processed with the Adobe Photoshop CS6 software (Adobe Systems Incorporated, San Jose, CA, United States), and the size (1,280 × 738 pixels, 10.84 × 6.25 cm) and gradation were made uniform (Figure 4). Instructions were displayed on the screen before the beginning of each block. Further, four practice trials identical to the real trials were included to ensure that all participants understood the experimental procedures. Each trial was presented for 6 s. A black background was also presented for 2 s between every two trials. The participants were comfortably seated 60 cm from a 17-inch liquid crystal display monitor with a resolution of 1,024 × 768 and instructed to view each trial freely. A 2-min rest period followed the Neu_block, which was followed by the Emo_block. The whole protocol took approximately 7 min.

EEG Acquisition and Data Preprocessing
Electroencephalography signals were collected using a 128channel HydroCel Geodesic Sensor Net (Electrical Geodesics, Inc.). The EEG electrodes were placed according to the HydroCel Geodesic Sensor Net128 Channel Map (Version 1.0) and referenced to Cz. The impedance of all electrodes was maintained below 60 k (Ferree et al., 2001), and EEG signals were continuously recorded at a sampling frequency of 250 Hz. Because the signal gathered between two trials was not valid, each participant's continuous EEG signals were divided into 30 6-s segments according to marks in the time series. All EEG signals were high-pass filtered at a 0.5-Hz cutoff frequency and low-pass filtered at a 70-Hz cutoff frequency. Eye movement and muscle activity artifacts were discarded using Net Station Waveform Tools. Furthermore, because ocular artifacts are presented in the frequency band between 0 and 16 Hz, they overlap with the alpha rhythm frequency band (8-13 Hz). This study, therefore, used FastICA to eliminate ocular artifacts, as this approach has previously been shown to be effective in delineating between overlapping frequency bands (Hu et al., 2011). We used MATLAB R2010a (Mathworks, Natick, MA) to process all data.

Results of Network Analysis
In this section, our results show a significant difference in the small-world property between the two groups by showing the relationship between the small-world property and the thresholds of the two groups (Small-World Property section); statistical analysis of the mean clustering coefficient for each electrode was performed to determine the specific distribution of functional connectivity differences between the two groups in the brain (Statistical analysis of the mean clustering coefficient for each electrode section). In The Topological Structures of the Networks section, we visualized the functional connectivity differences between the two groups using coherence as an example. Figure 5 shows the mean small-world index S as a function of threshold T for the two groups. For coherence, a significant difference between the two groups was found in the delta band under Neu_block and in the theta band under Emo_block. The value of S in the mildly depressed group was significantly lower than that in the healthy control at most of the T-values points (indicated by asterisks). For the correlation, PLV, and PLI, we found a similar difference only in the delta band under both blocks.

Small-World Property
In addition, from the separate analysis of γ and λ, we found that the γ of both groups was greater than 1, but the difference between the groups was not significant. The λ of both groups was less than 1 and close to 1, indicating that the brain networks of both groups showed small-world characteristics; however, the λ of the mild depression group was significantly higher than that of the healthy control group, showing that the functional brain network of the mild depression group deviated from the smallworld network. Table 2 shows the statistical analysis of the mean clustering coefficient based on independent samples t-tests for each electrode for coherence and PLV under Neu_block and Emo_block. For both metrics, we observed a similar pattern, that is, in the delta band, the clustering coefficient of the mild depression group was significantly higher than that of the healthy control group for the electrodes distributed in the temporal and frontal regions, but the results were exactly the opposite for the electrodes distributed in the central and parietaloccipital regions; in the beta band, the mild depression group had significantly lower clustering coefficient in the parietaloccipital region than the healthy controls under Neu_block and Emo_block, and the difference was mainly found in the right hemisphere. We observed a different pattern for correlation. Differences between the two groups were observed in all five bands, and we listed those common electrodes on which there are differences between the two groups in the five bands in Table 3. Similarly, the mild depression group had a significantly lower clustering coefficient than had healthy controls under Neu_block and Emo_block, and the differences were mainly found in the parietal-occipital region of the right hemisphere. The above statistical analysis was performed at a threshold T of 0.15. For the PLI, no differences between the two groups were found on all five bands.

The Topological Structures of the Networks
In order to visualize the distribution of coherence differences between the two groups, we plotted the topological structures of functional networks between the mild depression group and healthy controls at a threshold of 0.1 in the delta band under Neu_block. Figure 6 was drawn based on the absolute value obtained by subtracting one coherence matrix from another one. When the absolute value was greater than the threshold of 0.1, there would be a connection in the figure. At a threshold of 0.1, the difference in coherence between the two groups in the delta band can be clearly seen. Most of the connections are distributed in the central, temporal regions and parietal-occipital region of the right hemisphere, and a few connections are distributed in the frontal region.

Classification Results
To evaluate the proposed CNN classification model, 24-fold cross-validation was adopted. In each fold, the functional connectivity matrices from one control and one participant with mild depression were used for testing, while matrices from remaining participants were utilized as training data and for validation. In each fold, the functional connectivity matrices of another mild depression and healthy control participants were used for validation. The functional connectivity matrices of the FIGURE 5 | The mean small-world index S as a function of the threshold T for the two groups under Neu_block and Emo_block. For the four metrics of coherence, correlation, phase locking value (PLV), and phase lag index (PLI), the values of S in the mildly depressed group were significantly smaller than those in healthy controls. The title of each subgraph is denoted as "metric_band_block." The horizontal axis presents different thresholds. The values under asterisks (*) denoted the values of S at different thresholds when the difference between two groups was significant at these thresholds. **The value of p smaller than 0.05; *the value of p between 0.05 and 0.1.
Frontiers in Neuroscience | www.frontiersin.org   remaining 44 participants were used for training. This strategy allowed us to avoid records from one participant being divided into both training and test sets and, thus, yielding a falsely high classification accuracy. The classification performance of the validation data was used for selecting the hyperparameters (such as learning rate, learning rate decay, regularization coefficient, number of iterations, weight initialization, etc.) and as a stopping criterion in training to avoid overfitting of the training data.

Classification Performance of the Four Functional Connectivity Matrices Using the CNN
We assessed the classification accuracy of the functional connectivity matrices using our CNN method. Since the initial weights of CNN were randomly initialized and different initial weights will train a slightly different model, we repeated the training and testing procedures nine times and calculated the mean and standard deviation of nine test accuracy for further analysis. Coherence performed best under the Neu_block and Emo_block (77.78% for the two blocks). The second highest classification accuracy was achieved with the PLV (74% in the gamma band for the Neu_block, 73.33% in the delta band for the Emo_block). Correlation yielded a 71.46% accuracy for the Neu_block and a 65.83% accuracy for the Emo_block. The PLI achieved the lowest accuracy for both the Neu_block (63.41%) and the Emo_block (55.60%). The classification accuracy of each functional connectivity matrix is shown in Table 4A.
Receiver operating characteristic (ROC) curves are commonly used to present the results of binary decision problems in machine learning (Davis and Goadrich, 2006). AUC is the area under an ROC curve and has a value between 0 and 1, with a greater AUC value indicating a better classification ability of the model. In order to compare the classification performance of the four input forms more intuitively, ROC curves for four functional connectivity matrices were obtained in the delta, theta, alpha, beta, and gamma bands. These are shown in Figures 7A-E. From these ROC curves, we were able to obtain the same results as those displayed in Table 4A.

Performance Comparison Between Our Method and the Self-Established Baseline
In order to further demonstrate the efficiency of our model, we input the four functional connectivity metrics into the four classic classifiers that are widely used in computer aided detection systems for depression (Hosseinifard et al., 2013;Li et al., 2016a;Cai et al., 2018), namely, BayesNet (BN), logistic regression (LR), k-nearest-neighbor (kNN), and random forest (RF), to classify participants into one of two classes, mild depression or healthy. The accuracy obtained from these four classifiers served as the baseline. These classifiers all used the default parameter values  implemented in Weka (Witten et al., 1999). The values of k used in kNN were 1, 5, and 10. The evaluation method we used herein for classification accuracy also underwent 24-fold crossvalidation. Since the inputs for the above four classifiers were feature vectors, we calculated the mean value of all elements in the functional connectivity matrix for each trial, and the mean value for all trials constituted an N × 1 (N represents the number of trials) vector that was then input into the above four classifiers for classification. We only show the highest classification accuracies among the four classifiers in Table 4B. Table 4B revealed that when functional connectivity metrics were input to the classic classifiers in the form of a feature FIGURE 6 | The results of the distribution which was the difference of the coherence in the delta band. Red nodes represent 128 electrodes, and the red lines between nodes show the difference in coherence at the threshold of 0.1. The thicker the red line is, the more the threshold is exceeded.
vector, the recognition rates obtained were obviously lower than the accuracy of the coherence, correlation, and PLV obtained using the CNN. This fully demonstrated the improvements in classification accuracy that our method provided compared to the self-established baseline when the functional connectivity metrics were input into the CNN in the form of 2D data for classification. However, this improvement did not appear in the PLI of the Emo_block.
Classification Performance of the Three-Channel Functional Connectivity Matrix As described above, for each functional connectivity matrix, we used the three bands of the top three classification performance as the R, G, and B channels to generate a three-channel image. We then input this three-channel image into the CNN for it to learn and classify. The bands used for each functional connectivity matrix are shown in Table 5.
The classification accuracy and ROC curves for the threechannel functional connectivity matrix obtained using the CNN are shown in Table 6A and Figure 7F. Compared with the results of the single channel, the three-channel functional connectivity matrix, which integrated information from three frequency bands, slightly increased the classification accuracy of coherence and correlation in the Neu_block and Emo_block. However, there was no increase in the PLV or PLI accuracy, which performed similarly to a single-channel construct. Figure 8 displayed the comparison of the results of the delta, theta, alpha, beta, and gamma bands, as well as the three-channel coherence that performed the best among the four functional connectivity matrices. The obvious accuracy improvement of coherence through integrating three EEG frequency bands can be observed in Figure 8.   We also applied the three-channel strategy to the functional connectivity metrics in the form of feature vectors. The functional connectivity metrics on the three bands shown in Table 5 formed an N × 3 feature vector (N represents the number of trials, and three represents the three bands) that was input into the four classic classifiers for classification. The classification accuracies (the highest classification accuracy among the four classifiers) are shown in Table 6B. Compared to the accuracy of the single band that was obtained using the classic classifiers shown in Table 4B, no improvement was provided. This demonstrated that the feature vector form of the functional connectivity metrics did not exhibit the advantages that integrating the three channels in the classic classifiers did and further confirmed the benefits of combining the CNN with functional connectivity matrices.

DISCUSSION
In the present study, we first illustrated the abnormal organization of functional connectivity network in mild depression by graph theory. Second, we proposed a novel approach of using a CNN to process functional connectivity matrix data in order to identify individuals with mild depression.

The Differences in the Small-World Network Between the Mildly Depressed Group and the Normal Control Group
Through graph theory, we found that the small-world index of the mild depression group was significantly lower than that of the healthy control group. Small-world networks are characterized by a high clustering coefficient and a short path length (Watts and Strogatz, 1998), while the mild depression group has a lower clustering coefficient and a larger characteristic path length than the healthy control group, indicating that the functional brain networks of the mild depression group deviate from smallworld networks. These findings were consistent with previous neuroimaging studies using graph analysis to study depression (Leistedt et al., 2009). The lower clustering coefficient of the mild depression group implies that the local connectedness of networks in mild depression is relatively spared. In addition, short path lengths ensure effective interregional integrity or prompt information transmission in brain networks, which Bold indicates that the accuracy obtained based on the three-channel functional connectivity matrix is higher than that of single channel in Table 4 (A).
constitutes the basis of cognitive processes. Thus, the increase in the path length associated with the disease may be attributed to the degeneration of fiber bundles for information transmission (Bai et al., 2012). In addition, the statistical analysis results of the clustering coefficients for each electrode showed that the difference between the two groups was mainly found in the parietal-occipital region of the right hemisphere. The study has shown that the right hemisphere is hyperactive in depression (Hecht, 2010). The parietal region and the right occipitotemporal cortex are the locations of the amygdala and hippocampus. Positron emission tomography studies have shown that the resting blood flow of the amygdala in patients with major depression increased by about 6% (Drevets et al., 1992). Jin et al. (2011) observed an abnormal hyperactive amygdala in depressed adolescents compared to healthy controls. In addition, research reported that the hippocampal volume of depressed patients was significantly reduced compared with healthy controls and the hippocampal neurons of these patients atrophy (Lee et al., 2002). Our findings are consistent with these previous results.
However, it is worth noting that, in the delta band, the clustering coefficient of the mild depression group was higher than that of the healthy control group in the frontal area. Some researchers have suggested that one function of prefrontal cortex is to modulate or inhibit amygdala activity (Davidson, 2004). Ochsner et al. (2002) reported a strong inverse relationship between activation of the prefrontal cortex and the amygdala when subjects were requested to voluntarily downregulate their negative affect. Our results and previous findings suggest abnormal frontal activity in mild depression.
Moreover, through the analysis of small-world properties and clustering coefficients, it is found that the difference between the two groups was mainly in the delta band. Knyazev (2012) reported that delta oscillations are prominent only in early human development stages and during slow-wave sleep. In waking adults, delta oscillations are overshadowed by more advanced processes associated with higher-frequency oscillations. However, delta oscillations are more pronounced in pathological states caused by detrimental environmental factors, developmental pathology, or damage to brain tissue, such as depression.

Coherence Recognizes Mild Depression Best
In the present study, we used different methods to construct functional connectivity matrices and compared the performance of four of these to identify cases of mild depression. Our results demonstrated that coherence was most effective in identifying mild depression using a CNN. By using Eqs. (1) and (3), we determined that the PLV was the amplitude-normalized coherence. There are two perspectives on coherence and PLV. Researchers who support the use of the PLV often claim that the phase synchronization reflected by the PLV is more stringent than that for coherence because the latter confuses the consistency of the phase difference with amplitude correlation. This may be true from a mathematical point of view, but one might argue that when there are no amplitude correlations, it would be more "difficult" to get a meaningful non-zero coherence value in the absence of consistent phase differences. For example, when none of the separately observed cross-spectral density estimates has phase synchrony, even if there is perfect amplitude correlation, the expected value of the vector averages will be relatively small. On the other hand, if cross-spectral densities of all individuals are estimated to be strongly phase-synchronized, the expected value of their vector averages is still perceptible even in the absence of amplitude correlations. In addition, those in support of using the coherence also believe that in the case of coherence, the observations with large-amplitude products are given stronger weight, meaning they are favoring those observations that have a higher-quality phase difference estimate. This actually assumes that a higher amplitude reflects a higher signal-to-noise ratio of the source of interest, and thus a better-quality phase estimate (Bastos and Schoffelen, 2016). A better-quality phase estimate may result in coherence achieving better classification performance than the PLV when using a CNN. However, the classification performance of the PLI in our CNN was relatively poor. This may be because the PLI is sensitive to the chosen length of the selected epochs (Fraschini et al., 2016), which was 6 s in the present study. This may have been too short to allow the PLI to stabilize.

The Three-Channel Functional Connectivity Matrix Improved Recognition Performance
Based on the data shown in Tables 4A, 6A, it is clear that the three-channel functional connectivity matrix used herein improved the accuracy of coherence and correlation slightly, achieving an accuracy similar to that of single channels for the PLV and PLI. We suspect that human cognitive processes involve different EEG rhythms, making it reasonable that neural networks learn the information contained across several integrated EEG spectrums. However, this conjecture requires further, future verification using additional data sets.

CONCLUSION AND FUTURE WORK
In summary, the present study first illustrated that some abnormal organizations in the functional connectivity network of patients with depression also appeared in individuals with mild depression. Specifically, compared with healthy controls, the mild depression group has a larger characteristic path length and a lower clustering coefficient, indicating that the brain functional network of mild depression deviated from the smallworld network. Second, we proposed a computer-aided method by which a CNN was used to learn information relevant to the functional connectivity matrices evident in individuals with mild depression such that they could be readily identified. This is an innovative approach other than the existing graph theory for the use of functional connectivity matrices for depression recognition. We primarily considered functional connectivity matrices that reflect altered brain functional connectivity in patients with mental illnesses using a 2D data structure given the advantages of CNNs in processing 2D datasets. The classification results of our method showed that coherence, correlation, and the PLV can effectively recognize mild depression using a CNN and that the recognition performance of coherence was superior to the other functional connectivity metrics, obtaining a classification accuracy of 80.74%. The proposed method can provide an auxiliary diagnosis of mild depression and offers great promise. In the future, we are committed to implement this method as an online depression detection system. Once an individual's EEG signal is collected, it is used to determine whether the individual has mild depression, a disease that is not easily detectable and diagnosable. This approach may be used to improve and hasten the detection of individuals with mild depression, ultimately permitting quicker treatment.
While the present study offers significant benefits, it has some limitations that warrant discussion. First, in addition to the functional connectivity matrices used in the present study, there are other various connectivity metrics available, such as the imaginary part of coherency (Nolte et al., 2004), partial directed coherence (Baccalá and Sameshima, 2001), directed transfer function (Kamiñski et al., 2001), phase slope index (Nolte et al., 2008), and Geweke's extension of Granger causality to the frequency domain (Brovelli et al., 2004). Further investigation of these metrics and their ability to detect the signatures of mental illnesses such as depression must continue in the future. Second, we used the functional connectivity metrics generated by all 128 pairs of electrodes available to us here. It is necessary to further explore which electrodes most obviously dictate functional connectivity matrix differences between healthy control individuals and those with mild depression. This would allow greater reductions to the number of electrodes used for classification and pave the way for real-time, online depression detection.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Lanzhou University Second Hospital. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XL and RL conceived the project idea. BH supervised the project. XZ provided critical suggestions for the statistical analysis. RL and YW performed the experiments and surveyed the literature, to check whether our findings were consistent with previous publications, and wrote the manuscript.