Statistical Analysis of Graph-Theoretic Indices to Study EEG-TMS Connectivity in Patients With Depression

Aim The objective of this work was to demonstrate the usefulness of a novel statistical method to study the impact of transcranial magnetic stimulation (TMS) on brain connectivity in patients with depression using different stimulation protocols, i.e., 1 Hz repetitive TMS over the right dorsolateral prefrontal cortex (DLPFC) (protocol G1), 10 Hz repetitive TMS over the left DLPFC (G2), and intermittent theta burst stimulation (iTBS) consisting of three 50 Hz burst bundle repeated at 5 Hz frequency (G3). Methods Electroencephalography (EEG) connectivity analysis was performed using Directed Transfer Function (DTF) and a set of 21 indices based on graph theory. The statistical analysis of graph-theoretic indices consisted of a combination of the k-NN rule, the leave-one-out method, and a statistical test using a 2 × 2 contingency table. Results Our new statistical approach allowed for selection of the best set of graph-based indices derived from DTF, and for differentiation between conditions (i.e., before and after TMS) and between TMS protocols. The effects of TMS was found to differ based on frequency band. Conclusion A set of four brain asymmetry measures were particularly useful to study protocol- and frequency-dependent effects of TMS on brain connectivity. Significance The new approach would allow for better evaluation of the therapeutic effects of TMS and choice of the most appropriate stimulation protocol.

Transcranial magnetic stimulation (TMS) is a noninvasive neuromodulation technique that is applied as an alternative therapy in drug-resistant patients to restore impaired neural connections. The combination of TMS and electroencephalography (EEG) allows for the non-invasive study of changes in brain connectivity (Shafi and Pascual-Leone, 2012).
Recent studies examining connectivity using EEG data recorded in patients with depression have predominantly used non-directional measures (i.e., coherence; Leuchter et al., 2012), various measures of phase synchronization (e.g., Phase-Locking Value; Zuchowicz et al., 2019), Phase Lag Index (Olbrich et al., 2014), Katz's and Higuchi's fractal dimensions, and other nonlinear methods or their combinations (Acharya et al., 2015). The application of directional measures, such as Directed Transfer Function (DTF) (Kaminski and Blinowska, 1991), allows for better understanding of the mechanisms involved in neural network re-organization associated with changes in brain states in different conditions among healthy persons or patients suffering from psychiatric disorders. To our knowledge, only one study has applied DTF and graph-theoretic indices to study the effects of TMS on functional connectivity in depression using a standard stimulation protocol, i.e., 10 Hz repetitive TMS (rTMS) applied over the left dorsolateral prefrontal cortex (DLPFC) (Olejarczyk et al., 2020). Other stimulation protocols have been used in only a few other studies. In particular, previous research has shown that the application of 1 Hz rTMS over the right DLPFC causes a shift in frontal alpha power asymmetry toward the right hemisphere (Valiulis et al., 2012).
Most prior studies have performed statistical analyses on indices based on graph theory using analysis of variance (ANOVA) or a non-parametric test equivalent. Some authors have applied various classifiers to differentiate between patients with depression and controls by applying a combination of different non-linear connectivity methods to extract features from EEG signals (Acharya et al., 2015). However, to-date, none of the classifiers used in statistical analyses have utilized a wide range of graph-theoretic indices derived from the adjacency matrices of a connectivity measure.
In the present study, we compared the results of the connectivity analysis evaluated by DTF and graph-theoretic indices using three different TMS protocols: (1) 10 Hz rTMS over the left DLPFC, (2) 1 Hz rTMS over the right DLPFC, and (3) intermittent theta burst stimulation (iTBS). Moreover, we applied a novel approach to the statistical analysis of graph-theoretic indices using the k-NN rule combined with the leave-one-out method and a statistical test using a 2 × 2 contingency table. This approach is novel because it calculates the significance level for the dependence between the class label and a feature vector to evaluate whether this dependence exists at all. This is particularly important in the case of high misclassification rates, which is relevant for the present study.
The primary objective of this work was to apply a novel method of statistical analysis (Harnisz et al., 2020). The usefulness of this method was demonstrated by analyzing the impact of various TMS protocols on EEG connectivity patterns in patients with depression. The advantage of the proposed method is the ability to identify statistically significant differences between conditions or stimulation protocols using a vector comprised of the optimal combination of features found among all analyzed features. This approach is in contrast to non-parametric tests, which rely on a single feature.

Patient Recruitment and Diagnosis Based on Psychiatric Tests
EEG data were collected in the Republican Vilnius Psychiatric Hospital (RVPH) following approval by the local ethics committee. TMS was applied to treat patients who were diagnosed with drug-resistant depression. One hundred twentysix patients with a drug-resistant depressive disorder were recruited from the RVPH. Eligible participants ranged in age from 18 to 75 years. Participants were split into three treatment groups, and demographic data for the three treatment groups are reported in Table 1. The three groups consisted of: (1) 35 patients (30 females, 5 males) who underwent 1 Hz rTMS over the right DLPFC, (2) 77 patients (55 females, 22 males) who underwent 10 Hz rTMS over the left DLPFC, and (3) 14 patients (11 females, 3 males) who underwent iTBS. All patients were screened according to the Diagnostic Criteria for Research (DCR) of International Classification of Diseases-tenth edition (ICD-10). All participants provided written informed consent prior to taking part in this study. Inclusion criteria for patients with depression were as follows: (a) history of at least one previous affective episode, according to the Diagnostic Criteria for Research of ICD-10, and (b) a total score greater than 18 on the 17-item Hamilton Depression Rating Scale (HAM-D; Hamilton, 1960), and (c) resistance to at least two different antidepressant medications or their combinations. There were no gender restrictions; however, we did aim to achieve a balance of genders during recruitment. Exclusion criteria were as follows: (a) history of medical condition(s) that could entail cognitive deterioration, (b) history of epilepsy, (c) recent drug or alcohol abuse, (d) pregnancy, (e) diagnosis of a learning disability, (f) inability to understand or complete the cognitive assessment, or (g) any other contraindication(s) for TMS. After participants were enrolled into the study, previously administered antidepressant treatments were maintained. However, psychotropic medications that can affect the stimulation threshold-particularly benzodiazepines-were tapered. Psychotropic drug doses were required to be stable for at least 2 weeks prior to initiating the first TMS session and for the entire duration of the treatment.

TMS
TMS procedures were applied using the MagVenture Magpro X100 TMS stimulator with the MagVenture Cool Coil B65 liquid cooled figure-eight coil. 280 µs biphasic impulses were used during the stimulation.
TMS is a broad term that includes different types of stimulations, such as single pulse stimulation for motor threshold evaluation or therapeutic rTMS. iTBS is one type of rTMS protocol (Schwippel et al., 2019). As a comparison to a commonly used 10 Hz rTMS (G2), we applied two other protocols: iTBS over the left DLPFC (G3), and 1 Hz rTMS over the right DLPFC (G1). Physiologically, iTBS is expected to facilitate the activity of the stimulation site by delivering 20 excitatory stimulation trains, which is similar to the 10 Hz condition. However, iTBS is expected to facilitate this activity in a shorter time period as compared to the 10 Hz condition (i.e., 3 min vs. > 15 min). A shorter duration of stimulation, and thus more comfort to the patient, is the main advantage of this protocol.
The intensity of rTMS was set according to the resting motor threshold. The resting motor threshold was defined as the lowest intensity needed to generate a visible twitch of the thumb of the participants' relaxed hand. Stimulation intensities were set at 100% for high frequency rTMS over the left DLPFC, 120% for low frequency rTMS over the right DLPFC, and 80% for iTBS rTMS over the left DLPFC. All rTMS frequencies greater than or equal to 5 Hz are considered high frequency (HF), whereas all rTMS frequencies less than or equal to 1 Hz are considered low frequency (LF).
HF rTMS consisted of twenty 10 Hz stimulation trains that each lasted 8 s and were spaced at 40-s intervals (1,600 impulses in total) (Rossi et al., 2009). LF rTMS consisted of 1 Hz stimulation in 2 trains, each lasting for 60 s, and spaced at 3-min intervals (120 impulses in total) (Klein et al., 1999). iTBS consisted of 2 s 50 Hz three-pulse bursts that were presented at 5 Hz frequency, applied in 20 trains and spaced at 8-s intervals (600 impulses in total) (Huang et al., 2011;Schwippel et al., 2019). The stimulation site was chosen by the psychiatrist according to the prevailing symptoms of depression. In particular, the left DLPFC was chosen in the case of adynamic depression, i.e., depression with dominating apathy symptoms, whereas the right DLPFC was chosen in case of anxious depression (Neznamov et al., 2001;Hunter et al., 2019;Minzenberg and Leuchter, 2019;Zorzo et al., 2019;Balderston et al., 2020). Right DLPFC activity seems to be directly linked to the manifestation of anxiety (Balderston et al., 2020). Indeed, 1 Hz rTMS over the right DLPFC is often used to treat anxious depression, as well as anxiety disorders and posttraumatic stress disorder (PTSD) (Zorzo et al., 2019). The HF rTMS and iTBS stimulation protocols were selected randomly. The imbalance in number of participants across the two stimulation sites (see Table 1) is due to practical limitations. In particular, the duration of classical 1 Hz protocol is shorter than the 10 Hz protocol, and thus does not require a shortened TBS alternative. All patients underwent 10-30 procedures.

EEG Registration and Preprocessing
EEG signals were recorded for 10 min during an eye-closed resting-state condition prior to rTMS and again 20-30 min after the last rTMS session. EEG data were collected in an electrically shielded booth using EBNeuro Galileo Mizar apparatus. A cap with twenty round bridge type Ag/AgCl electrodes was placed according to the international 10-20 system. The Fpz electrode was used as the ground electrode and ear electrodes were used as the reference. EEG recordings were filtered with low frequency (0.53 Hz), high frequency (70 Hz), and notch (50 Hz) filters. Data were digitized at 256 Hz at 12 bits. For further analysis, 30 s EEG intervals without artifacts before and after stimulation in all 126 patients with depression were chosen by an expert. In this study we decided to perform rather the analysis of between-subject than within-subject effects.

Connectivity Analysis
The DTF is a directed linear connectivity method that is based on Granger Causality but defined in the frequency domain. DTF allows one to determine the localization of sources and the direction of EEG activity propagation (Kaminski and Blinowska, 1991). The model order was estimated using the Akaike information criterion (AIC) defined as follows: where V is the noise variance matrix, N is the window size, d is the model order, and k is the number of EEG channels.
In the calculation of DTF, the following relation should be satisfied to guarantee the quality of fitting of the model (Blinowska and Kaminski, 2006): k * d < 0.1 * N. In the present study, the model order d was set to 10, the number of EEG channels k was 20, and the window size was 256 Hz * 30 s = 7,680 samples. Thus, the above relation was satisfied. The DTF was calculated using HERMES software (Niso et al., 2013) 1 . The significance of DTF values was determined using surrogate data analysis. The amplitudes of signal were maintained, while the frequency relationships were modified by shuffling of the data in frequency domain. The signal values were then obtained again from surrogate data that was transformed to the time domain. Significance of the values was evaluated by comparing the results from the original signal with results obtained from surrogate data. The number of surrogates was set to 100. Adjacency matrices were obtained from data with a p-value below the 0.05 threshold.
The matrices were analyzed using graph-theoretic indices. All indices derived from matrices of adjacency for DTF were calculated independently for each of the 126 patients separately for five frequency bands and for 10 thresholds, i.e., 10 values obtained from the matrices that contained 0-90% of the strongest connections. The indices dependent on the EEG channel were averaged across all twenty channels.
A disadvantage of the analysis of graph-theoretic indices is their dependence on degree index. Results are typically reported as graphs of index values as a function of the threshold Olejarczyk and Jernajczyk, 2017), or as index values averaged across all thresholds (Olejarczyk et al., 2020). In this paper, we propose to include the values of indices calculated for each of 10 thresholds in the classification step instead to take an average value. The k-NN classifier can automatically select the thresholds for which the indices yield the best classification results. However, frequently, an average threshold does not allow for the separation of two populations despite the presence of significant differences between indices for some of thresholds. This may also occur if only an increasing or decreasing trend exists between indices across a larger range of thresholds.

Novel Approach in Statistical Analysis of Graph-Theoretical Indices
We examined differences in the TMS effect between two conditions (i.e., before and after TMS), as well as, differences between the three stimulation protocols (i.e., G1, G2, G3) separately, across five frequency bands.
The error rate estimated by the leave-one-out method was used as a feature selection criterion. The data matrices for the feature selection contained 700, 1,540, and 280 rows for G1, G2, and G3, respectively. These sizes were obtained according to the following formula: number of patients × 10 thresholds × 2 conditions. The first column of this matrix contained the class label, which was either the TMS condition (pre, post) or the protocol number (G1, G2, G3). Next, there were 21 columns that corresponded to each of the 21 features.
We proposed a novel approach in the statistical analysis of graph-theoretic indices. Our approach combines two methods. One method was adapted from an area known as Pattern Recognition, and the other was a non-parametric test based on a 2 × 2 contingency table: chi-square, V-square, chi-square with Yates correction, or Fisher's exact test (Stanisz, 2006). To form the 2 × 2 contingency table, we used the k nearest neighbor (k-NN) rule, which is the most popular method in Pattern Recognition (Cover and Hart, 1967). The k-NN rule is also considered to be one of the most powerful rules and offers very good performance (Carpenter and Grossberg, 1996).
The k-NN rule is simple to implement and does not require a training session if the feature set is established. The main disadvantage of the k-NN rule is a slow classification phase. The k-NN rule is used in the classification of objects, described by some features that assume numerical values. A k-NN classification rule (or "classifier") is based on a reference set that contains objects with known class membership. The object to be classified can be any object that is outside of the reference set. This object is assigned to the class that is most heavily represented from among its k nearest neighbors in the reference set, for example, using a measure of Euclidean distance. The value of k can be established experimentally or taken as a small natural number. In the present study, we assumed k = 3, as suggested by Wilson (1972). An ideal classifier is a very rare case, and we need to account for potential misclassifications. In 1965, Lachenbruch (1965) first described the leave-one-out method-a method for estimating the error rate in small data sets. This technique was also described in 1982 by Devijver and Kittler (1982). The leaveone-out method allows for the classification of each object, x, in the reference set, R, using the k-NN rule, which is based on a reference set R-{x}, i.e., the reference set minus the object being currently classified. If for r objects the assigned class differs from the true one, then an error rate will be err = r/N, where N denotes a number of objects in the reference set. In addition to calculating the error rate, a confusion matrix (defined in Table 2) can be computed.
The confusion matrix is typically used to calculate sensitivity and specificity of the classification. In our case, the confusion matrix is used to calculate the significance level using the chisquare test or one of its modifications. Recommendations for choosing the correct modification can be found in the Statistica software manual (Stanisz, 2006). The significance level calculated on the basis of the aforementioned contingency table concerns the dependence between two qualitative variables: the true class and the assigned class. Both variables (i.e., the true class and the assigned class) assume values of either 1 or 2. Generally, the assigned class represents the "voice" of features. Thus, the obtained significance level can be treated as concerned to the dependence of the true class on the analyzed set of features.
Not all features can be easily matched to available classes. Some features can be redundant or correlated, which can lead to an increased error rate. Therefore, it is reasonable to perform feature selection, which consists of reviewing various feature combinations. There are two main feature selection procedures: (1) forward feature selection and (2) backward feature selection. The former consists of choosing a single feature that offers the lowest error rate and then sequentially adding additional features if the error rate subsequently decreases. The latter starts with a complete set of features and sequentially rejects the single feature if this action results in a decrease in the misclassification rate. In the present study, we applied the forward feature selection procedure.
All steps of the data analysis are shown in the flow chart provided in Figure 1. The first step of the statistical analysis was to select the best set of features for each of the 15 classes that were formed from all possible combinations of the 3 protocols and 5 frequency bands. Then, an appropriate non-parametric test with Bonferroni correction based on 2 × 2 contingency table was performed to test for statistically significant differences between two conditions (i.e., pre-and post-TMS) for vectors of selected features in each of these combinations.
The purpose of the second step was to find possible interactions between the 15 classes that were previously considered. Non-parametric tests, such as Kruskal-Wallis or Mann-Whitney U tests, are equivalent to a one-way analysis of variance for independent variables. Therefore, these nonparametric tests allow for the comparison between different pairs of protocols for a given band, or different pairs of bands for a given protocol. In contrast to non-parametric tests which analyzes individual features, the k-NN classification analyzes a vector of features. As a result of the k-NN classification,

Contingency table
Assigned to class 1 Assigned to class 2 True class 1 r11 r12 True class 2 r21 r22 the confusion matrix for 15 classes was obtained using all 11 features that were selected in the first step of analysis, i.e., all features that appeared in Table 3. Both conditions (i.e., preand post-TMS) were considered in this classification due to the relatively limited dataset. Then, an appropriate non-parametric test with Bonferroni correction was applied based on 2 × 2 contingency table. The non-parametric test aimed to identify statistically significant differences between each pair of bands in each stimulation protocol, and each pair of protocols in every frequency band relative to the set of 11 features.

Feature Selection Results
The feature selection and condition (i.e., before and after TMS) classification was performed for every pair of stimulation protocols (G1, G2, G3) and frequency bands (δ, θ, α, β, γ). Altogether, 11 of the 21 features were selected, including the asymmetry measures (DFP, IFP, DLR, ILR), clustering coefficient (C), global efficiency (E global ), and the assortativity coefficients (add, aio, aoi, aoo, aii). The sets of selected features were found independently for every pair of stimulation protocol and frequency band. Each set starts from a feature that offers the lowest error rate. Further features were included in such a way that the new set would allow to reduce maximally the error rate. The selected features, the error rate, the confusion matrix for the factor CONDITION and the corresponding p-values are reported in Table 3.
Using the non-parametric tests based on 2 × 2 contingency table, we found p-values that were less than 0.0001 for each contingency table presented in Table 3. Due to the observed high error rates (i.e., ranging from 0.161 to 0.318), we decided to investigate whether there was, in fact, a dependence between the class labels and sets of features. Out of 21 features, the forward feature selection required a review of 231 feature combinations. Accounting for Bonferroni correction, the results can be considered statistically significant if p < 0.0002 (i.e., 0.05/231). Thus, the sets of selected features allowed for the identification of statistically significant differences between conditions for each stimulation protocol, and in each frequency band. It is worth noting that each of the asymmetry measures appeared in sets of selected features for each stimulation protocol in higher frequency bands (i.e., alpha, beta, and gamma). Clustering coefficient, in contrast, was only observed in alpha and gamma bands. The sensitivity and specificity of our method ranged from 0.68 to 0.87, and from 0.68 to 0.83, respectively. Sensitivity and specificity for detecting stimulation impact was calculated from the confusion matrices reported in Table 3.

Classification of Groups
The set of 11 features previously selected were used for the classification of 15 classes that were formed from all possible combinations of the three stimulation protocols and five frequency bands. The results of this classification are reported in Table 4.
The values reported in Table 4 were used to construct 2 × 2 contingency tables for every possible pair of frequency bands in each stimulation protocol, and for every possible pair of stimulation protocols in each frequency band. The 2 × 2 contingency tables tested for statistically significant differences between bands for individual stimulation protocols, and for differences between protocols in each frequency band. The observed p-values were less than 0.0001 for each contingency table. Thus, there were statistically significant differences between each pair of bands in each stimulation protocol, as well as, between each pair of protocols in each frequency band.

DISCUSSION
The published literature predominantly applies ANOVA or nonparametric equivalents to assess indices based on graph theory. However, ANOVA assumes that the variables are normally distributed and if not, non-parametric tests are applied. Nonparametric tests are equivalent to a one-way ANOVA in that a single index can be considered. Here, we observed that a specific set of features (i.e., indices) may allow for better differentiation between conditions as compared to relying on a single feature (Olejarczyk et al., 2012 The values on the diagonal are marked in bold. ranges of features. Nevertheless, it is possible to separate these conditions by using a distance between the neighbors as a selection criterion. Moreover, the application of the k-NN rule allows for the identification of the best set of features without assuming that the features are normally distributed. We chose the leave-one-out method because it is recommended for limited data sets, such as in the present study. Finally, the confusion matrix and chi-square test (or its modification) were used to calculate a significance level for the dependence between two qualitative variables: the true class and the assigned class. The assigned class represents the "voice" of the feature set. The last step is a completely new and universal approach that can be applied in other situations, such as the normal/Gaussian distribution of features or larger data sets. This approach can also be applied in conjunction with other modern classifiers Rafiei and Adeli, 2017).
There are a lot of connectivity measures and indices based on graph theory (Rubinov and Sporns, 2010;Niso et al., 2013). Their usefulness depends on a particular application. In this study we were interested in finding a set of features (indices) allowing for characterizing the changes in brain connectivity evoked by TMS in patients with depression. We have found that 11 from 21 indices were useful for this purpose. They allowed for differentiation between 15 classes being a combination of 3 protocols and 5 frequency bands. Then, we found a set of the best indices for every class separately. These sets can be used in clinical practice for evaluation of the TMS effect for a given protocol or to find differences between protocols. Interestingly, four asymmetry indices were common for three protocols in higher frequency bands (alpha, beta and gamma). This finding is in line with a known characteristic of depression, i.e., alpha hypoactivity of the left and hyperactivity of the right frontal areas during the resting state (Henriques and Davidson, 1991;Davidson, 1998;American Psychiatric Association, 2013;Allen and Reznik, 2015;Mumtaz et al., 2015;Van der Vinne et al., 2017;Kustubayeva et al., 2020). The results of our study showed that also the fronto-posterior asymmetry is important in depression. Moreover, the therapy effect depends on the protocol choice. Thus, the evaluation of initial state of patient with depression by these indices could be helpful for decision support. However, future studies should be performed for more epochs, in a larger sample size with a more balanced groups of patients to test the usefulness of this new methodological approach in clinical practice.

CONCLUSION
Here, we applied a novel approach to perform statistical analysis of graph-theoretic indices using the k-NN rule combined with the leave-one-out method and a statistical test for the 2 × 2 contingency table. This method allowed for the selection of the best feature set to study protocol-and frequency-dependent effects of TMS on brain connectivity.
The main purpose of this study was to develop a new method of statistical analysis of graph-based indices derived from adjacency matrices of DTF using EEG data recorded in patients with depression before and after TMS. However, this is a general method that could be applied with other data as well.

DATA AVAILABILITY STATEMENT
The datasets of this manuscript are not publicly available owing to the policy of Republican Vilnius Psychiatric Hospital.
Requests to access the datasets should be directed to VV, vladas.valiulis@gmail.com.

ETHICS STATEMENT
The approval of the protocol was obtained from the Ethics Committee of Republican Vilnius Psychiatric Hospital and a written informed consent was provided by all patients.