Diagnosis of Brain Diseases via Multi-Scale Time-Series Model

Functional magnetic resonance imaging (fMRI) data and brain network analysis have been widely applied to the automated diagnosis of neural or brain diseases. fMRI time series data not only contain specific numerical information but also involve rich dynamic temporal information. However, previous graph theory approaches focus on local topological structure and lose contextual information and global fluctuation information. Here, we propose a novel multi-scale functional connectivity method for identifying brain disease via fMRI data. We calculate the discrete probability distribution of co-activity between different brain regions over various intervals. We also consider nonsynchronous information under different time dimensions in order to analyze the contextual information in the fMRI data. Therefore, our proposed method can be applied to the diagnosis of more diseases and to other fMRI data, particularly the automated diagnosis of neural or brain diseases. Finally, we apply a Support Vector Machine (SVM) to our proposed time-series features, which can be used for brain disease classification and, more generally, for other time-series data. Experimental results verify the effectiveness of our proposed method compared with other outstanding approaches on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and a Major Depressive Disorder (MDD) dataset. We thus provide an efficient system that offers a novel perspective for studying brain networks.


INTRODUCTION
The functional Magnetic Resonance Imaging (fMRI) technique provides an opportunity to quantify functional integration via measuring the correlation between intrinsic Blood-Oxygen-Level-Dependent (BOLD) signal fluctuations of distributed brain regions at rest. The BOLD signal is sensitive to spontaneous neural activity within brain regions, thus it can be used as an efficient and noninvasive way for investigating neurological disorders at the whole-brain level. Functional connectivity (FC), defined as the temporal correlation of BOLD signals in different brain regions, can exhibit how structurally segregated and functionally specialized brain regions interact with each other. Therefore, the brain network analysis using fMRI data will provide great advantages to automated diagnosis of neural diseases or brain diseases.
Some researchers model the FC information as a specific network by using graph theoretic techniques. Differences between normal and disrupted FC networks caused by pathological attacks provide important biomarkers for understanding pathological underpinnings, in terms of topological structure and connection strength. Network analysis has become an increasingly useful tool for understanding the cerebral working mechanism and mining sensitive biomarkers for neural or mental diseases. Zeng et al. (2018) propose a new switching delayed particle swarm optimization (SDPSO) algorithm to optimize the SVM parameters. Using graph theory, brain network analysis provides an effective way to concisely quantify the connectivity properties of brain networks, where each node denotes a particular anatomical element or brain region, and each edge represents the relationship between a pair of nodes, such as an anatomical, functional or effective connection (Friston, 2011). The anatomical connection typically corresponds to white matter tracts between pairs of brain regions. The functional connection corresponds to the magnitude of temporal correlations in activity and may occur between pairs of anatomically unconnected regions; it may reflect linear or nonlinear interactions, as well as interactions on different time scales (Zhou et al., 2009). The effective connection represents direct or indirect causal influences of one region on another, which may be estimated from observed perturbations, whether synchronous or asynchronous (Friston et al., 2003). As a brain network analysis approach, graph theory offers two important advantages (Tijms et al., 2013). One is that it provides quantitative measurements that preserve the connectivity information in the network and thus reflect the segregated and integrated nature of local brain activity. The other is that it provides a general framework for comparing heterogeneous graphs constructed from different types of data, such as anatomical and functional data.
However, these graph theory approaches have several drawbacks that must be overcome. First, graph theory features themselves are limited. On the one hand, common graph theory features such as edge weights, path lengths and clustering coefficients (Rubinov and Sporns, 2010; Chen et al., 2011) usually focus on local topological structure and lose global topological characteristics (Sanz-Arigita et al., 2010; Jie et al., 2018); on the other hand, each node in a brain network corresponds uniquely to a specific brain region, yet the label information of each node is mostly ignored (Jie et al., 2018). Second, functional connectivity is more sensitive to local information than to global topology, but some recent studies (Hutchison et al., 2013; Leonardi et al., 2013; Zeng et al., 2013, 2014; Allen et al., 2014) indicate that the FC network contains rich dynamic temporal information. To be more concrete, for each brain region, a sliding window approach has been used to generate a set of BOLD subseries for schizophrenia diagnosis and others (Wee et al., 2016). Third, the raw functional data are underutilized: building a brain network from raw data may lose temporal or contextual information. For example, Pearson's Correlation Coefficient (PCC), the simplest and most commonly used scheme in functional connectivity estimation, is the covariance of two variables divided by the product of their standard deviations. By this definition, the PCC value is context-independent, or order-independent, with respect to the time series, and does not consider nonsynchronous information under different time dimensions.
In view of the above, the fMRI time series not only contains specific numerical information but also involves contextual information and global fluctuation information. In this paper, we propose a novel time-series model based on Jensen-Shannon divergence for identifying brain disease via fMRI data; the flow chart is shown in Figure 1. First, we calculate the discrete probability distribution of co-activity between different brain regions over various intervals in multi-scale time series data. Second, contextual information is taken into account when analyzing the correlation and causality among the fMRI data. Third, we design a novel time-series-based method to measure the similarity between two objects in terms of the co-activity intensity of brain functional connectivity. Finally, we apply a Support Vector Machine (SVM) to our proposed time-series features, which can be used for brain disease classification and, more generally, for other time-series data. Experimental results verify the effectiveness of our proposed method compared with other outstanding approaches on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and a Major Depressive Disorder (MDD) dataset. The rest of this paper is organized as follows. We start with a brief review of the datasets and pre-processing. Then, we formulate the problem and present our proposed method. Finally, experimental results are reported, followed by the conclusion of this work.

MATERIALS AND METHODS
In this section, we introduce the flow of our method. First, we preprocess the original data, remove noise, and segment the fMRI image data using a brain region template. Next, we extract features from the perspective of functional connections between brain regions. To overcome the shortcomings of the traditional Pearson Correlation Coefficient (PCC) method, we propose a novel framework for feature extraction of brain functional connections. Then, after feature selection, we use a classification model to predict brain disease. Finally, we discuss parameter settings in the model.

Dataset
We carry out experiments on two different datasets. One is the public Alzheimer's Disease Neuroimaging Initiative database (Jack et al., 2010), and the other is a volunteer experiment on Major Depressive Disorder (Geng et al., 2018). In data pre-processing, we process the raw data with a widely used software package (SPM12) and then divide each brain into 116 brain regions.

MDD
In the volunteer experiment, we use a total of 60 subjects, including 31 volunteers with Major Depressive Disorder (MDD) (22 females and 8 males, aged 60.5 ± 11.2 years, range 25 − 65 years) and 29 healthy volunteers (18 females and 11 males, aged 50.1 ± 10.6 years, range 25 − 65 years). The major depressive disorder subjects, who had no comorbidity, had a minimum duration of illness of more than 3 months. Each participant provided written informed consent, and the study was conducted in accordance with the local Ethics Committee.

Pre-processing
We perform image pre-processing of the fMRI data using a standard pipeline, carried out with the Statistical Parametric Mapping software package (SPM12, www.fil.ion.ucl.ac.uk/spm/software/spm12/) in Matlab. The pre-processing procedure includes slice timing, realignment, segmentation, normalization and bandpass filtering. For a more detailed description of the procedure, please refer to the SPM12 website.
The whole brain of each subject in fMRI space is parcellated into 116 regions of interest (ROIs) according to the Automated Anatomical Labeling (AAL) template. This atlas divides the brain into 78 cortical regions, 26 cerebellar regions and 12 subcortical regions according to anatomy; details are given in the literature (Tzourio-Mazoyer et al., 2002). For each of the 116 ROIs, the mean time series is calculated by averaging the Blood-Oxygen-Level-Dependent (BOLD) signals over all voxels within that ROI. Many similar templates exist, such as the Brainnetome template (Fan et al., 2016) and the Harvard-Oxford template.

Feature Extraction
After pre-processing, our research focuses on how to uncover the location and cause of lesions. The most common method is to calculate the correlation between two brain regions using the Pearson Correlation Coefficient (PCC) and to analyze lesions by observing changes in correlation. However, the PCC value is context-independent, or order-independent; that is, it does not consider nonsynchronous information at different time intervals. Here, we first give a basic introduction to PCC and then elaborate on our approach.

Pearson Correlation Coefficient
Pearson's correlation coefficient (PCC) is the simplest and most commonly used scheme in functional connectivity estimation. For any two brain regions, the degree of coordination of their blood-oxygen-level-dependent fluctuations is calculated as the functional connection strength between them. In the case of the AAL template, this step extracts 6,670-dimensional features (116 × 115 / 2 region pairs). Mathematically, the PCC is the covariance of the two variables divided by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

where X and Y denote the time series of two brain regions, cov(X, Y) is their covariance, and σ_X and σ_Y are their standard deviations. According to this formula, the value of the PCC is context-independent, or order-independent, with respect to the time series: it only compares values aligned at the same time point, so information about the time dimension or context is missing.
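As a minimal sketch of this step, the following Python snippet computes PCC features from ROI time series and illustrates the order-independence discussed above. It assumes the mean ROI signals are stored as a NumPy array of shape (regions, time points); the function name `pcc_features`, the series length of 130 and the random data are illustrative assumptions, not part of the original study.

```python
import numpy as np

def pcc_features(roi_series):
    """Return the upper-triangular PCC values of an (n_rois, n_timepoints) array."""
    corr = np.corrcoef(roi_series)             # (n_rois, n_rois) Pearson matrix
    iu = np.triu_indices(corr.shape[0], k=1)   # strictly upper triangle, no diagonal
    return corr[iu]

rng = np.random.default_rng(0)
series = rng.standard_normal((116, 130))       # synthetic stand-in for 116 AAL ROIs
features = pcc_features(series)
print(features.shape)                          # (6670,)

# Shuffle the time points identically for all regions: the PCC features do not change,
# which is what "order-independent" means here.
perm = rng.permutation(series.shape[1])
shuffled = pcc_features(series[:, perm])
print(np.allclose(features, shuffled))         # True
```

Because the same permutation is applied to every region, all pairwise covariances and standard deviations are preserved, so any temporal ordering information is invisible to this feature.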

Multi-Scale Functional Connectivity of Brain Regions
We extract the discrete probability distribution of co-activity in time series data. First, we use the function φ(·) to evaluate the temporal dynamic property of the time series data. In addition, we convert φ(·) to g(·), defined as follows: where f(·) represents a mapping function that uses prior knowledge to map the original time series into another specific form, and g(·) represents the function that evaluates temporal information after the mapping operation.
We use prior knowledge to map the original multivariate time series data into another specific form, such as a numeric, state or character representation. The mapping function is defined as follows: where $A^k$ denotes the original time series data, and ϕ denotes the prior knowledge. In the multivariate time series data $A^k$, the correlation value between $T_i^k$ and $T_j^k$ is defined as follows: In addition, the correlation value between $T_i^k$ and $T_j^k$ in interval $I_t = [r_t, s_t]$ is defined as follows: Notably, $C^k_{\varphi(\cdot)}(i, j, I_t) = C^k_{\varphi(\cdot)}(j, i, I_t)$. Generally, we explore the correlation of time series data over multiple intervals. Let $C^k_{\varphi(\cdot)} \in \mathbb{R}^{N \times N \times T}$ denote the multi-scale weighted correlation coefficient in the multivariate time series data $A^k$. Here, $C^k_{\varphi(\cdot)}$ is a 3-order tensor, N is the number of time series, and T is the number of intervals.
Next, we transform the tensor $C^k_{\varphi(\cdot)}$ into a discrete probability distribution $P^k_{\varphi(\cdot)}$ for analyzing co-activity in multi-scale time series data, as follows: where $p^k_{\varphi(\cdot)}(i, j, I_t)$ represents the proportion of the correlation value between the i-th and j-th time series based on function φ(·) in interval $I_t$, defined as follows:
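The transformation from a correlation tensor to a discrete probability distribution can be sketched as follows. This is an illustration under stated assumptions: the tensor is normalized by the sum over all of its entries (the exact normalization used in the original is not recoverable from this text), and two subjects' distributions are compared with the Jensen-Shannon divergence that the Introduction names as the basis of the model. The function names `to_distribution` and `js_divergence`, the uniform fallback, and the random tensors are hypothetical.

```python
import numpy as np

def to_distribution(C):
    """Normalize a non-negative N x N x T tensor so that its entries sum to 1."""
    C = np.asarray(C, dtype=float)
    total = C.sum()
    if total == 0:
        # Assumption: fall back to a uniform distribution when no co-activity is observed.
        return np.full(C.shape, 1.0 / C.size)
    return C / total

def js_divergence(P, Q):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between two distributions of equal shape."""
    p, q = np.ravel(P), np.ravel(Q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                     # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
C_a = rng.integers(0, 20, size=(4, 4, 3))   # toy co-activity counts, subject a
C_b = rng.integers(0, 20, size=(4, 4, 3))   # toy co-activity counts, subject b
P_a, P_b = to_distribution(C_a), to_distribution(C_b)

d_same = js_divergence(P_a, P_a)   # identical distributions
d_diff = js_divergence(P_a, P_b)   # different distributions
print(d_same, d_diff)
```

The Jensen-Shannon divergence is symmetric and bounded, which makes it a convenient building block for a similarity measure between subjects; how the original work turns it into a kernel is not specified here.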

Classification Model for Predicting Brain Disease
In disease prediction, the number of samples is limited, but the feature dimension is usually large, so we need to both compress the feature space to improve the accuracy and analyze the etiology with more meaningful features. We use t-test for feature selection, and then we use Support Vector Machine (SVM) as the learning model, which is described in detail as follows.

Feature Selection
We use the two-sample t-test as the feature selection method. For each feature, the null hypothesis is that its values in the positive and negative samples come from distributions with the same mean; we set the significance level to p = 0.05 and retain the features for which this hypothesis is rejected.
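A minimal sketch of this selection step is given below. It assumes a feature matrix with one row per subject, binary labels coded as 1 (patient) and 0 (control), and SciPy's standard two-sample t-test; the function name `select_features` and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def select_features(X, y, alpha=0.05):
    """Return indices of columns whose two-sample t-test p-value is below alpha."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # One t-test per feature (column), comparing the two groups of subjects.
    _, p_values = stats.ttest_ind(X[y == 1], X[y == 0], axis=0)
    return np.flatnonzero(p_values < alpha)

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 200))      # 60 subjects, 200 synthetic features
y = np.array([1] * 30 + [0] * 30)
X[y == 1, :5] += 3.0                    # make the first five features informative
selected = select_features(X, y)
print(selected[:10])
```

When this step is combined with cross-validation, running the selection on the training subjects of each fold only avoids leaking information from the held-out subject into the selected feature set.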

Support Vector Machine
We adopt the Support Vector Machine (SVM) technique developed by Cortes and Vapnik (1995) to solve the binary classification problem. Various kinds of binary classification models have also been applied to many other biomedical prediction problems (Guo et al., 2014, 2015, 2016; Ding et al., 2016a,b, 2017a; Liu et al., 2016; Zeng et al., 2016; Shen et al., 2017a,b; Xuan et al., 2017; Pan et al., 2018). The decision function is shown as follows: where $K(A^k, A^i)$ represents our proposed novel time-series kernel function, and $\alpha_i$ is calculated as follows: where C is a regularization parameter that controls the tradeoff between margin and misclassification error.
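For reference, the standard soft-margin kernel SVM of Cortes and Vapnik (1995) has the decision function and dual problem written below. That the formulas of this section follow this standard form is an assumption; the class labels $y_i \in \{-1, +1\}$, the bias $b$ and the number of training subjects $n$ are symbols introduced here for illustration.

```latex
% Decision function for a test subject A^k
f(A^k) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(A^k, A^i) + b \right)

% Dual problem that determines the coefficients \alpha_i
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(A^i, A^j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

In this form, a larger C penalizes misclassified training subjects more heavily, while a smaller C favors a wider margin.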

Model Parameter
Here we discuss the parameters of our method in more detail. We state the prior knowledge and assumptions used for the diagnosis of Alzheimer's disease and Major Depressive Disorder, and clarify some details. The time series data not only carry specific numerical information but also include contextual and fluctuation trend information.
Because of the BOLD imaging principle, we pay more attention to time points in a high-activity state, that is, time points with high values in the time series. We define a dynamic (soft) threshold to decide whether a time point is active or not, thereby converting a numeric sequence into a state sequence (0/1 sequence). For all active time points in one time series, we count the number of time points with simultaneous responses in the other time series. Moreover, we analyze the asynchronous co-activity between two time series. Asynchronous analysis provides more detail and therefore more essential information, which is also supported by the higher classification accuracy in our experiments.

Time Series Mapping
We adopt an empirical rule, the three-sigma method (WalterA, 1986), to set the dynamic threshold. This method converts a numeric sequence into a state sequence; the dynamic threshold is represented as follows: In a multivariate time series $A^k$, we calculate a corresponding dynamic threshold $th(T_i^k)$ for each time series $T_i^k$. Then, for a time series $T_i^k$, we convert the numeric sequence into a 0/1 sequence according to the mapping function f(·), as follows: The magnitude of η indicates the sensitivity of our method to the active state. In our experiment, η is set to 1.
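The mapping from a numeric sequence to a 0/1 state sequence can be sketched as follows. The sketch assumes that the dynamic threshold of each time series has the form mean + η · standard deviation, in the spirit of the three-sigma rule; this form, the function name `binarize` and the toy data are assumptions made for illustration.

```python
import numpy as np

def binarize(roi_series, eta=1.0):
    """Map each row of an (n_rois, n_timepoints) array to a 0/1 state sequence."""
    roi_series = np.asarray(roi_series, dtype=float)
    mu = roi_series.mean(axis=1, keepdims=True)      # per-region mean
    sigma = roi_series.std(axis=1, keepdims=True)    # per-region standard deviation
    threshold = mu + eta * sigma                     # assumed form of the dynamic threshold
    return (roi_series > threshold).astype(np.int8)  # 1 = active, 0 = inactive

series = np.array([[0.0, 1.0, 0.0, 5.0, 0.0, 1.0],
                   [2.0, 2.0, 2.0, 2.0, 9.0, 2.0]])
states = binarize(series, eta=1.0)
print(states)
# [[0 0 0 1 0 0]
#  [0 0 0 0 1 0]]
```

A larger η raises the threshold, so fewer time points are marked as active; with η = 1, only values more than one standard deviation above the region's own mean count as active.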

Correlation Function φ
The correlation function represents the relationship between a pair of time points in the time series. In disease diagnosis, we focus only on co-activity, that is, brain region i at time point m and brain region j at time point n are both in active states. More concretely, $t_{i,m}^k$ and $t_{j,n}^k$ are greater than the thresholds $th(T_i^k)$ and $th(T_j^k)$, respectively.
Corresponding to Formula 2 above, φ(·) in our experiment is:
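Based on the verbal description of co-activity in this section, one consistent way to write the correlation function is the indicator below. The exact notation of the original formula is an assumption; only the condition that both values exceed their respective thresholds is taken from the text.

```latex
\varphi\!\left(t_{i,m}^{k},\, t_{j,n}^{k}\right) =
\begin{cases}
1, & \text{if } t_{i,m}^{k} > th(T_i^k) \ \text{and}\ t_{j,n}^{k} > th(T_j^k), \\
0, & \text{otherwise.}
\end{cases}
```

Equivalently, after each time series has been mapped to a 0/1 state sequence, the function is the product of the two states at time points m and n.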

Interval Set I
For the collection of multiple intervals I, we extract local information through its elements: the more elements (intervals) I contains, the more detailed the extracted information, but the features then become sparse and prone to over-fitting; if I contains few elements, we may lose some key information. Also, for an interval $I_t \in I$, if $I_t$ is close to zero, the two time points of interest are very close; if $I_t$ is far from zero, we extract long-distance asynchronous information.
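To make the role of the interval set concrete, the sketch below counts co-active pairs of time points for every pair of regions and every interval. It assumes that the input is already a 0/1 state array, that an interval [r, s] selects pairs of time points (m, n) whose absolute lag |m - n| lies in [r, s], and that the correlation value is the plain count of such co-active pairs. The absolute-lag convention is chosen because it makes the result symmetric in the two regions, as stated for the correlation tensor; the function name `coactivity_tensor` and the toy data are hypothetical.

```python
import numpy as np

def coactivity_tensor(states, intervals):
    """Count co-active time-point pairs per region pair and lag interval.

    states: (N, L) array of 0/1 states; intervals: list of (r, s) with 0 <= r <= s.
    Entry [i, j, t] counts ordered pairs (m, n) with states[i, m] == states[j, n] == 1
    and r_t <= |m - n| <= s_t.
    """
    states = np.asarray(states, dtype=np.int64)
    n_regions, length = states.shape
    C = np.zeros((n_regions, n_regions, len(intervals)), dtype=np.int64)
    for t, (r, s) in enumerate(intervals):
        for lag in range(r, min(s, length - 1) + 1):
            if lag == 0:
                C[:, :, t] += states @ states.T                  # synchronous co-activity
            else:
                forward = states[:, :-lag] @ states[:, lag:].T   # pairs with n = m + lag
                C[:, :, t] += forward + forward.T                # add pairs with n = m - lag
    return C

states = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1]])
intervals = [(0, 0), (1, 1), (2, 3)]      # synchronous, short lag, long lag
C = coactivity_tensor(states, intervals)
print(C[:, :, 1])
# [[0 3]
#  [3 0]]
```

In this toy example the two regions are never active at the same time point, so the synchronous slice shows no cross-region co-activity, while the lag-1 slice reveals three co-active pairs; this is the kind of asynchronous information that a purely synchronous measure misses.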

RESULTS
Our experiment consists of three parts. To demonstrate the effectiveness of our approach, we perform automated diagnosis of Alzheimer's disease and Major Depressive Disorder, respectively. We evaluate classification performance using leave-one-out cross-validation (LOOCV), and we adopt Accuracy (ACC), Sensitivity, Specificity (Spe) and AUC as evaluation measures. First, we compare the results of the traditional PCC method and our feature extraction method on the two datasets, AD and MDD. Then, we compare the effects of different classifiers. Finally, we compare our approach with some recent research works.
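The threshold-based evaluation measures can be computed from pooled LOOCV predictions as in the following sketch, where each subject's predicted label is assumed to come from a model trained on all other subjects. The function name `binary_metrics`, the label coding (1 = patient, 0 = control) and the toy labels are illustrative assumptions; AUC is omitted because it requires continuous decision scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels coded as 1 (positive) and 0 (negative).

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # patients correctly detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # patients missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # controls correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # controls flagged as patients
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy pooled predictions for 10 subjects, one prediction per left-out subject.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here three of four patients and four of six controls are classified correctly, giving an accuracy of 0.7, a sensitivity of 0.75 and a specificity of about 0.667.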

Comparison of Different Features
Here, we compare the performance of the traditional PCC method and our feature extraction method for analyzing fMRI data. Apart from feature extraction, we use the same experimental steps and parameters, including pre-processing, feature selection and classifier. The classification results on the Alzheimer's disease and Major Depressive Disorder datasets are summarized in Table 1.
Using the information extracted by our multi-scale functional connection (Multi-Scale FC) method to predict brain disease yields clearly better performance than the traditional PCC method. On the Alzheimer's disease dataset, our method achieves the best specificity of 0.9268. Moreover, by combining PCC and our method, we achieve better results, with an ACC of 0.8935 and an AUC of 0.8748. On the MDD dataset, our method also achieves the best results, but in contrast to the AD dataset, the results are actually lower when PCC and multi-scale functional connection are combined. The experimental results indicate that our approach is more effective than traditional PCC or graph theory feature-based methods. Combining different methods can yield better results, but there is also a risk of over-fitting.

Comparison of Different Classifiers
In this part, we use our proposed multi-scale functional connection method to extract features and compare the performance of different classifiers. Specifically, we compare three classifiers: random forest (RF), logistic regression (LR) and support vector machine (SVM). The results are shown in Table 2. Among these three classifiers, SVM achieves the highest AUC on both the AD and MDD datasets and also the best ACC on the AD dataset, so it is generally a stable classifier. In addition, RF obtains the best ACC on the MDD dataset, and LR obtains the best Spe on the AD dataset. Overall, all three classifiers achieve good accuracy, indicating that the information extracted by our method is effective and stable.

Comparison of Different Existing Methods
We compare our proposed method with recent outstanding studies. Baseline represents the traditional graph theory feature-based method. The state-of-the-art methods represent three major groups of graph kernels, based on edges, subtrees and shortest paths, respectively. These graph kernels belong to the Weisfeiler-Lehman graph kernel framework (Shervashidze et al., 2011) and are denoted as WL-edge, WL-subtree and WL-shortestpath, respectively. In addition, for Alzheimer's disease diagnosis, we also compare with the shortest-path graph kernel method (Shortest-path) (Borgwardt and Kriegel, 2006), the sliding window method (FON: 70-length sliding window with 1-step) and the sub-network kernel method (SKL) (Jie et al., 2018). For the Major Depressive Disorder classification problem, we compare with the method of Geng et al. (2018).
On the Alzheimer's Disease Neuroimaging Initiative database, we compare our method with seven existing methods, and the classification results are summarized in Table 3. Our method achieves the best accuracy of 0.8876 and the best AUC of 0.8562. On the Major Depressive Disorder dataset, the classification results are summarized in Table 4. Our method achieves the best accuracy of 0.9000 and the best AUC of 0.9295, whereas the accuracy values for Baseline, Shortest-path and the method of Xu et al. are 0.6167, 0.7833, and 0.8667, respectively, and the AUC values for these three methods are 0.6514, 0.8135, and 0.9103, respectively. Compared with the best of these methods, our method improves accuracy by 0.0333 and AUC by 0.0192. The experimental results indicate that our approach is far better than traditional graph methods and slightly better than the current outstanding methods.

CONCLUSIONS
The fMRI time series data not only contain specific numerical information but also involve rich dynamic temporal information. However, previous graph theory approaches focus on local topological structure and lose contextual information and global fluctuation information.
Here, we propose a novel multi-scale functional connectivity method for identifying brain disease via fMRI data. We calculate the discrete probability distribution of co-activity between different brain regions over various intervals. We also consider nonsynchronous information under different time dimensions in order to analyze the contextual information in the fMRI data. Therefore, our proposed method can be applied to the diagnosis of more diseases and to other fMRI data, particularly the automated diagnosis of neural or brain diseases. Experimental results verify the effectiveness of our proposed method, so we provide an efficient system that offers a novel perspective for studying brain networks. In the future, parallel computing (Zou et al., 2017), computational intelligence (Xu et al., 2017; Zou et al., 2017) and neural networks (Song et al., 2018; Xu et al., 2018) can be considered as the datasets grow.

DATA AVAILABILITY
Publicly available datasets were analyzed in this study. These data can be found here: http://adni.loni.usc.edu/. The results and code for this study can be found at https://github.com/guofei-tju/Multi-Scale-FC-Frontier-in-NeuroSci.git.