Test-retest reliability of white matter structural brain networks: a multiband diffusion MRI study

The multiband EPI sequence has been developed for the human connectome project to accelerate MRI data acquisition. However, no study has yet investigated the test-retest (TRT) reliability of the graph metrics of white matter (WM) structural brain networks constructed from this new sequence. Here, we employed a multiband diffusion MRI (dMRI) dataset with repeated scanning sessions and constructed both low- and high-resolution WM networks by volume- and surface-based parcellation methods. The reproducibility of network metrics and its dependence on type of construction procedures was assessed by the intra-class correlation coefficient (ICC). We observed conserved topological architecture of WM structural networks constructed from the multiband dMRI data as previous findings from conventional dMRI. For the global network properties, the first order metrics were more reliable than second order metrics. Between two parcellation methods, networks with volume-based parcellation showed better reliability than surface-based parcellation, especially for the global metrics. Between different resolutions, the high-resolution network exhibited higher TRT performance than the low-resolution in terms of the global metrics with a large effect size, whereas the low-resolution performs better in terms of local (region and connection) properties with a relatively low effect size. Moreover, we identified that the association and primary cortices showed higher reproducibility than the paralimbic/limbic regions. The important hub regions and rich-club connections are more reliable than the non-hub regions and connections. Finally, we found WM networks from the multiband dMRI showed higher reproducibility compared with those from the conventional dMRI. Together, our results demonstrated the fair to good reliability of the WM structural brain networks from the multiband EPI sequence, suggesting its potential utility for exploring individual differences and for clinical applications.


INTRODUCTION
The concept of the "human connectome" has been recently proposed and has provided a new perspective to investigate the brain's structural and functional systems (Sporns et al., 2005). As the anatomical substrate of brain function, the structural brain connectome describes brain wiring patterns and is fundamentally important for revealing the mechanisms of how the brain works. Recent studies have suggested that the human white matter (WM) structural network can be mapped in vivo using diffusion MRI (dMRI) tractography techniques and quantified by graph-theoretical analysis (Hagmann et al., 2007;Bullmore and Sporns, 2009;Gong et al., 2009a). The quantitative graph metrics of structural brain networks are suggested to be closely related to individual cognitive performances (Li et al., 2009;Wen et al., 2011) and sensitive to the processes of normal development (Hagmann et al., 2010) and aging (Gong et al., 2009b), as well as neuropsychiatric diseases (Lo et al., 2010;Shu et al., 2011;Zalesky et al., 2011;Bai et al., 2012;Cao et al., 2013), suggesting that network metrics may be potential biomarkers for clinical applications.
Recently, some promising fast-collecting imaging techniques, such as multiband EPI (mEPI), have been applied in the dMRI data acquisition (Moeller et al., 2010). This new sequence can accelerate acquisition by simultaneously imaging multiple slices in the human brain, while not significantly sacrificing spatial resolution or the SNR (Moeller et al., 2010;Xu et al., 2013). This sequence is being applied in the recently launched human connectome project aiming to acquire a large sample of healthy subjects with the goal of uncovering individual differences in brain circuitry related to behavior (van Essen et al., 2012). However, before successfully charting the human connectome using this new sequence, studies must determine whether connectivity properties conserved across the population can be reproducibly quantified in an individual over multiple scanning sessions and

TEST-RETEST DATASETS
The multiband test-retest pilot dataset was publicly available from INDI (http://fcon_1000.projects.nitrc.org/indi/pro/eNKI_ RS_TRT/FrontPage.html). The dataset includes 24 subjects whose phenotype information is presented in Table 1. All individuals included in the sample underwent semi-structured diagnostic psychiatric interviews and completed a battery of psychiatric, cognitive and behavioral assessments. Written informed consents were obtained from all participants. The study was approved by the Nathan Kline Institute Institutional Review Board. Recently, the test-retest resting-state functional MRI (rs-fMRI) data in this dataset has been used to examine the reliability of regional functional homogeneity (Zuo et al., 2013) and the reliability of global hubs in human voxel-wise functional networks (Liao et al., 2013). To exclude the potential effects of health issues, the data of seven subjects with current/past psychiatric disorders and four subjects without diagnostic information were discarded. Moreover, one subject was excluded due to brain atrophy and one subject lacked one repeated session; therefore, data from 11 healthy subjects (3 females, mean age 32.9 ± 12.5 years) were left for further analyses (marked in Table 1).

DATA ACQUISITION
Each participant received test-retest dMRI scans (at least 1 week apart) using a Siemens Trio 3T scanner. The dMRI data were acquired using a recently developed mEPI sequence (Moeller et al., 2010;Xu et al., 2013): repetition time (TR) = 2400 ms, echo time (TE) = 85 ms, 64 slices, slice thickness of 2 mm, FOV = 212 × 180 mm 2 , voxel size of 2 mm isotropic, b value = 1500 s/mm 2 , 128 gradient directions with 9 b = 0 images, multiband acceleration factor = 4, averages = 1, total acquisition time = 5:58 min. A T1-weighted image was obtained with an magnetization prepared rapid gradient echo (MPRAGE) sequence [TR = 2500 ms, TE = 3.5 ms, inversion time (TI) = 1200 ms, acquisition matrix = 256 × 256, voxel size of 1 mm isotropic]. Additionally, the test-retest rs-fMRI data were also acquired, but were not used in the present study. For each dMRI scan, the data quality was checked by visual inspection to avoid the distortions caused by magnetic field inhomogeneities.

DATA PREPROCESSING
The preprocessing of dMRI data consisted of the following steps: eddy current and motion artifact correction, estimation of the diffusion tensor, calculation of the fractional anisotropy (Smith et al.). The eddy current distortions and motion artifacts in the dMRI dataset were corrected by applying an affine alignment of each diffusion-weighted image to the b = 0 image. After that, the diffusion tensor elements were estimated by solving the Stejskal and Tanner equation; then, the reconstructed tensor matrix was diagonalized to obtain three eigenvalues (λ 1 , λ 2 , λ 3 ) and eigenvectors, and the corresponding FA of each voxel was calculated. All of the processes were performed with the FDT toolbox (Behrens et al., 2003) of FMRIB Software Library (FSL, http:// www.fmrib.ox.ac.uk/fsl) (Smith et al., 2004).

STRUCTURAL SEGMENTATION AND WM TRACTOGRAPHY
First, the structural T1-weighted image was first segmented into gray matter (GM), WM and cerebrospinal fluid (CSF) in the CIVET pipeline (http://wiki.bic.mni.mcgill.ca/index.php/ CIVET). Then the individual T1-weighted image was coregistered to the b = 0 image through a linear transformation which is applied to the segmented WM mask. Within each WM voxel, eight seeds were started and evenly distributed over the volume of the voxel. A streamline was started from each seed following the primary diffusion direction from voxel to voxel, thus reconstructing the WM fibers. The tractography was terminated if it turned at an angle greater than 45 degrees (Mori et al., 1999). Tens of thousands of streamlines were generated to etch out all of the major WM tracts. Diffusion tensor tractography was implemented with the Diffusion Toolkit (http://trackvis.org/) using the "fiber assignment by continuous tracking" method (Mori et al., 1999) and was visualized in the TrackVis program (http:// trackvis.org/).

NETWORK NODE DEFINITION
To investigate the effects of different parcellation schemes on the network topological architecture and reliability, we used the two most common cortical parcellation methods (surfaceand volume-based parcellations) to define network nodes. Both parcellation methods were based on the volumetric Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002) in which 80 cortical areas were selected ( Table 2).
1) Volume-based parcellation: the detailed procedure of the volume-based parcellation has been previously described (Gong et al., 2009a;Shu et al., 2011) and was performed using SPM software (http://www.fil.ion.ucl.ac.uk/spm/software/ spm8). Briefly, the coregistered T1-weighted image was nonlinearly normalized to the nonlinear asymmetric ICBM152  (Fonov et al., 2009) in the Montreal Neurological Institute (MNI) space. The inverse transformations were used to warp the AAL atlas from the MNI space to the diffusion native space. Discrete labeling values were preserved with the nearest neighbor interpolation method. 2) Surface-based parcellation: The surface-based parcellation was performed using the CIVET pipeline (http://www.bic.mni. mcgill.ca/ServicesSoftware/CIVET). A detailed description of the analysis can be found in He et al. (2007). The T1-weighted image was registered into the stereotaxic space using a linear transformation (Collins et al., 1994) and was further segmented into GM, WM, CSF and background using an advanced neural net classifier (Zijdenbos et al., 2002). The internal surfaces of GM and the interface of WM and GM, each consisting of 40,962 vertices in the brain per hemisphere, were then automatically extracted using the Constrained Laplacianbased Automated Segmentation with Proximities (CLASP) algorithm (MacDonald et al., 2000;Kim et al., 2005).The labels of the cortex were assigned by a surface-based AAL atlas on the average 150 normal brains template (MacDonald et al., 2000).
Using the above procedures, we obtained 80 cortical regions (40 for each hemisphere; Table 2) of each subject in diffusion native space through two parcellation methods, each representing a node of the network. In addition to the parcellation scheme using 80 nodes in AAL template (L-AAL), we also used a highresolution (∼1000 parcels) parcellation (H-1024) by randomly subdividing the AAL atlas into 1024 regions with equal size both in the volume and in the average cortical surface of 150 normal brains. Therefore, for surface and volume-based parcellations, both L-AAL and H-1024 WM networks with different nodal scales were constructed (Figure 1).

NETWORK EDGE DEFINITION
Based on whole-brain tractography and cortical parcellation, two regions were considered structurally connected if at least one fiber streamline with two end points were located in these two regions. For the weighted WM networks, we defined the fiber number (FN) of interconnecting streamlines between two regions as the weights of the network edges (Shu et al., 2011;Cheng et al., 2012;van den Heuvel et al., 2012). Therefore, both L-AAL and H-1024 FN-weighted WM networks from surface-and volume-based parcellations were constructed for each participant, respectively (Figure 1).

NETWORK ANALYSIS
To characterize the topological organization of WM structural networks, several graph measures were considered, as follows: network strength (S p ), global efficiency (E glob ), local efficiency (E loc ), shortest path length (L p ), clustering coefficient (Van Essen et al.) and small-world parameters (λ, γ, and σ) (Rubinov and Sporns, 2010). For regional characteristics, we considered the nodal strength and nodal efficiency (Achard and Bullmore, 2007). Moreover, we investigated the rich-club organization of WM networks (van den Heuvel and Sporns, 2011). For a recent review on the uses and interpretations of these network measures, refer to Rubinov and Sporns (2010). See Appendix for the detailed definitions and mathematical expressions of the graph metrics used in the present study. All network analyses were performed using inhouse GRETNA software (http://www.nitrc.org/projects/gretna/) and visualized using BrainNet Viewer software (http://www.nitrc. org/projects/bnv/) .

TRT RELIABILITY
To evaluate the TRT reliability of the network metrics between two sessions, a measurement of ICC was employed. The ICC value was calculated as (Shrout and Fleiss, 1979): where σ bs is the between-subject variance, σ ws is the within subject variance, and m represents the number of repeated measurements (here, m = 2). ICC is a normalized measure which has a maximum of 1. The ICC values were categorized into five common intervals (Landis and Koch, 1977): 0 < ICC ≤ 0.2 (slight), 0.2 < ICC ≤ 0.4 (fair), 0.4 < ICC ≤ 0.6 (moderate), 0.6 < ICC ≤ 0.8 (substantial), and 0.8 < ICC ≤ 1.0 (almost perfect). Negative ICCs, implying negative reliability (i.e., completely non-reliable), are theoretically difficult to interpret (Rousson et al., 2002) and reasons for negative ICC values are unclear (Muller and Buttner, 1994). Therefore, we set negative ICCs to zero, as suggested in other test-retest studies using the ICC (Kong et al., 2007;Braun et al., 2012).

STATISTICAL ANALYSIS
To test the differences of the reliability of network properties derived from different procedures of network construction and the reliability differences across regions and edges, the repeated ANOVA was performed with SPSS software (version 13.0; SPSS, Chicago, Ill). Moreover, the correlation of the network metrics between the two sessions was calculated by Pearson's correlation using an in house Matlab (The MathWorks, Inc.) program.

TRT RELIABILITY FROM CONVENTIONAL dMRI AND SUBSAMPLED MULTIBAND dMRI
To compare the reproducibility of network metrics between multiband dMRI and conventional dMRI, we further investigated the TRT reliability of WM networks constructed from a conventional dMRI dataset with 30 gradient directions (conv-dMRI-30grad), Moreover, to remove the possible effects of the number of gradient directions on the reliability and make results more comparable, we also investigated the TRT reliability of WM networks constructed from subsampled multiband dMRI data with 30 gradient directions (multi-dMRI-30grad).
1) Conventional dMRI dataset: Eleven right-handed subjects (3 females, mean age 28.0 ± 5.0 years) without history of neurological or psychiatric disorders were included. Each participant received test-retest dMRI scans (at least 1 week apart) using a Siemens Trio 3T scanner at the Imaging Center for Brain Research, Beijing Normal University. The dMRI images were acquired using a single-shot twice-refocused spin-echo  6) The reconstruction of all WM fibers in the brain was performed using deterministic tractography using the Diffusion Toolkit (C). (7) The weighted networks of each subject were created by computing the number of streamlines that connected each pair of brain regions. Both low-(L-AAL) and high-resolution (H-1024) WM networks based on different parcellation approaches (surface and volume) were constructed for each subject (H), which are represented by the abbreviations of SurL, SurH, VolL, and VolH, respectively.
Based on the conventional and subsampled multiband dMRI datasets, both the high-and low-resolution weighted WM networks with surface and volume based parcellations were constructed with the same procedures as performed for the original multiband dMRI dataset (multi-dMRI-128grad). Then the ICC values of the global network metrics from each dMRI dataset were calculated.

TRT RELIABILITY OF BINARY WM NETWORKS
To remove the possible effects of the weighting scheme on the inter-subject variability, both the high-and low-resolution WM networks with surface and volume based parcellations from the multiband dMRI were binarized and global metrics based on the unweighted networks were calculated. Then the ICC values of the global network metrics from two sessions were computed.

RESULTS
First, we examined the architectural characteristics of weighted WM structural networks for the new multiband sequence. Then Frontiers in Human Neuroscience www.frontiersin.org February 2015 | Volume 9 | Article 59 | 5 the TRT reliability of WM structural networks derived from the multiband dMRI data was investigated and reported in four levels: global metrics, regional metrics, structural connectivity and richclub organization.

CONSERVED TOPOLOGICAL ARCHITECTURE
For the L-AAL network constructed from surface-and volumebased parcellations, the WM networks are sparse with a group mean sparsity of 17.5 and 20.1%, respectively. For the H-1024 WM network, the sparsities are about 1.4 and 1.9% for different parcellations. Low wiring cost of the structural connectivity network is observed, consistent with findings from conventional EPI sequence (Gong et al., 2009a;Bullmore and Sporns, 2012). Compared with random networks, the brain WM networks showed the similar shortest path length and higher clustering ( Table 3), suggesting a prominent small-world architecture regardless of different strategies for network construction. Together, these results indicate that WM networks obtained from multiband dMRI data exhibit conserved topological architecture as those derived from conventional dMRI data (Table 3). Figure 2A shows the TRT reliability of global network metrics under different procedure choices. Generally, most global network parameters exhibited moderate to high reliability (ICC > 0.52) regardless of the construction procedure. Only the lambda from L-AAL network with surface-based parcellation had a relatively low reproducibility (ICC = 0.22). Global network measures can be further classified into first and second order metrics where the first order metrics include strength, L p , C p , global and local efficiency, and the second order metrics include small-world parameters (λ, γ, and σ), which are normalized by the metrics of random networks (Bassett et al., 2011). Using a repeated ANOVA in which order was treated as a categorical factor and parcellation and resolution were treated as repeated measures, we found that the first order metrics, such as strength and efficiency, are more reliable than the second order metrics (p = 0.0009, Partial Eta Squared = 0.86) ( Figure 2B).

TRT RELIABILITY OF GLOBAL NETWORK METRICS
Given that particular choices of construction options (i.e., cortical parcellation and network resolution) can make significant differences in network topological parameters, we next evaluated which construction scheme performed the best at modeling the brain networks from the perspective of TRT reliability. A Two-Way repeated ANOVA in which parcellation and resolution were treated as repeated measures showed a significant main effect of parcellation (p = 0.002, Partial Eta Squared = 0.81), where post-hoc comparisons confirmed that the volume-based parcellation yielded more reproducible results than the surfacebased parcellation ( Figure 2B). Meanwhile, a significant main effect of resolution was found, which revealed an increasing  reproducibility of global metrics at finer spatial resolutions (p = 0.002, Partial Eta Squared = 0.82) regardless of parcellations ( Figure 2B). No significant interactions of parcellation × resolution were found (p > 0.1) ( Figure 2B). Figure 3 shows the nodal strength (A) and efficiency (B) of all regions (averaged over subjects) from the surface-(top) and volume-based parcellations (bottom). Between two sessions, highly significant correlations of nodal properties across all nodes were observed (all r > 0.94). Moreover, highly similar distributions of hub regions (nodal strength > mean + std) were observed between the two sessions, regardless of the network construction procedures (Figure 3). For the L-AAL network, the hub regions were mainly located in the bilateral middle temporal gyri, superior and middle frontal gyri, precuneus, precentral gyrus, postcentral gyrus and supplementary motor area for both parcellations. While for the H-1024 network from surface-based parcellation, the hub regions were distributed in the bilateral temporal gyri, superior and middle frontal gyri, precuneus, anterior and median cingulate and paracingulate gyri, precental and postcentral gyrus, fusiform gyrus and insula. For the volume-based parcellation, more regions in the bilateral temporal gyri, superior and middle occipital gyrus and fewer regions in the superior and middle frontal gyri were identified as hubs compared with the network from the surface-based parcellation (Figure 3). Figure 4 shows the TRT reliability of nodal strength (A) and efficiency (B) under different construction procedures. Across parcellations, most of regions of the L-AAL network exhibited moderate to high reproducibility (surface: nodal strength ICC = 0.70; nodal efficiency ICC = 0.70; volume: nodal strength ICC = 0.75; nodal efficiency ICC = 0.75) except the right posterior cingulate cortex, left insula, right superior parietal gyrus and paracentral lobule. For the H-1024 network, the ICC values across most regions also ranged from moderate to high (surface: nodal strength ICC = 0.56; nodal efficiency ICC = 0.58; volume: nodal strength ICC = 0.62; nodal efficiency ICC = 0.72). When categorizing the cortical regions into three regional classes (primary, association and paralimbic) (Mesulam, 1998) (Figure 5A), a repeated ANOVA was performed in which nodal metric was treated as repeated measures while regional class, parcellation and resolution were treated as categorical factors. An interaction between regional class and network resolution (p < 0.0001, Partial Eta Squared = 0.02) and a significant main effect of regional class (p < 0.0001, Partial Eta Squared = 0.37) in the L-AAL network were observed ( Figure 5B). Further post-hoc comparisons showed that the association and primary cortices exhibit a higher reliability than the paralimbic/limbic regions (p < 0.0001) for only the L-AAL network ( Figure 5B). Additionally, the relationship between the nodal properties and their corresponding ICC values was investigated. The correlation results indicated that under both low-and highresolutions, regions with higher nodal strength or efficiency tend to have larger ICC values (all p < 0.001) (Figure 4). In other words, the properties of densely connected hub regions show higher reproducibility than those of peripheral non-hub regions. When focusing on the effects of cortical parcellation and network resolution on the reproducibility of nodal strength and efficiency, a repeated ANOVA was performed in which nodal metric was treated as repeated measure, parcellation and resolution were treated as categorical factors while the effect of regional class was averaged. The L-AAL network showed higher nodal ICCs than the H-1024 network (p < 0.0001, Partial Eta Squared = 0.02) (Figure 6). And the volume-based parcellation yielded higher nodal ICCs than the surface-based parcellation (p = 0.0003, Partial Eta Squared = 0.01) (Figure 6). An interaction between nodal metric and network resolution (p = 0.001, Partial Eta Squared = 0.01) was observed and nodal efficiency showed significantly higher ICCs than the nodal strength (p < 0.0001, Partial Eta Squared = 0.06) in the H-1024 network (Figure 6). Overall, the L-AAL network with volume-based parcellation exhibited the highest reproducibility in terms of nodal properties. Figure 7A shows the average matrices of WM connections across subjects for each session. Between two sessions, highly significant correlations of edge weights across all edges were observed, especially for the L-AAL network (all r > 0.9) (Figure 7B). To assess the intra-session reliability of the WM connectivity, we first detected significantly consistent connections across subjects, by performing a nonparametric one-tailed sign test. For each pair of brain regions, the sign test was performed with the null hypothesis that no connection exists ["fiber bundle number = 0" (p < 0.05)]. Nonzero connections within either session groups were detected and assigned the average edge weight (number of interconnecting streamlines between two regions) across subjects and sessions to combine as a backbone network. Figure 8A  A Two-Way ANOVA in which parcellation and resolution were treated as categorical factors revealed that surface-and volumebased parcellations have similar edge ICCs (p = 0.6), but the L-AAL network showed higher edge ICCs than the H-1024 network (p < 0.0001, Partial Eta Squared = 0.02). Additionally, we found that the ICC values are positively correlated with the edge weights (connection strength) (Figure 8C), suggesting that the stronger connections tend to be more reproducible than the weak ones.

TRT RELIABILITY OF RICH-CLUB ORGANIZATION
To quantify the reliability of the rich-club organization, we calculated the normalized rich-club coefficient (RC) of the backbone network according to van den Heuvel and Sporns (2011) under a range of thresholds. The normalized RC values were greater than 1 under each network construction procedure ( Table 4), suggesting a characteristic rich-club organization. Furthermore, the nodes of the backbone network were classified into hubs (nodal strength > mean + std) and non-hubs. Correspondingly, edges were classified onto rich-club connections, which link hub nodes to hub nodes; feeder connections, which link hub nodes to non-hub nodes; and local connections, which link between non-hub nodes ( Figure 9A). The reliability of the different hub categories of regions and edges were investigated using a Three-Way ANOVA in which parcellaion, resolution and hub category were treated as categorical factors. ANOVA analyses indicated that the reliability of hub regions was higher than that of non-hub regions (p < 0.0001, Partial Eta Squared = 0.01) regardless of the construction procedure (Figure 9B), consistent with the above finding that regions with higher nodal strength tend to have greater ICC values. For the connections, a significant effect of the edge category was observed (p < 0.0001, Partial Eta Squared = 0.01), and post-hoc comparisons confirmed that the reliability of rich-club connections is significantly higher than that of feeder (p = 0.0001) and local connections (p < 0.0001), and the reliability of feeder connections is significantly higher than that of local connections (p < 0.0001) ( Figure 9C). Figure 10 shows the TRT reliability of global network metrics from the conventional dMRI and subsampled multiband dMRI datasets. A significantly progressive increase of ICC values in the global network metrics from the conventional dMRI, the subsampled multiband dMRI to the original multiband dMRI was identified by a repeated ANOVA (p < 0.0001, Partial Eta Squared = 0.54). The conventional dMRI dataset showed an overall decrease of reproducibility in all network metrics regardless of the construction procedures, except for the lowresolution network with volume-based parcellation. The subsampled multiband dMRI data also exhibited significantly decreased reliability than the original multiband dMRI, especially in the small-world parameters from the volume-based low-resolution networks.   Figure 11 shows the TRT reliability of global network metrics for both binarized and weighted WM networks from the multiband dMRI dataset. Lower ICC values of the global network metrics were found for the binary networks compared with the weighted WM networks by a paired two-sample t-test (p = 0.002, Partial Eta Squared = 0.27).

DISCUSSION
In the present study, we investigated the reliability of weighted WM structural networks constructed from multiband dMRI data with two repeated scanning sessions. Our primary results can be summarized as follows: First, conserved topological architecture of WM structural networks constructed from the mEPI sequence was observed, such as low wring cost, small-worldness and highly connected hub regions. Second, most of the weighted WM network metrics exhibited a high TRT reliability, especially the first order metrics are more reliable than the second order metrics (a partial eta squared value around 0.8), suggesting the potential utility in clinical applications of the new sequence. Third, different procedures of network construction have an effect on the network reliability. For example, networks with volumebased parcellation and high spatial resolution are more reliable than those with surface-based parcellation and low resolution, respectively. Moreover, WM networks from the multiband dMRI showed higher reproducibility compared with those from the conventional dMRI. Additionally, the network reliability varies across regions and edges, although with relatively low effect sizes (partial eta squared values less than 0.1). These findings provide reference and guidance for the future network studies using this new sequence. Generally, the ICC values obtained in our study are comparable with the findings of previous WM network studies (Vaessen et al., 2010;Bassett et al., 2011;Cheng et al., 2012;Buchanan et al., 2014). Compared with the conventional dMRI, the multiband dMRI data showed higher reliability of global metrics of WM networks, and with a large effect size (Partial Eta Squared = 0.54). For the mEPI sequence, the high reproducibility of network metrics may be attributed to the relatively short scan time that can minimize the effects of head motion and can increase the reliability of fiber orientation estimation from the dMRI data with hundreds of gradient directions. However, the differences in the subjects and acquisition parameters (e.g., different slice thickness of T1 images) between the conventional and multiband dMRI datasets may have an effect on the comparison of the TRT reliability. Future comparisons with the same cohort and same acquisition parameters should be warranted.
The comparisons of parcellation methods and network resolutions offer certain insights into network reliability. First, in all cases, networks with volume-based parcellation showed better TRT reliability than the surface-based parcellation, in terms of both the global (Partial Eta Squared = 0.81) and local ICCs (Partial Eta Squared = 0.01). These results may be due to more WM seed voxels in volume-based parcellation. More WM seed points produce more robust tractography results, which can be supported by the findings of improved TRT reliability of structural networks seeding from WM rather than GM (Buchanan et al., 2014). However, investigation of other parcellation approaches merits further investigation; notably, approaches based on the individual landmarks of gyri and sulci without a template (Hagmann et al., 2008) may reduce the bias caused by registration errors. Second, the high resolution network exhibited an overall higher TRT performance than the low resolution network in terms of global network metrics with a large effect size (Partial Eta Squared = 0.82), whereas the low resolution performs better in terms of local (region and edge) properties with a relatively low effect size (Partial Eta Squared = 0.02). Consistent with our findings, Bassett et al. (2011) also found an increasing reproducibility of global metrics in all atlases at finer spatial resolutions. For the local properties, ROIs in lowresolution networks with bigger size are more possible to be connected by larger fiber tracts, avoiding the contamination from different structures, whereas smaller ROIs in high-resolution network are more easily impaired by the false positive streamlines FIGURE 10 | The TRT reliability from conventional dMRI and subsampled multiband dMRI dataset. The ICC values of global network metrics from low to high were presented with colorbars from blue to red. The results showed a progressive increase of ICC values in the global network metrics from the conventional dMRI (conv-dMRI-30grad), the subsampled multiband dMRI (multi-dMRI-30grad) to the original multiband dMRI (multi-dMRI-128grad).
with a lower SNR ratio but a more homogeneous fiber distribution (Parker et al., 2003). Therefore, specific methodological choice will affect the applicability of network topology-related approaches. Moreover, the weighting scheme also has an effect on the network reliability. We found the binary WM networks showed poorer reliability than the weighted networks. The increased reliability of weighted networks may be partly due to the increased inter-subject variability introduced by the weighting scheme, which contains both real connectome differences and other biases, such as the effects of brain size on the fiber tractography. Binary network can partly overcome such problem by avoiding the variability in fiber numbers, which also has its own drawbacks, such as how to threshold the network (Buchanan et al., 2014;Duda et al., 2014). Detailed investigation of the effects of different weighting schemes on the reproducibility of graph metrics for the multiband sequence is needed in the future.
On a more methodological note, we found significant differences in reliability between graph metrics. For global metrics, the first order graph metrics (such as shortest path length and efficiency) were more reliable than second order metrics (such as small-world parameters), with a large effect size (Partial Eta Squared = 0.86). This result is consistent with the findings from MEG data (Deuker et al., 2009), but in contrast with results obtained from rs-fMRI data (Braun et al., 2012). The worse reliability of second order metrics may be caused by the normalization of the metrics of random networks, which may also indicate an increased sensitivity to measurements such as short term changes in the WM structure (Tang et al., 2010). For nodal metrics, the nodal efficiency is more reliable than the nodal strength, especially for the high-resolution WM networks, with a relatively low effect size (Partial Eta Squared = 0.06). However, a previous rs-fMRI study  showed that the nodal degree showed higher reliability than other nodal metrics in the binary functional networks. These results suggest that the reliability of the same graph metrics can be influenced by the imaging modalities, strategy of nodal or edge definitions and network construction procedures. In future studies, selecting specific metrics with high reliability for specific modality and methodological choice should have high priority.
The reproducibility varied across regions and exhibited spatially heterogeneous distribution. We found that most of the regions (>75%) showed moderate to high reproducibility under all construction methods, except several regions located in the left olfactory cortex, left insula, left middle temporal gyrus, right gyrus rectus, right orbital frontal gyrus, right posterior cingulate cortex, right superior parietal gyrus and paracentral lobule. Some of those regions were also identified as showing poor estimated ICC values in a recent test-retest study of the dMRI network obtained from conventional EPI sequence (Buchanan et al., 2014). Bassett et al. (2011) also found certain less reproducible regions in the inferior temporal and occipital cortices. These similar results revealed that certain regions with inherent instability are driven by anatomy or technique limitations, such as magnetic susceptibility (Vargas et al., 2009). Moreover, we found that the more densely connected regions tend to have higher reliability, due to less influence by the bias from noise or limitations of tractography algorithms. In future studies with mEPI, results regarding these regions especially which showed low reliability in our study should be interpreted with caution.
According to the functional roles in information processing (Mesulam, 1998), the brain regions can be categorized into three classes, including association, primary and paralimbic/limbic regions. For the low-resolution WM network, the ICCs of association and primary regions were significantly higher than the paralimbic/limbic regions (Partial Eta Squared = 0.37) and 72% of regions that show low ICC values were located in the paralimbic/limbic cortices. This result may be induced by the smaller ROI size in the paralimbic/limbic regions in the AAL template (surface: association = 2.9 × 10 3 mm 3 , primary = 2.7 × 10 3 mm 3 , paralimbic/limbic = 1.4 × 10 3 mm 3 ; volume: association = 1.9 × 10 4 mm 3 , primary = 1.8 × 10 4 mm 3 , paralimbic/limbic = 8.9 × 10 3 mm 3 ). As mentioned above, the smaller ROI size can be easily biased by the image noise, partial volume effects and registration errors. Another possible reason is the high anatomical variability of paralimbic/limbic tracts, such as the uncinate fasciculus and cingulum bundles (Burgel et al., 2006).
For the structural connectivity, the reliability varies across edges. There are several sources that contribute to the variation of the edge weights (number of streamlines). Image noise, spatial resolution, dMRI gradient encoding, and partial volume effects may affect the quality of fiber quantification. The tractography algorithm (Bastiani et al., 2012), including the number of random seeds in fiber tracking, can also have a slight effect on the variance of the network. Specifically, fewer random seeds will lead to a larger variance in the number of fibers from fiber tracking, although the effect in this study was diminished by choosing eight seeds per voxel in fiber tracking. In addition, the reliability of network construction also relies on the accuracy of parcellation and the mapping during image registration. The parcellation can have errors due to SNR limitations of the T1weighted image or the algorithm itself. The registration between the T1-weighted image and the dMRI image can also have errors due to image distortion and partial volume effects. All of these factors affect the TRT reliability of the structural connectivity and networks.
Importantly, we investigated the reliability of the rich-club organization of WM networks. First, we found hubs regions and rich-club connections were more reliable than non-hub ones with a low effect size (Partial Eta Squared = 0.01). This is consistent with the findings of positive correlations between ICCs with nodal strength and edge weights. As the hub regions are more densely interconnected than the other brain regions and have a large influence on overall network organization, hubs are essential in supporting the performance of high cognitive functions of the human brain by integrating specialized brain regions into coordinated networks (van den Heuvel and Sporns, 2013). Buckner et al. (2009) demonstrated that the topography of human brain cortical hubs is highly similar across populations and robust against task states, therefore reflecting a stable property of brain functional architecture. Previous studies consistently revealed similar and stable hub distributions of WM networks across subjects from different samples (Hagmann et al., 2008;Gong et al., 2009a;Zalesky et al., 2010;Bassett et al., 2011;van den Heuvel and Sporns, 2013). This result is also in parallel with the findings from functional MRI data Liao et al., 2013), which indicate that the reliable regions qualitatively tend to serve as hubs in intrinsic functional brain networks. The high reliability of hub regions and rich-club connections indicated that rich-club organization is a stable metric with commendable potential utility in clinical applications. There are some methodological issues need to be addressed. First, we included only 11 subjects in the present study, large samples with more subjects in practical studies is necessary to obtain sufficient statistical power. Second, investigation of the effects of different acquisition parameters, gradient sampling schemes and advanced diffusion modeling approaches, such as application of higher order models to disentangle crossing fiber structures (Tournier et al., 2008), on the reproducibility of network metrics for this new sequence would be interesting, but was unfortunately outside the scope of this paper. Finally, when considering the influence of potential variations in WM structure, it is important to consider the tradeoff between the reliability and sensitivity of network metrics. In future studies, several measures (e.g., the coefficient of variation) can be further developed to comprehensively characterize the sensitivity of network metrics over scanning sessions (Lachin, 2004;Bassett et al., 2011).

APPENDIX NETWORK STRENGTH
For a network (graph) G with N nodes and K edges, we calculated the strength of G as follows: where S(i) is the strength of a node, which is the sum of the edge weights w ij (fiber number) linking to node i. The strength of a network is the average of the strength across all of the nodes in the network.

SMALL-WORLD PROPERTIES
Small-world network parameters (clustering coefficient, C p , and shortest path length, L p ) were originally proposed by Watts and Strogatz (1998). In this study, we investigated the small-world properties of the weighted brain networks. The clustering coefficient of a node i, C(i), which was defined as the likelihood of whether the neighborhoods were connected with each other, was computed as follows: where k i is the degree of node i andw is the weight, which is scaled by the mean of all weights to control each participant's cost at the same level. The clustering coefficient is zero (C(i) = 0) if the nodes are isolated or have just one connection, i.e., k i = 0 or k i = 1. The clustering coefficient, C p , of a network is the average of the clustering coefficient over all nodes and indicates the extent of the local interconnectivity or cliquishness in a network (Watts and Strogatz, 1998).
The path length between any pair of nodes (e.g., node i and node j) is defined as the sum of the edge lengths along this path. For weighted networks, the length of each edge was assigned by computing the reciprocal of the edge weight, 1/w ij . The shortest path length, L ij , is defined as the shortest length among the lengths of all possible paths between node i and node j. The shortest path length of a network was computed as follows: where N is the number of nodes in the network. The L p of a network quantifies the ability for information to propagate in parallel.
To examine the small-world properties, the clustering coefficient, C p , and the shortest path length, L p , of the brain networks were compared with those of random networks. In this study, we generated 100 matched random networks that had the same number of nodes and edges and the same degree distribution as real networks (Maslov and Sneppen, 2002). Notably, we retained the weight of each edge during the randomization procedure such that the weight distribution of the network was preserved. Furthermore, we computed the normalized L p , λ = L real p /L rand p , and the normalized C p , γ = C real p /C rand p , where L rand p and C rand p are the mean C p and the mean L p of 100 matched random networks, respectively. Importantly, the two parameters correct the differences in the edge number and degree distribution of the networks across individuals. A real network would be considered small-world if γ > 1 and λ ≈ 1 (Watts and Strogatz, 1998). Thus, a small-world network not only has a higher local interconnectivity but also has a shortest path length approximately equivalent to random networks. These two measurements can be summarized into a simple quantitative metric, small-worldness, σ = γ /λ, which is typically greater than 1 for small-world networks (Humphries and Gurney, 2008).

NETWORK EFFICIENCY
The global efficiency of G measures the global efficiency of the parallel information transfer in the network (Latora and Marchiori, 2001), which can be computed as follows: where L ij is the shortest path length between node i and node j in G.
The local efficiency of G reveals how much the network is fault tolerant and shows how efficient the communication is among the first neighbors of the node i when it is removed. The local efficiency of a graph is defined as follows: where G i denotes the subgraph composed of the nearest neighbors of node i.

REGIONAL CHARACTERISTICS
To determine the nodal (regional) characteristics of the WM networks, we computed the nodal strength and efficiency. The nodal strength S(i) is defined as the sum of all of the edge weights between this node and all of the other nodes in the network. The nodal efficiency, E nodal (i) is defined as (Achard and Bullmore, 2007): where L ij is the shortest path length between node i and node j in G. E nodal (i) measures the average shortest path length between a given node i and all of the other nodes in the network.

RICH-CLUB ORGANIZATION
A "rich-club" in networks is defined as the phenomenon that the high-degree nodes of a network tend to be more densely connected among themselves than is expected by chance (Colizza et al., 2006;McAuley et al., 2007). The brain's rich-club has been described previously (van den Heuvel and Sporns, 2011; van den Frontiers in Human Neuroscience www.frontiersin.org February 2015 | Volume 9 | Article 59 | 17