Test–Retest Reliability of Magnetoencephalography Resting-State Functional Connectivity in Schizophrenia

The reliability of magnetoencephalography (MEG) resting-state functional connectivity in schizophrenia (SZ) is unknown, as previous research has focused on healthy controls (HC). Here, we examined reliability in 26 participants (13 SZ, 13 HC). Eyes opened and eyes closed resting-state data were collected on 4 separate occasions during 2 visits, 1 week apart. For source modeling, we used minimum norm estimation (MNE) software to apply dynamic statistical parametric mapping. Source analyses compared the following functional connectivity metrics from each data run: coherence (coh), imaginary coherence (imcoh), pairwise phase consistency (ppc), phase-locking value (plv), phase lag index (pli), weighted phase lag index (wpli), and weighted phase lag index debiased (wpli2). Intraclass correlation coefficients (ICCs) were calculated for whole brain, network, and network pair averages. For reliability, ICCs above 0.75 indicated excellent, 0.60-0.75 good, 0.40-0.59 fair, and below 0.40 poor reliability. We found the reliability of these metrics varied greatly depending on frequency band, network, network pair, and participant group examined. Broadband (1–58 Hz) whole brain averages in both HC and SZ showed excellent reliability for wpli2, and good to fair reliability for ppc, plv, and coh. Broadband network averages showed excellent to good reliability across 1 hour and 1 week for coh, imcoh, ppc, plv, and wpli within default mode, cognitive control, and visual networks in HC, while the same metrics had excellent to fair reliability in SZ. Regional network pair averages showed good to fair reliability for coh, ppc, and plv within default mode, cognitive control, and visual network pairs in HC and SZ. In general, HC had higher reliability compared to SZ, and the default mode, cognitive control, and visual networks had higher reliability compared to somatosensory and auditory networks. Similar reliability levels occurred for both eyes opened and eyes closed resting-states for most metrics.
The functional connectivity metrics of coh, ppc, and plv performed best across 1 hour and 1 week in HC and SZ. We also found that SZ had reduced coh, plv, and ppc in the default mode network average and pair values, indicating dysconnectivity in SZ. These findings encourage collecting both eyes opened and eyes closed resting-state MEG, while demonstrating that clinical populations may differ in reliability.


INTRODUCTION
Magnetoencephalography (MEG) is an advantageous neuroimaging tool to study psychosis due to its safety as a noninvasive test, along with the high dimensional data it provides on neuronal activity, oscillatory dynamics, and connectivity at a millisecond time scale. The spontaneous oscillatory signals captured by MEG during a resting-state can be used to estimate neural interactions between brain regions and reveal network disorganization and abnormalities in schizophrenia (SZ) and other clinical populations. Despite the increasing prevalence of resting-state MEG research, studies examining the reproducibility and reliability of MEG-derived functional connectivity measures remain scarce, especially in clinical populations where reliability is critical for clinical application.
Previous resting-state MEG test-retest reliability has been evaluated in healthy controls (HC) (1)(2)(3)(4)(5), patients with depression (6), and patients with SZ (5). Most test-retest studies use intraclass correlation coefficients (ICCs) or Spearman correlations to report and categorize degree of reliability. To model and interpret an ICC, with values ranging from 0 to 1, excellent reliability is defined as ICC > 0.75, good reliability ICC = 0.75-0.60, fair reliability ICC = 0.59-0.40, and poor reliability ICC < 0.40 (7). In HC, MEG spectral power has good reliability in theta, alpha, and beta bands (ICCs > 0.6) over a 7-day test-retest interval (3) and excellent reliability in theta-gamma bands (ICCs > 0.86) in global and regional spectral measures over both 1 hour and 1 week test-retest intervals (5). The reliability of MEG functional connectivity has varied greatly in HC depending on the connectivity metric used (1,2,5) and frequency band studied (4). For example, the reliability of phase-locking value (plv) in alpha, beta, and gamma bands averages between ICCs = 0.74-0.82, but dips to ICCs < 0.1 when phase-lag index (pli) is used (2). Conversely, other studies have reported that weighted phase-lag index (wpli) and the imaginary part of coherency had excellent to good reliability of global connectivity over 30 trials in alpha and theta bands, but often fair to poor reliability in other frequency bands and in vertex-based connectivity (4). Given the variability of previous MEG functional connectivity findings, more research is needed to reach a consensus on which functional connectivity metric is best suited for MEG resting-state studies.
In clinical populations much less is known about the reliability of MEG functional connectivity metrics. Examining resting-state functional connectivity in patients with SZ can be especially informative given that SZ is often conceptualized as a disorder of altered brain connectivity by the disconnection hypothesis (8) with abnormal resting-state brain networks demonstrating disorganization (9). MEG functional connectivity abnormalities in patients with SZ, quantified by imaginary coherence (imcoh), include decreased left prefrontal cortex and right superior temporal cortex connectivity in alpha band which negatively correlated with negative symptoms, together with increased connectivity in left extrastriate cortex and right inferior prefrontal cortex (10). Other studies using spatial independent component analysis and pairwise correlations found hyperconnectivity within frontal and temporal networks in patients with SZ (11), information which was valuable in improving classification when combined with fMRI functional connectivity (12), in addition to hypoconnectivity between sensorimotor and task positive networks in the delta frequency band (13). Patients with SZ also have shown abnormalities in dynamic functional connectivity by changing meta-states more often than HC and exhibiting greater inter-individual variability (14), metrics which correlated with positive symptoms (15). For a more complete review of MEG abnormalities reported in SZ please refer to the following review papers (16)(17)(18). These previous findings, however, have not been replicated or shown test-retest reliability.
Recently we examined the 1 week reliability of MEG resting-state spectral power in a cohort of patients with SZ and HC. Overall we found that spectral power measures (power, normalized power, alpha reactivity) had excellent reliability for both HC and SZ in 1) global power averages in theta-gamma bands, 2) all frequency bands across sensor regions, and 3) parietal regions for alpha frequency (5). Furthermore, for patients, higher PANSS positive scores were associated with reduced parietal alpha normalized power. We also briefly examined a single functional connectivity metric, weighted phase lag index debiased (wpli2), and found poor reliability for the metric in both groups (5). The current study was designed as a follow-up to further explore other functional connectivity metrics which may perform better in patients with SZ. Where the previous study provided an in-depth analysis of MEG spectral power in patients with SZ and HC, the current study aims to provide an in-depth analysis of MEG functional connectivity in patients with SZ and HC.
The current study was designed to determine the test-retest reliability of MEG resting-state functional connectivity over 1 hour and 1 week intervals in psychosis. As such, it is one of the first studies to directly address MEG functional connectivity reliability in patients with SZ. MEG resting-state data were collected in 13 patients with SZ and 13 matched HC. Data were collected across 1 week (2 visits, 2 runs per visit). Each MEG visit included both a 10-min and a 4-min rest task, with the rest phase alternating between an eyes open and eyes closed state. We hypothesized that reliability would be lower in the patient group when compared to HC, and that certain connectivity metrics, such as wpli2, would have poor reliability, similar to our previous study (5). Furthermore, when directly comparing functional connectivity metrics, we expected patients with SZ to have reduced connectivity when compared to HC, in line with (16). This study compared the reliability of various functional connectivity metrics in source space across 1 hour and 1 week intervals, using coh (coherence), imcoh (imaginary coherence), ppc (pairwise phase consistency), plv (phase-locking value), pli (phase-lag index), wpli (weighted phase lag index), and wpli2 (weighted phase lag index debiased). To determine reliability, ICCs were calculated and compared for whole brain averages, network connectivity averages, and regional connectivity pairs. Furthermore, ICCs were compared between patients with SZ and HC to determine which measures were most stable in a patient population.

Participants
The current study used existing data from 13 individuals diagnosed with SZ and 13 HC, age and gender matched (5). All participants were 21-49 years of age (Table 1) and were compensated for their participation. Participant characteristics and procedures are briefly described here; for further information on methods please refer to (5).

MEG Behavioral Tasks
Visits occurred 7 days apart. In order to avoid circadian rhythm influence on reliability, time of day was matched between visits. The average time for return visits for HC was 7.54 days ± 60 min and for SZ was 7.84 days ± 51 min. During each visit, the hour long MEG scan began with a 10-min rest task and ended with a 4-min rest task. At the start of each task participants were instructed to monitor prompts to close their eyes or open their eyes and fixate on a white cross. As shown in Figure 1, each task alternated between equal phases of eyes closed and eyes opened. The 10-min task, herein referred to as Rest10, changed phase every 2.5 min, while the 4-min task, herein referred to as Rest4, changed phase every 2 min. In total, resting-state activity was recorded during 4 separate runs (Visit1_Rest10, Visit1_Rest4, Visit2_Rest10, Visit2_Rest4).
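To make the alternating design concrete, the sketch below labels consecutive 2-s epochs of each run by eyes state from the phase durations above (2.5 min for Rest10, 2 min for Rest4). The 2-s epoch length anticipates the segmentation described under preprocessing. Which state each run began with is shown in Figure 1 rather than stated in the text, so `first_state` is a hypothetical parameter, as is the function name.

```python
"""Label consecutive 2-s epochs of a rest run by eyes-open/eyes-closed phase.

Sketch only: the starting state of each run is an assumed parameter.
"""

EPOCH_SEC = 2.0  # epoch length used for segmentation (s)

# Phase durations and total run lengths in seconds, from the task description.
RUNS = {
    "Rest10": {"phase_sec": 150.0, "total_sec": 600.0},  # phase changes every 2.5 min
    "Rest4": {"phase_sec": 120.0, "total_sec": 240.0},   # phase changes every 2 min
}


def label_epochs(run: str, first_state: str = "closed") -> list[str]:
    """Return the eyes state ('open' or 'closed') of each full 2-s epoch in a run."""
    other = {"closed": "open", "open": "closed"}
    phase_sec = RUNS[run]["phase_sec"]
    n_epochs = int(RUNS[run]["total_sec"] // EPOCH_SEC)
    labels = []
    for i in range(n_epochs):
        onset = i * EPOCH_SEC
        phase_index = int(onset // phase_sec)  # 0, 1, 2, ... alternating phases
        labels.append(first_state if phase_index % 2 == 0 else other[first_state])
    return labels


if __name__ == "__main__":
    for run in RUNS:
        labels = label_epochs(run)
        print(run, labels.count("closed"), "closed,", labels.count("open"), "open epochs")
```

Before artifact rejection, this arithmetic gives at most 150 epochs per eyes state for Rest10 and 60 per eyes state for Rest4; epoch rejection can only reduce these counts.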

MEG Data Acquisition and Preprocessing
MEG data were collected with a 306-channel whole-head MEG system (Elekta Neuromag) in a magnetically shielded room (Vacuumschmelze-Ak3B) at the Mind Research Network in Albuquerque, New Mexico. Electro-oculogram and electrocardiogram channels were placed on the participant to monitor eye-blink and heartbeat artifacts, respectively. In addition, using three-dimensional digitization equipment (Polhemus FastTrack), four electromagnetic coils were registered to the nasion and preauricular points. During the tasks, data were sampled at 1,000 Hz and a Continuous Head Position Indicator (cHPI) was used to correct for motion. During each visit participants sat upright and head position was monitored closely. Average Euclidean distance was calculated for each task (Table 1). Head position consistency was similar between HC and SZ (all p's > 0.31) (5). Using Neuromag MaxFilter 2.2 software, raw data were corrected for noise and head motion artifacts with the temporal extension of signal space separation (t-SSS) method with movement compensation (19,20). For equivalent sensor locations, head position was transformed between visits with the MaxFilter 2.2 MaxMove option. Using signal space projection (SSP) (21) in MNE software (22), data were cleaned of heartbeat and eye-blink artifacts. Any data that failed the automated process were visually inspected, and SSPs to remove artifacts were generated manually. After ensuring the data were artifact-free, continuous files were segmented into 2 sec epochs. Epochs were rejected if the magnetic field exceeded 5 pT. Data quality was equivalent between groups (all p's > 0.25) (Table 1).
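The segmentation and rejection steps can be sketched with NumPy as below. The text states only that epochs were rejected when the magnetic field exceeded 5 pT; this sketch assumes a peak-to-peak criterion per channel (the convention of MNE's `reject` parameter) and operates on a hypothetical `(n_channels, n_samples)` array in tesla rather than on an MNE object. The function name is illustrative.

```python
import numpy as np

SFREQ = 1000.0     # sampling rate (Hz)
EPOCH_SEC = 2.0    # epoch length (s)
REJECT_T = 5e-12   # 5 pT rejection threshold, in tesla


def epoch_and_reject(data: np.ndarray, sfreq: float = SFREQ,
                     epoch_sec: float = EPOCH_SEC, threshold: float = REJECT_T):
    """Split continuous data into non-overlapping epochs and drop noisy ones.

    data: array of shape (n_channels, n_samples), in tesla.
    Returns (kept_epochs, keep_mask); kept_epochs has shape
    (n_kept, n_channels, n_times). An epoch is dropped if any channel's
    peak-to-peak amplitude exceeds `threshold` (assumed criterion).
    """
    n_channels, n_samples = data.shape
    n_times = int(round(epoch_sec * sfreq))
    n_epochs = n_samples // n_times  # a trailing partial epoch is discarded
    epochs = data[:, : n_epochs * n_times].reshape(n_channels, n_epochs, n_times)
    epochs = epochs.transpose(1, 0, 2)  # -> (n_epochs, n_channels, n_times)
    ptp = epochs.max(axis=2) - epochs.min(axis=2)  # (n_epochs, n_channels)
    keep = (ptp <= threshold).all(axis=1)
    return epochs[keep], keep
```

Within MNE itself, a comparable setting would be passing `reject=dict(mag=5e-12)` when constructing `mne.Epochs`; whether gradiometer thresholds were also applied is not stated in the text.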

MEG Source Analysis
Similar to previous processing (5), the cortical surface of each participant was reconstructed from T1-weighted MRI files using FreeSurfer. To create a source space with 4.9 mm spacing and 4,098 locations per hemisphere, a recursively subdivided octahedron was used as the spatial subsampling method. In MNE software (22,23), dynamic statistical parametric mapping (dSPM) (24) was used to create an anatomically constrained linear estimation inverse model. The dSPM inverse model identified where the estimated current at each cortical surface vertex differed significantly from empty room data. Other data parameters were: depth weight of 0.8, loose constraint of 0.2, orientation of none, and signal-to-noise ratio of 3. A single layer (inner skull) boundary element method (25) was used to create the forward solution. A surface-based source space was used to confine source locations to a fixed surface orientation. When using a fixed source space, loose/free orientations are not normed, leading to signed source activity.
Source estimates were derived from epoch files. Using the FreeSurfer DKT parcellation (26,27), average time series were extracted for 62 regional labels. The spectral connectivity computation performed in MNE software (version 0.19.0) (22,23) used multitaper spectrum estimation with 7 DPSS windows. The frequency bands were defined as: broadband (1-58 Hz), delta (1-4 Hz), theta (5-8 Hz), alpha (9-13 Hz), beta (14-29 Hz), and gamma (31-58 Hz). The connectivity methods extracted were: coh, imcoh, ppc, plv, pli, wpli, and wpli2. Since an aim of the study was to compare available functional connectivity metrics, not to create or modify existing ones, we report connectivity values produced by MNE without modification. In the case of imcoh, MNE uses the original definition (28), not the absolute value, to calculate imcoh values. The results of the spectral connectivity computation were run through custom scripts in MATLAB (2019a, MathWorks) to create whole brain, network, and regional pair averages. Whole brain values were derived from averaging all 62 regional labels, network values were derived from averaging regional labels within predefined clusters (29), and regional pair values were predefined 1-to-1 select regional connections. The labeling used between functional and anatomical regions is shown in Table 2. These resting-state networks are semi-independent anatomical clusters of correlated brain activity commonly examined during rest (29). Regional pairs were chosen from the default mode, cognitive control, and visual networks. The pairs represent either unilateral or contralateral connecting nodes within the same resting-state network.
The default mode pair was contralateral right hemisphere precuneus to left hemisphere medial orbitofrontal region, the cognitive control pair was unilateral left hemisphere inferior parietal to left hemisphere caudal middle frontal region and the visual network pair was contralateral left hemisphere lateral occipital to right hemisphere middle temporal, Table 2. These regions were chosen to maximize regional distance within networks in an effort to minimize the effects of signal leakage on connectivity measures.
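The averaging scheme can be sketched as follows. The study performed this step with custom MATLAB scripts; this Python version only illustrates one plausible reading under stated assumptions: each metric and band yields a 62 × 62 matrix with values in the lower triangle (the layout MNE uses for all-to-all connectivity), whole-brain and network values are means over every unique label pair, and label names use an MNE-style `<name>-<hemi>` format. Network memberships passed to `summarize` are hypothetical placeholders for Table 2; only the three regional pairs come from the text.

```python
import numpy as np

# Regional pairs named in the text, written as MNE-style "<name>-<hemi>" label
# names (the naming format is an assumption).
PAIRS = {
    "default_mode": ("precuneus-rh", "medialorbitofrontal-lh"),
    "cognitive_control": ("inferiorparietal-lh", "caudalmiddlefrontal-lh"),
    "visual": ("lateraloccipital-lh", "middletemporal-rh"),
}


def _lower_mean(con: np.ndarray, idx: list[int]) -> float:
    """Mean over unique label pairs (row index > column index) among idx."""
    idx = sorted(idx)  # sorting keeps the selected pairs in the lower triangle
    sub = con[np.ix_(idx, idx)]
    rows, cols = np.tril_indices(len(idx), k=-1)
    return float(sub[rows, cols].mean())


def summarize(con, labels, networks):
    """Whole-brain, network, and regional-pair values for one connectivity matrix.

    con: (n_labels, n_labels) matrix for one metric and band, values in the
         lower triangle.
    labels: label names in matrix order.
    networks: dict mapping network name -> list of member label names
              (hypothetical; the study's memberships are listed in Table 2).
    """
    pos = {name: i for i, name in enumerate(labels)}
    out = {"whole_brain": _lower_mean(con, list(range(len(labels))))}
    for net, members in networks.items():
        out[f"net:{net}"] = _lower_mean(con, [pos[m] for m in members])
    for net, (a, b) in PAIRS.items():
        if a in pos and b in pos:
            i, j = pos[a], pos[b]
            out[f"pair:{net}"] = float(con[max(i, j), min(i, j)])
    return out
```

Running `summarize` once per metric, band, run, and eyes state would produce the per-run values that feed the ICC step.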

Spectral Connectivity Estimation
Using MNE spectral connectivity commands, spectral connectivity was determined for the following 7 metrics: coh, imcoh, ppc, plv, pli, wpli, and wpli2. Coh is a generalization of correlation to the frequency domain, while imcoh is similar but is sensitive only to synchronization between two processes that are time-lagged relative to each other; because volume conduction does not cause a time lag, imcoh avoids volume conduction artifacts (28). Plv characterizes a stable phase relationship between two time courses in a particular frequency band within a predefined window (i.e., rhythmic neuronal synchronization) (30). Ppc is very similar to plv but is bias-free and consistent, estimating the population value of the squared plv (31). Pli uses similar information but improves upon plv by disregarding zero-lag phase differences (32). Furthermore, pli quantifies the asymmetry of the phase difference distribution and estimates the likelihood of a consistent phase lead or lag between signals from two sensors. Wpli builds upon pli by weighting observed phase leads and lags by the magnitude of the imaginary component of the cross-spectrum (33). These additions reduce sensitivity to uncorrelated noise sources while increasing statistical power. Wpli2 is a debiased estimator of the squared wpli which corrects for sample-size bias in phase-synchronization indices (33). Of the 7 metrics used, 2 are considered spectral coherence metrics (coh, imcoh) and 5 are considered phase estimation metrics (plv, ppc, pli, wpli, wpli2).
Because the goal of the study was to compare available functional connectivity metrics from an existing software package, values were reported without modification. The formulas used by MNE followed the original definitions, meaning coh, plv, pli, and wpli yielded absolute values, while imcoh did not.
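To show how the 7 estimators differ, the sketch below computes all of them from per-epoch cross-spectra for a single pair of time courses at a single frequency. It is written from the standard definitions cited above (28,30-33) and, as far as we know, matches how MNE formulates these estimators, but it is not MNE's code: in MNE the per-epoch cross-spectra come from the multitaper estimates and band values average over the frequency bins in each band. The function name and inputs are hypothetical.

```python
import numpy as np


def connectivity_metrics(sxy: np.ndarray, sxx: np.ndarray, syy: np.ndarray) -> dict:
    """Seven connectivity estimates for one signal pair at one frequency.

    sxy: complex cross-spectra, one value per epoch, shape (n_epochs,).
    sxx, syy: real auto-spectra (power) per epoch for the two signals.
    """
    n = sxy.size
    # Coherency: epoch-averaged cross-spectrum normalized by the auto-spectra.
    coherency = sxy.mean() / np.sqrt(sxx.mean() * syy.mean())
    phase = sxy / np.abs(sxy)  # unit-length phase-difference vectors
    acc = phase.sum()
    im = np.imag(sxy)          # only the lagged (imaginary) part enters pli/wpli
    return {
        "coh": float(np.abs(coherency)),
        "imcoh": float(np.imag(coherency)),  # signed, as in the original definition
        "plv": float(np.abs(acc) / n),
        # Unbiased estimate of squared plv over all pairs of distinct epochs.
        "ppc": float((np.abs(acc) ** 2 - n) / (n * (n - 1))),
        "pli": float(np.abs(np.mean(np.sign(im)))),
        "wpli": float(np.abs(im.mean()) / np.abs(im).mean()),
        # Debiased squared wpli: removes each epoch's pairing with itself.
        "wpli2": float((im.sum() ** 2 - (im ** 2).sum())
                       / (np.abs(im).sum() ** 2 - (im ** 2).sum())),
    }
```

A purely real (zero-lag) cross-spectrum leaves `im` at zero, so pli evaluates to 0 and wpli/wpli2 are undefined (0/0), whereas coh, plv, and ppc still register the coupling. This is the trade-off discussed in relation to spatial leakage.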

Intraclass Correlation Coefficient
ICCs were calculated with SPSS (version 26 for Macintosh). We used a two-way mixed effects model with absolute agreement, single measurement criteria to estimate ICCs and their 95% confidence intervals. This is often referred to as an ICC (3,1) model. The equation for calculating the ICC is:
$$\mathrm{ICC} = \frac{MS_B - MS_E}{MS_B + (k - 1)\,MS_E + \frac{k}{n}\left(MS_T - MS_E\right)}$$
where $MS_E$ = Error (residual) Mean Square, $MS_B$ = Between-subjects Mean Square, $MS_T$ = Between-tests Mean Square, $k$ = number of measurements, and $n$ = number of subjects (34)(35)(36). In a two-way mixed effects model, variance consists of 3 components: between-subjects variance (between-subjects mean square), between-tests variance (between-tests mean square), and random error variance (residual mean square). Furthermore, by specifying absolute agreement, the model is described as: between-subjects variance/(between-subjects variance + between-tests variance + random error variance) (37). ICCs range from 0 to 1, with higher values indicating better reliability; any negative values were rescored to zero. Following the guidelines of (7), we defined ICCs as: excellent reliability >0.75, good reliability 0.75-0.60, fair reliability 0.59-0.40, and poor reliability <0.40, similar to (5). ICCs were calculated over 4 timepoints to estimate an average across all 4 runs, and over 2 timepoints to estimate 1 hour and 1 week reliability. In the current study, each rest run was modeled as a fixed effect, given that identical scanning parameters were used and task familiarity may have occurred. Meanwhile, subjects were modeled as a random effect, given that sampling and recruitment were random and there was no reason to expect similarity in a spontaneous resting-state task.
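As a cross-check outside SPSS, the sketch below computes the absolute-agreement, single-measurement ICC from the two-way ANOVA mean squares ($MS_B$, $MS_T$, $MS_E$), rescores negative values to zero, and applies the reliability cut-offs above. It is an illustration, not the SPSS implementation, and the function names are hypothetical. Passing all 4 runs as columns gives the 4-timepoint ICC; passing two columns gives a 2-timepoint ICC. Which pair of runs defines the 1 hour and 1 week comparisons is left to the caller.

```python
import numpy as np


def icc_a1(x) -> float:
    """Absolute-agreement, single-measurement ICC from an (n_subjects, k_runs) array.

    Uses two-way ANOVA mean squares: MS_B (between subjects),
    MS_T (between tests/runs), MS_E (residual). Negative values are set to 0.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_b = k * ((x.mean(axis=1) - grand) ** 2).sum()  # subjects (rows)
    ss_t = n * ((x.mean(axis=0) - grand) ** 2).sum()  # tests/runs (columns)
    ss_e = ((x - grand) ** 2).sum() - ss_b - ss_t     # residual
    ms_b = ss_b / (n - 1)
    ms_t = ss_t / (k - 1)
    ms_e = ss_e / ((n - 1) * (k - 1))
    icc = (ms_b - ms_e) / (ms_b + (k - 1) * ms_e + k * (ms_t - ms_e) / n)
    return max(0.0, float(icc))


def classify(icc: float) -> str:
    """Reliability label using the cut-offs stated in the text (following ref. 7)."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "poor"
```

Because the between-tests term sits in the denominator, a systematic shift between runs (for example, a practice effect) lowers this absolute-agreement ICC even when subjects keep their rank order.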

Statistical Analysis
To look at group differences in functional connectivity metrics, data from a single visit (Visit1-Run1, Rest10, a 10 min resting-state task) were analyzed. Analysis of Variance (ANOVA) was performed using SPSS (version 26 for Macintosh) with the between-subjects factor of Group (HC, SZ). Each resting state (eyes open, eyes closed) was analyzed separately. The statistical threshold was set at p < 0.05 for each individual connectivity metric.
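With one between-subjects factor and two groups, this ANOVA reduces to the ratio of between-group to within-group mean squares, and it is equivalent to a pooled-variance two-sample t-test (F = t²). A minimal NumPy sketch, with a hypothetical function name, that would be run separately for each metric and eyes state:

```python
import numpy as np


def one_way_anova(*groups):
    """One-way ANOVA F statistic for a single between-subjects factor.

    Each argument is a 1-D array of one metric's values for one group
    (e.g., HC and SZ). Returns (F, df_between, df_within).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand = all_values.mean()
    ss_between = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = all_values.size - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return float(f_stat), df_between, df_within
```

With 13 participants per group this gives F(1, 24); the p-value is the upper tail of that distribution, which SciPy provides as `scipy.stats.f.sf(F, 1, 24)`.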

Whole Brain Reliability
Connectivity values within all 62 regional labels were averaged to create a whole brain, global reliability measure. As Figure 2 shows, global MEG connectivity reliability varied greatly depending on frequency band, connectivity measure, and participant group examined. Within the broadband (1-58 Hz) average ICC for HC (Figure 2A), there was excellent reliability …
To compare whole brain functional connectivity metrics between groups (SZ, HC), the main effect of group was examined during Visit 1 for the Rest10 task. As shown in Figure 3, there were no significant group effects for any of the global connectivity metrics for broadband (1-58 Hz) frequency (p's > 0.251), suggesting there were no differences between patients with SZ and HC in the 7 functional connectivity metrics at the global level. Individual frequency bands (delta-gamma) were not explored further since broadband analyses did not reveal a group effect in global connectivity.
To compare broadband network functional connectivity metrics between groups (SZ, HC), the main effect of group was examined during Visit 1 for the Rest10 task. As shown in …
Average network reliability within frequency bands is shown in Figure 6. To help determine which measurement interval (1 hour vs. 1 week) drove the average, Supplementary Figures 1 and 2 further break down network reliability within frequency bands for 1 hour and 1 week. An alternative version of Figure 6, showing variability across networks, is also provided in Supplementary Figure 3. As with broadband data, the somatosensory and auditory networks had the lowest reliability across all frequencies. Similar reliability levels were found in both resting-states for all metrics (average HC mean difference = 0.01, SZ mean difference = 0.02).
To compare regional pair functional connectivity metrics between groups (SZ, HC), the main effect of group was examined during Visit 1 for the Rest10 task.
As shown in Figure 8, SZ had significantly reduced coh, plv, and ppc in the default mode network (dmn) connectivity pair (precuneus right hemisphere to medial orbitofrontal cortex left hemisphere), during both eyes closed [Group effect for coh: …] … Figure 8A, when compared to HC.

DISCUSSION
Following source analysis, various functional connectivity metrics were compared, specifically coh, imcoh, ppc, plv, pli, wpli, and wpli2. The reliability of these metrics varied greatly depending on frequency band, network, network pair, and participant group examined.
To summarize a few key findings: (1) Broadband whole brain averages in both HC and SZ showed excellent reliability for wpli2, good to fair reliability for ppc, plv, and coh, and poor reliability for imcoh, pli, and wpli, (2) Network averages showed excellent to good reliability for coh, imcoh, ppc, plv, and wpli within default mode, cognitive control, and visual networks in HC, while the same metrics had excellent to fair reliability in SZ, (3) Regional network pair averages showed good to fair reliability for coh, ppc, and plv within default mode, cognitive control, and visual network pairs, while imcoh, pli, wpli, and wpli2 all had poor reliability, (4) For both HC and SZ, the default mode, cognitive control, and visual networks had higher reliability compared to somatosensory and auditory networks, and (5) Eyes open and eyes closed states had similar reliability levels in HC and SZ for all metrics. When taken together, the results indicate functional connectivity reliability is highly dependent on connectivity metric, frequency band, and region or network size.
In HC, we confirmed some patterns of functional connectivity for certain metrics and frequency bands which were in line with previous research. Our results in HC were fairly consistent with previous resting-state MEG studies, although there were a few differences. For example, in MEG whole-brain functional connectivity comparisons, plv has been found to range from excellent to good reliability (ICC range 0.74-0.82) in alpha-gamma bands, while pli has been found to have poor reliability (ICCs < 0.1) for all frequency bands in both eyes open and eyes closed resting-states (2). Here, we similarly found that global functional connectivity averages for plv had excellent to good reliability across 1 hour and 1 week in delta-gamma bands and in broadband (ICC plv = 0.65), while pli had poor reliability in all bands across 1 hour and in broadband (ICC pli = 0.26). We also found that both eyes open and eyes closed resting states had similar reliability levels. However, in contrast to the previous paper, we found that pli had excellent reliability in beta, gamma, and broadband across 1 week. Another previous study examining MEG global whole brain reliability found that imcoh and wpli had good to excellent reliability in delta-gamma bands (4). In contrast, the present study found that imcoh and wpli generally had poor reliability in all frequencies, with a few exceptions, such as fair to good reliability for imcoh during an eyes closed state in beta, gamma, and broadband, and fair to excellent reliability for wpli across 1 week. A potential difference between the two studies may be how imcoh was calculated. Here, we reported the imcoh measure without using absolute values, which is the default formula used by MNE software and the original publication (28). To obtain undirected connectivity for imcoh, certain source models benefit from using imcoh absolute values, as (4) did.
At the resting-state network level, there is often poor reliability in phase- or coherence-based metrics which are robust to spatial leakage artifacts, such as pli and its derivatives, as well as imcoh (1). We found similar results here: our network averages showed lower reliability in metrics robust to spatial leakage artifacts, e.g., imcoh, pli, wpli, and wpli2. It is interesting that both studies showed low reliability for these metrics, considering that we used an anatomical parcellation and the dSPM algorithm, whereas the previous publication used a data-driven parcellation from fMRI along with a beamformer algorithm. Poor reliability of phase-based metrics has also been seen in another MEG study, which found poor reliability for pli in all frequency bands and networks, but good to excellent reliability for plv in alpha-gamma bands for visual, sensorimotor, auditory, and default mode networks (2). Here, we found excellent to good reliability for coh, plv, and ppc within default mode, cognitive control, and visual networks, with mixed reliability for pli and wpli dependent on network and interval length (1 hour vs. 1 week). Although graph analysis was not used in the current study, it should be mentioned that the reliability of graph-derived resting-state MEG functional connectivity networks has also been variable, ranging from poor to good (ICCs 0.256-0.655) depending on band and the metric defining nodal centrality, with greatest reliability in eyes open resting-state networks when assessed with Dnodal and Enodal metrics (38). Further direct comparison between our research and (1, 2, 4) is difficult given that each study used different source analysis and network modeling methods. Functional connectivity metrics based on phase-related connectivity can minimize the impact of spatial leakage and zero-lag synchronization; however, the estimates may be more variable in short or noisy recordings, or in connectivity pair data.
It has been previously suggested that amplitude envelope correlation and partial correlation measures have higher reliability and are the most consistent functional connectivity methods for resting-state MEG (1); however, those metrics were not tested in the current study. Phase-related connectivity metrics perform better when averaged across larger brain regions, more voxels, and larger datasets, as small sample sizes negatively impact pli and imcoh estimates.
Magnetic field spread, or spatial leakage, is a problem in MEG functional connectivity estimation (39)(40)(41) that can influence measures of functional connectivity and artificially inflate reliability (1); therefore, metrics which avoid those confounds, such as imcoh, pli, wpli, phase slope index, and amplitude envelope correlation, are generally recommended. However, the confounds introduced into connectivity estimation by spatial leakage have been shown to be highly repeatable across scans and between subjects (1). Here, we also found that the metrics which are prone to spatial leakage, e.g., coh, plv, and ppc, generally had higher reliability across sessions. The higher reliability may be spurious in nature, but coh, plv, and ppc remained consistently more reliable even as region size and the number of regions averaged varied in our data, i.e., across global, network, and regional pair data.
While signal leakage is expected to be highly reliable and may on the surface appear to influence the reliability of certain connectivity metrics, two additional factors indicate that the reliability of coh, ppc, and plv cannot be attributed solely to signal leakage. First, signal leakage, especially for MEG, does not spread across the entire brain but remains relatively localized (41). In our data, the regional network pairs we selected represented "distant" intra-network sources, e.g., parietal to anterior frontal or left occipital to right temporal, greatly decreasing the likelihood that the higher reliability of the coh, ppc, and plv metrics reflects signal leakage alone. Second, the metrics that cannot directly eliminate the possibility of signal leakage also retain additional signal (zero-phase correlations) that metrics robust to signal leakage ignore. Invasive measures have demonstrated zero-phase correlations across broad regions of the brain, indicating that not all zero-phase correlations are artifactual but contain real signal; therefore, eliminating all zero-phase correlations may reduce reliability by removing signal. There are other spatial leakage correction methods for MEG besides removing zero-phase correlations which may improve reliability, such as a geometric correction scheme which removes spurious local connections without impacting dynamic hub regions and networks at rest (42), or adaptive cortical parcellations (43). In fact, it has been suggested that using a non-zero-lag connectivity metric does not obviate the need for adaptive parcellation. Based on this information, we consider the current results to support the reliability of coh, ppc, and plv across region, network, and whole brain analyses.
Interestingly, our research found that the non-zero-lag connectivity metrics pli, wpli, and wpli2 had variable reliability depending on the size and number of regions averaged (i.e., as spatial resolution was increased, reliability decreased). Those metrics had their highest reliability in global averages, followed by network averages, and their lowest reliability in individual network pair connections. For example, when wpli2 was not averaged across the whole brain or across a network, but computed for an individual network pair, HC and SZ groups both had poor reliability (HC ICC Avg = 0.14, SZ ICC Avg = 0.11), similar to previous results (5), yet global averages across all regions showed excellent reliability (HC ICC Avg = 0.85, SZ ICC Avg = 0.84). Even within network averages, the networks which contained more regions (default mode, 14 regions; cognitive control, 26 regions; visual, 12 regions) had higher reliability than networks defined by fewer regions (somatosensory, 8 regions; auditory, 2 regions). While some have used 2 or 3 nodes to characterize a resting-state network (2), we decided to use networks defined by the fMRI ICA resting-state network approach (12,29,44), a technique which has been successfully applied to MEG resting-state networks in SZ clinical populations (11,14,15).
Here, we showed several instances where patients with SZ had lower reliability in functional connectivity metrics, e.g., lower broadband whole brain averages and network averages when compared to HC. In our previous test-retest paper, which only examined wpli2 in select superior parietal regional connectivity pairs, we also found poor reliability in the metric for patients with SZ (ICC = 0.03), as well as HC (ICC = 0.12) (5). Although reliability was low, meaning significant effects were not consistent across runs, we found instances of increased functional connectivity between superior parietal to lateral occipital and superior parietal to entorhinal connections in patients with SZ (5). However, these results may not replicate because the metric is unreliable. Previous research has shown that abnormal resting-state functional connectivity is a key process underlying SZ (9). While there are no other test-retest functional connectivity reliability studies to directly compare to, one study used imcoh and found decreased alpha-band connectivity in left prefrontal cortex and right superior temporal cortex, together with increased connectivity in left extrastriate cortex and right inferior prefrontal cortex, in patients with SZ (10). Our research would suggest that imcoh is a connectivity metric with low reliability in this patient population. We also found higher variability in ICC 95% confidence intervals (data not shown) in patients with SZ, suggesting greater between-subjects variability, similar to the dynamic functional connectivity finding that patients change meta-states more often than HC and exhibit greater inter-individual variability (14). Despite the difficulties in interpreting the functional meaning of lower reliability and higher variability in patients with SZ, the current findings are consistent with deficits in functional connectivity and neural oscillations previously reported (5,11,14,16,39,45).
Aside from differences in ICC, we also found that, in the Rest10 task during Visit 1, patients with SZ had significantly reduced coh, plv, and ppc values compared to HC, both in the default mode network average and in a default mode network pair (right precuneus to left medial orbitofrontal cortex). During an eyes closed resting-state, patients with SZ also showed reductions in coh, plv, and ppc in the visual network pair and in wpli2 in the auditory network. Because ICCs for these metrics (coh, plv, ppc) were relatively high, these results suggest that the reduction in default mode functional connectivity in patients with SZ is somewhat stable during both eyes open and eyes closed resting-states.
A key question in examining test-retest reliability of resting-state networks with MEG is whether stable networks exist within the time window assessed. Simulations have shown that, at the network level, only longer window lengths were sufficient to detect resting-state networks that matched the ground truth, especially for plv, amplitude envelope correlation, and coh (46). While multiple fMRI studies have demonstrated reliable connectivity between regions, MEG operates on a different timescale, and its connectivity estimates may be more or less reliable depending on how these patterns are assessed. However, occipital alpha activity, which often shows reliable within-subject changes between eyes open and eyes closed conditions, provides evidence for the reliability of oscillatory networks. An additional open question is why clinical populations may exhibit different test-retest reliability than HC. A core characteristic of SZ is that patients experience repeated relapses even after initiation of medication (47). This supports the general idea that brain dynamics in patients are more variable, and is consistent with the hypothesis that healthy brain dynamics are maintained through homeostasis and that deviations from this stable state lead to functional consequences (48). Future research is needed to determine whether this reduced reliability of measures in patients with SZ depends on medication, disease severity, or disease duration; such work may further inform clinical treatment.
The current study was designed to compare available functional connectivity metrics in a test-retest dataset of patients with SZ and HC. We reported connectivity values as returned by the MNE functions, without modification, and used a surface-based source space with a fixed surface orientation. However, certain functional connectivity metrics, e.g., imcoh, can become difficult to interpret when source direction is not well defined, because imcoh is a signed quantity: if the sign of a source time series flips between sessions, the sign of imcoh flips with it. The other metrics used (coh, plv, ppc, pli, wpli, wpli2) are based on absolute value calculations, which make them insensitive to such sign flips across sessions. Researchers using imcoh should carefully evaluate their models to avoid introducing extra variability.
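The sign issue can be illustrated with a minimal NumPy sketch. It computes coh, imcoh, plv, pli, and wpli from per-epoch Fourier coefficients of two synthetic signals using their textbook definitions (this is not MNE's code), then recomputes them after flipping the sign of one signal, which mimics a source whose orientation sign differs between sessions. The signal parameters (a 10 Hz rhythm, a pi/4 phase lag, 60 one-second epochs, additive white noise) and the helper name `pairwise_metrics` are arbitrary assumptions; ppc and wpli2 are omitted for brevity.

```python
import numpy as np


def pairwise_metrics(x, y, freq_bin):
    """Connectivity between two signals at a single FFT bin.

    x, y: arrays of shape (n_epochs, n_times).
    """
    X = np.fft.rfft(x, axis=1)[:, freq_bin]
    Y = np.fft.rfft(y, axis=1)[:, freq_bin]
    cross = X * np.conj(Y)   # per-epoch cross-spectrum S_xy
    norm = np.sqrt(np.mean(np.abs(X) ** 2) * np.mean(np.abs(Y) ** 2))
    coherency = np.mean(cross) / norm   # complex-valued coherency
    im = cross.imag
    return {
        "coh": np.abs(coherency),                            # magnitude: sign-invariant
        "imcoh": coherency.imag,                             # signed: no absolute value
        "plv": np.abs(np.mean(cross / np.abs(cross))),       # magnitude of mean unit phasor
        "pli": np.abs(np.mean(np.sign(im))),                 # magnitude of mean sign
        "wpli": np.abs(np.mean(im)) / np.mean(np.abs(im)),   # weighted by |Im(S_xy)|
    }


rng = np.random.default_rng(42)
sfreq, n_times, n_epochs = 256.0, 256, 60   # 1-s epochs, so FFT bin k corresponds to k Hz
t = np.arange(n_times) / sfreq
phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_epochs, 1))   # random phase per epoch
x = np.sin(2 * np.pi * 10 * t + phase) + 0.5 * rng.standard_normal((n_epochs, n_times))
y = np.sin(2 * np.pi * 10 * t + phase - np.pi / 4) + 0.5 * rng.standard_normal((n_epochs, n_times))

session_1 = pairwise_metrics(x, y, freq_bin=10)
session_2 = pairwise_metrics(x, -y, freq_bin=10)   # same data, source sign flipped
for name in session_1:
    print(f"{name:6s} {session_1[name]: .3f} {session_2[name]: .3f}")
```

Only imcoh changes sign between the two "sessions"; the other values are identical, so averaging or correlating imcoh across sessions with inconsistent source signs adds variability that the absolute-value metrics do not.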
There are several limitations in the present study that warrant caution. The patients recruited were a stable, medicated cohort with SZ; as such, the results may not generalize to a more varied group of individuals with psychosis, to other populations, or to other imaging sites. Furthermore, because all patients were taking antipsychotic medications, it remains unknown whether the functional connectivity abnormalities found reflect the underlying neurophysiology of schizophrenia or were driven by medication. Another cautionary note is the small sample size: although ICCs were calculated across 4 separate runs, the small group size (n = 13 per group) warrants caution when generalizing to larger samples. It is also important to consider the ICC model itself.
An ICC compares within-subject and between-subject variance across repeated measurements. In some cases, a low ICC can reflect a genuine within-subject change and does not necessarily imply that the measure itself is inaccurate. While results from our study and others are similar, each study modeled ICC estimates differently, and ICC values will fluctuate with the model and variance assumptions (35,36,49). Another aspect to consider is the localization algorithm used. The optimal source localization algorithm for examining functional connectivity remains to be determined. One advantage of the dSPM algorithm is that its assumptions do not limit its ability to capture synchronous activity, which remains a limitation of most beamformer implementations. However, dSPM is also known to have limited spatial resolution and can propagate noise throughout the brain, which may affect the sensitivity of the functional connectivity metrics measured. Future studies should examine realistic simulated connectivity patterns to determine the conditions under which the best results are obtained. Finally, because the current study used a single MEG system, definitive conclusions on reliability cannot be drawn until larger samples and multiple sites are included.
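The dependence of ICC values on the chosen model can be made concrete. The sketch below implements two common Shrout and Fleiss forms, ICC(2,1) (two-way random effects, absolute agreement) and ICC(3,1) (two-way mixed effects, consistency), with NumPy, and maps values onto the reliability bands used in this study (0.75, 0.60, 0.40). This section does not state which ICC model was used, so neither function should be taken as the model applied here. Treating the band boundaries as inclusive is an assumption, and the input values are made up.

```python
import numpy as np


def _mean_squares(data):
    """Two-way ANOVA mean squares for an (n_subjects, k_sessions) array."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)   # per-subject means
    col_means = data.mean(axis=0)   # per-session means
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = data - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return n, k, ms_rows, ms_cols, ms_err


def icc_2_1(data):
    """Absolute agreement: systematic session offsets count as disagreement."""
    n, k, ms_rows, ms_cols, ms_err = _mean_squares(data)
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def icc_3_1(data):
    """Consistency: systematic session offsets are ignored."""
    _, k, ms_rows, _, ms_err = _mean_squares(data)
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def reliability_label(icc):
    """Map an ICC onto the excellent/good/fair/poor bands (inclusive boundaries assumed)."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "poor"


# Made-up values: 4 subjects, 2 sessions; session 2 equals session 1 shifted by +2.
shifted = np.array([[1.0, 3.0], [2.0, 4.0], [3.0, 5.0], [4.0, 6.0]])
for name, fn in (("ICC(2,1)", icc_2_1), ("ICC(3,1)", icc_3_1)):
    value = fn(shifted)
    print(f"{name}: {value:.3f} ({reliability_label(value)})")
```

Because the subjects keep the same ordering and spacing across sessions, ICC(3,1) is 1.0 (excellent), whereas ICC(2,1) is 10/22, about 0.45 (fair), because it treats the constant session offset as error. The same data can therefore fall into different reliability bands depending only on the model.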
Our research demonstrates that resting-state connectivity in clinical populations can be informative and reliable. Functional connectivity metrics with higher reliability, such as coh, plv, and ppc in network-level analyses, should be preferred. MEG can be used to capture neural oscillatory networks in resting-states with good spatial precision and reliability. Both eyes open and eyes closed resting states were reliable across sessions, and both should be reported to best capture neural dynamics.

DATA AVAILABILITY STATEMENT
The original contributions generated for the study are included in the article/Supplementary Files; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by University of New Mexico Health Sciences Center Human Research Review Committee. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
FC-C and JS: design and writing. FC-C: data collection and processing, formal analysis, and funding acquisition. JS: supervision. Both authors contributed to and have approved the final manuscript.

FUNDING
This work was supported in part by grants from the National Institutes of Health (P20GM103472 and P30GM122734) and National Science Foundation (NSF) 1539067. The funding sources had no role in study design, analysis, and interpretation of the data, or the writing of this manuscript.