Brain Network Modularity Predicts Exercise-Related Executive Function Gains in Older Adults

Recent work suggests that the brain can be conceptualized as a network comprised of groups of sub-networks or modules. The extent of segregation between modules can be quantified with a modularity metric, where networks with high modularity have dense connections within modules and sparser connections between modules. Previous work has shown that higher modularity predicts greater improvements after cognitive training in patients with traumatic brain injury and in healthy older and young adults. It is not known, however, whether modularity can also predict cognitive gains after a physical exercise intervention. Here, we quantified modularity in older adults (N = 128, mean age = 64.74) who underwent one of the following interventions for 6 months (NCT01472744 on ClinicalTrials.gov): (1) aerobic exercise in the form of brisk walking (Walk), (2) aerobic exercise in the form of brisk walking plus nutritional supplement (Walk+), (3) stretching, strengthening and stability (SSS), or (4) dance instruction. After the intervention, the Walk, Walk+ and SSS groups showed gains in cardiorespiratory fitness (CRF), with larger effects in both walking groups compared to the SSS and Dance groups. The Walk, Walk+ and SSS groups also improved in executive function (EF) as measured by reasoning, working memory, and task-switching tests. In the Walk, Walk+, and SSS groups that improved in EF, higher baseline modularity was positively related to EF gains, even after controlling for age, in-scanner motion and baseline EF. No relationship between modularity and EF gains was observed in the Dance group, which did not show training-related gains in CRF or EF control. These results are consistent with previous studies demonstrating that individuals with a more modular brain network organization are more responsive to cognitive training. 
These findings suggest that the predictive power of modularity may be generalizable across interventions aimed to enhance aspects of cognition and that, especially in low-performing individuals, global network properties can capture individual differences in neuroplasticity.


INTRODUCTION
Aging is accompanied by changes in cognition and brain function, yet there is individual variability in the extent to which older adults experience such effects (Wilson et al., 2002; Raz et al., 2005; Fabiani, 2012; Burzynska et al., 2015; Salthouse, 2016). Individual differences in age-related cognitive decline, particularly in executive function processes, are related to changes in structural and functional connectivity between brain regions (Andrews-Hanna et al., 2007; Damoiseaux et al., 2008; Kennedy and Raz, 2009; Madden et al., 2009, 2012). One method to quantify these complex interactions is to conceptualize the brain as a network comprised of sub-networks, or modules (Newman and Girvan, 2004; Newman, 2006b; Chen et al., 2008; Bullmore and Sporns, 2009; Meunier et al., 2010; Betzel et al., 2014; Bertolero et al., 2015). The extent of a module's segregation from the rest of the network can be quantified with a modularity metric (Newman and Girvan, 2004), where networks with high modularity have many connections within modules and fewer connections between modules. Computational models suggest that a modular network organization allows for a system that is more adaptable to new environments (Kashtan and Alon, 2005; Clune et al., 2013; Tosh and McNally, 2015), suggesting a role for network modularity in supporting complex behaviors like executive function. Compared to young adults, older adults have less modular brain networks (Chen et al., 2011; Onoda and Yamaguchi, 2013; Betzel et al., 2014; Geerligs et al., 2015) with pronounced age-related differences in sub-networks that support "associative" processes, such as executive function (Chan et al., 2014). Taken together, these findings suggest that more modular brain networks enable complex cognitive processes and neuroplasticity and, further, may provide insight into the mechanisms underlying the effectiveness of interventions geared toward ameliorating age-related cognitive decline.
Recent work has demonstrated that individual differences in brain network modularity can predict the extent to which individuals improve after cognitive interventions aimed to improve executive function. Specifically, higher baseline modularity (i.e., measured prior to the intervention) quantified during a task-free "resting state" predicted greater improvements after cognitive training in patients with traumatic brain injury (Arnemann et al., 2015) and, more recently, in healthy older (Gallen et al., 2016) and young adults (Baniqued et al., 2015). Importantly, modularity predicted training gains even after controlling for baseline cognitive performance. These findings suggest that such individual differences in brain network organization are informative and can be used to maximize intervention effectiveness, such as by modifying training intensity or duration, especially in populations where behavioral measures may be difficult to collect (Gabrieli et al., 2015). Previous studies have examined other neural metrics in relation to learning and training responses (Basak et al., 2011; Vo et al., 2011; Mathewson et al., 2012), but have often focused on specific brain regions related to specific types of interventions. As modularity has been shown to be reliable in individuals across sessions (Stevens et al., 2012; Cao et al., 2014) and predictive of cognitive gains across a variety of populations and training protocols, modularity may be a unifying biomarker that indexes an individual's potential for adaptive reorganization with intervention.
In addition to cognitive training interventions, cost-effective and easily accessible physical activity interventions involving brisk walking have been shown to have rehabilitative and protective effects on brain function in older adults (Kramer et al., 2006; Voss et al., 2013c). Further, there are significant individual differences in responsiveness to exercise training, with factors such as initial levels of heart rate and blood pressure determining gains in cardiorespiratory fitness (Bouchard and Rankinen, 2001). Although we have previously found that individual differences in brain network modularity can predict training-related gains after cognitive training (Arnemann et al., 2015; Baniqued et al., 2015; Gallen et al., 2016), it is not yet known whether the relationship between modularity and training gains generalizes to other interventions aimed to enhance executive function in older adults. Although there are several graph theoretical metrics, we were specifically interested in whether this relationship between pre-intervention brain modularity and training gains can also be found in a different, non-cognitive training intervention, such as a physical exercise intervention.
Specifically, we hypothesize that modularity reflects an individual's readiness to engage in and benefit from training. A recent study demonstrated that individuals with higher general intelligence show smaller connectivity changes between a resting state and task states, suggesting the existence of a more "optimal" network organization that provides more efficient reconfiguration during performance of various tasks (Schultz and Cole, 2016). Similar to this idea, we hypothesize that a more optimal, more modular network configuration is better able to transition to the task states demanded by the interventions; that is, it is more adaptable. In the context of the current study, a more modular brain network may potentiate the rehabilitative and protective effects of physical exercise on the aging brain, leading to greater improvements in executive function.
Here, we examined brain network modularity in older adults who underwent a 6-month exercise training intervention. Specifically, we tested the hypothesis that higher baseline modularity predicts larger exercise-related gains in cognition. The current study employed a broad battery of cognitive tests to assess intervention-related gains in executive function, episodic memory, vocabulary and perceptual speed. Here, we focused on the relationship between baseline modularity and improvements in executive function, as these processes show pronounced age-related decline and exercise-related changes (Hillman et al., 2008; Voss et al., 2013c; Kawagoe et al., 2017).

Participants
Healthy, low-active older adults (N = 247) aged 60-80 from the Urbana-Champaign community participated in a randomized controlled exercise trial (https://clinicaltrials.gov/ct2/show/NCT01472744; see Voss et al., 2016; Burzynska et al., 2017; Ehlers et al., 2017a,b; Fanning et al., 2017, for data published from this same cohort). All participants provided informed consent and the University of Illinois Institutional Review Board approved all procedures used in the study. Selection criteria were as follows: (1) >75% right-handed on the Edinburgh Handedness Questionnaire; (2) normal or corrected-to-normal vision of at least 20/40; (3) no colorblindness; (4) no history of stroke, transient ischemic attack, or head trauma; (5) >23 score on Mini-Mental State Examination (MMSE); (6) >21 score on Telephone Interview of Cognitive Status (TICS); (7) <10 score on Geriatric Depression Scale (GDS); (8) self-report of engaging in moderate-intensity exercise for 30+ min no more than twice a week in the last 6 months; and (9) screened for safe participation in an MRI environment (e.g., no claustrophobia or metallic implants). In all analyses presented here, we further excluded participants with MMSE scores less than 27 (N = 26), as a more stringent criterion is recommended in highly educated samples such as in the current study (O'Bryant et al., 2008). Summary demographics for included participants are provided in Table 1. Additional data were excluded on a case-by-case basis during data quality procedures applied to each behavioral measure. Specifically, cognitive measures greater than 3 SD from the mean were excluded. After this step, to reduce the influence of remaining extreme values, scores greater than 3 SD from the recomputed mean were winsorized (Tukey, 1962; Wilcox, 2005) to the appropriate cut-off value (3 SD below or above the mean).
Analyses involving only fitness or behavioral scores were performed on the larger sample (N = 188), prior to exclusion due to MRI data quality, but effects were similar in the MRI sample (N = 128).
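The two-step outlier procedure described above (exclusion at 3 SD, then winsorizing against the recomputed mean) can be illustrated with a minimal sketch. The function name `clean_scores` and the use of the sample SD are assumptions made for illustration; this is not the code used in the study.

```python
import statistics

def clean_scores(scores, n_sd=3.0):
    """Two-step cleaning: exclude extreme values, then winsorize the rest.

    Step 1 drops scores more than n_sd SDs from the mean.
    Step 2 recomputes mean and SD on the retained scores and clamps any
    remaining score beyond n_sd SDs to the corresponding cut-off value.
    The sample SD (n - 1 denominator) is an assumption.
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    kept = [x for x in scores if abs(x - mean) <= n_sd * sd]

    mean2 = statistics.mean(kept)
    sd2 = statistics.stdev(kept)
    low, high = mean2 - n_sd * sd2, mean2 + n_sd * sd2
    return [min(max(x, low), high) for x in kept]

# Hypothetical scores for one test: twenty typical values and one extreme value.
raw = [8, 9, 10, 11, 12] * 4 + [100]
cleaned = clean_scores(raw)
```

In this toy example the value 100 lies more than 3 SD from the mean and is excluded in the first step; no retained value exceeds the recomputed cut-offs, so the second step leaves the remaining scores unchanged.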
For the MRI data, we excluded one participant with incomplete resting state data, one participant with structural abnormalities (see section MRI Acquisition and Processing for more details), and 39 participants who reported taking medications known to influence the central nervous system. Thirty-five participants whose resting state scans contained more than 10% of volumes with movement greater than 0.50 framewise displacement (FD) or any volume with a maximum absolute displacement of 4.0 mm were excluded. MRI data were not collected for five subjects. Demographics for this reduced sample are provided in Table 1.

Protocol Summary
All participants underwent MRI, behavioral, and fitness testing sessions before and after a 6-month physical exercise intervention. Participants were paid for the pre- and post-testing sessions at a rate of $10/h. Participants were randomly assigned to one of four intervention groups, which met for an hour three times a week. All group sessions were led by trained exercise specialists. In the walking group (Walk), participants were instructed to walk within their target heart rate (50-60% of their maximal heart rate for the first 6 weeks, 60-75% for the last 18 weeks). A second group (Walk+) was also instructed to walk within the same target heart rate and received a daily milk-based supplement formula, provided by Abbott Nutrition, that contained beta-alanine. A third group was instructed in exercises focusing on stretching, strengthening and stability (SSS). A fourth group (Dance) was instructed in social dance sequences (i.e., Contra and English country dancing) by experienced dance instructors. Since the focus of this study is on the utility of brain modularity in predicting intervention-related gains, we limit our discussion of the intervention approach and choice of training regimen (for detailed information, see Ehlers et al., 2016; Burzynska et al., 2017).

Cardiorespiratory Fitness Testing
Participants underwent cardiorespiratory fitness (CRF) testing before and after the intervention. CRF reflects the integrated ability of the cardiovascular and respiratory systems to deliver oxygen during sustained physical effort (Ross et al., 2016), and regular physical exercise increases the efficiency of these systems (Wenger and Bell, 1986). CRF testing involves gradually increasing exercise intensity to tax the aerobic system and measuring the corresponding increase in oxygen consumption. Physician's approval was solicited prior to testing. CRF, operationally defined as peak oxygen consumption (VO2 peak in mL/kg/min, relative rate in milliliters of oxygen per kilogram of body mass per minute), was measured with indirect calorimetry during a modified Balke graded maximal exercise test on a motor-driven treadmill (Balke and Ware, 1959; Froelicher et al., 1975). Participants walked on a treadmill at a constant pace while the incline was increased 2-3% every 2 min. Expired air was sampled at 30-s intervals until maximal VO2 was reached or the test was terminated due to volitional exhaustion and/or symptom limitation. Maximal VO2 was determined after two of three criteria were met: (1) a plateau in VO2 after increase in workload; (2) a respiratory exchange ratio (ratio of CO2 production and O2 consumption, reflecting limits of cardiovascular system) >1.10, and (3) a maximal heart rate within 10 bpm of their age-predicted maximum. VO2 peak was the highest VO2 recorded during the test. For the correlation analyses, we calculated a standardized CRF gain score for each individual by taking the difference between post- and pre-scores and dividing this by the standard deviation of pre-test scores (SD collapsed across groups).
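As an illustration of the two computations described in this section, the sketch below encodes the two-of-three rule for maximal VO2 and the standardized gain score. The function names, the argument layout, and the use of the sample SD are assumptions for illustration only; the age-predicted maximum heart rate is taken as an input because the text does not specify how it was derived.

```python
import statistics

def met_max_criteria(plateau, rer, peak_hr, age_predicted_max_hr):
    """Return True if at least two of the three maximal-effort criteria hold."""
    criteria = [
        bool(plateau),                              # (1) plateau in VO2 after a workload increase
        rer > 1.10,                                 # (2) respiratory exchange ratio above 1.10
        abs(age_predicted_max_hr - peak_hr) <= 10,  # (3) heart rate within 10 bpm of predicted max
    ]
    return sum(criteria) >= 2

def standardized_gains(pre, post):
    """(post - pre) / SD of pre-test scores, with the SD pooled over all participants."""
    sd_pre = statistics.stdev(pre)  # sample SD is an assumption
    return [(b - a) / sd_pre for a, b in zip(pre, post)]

# Hypothetical VO2 peak values (mL/kg/min) for three participants.
vo2_pre = [18, 20, 22]
vo2_post = [20, 20, 26]
gains = standardized_gains(vo2_pre, vo2_post)
```

Because the denominator is the pre-test SD collapsed across groups, a gain of 1.0 corresponds to an improvement of one baseline standard deviation regardless of group membership.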

Behavioral Testing
Participants underwent cognitive testing before and after the interventions. With the exception of the Switching Task and the Spatial Working Memory Task, all tests were taken from the Virginia Cognitive Aging Project (VCAP) (Salthouse and Ferrer-Caja, 2003; Salthouse, 2004, 2005, 2010). The VCAP tests were categorized into four categories: vocabulary, perceptual speed, episodic memory, and fluid reasoning. In the analyses, we grouped the Switching Task and Spatial Working Memory Task together with the fluid reasoning tasks to create an "executive function" component score, given previously demonstrated relationships between cognitive control and fluid reasoning abilities (Kane et al., 2005; Salthouse, 2005). We also performed a principal components analysis (PCA) on all the pre-test measures to confirm the VCAP construct groupings and to confirm that the Switching and Spatial Working Memory Tasks were related to performance on the fluid reasoning tests (Table 2, Supplementary Table 1). For each pre-test and post-test measure, we calculated standardized scores (z-scores) and averaged these z-scores according to the task groupings specified above, resulting in four component scores representing baseline cognitive abilities in vocabulary, perceptual speed, episodic memory and executive function (fluid reasoning plus switching and working memory).
[Table 1 note: Mean (SD) and range for age, education, MMSE and VO2 peak. *Full sample excludes participants with MMSE scores lower than 27. Two participants are missing VO2 peak data.]
For each test, we also calculated standardized gain scores by subtracting pre-test performance from post-test performance, and dividing this value by the standard deviation of raw pre-test scores (collapsed across groups). We averaged the standardized gain scores accordingly to create composite gain scores in vocabulary, perceptual speed, episodic memory, and executive function. The following sections have brief descriptions of each test and the specific measure used for analyses.
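The composite scoring described above can be sketched as follows. The test names, the toy data, and the helper names are hypothetical, and the sketch ignores missing data and any sign reversal that might be applied to measures where lower values indicate better performance, since neither is detailed in the text.

```python
import statistics

def zscores(values):
    """Standardize one test's scores across participants (sample SD assumed)."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

def standardized_gains(pre, post):
    """(post - pre) / SD of raw pre-test scores, collapsed across groups."""
    s = statistics.stdev(pre)
    return [(b - a) / s for a, b in zip(pre, post)]

def composite(per_test_values):
    """Average per participant across the tests that belong to one construct."""
    return [statistics.mean(vals) for vals in zip(*per_test_values)]

# Hypothetical raw scores for two executive function tests, three participants.
pre = {"matrix_reasoning": [8, 10, 12], "letter_sets": [9, 11, 13]}
post = {"matrix_reasoning": [10, 10, 14], "letter_sets": [11, 12, 13]}

baseline_ef = composite([zscores(pre[t]) for t in pre])
gain_ef = composite([standardized_gains(pre[t], post[t]) for t in pre])
```

Averaging standardized scores gives each test equal weight in the composite, which is the simplest reading of "averaged these z-scores according to the task groupings".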
Task-Switching (Kramer et al., 1999; Voss et al., 2010a,b, 2013b; Leckie et al., 2014) On each trial, participants were shown a number between 1 and 9 (except 5) against a colored background: (1) on a pink background, participants were instructed to determine whether the number was odd or even, (2) on a blue background, they were to determine if the number was higher or lower than 5. Participants completed a high/low practice block (40 trials), an odd/even practice block (40 trials), a single high/low task block (40 trials), a single odd/even task block (40 trials), a mixed practice block (64 trials) and a mixed task block (160 trials). We analyzed performance on the mixed task block and extracted (1) local switch cost (mixed switch reaction time [RT] minus mixed non-switch RT) and (2) task switching bin score (combination of accuracy and RT measures) (Draheim et al., 2016). The task switching bin score was used in the principal components and correlation analyses to better examine the relationship between task switching performance and performance on other tests (Draheim et al., 2016). Local RT switch cost was used in the analyses of intervention effects, consistent with previous studies (Voss et al., 2010a, 2013b). The two measures were correlated (Supplementary Table 1; baseline measures: r (211) = 0.322, p < 0.001, two-tailed; standardized gain scores: r (159) = 0.267, p < 0.001, two-tailed), and the intervention effects were similar when using bin score instead of local RT switch cost.
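For illustration, local switch cost in a mixed block can be computed as the mean RT on switch trials minus the mean RT on non-switch trials. In the sketch below, the task labels, the restriction to correct trials, and the exclusion of the first trial are assumptions; the text does not state how error trials were handled.

```python
from statistics import mean

def local_switch_cost(tasks, rts, correct):
    """Mean RT on switch trials minus mean RT on non-switch (repeat) trials.

    A trial counts as a switch when its task differs from the task on the
    preceding trial. The first trial has no predecessor and is skipped.
    Only correct trials contribute (an assumption).
    """
    switch_rts, repeat_rts = [], []
    for i in range(1, len(tasks)):
        if not correct[i]:
            continue
        if tasks[i] != tasks[i - 1]:
            switch_rts.append(rts[i])
        else:
            repeat_rts.append(rts[i])
    return mean(switch_rts) - mean(repeat_rts)

# Hypothetical mixed-block trials (RT in ms): "parity" = odd/even, "magnitude" = high/low.
tasks = ["parity", "parity", "magnitude", "magnitude", "parity", "parity", "magnitude"]
rts = [650, 700, 900, 800, 1000, 750, 2000]
correct = [True, True, True, True, True, True, False]
cost = local_switch_cost(tasks, rts, correct)
```

A larger positive value indicates a larger RT penalty for switching between the odd/even and high/low judgments.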
Spatial Working Memory  On each trial, an arrangement of two, three, or four black dots was briefly presented on the screen. After a delay, a red dot appeared and participants were instructed to determine if the red dot matched the position of one of the black dots presented earlier in that trial (match or non-match). Participants performed a practice block of 12 trials, and a task block of 120 trials (40 trials per condition). We analyzed mean accuracy during the task block for the more difficult three-dot and four-dot trial conditions.
Shipley Abstraction (Zachary, 1986) Participants were given a list of word, letter, or number sequences on a piece of paper and were instructed to write the missing item/s (word, letter or number) in each sequence. Participants were given 5 min to answer 20 items. We analyzed the total number of correctly answered items.
Matrix Reasoning (Ravens, 1962) On each trial, participants were shown a 3 × 3 grid, with each cell except for one containing an abstract pattern. Participants were instructed to select which among eight options best completes the matrix along both the rows and columns. Participants performed two practice trials and were then given 10 min to complete a maximum of 18 items. We analyzed the total number of correctly answered items.
Paper Folding (Ekstrom et al., 1976) On each trial, participants were presented with images that show a sheet of paper folded in a certain sequence and a hole punched through the folded sheet. Participants were asked to select which among five options matched the pattern of holes that would result when the paper was unfolded. They were given 10 min to complete a maximum of 12 trials. We analyzed the total number of correctly answered items.
Spatial Relations (Bennett et al., 1997) On each trial, participants were presented with a two-dimensional object pattern and instructed to identify which among four three-dimensional figures would match the two-dimensional pattern when folded. Participants were given 10 min to complete a maximum of 20 trials. We analyzed the total number of correctly answered items.
Form Boards (Ekstrom et al., 1976) On each trial, participants were presented with a specific shape and instructed to choose which pieces (five total options) will exactly fill the space inside the shape. They were given 8 min to complete a maximum of 24 trials. We analyzed the total number of correctly answered items.
Letter Sets (Ekstrom et al., 1976) On each trial, participants were presented with five sets of four-letter strings and asked to determine which set was different from the other four. Participants were given 10 min to complete a maximum of 15 trials. We analyzed the total number of correctly answered items.
Digit-Symbol Coding (Wechsler, 1997a) Participants were presented with a sheet of paper containing a series of numbers between 1 and 9 and were asked to fill in the corresponding symbol based on a digit-symbol key provided. Participants completed 7 practice items and were given 2 min to complete a maximum of 133 items. We analyzed the number of correctly answered items.
Pattern Comparison (Salthouse and Babcock, 1991) Participants were given a sheet of paper with a set of line patterns and were tasked to determine whether a pair of line patterns was the same or different. Participants completed three practice items, followed by two task sets, each set with a maximum number of 30 items to be completed within 30 s. We analyzed the number of correctly answered items, averaged across two sets of problems.
Letter Comparison (Salthouse and Babcock, 1991) Participants were given a sheet of paper with a set of non-word letter strings and were tasked to determine whether a pair of letter strings was the same or different. Participants completed three practice items, followed by two task sets, each set with a maximum number of 30 items to be completed within 30 s. We analyzed the number of correctly answered items, averaged across two sets of problems.
Logical Memory (Wechsler, 1997b) Participants listened to stories narrated by an experimenter and after each reading, were asked to recall each story in detail. We analyzed the number of correctly recalled story details, summed across three story-tellings (first story, second story, re-reading of second story).
Paired Associates (Salthouse et al., 1996) Participants listened to a list of six word pairs read aloud by an experimenter. The experimenter then read the first word of each pair and asked participants to recall the paired second word. We analyzed the number of correctly recalled items, averaged across two sets of six pairs each.
Word Recall (Wechsler, 1997b) Participants listened to a list of words and were given 90 s to recall the words in any order. Participants listened to the same list three more times and were asked to recall as many words as possible after each reading. Participants were then read a new list of words, asked to recall as many words as possible from the new list, and then asked to recall words from the old list. We analyzed the total number of correctly recalled items.
Word Vocabulary (Wechsler, 1997a) Experimenters read aloud a list of 33 words and asked participants to verbally give the meaning of each word. Responses were scored 0-2 points according to the quality of the definition (based on provided word and phrase guidelines). The test was discontinued after six consecutive scores of 0. We analyzed the total number of points.
Picture Vocabulary (Woodcock and Johnson, 1989) Experimenters presented a maximum of 30 images and participants were asked to name the objects presented. The test was discontinued after a participant failed to name six consecutive items. We analyzed the total number of correctly named items.
Synonym-Antonym (Salthouse, 1993) On each trial, participants were presented with a target word and asked to select which among five word options was most similar (synonym) or opposite (antonym) in meaning to the target word. Participants completed a synonym block followed by an antonym block, each with a maximum of 10 items to be completed within 5 min. We analyzed the total number of correctly identified words across the synonym and antonym blocks.

MRI Acquisition and Processing
Participants underwent MRI scanning on a 3 Tesla Siemens Trio Tim System with a 12-channel head coil before and after the intervention; however, only the pre-intervention scans were analyzed in this study given our hypotheses regarding correlations between baseline brain modularity and cognitive gains. The anatomical scan consisted of T1-weighted MPRAGE images acquired with the following parameters: GRAPPA acceleration factor 2, voxel size = 0.9 × 0.9 × 0.9 mm, TR = 1,900 ms, TI = 900 ms, TE = 2.32 ms, flip angle = 9°, FoV = 230 mm. To analyze network properties during a task-free "resting state," a 6-min functional scan was obtained using a T2*-weighted echoplanar imaging (EPI) pulse sequence with the following parameters: GRAPPA acceleration factor 2, 180 volumes, in-plane resolution = 3.4 mm², TR = 2,000 ms, TE = 25 ms, flip angle = 80°, 35 ascending slices of 4 mm thickness, no slice gap. Participants were instructed to lie still with their eyes closed. Brain extraction from anatomical scans was performed with Advanced Normalization Tools (ANTs; Avants et al., 2010, 2011) using the Kirby/MMRR template (Landman et al., 2011). When this skull-stripping procedure failed, brain extraction was instead performed using the IXI template (Heckemann et al., 2003; Ericsson et al., 2008). The skull-stripped anatomical images and raw functional images were preprocessed through the Configurable Pipeline for Connectomes (CPAC; Giavasis et al., 2015). Anatomical images were registered to the MNI152 template (Fonov et al., 2009) using ANTs and segmented into gray matter (probability threshold = 0.7), white matter (probability threshold = 0.98) and cerebrospinal fluid (CSF; probability threshold = 0.98) using FSL/FAST (Zhang et al., 2001). Functional images were slice-time corrected, motion-corrected (Friston et al., 1996) and co-registered to the anatomical images.
Nuisance signal removal was performed by regressing out the aforementioned motion parameters, signals from the first five components from white matter and CSF voxels (CompCor; Behzadi et al., 2007; Muschelli et al., 2014), and linear and quadratic trends. Signals were bandpass filtered at 0.009-0.08 Hz. Participants whose resting state scan contained (1) more than 10% of volumes with framewise displacement (FD) greater than 0.5 mm (N = 23) or (2) maximum absolute displacement greater than 4.0 mm were excluded from subsequent analyses (additional N = 12). One participant was excluded because structural abnormalities caused anatomical-to-MNI registration to fail (spatial warping) during preprocessing, such that we could not reliably extract ROIs.
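The motion-based exclusion rule above can be expressed as a short check. This is a sketch under the assumption that per-volume FD and absolute displacement values (in mm) are already available from the preprocessing output; the function and argument names are hypothetical.

```python
def exclude_for_motion(fd, abs_disp, fd_thresh=0.5, max_frac=0.10, max_abs=4.0):
    """Return True if a resting state scan should be excluded for motion.

    fd       : framewise displacement per volume, in mm
    abs_disp : absolute displacement per volume, in mm
    A scan is excluded when more than max_frac of volumes exceed fd_thresh,
    or when any volume exceeds max_abs absolute displacement.
    """
    frac_high = sum(1 for v in fd if v > fd_thresh) / len(fd)
    return frac_high > max_frac or max(abs_disp) > max_abs

# Hypothetical 180-volume scan with 19 high-motion volumes (just over 10%).
fd_example = [0.8] * 19 + [0.1] * 161
disp_example = [1.0] * 180
flagged = exclude_for_motion(fd_example, disp_example)
```

With 180 volumes, exactly 18 high-motion volumes (10%) would not trigger exclusion under this reading, because the criterion is "more than 10%".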

Functional Connectivity and Modularity Analyses
Functional scans were warped to the MNI template and parcellated into 264 regions of interest (Power et al., 2011). Due to uneven partial coverage of the cerebellum across subjects in the functional data, we excluded the four cerebellum module ROIs prior to analysis. Eight additional ROIs were excluded due to lack of functional coverage in at least one participant, leaving a total of 252 ROIs. For each individual, time series from all voxels within each ROI were averaged together. Average ROI time series were correlated between each pair of ROIs (Pearson's coefficient), and the resulting ROI-to-ROI correlation matrices were Fisher z-transformed. Matrices were binarized over a range of connection density thresholds (costs): 2-10% of all possible connections, in 2% increments, following Power et al. (2011) and Power and Petersen (2013). These thresholded matrices were used to create unweighted, undirected whole-brain graphs for each participant, from which network metrics were derived using the BrainX (https://github.com/nipy/brainx) and NetworkX (Hagberg et al., 2008) Python packages. Network modularity was quantified separately for each connection threshold to examine the consistency of results across thresholds. We used the middle (6%) threshold for all primary analyses and verified the effects at the other thresholds (Supplementary Material).
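The graph construction steps above (pairwise Pearson correlation, Fisher z-transform, binarization at a fixed connection density) can be sketched with NumPy. This is an illustrative reconstruction, not the BrainX implementation: the function names are hypothetical, and retaining the strongest positive correlations at each cost is an assumption about how edges were ranked.

```python
import numpy as np

def fisher_z_matrix(timeseries):
    """ROI-to-ROI Pearson correlations (rows = volumes, columns = ROIs), Fisher z-transformed."""
    r = np.corrcoef(timeseries, rowvar=False)
    np.fill_diagonal(r, 0.0)  # drop self-connections; avoids arctanh(1) = inf
    return np.arctanh(r)

def binarize_at_cost(z, cost):
    """Keep the top `cost` fraction of all possible edges as an unweighted, undirected graph."""
    n = z.shape[0]
    rows, cols = np.triu_indices(n, k=1)        # each node pair once
    weights = z[rows, cols]
    n_keep = int(round(cost * weights.size))
    strongest = np.argsort(weights)[::-1][:n_keep]
    adj = np.zeros((n, n), dtype=int)
    adj[rows[strongest], cols[strongest]] = 1
    return adj + adj.T                           # symmetric adjacency matrix

# Toy data: 180 volumes, 20 ROIs (the study used 252 ROIs).
rng = np.random.default_rng(0)
ts = rng.standard_normal((180, 20))
z = fisher_z_matrix(ts)
graphs = {cost: binarize_at_cost(z, cost) for cost in (0.02, 0.04, 0.06, 0.08, 0.10)}
```

Thresholding by density rather than by correlation value gives every participant a graph with the same number of edges at a given cost, which keeps modularity values comparable across individuals.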
For our primary analysis, we quantified modularity, a network measure that compares the number of connections within modules to the number of connections across modules (Newman and Girvan, 2004). Modularity is defined as $Q = \sum_{i=1}^{m} (e_{ii} - a_i^2)$, where $e_{ii}$ is the fraction of connections that connect two nodes within module $i$, $a_i$ is the fraction of connections connecting a node in module $i$ to any other node, and $m$ is the total number of modules in the network (Newman and Girvan, 2004). There are multiple methods for identifying network modules. Here, we used a spectral algorithm (Newman, 2006a) to identify the partition that maximizes modularity for each participant at each threshold.
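To make the modularity measure concrete, the sketch below evaluates the Newman and Girvan (2004) form, Q = sum over modules of (e_ii - a_i^2), for a given partition of a binary undirected graph. It only scores a partition that is supplied to it; it does not implement the spectral algorithm used to find the modularity-maximizing partition. The function name and toy graph are illustrative assumptions.

```python
import numpy as np

def modularity(adj, labels):
    """Newman-Girvan modularity Q for a symmetric binary adjacency matrix.

    adj    : (n, n) symmetric 0/1 matrix with a zero diagonal
    labels : length-n sequence assigning each node to a module
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    two_m = adj.sum()  # every undirected edge is counted twice
    q = 0.0
    for module in np.unique(labels):
        idx = labels == module
        e_ii = adj[np.ix_(idx, idx)].sum() / two_m  # fraction of edges inside the module
        a_i = adj[idx, :].sum() / two_m             # fraction of edge ends attached to the module
        q += e_ii - a_i ** 2
    return q

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy = np.zeros((6, 6), dtype=int)
for i, j in edges:
    toy[i, j] = toy[j, i] = 1

q_two_modules = modularity(toy, [0, 0, 0, 1, 1, 1])
q_one_module = modularity(toy, [0, 0, 0, 0, 0, 0])
```

For the toy graph, each triangle holds 3 of the 7 edges and half of all edge ends, so the two-module partition yields Q = 2 × (3/7 − 1/4) = 5/14, whereas placing all nodes in one module yields Q = 0.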
Further, to confirm that our effects were not driven by a specific partitioning algorithm, we also computed modularity using partitions identified in Power et al. (2011) using the Infomap algorithm (Rosvall and Bergstrom, 2008; Fortunato, 2010). Here, every node was assigned to one of thirteen modules (as identified in Power et al., 2011): default mode (DMN), frontoparietal (FP), cingulo-opercular (CO), salience (Sal), dorsal attention (DAN), ventral attention (VAN), auditory (Aud), visual (Vis), memory (Mem), sensory/somatomotor hand (SM-hand), sensory/somatomotor mouth (SM-mouth), subcortical (Subcort) and a module containing unassigned nodes. The modularity values derived from the Power partition were highly correlated with the modularity values obtained using the spectral clustering partition (all r > 0.761, all p < 0.001, two-tailed for all five cost thresholds).

Potential Confounds
Before examining the relationship between brain modularity and intervention-related gains, we examined relationships between potential confounding variables and our measures of interest (i.e., baseline modularity and intervention-related gains), including age, in-scanner motion (i.e., framewise displacement or FD; Power et al., 2012; Satterthwaite et al., 2012; Siegel et al., 2016), and baseline cognitive performance. All the analyses include only subjects with usable baseline MRI scans, baseline EF scores, and EF gain scores (N = 128). If a significant relationship between potential confounding variables and our dependent measures was found, we then used these variables as covariates in our primary analyses examining correlations between modularity and intervention-related gains. For all analyses, we also controlled for age and in-scanner motion (i.e., FD). For all correlation analyses, we computed bias-corrected and accelerated (BCa) confidence intervals (CI) using 5,000 bootstrapped samples.
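One common way to control for covariates in a correlation analysis is a partial correlation, computed by regressing the covariates out of both variables and correlating the residuals. The sketch below illustrates this approach under the assumption that it approximates the covariate control described above; the function names and simulated data are hypothetical, and the BCa bootstrap confidence intervals are not reproduced here.

```python
import numpy as np

def residualize(values, covariates):
    """Residuals of `values` after ordinary least squares regression on the covariates plus an intercept."""
    values = np.asarray(values, dtype=float)
    design = np.column_stack([np.ones(len(values)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ beta

def partial_corr(x, y, covariates):
    """Pearson correlation between x and y after removing the covariates from both."""
    rx = residualize(x, covariates)
    ry = residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])

# Simulated example: 128 participants, two covariates standing in for age and mean FD.
rng = np.random.default_rng(1)
n = 128
covs = rng.standard_normal((n, 2))
baseline_modularity = rng.standard_normal(n) + covs[:, 0]
ef_gain = 0.3 * baseline_modularity + 0.5 * covs[:, 1] + rng.standard_normal(n)
r_partial = partial_corr(baseline_modularity, ef_gain, covs)
```

With a single covariate, this residual-based estimate is algebraically identical to the textbook formula (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)).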
There is considerable variability in brain volume in older adults (Salat et al., 2004; Raz et al., 2005; Raz and Rodrigue, 2006). Thus, for participants with structural volume data, we also tested whether the pattern of brain-behavior relationships from the network analyses could have been confounded by gross individual differences in brain structure. We extracted measures of brain volume using Freesurfer v5.3 (Dale et al., 1999; http://surfer.nmr.mgh.harvard.edu), which performs segmentation of cortical and subcortical matter using automated and probabilistic algorithms (Fischl et al., 2002, 2004a; Desikan et al., 2006). AZB inspected the segmentation output and performed appropriate corrections. Using the anatomical scans obtained at baseline, we obtained measures of total intracranial volume, white matter, and total gray matter volume, described in more detail on the Freesurfer website (https://surfer.nmr.mgh.harvard.edu/fswiki/MorphometryStats). We included estimated intracranial volume as a covariate in volumetric analyses to control for differences in overall brain volume (Jack et al., 1989; Buckner et al., 2004). Since not all participants had high-quality structural scans for volumetric analysis (N = 15), we conducted this analysis as a follow-up to the primary analyses of modularity vs. intervention-related gains.

Exercise-Related Changes in Cardiorespiratory Fitness (CRF)
We first verified that the groups demonstrated the expected patterns of fitness improvements. At baseline, the groups did not differ in CRF, F (3, 182) = 0.199, p = 0.897, ηp² = 0.003. A mixed ANOVA with VO2 peak scores over time (pre- and post-testing) as a within-subjects factor and group as a between-subjects factor revealed a main effect of time, F (1, 182) = 21.737, p < 0.001,

Exercise-Related Changes in Cognitive Function
To determine the effects of the exercise intervention on cognitive function and to minimize measurement error and multiple comparison issues in analyzing each test separately, we analyzed cognitive effects at the construct level using composite scores. The creation of composite scores was guided by previous literature (Kane et al., 2005; Salthouse, 2005), correlations (Supplementary Table 1), and a PCA on the baseline test scores (Table 2), which confirmed the grouping of the cognitive tests into categories of vocabulary, episodic memory, perceptual speed, and executive function.

Relationship between Fitness and Cognitive Effects
Given that the groups that improved in EF were also those that showed larger CRF gains (i.e., Walk, Walk+, SSS groups), we tested whether the degree of CRF improvement was related to EF improvement. Across the whole sample with CRF data and behavioral data, there was no significant relationship between

Examination of Potential Confounds
Across the whole sample with quality MRI data, we first examined relationships between group assignment (i.e., to confirm that groups did not differ in baseline characteristics), potential confounding variables (i.e., age, years of education, mean FD) and our measures of interest (i.e., baseline modularity and EF gain). In the case of a non-significant relationship between variables when analyzing the whole MRI sample, we also verified that the relationship was not significant when analyzing each group separately, as the primary analyses of baseline modularity and EF gain were conducted within group. Age did not differ across groups (Table 1), but was significantly correlated with baseline modularity, r (126) = 0.239, 95% CI [0.102, 0.370], p = 0.007, two-tailed, and was not correlated with EF gain, r (126) = −0.008, 95% CI [−0.211, 0.197], p = 0.932, two-tailed. We verified that there was no significant relationship between age and EF gain within each group (all |r| < 0.314, all p > 0.097, two-tailed).
Lastly, given previously documented relationships between modularity and cognitive function (Kitzbichler et al., 2011; Stevens et al., 2012; Stanley et al., 2014; Sadaghiani et al., 2015), we examined whether baseline modularity was related to baseline EF. Across the whole MRI sample, there was no significant relationship between baseline modularity and baseline EF, r (126) = 0.023, 95% CI [−0.167, 0.207], p = 0.798, two-tailed, even after accounting for age and/or mean FD and examining each group separately (all |r| < 0.319, all p > 0.098, two-tailed). Thus, potential relationships between modularity and EF gains cannot be attributed to correlations between modularity and EF performance at baseline.
The above results were similar when using modularity values derived from other thresholds and when using modularity values derived from the Power partition (see Supplementary Material). Thus, given these findings that age and mean FD showed some relationship with modularity, and given that baseline EF was moderately related to EF gain, we used age, mean FD and baseline EF as covariates in the primary analyses of modularity and exercise-related gains.

Relationship between Baseline Modularity and Exercise-Related Gains
We next examined the relationship between baseline modularity and intervention-related effects on EF, having confirmed EF and CRF improvements in the Walk, Walk+ and SSS groups (Figure 3). For each group, we first performed linear regression analyses with EF gain as the dependent variable and five predictors: age, mean FD, baseline EF, baseline modularity, and an interaction term of baseline EF and baseline modularity. Importantly, the interaction term was included to test whether the relationship between baseline modularity and EF gain was moderated by baseline EF (i.e., whether the modularity-gain relationship was stronger in high or low performing individuals at baseline).
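The structure of this moderation model can be sketched as follows. This is an illustrative example only, not the analysis code used in the study: the function names, the use of the normal equations, and the mean-centering of baseline EF and modularity before forming the product term are all assumptions (centering is a common convention, but the text does not state whether it was applied).

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations and Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def center(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Hypothetical predictor names for the example.
NAMES = ["intercept", "age", "mean_fd", "baseline_ef", "modularity", "ef_x_modularity"]


def fit_moderation(age, mean_fd, baseline_ef, modularity, ef_gain):
    """Regress EF gain on covariates, both main effects, and their product term."""
    ef_c, q_c = center(baseline_ef), center(modularity)  # assumed centering
    X = [[1.0, a, f, e, q, e * q] for a, f, e, q in zip(age, mean_fd, ef_c, q_c)]
    return dict(zip(NAMES, ols(X, ef_gain)))


def simple_slope(coefs, centered_baseline_ef):
    """Slope of EF gain on modularity at a given (centered) baseline EF level."""
    return coefs["modularity"] + coefs["ef_x_modularity"] * centered_baseline_ef
```

Under this parameterization, a negative product-term coefficient means that simple_slope is larger at lower baseline EF, which is the pattern described as a stronger modularity-gain relationship in lower-performing individuals.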
In the Walk group, age, mean FD, modularity, and the interaction term of modularity and baseline EF were significant predictors of EF gain (Table 3). Critically, modularity positively predicted EF gain, while the interaction showed that individuals with lower baseline EF showed a stronger relationship between modularity and EF gain.
In the Walk+ group, age and the interaction term of modularity and baseline EF were significant predictors of EF gain, with baseline EF as a marginal predictor (Table 3). Similar to the Walk group, individuals with lower baseline EF showed a stronger relationship between modularity and EF gain.
In the SSS group, the full model was not significant [Table 3; R² = 0.094, Adjusted R² = −0.048, F (5, 32) = 0.661, p = 0.656]. Modularity was not a significant predictor, although it explained the most variance and was related to EF gain in a similar positive direction. Given that there were no significant predictors in the full model, we performed a reduced model with only baseline modularity. This model was marginally significant [R² = 0.083, Adjusted R² = 0.057, F (1, 36) = 3.246, p = 0.080], with modularity marginally related to EF gain (B = 1.218, p = 0.080, BCa 95% CI [−0.439, 2.276]).
In summary, we find that baseline modularity was related to EF gains in groups that showed training-related gains. For illustrative purposes, Figure 3 shows the relationship between baseline modularity and EF gain with and without controlling for age, mean FD and baseline EF.

Controlling for Individual Differences in Brain Volume
Age-related differences in white and gray matter volume loss may influence brain function (Persson et al., 2006; Chadick et al., 2014; Pudas et al., 2017), functional connectivity patterns (Meunier et al., 2014), and in turn, the pattern of brain-behavioral results we find here. On the sample of participants with high-quality anatomical data, we ran partial correlation analyses of baseline modularity and EF gain within each of the four groups (one-tailed tests to confirm initial results), controlling for estimated intra-cranial volume, gray matter volume, and white matter volume in addition to age, mean FD and baseline EF. Critically, the pattern of relationships remained the same, Walk: rp (

FIGURE 3 | Scatterplots show the relationship between baseline modularity (6% threshold) and executive function gain in each group, without controlling for any other factors (top) and after controlling for age, mean FD and baseline EF (bottom). Shaded areas represent 95% confidence region of the regression line.

Exploratory Analyses: Sub-network Contribution to Relationship between Baseline Modularity and Training-Related Gains
Brain modules show distinct age-related connectivity changes (Chan et al., 2014), and modularity in the association systems (DMN, FP, CO, Sal, DAN, VAN) has been found to drive the correlation between global modularity and training-related gains (Gallen et al., 2016). Given this, we examined whether specific networks contribute to the modularity vs. gain relationship. Similar to previous findings, sensory-motor modularity was higher than association cortex modularity both when analyzing the whole sample, t (127) = 24.954, p < 0.001, and each group separately (all p < 0.001). We then examined the contribution of each sub-network to the modularity vs. EF gain relationship. For these analyses, we performed partial correlation analyses with age, mean FD and baseline EF as covariates.
To reduce the number of analyses, we combined the three groups (Walk, Walk+ and SSS) given the similarity in their intervention-related effects.
We also quantified module segregation (Chan et al., 2014), defined as (Z_w − Z_b)/Z_w, where Z_w is the average Fisher-transformed correlation between nodes in the same module (within-module connectivity) and Z_b is the average Fisher-transformed correlation between nodes in a module and nodes in any other module (between-module connectivity). Importantly, this metric retains the weights of all connections (including those that fall below the strongest 2-10% of connections). Given previous findings, we focused our analyses on the association cortex modules. When controlling for age, mean FD, and baseline EF, whole-brain segregation and association module segregation were not significantly related to EF gain, although the results were in the same direction as the modularity results (Supplementary Material).
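The segregation formula above can be illustrated with a short sketch. It assumes a symmetric node-by-node correlation matrix and one module label per node; the function name and the optional focus argument (used here to restrict the calculation to a subset of modules, such as the association modules) are assumptions for the example, and any preprocessing applied in the original pipeline (for instance, the handling of negative correlations) is not reproduced.

```python
import math


def module_segregation(corr, labels, focus=None):
    """Compute (Z_w - Z_b) / Z_w from a correlation matrix and module labels.

    corr   : square list of lists of Pearson correlations between nodes
    labels : module label for each node
    focus  : optional set of module labels; if given, only node pairs that
             involve at least one node from these modules are counted
    """
    within, between = [], []
    n = len(corr)
    for i in range(n):
        for j in range(i + 1, n):  # each node pair once; diagonal skipped
            if focus is not None and labels[i] not in focus and labels[j] not in focus:
                continue
            z = math.atanh(corr[i][j])  # Fisher r-to-z transform
            if labels[i] == labels[j]:
                within.append(z)
            else:
                between.append(z)
    z_w = sum(within) / len(within)
    z_b = sum(between) / len(between)
    return (z_w - z_b) / z_w
```

Values near 1 indicate that within-module connectivity greatly exceeds between-module connectivity, while a value of 0 indicates no difference between the two.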

DISCUSSION
We examined whether baseline brain network modularity predicts cognitive improvements in older adults after an exercise intervention. We found that in the groups that showed gains in fitness and cognitive function (Walk, Walk+, and SSS), higher baseline brain modularity predicted greater gains in executive function, even after accounting for individual differences in baseline performance, age, in-scanner motion, and individual differences in brain volume. These results parallel findings in TBI patients (Arnemann et al., 2015), older adults (Gallen et al., 2016), and young adults who underwent cognitive training (Baniqued et al., 2015). That we find a similar relationship between modularity and cognitive gains after an exercise intervention in older adults suggests that the predictive power of brain modularity may be generalizable across populations and interventions aimed to enhance executive function. Moreover, these findings point to the potential of global network properties to capture individual differences in neuroplasticity.

Modularity and Exercise-Related Gains in Executive Function
Our findings demonstrating a relationship between baseline brain network modularity and EF improvements with exercise training add to a series of studies that find a similar relationship with cognitive gains from cognitive training interventions (Arnemann et al., 2015; Baniqued et al., 2015; Gallen et al., 2016). Importantly, the current study shows that the pattern of results holds after controlling for factors such as baseline cognitive performance, age, and individual differences in brain volume. The latter can present a confound, especially when analyzing measures of brain function in older adults, who show considerable variability in age-related atrophy and lesions (Hedden et al., 2012; Grady, 2013). In the current study, the modularity-gain correlations were found in two (Walk, SSS) out of the three groups that showed some improvement in CRF and EF. In the Walk and Walk+ groups, the modularity-gain relationship was moderated by baseline EF, which together with previous findings in older adults (Arnemann et al., 2015; Gallen et al., 2016) underscores the utility of the network modularity measure in lower-performing individuals. These results suggest that the two measures of baseline performance and modularity together may be a better predictor of training-related gains than either alone.
The relevance of the modularity metric in neuroplasticity, specifically, in predicting response to an intervention, can be linked back to computational models showing that modular networks more rapidly reconfigure in response to new environments (Kashtan and Alon, 2005; Clune et al., 2013; Tosh and McNally, 2015), such that reorganization is more efficiently achieved by slight modifications within and between relatively specialized modules than by a large-scale overhaul of a highly interdependent network. Moreover, individuals with disrupted modular brain organization (Fornito et al., 2015), such as those with focal lesions to brain regions important for between-module connectivity (Nomura et al., 2010; Gratton et al., 2012; Warren et al., 2014), show widespread cognitive dysfunction and thus underscore the role of a modular structure in enabling brain processes that support a wide range of behaviors. In a recent study, individuals who scored higher on general intelligence tests tended to show smaller functional connectivity changes between a "resting state" and task performance states (Schultz and Cole, 2016), suggesting that they adapt more efficiently to task demands. In this sense, the architecture of brain networks at rest guides the connectivity patterns that emerge during the performance of various tasks. Indeed, modularity measured during "resting states" has been found to predict working memory performance (Stevens et al., 2012) and stimulus detection in a perceptual task (Sadaghiani et al., 2015). Taken together, these findings suggest that an "optimally" organized network requires less reorganization to be receptive to new input encountered during learning or training, or to capitalize on intervention-related changes in brain function. In the context of the current study, a more modular brain network may potentiate the rehabilitative and protective effects of physical exercise on the aging brain.
In fitness interventions, for example, exercise-associated up-regulation in neurotrophic factors has been related to greater exercise-related changes in brain connectivity (Voss et al., 2013a). Given previous findings and the results of the current study, an optimal network for intervention-related cognitive gains is modularly organized at rest, with a balance of within-module connections that support local processing and across-module connections that support global processing (Meunier et al., 2009, 2010). Indeed, recent studies have shown that increased brain modularity post-therapy correlated with greater speech improvement in aphasic patients (Duncan and Small, 2016), and that greater structural modularity prior to carotid artery intervention predicted reduced risk of cognitive decline after carotid intervention (Soman et al., 2016).
Additionally, connectivity measures obtained during preclinical stages, when combined with biomarkers such as amyloid-beta, have been shown to predict later cognitive decline (Buckley et al., 2017), suggesting that these metrics have the potential to provide actionable information when clinical symptoms have yet to manifest.
We found that modularity predicted training gains, beyond the baseline behavioral EF measure. This is a promising finding given that behavioral or cognitive measures may be confounded in certain populations (Gabrieli et al., 2015), such as in older adults, where factors such as mobility or visual acuity interact with task performance. While typical behavioral measures may not reliably distinguish between individual differences in cognitive ability, brain network measures provide a way to gauge training responsiveness. Although this study involved a fairly large sample, functional connectivity was assessed during a relatively short resting-state scan. More reliable measures and more information regarding network structure, particularly in higher performing individuals, may be gleaned from a longer scan period (Birn et al., 2013; Laumann et al., 2015; Gordon et al., 2017). Nonetheless, the pattern of higher baseline modularity predicting intervention-related cognitive gains is now consistent across four studies, including the current one (Arnemann et al., 2015; Baniqued et al., 2015; Gallen et al., 2016). Using brain network measures in combination with behavioral, demographic, lifestyle, and other brain measures could also help customize intervention protocols to maximize effectiveness, especially in the context of dose-response relationships, for example by increasing the intensity, frequency, or duration of exercises, or including pre-intervention lifestyle or behavioral protocols geared to promote or maintain optimal levels of brain modularity. Nonetheless, future work may identify behavioral measures that are sensitive to the information captured by network measures; the relationship between baseline modularity and future behavior (i.e., training gains) suggests that modularity may be reflected in baseline behavior to some extent, a brain characteristic that the current study's behavioral measures are not designed to capture.
In addition, more work is needed to examine the mechanisms by which a modular architecture interacts with changes in neural and vascular function to enable benefits from cognitive and fitness interventions, and whether such interventions lead to changes in brain modularity. In the current study, we found a marginal correlation between EF gain and baseline association cortex modularity, which suggests that association sub-networks drive the relationship between baseline modularity and EF gain, similar to our previous study (Gallen et al., 2016). Relatedly, association sub-networks have also been shown to increase in functional connectivity after a physical exercise intervention (Voss et al., 2010b), concomitant with improvements in EF.
In our dataset, we found a positive correlation between age and baseline modularity, unlike previous studies that found lower modularity in older adults compared to young adults (Meunier et al., 2009; Betzel et al., 2014; Song et al., 2014; Geerligs et al., 2015). Importantly, our study only included older adults, whereas reductions in modularity are typically found when comparing older and young populations. In addition, some studies report no correlation between modularity and age within older adults (Geerligs et al., 2015; Gallen et al., 2016), no difference in modularity per se when comparing young and old adults (Meunier et al., 2009), or higher variability in modularity among older adults (Song et al., 2014). Moreover, our older adult sample may not be representative of the general population, as participants were relatively healthy and free of major health incidents despite being generally inactive or sedentary prior to participating in the study. Notably, however, the relationship between baseline modularity and training gain in the current study remained even after accounting for age.
Neurovascular coupling is an important issue to consider when conducting fMRI studies in older adults, where age-related vascular changes may lead to age-related BOLD differences in the absence of "true" neural differences (D'Esposito et al., 2003; Samanez-Larkin and D'Esposito, 2008). The current study, however, does not compare heterogeneous groups (e.g., young vs. old, low-fit vs. high-fit); all participants were low-fit but relatively healthy older adults, and all analyses controlled for age. Moreover, across the whole sample, baseline VO2 and baseline modularity were not significantly correlated (all |r| < 0.067, all p > 0.457, two-tailed), even after controlling for mean FD. In addition, controlling for baseline VO2 in the modularity vs. training gain analyses does not change the results. Future studies could take into account indicators of cerebrovascular health such as cerebral blood flow (Brown et al., 2010; Zimmerman et al., 2014) to determine whether and to what extent they relate to connectivity measures. In the current study, we controlled for measures such as age, medication, and structural brain measures to examine the potential effects of confounds common to studying an older population. Nonetheless, methodological considerations such as the use of population-specific brain templates may help increase the reliability of brain measures.

Fitness and Cognitive Gains after Exercise Intervention
The cognitive improvements in the current study are similar to previous studies that find the largest gains in executive function after aerobic exercise training (Colcombe and Kramer, 2003; Guiney and Machado, 2013; Voss et al., 2013c; Kelly et al., 2014). Here, we used a composite score to analyze training effects instead of assessing group by time interactions in each cognitive task, which can be problematic given the multitude of tasks, which requires multiple statistical comparisons. Nonetheless, it is possible that the cognitive effects of the current intervention are driven by specific tasks. For example, the task-switching and spatial working memory tasks in the current study are similar to previous tasks that are sensitive to fitness-related improvements (Hawkins et al., 1992; Kramer et al., 2001; Colcombe and Kramer, 2003; Erickson et al., 2011). On the other hand, improvements in reasoning tasks have been less studied in fitness interventions, although aerobic-related gains in visuospatial processes have been documented in younger populations (Stroth et al., 2009; Monti et al., 2012), and improvements in reasoning skills have been found after cognitive training interventions in older adults (Ball et al., 2002; Willis et al., 2006; Lustig et al., 2009). Moreover, compared to previous studies (Colcombe and Kramer, 2003; Voss et al., 2010b; Erickson et al., 2011), the current intervention lasted only 6 months, and it is likely that larger cognitive effects would result from a longer intervention (Colcombe and Kramer, 2003; Kelly et al., 2014). In addition, aerobic exercise has been shown to improve hippocampal function in animal and human studies (Berchtold et al., 2010; Voss et al., 2013c), increase hippocampal volume in humans and to relate to hippocampal-dependent functions such as spatial memory and relational memory (Chaddock et al., 2010).
Given these previous findings, we would have expected exercise-related effects not only in the spatial working memory task, but also in the episodic memory tasks. The null findings of the current study may in part reflect a lack of sensitivity in these relatively brief memory tasks in measuring intervention-related change, but may also stem from comparable effects across the four groups, with similar improvements from the different interventions.
In the current study, the SSS group performed exercises that involved some form of resistance training, which has also been shown to be beneficial for executive functioning in older adults when performed at a higher intensity (Liu-Ambrose et al., 2010, 2012). Although the strength portion of the SSS exercises in the current study is not comparable to the intensive strength training regimens of other studies, the similarity in exercise style may present an issue for analyzing the effects of interventions such as these, since strength training exercise and aerobic-walking exercise may benefit cognitive function in both differential and overlapping ways. Thus, "null effects" in terms of a lack of differential improvement (i.e., group by time interaction) in other cognitive domains may instead partly reflect comparable gains from the different types of interventions (in addition to gains attributable to test-retest effects) and contamination effects from self-initiated exercise (Ehlers et al., 2016). The Dance group, despite the cognitive demands thought to be involved in the learning and execution of dance steps, showed the smallest effects post-intervention; the group as a whole did not improve in CRF and showed the smallest changes in cognitive function. These findings may in part stem from the heterogeneity and lack of intensity in the Dance sessions, which varied in form (i.e., type of dance) across sessions, and may have thus failed to consistently and intensively train specific physical and cognitive skills. Indeed, Dance participants perceived their in-class sessions as less intensive (Ehlers et al., 2016). Nonetheless, the Dance intervention in the current study has been shown to improve white matter microstructure in the fornix, with baseline fornix fractional anisotropy correlating with baseline processing speed (Burzynska et al., 2017).
This paper focuses on EF and connectivity in gray matter, and it is likely that different brain measures reflect distinct aspects of cognitive function. Moreover, the six-month duration between pre- and post-testing may not adequately reflect longer-term neural and behavioral effects of each intervention. EF improvements were not directly related to CRF improvements. Combining the test scores into a composite score may have diluted any relationship between CRF gain and gains in a specific test, but no robust correlations were found when examining the relationship between CRF gain and gain on each test measure. In addition, it is possible that intervention-related gains in CRF per se do not lead to cognitive improvements, and that indirect effects of exercise on stress, sleep and overall health lead to positive cognitive outcomes (King et al., 1997; Etnier et al., 2006; Cotman et al., 2007; Bherer et al., 2013; Awick et al., 2015). Furthermore, CRF as measured using VO2 peak in the current study indexes an array of bodily functions (Dustman et al., 1984; Etnier et al., 2006; Jain et al., 2010) and may not adequately capture cerebrovascular changes.