The influence of bilingualism on gray matter volume in the course of aging: a longitudinal study

Background: Bilingualism is associated with higher gray matter volume (GMV) as a form of brain reserve in brain regions such as the inferior frontal gyrus (IFG) and the inferior parietal lobule (IPL). A recent cross-sectional study reported the age-related GMV decline in the left IFG and IPL to be steeper for bilinguals than for monolinguals. The present study aimed at supporting this finding for the first time with longitudinal data.
Methods: In the current study, 200 participants aged 19 to 79 years (87 monolinguals, 113 sequential bilinguals, mostly native German speakers with variable second language background) were included. Trajectories of GMV decline in the bilateral IFG and IPL were analyzed in mono- and bilinguals over two time points (mean time interval: 3.6 years). For four regions of interest (left/right IFG and left/right IPL), mixed Analyses of Covariance were conducted to assess (i) GMV changes over time, (ii) GMV differences for language groups (monolinguals/bilinguals), and (iii) the interaction between time point and language group. Corresponding analyses were conducted for the two factors of GMV, surface area (SA) and cortical thickness (CT).
Results: There was higher GMV in bilinguals compared to monolinguals in the IPL, but not IFG. While the left and right IFG and the right IPL displayed a similar GMV change in mono- and bilinguals, GMV decline within the left IPL was significantly steeper in bilinguals. There was greater SA in bilinguals in the bilateral IPL and a steeper CT decline in bilinguals within the left IPL.
Conclusion: The cross-sectional observations of a steeper GMV decline in bilinguals could be confirmed for the left IPL. Additionally, the higher GMV in bilinguals in the bilateral IPL may indicate that bilingualism contributes to brain reserve especially in posterior brain regions. SA appeared to contribute to bilinguals’ higher GMV in the bilateral IPL, while CT seemed to account for the steeper structural decline in bilinguals in the left IPL. The present findings demonstrate the importance of time as an additional factor when assessing the neuroprotective effects of bilingualism on structural features of the human brain.


Introduction
The process of aging is accompanied by an inter-individually variable decline in cognitive abilities (for reviews, cf. Hedden and Gabrieli, 2004; Grady, 2012; Salthouse, 2019) and brain structure (for reviews, cf. Bartrés-Faz and Arenaza-Urquijo, 2011; MacDonald and Pike, 2021). One of the protective factors potentially delaying age-related cognitive decline is bilingualism (e.g., Costumero et al., 2020; Zhang et al., 2020; Bialystok, 2021; Gallo et al., 2022). Bilingualism imposes unique challenges onto the human brain, such as a constant state of competition of the simultaneously active languages (e.g., Kroll et al., 2014; Bobb et al., 2020). Hence, bilinguals are required to engage in continuous conflict monitoring, conflict resolving, interference suppression of the non-target language and appropriate language switching (e.g., Bialystok, 1991; Green, 1998; Green and Abutalebi, 2013). The cognitive demands of bilingualism may depend on linguistic distance between languages spoken: It has been argued, for example, that typologically different languages may be more difficult to learn, while languages with similar typology might require greater inhibitory control when using one of the languages (Antoniou and Wright, 2017; for review, cf. Carthery-Goulart et al., 2023). Thus, both situations may result in considerable cognitive effort, but via contrasting mechanisms (cf. Danylkiv and Krafnick, 2020). Altogether, bilingualism may represent a form of cognitive exercise, which appears to induce a cognitive advantage also in terms of domain-general cognitive functions (for review, cf. Bialystok, 2017; Tao et al., 2021; but see also Paap and Greenberg, 2013; Antón et al., 2019).
The cognitive requirements of bilingualism have repercussions in brain structure. Bilingualism is usually associated with higher gray matter volume (GMV), higher cortical thickness (CT), and higher white matter integrity in regions related to language and domain-general control (for reviews, cf. e.g., Li et al., 2014; Hayakawa and Marian, 2019; Taylor et al., 2022). Among regions that have reliably shown higher GMV in bilinguals are the bilateral inferior frontal gyrus (IFG; Heim et al., 2019) and the bilateral inferior parietal lobule (IPL; Abutalebi et al., 2015; for reviews, cf. e.g., Li et al., 2014; Pliatsikas, 2020). Nonetheless, findings are variable between studies (cf. e.g., García-Pentón et al., 2016), and whole-brain analyses directly comparing mono- and bilinguals yield inconsistent results (meta-analysis in Danylkiv and Krafnick, 2020). This might, at least partially, result from a heterogeneity in samples and methodology (García-Pentón et al., 2016; Danylkiv and Krafnick, 2020). Additionally, the impact of bilingualism on brain structure appears to depend not only on the number of non-native languages spoken (Grogan et al., 2012), but also on experience-based factors such as age of acquisition (AoA), level of proficiency (LoP), amount of use of a second language, and frequency of language switching (cf. e.g., Li et al., 2014; DeLuca et al., 2019a, 2020). When investigating the effects of bilingualism on brain structure, one might distinguish simultaneous and sequential bilinguals from each other (Klein et al., 2014; Kaiser et al., 2015; for review, cf. e.g., Berken et al., 2017). These two groups differ in AoA, since simultaneous bilinguals learn two languages beginning from birth, while sequential bilinguals acquire a second language later in life.
With respect to the context of language acquisition, simultaneous bilingualism usually corresponds to a rather naturalistic language experience, while sequential bilinguals often learn a second language in a classroom setting (Kaiser et al., 2015). Regarding structural brain adaptations to bilingualism, smaller cortical differences have been found for simultaneous than for sequential bilinguals when compared to monolinguals (Klein et al., 2014; for review, cf. Berken et al., 2017; Pliatsikas, 2020). Differences between simultaneous and sequential bilinguals might even persist into adulthood (Kaiser et al., 2015). Thus, it seems necessary to differentiate between these two forms of bilingualism when investigating the impact of bilingualism on brain structure. Altogether, bilingualism can be seen as a complex, heterogeneous, and dynamic experience, which might explain some of the discrepancies arising in bilingualism studies (cf. e.g., DeLuca et al., 2020).
Adaptations of the brain to bilingualism are supposed to counteract the effects of aging due to both higher "cognitive reserve" and higher "brain reserve" in bilinguals (Bartrés-Faz and Arenaza-Urquijo, 2011). Cognitive reserve refers to differences in cognitive processing regarding efficiency, capacity, and flexibility of neural networks, including the ability for compensation when facing age-related structural atrophy (Stern, 2009; Stern et al., 2020). In bilinguals, an "anterior-to-posterior and subcortical shift" (BAPSS) has been described for task-induced neural activity with increasing bilingual experience (Grundy et al., 2017). As this shift is interpreted as increasing efficiency of cognitive processing in bilinguals, resulting from the cognitive requirements of bilingualism, BAPSS may represent a form of cognitive reserve in bilinguals (Grundy et al., 2017). When it comes to aging, however, a "posterior-to-anterior shift" (PASA) in neural activity has been observed (Davis et al., 2008). PASA is thought to correspond to a shift from automated to controlled processing. Therefore, it might represent a compensatory mechanism maintaining cognitive functioning despite the decline of brain structure in older adults (Davis et al., 2008). In bilinguals, who appear to use more posterior (and subcortical) brain regions for processing, as outlined in BAPSS, frontal brain regions may remain accessible for age-related compensation as described in PASA (Davis et al., 2008) to a greater extent than in monolinguals (Grundy et al., 2017), possibly reflecting another aspect of cognitive reserve in bilinguals.
Complementary to the concept of cognitive reserve, brain reserve refers to structural features such as brain volume, cell count, and number of synapses (Stern, 2009; Bartrés-Faz and Arenaza-Urquijo, 2011; Stern et al., 2020). It is assumed that individuals with higher brain reserve can tolerate more decline before reaching a certain threshold under which clinical deficits become evident (Stern, 2009; Bartrés-Faz and Arenaza-Urquijo, 2011; Stern et al., 2020). Interestingly, the structural adaptations of the brain to bilingualism appear to result in higher brain reserve, as higher gray matter volume (GMV), higher cortical thickness (CT), and higher white matter integrity found in bilinguals when compared to monolinguals (for reviews, cf. e.g., Li et al., 2014; Hayakawa and Marian, 2019; Taylor et al., 2022) can be seen as proxies of brain reserve. When combining the two concepts, higher brain reserve as well as cognitive reserve in bilinguals can explain how bilingualism may delay age-related cognitive decline not only in healthy older subjects (e.g., Bialystok et al., 2004; Gold et al., 2013; Bak et al., 2014), but also in terms of neurodegenerative diseases (i.e., bilingualism appears to delay the clinical onset of dementia by four to five years; Craik et al., 2010; Alladi et al., 2013; Perani et al., 2017; meta-analyses in Anderson et al., 2020; Paulavicius et al., 2020; see also Voits et al., 2020).
A concept closely related to brain reserve is "brain maintenance." Greater brain maintenance corresponds to reduced age-related structural decline over time, possibly modulated by lifestyle or genetic factors (Stern et al., 2020). While brain reserve corresponds to brain structure at a single time point, brain maintenance is best evaluated longitudinally (Stern et al., 2020). Recently, Costumero et al. (2020) found less GMV decline in bilinguals compared to monolinguals across a time interval of 7 months (all individuals being older adults with diagnosis of mild cognitive impairment), which could be interpreted as first evidence for brain maintenance in bilinguals. However, additional longitudinal studies investigating trajectories of structural change in the older bilingual brain are needed to further investigate the relationship between brain maintenance and bilingualism.
Interestingly, a recent cross-sectional study found evidence not only for higher GMV, but also for a steeper GMV decline with aging in the left IFG and IPL in bilinguals compared to monolinguals (Heim et al., 2019). The left IFG is described as a critical brain region for language production and comprehension, e.g., in terms of lexical retrieval, semantic and phonological fluency, and syntax processing (Heim et al., 2009; for review, cf. Friederici, 2011; Li et al., 2014). In bilinguals, the left IFG seems to be involved in response selection, e.g., in language switching tasks (for review, cf. Abutalebi and Green, 2016). The left IPL, on the other hand, is involved in phonological as well as semantic processing (for review, cf. Li et al., 2014; Binkofski et al., 2016) and has been found to play a key role in second language acquisition (Barbeau et al., 2017) and vocabulary knowledge (Lee et al., 2007). Their role in language processing might explain why these two brain regions show structural differences, such as higher GMV, in bilinguals when compared to monolinguals. However, with evidence for a steeper GMV decline in the left IFG and IPL in bilinguals, volume differences between mono- and bilinguals appear to diminish over time, with a higher persistence of a bilingual brain reserve in posterior than anterior brain regions (Heim et al., 2019). This pattern might reflect a more pronounced activation of posterior brain regions in bilinguals compared to monolinguals, as described in BAPSS (Grundy et al., 2017). Nevertheless, since cross-sectional results may differ substantially from results obtained from longitudinal data (e.g., Hedden and Gabrieli, 2004; Salthouse, 2010), the differential GMV trajectories in monolinguals and bilinguals predicted by cross-sectional studies remain to be confirmed in longitudinal analyses.
Therefore, the present study was devised as follows: (1) The previous cross-sectional study (Heim et al., 2019) was to be replicated in a large-scale population-based longitudinal design over two time points. Hence, trajectories of GMV decline over time were investigated in mono- and bilinguals in the cytoarchitectonically defined IFG (Amunts et al., 1999) and IPL (Caspers et al., 2006, 2008). We predicted higher GMV in bilinguals in the IFG and IPL in both hemispheres. Additionally, we expected a steeper GMV decline in bilinguals in the IFG (cf. Heim et al., 2019) and IPL (cf. Abutalebi et al., 2015) in the left, but not necessarily in the right hemisphere. (2) In a refined model, age, sex, education, and intracranial volume (ICV) were included as covariates. (3) To set a focus on the investigation of the older adult population, basic as well as refined analyses were conducted using a subsample, comprising only participants ≥ 55 years old.
(4) Finally, regression analyses were conducted to evaluate the influence of experience-based factors such as AoA, LoP, and number of languages actively spoken, on GMV in the IFG and IPL in bilinguals.

Participants
The current sample was derived from the longitudinal population-based 1000BRAINS study (Caspers et al., 2014). 1000BRAINS aims at investigating inter-individual variability in brain aging in healthy adults. Subjects for 1000BRAINS were drawn from the Heinz-Nixdorf Recall (HNR) study and the subsequent HNR MultiGeneration study, which have been conducted in the German Ruhr area to assess risk factors for atherosclerotic disease, myocardial infarction and cardiac death (Schmermund et al., 2002; Erbel et al., 2012). With 1000BRAINS being a population-based study, exclusion was based solely on contraindications to magnetic resonance imaging, i.e., coronary artery stents, cardiac pacemakers, surgical implants or prostheses in the trunk or head, claustrophobia, a history of neurosurgery, the presence of tattoos or permanent make-up on the head, and dental implants and bridges (the latter being a relative contraindication; see Caspers et al., 2014). Written informed consent was obtained from all subjects prior to participation in 1000BRAINS. The study was performed in accordance with the Declaration of Helsinki. All methods were approved by the local Ethics Committee of the University of Essen, Germany.
From the initial 1000BRAINS cohort (n = 1,314), 466 subjects took part in a second examination. From this sample, 269 individuals completed the Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007) and had structural MRI data sets from two time points and were thus eligible for the current longitudinal study. Left-handed individuals (n = 8, Laterality Quotient < −60 as assessed with the Edinburgh Handedness Inventory; Oldfield, 1971) or individuals who did not provide any data regarding their handedness (n = 1) were excluded. Further exclusion criteria based on LEAP-Q data were simultaneous bilingualism (n = 6; see discussion above for putative structural differences between simultaneous and sequential bilinguals; for reviews, cf. Berken et al., 2017; Pliatsikas, 2020) and developmental first language deficiencies in any modality (speaking, comprehending, reading, writing) (n = 32). Moreover, eight subjects had to be excluded due to methodological problems within the preprocessing of structural brain images. Further exclusion of 14 participants due to outlier correction (GMV, CT and/or white matter surface area (SA) exceeding three standard deviations from the mean) resulted in the final sample of 200 participants (87 monolinguals and 113 bilinguals; Table 1).
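The exclusion steps above can be checked with simple arithmetic, and the three-standard-deviation outlier rule can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function names are hypothetical, and the sketch assumes that mean and standard deviation are computed once over all values of a measure, which the text does not specify.

```python
from statistics import mean, stdev

# Exclusion steps and counts as reported in the text above.
EXCLUSIONS = [
    ("left-handed", 8),
    ("handedness data missing", 1),
    ("simultaneous bilingualism", 6),
    ("developmental first-language deficiency", 32),
    ("preprocessing problems", 8),
    ("outlier correction", 14),
]


def remaining_after_exclusions(eligible, exclusions):
    """Subtract each exclusion count in turn and return the final n."""
    n = eligible
    for _reason, count in exclusions:
        n -= count
    return n


def flag_outliers(values, n_sd=3.0):
    """Return indices of values more than n_sd sample SDs from the mean.

    Assumption: mean and SD are computed once over all values; the
    source does not say whether the rule was applied per region,
    per group, or iteratively.
    """
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > n_sd * s]


final_n = remaining_after_exclusions(269, EXCLUSIONS)
print(final_n)  # 200, matching the reported final sample
```

Starting from the 269 eligible individuals, the reported counts add up to the final sample of 200.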

Assessment of bilingualism
Participants' second language status was determined using LEAP-Q data (Marian et al., 2007) during the first examination. The LEAP-Q is a questionnaire to set up language profiles in bilinguals and multilinguals regarding age of acquisition, proficiency of all modalities, manner of acquisition and immersion in a bilingual environment. For the present study, individuals who indicated to currently speak, understand, read and/or write in more than one language were classified as bilinguals. Consequently, participants with no or lost second language abilities were classified as monolinguals. While monolinguals within the current sample spoke German only, language backgrounds for bilinguals (mostly native German speakers) are reported in Table 2.
Of the bilinguals, 20.4% reported a very good and 46.0% a good level of proficiency for speaking, understanding, reading and/or writing in their second language, while 26.5% reported an adequate and 7.1% a low level of proficiency as maximum. Second languages were rated according to self-reported proficiency in the respective language. The mean age of acquisition of the second language that was associated with the highest proficiency was 13.2 years (± 7.2 years).

Image processing
Magnetic resonance imaging sequences were processed using the automated surface-based longitudinal pipeline implemented in FreeSurfer 6.0 (for a detailed description, see Reuter et al., 2012), which consists of three major steps: First, structural images from both time points were processed individually, corresponding to the processing of cross-sectional data (cf. Dale et al., 1999; Fischl et al., 1999a). Second, a within-subject template was built across the resulting data from the two time points (Reuter et al., 2012). Third, information from both the cross-sectional and the longitudinal preprocessing was used to generate surface maps for GMV, CT, and SA (Reuter et al., 2012). The present study targeted two language-relevant regions of interest (ROIs): the left IFG (Amunts et al., 1999) and the left IPL (Caspers et al., 2006, 2008; Figure 1A). The right IFG and right IPL were analyzed as control regions (Figure 1B). Predefined ROI masks derived from the probabilistic cytoarchitectonic Jülich-Brain atlas (Amunts et al., 2020) were mapped onto the reconstructed surface maps. The masks were inspected by a neuroanatomy specialist (S.C.) when overlaid on FreeSurfer's fsaverage template (Fischl et al., 1999b) and manually corrected when necessary. For both ROIs and control regions, GMV as well as CT and SA were extracted from the longitudinally processed data.
Table 2 (note). Distribution of bilinguals who reported language abilities for the respective language in percent. "Other" includes: Ancient Greek, Finnish, Portuguese, Serbian, Swedish, Ukrainian. Dialects were not considered within the present study, since only one participant reported language abilities for a dialect, and this participant had to be excluded from the sample due to simultaneous bilingualism.
Figure 1. The inferior frontal gyrus (IFG) and the inferior parietal lobule (IPL) in the left (A) and right (B) hemisphere based on the cytoarchitectonic probabilistic Jülich-Brain atlas (Amunts et al., 2020).
Corresponding to the cross-sectional analyses by Heim et al. (2019), the current study focused on the analysis of GMV. GMV, as the product of CT and SA, can be seen as a multi-determined parameter that may provide insights into structural variability of the brain that might not be captured by sole analyses of CT or SA (Nicolaisen-Sobesky et al., 2022). However, since CT and SA are thought to be genetically and phenotypically independent from each other (Panizzon et al., 2009; Winkler et al., 2010), CT and SA analyses are reported as well, providing a first step to disentangling the picture of age-related structural change in the bilingual brain. To evaluate whether bilinguals show a steeper GMV decline when compared to monolinguals, mixed Analyses of Covariance (ANCOVAs) were conducted separately for each of the four ROIs (left/right IFG and left/right IPL). As the aim of the present study was to replicate the previous cross-sectional analysis (Heim et al., 2019) in a longitudinal design, the basic ANCOVA model was designed as similar to the cross-sectional study as possible. Therefore, GMV values from both time points were treated as dependent variables, while language group (monolinguals/bilinguals) and age group (younger/older participants, sample split at the age median of 62.8 years; for demographic information, see Supplementary Table 1) were included as between-subject factors (to further relate the findings from the present longitudinal study to the former cross-sectional analysis, see Supplementary material: Supplementary methods and Supplementary Table 2). Additionally, in the present study, time point (t1/t2) was included as within-subject factor, while time interval between the two measurements was treated as covariate of no interest.
The resulting 2 × 2 × 2 ANCOVA allowed the assessment of (i) GMV changes over time between t1 and t2, (ii) differences in GMV for language groups and age groups, and (iii) the interaction between time point and language group.
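With only two time points, the time point × language group interaction in effect asks whether the mean change from t1 to t2 differs between the two groups. The sketch below illustrates that idea descriptively on made-up numbers. It is not the ANCOVA itself: the covariate (time interval) and the age-group factor are left out, and all names and values are hypothetical.

```python
from statistics import mean


def mean_change_by_group(records):
    """Mean GMV change (t2 - t1) per language group.

    records: iterable of (group, gmv_t1, gmv_t2) tuples.
    """
    changes = {}
    for group, gmv_t1, gmv_t2 in records:
        changes.setdefault(group, []).append(gmv_t2 - gmv_t1)
    return {group: mean(vals) for group, vals in changes.items()}


def interaction_contrast(records, group_a="bilingual", group_b="monolingual"):
    """Difference between the two groups' mean changes.

    A negative value means group_a declined more steeply than group_b.
    """
    by_group = mean_change_by_group(records)
    return by_group[group_a] - by_group[group_b]


# Hypothetical toy values in arbitrary units, not data from the study.
toy = [
    ("monolingual", 100, 98),
    ("monolingual", 110, 109),
    ("bilingual", 120, 116),
    ("bilingual", 130, 124),
]
print(mean_change_by_group(toy))
print(interaction_contrast(toy))
```

In a full analysis this contrast would be tested against its sampling variability while adjusting for the covariates named in the text.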

Statistical analysis
To assess putative interhemispheric differences regarding GMV trajectories over time in bilinguals compared to monolinguals in the IFG and IPL, hemisphere (left/right) was included in an additional ANCOVA model as within-subject factor. This resulted in a 2 × 2 × 2 × 2 ANCOVA with time interval as covariate. All analyses were additionally performed for CT and SA.
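Because the text treats GMV as the product of CT and SA, a change in GMV can be split into a thickness part and an area part on the log scale, which is one way to see why separate CT and SA analyses are informative. The following is a minimal sketch under that product assumption; the function name and the input values are hypothetical.

```python
import math


def log_change_decomposition(ct_t1, sa_t1, ct_t2, sa_t2):
    """Split the log ratio of volume into thickness and area parts.

    Assumes volume = thickness * area, the simplification used in the
    text; under it, log(V2/V1) = log(CT2/CT1) + log(SA2/SA1) exactly.
    """
    d_ct = math.log(ct_t2 / ct_t1)
    d_sa = math.log(sa_t2 / sa_t1)
    d_gmv = math.log((ct_t2 * sa_t2) / (ct_t1 * sa_t1))
    return {"ct": d_ct, "sa": d_sa, "gmv": d_gmv}


# Hypothetical values: thickness in mm, surface area in mm^2.
parts = log_change_decomposition(2.50, 4000.0, 2.45, 3990.0)
print(parts)
```

In this toy case most of the volume loss comes from thinning rather than from loss of area, which is the kind of pattern the CT and SA analyses are meant to detect.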

Refined ANCOVA models
Following the mere replication of the cross-sectional study (Heim et al., 2019) in a design over two time points, the ANCOVA models were refined in a next step. To control for the effects of potential confounds on GMV, age, sex, education (as assessed with the International Standard Classification of Education; UNESCO Institute for Statistics [UIS], 2012), and intracranial volume (ICV) were included into the basic ANCOVA model as covariates of no interest. Due to the inclusion of age as a covariate, age group was excluded as between-subject factor, resulting in a 2 × 2 ANCOVA with GMV values from both time points as dependent variables, language group as between-subject factor, time point as within-subject factor, and age, sex, education, ICV, and time interval as covariates. In a next step, hemisphere was added to the model as within-subject factor. Analogous analyses were performed for CT and SA.
One may discuss whether ICV can be seen as a meaningful covariate for CT and SA analyses (for an investigation of the relationship between ICV and GMV, CT, and SA, see Im et al., 2008). Thus, additional CT and SA analyses only including age, sex, education, and time interval as covariates were conducted. For the sake of comparison, corresponding GMV analyses were performed as well.

Regression analyses
To evaluate the influence of experience-based factors such as AoA, LoP, and number of actively spoken languages on GMV in the bilateral IFG and IPL in bilinguals, regression analyses were conducted. Since the experience-based factors were, by definition, available for bilinguals only, monolinguals were excluded from the models. For each of the four ROIs, three separate analyses were performed with (1) GMV at t1, (2) GMV at t2, and (3) GMV differences between t1 and t2 as dependent variable, respectively. Variables of interest (AoA, LoP, and number of actively spoken languages) and nuisance variables (age, sex (males set to 0, females to 1), education, ICV) were treated as predictors in all models. For the analyses of GMV differences between t1 and t2, time interval was added as additional nuisance variable.
Corresponding analyses were performed for CT and SA. Again, additional analyses not including ICV as a nuisance variable were conducted as well.
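The regression specification described above can be summarized as a small helper that lists the predictors for each outcome and builds one design-matrix row per participant. This is only a sketch of the model layout; the field names (`aoa`, `lop`, `n_languages`, `icv`, `time_interval`) and the string labels for sex are assumptions made for illustration.

```python
INTEREST = ["aoa", "lop", "n_languages"]
NUISANCE = ["age", "sex", "education", "icv"]
OUTCOMES = ("t1", "t2", "diff")


def predictor_names(outcome, include_icv=True):
    """Predictors for one model; 'diff' models also get the time interval."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    nuisance = [n for n in NUISANCE if include_icv or n != "icv"]
    names = INTEREST + nuisance
    if outcome == "diff":
        names.append("time_interval")
    return names


def encode_sex(label):
    """Dummy-code sex as in the text: males 0, females 1."""
    return {"male": 0, "female": 1}[label]


def design_row(participant, outcome, include_icv=True):
    """One design-matrix row: intercept followed by the predictors."""
    row = [1.0]
    for name in predictor_names(outcome, include_icv):
        value = participant[name]
        row.append(float(encode_sex(value)) if name == "sex" else float(value))
    return row


# Hypothetical participant record, not data from the study.
example = {"aoa": 13, "lop": 3, "n_languages": 2, "age": 60,
           "sex": "female", "education": 6, "icv": 1500000,
           "time_interval": 3.6}
print(design_row(example, "diff"))
```

With four ROIs and three outcomes, this layout yields twelve models per structural measure, each fitted with and without ICV.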

Subsample analyses: participants ≥ 55 years old
Within the present study, participants' age ranged from 18.5 to 79.4 years at time point t1. To set a focus on the investigation of the inter-individual variability within an older adult population with a more homogeneous distribution of mono- and bilinguals, mixed ANCOVAs and regression analyses were not only conducted with the total sample of 200 participants, but also with a subsample comprising only subjects ≥ 55 years of age at t1 (83 monolinguals, 71 bilinguals, Table 1).
For the subsample, basic and refined ANCOVAs as well as regression analyses were performed corresponding to GMV analyses of the total sample. The only difference was the exclusion of age group as between-subject factor in the basic ANCOVA model, since only older adults were investigated here. The same analyses were conducted for CT and SA.
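The two age-based selections used in the analyses, the median split of the total sample and the restriction to participants ≥ 55 years, can be sketched as follows. The record layout and function names are hypothetical, and the handling of participants exactly at the median is an assumption, since the text does not state it.

```python
def age_group(age_t1, median_age=62.8):
    """Median split used in the basic model of the total sample.

    Assumption: participants exactly at the median count as 'older';
    the source does not state how ties were handled.
    """
    return "older" if age_t1 >= median_age else "younger"


def older_subsample(participants, min_age=55.0):
    """Keep only participants aged min_age or above at t1."""
    return [p for p in participants if p["age_t1"] >= min_age]


# Hypothetical records, not data from the study.
people = [
    {"id": 1, "age_t1": 18.5},
    {"id": 2, "age_t1": 55.0},
    {"id": 3, "age_t1": 62.8},
    {"id": 4, "age_t1": 79.4},
]
print([p["id"] for p in older_subsample(people)])
print([age_group(p["age_t1"]) for p in people])
```

Note that a participant can belong to the ≥ 55 subsample while still falling into the "younger" half of the median split.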

ANCOVA models
In the next section, results for basic and refined ANCOVA models (the latter including ICV as a covariate) are presented (see also Tables 3-8). Results for refined models excluding ICV as a covariate show a similar pattern to analyses including ICV and are reported in Supplementary Tables 3-5.
The GMV change over time in the participants of the present study is depicted in Figure 2 for the analyzed regions. Regarding the interaction between language group and time point, GMV decline within the left IPL was significantly steeper in bilinguals when compared to monolinguals [F(1,195) = 4.211, p = 0.042] (Figure 3). In contrast, for the left and right IFG and the right IPL, bilinguals and monolinguals displayed a similar GMV change over time [IFG left: F(1,195) …].
When including age, sex, education, and ICV as covariates, results for the bilateral IFG and the right IPL (see Table 4) corresponded to the ones emerging from the basic ANCOVA model. For the left IPL, no significant GMV difference between mono- and bilinguals could be found in these analyses [F(1,195) = 2.376, p = 0.125]. However, bilinguals still displayed a tendency for a steeper GMV decline within the left IPL when compared to monolinguals [F(1,195) …].
Including hemisphere in the refined ANCOVA model yielded similar GMV values for the two hemispheres for both the IPL and the IFG. Corresponding to basic ANCOVA analyses, no significant interaction effect for time point × language group × hemisphere was found, neither for the IPL nor for the IFG (see Table 7).

Subsample analyses
Assessing participants ≥ 55 years only yielded results similar to those obtained from the total sample for the bilateral IFG as well as the right IPL (see Tables 5, 6). For the left IPL, a higher GMV in bilinguals compared to monolinguals within the subsample was found for the basic ANCOVA model [F(1,195) = 8.703, p = 0.004], which remained a tendency when including age, sex, education, and ICV as covariates [F(1,195) …].
When including hemisphere as within-subject factor, results from both the basic and the refined ANCOVA model corresponded to the respective analyses of the total sample (see Table 8). For an overview of mean values and standard deviations for GMV at t1 and absolute GMV differences between t1 and t2 for the subsample, see Supplementary Tables 8, 9.

Analyses for CT and SA
Results for the parallel analyses for CT and SA are presented in Tables 3-8. … (Tables 3-6).
For CT, similar values were found for mono- and bilinguals in all of the analyses (see Tables 3-6). Regarding trajectories over time, there was a significantly steeper CT decline in the left IPL in bilinguals compared to monolinguals [total sample: …]. Additionally, in the refined total sample analyses including hemisphere as within-subject factor, a significant interaction effect between hemisphere, language group, and time point emerged for CT in the IPL [F(1,195) …].

Analyses for GMV
When investigating the influence of AoA, LoP, and number of actively spoken languages on GMV in the bilateral IFG and IPL in bilinguals of the total sample, no significant effect of these predictors emerged for GMV at t1 or t2 or for the GMV difference in any of the four ROIs (see Supplementary Tables 10, 11). The same results were found for bilinguals of the older subsample (see Supplementary Tables 16, 17).

Analyses for CT
For CT in the left IFG, there was a tendency of later AoA predicting lower CT at t1 in bilinguals of the total sample (unstandardized coefficient B = −0.003; standard error = 0.002; p = 0.054). Additionally, higher LoP in this group predicted less CT decline between t1 and t2 for the left IFG (unstandardized coefficient B = 0.002; standard error = 0.001; p = 0.034) and there was a tendency for the very same effect within the left IPL (unstandardized coefficient B = 0.001; standard error = 0.001; p = 0.055). However, none of these effects showed significance when investigating bilinguals of the older subsample (see Supplementary Tables 18, 19).
For CT at t1 and t2 and CT differences in the right IFG, no effect of AoA, LoP and number of actively spoken languages was found, neither in bilinguals of the total sample (see Supplementary Table 12), nor of the subsample (see Supplementary Table 18). For the right IPL, a higher number of actively spoken languages showed a tendency of predicting higher CT at t2 within bilinguals of the subsample (unstandardized coefficient B = 0.035; standard error = 0.018; p = 0.059), while no effect of bilingual experience-based factors on CT at one of the two time points nor on CT differences was found within bilinguals of the total sample (see Supplementary Table 13).
For SA at t1 and t2 and SA differences in the right IFG as well as in the bilateral IPL, no effect of AoA, LoP and number of actively spoken languages was revealed, neither in bilinguals of the total sample (see Supplementary Tables 14, 15), nor of the subsample (see Supplementary Tables 20, 21).

Discussion
The present large-scale population-based study over two time points provides novel insights into the effects of long-term bilingualism on cortical brain structure. Five major results emerged: (1) For basic analyses of the total sample, there was a steeper GMV decline over time in bilinguals as compared to monolinguals in the left IPL, confirming the earlier cross-sectional observations (Heim et al., 2019) for the first time over two time points. However, this effect showed only a tendency toward significance when including age, sex, education, and ICV as covariates, and analyses of the older subsample yielded no significantly differing decline in mono- vs. bilinguals within the left IPL at all. (2) In both hemispheres, bilinguals showed a higher GMV in the IPL, but not the IFG, for basic analyses of the total sample, indicating that bilingualism might contribute to brain reserve especially in posterior brain regions. For refined analyses as well as subsample analyses, this effect was more stable for the right IPL.
(3) With a steeper GMV decline in bilinguals as found in basic total sample analyses, GMV differences between monolinguals and bilinguals appear to diminish over time in the left IPL. In contrast, monolinguals and bilinguals showed a similar GMV change with aging in the right IPL, indicating that the bilingual brain reserve might be more persistent in the right IPL. (4) Analyses of CT and SA as the two factors of GMV revealed that, while SA appears to be the factor explaining the overall higher GMV in bilinguals in the bilateral IPL, CT explains more of the age-related changes in GMV than SA. For CT, there was also a steeper decline over time in bilinguals in the left IPL, corresponding to GMV trajectories in basic analyses of the total sample. In contrast to GMV, the differing CT trajectories for mono- vs. bilinguals within the left IPL were also stable for refined total sample analyses including age, sex, education, and ICV as covariates, as well as for refined subsample analyses. (5) For the left IFG, higher LoP was associated with less CT decline over time within bilinguals. Additionally, later AoA predicted higher SA in this brain region at both t1 and t2, and was associated with less SA decline over time within bilinguals. Thus, even though monolinguals and bilinguals had shown similar GMV, CT, and SA within the left IFG when compared directly, bilingual experience-based factors such as AoA and LoP appear to modulate brain structure as well as trajectories of structural change over time within the left IFG.

The longitudinal effects of bilingualism on brain structure
Regarding GMV trajectories in mono- and bilinguals in the IPL, the present longitudinal results partially underpin those from cross-sectional studies. For the left IPL, a steeper GMV decline in bilinguals was found within the total sample for the basic ANCOVA, corresponding to results from Heim et al. (2019). This effect seemed to be especially attributable to aging-related CT changes. For the right IPL, however, trajectories of structural decline were similar for mono- and bilinguals in all of the analyses, which is in contrast to Abutalebi et al. (2015), who provided evidence for a steeper GMV decline in monolinguals in this region. While the left IPL is relevant for language processing, the right IPL has been associated with visuo-spatial attention reorientation as an aspect of executive functions (Numssen et al., 2021; for review, cf. Binkofski et al., 2016). Thus, the hypothesis proposed by Heim et al. (2019), that structural decline over time seems to be steeper in bilinguals in language-related areas, while bilingual brain reserve in the non-linguistic domain appears to be more persistent, is supported by the present data from two time points.
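The comparison of GMV trajectories between language groups can be illustrated with a small sketch. It is not the analysis pipeline of the present study: it ignores all covariates, uses synthetic values in arbitrary units, and only borrows the group sizes (87 and 113) for illustration. The function names `change_scores` and `interaction_t` are hypothetical. The sketch relies on a general property of designs with exactly two time points: without covariates, the time point × group interaction of a mixed ANOVA is equivalent to a pooled-variance t-test on the per-participant change scores (F = t²).

```python
import random
import statistics
from math import sqrt


def change_scores(t1, t2):
    """Per-participant change (t2 - t1); negative values indicate decline."""
    return [b - a for a, b in zip(t1, t2)]


def interaction_t(change_a, change_b):
    """Pooled-variance t statistic comparing mean change between two groups.

    With two time points and no covariates, the square of this value equals
    the F value of the time point x group interaction in a mixed ANOVA.
    """
    n_a, n_b = len(change_a), len(change_b)
    var_a = statistics.variance(change_a)
    var_b = statistics.variance(change_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (statistics.mean(change_a) - statistics.mean(change_b)) / se


# Synthetic, purely illustrative data (arbitrary units, NOT study data).
rng = random.Random(0)
mono_t1 = [rng.gauss(100, 5) for _ in range(87)]
bili_t1 = [rng.gauss(103, 5) for _ in range(113)]
# Assumed for illustration: a steeper average decline in the bilingual group.
mono_t2 = [v + rng.gauss(-1.0, 1.0) for v in mono_t1]
bili_t2 = [v + rng.gauss(-2.0, 1.0) for v in bili_t1]

mono_change = change_scores(mono_t1, mono_t2)
bili_change = change_scores(bili_t1, bili_t2)
t_value = interaction_t(bili_change, mono_change)

print(f"mean change, monolinguals: {statistics.mean(mono_change):.2f}")
print(f"mean change, bilinguals:   {statistics.mean(bili_change):.2f}")
print(f"t (bilinguals vs. monolinguals): {t_value:.2f}")
```

A negative t value in this sketch means that the simulated bilingual group loses more volume between t1 and t2 than the simulated monolingual group. Once covariates such as age, sex, education, and ICV are added, the equivalence to a simple t-test no longer holds, and a full ANCOVA model is needed.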
To provide explanations for the observed inter-hemispheric differences in GMV trajectories in mono- and bilinguals in the IPL in basic total sample analyses, two hypotheses will be discussed: First, a steeper GMV decline in bilinguals in the left IPL may represent increasing efficiency in bilinguals in language-related brain regions. A model that may provide a helpful framework for this hypothesis is the so-called "dynamic restructuring model" (DRM; Pliatsikas, 2020), which will be presented briefly. Second, a steeper decline in bilinguals may be expected in both the left and the right IPL. However, due to age-related compensation strategies, such as increasing use of the right IPL with aging in bilinguals, structural decline in the right IPL may be attenuated in bilinguals.
According to the DRM (Pliatsikas, 2020), different cortical and subcortical adaptations in gray and white matter pertain to three different phases: (1) initial exposure, (2) consolidation, and (3) peak efficiency. The initial exposure to a new language results in an increase in cortical GMV. During consolidation of the new skill, cortical GMV decreases again while subcortical and cerebellar GMV as well as white matter structural connectivity increase. The third stage, peak efficiency, implies further adaptations of cerebellar and subcortical GMV and white matter structural connectivity as a result of increasing proficiency and immersion in the additional language (DeLuca et al., 2019b; Pliatsikas, 2020).
The first hypothesis is predicated on the first two stages of structural plasticity in the bilingual brain, as proposed by the DRM: initial exposure and consolidation (Pliatsikas, 2020). As it is assumed that initial exposure to a new language induces GMV increases in cortical brain regions relevant for language and executive control, this may reflect a local generation of dendritic spines and new neural pathways during learning (Lövdén et al., 2013), possibly facilitated by an upregulation of neurotrophic factors such as noradrenaline (Robertson, 2013; Guzman-Velez and Tranel, 2015). Correspondingly, in the current study, bilinguals showed a higher GMV in the bilateral IPL compared to monolinguals (with this effect being more stable for the right IPL in the refined total sample analysis).
With increasing proficiency and bilingual experience, a partial or complete return to baseline GMV takes place during the second phase, consolidation (Pliatsikas, 2020). This is supposed to reflect a selection of the most efficient circuits, with non-efficient and therefore under-utilized spines being eliminated via pruning (Lövdén et al., 2013; Wenger et al., 2017). Thus, cortical GMV may decrease without loss of the novel skill (Wenger et al., 2017). In the present study, the steeper decline in bilinguals in the language-relevant left IPL might consequently reflect an ongoing selection of the most efficient neural circuits due to continuous bilingual experience. Nevertheless, one has to bear in mind that previous expansion-renormalization trajectories have been observed mostly in training studies over a course of weeks or months (Wenger et al., 2017; Pliatsikas, 2020), while the present data evaluate changes across a mean time interval ± SD of 3.6 ± 0.8 years after lifelong bilingual experience. Therefore, one might question whether the current findings could indeed correspond to continuously increasing efficiency in bilinguals or whether they might, on the contrary, reflect an accelerated structural decline in aging bilinguals. However, if the steeper decline in bilinguals in the left IPL actually reflected a continuous selection of the most efficient neural circuits, corresponding to the stage of consolidation, then the DRM would predict parallel increases in cerebellar and subcortical GMV as well as greater structural connectivity in terms of white matter tracts in bilinguals (Pliatsikas, 2020). This might indicate a shift in weight from lexical acquisition (subserved by cortical regions) to language control (provided by subcortical and cerebellar structures) with increasing bilingual experience, the latter facilitated by efficient long-distance connectivity (Pliatsikas, 2020), which may be investigated in future studies.
Figure 4 integrates the novel insights from the present analyses into the DRM.
The second hypothesis is based on the idea that bilingualism might modulate GMV trajectories not only in the left, but also in the right IPL. For the right IPL, which is relevant for executive functions such as visuo-spatial attention reorientation (Numssen et al., 2021; for review, cf. Binkofski et al., 2016), there was no steeper GMV decline in bilinguals as compared to monolinguals in the present study. Corresponding to the interpretation of GMV trajectories in the left IPL, a steeper GMV decline in bilinguals could have reflected increasing efficiency, while similar trajectories in mono- and bilinguals might indicate that bilingualism does not result in a continuous increase of efficiency in the domain of executive control. On the other hand, it is possible that not only language processing in the left IPL, but also the executive functions provided by the right IPL would become increasingly efficient in bilinguals over time. The increasing efficiency should be reflected in a steeper GMV decline in bilinguals due to pruning (Lövdén et al., 2013) and in increasing structural connectivity in bilinguals as proposed by the DRM (Pliatsikas, 2020). Simultaneously, however, GMV may increase in bilinguals due to an additional activation of the right IPL, for example in the context of aging-related compensation, to the extent that monolinguals and bilinguals show a similar GMV decline over time in the right IPL.
Figure. GMV change over time in mono- and bilinguals in the left IPL. For both language groups, mean GMV and standard error of the mean are depicted for t1 and t2 as well as V, i.e., the mean absolute GMV difference between t1 and t2. The GMV decline over time is significantly steeper in bilinguals. GMV, gray matter volume; IPL, inferior parietal lobule; t1, first time point; t2, second time point.
The interaction of three pre-existing models that describe shifts in task-induced neural activity in the course of bilingualism and aging may explain why monolinguals and bilinguals show a similar GMV decline over time in the right IPL (for a depiction of the models, see Figure 5). The models will be briefly presented in the following: (1) In bilinguals, a "bilingual anterior-to-posterior and subcortical shift" (BAPSS) can be observed with increasing second language experience (Grundy et al., 2017). This is interpreted as a shift from controlled (frontal) to automatic (posterior and subcortical) processing in bilinguals. As increasing activation of a certain brain region may result in an increase of GMV in the very same region (Lövdén et al., 2013), the higher GMV in bilinguals in the IPL found in the present study may reflect the more pronounced activation of posterior brain regions during cognitive processing in bilinguals. (2) With aging, however, a "posterior-to-anterior shift" (PASA) in neural activity has been described (Davis et al., 2008). This shift from automated to controlled processing has typically been interpreted as a compensatory mechanism to maintain cognitive functioning (Davis et al., 2008). In bilinguals, who seem to rely more on posterior (and subcortical) regions for processing, as described in BAPSS, anterior brain regions may remain available for age-related compensation as outlined in PASA (Davis et al., 2008) to a higher extent than in monolinguals (Grundy et al., 2017). Therefore, the bilingual brain reserve in the IPL, reflected by higher GMV, seems to be particularly beneficial (Heim et al., 2019), as it might support the capacity for compensation in frontal brain regions in bilinguals up to an older age than in monolinguals, thus possibly delaying age-related cognitive decline. (3) Additionally, the "hemispheric asymmetry reduction in older adults" (HAROLD) model states that in older adults, corresponding regions in both hemispheres are recruited for previously lateralized processes (Cabeza, 2002). This is suggested to reflect a compensatory process to maintain cognitive functioning despite age-related structural atrophy. In terms of language, which is usually processed in the left hemisphere, the HAROLD model would predict an increased recruitment of the right hemisphere with aging. An interaction of BAPSS and HAROLD may result in an attenuation of structural decline in bilinguals, but not monolinguals, in the right IPL: An increasing use of the right IPL with aging in bilinguals might compensate for GMV loss in the left IPL, as described in HAROLD (Cabeza, 2002). In turn, monolinguals, who seem to rely less on posterior brain regions than bilinguals (Grundy et al., 2017), might rather recruit frontal brain regions for compensation as described in PASA (Davis et al., 2008) than the right homologue of a posterior brain region. This could explain why bilinguals do not show a steeper decline than monolinguals in the right IPL, in contrast to GMV trajectories in the left IPL. Notably, while higher GMV in the IPL in bilinguals has been interpreted as a form of brain reserve, the above-described hypothesis might reflect a mechanism of brain maintenance in bilinguals, possibly resulting in a reduced structural decline over time in bilinguals in the right IPL.
Figure 4. Two models of GMV change over the course of time in mono- and bilinguals. While GMV change in monolinguals is sketched as continuous GMV decline, three stages of dynamic structural change are depicted for bilinguals, as suggested in the "dynamic restructuring model" (DRM; Pliatsikas, 2020): Learning an additional language results in increasing GMV, followed by GMV decrease during the phase of consolidation. With aging, the decline in left-hemispheric language areas is steeper in bilinguals than in monolinguals (A). Hence, the volume differences between monolinguals and bilinguals disappear over time, possibly reflecting an ongoing selection of most efficient neural circuits in bilinguals with continuous bilingual experience. Thus, bilingualism might result in constantly increasing efficiency with regards to language processing. In right-hemispheric regions related to domain-general control, the decrease in bilinguals attenuates until it matches monolingual decline (B). Thus, bilingualism may provide a persistent brain reserve in the non-linguistic domain. Figures adapted from Heim et al. (2019).
Figure 5. A model of shifts in task-induced neural activity in mono- and bilinguals in the course of aging (dashed lines) and bilingualism (solid lines). With aging, a "posterior-to-anterior-shift" (PASA; Davis et al., 2008) and a "hemispheric asymmetry reduction" (HAROLD; Cabeza, 2002) can be observed (in terms of language, which is usually processed in the left hemisphere, HAROLD predicts an increased recruitment of the right hemisphere with aging, as depicted here). In monolinguals, this leads to increasing prefrontal activation with aging (A). In bilinguals, however, an "anterior-to-posterior and subcortical shift" (BAPSS; Grundy et al., 2017) occurs with increasing bilingual experience, which may counteract age-related changes predicted by PASA (B). Therefore, BAPSS may represent a form of cognitive reserve in bilinguals. Additionally, the interaction of BAPSS and HAROLD may lead to increasing use of the right IPL with aging in bilinguals, but not monolinguals, possibly resulting in an attenuation of structural decline in bilinguals in the right IPL.
The two hypotheses proposed to explain the inter-hemispheric differences in structural decline in the bilingual brain are not mutually exclusive. Instead, they touch upon complementary aspects of the same topic, since the first hypothesis focuses on the steeper decline in the left IPL in bilinguals and the second on the (presumably) attenuated decline in the right IPL. Future research is warranted to examine whether these hypotheses prove true. Additionally, one has to bear in mind that GMV trajectories in the left and right IPL in mono- and bilinguals did not differ significantly from each other when compared directly.
However, as the interaction between hemisphere, language group, and time point was significant for CT for the refined total sample analyses in this brain region, the results from CT analyses underline the idea of inter-hemispheric differences regarding structural change in the bilingual brain during aging. Furthermore, there was only a tendency toward a steeper GMV decline in bilinguals within the left IPL when age, sex, education, and ICV were included in the ANCOVA model as covariates, and analyses of the subsample including only participants ≥ 55 years revealed no significantly differing decline in mono- vs. bilinguals at all. For CT, however, the decline remained significantly steeper in bilinguals not only when including the additional covariates in the total sample analyses, but also for the refined analyses of participants ≥ 55 years. Thus, SA appears to be the factor contributing to higher GMV in bilinguals in cortical brain regions, consistent with previous studies (Li et al., 2017), but CT might be the one mediating the effects of aging on GMV.
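The distinction between a group difference in level and a group difference in change can be made explicit with a first-order approximation. The following is only a sketch under an idealizing assumption, namely that regional GMV is approximately the product of surface area and mean cortical thickness; the symbols V (volume), A (surface area), and T (thickness) are introduced here for illustration and do not appear in the original analyses.

```latex
V \approx A \cdot T
\quad\Longrightarrow\quad
\frac{\Delta V}{V} \approx \frac{\Delta A}{A} + \frac{\Delta T}{T}
```

Under this approximation, a larger A in one group raises V at both time points, whereas a more negative relative change in T translates directly into a steeper relative decline in V, even if A changes similarly in both groups.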
In contrast to cross-sectional results, which indicated higher GMV in bilinguals in the left and right IFG and a steeper GMV decline in bilinguals in the left IFG (Heim et al., 2019), group differences between mono- and bilinguals within the bilateral IFG are absent from the present data. On the one hand, the previous cross-sectional study might have overestimated the effect of bilingualism on GMV in the IFG. On the other hand, the smaller number of participants in the present study compared to the cross-sectional sample of Heim et al. (2019), resulting in reduced statistical power, might explain the discrepancy. However, regression analyses revealed that AoA and LoP, factors modulating the bilingual experience, seem to have an impact on structural parameters and their trajectories over time within the left IFG in bilinguals of the present sample. Previous studies also showed an effect of these experience-based factors on bilingual brain structure in the left IFG, albeit partially contradictory to the present results: for AoA, for example, Klein et al. (2014) found that later AoA was associated with higher CT in the left and less CT in the right IFG, while there was a tendency of later AoA predicting less CT within the left IFG for the current study. Additionally, for bilinguals of the present sample, later AoA was associated with higher SA in this brain region at both t1 and t2, and predicted less SA decline over time. For LoP, the present study revealed less CT decline over time with higher LoP. Thus, viewing bilingualism as a continuous spectrum of experiences modulated by factors such as AoA and LoP may reveal effects that do not become evident when performing only dichotomous group comparisons between mono- and bilinguals (see also DeLuca et al., 2019a, 2020). For the bilateral IPL, AoA, LoP and number of actively spoken languages showed no significant effects on brain structure within bilinguals of the present sample.
Previously, earlier AoA and higher LoP have been associated with higher gray matter density in the left (and for AoA, also in the right) IPL (Mechelli et al., 2004). However, when investigating GMV in the bilateral IPL in older bilinguals (mean age ± SD: 63.2 ± 5.86), Abutalebi et al. (2015) observed no effect of AoA on GMV, corresponding to the current findings. Thus, it is possible that the importance of AoA for the modulation of adaptations to bilingualism may diminish in the case of lifelong second language use (Abutalebi et al., 2015).
Altogether, a similar pattern emerges from cross-sectional and longitudinal observations when taking results for both IFG and IPL into account: Bilingualism appears to add brain reserve, expressed by higher GMV, especially to posterior brain regions (Heim et al., 2019). Additionally, there seems to be a steeper structural decline in bilinguals in the left, but not in the right hemisphere (Abutalebi et al., 2015; Heim et al., 2019). The latter finding, possibly reflecting a reduced structural decline in bilinguals in the right IPL, could correspond to a form of brain maintenance in bilinguals. Yet, further longitudinal studies are necessary to explore the impact of bilingualism on structural change in the human brain over time in greater detail.

Limitations and future directions
Some limitations of the current study should be mentioned. First, because the present sample included mainly native German speakers, the LEAP-Q was chosen to evaluate participants' second language status, as there is a German version available. The use of an alternative instrument would most likely have made it necessary not only to translate, but also to validate said instrument. However, the LEAP-Q as a self-assessment questionnaire does not include objective evaluation of second language abilities, which may be considered a disadvantage. Nonetheless, the LEAP-Q has shown external validity when compared to objective measures of language proficiency (Marian et al., 2007) and may therefore be regarded as a reliable tool for the evaluation of language abilities in the context of the present study.
Second, the present results are based on structural MRI data from two time points only. Thus, further studies encompassing more than two time points are necessary. Additionally, future longitudinal studies could take more than one parameter into account (for example, analyses of GMV could be combined with analyses of functional and structural connectivity or with neurocognitive data), thus eventually providing an integrated view on longitudinal changes in the brains of long-term bilinguals across multiple modalities.
Third, bilinguals were mostly native German speakers with variable second language background, and while Germanic, Italic and Balto-Slavic languages could be included in the present study (see Table 2), it is unclear whether the current results would be generalizable for any combination of languages. In a previous cross-sectional study, Abutalebi et al. (2015) investigated GMV in the bilateral IPL in 30 monolinguals compared to 16 Cantonese-English and 14 Cantonese-Mandarin bilinguals. They found higher GMV in bilinguals, corresponding to the present results, but evidence for a steeper GMV decline in monolinguals in the right IPL, in contrast to the present finding of a steeper GMV decline over time in bilinguals in the left IPL. When investigating the influence of linguistic distance on GMV, there was a trend toward significance for the association between second language naming performance and GMV in the left IPL only for Cantonese-Mandarin bilinguals, which was interpreted as possible evidence for increased control demands in bilinguals who speak typologically close languages (Abutalebi et al., 2015). However, for the right IPL, a significant correlation between language exposure and GMV was revealed for both Cantonese-English and Cantonese-Mandarin bilinguals (Abutalebi et al., 2015). Thus, one might assume some consistencies in structural brain adaptations to bilingualism (cf. Danylkiv and Krafnick, 2020), but further large-scale longitudinal studies are necessary to test whether the present finding of a putatively steeper structural decline in bilinguals in the left IPL is generalizable across varying language combinations.

Conclusion
To the best of our knowledge, the present study is the first to investigate age-related GMV changes in bilinguals as compared to monolinguals in a longitudinal approach within a large sample. Importantly, the cross-sectional observations of a steeper GMV decline over time in bilinguals when compared to monolinguals were confirmed over two time points for the left IPL. Additionally, as there was a higher GMV in bilinguals in the IPL, but not the IFG, our results indicate that bilingualism might contribute to brain reserve especially in posterior brain regions. With the steeper GMV decline in bilinguals, which appears to be mediated by CT rather than SA, the volume differences between monolinguals and bilinguals might diminish over time in the left IPL. However, there appears to be a higher persistence of brain reserve in bilinguals in the right IPL. Furthermore, experience-based factors such as AoA and LoP appear to modulate brain structure as well as trajectories of structural change over time in bilinguals within the left IFG. Altogether, the importance of longitudinal studies when investigating the effects of bilingualism on structural features of the human brain becomes evident.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors upon request, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the local Ethics Committee of the University of Essen, Germany. The participants provided their written informed consent to participate in this study.