Mapping of the Language Network With Deep Learning

Background: Pre-surgical functional localization of eloquent cortex with task-based functional MRI (T-fMRI) is part of the current standard of care prior to resection of brain tumors. Resting state fMRI (RS-fMRI) is an alternative method currently under investigation. Here, we compare group level language localization using T-fMRI vs. RS-fMRI analyzed with 3D deep convolutional neural networks (3DCNN). Methods: We analyzed data obtained in 35 patients with brain tumors that had both language T-fMRI and RS-MRI scans during pre-surgical evaluation. The T-fMRI data were analyzed using conventional techniques. The language associated resting state network was mapped using a 3DCNN previously trained with data acquired in >2,700 normal subjects. Group level results obtained by both methods were evaluated using receiver operator characteristic analysis of probability maps of language associated regions, taking as ground truth meta-analytic maps of language T-fMRI responses generated on the Neurosynth platform. Results: Both fMRI methods localized major components of the language system (areas of Broca and Wernicke). Word-stem completion T-fMRI strongly activated Broca's area but also several task-general areas not specific to language. RS-fMRI provided a more specific representation of the language system. Conclusion: 3DCNN was able to accurately localize the language network. Additionally, 3DCNN performance was remarkably tolerant of a limited quantity of RS-fMRI data.


INTRODUCTION
In treating brain tumors, the neurosurgeon must balance the benefit of maximal tumor resection against the risk of a functional impairment consequent to more aggressive approaches. These two factors, maximal resection and functional preservation, are often cited in the surgical literature as predictors of long term survival (1)(2)(3). Thus, preoperative and intraoperative functional localization is critical to optimizing these often conflicting priorities. Functional MRI (fMRI) has been used as an adjunct measure for preoperative mapping of eloquent cortex and intraoperative navigation (4). The gold standard, however, for defining eloquent cortex is intraoperative mapping using electrical cortical stimulation mapping (5). Since electrocortical stimulation carries clinical risk (6), preoperative mapping to optimize the intraoperative surgical approach is an effective means of preserving function. fMRI detects changes in the blood oxygen level dependent (BOLD) signal that reflect the neurovascular response to neural activity. In conventional fMRI, function is localized by presenting stimuli or imposing tasks (such as finger tapping or object naming) (7). More recently, resting state fMRI (RS-fMRI), i.e., fMRI obtained in the absence of stimuli or tasks, has been used to map the brain's functional organization (8). RS-fMRI data is simpler to acquire and does not require patient cooperation (important in children and neurologically impaired patients). Thus, RS-fMRI has opened new opportunities for clinical applications, such as pre-surgical planning (9)(10)(11).
Multiple techniques have been used to map the representation of function using RS-fMRI data. These techniques include independent component analysis (12), seed-based correlation (13), and machine learning methods such as the multi-layer perceptron (MLP) (14). With typical fMRI acquisition times, these methods are signal to noise limited and have limited sensitivity and specificity. More recently, advances in deep convolutional neural networks (CNN) have revolutionized image classification and segmentation problems (15,16). It is thus logical to apply CNN methodology to the problem of mapping resting state networks.
In this study, we apply a CNN to RS-fMRI data for the localization of the language system in a cohort of patients with brain tumors and compare these results to the localization estimated in the same patients with task-based fMRI (T-fMRI). We evaluated T-fMRI vs. RS-fMRI as methods for presurgical mapping of the representation of language. We used aggregated language T-fMRI data in the Neurosynth platform (www.neurosynth.org) (17) to define language localization ground truth. Our hypothesis is that the current study will provide preliminary data for the utility of CNN methods for the localization of the language system. We further wish to characterize differences in language localization between the CNN and T-fMRI methods.

Patients
Patients were retrospectively recruited from the Neurosurgery brain tumor service at the Washington University School of Medicine in Saint Louis (WUSM), initially as part of an NIHfunded tumor data base grant (CONDR NIH 5R01NS066905). This patient cohort was used in a prior study targeting noninvasive localization of sensorimotor cortex (18). The following inclusion criteria were used: new diagnosis of primary brain tumor; age above 18 years; clinical need for an MRI scan including fMRI for presurgical planning as determined by the treating neurosurgeon. Additionally, we required that the patients have both a language task (word-stem completion) T-fMRI and RS-fMRI. Exclusion criteria included: prior surgery for brain tumor, inability to have an MRI scan, or a patient referred from an outside institute with an MRI scan not performed at WUSM. Our cohort include N = 35 patients (male/female 23/12) with a mean age of 44.8 years (23-71 years range). The mean preoperative enhancing tumor volume was 43.8 mL (range: 1.4-207 mL); 28 patients had a left-hemisphere tumor; pathology was most often oligoastrocytoma (11 cases) and glioblastoma (10 cases). Handedness was recorded in 26 patients. To decrease any uncertainty in regard to laterality we also included the laterality index (LI) for all subjects based on (19). Since the 3 left handed patients had LI > 0, and two of the three patients with LI < 0 were right handed (the handedness on the third was not available) we opted to average language activation in all subjects as a single group. Patient demographics are summarized in Table 1. All aspects of the study were approved by the WUSM Institutional Review Board. Clinical data were acquired during preoperative evaluation and reviewed retrospectively.

MRI Acquisition
Patients were scanned with either a 3T Trio or Skyra scanner (Siemens, Erlangen, Germany) using a standard clinical presurgical tumor protocol. Anatomical imaging included T1-weighted (T1w) magnetization prepared rapid acquisition gradient echo (MP-RAGE), T2-weighted (T2w) fast spin echo, fluid-attenuated inversion recover (FLAIR), susceptibilityweighted imaging (SWI), and pre/post-contrast T1w fast spin echo in three projections. Additional sequences for presurgical mapping included Diffusion Tensor imaging (DTI) for track tracing, T-fMRI for motor and language localization, and RS-fMRI.
Both the task and resting state fMRI were acquired using echo planar imaging (EPI) (voxel size 3 × 3 × 3 mm; TE = 27 ms; TR = 2.2 s; field of view = 256 mm; flip angle = 90 • ). The language T-fMRI employed a block design in which patients covertly generated words in response to a visually presented first letter. Five task/rest blocks (10 frames each) were acquired over a total of 90-100 frames (3:40 min total per T-fMRI run). For most subjects, two language task sessions were acquired, and the run with the lowest root-meansquare head motion measure was used in the present analysis. RS-fMRI was acquired in two 160-frame runs (total of 320 frames = 11:44 min).

Pre-processing
The fMRI data were preprocessed using previously described techniques using locally written software (4dfp.readthedocs.io) (14). Preprocessing was identical for RS-fMRI and for T-fMRI and included compensation for slice dependent time shifts, elimination of systemic odd-even slice intensity differences due to interleaved acquisition, and rigid body correction for head movement within and across runs. Atlas transformation was achieved by composition of affine transforms connecting the fMRI volumes with the T2-weighted and MPRAGE structural images, resulting in a volumetric time series in (3 mm cubic) atlas space. Additional preprocessing included: spatial smoothing (6 mm full width half maximum Gaussian blur in each direction), voxelwise removal of linear trends over each run, and temporal low pass filtering retaining frequencies <0.1 Hz. Spurious variance was reduced by regression of nuisance waveforms derived from head motion correction and extraction of the time series from regions of white matter and CSF. The whole brain ("global") signal was included as a nuisance regressor (20). Frame censoring was performed to minimize the impact of head motion on the correlation results (21). Thus, frames (volumes) in which the root mean square (evaluated over the whole brain) change in voxel intensity relative to the previous frame exceeded 0.5% (relative to the whole brain mean) were excluded from the functional connectivity computations (22). All fMRI data acquired in each patient were pooled during preprocessing. Thus, the T-fMRI and RS-fMRI data were mutually co-registered in each patient. Additionally, all T-fMRI and RS-fMRI data in all patients were resampled in a standard atlas space. No attempt was made to correct for the mass effect of tumors. To match acquisition durations of RS-fMRI and T-fMRI (11:44 vs. 3:40 min), we selected 100 contiguous frames from pre-processed RS-fMRI data for comparisons with T-fMRI. Additionally, the full quantity of RS-fMRI data was compared to T-fMRI. T-fMRI responses were evaluated using standard general linear model methods using in house software. Activation maps were generated from the T-fMRI as described in Corbetta Table 2). Statistical analysis of network FC (evaluated within and across the default mode network, the dorsal attention network, vision network, and deep gray structures) between the different data sets revealed no significant group effect attributable to study. Each subject had ∼14 min of resting state fMRI data (TR = 3,000 ms, 3 mm cubic) which was processed using standard methods developed at WUSM (14). Resting state networks (RSNs) were identified using a set of 169 region of interests (ROI) divided into 11 RSNs (14). Multiple (n = 268,000) example sets were generated from the data and then divided into a training (N = 18,7600) and validation (N = 80,400) sets. A 3D convolutional neural network (3DCNN) with a densely connected architecture (15) was trained to classify brain regions as belonging to a priori assigned RSNs. The 3DCNN consisted of 49 layers and 3 dense blocks that performed 3 and 5 cubic convolutions. Batch normalization was used within the network to prevent overfitting and improve performance, and average pooling was used for dimensionality reduction. Training was terminated if the accuracy did not improve after 3 validations. The 3DCNN was implemented in Matlab R2019b (www.mathworks.com).

Testing
For each of 35 tumor patients the T-fMRI results were compared to the 3DCNN results obtained with matched data samples, i.e., 100 contiguous frames of RS-fMRI. Additionally, the 3DCNN analysis was run using all available data (320 RS-fMRI frames per subject). 3DCNN maps representing the probability of language Refer to text for acronyms and citations.
representation were smoothed with stride-1 mode filtering and length-3 box filtering.

A Priori Defined Language Regions of Interest (ROI)
Language representation in the brain resides primarily on two areas of the left cerebral cortex: Broca's area, located in inferior frontal cortex (roughly, Brodmann areas 44 and 45) and frontal operculum (28), is required for fluid performance of phonemic or semantic tasks (29). Wernicke's area extends over portions of temporal and parietal cortex and is essential for understanding written or spoken language (30). The Broca-Wernicke model embodies core expressive and receptive language functions but omits auxiliary functions such as reading (31)(32)(33). We defined the ground truth for language representation using T-fMRI responses aggregated by Neurosynth (17). This representation was confined to the left hemisphere to simplify comparison between the Neurosynth regions and those derived in our patients. To define T-fMRI-based language ROIs, we queried Neurosynth using "language comprehension" as a search term, which identified 107 studies (as of November, 2018) contributing coordinates in Talairach atlas space, each coordinate associated with a Z-score corresponding to the null hypothesis of equally likely activation anywhere in the brain. The returned association map (threshold at Z > 3.7 by Neurosynth) was passed through smoothing and clustering operations (see below), ultimately yielding Broca-and Wernicke-like ROIs in volumetric atlas space (Figure 1A).
The following steps were taken to obtain Broca-and Wernicke-like ROIs starting with a "language comprehension" map (units = Z-score) generated by Neurosynth in 2 cubic mm MNI152 atlas space:

Statistical Analysis
RS-fMRI and T-fMRI produce native measurements with distinct statistical properties. Acceptable conventions for image processing, denoising, and significance testing have evolved distinctly for these functional imaging methods. Additionally, highly non-linear deep learning architectures such as 3DCNNs have unknown statistical properties when applied to functional imaging data. To enable meaningful statistical comparisons in our data, we used probabilistic strategies to enhance the detection of the population-invariant language network and reduce the influence of experimental conventions, differential preprocessing and the biological variability arising from the use of normal training data but pathophysiologic testing data. Specifically, we reused common image processing pipelines wherever possible in our analyses. We reduced all methodspecific metrics to normalized probability maps. For purposes of group-level inferences, for each patient, all temporal imaging information was contracted into T-fMRI activations or RS-fMRI membership in RSNs. As described in the preprocessing methods, all T-fMRI and RS-fMRI data were co-registered to a standardized atlas. Consequently, spatially distributed measures of task activation or resting state network membership retained co-registration in atlas space. We used arithmetic averages of patient data prior to computing comparative analyses at the group-level. Finally, we used method-dependent thresholds for

RESULTS
Mean Language Maps Figure 1 shows group level localizations of the language network. Figures 1A,B are computed from 3DCNN analysis of RS-fMRI, using the full amount of available resting state data (1A), and one third of the available data (1B), comparable to the amount of data available in the T-fMRI. Figure 1C is a group level language map computed from the T-fMRI response to wordstem completion. Figure 1D is derived from the Neurosynth platform (17) using the search term "language comprehension." Both T-fMRI and RS-fMRI clearly identify Broca and Wernicke regions. The 3DCNN method provides highly specific maps with large probability gradients at the margin of the language regions, as would be expected of a method trained on thousands of exemplars including millions of internal parameters. The T-fMRI experiment focused on expressive language and therefore emphasizes Broca's region. The 3DCNN map reflects the properties of spontaneous activity which characteristically is more symmetric than task responses. Robust delineation of both Broca's and Wernicke's area is not surprising as the 3DCNN was trained to recover the topography of T-fMRI responses in RS-fMRI data (14). The crucial difference between the two methods is that the word stem completion task activates areas not specifically associated with language (pink and red arrows in Figures 1, 2) in addition to areas that are specifically associated with language (pink arrows in Figures 1, 2). Task-general responses occur in the cingulo-opercular network (35), the dorsal attention system (23), and fronto-parietal control network (36). Additional discussion of these task-general systems is given below.

Correspondence of Identified Language Maps With Neurosynth Reference
We assessed the topography of T-fMRI and RS-fMRI maps in relation to language ROIs defined on the basis of aggregated fMRI responses to language tasks ( Figure 1D). ROC curves for Broca's area are presented in Figure 3A, and for Wernicke's area in Figure 3B.
In As in the Broca's case, the AUC between the 3DCNN full length data and that for the shortened 100 frame data had overlapping 95% confidence intervals.

Case Examples
This section demonstrates two case examples of the ability of the 3DCNN method to provide data acquired in individual patients and a comparison between the 3DCNN method and the T-fMRI at the individual level. Individual cases took approximately 4 h of computation time on a Dell (Austin, Texas) Power Edge 18 core with Nvidia (Santa Clara, California) v100 GPU.

Case 1
Images from a 44 year old right handed male (RS003) with glioblastoma multiform in the left basal ganglia region are presented in Figure 4. The top two rows display the anatomy with a post contrast T1-weighted and FLAIR images. The bottom two rows display the language localization information from the 3DCNN and T-fMRI overlying the anatomical images. We provide the T-fMRI at several thresholds in accordance with clinical practice. Although the T-fMRI appears noisier (bottom row) than the 3DCNN (third row), the information provided by both methods is similar with significant overlap of the localized language area with the tumor location.

Case 2
Images from a 24 years old right handed male (RS004) with anaplastic glioma in the subcortical left frontal lobe are presented in Figure 5. As in Case 1, The top two rows display the anatomy with a post contrast T1-weighted and FLAIR images. The bottom two rows display the language localization information from the 3DCNN and T-fMRI overlying the anatomical images. In this case, the sagittal views of the two methods are very similar, although the 3DCNN (third row) demonstrates more overlap of the language localization with the tumor as compared to the T-fMRI (bottom row).  Figure 1C). Pink arrows point to components of the cingulo-opercular network (CON). Red arrows point to components of the dorsal attention network (DAN) and fronto-parietal control network (FPC). See text for discussion of these task-general functional systems.

DISCUSSION
The current work demonstrates that the representation of language in the brain can be identified using 3DCNN analysis of RS-fMRI data (Figure 1, white arrows). It should be noted that the 3DCNN was trained to identify language-associated parts of the brain using T-fMRI acquired at Washington University School of Medicine [same training set used by (14)]. The Neurosynth-derived map was obtained from an independent meta-analysis of 107 reported neuroimaging studies. Nevertheless, the 3DCNN and Neurosynth maps are strikingly similar.
The differences between T-fMRI and 3DCNN maps are instructive. The word-stem completion task activated the dorsal anterior cingulate (dACC) (a.k.a the rostral cingulate zone) as well as the right anterior insula (Figures 1, 2, pink arrows). These regions are components of the salience network (37), also known as core task-control regions (35,38). The core task control system is recruited by a wide variety of goal-directed behaviors (38)(39)(40). Functions attributed to the dACC include task control (41,42), error monitoring (43), and conflict detection (44). Additional T-fMRI responses not specific to the LAN occurred in the left superior parietal lobule and the left middle frontal gyrus. These regions are components of the dorsal attention network (DAN) and the fronto-parietal control network (FPC) (Figures 1, 2, red arrows). The DAN responds to any task requiring directed spatial attention (23,(45)(46)(47). The FPC supports goal-directed analysis of environmental stimuli (48)(49)(50). These functional systems are recruited by the word stem completion task as it requires directing attention to and analyzing stimuli presented on an electronic display.
The present results raise the possibility of distinguishing between parts of the brain that are language specific (28) vs. task-general (38)(39)(40). This distinction may be of value in selected neurosurgical cases. Although the DAN and dACC are not conventionally classified as "eloquent" (51), injury to these areas can lead to attentional deficits (52) and to loss of motivated behaviors (53), respectively.
An additional important observation evident in Figure 1 is that the localization of the language network using the 3DCNN appears remarkably tolerant to limited quantities of RS-fRMI data (Figures 1A,B). This characteristic could lead to reduced RS-fMRI acquisition times. Figure 3 compares T-fMRI vs. 3DCNN as regards localization of Broca and Wernicke areas as defined a priori, according to a large collection of T-fMRI studies aggregated by Neurosynth (17). According to the AUC measure, 3DCNN had a small but significant advantage over T-fMRI in localizing Broca's area ( Figure 3A). The difference was much larger in Wernicke's area ( Figure 3B). This result is understandable as the word stem completion task is an expressive language task that preferentially activates Broca's area. Performing several different T-fMRI studies better characterizes the language system (54)(55)(56). Figure 3 also demonstrates 3DCNN tolerance to a limited quantity of data: there is no significant difference in AUC corresponding to 100 vs. 320 frames (red and blue curves).
The case examples demonstrate 3DCNN functional mapping in individual patients with brain tumors (Figures 4, 5). The higher specificity and sharper margins of the 3DCNN method in comparison to T-fRMI is promising. A prospective comparison of the 3DCNN RS-fRMI method vs. T-fMRI remains to be done. Limitations of our study include first, that it is formulated in terms of group-level analyses; hence, our results do not directly speak to the question of which technique provides the best functional localization of language in individuals. Our focus has been on regions instantiating expressive and receptive language functions (Broca and Wernicke). Further study will be needed to evaluate the utility of classifier-based analysis or RS-fMRI for localizing functions such as reading, articulation, and prosody. Additionally, we make no claims for the 3DCNN in determining language lateralization. As our data are retrospective, we have no means of assessing the potential effects of neuro-vascular uncoupling (57). Similarly, we have not considered how anatomical distortion owing to tumor mass may have affected our results, although the average tumor size was small (∼44 mL). This work omits comparison with intraoperative brain mapping; that comparison is reported in prior related work (58). The relatively poor performance of T-fMRI in localizing Wernicke's area is a limitation of the word stem completion task. This limitation could be overcome by use of a wider range of language tasks (33,56). Distorted brain anatomy in tumor patients compromises affine atlas registration; however, this issue will have affected T-fMRI and 3DCNN equally. Finally, a definitive comparison of RS-fMRI vs. 3DCNN in terms of patient outcomes would require a prospective, multi-center, clinical trial (59).

CONCLUSION
This study demonstrates that 3DCNN analysis of RS-fMRI data is able to accurately and specifically localize the language network in patients with brain tumors. In addition to the inherent advantages of RS-fMRI, specifically, limited requirement for patient cooperation, the 3DCNN method provides robust results with limited quantities of data, which is an advantage in the clinical setting. We anticipate that this method will lead to improved pre-surgical localization in future applications.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Washington University Human Research Protection Office. The patients/participants provided their written informed consent to participate in this study.  Disabilities Research Center at Washington University. AS was supported by P30 NS098577-01. EL was additionally supported by the Christopher Davidson Foundation. AD was supported by the Chancellor's Graduate Fellowship. Computations were performed using the facilities of the Washington University Center for High Performance Computing, which were partially provided through NIH grant S10 OD018091.