Structural Brain Imaging Predicts Individual-Level Task Activation Maps Using Deep Learning

Accurate individual functional mapping of task activations is a potential tool for biomarker discovery and is critically important for clinical care. While structural imaging does not directly map task activation, we hypothesized that structural imaging contains information that can accurately predict variations in task activation between individuals. To this end, we trained a convolutional neural network to use structural imaging (T1-weighted, T2-weighted, and diffusion tensor imaging) to predict 47 different functional MRI task activation volumes across seven task domains. The U-Net model was trained on 591 subjects and then subsequently tested on 122 unrelated subjects. The predicted activation maps correlated more strongly with their actual maps than with the maps of the other test subjects. An ablation study revealed that a model using the shape of the cortex alone or the shape of the subcortical matter alone was sufficient to predict individual-level differences in task activation maps, but a model using the shape of the whole brain resulted in markedly decreased performance. The ablation study also showed that the additional information provided by the T2-weighted and diffusion tensor imaging strengthened the predictions as compared to using the T1-weighted imaging alone. These results indicate that structural imaging contains information that is predictive of inter-subject variability in task activation mapping and that cortical folding patterns, as well as microstructural features, could be a key component to linking brain structure to brain function.


INTRODUCTION
Functional MRI (fMRI) maps the locations and intensities of brain activations by measuring the blood-oxygen-level-dependent (BOLD) signal arising from task performance in the scanner. In a research setting, the correlation between the performed task and subsequent BOLD response is estimated for the cortex and averaged over a group of subjects. Mapping brain task function gives researchers insight into regions of interest that activate during task performance for the group being studied. Non-invasive mapping of task activations in the brain using fMRI has also been appealing for research and clinical use in individual subjects. Accurate task activation mapping of individuals is critically important for patient care, such as brain tumor cases requiring neurosurgical resection. Research has shown, however, that individual task fMRI (tfMRI) activation maps have both limited reliability and accuracy (Weng et al., 2018;Elliott et al., 2020;. Therefore, investigation into alternatives for individual task mapping is warranted. One potential alternative to relying solely on tfMRI for individual subject mapping is to deduce localization and intensity of task activations from structural imaging, such as anatomical and diffusion imaging. Unfortunately, there is little evidence demonstrating that structural imaging features can predict variations in task activations from multiple domains in individual subjects. Tavor et al. demonstrated that while a linear model using both structural and resting-state fMRI features could predict individual subject variations in task activation, an identical model using only structural features could not. The more complex method of extracting white matter tractographybased connectivity features from the diffusion signal has shown promise for predicting individual differences in responses to visual stimuli and word reading tasks (Saygin et al., 2012(Saygin et al., , 2016Osher et al., 2015;Ekstrand et al., 2020), but the same has not been reported for a broader range of task domains.
We hypothesize that a trained convolutional neural network (CNN) will predict task activation maps from diffusion and anatomical imaging that are sensitive to inter-subject differences over a wide variety of task domains. This finding would demonstrate that structural imaging features do contain variances predictive of functional task differences in individuals.

Data Acquisition
Imaging data was obtained from the Human Connectome Project (HCP) young adults S1200 public data release (https:// www.humanconnectome.org/study/hcp-young-adult) provided by the WU-Minn HCP consortium. All subjects were 22 to 35 years old and were scanned by the HCP using Washington University's 3T Siemens Connectome Scanner (Van Essen et al., 2012). Preprocessed and aligned anatomical and structural data was provided by the HCP in the subject T1-weighted (T1w) imaging space, including T1w and T2-weighted (T2w) with 0.7 mm isotropic voxel size, diffusion images with 1.25 mm isotropic voxel size acquired at 3 b-shells (1,000, 2,000, and 3,000 s/mm 2 ) with 90 directions per shell acquired twice with opposite phase encoding directions , and MSMSulc and MSMAll (Glasser et al., 2016) registered template surfaces. The HCP also provided two runs of unprocessed task fMRI volumes for seven task domains (Barch et al., 2013): hand, foot, and tongue movements (MOTOR), auditory language processing (LANGUAGE), n-back working memory (WM), shape and texture matching (RELATIONAL), emotionprocessing (EMOTION), social interactions (SOCIAL), and incentive processing (GAMBLING).

Subject Selection
Of the 1206 HCP subjects, 591 were selected for the training group, 133 for the validation group, and 122 unrelated subjects for the test group. The remaining 360 subjects were excluded for having incomplete data, failing data processing, or being related to the test group. In order to prevent information from one of the groups from biasing the results of another group, all groups were selected so that no subjects from one group were related to any of the subjects in another group. Additionally, 39 subjects that were scanned twice by the HCP with a mean interval of 140 days were used to evaluate the test-retest reliability of the predicted and actual maps. These test-retest subjects were either a part of the test group or related to someone in the test group, and none of them were a part of the training or validation groups.

MRI Processing
The fMRI volume preprocessing was performed according to the HCP processing pipelines, including gradient distortion correction (Glasser et al., 2013), with the exception that the onestep resampling linearly transformed the volumes into the native T1w space rather than the MNI space. Individual-level fMRI zscore activation volumes were then computed across both runs for each task domain using the HCP pipelines in the subject's native volume space with high-pass bandwidth filtering and minimal 2 mm full-width at half maximum (FWHM) smoothing (Woolrich et al., 2001). All 47 unique task activation volumes across the seven task domains were used for model training and testing (Supplementary Table S1). The preprocessed T1w, T2w, and diffusion data were used as distributed by the HCP. Diffusion tensor image (DTI) modeling was performed using dipy (Garyfallidis et al., 2014) on the preprocessed diffusion data resulting in mean diffusivity (MD) and 3-directional fractional anisotropy (FA) feature maps computed for each b value separately (1,000, 2,000, and 3,000 s/mm²) in the subject's native space similar to Ganepola (Ganepola et al., 2018).

Model Architecture
For predicting task activation from structural imaging, a U-Net-style CNN was used (Figure 1) (Ronneberger et al., 2015;Myronenko, 2018;Aizenberg, 2020, 2021). The model architecture was inspired by Myronenko (2018) and has been utilized on other projects Aizenberg, 2020, 2021). This model architecture combines learned convolutional layers at multiple resolutions. The initial input images consisted of the aligned and skull-stripped T1w, T2w, and DTI volumes cropped to remove background slices and resampled to a size of 144 × 160 × 144. Each layer consisted of two residual blocks (He et al., 2016) performing group normalization (Wu and He, 2018), rectified linear activation, a 3 × 3 × 3 convolution (Myronenko, 2018). The first encoding layer consisted of 32 channels and contained a 20% channel dropout between the two residual blocks. After each consecutive encoding layer, the images were downsampled by a factor of two using a strided convolution. As is common in U-Net architectures, the following layer doubled the number of channels. The encoder consisted of five layers. Each layer of the decoder mirrored the encoder with two residual blocks per layer. The decoder layers took as inputs the upsampled output of the previous decoder layer concatenated with the output of the encoder layer of the same depth and resolution. A final 1 × 1 × 1 convolution linearly resampled the outputs from 32 channels of the last decoding layer to predict the output volumes for each of the 47 task activation maps. The predicted output volumes were then resampled back into the original T1w space. FIGURE 1 | Model architecture. A U-Net architecture was used with two residual blocks for each encoding and decoding layer. The inputs for each consecutive encoding layer are downsampled using a strided convolution. The decoding layer takes as inputs the outputs from the last encoding layer. Each consecutive decoding layer concatenates the outputs from the encoding layer at the same resolution as well as upsampled outputs from the previous decoding layer. A 1 × 1 × 1 convolution linearly resamples the 32 channel outputs from final decoding layer into 47 volumes, one for each task activation map (Reproduced with permission from Ellis and Aizenberg, 2020).

Model Training
We trained a single model to predict the fMRI task activation volumes across all seven task domains using T1w, T2w, and DTI data from the training set of subjects (Figure 2A). The mean squared error loss between the predicted and actual activation volumes was used to iteratively train the model along with Adam optimization (Kingma and Ba, 2014). The loss function was weighted so that each of the seven domains, rather than the 47 activation volumes, had an equal influence on the model training. In order to augment the training data, the scale of the input and output images was randomly altered, and random white noise was added to the input images. The learning rate was decayed after the validation loss had not improved for 20 epochs, and model training was stopped after the validation loss did not improve for 50 epochs.

Model Testing and Statistical Analysis
After the completion of model training, the model was used to predict the activation maps of 47 different tfMRI maps from seven different task domains. In order to validate the model's ability to detect individual differences in task maps, we imitated the analysis done by Tavor et al. (2016) and compared the correlations between the predicted and actual maps ( Figure 2B). To allow for the comparison between subjects, both the actual and predicted task activation volumes were sampled onto the cortical template surfaces in T1w space using the connectome workbench (https://humanconnectome. org/software/connectome-workbench) under a cortical ribbon constrained sampling method. The cortical template surfaces are distributed by the HCP with indices that have been aligned between subjects according to sulcal patterns (MSMSulc) using surface matching registration . The predicted task activation maps of a given subject were then compared to the actual activation maps for that subject as well as to the activation maps for all of the other test subjects, and a correlation matrix was constructed. A Kolmogorov-Smirnov test between the distribution of the diagonal elements of the correlation matrix and the extra-diagonal elements was performed with the threshold for significance set at α = 0.05.
In order to determine if the inter-subject variation predicted by the model could be accounted for via alignment with structural and functional features beyond that of the sulcal patterns alone, the predicted and actual activation volumes were sampled onto another set of surface templates whose indices had been aligned by the HCP using the MSMAll surface matching technique (Glasser et al., 2016;Coalson et al., 2018). Correlations between the predicted and actual activations as sampled onto the MSMAll surface templates were computed, and a correlation matrix was constructed. A Kolmogorov-Smirnov test was again conducted to test for a difference between the distributions of the diagonal and extra-diagonal elements.
The difference in the distributions was also tested for each of the 47 task activations with the threshold corrected for multiple comparisons: α = 0.05 47 = 0.001. The lateralization indices (the difference in activation between the left hemisphere and right hemisphere averaged from the surface vertices within a 10 mm radius of the peak location for the predicted map) as described by Tavor et al. (2016) for the predicted maps were also FIGURE 2 | (A) Training of the convolutional neural network. The model takes as inputs the structural imaging for a given subject which includes the T1w and T2w weighted imaging as well as mean diffusivity (MD) and fractional anisotropy (FA) for b-values 1,000, 2,000, and 3,000 s/mm 2 . The model is trained to predict the task activation volumes for 47 different task maps across 7 different task domains (visualized on the cortical surface in the figure). The mean squared error loss (L MSE ) between the predicted and actual activations is used to iteratively train the model. (B) In order to validate the model's ability to predict individual activation maps, the predicted maps of a given subject m are compared to the actual activation maps for that subject as well as the activation maps for all of the 122 subjects. From these comparisons, the correlation matrix shown on the right is constructed.
compared to the lateralization indices for the actual maps and tested for the contrasts from the LANGUAGE, SOCIAL, and WM task domains with a correlation analysis corrected for multiple comparisons (α = 0.05 20 = 0.002). An ablation study was performed to assess the amount of information gained by the selected input features. In this study, in addition to the CNN model trained as described previously, six additional models were trained in an identical fashion but with different sets of input feature maps: (1) DTI features along with the T1w and T2w imaging, (2) T1w and T2w imaging alone, (3) T1w imaging alone, (4) a binary mask of the cortex, (5) a binary mask of the subcortical matter, and (6) a binary mask of the whole brain. In order to observe the effects of alignment techniques, the ablation study was evaluated using the MSMSulc and MSMAll template surfaces, as well as with the predicted and activation volumes aligned via non-linear warping into MNI space.
In order to compare performance of the CNN to other techniques, two linear models were trained using least squares multiple linear regression on the feature data sampled to the MSMSulc and MSMAll template surfaces respectively. In addition to the imaging data, the feature data for the linear models also incorporated curvature, myelination, sulcus, and cortical thickness data. The linear models were trained such that each of the 59,412 cortical vertices were fitted to their own regression model weights for each of the 47 output task activation maps. Additionally, a CNN model trained on permuted inputs and output pairings was trained as a metric for baseline performance.
To ascertain if the microstructural features from the diffusion imaging were capable of predicting individual subject differences in task activation beyond that accounted for by anatomical differences, a partial correlation was computed with the predictions from the T1w+T2w model used as the covariate.
Additionally, a test-retest reliability analysis was performed on the actual maps and predictions by calculating the intraclass correlation (ICC) using the ICC (3,1) mixed-effects model (Shrout and Fleiss, 1979).

RESULTS
The CNN was able to use structural imaging to predict activation patterns of individual subjects that matched the group average as well as activations that deviated from the group average (Figure 3). The predicted maps matched their actual maps better than the maps of other subjects, as seen by the diagonal dominance correlation matrix ( Figure 4A). Row and column normalization to remove the mean and to account for the higher variability in the actual maps made the diagonal of the correlation even more pronounced (Figure 4B). A Kolmogorov-Smirnov test between the distribution of the diagonal elements of the correlation matrix and the extra-diagonal elements (Figure 4C) resulted in a highly significant difference (p < 0.001), indicating that the predicted task maps are significantly more correlated to their actual maps than to the maps of other subjects. Likewise, the distributions of all of the individual task maps passed the test score corrected for multiple comparisons of (p < 0.001) except for the GAMBLING Punish-Reward task map (p = 0.23) and the MOTOR LF-AVG (left foot minus average, p = 0.009) task map (Supplementary Figures S1-S3). For all task maps, the average correlation to its actual map was greater than the average correlation to the actual maps of the other subjects, showing that, on average, the predicted maps matched their actual maps more than those of others (Supplementary Figure S4). Further, when the predictions from the T1+T2 model were used as covariates for the partial correlations, the diagonal elements were still more correlated than the extra-diagonal elements (p < 0.001) (Supplementary Figure S5). Even with using the MSMAll registered template surfaces, the Kolmogorov-Smirnov test between the distribution of the diagonal elements of the correlation matrix and the extradiagonal elements resulted in a highly significant difference (p < 0.001) (Supplementary Figure S6).
The model was also able to predict the individual differences in lateralization of task function. The predicted lateralization indices were significantly correlated (p < 0.008) with the actual lateralization indices for 20 out of the 25 LANGUAGE, SOCIAL, and WM contrasts examined with the WM 0BK_BODY, WM 2BK-0BK, WM BODY-AVG, WM PLACE-AVG, and SOCIAL RANDOM-TOM task being the exceptions (Supplementary Table S2). The linear regression of the lateralization indices between the predicted and actual maps for the exemplary tasks is shown in Figure 5.
The ablation study shows the correlation of the model using all of the proposed structural data plotted against models trained using T1w with T2w data and T1w data only, as well as models trained on binary mask images of either the cortex, subcortical matter, or the whole brain (Figure 6). Notably, all models were able to predict individual subject variation. This indicates that the information contained in the additional imaging features improved the model's prediction. The model trained on the mask of the brain resulted in the least increase in correlation over the average.
The test-retest reliability of the predicted maps as measured by ICC was greater for all domains than the actual maps (Supplementary Figure S7), and the ICC of all predicted maps on average was 0.95 compared to 0.61 for the actual maps.

DISCUSSION
We demonstrated that structural imaging contains information that is predictive of inter-subject variations in task activations using a CNN. By contrast, Tavor et al. did not find structural imaging features to be predictive of variations in task activations when using a simple linear regression model (Tavor et al., 2016). CNNs, however, are able to extract rich and complex features that give them an immense advantage, with enough data, over linear regression. Therefore, the CNN was able to extract features from the imaging that predicted task activations over a wide array of domains that correlated well to the actual tfMRI activations. Previous research has not reported such a broad correlation between structural imaging and individuallevel variations in functional activation, although some research has reported the connectivity of reconstructed white matter tracts to be predictive of visual tasks (Saygin et al., 2016). In contrast, our model was able to predict variations in task activation in 47 different activation maps across seven different task domains (motor, language, working memory, relational, emotion, social, and gambling). We demonstrated that our model was able to predict deviations from the group average activation (Figure 3) as well as variations in lateralization (Figure 5). Even when using the T1+T2 predictions as covariates, the model could predict differences in task activation between subjects. This indicates that the microstructural diffusion features themselves are predictive of variations FIGURE 3 | Example comparisons between group average, actual, and predicted maps for illustrative subjects with task maps across several domains. The actual and predicted Subject A maps demonstrate the model's ability to predict accurate activation maps that resemble the group average while the Subject B maps demonstrate the model's ability to predict deviations from the group average. Thresholds for the maps were determined using the medians of the positive and negative gamma distributions from a Gaussian and 2-gamma mixture model. The black circles highlight variations from the group average that the model was able to predict correctly. Subjects A and B are different subjects for each of the various task maps.

FIGURE 4 | (A)
Overall correlation matrix for all tasks. The predicted maps (y-axis) were compared to the actual maps (x-axis) for all of the subjects. The visible diagonal indicates that the predicted maps were more correlated with their own actual maps than the maps of other subjects. (B) Row and column normalized correlation matrix to remove mean correlation. (C) Distribution of the diagonal elements of the (un-normalized) correlation matrix in orange and the extra-diagonal elements in blue visualized using a kernel density estimation and overlapping normalized histogram. A Kolmogorov-Smirnov test between the two distributions gives a highly significant difference, p < 0.001, indicating that the predicted task maps are more correlated to their actual maps than the maps of other subjects.
FIGURE 5 | Linear regression fit with 95% confidence intervals between the predicted and actual lateralization indices for (A) the LANGUAGE MATH, (B) SOCIAL RANDOM, and (C) WM FACE activation maps. The lateralization index is defined as the difference between the left hemisphere and right hemisphere at the peak location for the predicted map. All three plots show that the predicted lateralization indices are positively correlated with the with the actual lateralization indices indicating that the predicted maps are able to capture some of the variation of lateralization seen in the actual maps.
in individual task activation beyond that predicted by the anatomical imaging (Supplementary Figure S5). Furthermore, when alignment between subjects was performed using MSMAll, which takes into account functional mapping features, our model was still able to predict variation in the task activation of individual subjects (Supplementary Figure S6). Therefore, it is not likely that the variation captured by the predictions is solely the result of the functional misalignment between subjects.
The ablation study showed that adding DTI along with T1w and T2w anatomical imaging resulted in better predictions than that of any other model (Figure 6). Surprisingly, the ablation study also showed that the shape of either the sub-cortex alone or the cortex alone was sufficient to predict individual-level variation using a CNN. However, when the model was only able to see the shape of the brain mask, the model was not as accurate. The information present in the shape of the subcortex but not in the shape of the brain is the structure and location of the gyri and sulci, also known as the cortical folding patterns. The model was able to use the shape of the sub-cortex to infer the cortical folding patterns and then use this information to predict the task activation patterns throughout the cortex. Without these patterns, the model is unable to predict variations between individual subjects to the same extent as delineated by the poorer performance of the model using only the brain mask. This finding, along with previous research showing that cortical folding patterns are unique to the individual (Duan et al., 2020), influenced by the tension from brain connections (Van Essen, 2020), and correlated to behavior (Whittle et al., 2009) as well as neuropsychological impairments (Shaw et al., 2012), indicates that folding patterns may be an integral part to how brain function is derived from brain structure.
Structural imaging is immune to some of the sources of noise that make tfMRI mapping less reliable, such as neurovascular uncoupling, poor task performance, and cardiac rhythm. Therefore, it is not surprising that the predictions were more reliable than the actual tfMRI maps when evaluated in subjects that were scanned twice (Supplementary Figure S7). However, it should be noted that more thorough processing of the fMRI to remove structured noise components could also improve the reliability of the fMRI signal Parkes et al., 2018). More dependable functional activation maps would greatly assist clinicians in cases where task activation mapping is critical for patient care, including neurosurgical operative planning. Accurate task mapping predictions could give neurosurgeons the ability to visualize eloquent areas and avoid potential surgically induced deficits, even without collecting any fMRI data. This would be particularly advantageous in situations where collecting fMRI data is infeasible due to limitations in patient performance or insufficient resources.
While this study demonstrated that deep learning and structural imaging has some predictive power relating to task activations, much more work is needed to create models and methodologies that could provide insight into clinical cases. Any models or methodologies aiming at clinical use need to be thoroughly tested on subjects and data that closely resemble those seen in a clinical setting. For the HCP Young Adult dataset used in the current study, all subjects were scanned on a single scanner, were close to the same age (22-35 years old) and were free from significant psychiatric or neurological illnesses (Van Essen et al., 2012). In contrast, clinical patients have or are suspected of having some neurological or psychiatric illness and are of varied ages. For example, brain tumors can create anatomical distortions that make identifying key functional areas extremely difficult. Furthermore, older patients may have agerelated neurodegeneration distorting anatomy.
Significant differences also exist in clinical imaging acquisition compared to research imaging acquisition. Many of the tasks and contrasts acquired as a part of the HCP dataset are of little or no interest to clinicians (e.g., GAMBLING REWARD, LANGUAGE MATH-STORY, etc.). Conversely, some tasks often used by clinicians were not acquired, such as an object naming task where subjects are asked to think of the name for certain objects. The diffusion imaging from the HCP dataset was acquired at a higher resolution, with more directions, and at more b-shells than what time allows for a typical clinical diffusion scanning sequence. These differences between the HCP vs. clinical realm for imaging and subjects, as well as other limitations, must first be overcome before clinical use becomes feasible. FIGURE 6 | Ablation study. Self (matrix diagonal) vs. other (extra-diagonal elements) correlation in predictions shows the difference between average correlation of the predictions to the actual maps and the average correlation of the predictions to the actual maps of all the other subjects. Ablation study results are shown for the CNN models using the proposed combination of the anatomical imaging and DTI data for each of the 3 b-values (T1+T2+DTI12), the anatomical imaging alone (T1+T2), and the T1w imaging alone (T1) as well as masked images of the cortex (Cortex Mask), subcortical areas (Subcortical Mask), and the whole brain (Brain Mask). Examples of the masked images for a single subject are shown on the right. Results were compared using the MSMSulc and the MSMAll registered template surfaces as well as non-linear registration into MNI volume space. Additionally, results are displayed for linear models trained on the MSMSulc and MSMAll surface-level features (linear) as well as a CNN model trained on permuted input and output pairings (permutation). Compared to the MSMSulc registered template surfaces, the MSMAll registered templates did account for some of the predicted variation, but the predicted maps were still more correlated with their own actual maps than all other actual maps on average for all of the input types and methods. Comparing the predictions to the actual maps non-linearly warped into MNI volume space resulted in a much higher increase correlation between the predictions and actual maps than comparing the maps using the surface templates. The models trained on the masked images were able to predict individual subject variation, but the model using the masked brain images exhibited a marked decrease in performance.

CONCLUSION
Structural imaging, when paired with a CNN, was predictive of inter-subject variations in 47 different task activation maps across seven task domains. These findings suggest that anatomical and microstructural features contain information that is predictive of unique functional brain activations in individuals. Future work could focus on utilizing structural imaging information to predict or enhance functional activation mapping for individuals in a clinical setting and this study represents a first step toward this goal.

DATA AVAILABILITY STATEMENT
The code used for model training and data analysis can be found at https://github.com/ellisdg/fCNN. The imaging data utilized in this project was the Young Adults Human Connectome Project provided by the Connectome Coordinating Facility and can be found online at https://www.humanconnectome.org. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.