TractoSCR: a novel supervised contrastive regression framework for prediction of neurocognitive measures using multi-site harmonized diffusion MRI tractography

Neuroimaging-based prediction of neurocognitive measures is valuable for studying how the brain's structure relates to cognitive function. However, the accuracy of prediction using popular linear regression models is relatively low. We propose a novel deep regression method, namely TractoSCR, that allows full supervision for contrastive learning in regression tasks using diffusion MRI tractography. TractoSCR performs supervised contrastive learning by using the absolute difference between continuous regression labels (i.e., neurocognitive scores) to determine positive and negative pairs. We apply TractoSCR to analyze a large-scale dataset including multi-site harmonized diffusion MRI and neurocognitive data from 8,735 participants in the Adolescent Brain Cognitive Development (ABCD) Study. We extract white matter microstructural measures using a fine parcellation of white matter tractography into fiber clusters. Using these measures, we predict three scores related to domains of higher-order cognition (general cognitive ability, executive function, and learning/memory). To identify important fiber clusters for prediction of these neurocognitive scores, we propose a permutation feature importance method for high-dimensional data. We find that TractoSCR obtains significantly higher accuracy of neurocognitive score prediction compared to other state-of-the-art methods. We find that the most predictive fiber clusters are predominantly located within the superficial white matter and projection tracts, particularly the superficial frontal white matter and striato-frontal connections. Overall, our results demonstrate the utility of contrastive representation learning methods for regression, and in particular for improving neuroimaging-based prediction of higher-order cognitive abilities. Our code will be available at: https://github.com/SlicerDMRI/TractoSCR.


Introduction
The brain's white matter (WM) connections, which can be quantitatively mapped using diffusion MRI (dMRI) tractography (Zhang et al., 2022a), play an important role in brain networks that enable human cognition (Wang et al., 2018; Zekelman et al., 2022). Investigating the predictive relationship between WM microstructure and cognition can therefore improve our understanding of the brain in health and disease. Regression analysis, which can predict values of a dependent variable (label) given a set of input independent variables (features), enables the prediction of neurocognitive measures given input features from neuroimaging. This strategy has recently attracted considerable interest (Sripada et al., 2020; Kim et al., 2021; Wu et al., 2022; Radhakrishnan et al., 2022; Feng et al., 2022). While many studies perform prediction using high-dimensional neuroimaging features from functional MRI (fMRI) (Cui and Gong, 2018; Dubois et al., 2018; Sripada et al., 2020; Wu et al., 2022) or multimodal data (Gong et al., 2021, 2022; Mansour L et al., 2021; Kim et al., 2021; Sun et al., 2022; Radhakrishnan et al., 2022), a unimodal focus on dMRI tractography (e.g., Jeong et al., 2021; Feng et al., 2022; Mansour L et al., 2022; Chen et al., 2022b) can improve our understanding of the role of the WM connections in cognition. While a number of studies have pursued prediction of neurocognitive measures based on information from dMRI tractography (Table 1), current approaches are limited in terms of study cohorts and regression methodology.
Linear regression models such as ElasticNet (Zou and Hastie, 2005) have been widely used for prediction of neurocognitive performance (Cui and Gong, 2018; Jollans et al., 2019; Sripada et al., 2020; Gong et al., 2021).
Table 1. Summary of studies for prediction of neurocognitive measures using white matter measures extracted only from dMRI tractography. For each study, study cohorts and regression methodology are reported.
One avenue for improving the prediction of neurocognitive performance metrics is to investigate recent machine learning algorithms for the analysis of tabular (row and column) data (Borisov et al., 2021). Many quantitative features derived from neuroimaging can be represented as tabular data. The most popular machine learning algorithm for tabular data is the gradient boosting decision tree (GBDT) method (Chen and Guestrin, 2016; Prokhorenkova et al., 2018). In recent years, deep-learning-based methods (Yoon et al., 2020; Arik and Pfister, 2021; Gorishniy et al., 2021; Bahri et al., 2022) have been developed for tabular data, which is the last "unconquered castle" for deep learning (Kadra et al., 2021; Borisov et al., 2021). One important research direction for deep learning on tabular data is representation learning, which can discover beneficial data representations for downstream tasks. For example, the value imputation and mask estimation (VIME) (Yoon et al., 2020) and self-supervised contrastive learning using random feature corruption (SCARF) (Bahri et al., 2022) methods enable representation learning on tabular data. However, these representation learning methods were developed for classification tasks, and cannot utilize regression label information during representation learning.
Another avenue for improving prediction of neurocognitive measures is to investigate recently proposed algorithms for contrastive learning (Chen et al., 2020b; Chen and He, 2021; Khosla et al., 2020). In medical image computing, supervised contrastive learning improves classification accuracy by using labels during representation learning (Schiffer et al., 2021; Dufumier et al., 2021; Xue et al., 2022; Seyfioglu et al., 2022). It is usually designed for classification tasks, where samples with the same categorical label are positive pairs, and samples with different categorical labels are negative pairs. During representation learning, embeddings of positive pairs are pulled together, and embeddings of negative pairs are pushed apart. However, regression tasks involve continuous labels (e.g., neurocognitive scores) that cannot directly be used for pair determination. Two recent works have shown that contrastive learning can be useful in the context of regression based on medical images as input (Lei et al., 2021; Dai et al., 2022). For example, RPR-Loc proposed a learning strategy to predict the distance between a pair of image patches (Lei et al., 2021). Recently, the AdaCon method (Dai et al., 2022) used a contrastive learning strategy that leveraged distances between labels (e.g., bone mineral densities) to benefit downstream computer-aided disease assessment. These recent regression methods did not use labels for pair determination in contrastive learning. How to best use label information to enhance regression is still an open question.
In this study, we propose a novel deep regression method for tractography analysis with supervised contrastive regression, referred to as TractoSCR. TractoSCR is a novel contrastive representation learning framework to predict measures of neurocognition using white matter microstructure derived from dMRI tractography, as illustrated in Fig. 1. Our proposed TractoSCR method extends the supervised contrastive learning method (Khosla et al., 2020), which is designed for categorical data in classification tasks, to perform regression analysis where the predicted labels are continuous values. We propose a novel pair-determination strategy that uses the absolute difference between continuous regression labels to determine positive and negative sample pairs for contrastive learning. To our knowledge, this is the first method that leverages deep representation learning techniques for the prediction of neurocognitive performance. Our method uses a tractography fiber clustering method that enables consistent white matter parcellation across populations. The parcellation allows representation of microstructure features from whole brain tractography as tabular data, which enables the use of a recently proposed random feature corruption technique (Bahri et al., 2022) for data augmentation to further improve prediction performance. In addition, for interpreting prediction results, we propose a novel permutation feature importance algorithm to identify tractography fiber clusters and their corresponding anatomical tracts that are important for prediction of neurocognitive measures. We demonstrate our method in a large-scale dMRI dataset including data from 8,735 children, where we explore the relationship between white matter microstructure and prediction of neurocognitive performance (including general ability, executive function, and learning/memory).
The remainder of this paper is structured as follows. Section 2 describes the dataset and data processing, the proposed regression and interpretation methods, and the model training and testing details. Section 3 describes the evaluation metric, experimental results, and interpretation of results. Finally, the discussion and conclusion are given in Sections 4 and 5, respectively.
(Casey et al., 2018; Volkow et al., 2018). Three neurocognitive principal component scores from ABCD were studied, representing three major domains of higher-order cognition, namely General Ability (PC1), Executive Function (PC2) and Learning/Memory (PC3) (Thompson et al., 2019). These component scores are lower dimensional representations of nine assessment measures from the ABCD neurocognitive battery (Luciana et al., 2018) (including seven measures from the NIH toolbox (Casaletto et al., 2015)). These component scores statistically summarize nine neurocognitive assessment measures and reveal latent variables which have been theorized to be a purer reflection of the cognitive domains of interest (Snyder et al., 2015; Thompson et al., 2019). Furthermore, these component scores have been associated with measures of psychopathological behavior (i.e., stress reactivity and/or externalizing behaviors), perhaps suggesting their clinical utility (Thompson et al., 2019).
A two-tensor Unscented Kalman Filter (UKF) tractography method (Malcolm et al., 2010; Reddy and Rathi, 2016) was applied to the harmonized dMRI data of all subjects to obtain whole-brain tractography. The UKF method fitted a mixture model of two tensors to the diffusion data while tracking streamlines. This enabled the estimation of fiber-specific microstructural measures from the first tensor, which models the tract being traced (Reddy and Rathi, 2016). Next, automated parcellation of tractography was performed based on an anatomically curated cluster atlas (Zhang et al., 2018), which was provided by the O'Donnell Research Group (ORG). Compared to traditional tractography parcellation based on cortical atlases, this clustering method was shown to be more reproducible and consistent across the lifespan (Zhang et al., 2018, 2019). For each subject, the ORG atlas (Zhang et al., 2018) enabled extraction of 953 expert-curated fiber clusters. These finely parcellated fiber clusters are grouped and categorized into 58 deep white matter tracts including major long range association and projection tracts, commissural tracts, and tracts related to the brainstem and cerebellar connections, as well as 198 short and medium range superficial fiber clusters. We performed tractography quality control and white matter parcellation using the open-source WhiteMatterAnalysis (WMA) software. Tractography visualization was performed using the SlicerDMRI software (Norton et al., 2017; Zhang et al., 2020).
For all subjects, cluster-specific microstructural measures of fractional anisotropy (FA), mean diffusivity (MD), and number of streamlines (NoS) were computed. These measures have been previously shown to be associated with neurocognitive scores (Zekelman et al., 2022; Chen et al., 2022c; Madole et al., 2021). Here, FA and MD are measures of fiber-specific tissue microstructure, while NoS is widely used to quantify the connectivity strength (Zhang et al., 2022a). These cluster-specific measures can be considered as tabular data, allowing algorithms from the field of tabular data to be employed. For any empty cluster (due to variability of tractography or the underlying anatomy), each measure was set to zero, as in (He et al., 2022).

Supervised Contrastive Regression
We propose a novel contrastive representation learning method for regression, TractoSCR. Our overall strategy is to use the absolute difference between two continuous regression labels to determine positive and negative pairs for contrastive learning. An overview of the TractoSCR framework is shown in Fig. 2. The regression framework (Fig. 2 (a)) has two phases: contrastive representation learning and fine-tuning. In representation learning, random feature corruption (Fig. 2 (b)) and the proposed pair determination (Fig. 2 (c)) are utilized with a supervised contrastive loss. The network trained in representation learning is then fine-tuned to output neurocognitive scores. These steps are described in the following sections.

Random Feature Corruption for Data Augmentation
To avoid potential model overfitting and increase the discriminative ability of the learned global features in contrastive learning, we performed a data augmentation process to create more training samples. We applied the recently proposed random feature corruption technique that was designed specifically for tabular data (Yoon et al., 2020; Bahri et al., 2022). In brief, in each mini-batch of training with input samples $X$, we created a corrupted batch copy $\tilde{X}$. To do so, we chose a proportion of the input cluster-specific measures (features) uniformly at random and replaced each of those measures by a random draw from the corresponding measure dimension of other samples (as shown in Fig. 2 (b)). The ratio of replaced measures to all measures is defined as the corruption rate $c$. Corrupted samples $\tilde{X}$ retain the same regression labels $Y$ as original samples $X$.
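The corruption step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the released implementation: the function name `corrupt_batch`, the choice of corrupted features independently for each sample, and the rounding of the number of corrupted features are all assumptions, and the standard library is used only to keep the sketch dependency-free.

```python
import random


def corrupt_batch(batch, corruption_rate, rng):
    """Return a corrupted copy of `batch` (a list of equal-length feature lists).

    Assumption: features to corrupt are chosen per sample; each chosen feature
    is replaced by the value of the same feature in another sample of the batch.
    The batch must contain at least two samples.
    """
    n_samples = len(batch)
    n_features = len(batch[0])
    n_corrupt = round(corruption_rate * n_features)
    corrupted = []
    for i, row in enumerate(batch):
        new_row = list(row)  # copy so the original batch is left unchanged
        # Choose the features to corrupt uniformly at random, without replacement.
        for j in rng.sample(range(n_features), n_corrupt):
            # Draw a donor sample uniformly from the other samples in the batch.
            k = rng.randrange(n_samples - 1)
            if k >= i:
                k += 1
            new_row[j] = batch[k][j]
        corrupted.append(new_row)
    return corrupted
```

Because the corrupted samples keep the labels of their originals, a training batch for contrastive learning can be formed as `batch + corrupt_batch(batch, c, rng)` with the label list repeated twice.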

Positive and Negative Pairs Determination
From the generated augmented data in each training mini-batch, we construct positive and negative sample pairs to enable supervised contrastive learning (SCL). Unlike existing studies (Khosla et al., 2020) using SCL to perform a classification task, where positive and negative pairs are defined based on the class labels, determination of positive and negative sample pairs is not straightforward in regression because the regression labels are continuous values. To handle this, we propose a new strategy that uses the absolute difference between two continuous regression labels to determine pairs (Fig. 2 (c)). Given $x_i, x_j \in X$ with labels $y_i$ and $y_j$, if $|y_i - y_j| < \theta$, $x_i$ and $x_j$ are defined as a positive pair. Otherwise, $x_i$ and $x_j$ are considered to be a negative pair. The label difference threshold $\theta$, a threshold on the absolute difference of two regression labels, is the key parameter for positive and negative pair determination. For our dataset with regression labels ranging from approximately -3 to 3, the optimal $\theta$ is 0.35 based on experimental results. Note that our TractoSCR method is robust to changes in this threshold (from 0.1 to 0.5) as described in Section 3.2.4.
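The thresholding rule can be illustrated with a short sketch. The function name `positive_pair_mask` and the boolean-matrix representation are assumptions made for illustration; a sample is never paired with itself, matching the exclusion of the anchor from its own comparison set in the contrastive loss.

```python
def positive_pair_mask(labels, theta):
    """Return an n-by-n boolean matrix whose (i, j) entry is True when samples
    i and j form a positive pair, i.e. i != j and |y_i - y_j| < theta.
    All other off-diagonal entries correspond to negative pairs."""
    n = len(labels)
    return [
        [i != j and abs(labels[i] - labels[j]) < theta for j in range(n)]
        for i in range(n)
    ]
```

For a batch that concatenates original and corrupted samples, calling `positive_pair_mask(labels + labels, 0.35)` marks every sample and its corrupted copy as a positive pair, since their labels are identical.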

Supervised Contrastive Loss
After positive and negative pairs are determined using regression labels, the supervised contrastive loss (Khosla et al., 2020) becomes applicable:
\[
\mathcal{L} = \sum_{r \in R} \frac{-1}{|P(r)|} \sum_{p \in P(r)} \log \frac{\exp(z_r \cdot z_p / \tau)}{\sum_{a \in A(r)} \exp(z_r \cdot z_a / \tau)}
\]
where $r$ is the anchor (current) sample, and $R$ is the set of all samples ($X$ and $\tilde{X}$) in a training batch ($r \in R$); $P(r)$ is the set of samples that are positive pairs with anchor sample $r$ ($p \in P(r)$); $A(r)$ is the set of all samples in $R$ except for anchor sample $r$ ($a \in A(r) \equiv R \setminus \{r\}$); $z_r$, $z_p$ and $z_a$ are contrastive features obtained from Proj(·) for samples $r$, $p$ and $a$; and $\tau$ (temperature) is a tunable hyperparameter for the contrastive loss.
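As a hedged illustration of how this loss could be evaluated for one batch, the sketch below follows the supervised contrastive loss of Khosla et al. (2020) with positives defined by the label difference threshold. The function names are hypothetical, embeddings are assumed to be non-zero vectors, and skipping anchors that have no positive in the batch is an assumption rather than something stated in the text.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings, labels, theta, tau):
    """Sum over anchors r of the mean negative log-probability of its positives.

    embeddings: projector outputs for all samples in the batch (original and corrupted).
    labels: regression label for each embedding.
    theta: label difference threshold; tau: temperature.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total = 0.0
    for r in range(n):
        others = [a for a in range(n) if a != r]  # A(r): every sample except the anchor
        positives = [p for p in others if abs(labels[r] - labels[p]) < theta]  # P(r)
        if not positives:
            continue  # assumption: anchors without positives contribute nothing
        sims = {a: sum(x * y for x, y in zip(z[r], z[a])) / tau for a in others}
        # Log of the denominator, computed with the log-sum-exp trick for stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
    return total
```

The loss is smaller when embeddings of samples with similar labels point in similar directions, which is the behavior that representation learning is meant to encourage.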

Contrastive Learning and Fine-tuning
The overall process of contrastive learning and fine-tuning (Fig. 2 (a)) is as follows. In contrastive representation learning, training samples (from $X$ and $\tilde{X}$) are input into the encoder Enc(·) and projector Proj(·) to get embeddings ($Z$ and $\tilde{Z}$). The supervised contrastive loss is computed using the normalized embeddings ($Z$ and $\tilde{Z}$), where positive and negative pairs are determined by absolute differences between regression labels $Y$. After the contrastive representation learning, the parameters of Enc(·) are frozen and Proj(·) is left unused, as in (Chen et al., 2020b; Khosla et al., 2020; Bahri et al., 2022; Xue et al., 2022). Using Proj(·) during representation learning may help Enc(·) retain information that is useful for downstream regression tasks (Chen et al., 2020b). A predictor head for regression, Reg(·), is added on top of the trained Enc(·). Reg(·) takes the output of Enc(·) as its input and is fine-tuned with the MSE loss to obtain the final prediction.

Ensemble Learning
We use ensemble learning (Hastie et al., 2009) to combine prediction results from three predictors that are trained on three microstructural measures (FA, MD, and NoS) independently, as in (He et al., 2022). The ensemble prediction is obtained as the average prediction across the three predictors. Ensemble learning is therefore well suited to our application, which studies the relationship between three microstructural measures and neurocognitive performance metrics. Ensemble learning can also potentially improve the performance of the regression, because different microstructural measures may provide complementary information for prediction of neurocognitive performance. (Note that ensemble learning is used not only for our method but also for all compared methods in experiments.)
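The averaging step amounts to the following small sketch, in which the function name and the dictionary keyed by measure name are illustrative assumptions.

```python
def ensemble_average(predictions_by_measure):
    """Average per-subject predictions across predictors.

    predictions_by_measure maps a measure name (e.g. "FA", "MD", "NoS") to a
    list of predicted scores, one per subject, in the same subject order.
    """
    preds = list(predictions_by_measure.values())
    n_subjects = len(preds[0])
    return [sum(p[i] for p in preds) / len(preds) for i in range(n_subjects)]
```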

Permutation Feature Importance
We propose a permutation feature importance algorithm to assess the contribution of each cluster to the prediction of a neurocognitive score. Our proposed interpretation method is based on the permutation feature importance (Breiman, 2001), which is a popular model-agnostic technique for estimating how important a feature is for a particular model. The traditional permutation feature importance is defined as the decrease in a model score (e.g., prediction accuracy) when a single feature value is randomly shuffled (permuted) across samples. This enables identification of highly important features that have a large effect on the model's prediction accuracy. This traditional permutation feature importance method is not directly applicable to our high-dimensional data because the decrease of prediction accuracy is negligible when only permuting a single feature value. (Our input includes 953 cluster-specific white matter features per subject.) Therefore, we propose a new strategy to permute multiple feature values simultaneously (e.g., a random sample of 10% of features). By repeating this strategy a very large number of times (e.g., 50,000), we can estimate the importance of all high-dimensional input features.

Implementation Details
For model training and performance evaluation, the dataset is split into training/validation/test sets at a ratio of 70%/10%/20%, and we repeat each experiment 10 times with different train/validation/test splits to report the average performance. Regarding the network structure, as suggested in (Bahri et al., 2022), Enc(·), Proj(·) and Reg(·) all have hidden dimension 256 with the ReLU activation in each layer. Enc(·) has four layers, whereas Proj(·) and Reg(·) both consist of two layers. For training hyperparameters, all deep learning methods are trained with the Adam optimizer with a learning rate of 0.001 and use early stopping with patience 3 on the validation loss, as in (Bahri et al., 2022). We conduct a grid search for parameter selection with batch size b ∈ {256, 512, 1024, 2048, 4096}, corruption rate c ∈ {0.3, 0.4, 0.5, 0.6, 0.7}, and temperature τ ∈ {0.5, 1, 5, 10} for our method and all compared representation learning methods. For AdaCon, we also tune the temperature scaling factor (s ∈ {10, 50, 100, 150}) based on their paper and code. Weight ratios of the two losses in AdaCon are tuned with the rule that the two losses should have similar values (Dai et al., 2022). Based on this search, we choose a batch size b of 2048, a corruption rate c of 0.5, and a temperature τ of 1 for our contrastive representation learning. Note that our method is not sensitive to hyperparameter changes and has good performance overall. Results with other parameter settings are presented in Section 3.2.4 to demonstrate the robustness. A typical batch size of 128 is chosen in fine-tuning for all deep learning methods. Experiments are performed with PyTorch (v1.8) on an NVIDIA GeForce RTX 2080 Ti GPU machine. For TractoSCR, each experiment (including training, validating and testing) takes about 30 seconds with 1.67 GB GPU memory usage.
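The repeated 70%/10%/20% splitting can be sketched as below. The function name `make_splits`, the seeding scheme, and the truncation used to obtain whole-number set sizes are assumptions; the text does not specify how subjects were shuffled or how fractional set sizes were rounded.

```python
import random


def make_splits(n_subjects, n_repeats=10, fractions=(0.7, 0.1, 0.2), base_seed=0):
    """Return n_repeats (train, validation, test) index splits of range(n_subjects).

    Assumption: training and validation sizes are truncated to integers and the
    test set receives the remaining subjects; fractions[2] is kept for readability.
    """
    n_train = int(fractions[0] * n_subjects)
    n_val = int(fractions[1] * n_subjects)
    splits = []
    for rep in range(n_repeats):
        indices = list(range(n_subjects))
        # A different seed per repetition gives a different, reproducible shuffle.
        random.Random(base_seed + rep).shuffle(indices)
        train = indices[:n_train]
        val = indices[n_train:n_train + n_val]
        test = indices[n_train + n_val:]
        splits.append((train, val, test))
    return splits
```

Under these assumptions, `make_splits(8735)` yields ten splits with 6114 training, 873 validation, and 1748 test subjects each; reusing the same list of splits for every method keeps the comparison on identical data.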
For the interpretation of prediction results, we implement our proposed permutation feature importance algorithm for prediction of three neurocognitive measures (PC1, General Ability; PC2, Executive Function; PC3, Learning/Memory) independently. For each permutation, we shuffle 95 out of 953 feature values across samples in the training dataset. Then we train using TractoSCR. The prediction accuracy is evaluated on the testing dataset, and the decrease of prediction accuracy (compared to the original prediction accuracy) is recorded along with the indices of the 95 shuffled features. For each of the 10 train/validation/test data splits, we repeat this experiment 50,000 times (50,000 permutations). We obtain the final overall importance score for each feature (cluster) by averaging all recorded decreases of prediction accuracy from all permutations that included that feature. Finally, three importance scores are obtained for each cluster, corresponding to the three prediction tasks.
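A compact sketch of this grouped permutation procedure is given below. It is an illustration under assumptions, not the authors' code: `score_fn` is a hypothetical callable that, in the setting described here, would train TractoSCR on the given training features and return the test-set prediction accuracy, and the function and parameter names are invented for the example.

```python
import random


def grouped_permutation_importance(train_X, n_permutations, group_size, score_fn, rng):
    """Estimate per-feature importance by permuting random groups of features.

    train_X: list of training samples, each a list of feature values.
    score_fn: hypothetical callable mapping training features to an accuracy score.
    Returns, for each feature, the mean accuracy decrease over all permutations
    in which that feature was shuffled (0.0 if it was never selected).
    """
    n_features = len(train_X[0])
    baseline = score_fn(train_X)  # accuracy with unpermuted training data
    drop_sum = [0.0] * n_features
    counts = [0] * n_features
    for _ in range(n_permutations):
        group = rng.sample(range(n_features), group_size)
        permuted = [list(row) for row in train_X]
        for j in group:
            # Shuffle the values of feature j across samples.
            column = [row[j] for row in permuted]
            rng.shuffle(column)
            for row, value in zip(permuted, column):
                row[j] = value
        drop = baseline - score_fn(permuted)
        for j in group:  # record the decrease for every feature in the group
            drop_sum[j] += drop
            counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(drop_sum, counts)]
```

With `group_size=95` and `n_permutations=50000` on 953 features, each feature is expected to be shuffled in roughly 10% of the permutations, so its averaged score reflects its contribution across many different random groups.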

Evaluation Metric
We computed Pearson correlation coefficients (Pearson's r) between the ground truth scores and predicted scores to quantify the prediction accuracy. The Pearson correlation coefficient is widely used for evaluation of cognitive prediction from neuroimaging data (Cui and Gong, 2018; Jollans et al., 2019; Gong et al., 2021; Mansour L et al., 2021; Sripada et al., 2020; Feng et al., 2022; Chen et al., 2022c; Jandric et al., 2022). It measures the linear correlation between two sets of data (equivalently, the cosine similarity between the two mean-centered score vectors). A higher value of r indicates a better prediction accuracy. We repeated each experiment 10 times with different train/validation/test splits (all methods use the same splits). The mean and standard deviation of Pearson correlation coefficients across the 10 splits are reported.
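For reference, Pearson's r can be computed as in the following dependency-free sketch (the function name is illustrative, and both inputs are assumed to be non-constant so that the denominator is non-zero).

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between ground truth and predicted scores."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    centered_true = [y - mean_true for y in y_true]
    centered_pred = [y - mean_pred for y in y_pred]
    # Cosine similarity of the mean-centered vectors.
    numerator = sum(a * b for a, b in zip(centered_true, centered_pred))
    denominator = math.sqrt(sum(a * a for a in centered_true)) * math.sqrt(
        sum(b * b for b in centered_pred)
    )
    return numerator / denominator
```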

Comparison of Representation Learning Methods
We compared our proposed TractoSCR with one classical method (AutoEncoder (Rumelhart et al., 1986)), two recently proposed methods (VIME (Yoon et al., 2020) and SCARF (Bahri et al., 2022)) for representation learning using tabular data, and one recent contrastive learning method (AdaCon (Dai et al., 2022)) for medical image-based regression. The autoencoder method is widely used for learning efficient representations. Here, the autoencoder has the same input as TractoSCR and the output has the same dimensionality as the input, and the MSE loss is applied. VIME uses a novel pretext task and data augmentation method for representation learning, and SCARF uses contrastive learning with random feature corruption. AdaCon utilizes its proposed contrastive loss together with an MSE loss for training, and for fair comparison to our method, we apply random corruption for data augmentation for AdaCon. In our study, we train these methods using the suggested settings in their papers and released code.
Table 2 shows that our proposed method outperforms all compared methods on the three prediction tasks. Our method and AdaCon perform better than the other representation learning methods. This result demonstrates the effectiveness of utilizing the relationship between regression labels during contrastive learning. Furthermore, compared to AdaCon, our method achieves relative improvements in prediction accuracy of 2.4%, 2.6% and 6.7% on the three neurocognitive measures, respectively. This illustrates that using regression labels to enable positive and negative pair determination in contrastive learning can improve results on prediction of neurocognitive measures.

Comparison of State-of-the-art Methods for Regression
We also compared our proposed method with two state-of-the-art (SOTA) machine learning methods for regression (ElasticNet (Zou and Hastie, 2005) and GBDT (Chen and Guestrin, 2016; Prokhorenkova et al., 2018)). ElasticNet is popularly used in cognitive prediction (Cui and Gong, 2018; Gong et al., 2021). It performs linear regression with L1 and L2 regularization. We used the implementation in the sklearn package (Pedregosa et al., 2011). GBDT is a strong non-deep competitor for deep learning methods on tabular data (Gorishniy et al., 2021). It iteratively constructs an ensemble of weak decision tree learners through boosting. We selected XGBoost (Chen and Guestrin, 2016), one of the most popular implementations of GBDT, for comparison. Parameters were tuned based on suggestions in (Gorishniy et al., 2021). In addition to the above SOTA methods, we also included a multilayer perceptron (MLP) that has the same network structure as ours for a baseline comparison. As shown in Table 2, MLP (our baseline) outperforms ElasticNet and is competitive with GBDT. These results illustrate the power of deep learning methods for neurocognitive score prediction. In addition, compared to the MLP baseline, our proposed method obtains relative improvements in prediction accuracy of 3.7%, 15.3%, and 14.4% on the three prediction tasks, respectively. This demonstrates the effectiveness of our proposed TractoSCR method.

Comparison of Ablated Versions
An ablation study was conducted with two ablated versions (TractoSCR$_{\text{no-pd-fc}}$ and TractoSCR$_{\text{no-fc}}$) of our proposed approach. TractoSCR$_{\text{no-pd-fc}}$ performs contrastive learning without using regression labels for pair determination and without using random feature corruption. TractoSCR$_{\text{no-fc}}$ uses regression labels for pair determination but does not perform random feature corruption. As shown in Table 2, the comparison between TractoSCR$_{\text{no-pd-fc}}$ and TractoSCR$_{\text{no-fc}}$ illustrates a large improvement when using regression labels for pair determination in contrastive learning. In addition, by applying random feature corruption for data augmentation, the performance improves on all tasks.

Experiments under Different Hyperparameter Settings
Fig. 3 shows the accuracy of prediction of three neurocognitive component scores across four important hyperparameters in TractoSCR. Overall, TractoSCR achieves consistently high prediction accuracy (Pearson's r) on all three tasks, which demonstrates that TractoSCR is robust to hyperparameter changes. Batch sizes and temperatures are important to contrastive learning frameworks in general (Chen et al., 2020b; Khosla et al., 2020). Fig. 3 (i) and (iii) show that TractoSCR obtains similar results when the batch size changes from 256 to 4096 and the temperature changes from 0.5 to 10. Corruption rates control how heavy the data augmentation is in contrastive learning (Bahri et al., 2022; Yoon et al., 2020). A negligible change of the result occurs when corruption rates are varied from 0.3 to 0.7. The label difference threshold θ is the key parameter for positive and negative pair determination in TractoSCR. TractoSCR performs well under different θ thresholds ranging from 0.1 to 0.5.

Interpretation Results
Fig. 4 provides a visualization of the most predictive fiber clusters (defined as the fiber clusters with the top 50 highest importance scores for each prediction task). Together, these fiber clusters may form part of the putative structural networks relating to general cognitive ability (PC1), executive function (PC2), and learning/memory (PC3). The predictive fiber clusters span across all five anatomical tract categories (association, cerebellar, commissural, projection, and superficial tracts) (Zhang et al., 2018) and are found in both the left and right hemispheres. This finding is in line with neurocognitive research demonstrating that higher order cognitive functions, such as the ones presently under investigation, are broadly distributed across the brain (Goddings et al., 2021). When this result is examined in detail, we find that the predictive fiber clusters are predominantly located within the superficial and projection white matter (Table 3). This finding contrasts with the relative plethora of white matter and cognition studies that have focused on the role of the association connections (e.g., language in arcuate fasciculus, memory in the uncinate fasciculus, etc.) (Forkel et al., 2022). Details about the location of all predictive fiber clusters (Fig. 4) within specific tracts (as defined in the anatomically curated ORG atlas (Zhang et al., 2018)) are provided in Supplementary Table S1. Overall, the most predictive tracts are the superficial frontal white matter and striato-frontal connections, which have the highest number of clusters found to be important across the three prediction tasks.

Discussion
In this study, we proposed a novel deep-learning-based regression method that enables improved prediction accuracy of neurocognitive measures. To our knowledge, we are the first to focus on deep representation learning for neuroimage-based prediction of neurocognitive measures. Unlike commonly used regression methods (Brown et al., 2022; Feng et al., 2022; Madole et al., 2021; Li et al., 2020b), the proposed TractoSCR method allows us to effectively leverage information from regression labels during contrastive learning. A new strategy was proposed to use the absolute difference between two continuous regression labels to determine positive and negative pairs. We also employed random feature corruption, a data augmentation method for tabular data, in contrastive learning. By applying random feature corruption, the performance improved on all prediction tasks (e.g., a relative improvement of 5.5% on PC3).
We showed that our proposed method achieved highly improved prediction performance on a large-scale ABCD dataset compared to existing methods, including SOTA regression methods and representation learning methods. For example, on PC3, our method outperformed the SOTA contrastive learning method (AdaCon) with a relative improvement of 6.7% in Pearson's r, and our method outperformed the baseline method (MLP) with a relative improvement of 14.4% in Pearson's r. We also illustrated that TractoSCR is robust to changes of hyperparameters (batch size b, corruption rate c, temperature τ, and label difference threshold θ). These results demonstrate the utility of contrastive representation learning methods for the neuroimaging-based prediction of higher-order cognitive abilities. In this study, we obtained Pearson's r values ranging from 0.24 to 0.43, indicating a moderate correlation between investigated white matter microstructural measures and neurocognitive scores. Our moderate correlation finding is in general in line with a body of recent work that uses neuroimaging measures to predict cognition (Sripada et al., 2020; Gong et al., 2021; Kim et al., 2021; Feng et al., 2022).
Predicting neurocognitive measures from the ABCD dataset is an interesting but challenging task that has been undertaken using various MRI modalities (Pohl et al., 2019; Sripada et al., 2020; Ooi et al., 2022). For example, T1-weighted MRI was used to predict fluid intelligence scores (Pohl et al., 2019), while a comparison across modalities suggested that information from fMRI could best predict a summary cognition score derived from 36 behavioral scores (Ooi et al., 2022). One recent study by Sripada et al. (2020) used resting-state fMRI to predict the same neurocognitive component scores (PC1, PC2, and PC3) that we investigated in the current study. Their method obtained Pearson's r values of 0.33, 0.09, and 0.15 for the prediction of PC1, PC2, and PC3, respectively (Sripada et al., 2020). These results were based on a smaller dataset (2013 subjects from the first ABCD data release) and are not directly comparable to our results. However, we note that using tractography fiber cluster microstructure features as input and our novel TractoSCR regression framework for prediction, we obtained higher Pearson's r coefficients of 0.42, 0.24, and 0.27 for the prediction of PC1, PC2, and PC3, respectively. Overall, this suggests that fiber cluster measures can potentially provide highly informative features, in combination with TractoSCR, which achieves higher prediction accuracy than commonly used linear regression methods.
In our data-driven analysis of imaging and neurocognitive data from 8735 participants of the ABCD study, we found that fiber clusters within the projection and superficial white matter were the most important for predicting neurocognitive scores related to general cognitive ability, executive function, and learning/memory. This result was enabled by the proposed permutation feature importance algorithm for identifying predictive features from high-dimensional input. This finding may highlight the need for more investigations of the superficial and projection pathways in the context of cognition.
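As background, the general idea of permutation feature importance can be sketched as follows. This is the standard single-feature variant, shown only to illustrate the principle; it is not the high-dimensional algorithm proposed in this work. The `predict` callable, the use of the drop in Pearson's r as the importance score, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def permutation_importance(predict, x, y, n_repeats=5, seed=0):
    """Importance of each feature as the mean drop in Pearson's r
    when that feature's column is randomly permuted across samples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    base = np.corrcoef(predict(x), y)[0, 1]   # score on intact data
    importance = np.zeros(x.shape[1])
    for j in range(x.shape[1]):
        drops = []
        for _ in range(n_repeats):
            x_perm = x.copy()
            x_perm[:, j] = rng.permutation(x_perm[:, j])  # break feature j
            drops.append(base - np.corrcoef(predict(x_perm), y)[0, 1])
        importance[j] = np.mean(drops)
    return importance


# Synthetic example: feature 0 drives the target, feature 2 is unused.
data_rng = np.random.default_rng(42)
x_demo = data_rng.normal(size=(200, 3))
y_demo = 2.0 * x_demo[:, 0] + 0.1 * x_demo[:, 1]


def demo_model(x):
    """Stand-in for a trained regressor (hypothetical)."""
    return 2.0 * x[:, 0] + 0.1 * x[:, 1]


scores_demo = permutation_importance(demo_model, x_demo, y_demo)
```

In this synthetic example, the first feature receives the largest score and the unused third feature receives a score of zero, since permuting it leaves the predictions unchanged.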
Potential limitations and future work of the present study are as follows. First, in the present study, we explored the relationships between neurocognitive scores and fiber cluster microstructural measures from a single imaging modality, dMRI. Future work may investigate TractoSCR for predicting neurocognitive scores based on features from multiple MRI modalities. Second, we focused on prediction of neurocognitive scores in healthy children. Future work may investigate the proposed TractoSCR framework to predict cognition in the context of aging or disease (e.g., Alzheimer's disease (Fisher et al., 2019)). Third, we employed a relatively simple MLP network. Future developments can include the incorporation of more advanced deep learning networks (e.g., transformer (Vaswani et al., 2017)) and recently proposed regression losses (Engilberge et al., 2019; Li et al., 2020a; Chen et al., 2022a). Finally, our results demonstrate the utility of contrastive representation learning for neuroimaging-based prediction of cognition. However, our proposed TractoSCR and permutation feature importance methods can be applied to other regression tasks, and assessment of their performance is left for future work.

Conclusion
In this work, we have proposed TractoSCR, a simple yet effective contrastive representation learning method for regression. We applied our TractoSCR method to multi-site harmonized dMRI tractography measures from the large-scale ABCD dataset (8735 participants) to predict neurocognitive scores relating to general cognitive ability, executive function, and learning/memory. We compared TractoSCR with several SOTA methods and showed substantially improved prediction performance. Overall, we found that fiber clusters within the projection and superficial white matter were the most important for predicting neurocognitive scores.

Fig. 1. Overview of our proposed TractoSCR framework for neurocognitive score prediction using dMRI tractography. Parcellation of tractography into fiber clusters enables the extraction of cluster-specific white matter measures. These measures are represented as tabular data and input to the TractoSCR framework, which outputs a neurocognitive score. FA: fractional anisotropy, MD: mean diffusivity, NoS: number of streamlines.
2.1. ABCD Dataset, Tractography Parcellation, and Microstructural Measures
This study includes dMRI data and neurocognitive component scores from the Adolescent Brain Cognitive Development (ABCD) dataset for 8735 American children (4560 males and 4175 females) between the ages of 9 and 11 (9.9±0.6) across 21 data collection sites

Fig. 2. TractoSCR framework: (a) overview of contrastive representation learning and fine-tuning, (b) random feature corruption for data augmentation with a measure of interest (e.g., FA) (rows are randomly selected samples, and columns are cluster-specific microstructural measures), (c) positive and negative pairs determination with regression labels (e.g., PC1).

Fig. 4. Visual presentation of most predictive fiber clusters (with the 50 highest importance scores) for each individual prediction task. Different fiber clusters are depicted in different colors and organized according to five anatomical tract categories.

Table 2. Comparison results (mean and standard deviation of Pearson's r across splits) for prediction of three neurocognitive component scores, PC1 (General Ability), PC2 (Executive Function), and PC3 (Learning/Memory).

Table 3. Number of predictive fiber clusters within each anatomical category. Categories with the highest number of predictive clusters are in bold.