Bayesian Optimisation for Neuroimaging Pre-processing in Brain Age Prediction

Neuroimaging-based age predictions using machine learning have been shown to relate to cognitive performance, health outcomes and progression of neurodegenerative disease. However, even leading age-prediction algorithms contain measurement error, motivating efforts to improve experimental pipelines. T1-weighted MRI is commonly used for age prediction, and the pre-processing of these scans involves normalisation to a common template and resampling to a common voxel size, followed by spatial smoothing. Resampling parameters are often selected arbitrarily. Here, we sought to improve brain-age prediction accuracy by optimising resampling parameters using Bayesian optimisation. Using data on N = 2001 healthy individuals (aged 16-90 years) we trained support vector machines to i) distinguish between young (<50 years) and old (>50 years) brains and ii) predict chronological age, with accuracy assessed using cross-validation. We also evaluated model generalisability to the Cam-CAN dataset (N = 648, aged 18-88 years). Bayesian optimisation was used to identify optimal voxel size and smoothing kernel size for each task. This procedure adaptively samples the parameter space to evaluate accuracy across a range of possible parameters, using independent sub-samples to iteratively assess different parameter combinations to arrive at optimal values. When distinguishing between young and old brains a classification accuracy of 96.25% was achieved, with voxel size = 11.5 mm³ and smoothing kernel = 2.3 mm. For predicting chronological age, a mean absolute error (MAE) of 5.08 years was achieved, with voxel size = 3.73 mm³ and smoothing kernel = 3.68 mm. By comparison, the default values of 1.5 mm³ and 4 mm respectively gave MAE = 5.48 years, so optimisation yielded a 7.3% improvement.
When assessing generalisability, best performance was achieved when applying the entire Bayesian optimisation framework to the new dataset, out-performing the parameters optimised for the initial training dataset. Our study demonstrates the proof-of-principle that neuroimaging models for brain age prediction can be improved by using Bayesian optimisation to select more appropriate pre-processing parameters. Our results suggest that different parameters are selected and performance improves when optimisation is conducted in specific contexts. This motivates use of optimisation techniques at many different points during the experimental process, which may result in improved statistical sensitivity and reduce opportunities for experimenter-led bias.


Introduction
The ageing process affects the structure and function of the human brain in a characteristic manner that can be measured using neuroimaging. This quantifiable relationship was key to the early demonstration of the proof-of-principle of voxel-based morphometry (Good et al., 2001) and to this day represents one of the most robust known relationships between a measurable phenomenon (i.e., ageing) and brain structure, making it ideal for evaluating novel neuroimaging analysis tools. More recently, researchers have used this relationship to develop neuroimaging-based tools for predicting chronological age in healthy people using machine learning (Franke et al., 2010; Cole et al., 2017b). A 'brain-predicted age' determined from magnetic resonance imaging (MRI) scans represents an intuitive summary measure of the natural deterioration associated with the effects of the ageing process on the brain, and may have the potential to serve as a biomarker of age-related brain, or even general, health (Cole, 2017).

The extent to which brain-predicted age is greater than an individual's chronological age has been associated with accentuated age-associated physical and cognitive decline (Cole et al., 2017c). Specifically, an 'older'-appearing brain has been associated with decreased fluid intelligence, reduced lung function, weaker grip strength, slower walking speed and an increased likelihood of mortality in older adults (Cole et al., 2017c). Factors which could contribute to an increased brain-predicted age include genetic effects, having sustained a traumatic brain injury, certain neurological or psychiatric conditions, or poor physical health (Koutsouleris et [...] brain ageing, such as cognitive decline and neurodegenerative disease, could be identified by measuring brain-predicted age in clinical groups or even screening the general population.

Despite promising results to date, models for generating brain-predicted age still contain measurement error, and efforts to improve accuracy and, particularly, generalisability to data from different MRI scanners are warranted. Training on large cohorts of healthy adults gives the lowest mean absolute error (MAE) rates, between 4 and 5 years (Wang and Pham, 2011; Steffener et al., 2016). Notably, individual errors vary across the population, from perfect prediction to discrepancies as great as 25 years. While brain-predicted age has high test-retest reliability (Cole et al., 2017b), and a proportion of this variation likely reflects underlying population variability, a substantial amount of noise certainly remains. Reducing noise and improving prediction accuracy and generalisability is essential if such approaches are to be applied to individuals in a clinical setting, the ultimate goal of any putative health-related biomarker.

A key issue in brain-age prediction, along with many other neuroimaging approaches, is the [...], and ideally should be optimised on a case-by-case basis. This optimisation is rarely conducted, as trial-and-error approaches are time-consuming and often ill-posed. Importantly, this issue may reduce experimental precision, which increases the likelihood of false positives and reduces reproducibility. In the worst-case scenario, this may encourage p-hacking, whereby pre-processing is manually optimised based on minimising the resultant p-values of the subsequent hypothesis testing. Here, we outline a principled Bayesian optimisation strategy for identifying optimal values for pre-processing parameters in neuroimaging analysis, implementing sub-sampling to avoid bias. We then demonstrate proof-of-principle applied to the problem of age prediction using machine learning.

Bayesian optimisation is an efficient and unbiased approach to the parameter selection problem, which avoids both the failure to adequately search the value space and the drawbacks of an exhaustive search. Instead, it uses a guided sampling strategy to observe a subgroup of points from within the possible parameter space, testing values on subsets of the total subject population (Brochu et al., 2010; Snoek et al., 2012). The data division strategy ensures that performance tests always reflect out-of-sample prediction and always evaluate differing conditions on separate data, reducing the risk of overfitting. Intelligent selection of a small number of points for evaluation allows the characterisation of parameter space and the solution of the optimisation problem to be accomplished in fewer steps (Pelikan et al., 2002).

The current work aimed to use a Bayesian optimisation framework to optimise image pre-processing parameters for: i) distinguishing the brains of young and old adults (classification), ii) predicting chronological age (regression), and iii) evaluating the generalisability of the resulting optima to an independent dataset. We hypothesised that by using Bayesian optimisation we would improve model accuracy compared to previously used 'non-optimised' values. The study was designed to show proof-of-principle of the applicability of Bayesian optimisation to help improve neuroimaging pre-processing in a principled and unbiased fashion.

This study used the Brain-Age Healthy Control (BAHC) dataset, compiled from 14 public sources (see Table 1) and used in our previous research (e.g., Cole et al., 2015).

Normalised brain volume maps were created following the protocol described in Cole et al. (2015). This involved segmentation of raw T1-weighted images into grey matter maps using SPM12 (University College London, London, UK). Images were normalised to a study-specific template in MNI152 space using DARTEL for non-linear registration (Ashburner, 2007). This step involved resampling to a common voxel size, modulation to retain volumetric information and spatial smoothing; the specific voxel size and smoothing kernel size parameters were chosen by the Bayesian optimisation protocol as detailed below.

After pre-processing, images were converted to vectors of ASCII-format intensity values. These were used as the input features for subsequent classification or regression analysis, which was performed in MATLAB using support vector machines (SVMs). For the binary classification problem of distinguishing younger from older participants, SVM classification was used. For predicting age as a continuous variable, SVM regression (SVR) was used, with participants from the full age range (16-90 years). Both SVM and SVR procedures used a linear kernel to map the input data into a computationally efficient feature space.
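As a rough illustration of why the resampling voxel size matters for the feature vectors described above, the following Python sketch counts how many voxels (and hence input features) a resampled volume would contain, and flattens a small volume into a vector. The field of view, the function names, and the reading of 'voxel size' as an isotropic edge length in mm are all assumptions made for illustration; none of them is taken from the study's SPM12/MATLAB pipeline.

```python
import math

# Hypothetical field of view in mm (x, y, z); not taken from the study.
FOV_MM = (180.0, 216.0, 180.0)


def grid_shape(fov_mm, voxel_mm):
    """Voxels per axis when a field of view is resampled to isotropic voxels."""
    return tuple(math.ceil(extent / voxel_mm) for extent in fov_mm)


def n_features(fov_mm, voxel_mm):
    """Length of the feature vector if every voxel becomes one feature."""
    nx, ny, nz = grid_shape(fov_mm, voxel_mm)
    return nx * ny * nz


def flatten_volume(volume):
    """Flatten a nested [x][y][z] list of intensities into one feature vector."""
    return [value for plane in volume for row in plane for value in row]


if __name__ == "__main__":
    # Voxel sizes mentioned in the text, treated here as edge lengths (assumption).
    for voxel_mm in (1.5, 3.73, 11.5):
        print(voxel_mm, grid_shape(FOV_MM, voxel_mm), n_features(FOV_MM, voxel_mm))
```

Under these assumptions, coarser voxels shrink the feature vector substantially, which is one plausible route by which the voxel-size parameter could influence a linear SVM or SVR.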

Bayesian optimisation was used to identify optimal pre-processing parameters, based on the accuracy of the subsequent model predictions (either classification or regression). Hence, the Bayesian optimisation procedure can be seen as an additional outer layer of analysis that surrounds the standard pipeline (pre-processing through to model accuracy evaluation). The Bayesian optimisation process runs multiple iterations of this internal pipeline, exploring the parameter space to select varying image pre-processing options based on their influence on the objective function (i.e., classification or regression accuracy).

A key advantage of Bayesian optimisation derives from its 'surrogate' model, which represents the relationship between an algorithm's parameters and the currently unknown objective function. This surrogate model is progressively refined in a closed-loop manner by automatically selecting points in the parameter space, in order to provide informed coverage of that space based on the performance of previously sampled points. This aspect makes Bayesian optimisation highly efficient, reducing the number of iterations necessary to identify optima of complex objective functions (Brochu et [...]

Classifying Young and Old Adults
We defined the 500 oldest individuals (aged 51 to 90 years) and the 500 youngest (aged 16 to 22 years) as the "old" and "young" groups for classification. Each iteration of Bayesian optimisation used a subsample of this subject set (N = 1000) to test a combination of pre-processing parameter values. Participants were divided into subsets of size n, stratified by age such that each subset had an approximately representative distribution of participants from across the age range, resulting in a total of N/n iterations (rounded down). We used n = 80 (40 participants from each group) as the sample for each iteration, giving 1000/80 = 12.5, i.e., 12 complete iterations of Bayesian optimisation. This included a burn-in phase (i.e., a preliminary phase of unevaluated samples to initialise the process) of 5 initial, randomly sampled points from within the parameter ranges to begin characterisation of the search space, followed by 7 iterations of 'guided' active sampling. In each iteration a voxel size and smoothing kernel size combination was selected and used for resampling during DARTEL normalisation of each subject's images. Normalised images were then converted to feature vectors and a binary classifier was trained and assessed using 10-fold cross-validation. Classifier accuracy was the objective to be optimised (equivalently, classification error was the objective function to be minimised). Bayesian optimisation used the Expected Improvement Plus (EI+) acquisition function, with the default exploration-exploitation ratio of 0.5.
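The schedule described above (5 randomly sampled burn-in points followed by 7 guided iterations, each scored on its own subset of 80 participants) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the text names the EI+ acquisition function, whereas this example uses a small hand-written Gaussian-process surrogate with plain expected improvement. The function `toy_error`, the parameter bounds, the kernel length scale, the noise level and the candidate count are all hypothetical placeholders.

```python
import math
import random

N_TOTAL, SUBSET_SIZE, N_BURN_IN = 1000, 80, 5
N_ITER = N_TOTAL // SUBSET_SIZE          # 12 disjoint subsets, one per evaluation
BOUNDS = [(1.0, 12.0), (1.0, 12.0)]      # hypothetical (voxel, kernel) ranges in mm


def toy_error(params, subset_index):
    """Hypothetical stand-in for one full pipeline run on one data subset."""
    voxel, kernel = params
    noise = random.Random(subset_index).gauss(0.0, 0.005)
    return 0.05 + 0.002 * (voxel - 8.0) ** 2 + 0.0005 * (kernel - 3.0) ** 2 + noise


def scale(params):
    """Map parameters to the unit square so one length scale suits both."""
    return [(p - lo) / (hi - lo) for p, (lo, hi) in zip(params, BOUNDS)]


def rbf(a, b, length=0.3):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(a, b)) / length ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (low * low^T) x = b by forward then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(xs, zs, x_new, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    k_star = [rbf(a, x_new) for a in xs]
    mu = sum(ks * al for ks, al in zip(k_star, chol_solve(low, zs)))
    var = 1.0 - sum(ks * v for ks, v in zip(k_star, chol_solve(low, k_star)))
    return mu, math.sqrt(max(var, 1e-12))


def expected_improvement(mu, sigma, best):
    """Plain expected improvement for minimisation (not the EI+ variant)."""
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def optimise(seed=0):
    """Run burn-in then guided sampling; return best params, best error, all params."""
    rng = random.Random(seed)
    tried, errors = [], []
    for it in range(N_ITER):
        if it < N_BURN_IN:               # burn-in: random points in the ranges
            params = [rng.uniform(lo, hi) for lo, hi in BOUNDS]
        else:                            # guided: pick the candidate with highest EI
            mean = sum(errors) / len(errors)
            std = max(math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors)), 1e-9)
            zs = [(e - mean) / std for e in errors]   # standardised observations
            xs = [scale(p) for p in tried]

            def score(cand):
                mu, sigma = gp_predict(xs, zs, scale(cand))
                return expected_improvement(mu, sigma, min(zs))

            cands = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(200)]
            params = max(cands, key=score)
        tried.append(params)
        errors.append(toy_error(params, subset_index=it))  # each subset used once
    best = min(range(N_ITER), key=errors.__getitem__)
    return tried[best], errors[best], tried


if __name__ == "__main__":
    best_params, best_error, _ = optimise()
    print(best_params, best_error)
```

In a real analysis `toy_error` would be replaced by the full inner pipeline (resampling, smoothing, vectorising and cross-validated model fitting) applied to the subset reserved for that iteration, so that no parameter combination is scored on data already used for another.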

Next, we used Bayesian optimisation to assess regression models of healthy brain ageing that allow accurate prediction of age in new datasets. This was done by first identifying optimal parameters through Bayesian optimisation, then applying them to the full training dataset and comparing the resulting prediction accuracy to that achieved in the current literature.

Optimised model performance was an accuracy of 96.25% for correct classification of neuroimaging data as either young or old, at a voxel size of 11.5 mm³ and smoothing kernel size of 2.3 mm. The parameter space exploring the expanded range of voxel size and smoothing kernel size values yielded the model shown in Figure 1.

[...]

The final resulting model was applied to predict ages for the remaining 200 holdouts and achieved a MAE of 5.08 years. This compared with MAE = 5.48 years when using the un-optimised pre-processing values. The absolute error observed in any single subject ranged from 0 to 22.78 years. Figure 3 shows the relationship between predicted age and chronological age for each dataset: the 200 holdout test cases from the BAHC dataset (Fig. 3a) and the full Cam-CAN dataset (Fig. 3b). For the BAHC holdout cases, the Pearson's correlation between predicted and true age was r = 0.941, with R² = 0.89 using optimised pre-processing. Using un-optimised pre-processing parameters, r = 0.927 and R² = 0.86.

Figure 3. Relationship between chronological age and brain-predicted age. Chronological age (x-axis) plotted against brain-predicted age (y-axis) when testing the BAHC-trained model on (a) the hold-out N = 200 test set from BAHC, and (b) the full Cam-CAN data.
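The accuracy measures reported above can be computed with a few lines of Python. The sketch below is illustrative only: the function names and the example ages are hypothetical, and it assumes that the reported R² is the square of the Pearson correlation, which the text does not state explicitly.

```python
import math


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between predicted and chronological age."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(baseline_mae, optimised_mae):
    """Fractional reduction in MAE relative to the baseline."""
    return (baseline_mae - optimised_mae) / baseline_mae


if __name__ == "__main__":
    # Hypothetical ages, for illustration only.
    true_ages = [20.0, 35.0, 50.0, 65.0, 80.0]
    predicted = [24.0, 33.0, 55.0, 60.0, 78.0]
    print(mean_absolute_error(true_ages, predicted))
    print(pearson_r(true_ages, predicted))
    # Reported MAE values: 5.48 years (default) vs 5.08 years (optimised).
    print(round(100 * relative_improvement(5.48, 5.08), 1))
    # Squared correlations, assuming R^2 is reported as r squared.
    print(round(0.941 ** 2, 2), round(0.927 ** 2, 2))
```

Under that assumption, the reported pairs (r = 0.941, R² = 0.89) and (r = 0.927, R² = 0.86) are mutually consistent, and the reduction from 5.48 to 5.08 years corresponds to the 7.3% improvement quoted in the abstract.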

The model generated in the first study of regression was applied to the Cam-CAN dataset in three different ways. 1) The BAHC-trained model was applied to the Cam-CAN data pre-processed with the BAHC-informed optimum voxel size and smoothing kernel size values. This achieved a MAE of 6.08 years, r = 0.929, with R² = 0.86 (Figure 3b). This was an improvement compared to the performance when using un-optimised values, which had a MAE = 6.76 years. 2) The Cam-CAN data were analysed entirely independently; the full Bayesian optimisation framework was instead applied to the Cam-CAN data to discover new, Cam-CAN-specific pre-processing optima, and a new regression model was trained with 588 participants and tested on 60 participants (giving a similar training-testing ratio as used in the BAHC dataset). This [...]

Using Bayesian optimisation, we present a conceptual and practical improvement to conventional pipelines for distinguishing young and old brains or predicting age using neuroimaging data. The Bayesian optimisation-derived optima for voxel size and smoothing kernel size showed moderate improvement in model performance over 'un-optimised' defaults used previously, suggesting that it would be beneficial to incorporate such a process into future 'brain-predicted age' research. Our results are important as they suggest that the same pre-processing parameters are not optimal for different prediction tasks (i.e., classification vs. regression) or for different datasets (BAHC vs. Cam-CAN). Often, researchers will apply parameters used in one context to another. This may not necessarily be best practice, and our work shows proof-of-principle that Bayesian optimisation can be used to improve image pre-processing in a principled and unbiased fashion.

Beyond optimising performance, our Bayesian optimisation approach also allows for comparison of the relative influence of different parameters. This potentially provides novel information regarding the prediction problem at hand. For example, here we found that varying voxel size had a much greater impact on overall performance than did smoothing kernel size. This was seen in all experiments; the change in performance across the full range of values was much smaller for smoothing kernel size than for voxel size, as is clearly seen in the surface plots (Figs. 1, 2, 4). This suggests that in future neuroimaging pre-processing pipeline design, there is more to be gained from optimising voxel size than smoothing kernel size. The target voxel size for normalisation is often not considered, though it has an important impact on the degree of partial volume effects, the number of simultaneous statistical tests undertaken, spatial resolution and subsequent inferences made about anatomical specificity. Our findings suggest that more weight needs to be placed on this important parameter when relating volumetric MRI data to age.

Importantly, the conclusions regarding specific optimal values are related to the particular application in which they are tested. Within this study, we observed a notable difference in the optimal voxel size for classification (11.5 mm³) compared to regression (3.7 mm³). Potentially, the more gross distinction between young and old brains benefits from a coarser resolution which increases signal-to-noise ratio, while the more subtle patterns underlying gradual age-associated changes in brain structure require finer-grained representation. Alternatively, the much larger voxel size identified here could result in better classification by reducing data dimensionality, with this size representing the optimal trade-off between representing the information and simplifying a crowded feature space for more effective classification. Either way, the discrepancy in optimal voxel size between classification and regression reinforces the point that systematic evaluation of parameter specifications should be conducted case-by-case.

[...]

Our study shows the value of Bayesian optimisation to improve neuroimaging pre-processing for estimating brain-predicted age, a potential biomarker of healthy brain ageing. Future research into brain ageing and other neuroscientific areas could benefit from applying principled optimisation approaches to improve study sensitivity and reduce bias.