Objective Bayesian fMRI analysis—a pilot study in different clinical environments

Functional MRI (fMRI) used for neurosurgical planning delineates functionally eloquent brain areas by time-series analysis of task-induced BOLD signal changes. Commonly used frequentist statistics protect against false positive results based on a p-value threshold. In surgical planning, false negative results are equally if not more harmful, potentially masking true brain activity leading to erroneous resection of eloquent regions. Bayesian statistics provides an alternative framework, categorizing areas as activated, deactivated, non-activated or with low statistical confidence. This approach has not yet found wide clinical application partly due to the lack of a method to objectively define an effect size threshold. We implemented a Bayesian analysis framework for neurosurgical planning fMRI. It entails an automated effect-size threshold selection method for posterior probability maps accounting for inter-individual BOLD response differences, which was calibrated based on the frequentist results maps thresholded by two clinical experts. We compared Bayesian and frequentist analysis of passive-motor fMRI data from 10 healthy volunteers measured on a pre-operative 3T and an intra-operative 1.5T MRI scanner. As a clinical case study, we tested passive motor task activation in a brain tumor patient at 3T under clinical conditions. With our novel effect size threshold method, the Bayesian analysis revealed regions of all four categories in the 3T data. Activated region foci and extent were consistent with the frequentist analysis results. In the lower signal-to-noise ratio 1.5T intra-operative scanner data, Bayesian analysis provided improved brain-activation detection sensitivity compared with the frequentist analysis, albeit the spatial extents of the activations were smaller than at 3T. Bayesian analysis of fMRI data using operator-independent effect size threshold selection may improve the sensitivity and certainty of information available to guide neurosurgery.

Functional MRI (fMRI) used for neurosurgical planning delineates functionally eloquent brain areas by time-series analysis of task-induced BOLD signal changes. Commonly used frequentist statistics protect against false positive results based on a p-value threshold. In surgical planning, false negative results are equally if not more harmful, potentially masking true brain activity leading to erroneous resection of eloquent regions. Bayesian statistics provides an alternative framework, categorizing areas as activated, deactivated, non-activated or with low statistical confidence. This approach has not yet found wide clinical application partly due to the lack of a method to objectively define an effect size threshold. We implemented a Bayesian analysis framework for neurosurgical planning fMRI. It entails an automated effect-size threshold selection method for posterior probability maps accounting for inter-individual BOLD response differences, which was calibrated based on the frequentist results maps thresholded by two clinical experts. We compared Bayesian and frequentist analysis of passive-motor fMRI data from 10 healthy volunteers measured on a pre-operative 3T and an intra-operative 1.5T MRI scanner. As a clinical case study, we tested passive motor task activation in a brain tumor patient at 3T under clinical conditions. With our novel effect size threshold method, the Bayesian analysis revealed regions of all four categories in the 3T data. Activated region foci and extent were consistent with the frequentist analysis results. In the lower signal-to-noise ratio 1.5T intra-operative scanner data, Bayesian analysis provided improved brain-activation detection sensitivity compared with the frequentist analysis, albeit the spatial extents of the activations were smaller than at 3T. Bayesian analysis of fMRI data using operator-independent effect size threshold selection may improve the sensitivity and certainty of information available to guide neurosurgery.

Introduction
Magnetic resonance imaging (MRI) is today commonly used in planning neurosurgical treatment, offering exquisite soft tissue contrast and geometric accuracy. A neurosurgical intervention, e.g., a brain tumor resection, has two principal aims: a maximal resection of the pathology with minimal damage to functionally important proximal brain structures. Modern MRI methods can support both goals. Standard structural MRI provides anatomical information about the tumor and surrounding brain tissue (Hall and Truwit, 2008;Wengenroth et al., 2011). Advanced MRI techniques, such as functional MRI (fMRI), yield otherwise unavailable spatial and functional localization of eloquent brain areas potentially invaluable in the neurosurgical planning stage . Structural MRIs are routinely used for neuro-navigation during surgery.
Brain activity related to stimuli or tasks is localized by exploiting the blood oxygenation level dependent effect (BOLD effect). A per-voxel time-series statistical analysis estimates the temporal BOLD response amplitude identifying significantly task-related brain areas, the results being displayed as a statistical parametric map (SPM). Several methods have been established for this analysis. The most commonly used is the frequentist statistical approach based on a t-statistic (Friston, 2007), applying frequentist statistics to a general linear model (GLM) (Friston et al., 1994) describing the experimental paradigm. This approach rejects a given null hypothesis H0 (usually H0 = no activation present) at a pre-chosen level of significance (α value), which determines the level of false positives (Type I error) under a valid H0. In other words, the α-value determines the probability of labeling a region as active even though it is not. However, the frequentist statistic does not control directly for false negatives (Type II errors), i.e., that H0 is not rejected even though a region is in fact active. That is, the user is provided with no information regarding the probability of overlooking true brain activity. Despite this limitation frequentist statistics is widely used for analysis of clinical fMRI.
False positive and false negative results are a potential risk for the patient and must be controlled in planning surgical intervention (Bartsch et al., 2006;Haller and Bartsch, 2009). While false positive results compromise the extent of tumor resection if they obscure tumor tissue (Gorgolewski et al., 2012), false negative results may precipitate an overly aggressive resection by obscuring eloquent brain tissue (Haller and Bartsch, 2009). Two recent publications proposed solutions for this problem. Johnson et al. (2012) proposed a computationally demanding Bayesian statistics approach comprising of a mixed Gaussian model with different weighting of false negative and false positive findings via a loss function, demonstrated on a single tumor case. Durnez et al. (2013) proposed an advanced definition of the p-value based using frequentist statistics and Abbreviations: CSB, cluster size in the Bayesian analysis; CSF, cluster size in the frequentist analysis; γ, effect size threshold; γ loc , minimal BOLD response effect size for activation loci; γ ex , minimal BOLD response effect size for activation extent; LBT, Log Bayes factor thresholdl; t loc , observer selected t-threshold for activation loci; t ex , observer selected t-threshold for activation extent; J, Jaccard index. independent component analysis, demonstrated in a group of five patients.
In this context Bayesian statistics have advantages over the frequentist approach, as this method does not suffer from multiple comparison limitations, and is not limited to controlling for false positives only. Bayesian statistics estimate the posterior probability of the effect given the fMRI data (Friston et al., 2002a,b). This probability is expected to be high in taskrelated brain regions and low in both brain areas not mediating performance of the task and in voxels suffering from poor data quality. Therefore Bayesian analysis can not only test for BOLD signal increases and decreases, but also detect non-responsive areas, and separate these from areas with unclear activation status which therefore provide statistical results with low-confidence. These results are encoded in posterior probability maps (PPMs) (Friston and Penny, 2003). This Bayesian approach is therefore particularly attractive for neurosurgical planning since it reveals explicitly both activated and non-activated regions. Moreover, the Bayesian approach could be particularly relevant for intraoperatively and post-surgically acquired fMRI data, as, during surgery both the functional localization and BOLD response could change due to tissue resection effects, e.g., active areas become non-activated or vice versa (Duffau, 2001). Such changes are not detectable via common frequentist analysis. Despite the advantages of Bayesian statistics for neurosurgery, clinical fMRI currently remains commonly analyzed using the frequentist techniques, since Bayesian methods are less widely known and practical implementation is computationally demanding.
A key benefit of Bayesian approaches is that they can automatically adapt to the spatial scale of signal in the data. One framework for doing this is to describe spatial dependencies using Markov Random Field priors (Gössl et al., 2001;Woolrich et al., 2004a). The original implementation of these algorithms is particularly computationally intensive, but this has since been ameliorated by the adoption of Variational Bayesian approaches (Celeux et al., 2003;Penny et al., 2005;Woolrich and Behrens, 2006). These approaches were generalized further using spatial Gaussian priors with the ability to model spatial non-stationaries (Harrison et al., 2007) and combine spatial and non-spatial priors (Groves et al., 2009). Alternative spatial models such as Bayesian wavelets (Flandin and Penny, 2007) and mixture models representing active and non-active voxels (Everitt and Bullmore, 1999;Hartvig and Jensen, 2000;Woolrich et al., 2005;Woolrich and Behrens, 2006) have also been proposed. Other authors proposed spatial non-stationarities not relying on Gaussian Markov Random Field priors permitting both nonstationary spatial and regressor-specific regularization (Vincent et al., 2010;Risser et al., 2011).
All Bayesian models provide posterior probability maps, which require thresholds to mask non-relevant brain activation and highlight significant activated regions. One of these thresholds, the "effect size threshold" (γ), is a potential obstacle to the practical use of Bayesian methods since it requires the definition of a minimal BOLD response effect size. In the presurgical environment this is the minimal BOLD response amplitude considered clinically meaningful. Obtaining an optimal and objective choice of this threshold is challenging, since the amplitude of the minimally expected BOLD response in any given situation is usually unknown and a subject-specific value. Furthermore, it may depend on several factors such as the paradigm, subject performance, physiological artifacts, and post processing steps. Thus, it is difficult to prescribe an a priori value. So far, no systematic method for determining a reasonable minimal effect size from a data set has been reported, and BOLD response amplitudes are rarely available in the literature: typical effect sizes for different paradigms, brain areas and populations including patients are not routinely reported. This may be viewed as a generic weakness of current brain imaging approaches, as effect sizes are routinely reported and discussed in many other scientific disciplines.
The aim of our study was to implement a complete Bayesian analysis framework for fMRI neurosurgical planning within a pipeline to allow automated on-site data processing. This required the development of a method for objective and algorithmic estimation of the effect size threshold, suitable for deployment in clinical applications. We developed and tested our framework in three parts. (1) Development-3T presurgical scanner: We developed the approach in healthy volunteers performing a passive motor paradigm in a standard 3T clinical MRI system. To calibrate the Bayesian analysis results we performed conventional frequentist inference analysis of the same data as reference. This was done for two surgical planning scenarios to determine the activation loci or the extent of the activated region. We developed a linear model to estimate the Bayesian effect size threshold from the individual volunteer data set. Compared to frequentist analysis, Bayesian fMRI analysis provided identification of activated, deactivated and nonactivated brain regions. (2) Test-intra-operative 1.5T scanner: We measured the same healthy volunteers in a 1.5T intraoperative MRI scanner, to test our approach under technically challenging conditions. Compared with the 3T data, frequentist and Bayesian results provided less comprehensive information regarding the brain response to the task. (3) Clinical scenariobrain tumor patient: We tested the approach in a case study with a patient candidate for tumor resection. Bayesian statistics revealed similar activation patterns compared to the frequentist results and additional information about non-activated areas and areas with low statistical confidence.

Ethics Statement
The study was conducted with institutional research ethics committee approval of The National Hospital for Neurology and Neurosurgery and Institute of Neurology Joint Research Ethics Committee, ref 09/H0716/18. The healthy volunteers and the patient gave written consent to their participation after receiving oral and written information as required and approved by the research ethics committee.

Equipment
Each volunteer was examined in a standard clinical radiology suite with a 3T MAGNETOM Trio TIM system whole-body MRI (Siemens Healthcare, Erlangen, Germany) equipped with the manufacturer's 32-channel RF head coil. The data were processed on a computer with 16 3.4 GHz AMD Opteron (tm) processor cores with 64 GB RAM and Debian Linux "squeeze" version 6.0.7.

FMRI of Passive Hand Motion
A passive motor paradigm with simple cueing was used, chosen for operational simplicity and applicability in motor functionally impaired or anaesthetized patients. An operator standing next to the scanner bed flexed the volunteer's upwards-facing right hand fingers toward the palm and then extended them back. Operators were trained to flex with a frequency of 1-1.5 s. Stimulation consisted of a passive hand motion block of 16 s duration followed by 16 s rest. The stimulation was repeated 10 times. The operator was visually cued to start and stop the movements in synchrony with the acquisitions but did not receive any other prompts such as prompts for the frequency of flexing.

Data Processing
The data sets were individually processed for each volunteer. The analysis pipeline was automated using a custom-written toolbox MRIST (MR Imaging and Spectroscopy Toolbox 1 ), which combines tools mainly from SPM12b 2 , FSL 5.0.1 3 , MATLAB 4 (R2012a, 1984-2012 The Math Works, Inc.) and BASH into a Debian Linux "squeeze" version 6.0.7 5 command line based data analysis pipeline. Pre-processing was performed with SPM12b and MATLAB after DICOM to NIfTI image file format conversion. Functional EP images were corrected for susceptibility-related distortion based on a voxel displacement map (in the phase encoding direction) determined from the B 0 fieldmap. Motion correction also included correction for static susceptibility-related distortions and correction for the interaction between head orientation and B 0 distortion, i.e., dynamic susceptibility-related distortion effects (Andersson et al., 2001;Hutton et al., 2004). Finally, the distortion corrected images were smoothed with an isotropic Gaussian kernel with a full width at half maximum (FWHM) of 4 mm. The smoothed images were used in the next step for both frequentist and Bayesian frameworks, to ensure equal pre-processing steps for both frameworks.
For the reference frequentist and Bayesian statistical analyses, the smoothed data were modeled using a general linear model (GLM) comprising two groups of regressors modeling different aspects of the in vivo situation. The first group modeled the task-related BOLD response. It was constructed by convolving the stimulus function (boxcar function describing the block design) with the canonical hemodynamic response function as implemented in SPM (Friston, 2007). To accommodate for variability in the hemodynamic response a regressor for the temporal derivative of the canonical response (Friston, 2007) was also added. The second group included the six rigid-body movement parameters estimated in the motion correction step to account for potential motion induced artifacts. Two t-contrasts were tested in the frequentist analysis. One tested for a significant positive BOLD response modeled by the task block indicator convolved with the hemodynamic response function (HRF). The second one tested for a negative response.
For the Bayesian analysis 6 additional Bayesian-specific SPM12 analysis parameters were selected: Unweighted Graph-Laplacian (UGL) for signal and noise priors comprising local adaptive smoothing, and an autoregressive model of order two for the model of serial correlation of the noise. The priors described in Penny et al. (2005) allow for the prior spatial smoothness to be different for each regression coefficient. This makes sense, as different experimental effects are likely to exist at different spatial scales. Moreover, because these smoothness parameters are estimated from the data (in an empirical Bayes optimization scheme) the final estimated parameters are in this sense rather insensitive to the prior. However, it is nonetheless assumed that the prior smoothness for each regression coefficient does not vary over space. One can relax this assumption by using a prior based on a Weighted Graph Laplacian (Harrison et al., 2007). This has the advantage of preserving edges in functional activation maps but the disadvantage of increased computational complexity and was not applied here.
In the followingN (x; m, C) denotes a multivariate normal distribution of random variable x having mean m and covariance C. The algorithm fits a GLM to fMRI data according to (1) 6 The mean processing time for the Bayesian analysis was 11 min with a standard deviation of 1:15 min.
where y i is the fMRI time series at voxel i, X the design matrix, β i are the unknown regression coefficients and e i is the error time series. The Bayesian algorithm then estimates a posterior distribution over regression coefficients as described in Penny et al. (2005), where β i is the posterior mean, i is the posterior covariance and Y denotes fMRI data over all voxels. Regression coefficients at a given voxel are (softly) constrained to be similar to those at nearby voxels. The strength of this constraint is determined by a spatial precision parameter that is estimated from the data. Different regression coefficients have different spatial precisions allowing each putative experimental effect to have its own spatial regularity. Contrasts are then used to test for specific effects where a i is the effect size at voxel i, and c is the contrast vector used to test for that effect. This gives rise to a posterior distribution over effect size where The effect sizes reported in this paper are expressed in percentage of signal change (task regressor in SPM) compared to the temporal mean (mean regressor in SPM) in each voxel. Thus an effect size of 1.5 is a 1.5% increase in local activity. We used a contrast to test for positive BOLD responses modeled by the task stimulus function. The probabilities of activation (PPMa), deactivation (PPMd), and non-activation (PPMn), are given by: where Ncdf(γ ; µ, σ ) denotes the cumulative density function for a univariate Gaussian with mean µ, standard deviation σ , evaluated at γ . Graphically illustrated, PPMa corresponds to the shaded area in Figure 1A, PPMd corresponds to the shaded area in 1b and PPMn to the shaded area in 1c. Subfigure 1d illustrates the case with low-confidence where none of PPMa, PPMd, and PPMn lead to significant classification of a voxel. In addition to the specification of an effect size threshold γ , the Bayesian PPM approach (Friston and Penny, 2003) also requires a threshold on the posterior probability itself. Voxels with posterior probability greater than this threshold will appear in the relevant PPM. We denote this threshold as p T . Once the voxel probabilities have been computed they can then be assigned to one of four categories as follows: voxels for which PPMa > P T are classified as activated, voxels for which PPMd > P T are classified as deactivated, voxels for which PPMn > P T are classified as non-activated, and voxels categorized as "low-confidence." If a voxel is "activated" we are confident the effect is positive; if it is "deactivated" we are confident it is negative; and if it is "nonactivated" we are confident it is around zero. If none of the above criteria are met the voxel is assigned to the "low-confidence" category. It is also convenient to express this probability in the form of a log odds ratio threshold or log Bayes Factor Threshold (LBT) where For example, p T = 0.95 corresponds to LBT = 3, and p T = 4.5 × 10 −5 corresponds to LBT = 10. A value of LBT = 10 is commonly used in Bayesian neuroimaging to determine relevant effects (Penny and Ridgway, 2013).

Effect Size Threshold (γ ) for the Bayesian Analysis
In order to estimate the above posterior probability maps, the Bayesian analysis requires a threshold for the effect size threshold γ (Figure 1). We defined γ as a certain percentage of the median BOLD amplitude of the voxels with the top 0.1% positive BOLD response of the whole brain, based on the rationale that maximal and minimal relevant BOLD responses are tightly coupled and that γ is the same throughout the brain. Since it is not known from the literature whether this is a reliable estimate of the minimal γ , or what percentage of the peak BOLD amplitude should be chosen, we tested and cross-calibrated the approach based on clinical best judgment. In current practice the best available clinical fMRI evaluation is based on frequentist statistics and expert judgment (FitzGerald et al., 1997;Rutten et al., 2002). Here, a clinically knowledgeable expert observer determines a p-value or t-value threshold for the SPM taking into account the clinical purpose, paradigm, localization, background noise, amplitude, and extent of activation, in order to achieve an optimally plausible estimate of the activated motor and somatosensory cortex.
In the present study two expert observers (TY, LM) with 19 and 7 years neuroradiological fMRI experience defined, in consensus, for each volunteer, two clinically plausible t-value thresholds α for the positive BOLD response of the primary motor cortex on the SPM t-map from the 3T pre-operative scanner. The first t-value threshold t loc was chosen to most sensitively indicate the activation loci (with the criteria that activations should be limited to only pre-and post-central gyri regions and not extend further) and the second t ex to visualize the extent of the activation area (defined as the maximum extent of plausible activation without apparent activation in the white matter). Both criteria were based on two concepts: first, identifying the primary motor cortex with anatomical landmarks (Yousry et al., 1997); second, using local established experience of (pragmatically) thresholding clinical fMRI for presurgical planning, as no published and generally accepted standard methods exist. For this purpose t-maps up-sampled to 1 mm 3 resolution using a 7th order spline interpolation method in SPM12b were displayed overlaid on an equivalent up-sampled anatomical scan obtained in the same scanning session having the same resolution. The resulting thresholds served in the following analyses as the reference for choosing two effect size thresholds required for the above described Bayesian analysis: activation loci γ loc and the extent γ ex . In the further processing steps only these areas in the motor cortex were considered and other activated areas were neglected, but not classified further, e.g., "not-active." For cross-calibration of the standard frequentist and the Bayesian approaches, we used a custom-written MATLAB script to estimate γ for the corresponding Bayesian PPM resulting in a best match between the SPM calibrated by the clinical expert observers and PPM using the Jaccard index (Jaccard, 1901) as the maximal overlap measure. The calibration was based on a binary reference map of the right hand motor cortex based on the expert observer selected thresholds: A cluster map with connectivity 7 18 was generated from each low resolution SPM, thresholded using one of the observer-determined values t loc , t ex . All clusters except for the right hand motor cortex cluster C Ref were erased manually from the map. For comparison the PPMs were clustered with the same algorithm, but without erasing any clusters. These clustered PPMs were voxel-wise compared with the respective C Ref To estimate the effect size with the best overlap the procedure was repeated with different effect sizes from γ = 0 to γ max where no activation was present (in the individual volunteer) using an upper threshold of LBT = 10 and an effect size increment of γ = 0.01. All Jaccard indices were plotted against γ and smoothed with a box kernel of eight. The γ with the highest Jaccard index was identified as the optimal effect size threshold yielding the best match of the SPM and PPM activation extents. The procedure was performed individually for t loc and t ex . We tested whether γ loc and γ ex could be predicted from the percentage of the top 0.1% positive BOLD responses by robust regression analysis. With the rationale that a region with no brain activation does not show a BOLD effect, and hence has an effect size of zero, an intercept-free bisquare-weighted robust regression analysis (Mohammadi et al., 2013) was implemented in MATLAB. We used the same model parameters for automatically determining the effect size thresholds for the data recorded on the intra-operative scanner since our method is based on the data itself and therefore was presumed generally applicable. For the regression as well as for the later data analysis, the brain was masked with a binary image derived from tissue segmentation of the anatomy to exclude unwanted apparent activation in non-brain regions.

Result Maps
The 3T frequentist result maps (SPMs) were thresholded based upon volunteer-specific t-values determined by the two clinical 7 MATLAB function "bwlabeln" with connectivity 18. experts for the 3T data. The Bayesian results (PPMs) are based on the effect size threshold provided by the linear regression detailed in the previous section, and using LBT = 10. To compare the 1.5T and 3T scanner data all statistical maps were registered and resliced to the AC-PC reoriented intra-operative scanner anatomical reference images (acquired in the second part "test-intra-operative 1.5T scanner") using 7th order spline interpolation to 1 mm 3 . SPMs were plotted for all volunteers with three orthogonal slices intersecting the center of mass of the t loc thresholded t-map from the frequentist analysis of the 3T scanner data located in the right hand motor area. For comparing results, the experienced observer-defined t-thresholds were converted into a familywise error corrected p-value, since this metric is more commonly reported in the literature. The Bayesian analysis labeled voxels as activated for a positive BOLD response, deactivated for a negative BOLD response, and nonactivated for no relevant changes in the BOLD signal. For easier volumetric comparison of the activated region we included the cluster volumes [mm 3 ] for frequentist analysis (CSF) and Bayesian analysis (CSB) in Figures 3, 4.

Quantitative Cluster Analysis
To support the visual assessment of the result maps, a quantitative cluster analysis of the activated motor cluster of the spatially up-sampled Bayesian maps was performed. The results were compared with the up-sampled frequentist maps as reference. We counted true positive (TP), true negative (TN), false positive (FP) and false negative (FN) voxels in the masked brain. We calculated the sensitivity = TP / (TP + FN) and false discovery rate (FDR) = FP / (FP + TP). The FDR (Benjamini and Hochberg, 1995) was chosen in preference to the specificity = TN / (TN + FP) as the large proportion of true negative voxels (TN >> TP) led to a specificity ∼1.

Test-Intra-Operative 1.5T Scanner
For purposes of comparison we measured the same volunteer group in a 1.5T intra-operative scanner with a dedicated surgery head coil. While this system opens up the possibility of intraoperative fMRI (ifMRI), coil and scanner specifications in this case provide further technical challenges, in particular a low signal-to-noise ratio (SNR) profile and nonlinear distortions due to the short gradient system. Currently the scanner is used to obtain pre-and intra-operative standard anatomical imaging to guide neurosurgery and the potential use of intra-operative fMRI has not been explored so far at our institution. Hence this study section was a pilot with the aim of testing our methods under instrumentally challenging, but otherwise physiologically optimal conditions (healthy subjects, no anesthesia, no craniotomy), as a first step to the future implementation of ifMRI.

Equipment
Each volunteer was examined in an intra-operative 1.5T MAGNETOM Espree TIM system whole-body MRI (Siemens Healthcare, Erlangen, Germany) located in a neurosurgical theater. This scanner was equipped with a dedicated 8-channel receive-only surgical head coil (NORAS MRI products GmbH, Hoechberg, Germany). For safety reasons, the head-holder pins of the 8-channel coil, designed for surgical (invasive) head fixation were replaced by a foam pad for these experiments. The scans were performed approximately concurrently with the 3T scans: For five volunteers the time interval between scans was 1 week; for the remaining volunteers the intervals were 0, 2, or 10 weeks.

Data Processing
The data pre-processing was similar to that used for the 3T data. An additional step was included to correct for the image distortions originating from the relative non-linearity of the imaging gradients resulting from engineering limitations of the short and wide intra-operative scanner magnet bore. This problem was exacerbated by the necessity to scan with an off-centered head position (as is routine practice in our iMRI practice) due to restrictions of the intra-operative headclamp set-up: the head (patient orientation: head first supine) was shifted 60 mm from isocentre along the anterior direction (positive y-axis of the scanner) out of the sphere of optimal gradient linearity (d∼120 mm). The non-linearity correction was applied to the functional and the structural NIfTI images as the first processing step, and to the voxel displacement map estimated using the B 0 fieldmap. The correction was implemented in MATLAB using the manufacturer's spherical harmonic description of the imaging gradient field non-linearity and a correction method published in Janke et al. (2004). The underlying image resampling was done with SPM12b using nonlinear three-dimensional image deformations and a 7th order spline-based interpolation.

FMRI Analysis and Result Maps
The smoothing and statistical processing with frequentist and Bayesian modeling were kept identical as for the 3T data (see first part "development-3T presurgical scanner"). The threshold process was adapted to cope with the limited SNR of the surgery head coil. An initial frequentist statistical analysis of the 1.5T functional data revealed that the activation extent was constrained to the center of the motor region in all volunteers. Thus a comparison of our Bayesian model with expert observer thresholds was not possible and a fixed familywise error corrected p-value of p(FWE) = 0.05 was chosen.
In contrast to frequentist statistics we could apply our proposed effect-size model for motor loci and motor extent, as the Bayesian analysis applies additional data-driven smoothing increasing the effective Signal-to-Noise.
We generated the presented high-resolution result maps with the same procedure as in the first part "Development-3T presurgical scanner."

Quantitative Cluster Analysis
To support the visual assessment of the result maps, a quantitative cluster analysis was performed in the same way as described in the part "development-3T presurgical scanner."

Clinical Scenario-Brain Tumor Patient
We tested our Bayesian analysis framework on data obtained from a 58y male patient with an oligodendroglioma in the left inferior precentral gyrus. The patient was suffering from speech disturbance, but had no loss of motor functions and was receiving no medication at the time of the study.

Data Processing, FMRI Analysis, and Result Maps
The data preprocessing and statistical modeling were identical to the first part "development-3T presurgical scanner." We used our effect size threshold model to generate Bayesian result maps for the motor localization (γ loc ) and the motor extent (γ ex ). To compare these results with a frequentist statistics reference, one clinical expert (LM) selected respective thresholds for motor localization (t loc ) and motor extent (t ex ) on the tmap from the frequentist statistics. Based on these references we estimated the best fitting effect size threshold as described in the first part "development-3T presurgical scanner" and compared these effect sizes with γ loc and γ ex estimated by our model.
For the results all fMRI result maps and the FLAIR images were registered to the AC-PC oriented structural data set and up-sampled to 1 mm 3 resolution using 7th order interpolation.

Estimating the Effect Size Threshold
The robust fit revealed a slope of 0.497 for the activation loci effect size γ loc using an intercept-free regression of the median of the top 0.1% positive BOLD response amplitudes against the expertdefined γ as independent variable (Figure 2A). The fit algorithm excluded one outlying point (circle at x = 5.2). All remaining points were weighted between 0.06 and 1.0. For activation extent effect size γ ex , the same robust fit revealed a slope of 0.144. No points were excluded. The weights were between 0.84 and 1.0 ( Figure 2B).

FMRI Result Maps
Results from the 3T scanner revealed activation in the left hemispheric motor and somatosensory cortex corresponding to the right hand in all volunteers in agreement with the literature (Lotze et al., 1999) for frequentist (Figure 3) and Bayesian analysis (Figure 4). Compared with the frequentist analysis the Bayesian analysis showed similar activation patterns (yellow/red) in seven of 10 volunteers using the automatically determined γ loc and similar patterns in all volunteers for γ ex . The somatosensory cortex was conspicuously visible using the γ ex for frequentist (Figure 3, right) and Bayesian analysis (Figure 4, right) and less prominent in the γ loc threshold results (Figure 3, left and  Figure 4 left). Volunteer 9 showed conspicuous deactivation on the hemisphere ipsilateral to the movement and volunteer 10 small deactivated clusters for the γ ex thresholded PPM. For those volunteers the clinical observers chose a low t-threshold leading to a high familywise error-corrected p-value above the accepted value of p = 0.05. Hence both data sets may be considered as outliers.
The non-activated areas (blue) revealed by the Bayesian analysis using γ loc (Figure 4, left) were pronounced and tightly encapsulated the activated areas in most of the volunteers except volunteers 9 and 10. Volunteer 2 showed a weaker and lesspronounced probability of non-activated brain regions. The extent of the non-activated areas based on γ ex were overall reduced (Figure 4, left) compared with the γ loc based results and pronounced only in three volunteers. Non-colored regions in the brain correspond to the low-confidence category (neither activated, deactivated or non-activated).

Quantitative Cluster Analysis
The quantitative cluster analysis reflected the results of visual comparison between frequentist and Bayesian results. Six out of Ten volunteers showed high sensitivity (Figure 5) for revealing the motor loci with the new Bayesian framework. For the remaining four volunteers, a lower sensitivity was observed in two volunteers (three and seven) due to the underestimation of the motor loci cluster size and a data quality problem in volunteers 9 and 10. The sensitivity for revealing the motor extent was higher than the sensitivity for revealing the motor loci in all volunteers. False discovery rates were low for all assessments.

FMRI Result Maps
Frequentist analysis results from the 1.5T intra-operative scanner revealed overall less prominent activation patterns (Figure 6) compared with the 3T data. Only volunteers 1, 4, 5 and 6 showed a somewhat pronounced activation response compared with other 1.5T maps. Most of the 1.5T Bayesian analysis maps (Figure 7) showed more widespread activation patterns than the frequentist maps, but these were still less conspicuous than the equivalent 3T results (Figures 3, 4); only volunteers one, four, and five showed a larger activation area in the frequentist and Bayesian maps. Overall the reduced cluster sizes CSF and CSB at 1.5T corroborated this visual assessment. For the activation  The left side shows the activity maps using the operator selected t-threshold revealing the central motor area location; on the right side the operator selected t-threshold approximating the motor area extent. The maps are labeled as activated=positive BOLD response, deactivated=negative BOLD response. The used familywise error (FWE) threshold is the FWE-value converted from the t-threshold estimated by the operators. The cluster size of the frequentist analysis (CSF) is displayed in number of voxels.
Frontiers in Neuroscience | www.frontiersin.org FIGURE 4 | Bayesian analysis for the 3T pre-operative scanner: Log Bayes factor maps showing the activation pattern and strength expressed by the voxel-wise log Bayes factor. The left side shows the activity maps using the effect size threshold γ loc revealing the central motor area localization; on the right side the effect size threshold γ ex revealing the motor area extent. The maps are labeled as activated=positive BOLD response, deactivated=negative BOLD response and non-activated = no changes in the BOLD contrast, non-colored = low-confidence, i.e., BOLD activation status cannot be determined based on data. The effect size threshold (γ ) calculated with the proposed linear model and the cluster size in voxels extracted from the Bayesian analysis (CSB) are displayed. Results from volunteers 9 and 10 are considered as outliers due to data quality problems.
Frontiers in Neuroscience | www.frontiersin.org The quantitative cluster analysis reveals sensitivity and false discovery rate (FDR) for the 3T Bayesian results. The activated motor clusters using the 3T frequentist result maps as reference.
loci base effect size γ loc a marked reduction in the probability for non-activated areas was observed in all images. Except for volunteers one, four, and six the majority of the voxels could not be classified as activated, deactivated, or not active and were marked as low-confidence. The extents of the non-activated areas based on γ ex were overall further reduced (Figure 6, right) to isolated small clusters in some volunteers. Compared to the 3T data, at 1.5T all frequentist and Bayesian maps provided less comprehensive information regarding the brain response to the task.

Quantitative Cluster Analysis
The quantitative cluster analysis (Figure 8) reflects the visual assessment between frequentist and Bayesian results. Bayesian and frequentist statistics showed generally lower sensitivity compared with the 3T data (Figure 5), but a higher sensitivity for Bayesian analysis compared with the 1.5T frequentist results. The results from volunteers 9 and 10 revealed a similar data quality problem as seen at 3T. False discovery rates were low to moderate.

Results: Tumor Case
Both threshold models for the Bayesian effect size showed prominent activation in the right hand motor cortex for motor localization and motor extent (Figure 9). The activated region was not infiltrated by MRI-visible tumor tissue (hyperintense region), but slightly dislocated posteriorly by the spaceoccupying lesion. Similarly to the healthy subjects' 3T results, the motor location threshold revealed non-activated areas. Deactivated regions were prominent with the motor extent threshold, but not close to the activated areas of the motor cortex. In the frequentist statistic both expert observer thresholds revealed similar activation in the right hand motor cortex and overall pronounced deactivated regions.   The quantitative cluster analysis reveals sensitivity and false discovery rate (FDR) for the 1.5T Bayesian results and frequentist results of the activated motor clusters using the 3T frequentist result maps as reference.

Discussion
We introduced a novel approach for calculating Bayesian statistical maps of task-related BOLD activity in pre-operative fMRI. The Bayesian analysis permits identification of brain areas that are explicitly activated, deactivated, non-activated and areas with low-confidence, unlike conventional frequentist statistical analysis that only identifies activated and deactivated brain areas. We developed a new automatic estimation method for the effect size threshold required for the Bayesian analysis. We tested the approach in a group of healthy, awake volunteers in a 3T pre-operative scanner.
We measured fMRI activation due to a passive movement paradigm. We calibrated and tested for self-consistency the Bayesian analysis by comparison with our local practice based on frequentist statistical inference incorporating expert clinical judgment. Since definition of an effect size threshold is a central requirement of Bayesian analysis, we devised a method for operator-independent determination of a clinically-meaningful effect size threshold based upon the acquired fMRI data for two clinical scenarios: activation loci and activation area extent. The method relates the top 0.1% positive BOLD effect amplitude excursions observed in the data to the clinically meaningful minimal BOLD response amplitudes as judged by expert observers. In this study the maximal and minimal BOLD response amplitudes were found to be related by a simple factor that depended on the clinical objective. If activation loci were targeted, the relative factor was 0.497; if the activation extent was of interest, the factor was 0.144.
We tested the Bayesian analysis approach and automated γ estimation for self-consistency using data from a conventional 3T pre-operative scanner, specifically by comparing frequentist and Bayesian analysis. Comparing Bayesian versus frequentist analysis results, similar activation patterns were found in seven of 10 volunteers for the activation loci and similar motor area extent for all volunteers.
The analysis showed the expected surgically-relevant motor loci, motor area extent and somatosensory brain areas, except for the data of volunteers 9 and 10 that were categorized as outliers. These data showed implausible small effect sizes and activations. The sensitivity was high for 6 out of 10 volunteers for the motor localization and further overall increased for the motor extent. The false discovery rate was generally low.
The choice for frequentist analysis as reference is based on the fact that clinicians are familiar with the commonly used frequentist statistics for fMRI analysis and therefore we resort to this method as we have established in-house expert knowledge. Furthermore, for the healthy subjects scanned, electro-cortical mapping as an alternative independent reference was not available, and the representative patient data was obtained in advance of surgery so electro-cortical mapping or confirmatory surgical outcome data were also unavailable.
The impact of additional smoothing on the quantitative analysis by the interpolation process was negligible because we used a 7th order spline interpolation with high SNR (Thévenaz et al., 2000). The analysis on the upscale maps was chosen since it reflects the surgical workflow in which the fMRI results are presented on a high resolution structural scan.
An alternative Bayesian approach for estimating nonactivated regions for surgical planning has been presented by Johnson et al. (2012). This alternative approach differs from our method in a number of ways. First, the method operates on Z-scores from a previous GLM analysis, rather than by direct Bayesian estimation of the parameters of a GLM as in our approach. Second, the prior distribution is based on a Potts model with Dirichlet process priors, rather than the spatial Gaussian priors in our approach. Third, the method has been applied to data from only a single subject, as compared to the 11 subjects in our study. Furthermore, the (implicit) effect size is estimated by the algorithm and not based on a calibration framework informed by clinical expert knowledge.
Our novel estimation of the minimal effect size is automatic and operator independent. It is calibrated with respect to the current expert-knowledge based clinical best judgment. As a caveat, care must be taken to exclude non-brain regions showing artefactual apparently high activation, e.g., in the orbits and veins, in applying this automated method (Turner, 2002), since these data will bias the effect size model. We overcame this using masks derived from brain tissue segmentation with SPM12b.
We postulate that our model is widely applicable to Bayesian fMRI analysis, although this requires validation in further studies using alternative paradigms, e.g., visual stimulation, active motor tasks or tactile stimulation. These studies will also address the potential circularity in the present study, which used the same data for both calibration and self-consistency test at 3T. In the worst-case scenario, the circularity may indicate validity of an incorrect effect size model, leading to under-or overestimation of the BOLD activation. We note that the effect size model was validated on the independent 1.5T dataset, avoiding FIGURE 9 | Bayesian analysis and frequentist analysis results of the patient data: The Bayesian analysis for the tumor patient data shows the activation pattern and strength expressed by the voxel-wise log Bayes factor (similarly labeled to Figure 4). The activated motor region matches the respective area in the frequentist statistics results.
circularity. We recommend further validation on independent datasets under different conditions and at different field strengths.
We understand further that the use of fMRI and frequentist analysis as the reference is a potential limitation. Future studies may compare the results with the gold standard intrasurgical electrocortical stimulation mapping (ESM).
Since the calibration reflects our workflow for presurgical planning, it might not be the calibration of choice for other sites. Nevertheless, the calibration is based on commonly used frequentist statistics and can be adapted to match different threshold procedures for presurgical fMRI at other sites. As an alternative to calibration, function arterial spin labeling techniques (Raoult et al., 2011;Vincent et al., 2013) or calibrated BOLD (Leontiev and Buxton, 2007) could be considered if available at the site and suitable for the patients.
We explored two strategies for thresholding fMRI maps using the motor extent and the motor loci. While the maps based on the motor extent calibration include small and potentially borderline activated areas, the motor loci threshold maps include mainly the central primary focus of the activation, potentially obscuring lower level activations. Thus the methods are expected to yield different, complementary maps. As a pragmatic solution for clinical interpretation we suggest the possibility of producing combined maps exploiting both thresholds.
The passive movement used here was designed to be readily performed by a trained operator, with no reliance on mechanical and electrical devices, to facilitate clinical and especially intraoperative implementation in anesthetized subjects. It has been shown recently that passive motion is equally reliable as an active finger task . However, a potential drawback of this method is that operator-dependent variations of passive stimulation frequency and amplitude could affect the neuronal and BOLD responses and hence alter the activation pattern. It is possible that more consistent results may be achieved using a metronome temporal cue for the operator action or simple mechanical devices enforcing consistent guided motion of the fingers. It is, however, possible fully device-guided finger motion may also lead to decreased BOLD activation, as recently shown for certain tactile stimuli (van der Zwaag et al., 2013).
The Bayesian analysis of the 1.5T intra-operative scanner data showed activation patterns that were less clear but similarly located compared with the 3T data. In contrast, frequentist analysis revealed almost no active regions for the 1.5T scanner data. The 1.5T intra-operative scanner results showed generally lower significance and less certain localization information than the 3T results.
The quantitative cluster analysis revealed an improved sensitivity for the 1.5T Bayesian analysis results compared with the 1.5T frequentist results. This could be explained by the improved spatial SNR of the data due to the additional local smoothing of the Bayesian algorithm. The higher FDR rates are prominent in subjects with small activated clusters, increasing the impact of FP in the denominator.
While the benefits of our proposed fMRI analysis methods for pre-operative guidance are conceptually straightforward, advanced use for intra-operative guidance is more contentious. Despite our results showing that it is possible to acquire fMRI with the surgery head coil, the benefit of ifMRI as a tool for surgical re-planning during a craniotomy remains to be determined. The benefit and a potential improved patient outcome needs to be assessed and compared with alternative methods such as electrocortical stimulation mapping (Berger and Rostomily, 1997). Future work at our center and elsewhere will explore ifMRI as an additional data source to our routinely performed intra-operative structural imaging to update neuronavigation as procedures progress.
We also note that the direct use of the 1.5T system PPMs (from ifMRI) without the PPMs from the 3T pre-operative scanner may be problematic, since the activity status of large brain regions could not be classified on the intra-operative scanner data. Hence, where possible it may be helpful to use pre-and intra-operative fMRI results in combination.
Several issues remain to be addressed before initial application of our method for ifMRI during neurosurgery, including the potential effects of anesthetic level on the BOLD signal, and other factors that may affect BOLD signal detection during craniotomy and dural incision, e.g., brain pulsation, air cavities, and blood coagulation (Gasser et al., 2005). In our study the effect-size model was developed in healthy awake volunteers; across the spectrum of neurological disease the effect size model may be inaccurate and require further validation. For example, the model could not account for hypothetical pathological nonlinear BOLD response changes, i.e., a disease condition affecting only the most highly activated voxels rather than causing a global scaling of the BOLD response. Thus pathology in the highly activated voxels would bias estimation of the clinicallymeaningful minimal effect-size threshold yielding inaccurate brain activity maps.
The 8-channel intra-surgery coil used here provides sufficient quality for structural images and is successfully used for routine iMRI neuro-navigation. However an improved-sensitivity coil design may be beneficial, since the multi-channel receive coil elements of our existing unit cover only the anterior and posterior cranial regions, with no coil elements located adjacent to the vertex near the motor cortex. Hence we presume that the regionally reduced coil sensitivity contributed in part to the lower activation significances and more extensive unclassified regions in the intra-operative scanner data.
We tested our novel approach for thresholding Bayesian statistical maps of task-related BOLD activity on a brain tumor patient. We acquired our passive movement paradigm on the 3T presurgical scanner and derived activation maps for the motor localization and motor extent using our Bayesian statistics framework. For the Bayesian result maps we used our novel threshold method. The frequentist statistics results were based on the expert observers thresholds. Bayesian and frequentist statistics revealed activation in the motor cortex for both clinical scenarios (targeting the extent or location of the motor cortex), as well as deactivation in various areas. Bayesian statistics revealed additionally non-activated and low-confidence areas.
In this clinical case the motor-strip BOLD activation was not expected to suffer from tumor-related abnormal vessel blood flow and lack of auto-regulation in the tumor, as the lesion is inferior to the hand knob. However, in future studies a perfusion map to exclude abnormal vascularization, not part of the current study protocol, might be helpful.
The deactivated areas are neither in the primary motor cortex nor in the tumor and are not directly adjacent to the activated area. Therefore we consider those areas to be correctly identified and not be paradoxically negative BOLD (Fujiwara et al., 2004). However in patients with tumors in eloquent brain areas this might apply and therefore activated and deactivated areas must be assessed in combination (Fujiwara et al., 2004).
We can further exclude influences on the BOLD response by medication, e.g., acetazolamide (Brown et al., 2003) and impaired attention. However, we can not exclude that the patient age (58 years) had an impact on the image intensity (Chen et al., 2008). To cope with potential abnormal hemodynamic response functions in tumors patients, dispersion derivatives (Friston et al., 1998) or more sophisticated approaches (Chaari et al., 2013;Woolrich et al., 2004b) could be used in future applications of our method.
As an alternative approach to passive fMRI or direct cortical stimulation, resting state fMRI has recently been proven to offer similar sensitivity and specificity without requiring the patient's active cooperation (Qiu et al., 2014). However the authors did not address whether this still pertains when a lesion affects the motor cortex, or how resting state fMRI performs under tumor-induced brain plasticity. Alternative, topological analysis (Due-Tonnessen et al., 2014) also cannot detect tumor induced brain plasticity, because only structural brain parameters are considered, although this method is reliable where the central sulcus is not, or is only moderately distorted by the lesion (Due-Tonnessen et al., 2014).
General application in patients may also pose other new challenges: cerebral blood flow and BOLD response may be affected by tumor pathology or other malformations. One approach to reduce the potential impact of physiological noise is advanced correction (Josephs et al., 1997;Hutton et al., 2011) incorporating physiological monitoring signals. Patient compliance may also differ significantly (claustrophobia, impaired attention, . . . ), although the passive motion paradigm is expected to be minimally affected by this.

Conclusion
We implemented, calibrated and tested for self-consistency a Bayesian fMRI analysis framework using passive motor task data from a conventional 3T MRI scanner and a 1.5T intraoperative MRI scanner. In comparison to frequentist analysis we demonstrated that Bayesian fMRI analysis yields identification of activated, deactivated and non-activated brain regions, important for guiding brain tissue resection avoiding functionally eloquent brain areas. Additionally, the ability to explicitly identify remaining regions, which could not be classified in the Bayesian analysis, is also important, as these represent the limits of the information available from the fMRI experiment. These maps will further empower surgeons to address the practical problem of ambiguity in fMRI results, which may be more pronounced in intra-operative MRI scanners due to generally lower static magnetic field strengths and RF coil placement restrictions.