Correcting for Superficial Bias in 7T Gradient Echo fMRI

The arrival of submillimeter ultra high-field fMRI makes it possible to compare activation profiles across cortical layers. However, the blood oxygenation level dependent (BOLD) signal measured by gradient echo (GE) fMRI is biased toward superficial layers of the cortex, which is a serious confound for laminar analysis. Several univariate and multivariate analysis methods have been proposed to correct this bias. We compare these methods using computational simulations of 7T fMRI data from regions of interest (ROI) during a visual attention paradigm. We also tested the methods on a pilot dataset of human 7T fMRI data. The simulations show that two methods–the ratio of ROI means across conditions and a novel application of Deming regression–offer the most robust correction for superficial bias. Deming regression has the additional advantage that it does not require that the conditions differ in their mean activation over voxels within an ROI. When applied to the pilot dataset, we observed strikingly different layer profiles when different attention metrics were used, but were unable to discern any differences in laminar attention across layers when Deming regression or ROI ratio was applied. Our simulations demonstrates that accurate correction of superficial bias is crucial to avoid drawing erroneous conclusions from laminar analyses of GE fMRI data, and this is affirmed by the results from our pilot 7T fMRI data.


INTRODUCTION
Different layers in the neocortex support different types of neural computations. For instance, different layers of the visual cortex are preferentially involved in feedforward vs. feedback connectivity (Rockland and Pandya, 1979;Rockland, 2017), suggesting that they encode distinct "bottom-up" and "top-down" processes. However, because the cortical ribbon is only 2-3 mm thick in the sensory cortices (vonEconomo, 1929), conventional fMRI acquisitions with 2-3 mm isotropic voxels cannot resolve these layers (Dumoulin et al., 2017). Thus, most previous investigations of laminar organization involved invasive measurements that are generally unsuitable for human in vivo studies (Takahashi et al., 2016;Van Kerkoerle et al., 2017).
Recent advancement in MRI scanners, however, has changed the landscape. With higher field strengths (e.g., 7T), and in turn higher signal-to-noise ratios, human scanners are able to acquire data at submillimeter resolution, and thereby offer layer-specific or "laminar fMRI." Recent 7T fMRI studies (Polimeni et al., 2010;Muckli et al., 2015;Kok et al., 2016;Lawrence et al., 2019) have suggested that top-down modulation of neural activity (for example, by attention or expectation) occurs in specific layers, though there is a disagreement about which layers. For example, Kok et al. (2016) used the Kanizsa triangle illusion to study attentional effects. By restricting their analysis to an regions of interest (ROI) that responded to the part of the visual field containing illusory contours but no actual stimulus input, they observed evidence of strongest top-down feedback effects in the activation of deep layers of visual cortex. Lawrence et al. (2019) utilized visual gratings and manipulated both bottom-up effects of visual contrast and topdown effects of spatial attention. While they found bottom-up effects on activation were strongest in the middle layer, their topdown effects were strongest in the superficial layers, contrary to Kok et al. (2016). Muckli et al. (2015) used partially occluded images, where part of a visual scene was replaced with a blank square. They again focused on an ROI that responded to the occluded region of the visual field, but rather than measuring the mean activation of that ROI, they attempted to decode the pattern of activity over all voxels within that ROI. While decoding of bottom-up information (in unoccluded images) was above chance and relatively constant across layers, they found that above-chance decoding of top-down information (in partially occluded images) only occurred in superficial layers. These discrepancies in layer selectivity could reflect a dissociation between superficial and deep layers in terms of the type of feedback they receive (depending on the precise paradigm), or could arise due to different analysis methods, which may result in different sensitivity to effects in particular layers. This highlights the importance of examining the assumptions behind the acquisition and analysis methods that researchers use.
In terms of acquisition, most laminar fMRI studies, including the ones cited in the above paragraph, have used gradient echo (GE) MRI sequences that are sensitive to the blood oxygenation level dependent (BOLD) signal (Olman and Yacoub, 2011), but alternative sequences are also increasingly popular (De Martino et al., 2013). The BOLD-GE sequences tend to provide good sensitivity to functional changes, but are also susceptible to signal artifacts, particularly from the large draining veins on the cortical surface (Boxerman et al., 1995;Rua et al., 2017).
An alternative class of sequences measure BOLD with spin echo (SE), or gradient and spin echo (GRASE) (Feinberg et al., 2015). These sequences are less prone to superficial bias but have lower functional contrast to noise ratios (Beckett et al., 2019) and are also limited by more stringent SAR constraints. A final class of sequences focuses on alternative metrics of neural activation. Cerebral blood volume (CBV) can be measured with vascular space occupancy (VASO) sequences (Lu et al., 2013;Huber et al., 2017b), and cerebral blood flow (CBF) can be measured using arterial spin labeling (ASL) sequences (Petcharunpaisan, 2010;Huber et al., 2017b;Kashyap et al., 2019). While these blood flowbased sequences are able to remove the spatial blurring due to draining veins, they come with drawbacks relative to GE-BOLD MRI. These drawbacks include the largest CBV changes being localized in the arteries and potential dilation retrogradely in the upper layers relative to the location of neuronal activation (Uludag and Blinder, 2018). These methods also tend to have less sensitivity as a trade-off for their higher specificity (Huber et al., 2017b). Finally, VASO and ASL generally have longer TRs than GE and so are generally restricted to a smaller field of view (Huber et al., 2017a).
These limitations of alternative sequences have resulted in continued popularity for BOLD-GE laminar fMRI (Kok et al., 2016;Lawrence et al., 2019;de Hollander et al., 2020;Liu et al., 2020). However, the main drawback of using BOLD-GE sequences is the presence of superficial bias, where larger signals are observed in the superficial layers relative to the deep layers. Specifically, we are referring to the mitigation of variations in signal across layers at high resolution, and not the initial bias that arises due to large pial veins at low resolution fMRI. There are two neurological bases for the presence of this superficial bias: the presence of intra-cortical ascending draining veins and variations in baseline physiological parameters across cortical depths. Draining veins carry deoxygenated blood from the deep layers toward the superficial layers, which results in leakage effects across all voxels, whereby the measured signal is a combination of the laminar signal and unwanted signals from all the layers beneath (Markuerkiaga et al., 2016;Havlicek and Uludag, 2020). Secondly, variations in baseline CBV and relaxation parameters such as T2 * across different cortical depths have a multiplicative effect on the signal (Kashyap et al., 2017) and thus, can also give rise to a superficial bias. The net effect is that BOLD-GE sequences have lower specificity and a bias toward higher signal in superficial layers, which complicates comparisons of task-related activation across layers.
There have been many different approaches to characterize and correct for this superficial bias, although few investigators have systematically explored whether different superficial biascorrection methods are expected to successfully correct bias under a particular model. In terms of bias models, Huber (2020) distinguishes three general classes of models: the linear-offset model, the multiplicative model and the leakage model. The multiplicative model seeks to emulate the effect of variations in baseline parameters across cortical depths, which has a multiplicative effect on the fMRI signal. Meanwhile, the leakage model attempts to capture the effect of draining veins on the fMRI data as it simulates the propagation of signal from the deep layers to the superficial layers, mimicking the effect of draining veins carrying oxygenated blood toward the superficial layers. In terms of methods that seek to correct bias in empirical studies, proposed solutions include Z-scoring timecourses (Lawrence et al., 2019), L2 normalization of beta estimates across conditions (Kay et al., 2019), taking the ratio of activations in two experimental conditions (Kashyap et al., 2017;Liu et al., 2020) and decoding using multi-voxel classification techniques (Muckli et al., 2015;de Hollander et al., 2020).
Here, we explore how seven of these superficial bias-correction methods perform under a simple multiplicative model applied to a visual attention paradigm. Specifically, we simulated data with a (non-linear) laminar profile, and compared how different superficial bias-correction methods perform in recovering these ground truth profiles in the context of a superficial bias nuisance effect. We also introduce a new application of Deming regression to this problem, and found that this method outperformed other commonly used approaches when conditions do not differ in regional-mean offset, while performing comparably to a region-mean ratio metric when conditions do differ. By introducing a method that provides robust correction for superficial bias, future studies will be able to benefit from the high sensitivity of BOLD-GE sequences for laminar fMRI.

MATERIALS AND METHODS
Since the simulations are matched to the design of the example dataset, we describe this experiment first. Processed simulation and fMRI data is available at https://github.com/MRC-CBU/ LaminafMRIsimulations/tree/1.1. Full simulation and analysis code is also available at the same repository. Note that the main fMRI task is carried out at 7T while the localizer tasks were conducted at 3T.

Experimental Design
We designed a block-based sustained attention paradigm. On each trial, participants performed a same-different judgment of two images that were presented at diagonally offset locations around a central fixation cross (Figure 1). We varied the attended location (positive diagonal vs. negative diagonal of a 2 × 2 stimulus array around the fixation cross) and stimulus category (face or house images) between blocks of 10 trials. Each run comprised 20 blocks in total: attending to houses along the positive diagonal (H 45 ), attending to houses along the negative diagonal (H 135 ), attending to faces along the positive diagonal (F 45 ) and attending to faces along the negative diagonal (F 135 ). The conditions were presented in a sequence that was randomized separately for each run.
Between runs, we also manipulated the presence of a second pair of distractor image sequences, which, when present, were located at the opposite location and drawn from the opposite stimulus category. For both simulation and our fMRI acquisition, there were a total of eight runs: four with distractors present (TaskD+) and four with distractors absent (TaskD−). For the fMRI acquisition, we alternated the presence or absence of distractors between runs. In the simulation, each run is detrended individually and hence, there is no overarching effect due to the order of runs.
We attribute any selectivity for category or location when the distractor is present (TaskD+) to attentional selection, because both locations and both categories are present on each block, which provides complete matching of the conditions at the level of bottom-up stimulus-driven responses. By contrast, location and category selectivity during the no distractor (TaskD−) context could also reflect stimulus-driven effects of the stimulated visual quadrants or the presented object category. Some of the superficial bias-correction methods we explore below are based on the notion that the no distractor (TaskD−) context provides a control condition that can be used to correct any superficial bias in the estimates for attentional selectivity when the distractor is present (TaskD+). We will expand on this idea below.

Stimulus and Procedure
All stimuli were created using Matlab (2009a, The MathWorks, Natwick, MA, United States) and presented in the scanner using Presentation (v17.2). For the main experiment, the category stimuli were presented in a circular patch at four locations, diagonally from the fixation cross at 45, 135, 225, and 315 • , respectively and spanning 0.16-2.42 • eccentricity. The fixation cross was shifted up from the center of the screen by 2 • visual angle due to visual obstruction of the lower segment of the screen by the head coil. There were a total of 20 face images and 20 house images for each category. All images were presented in grayscale and histogram-matched to equate luminance and root mean squared contrast. Thus, any category selectivity cannot be attributed to differences in brightness or contrast between the categories.
At the start of each block, two white dots (0.10 • eccentricity) appeared for 350 ms indicating the pair of patches [either 45 and 225 • (indicated by 45 ) or 135 and 315 • (indicated by 135 )] to which the participant should attend. This was followed by 550 ms of fixation. The stimuli then appeared for 950 ms, during which the participant was required to perform a same-different judgmenton the two attended stimuli, followed by 550 ms of fixation. The two stimuli were identical on 50% of trials. This stimulus-fixation trial was repeated 10 times within each block ( Figure 1A). Between blocks, there was a rest block of fixation with a duration of 1560 ms (1260 ms for the two participants with TR = 2440 ms, see section "Data Acquisition" for more details). The duration was chosen to ensure that the start of each block was in sync with the start of a volume acquisition. In the distractor-absent condition, the display consisted of two stimuli from the attended category in the attended locations, while in the distractor-present condition, two additional stimuli from the other category appeared in the non-attended locations ( Figure 1C).

Stimulus Design for Localizer (fMRI)
The localizer session comprised of four runs of a categoryselective localizer.
The category-selective localizer task comprised 15-s blocked presentations of sequences of faces, scenes, objects, scrambled objects and fixation, with 1 s fixation between blocks. Each of these five block types appeared in a random order in each run. There were 20 blocks per run (four presentations of each block type). Within each block, 25 randomly selected stimuli from the current category were presented consecutively for 800 ms each. Participants performed a 1-back matching task while fixating on a black dot in the middle of the screen.
We used the categorical localizer data to define the following ROIs according to conventional functional criteria (see below for details): occipital face area (OFA), fusiform face area (FFA), scene-selective transverse occipital sulcus (TOS), and parahippocampal place area (PPA).

Data Acquisition
Participants provided informed consent under a procedure approved by the institution's local ethics committee (Cambridge Psychology Research Ethics Committee). A total of six healthy participants (two females, age range 20-41, two participants were authors of this study) were scanned for this pilot study.
Prior to the experiment, participants underwent a separate session of behavioral training with eye-tracking using an SMI FIGURE 1 | Panel (A) shows the experimental paradigm for each trial block. An initial pair of white dots cued participants to attend to a specific diagonal at the beginning of each task block. Ten image pairs from one category (faces or houses) appeared sequentially along the attended diagonal. Panel (B) illustrates the four main stimulus block types. In the "TaskD+" distractor-present condition, image pairs from the other category appeared along the opposite diagonal; in the "TaskD-" distractor-absent condition, no stimuli appeared in the opposite diagonal (C). Participants decided whether the two stimuli on the attended diagonal were "same" or "different" (D); 50% of trials involved the same stimuli. The purple dotted circle indicates the attended regions and was not present for the participant.
Frontiers in Neuroscience | www.frontiersin.org high speed eye tracker. The participant attempted the same task and received feedback on their fixation levels after each run. We recorded calibrated eye position during each block, and measured fixation stability as the difference in standard deviation along the attended and neglected axes over the block. We repeated the task until this fixation stability metric met a criterion level of under 0.5 • visual angle mean difference for two consecutive runs. As eye-tracking was not available in the 7T scanner, the behavioral training was important to ensure that participants were able to perform the task while maintaining fixation.
Participants contributed data over a total of two MRI sessions; one session of retinotopic and category localizers at 3T and one main experimental session at 7T. The 3T data were acquired on a Siemens 3T Prisma-Fit scanner using a standard 32-channel head coil, while the 7T data were acquired on a Siemens 7T Terra scanner using the Nova Medical 1Tx/32Rx head coil. At the start of each session, we also acquired a MPRAGE (3T) or MP2RAGE (7T) (Marques et al., 2010) structural that was used for co-registration across sessions. Participants maintained fixation on a cross in the middle of the screen throughout, and fixation accuracy was verified using the 50 Hz SMI MRI eye tracker system at 3T.

MRI Data Pre-processing
For the functional volumes acquired on the 3T scanner, the images underwent slice time correction and rigid body realignment using SPM12 1 . For the volumes acquired on the 7T scanner, the images first underwent slice time correction using SPM12, and then TOPUP (Andersson et al., 2003) in FSL v6.0 (Niazy et al., 2004) was applied to estimate the susceptibility-induced distortions, using the first five volumes of each experimental run plus five additional volumes acquired before each run with the reverse phase-encoding direction. The resultant distortion correction was applied to the entire run. Each fMRI volume was then individually realigned to the structural using boundary-based registration (BBR) (Greve and Fischl, 2009). In doing so, we use the structural as a reference to ensure 1 www.fil.ion.ucl.ac.uk/spm that the volumes are co-registered with each other, which we previously showed to provide better realignment within and across runs for high-resolution fMRI data compared to standard methods that separate motion correction and co-registration (Huang et al., 2020).
Of note, some previous work has attempted to correct for superficial bias in this stage via the exclusion of voxels with low tSNR and high t-values on a fMRI statistical map (Olman et al., 2007;Jia et al., 2020;Zamboni et al., 2020). We attempted to exclude voxels with the bottom 30% tSNR in our analysis. However, this did not change our findings substantially (see Figure 7 and Supplementary Figure 1) and hence, was not included in the main analysis to reduce the number of potential confounds.

Regions of Interest
For the categorical ROIs, activation t-maps where obtained using SPM12 by fitting a GLM to the fMRI data from the categorical localizer runs. The activation maps were corrected for multiple comparisons using Gaussian random field theory in SPM12. The face-selective areas (FFA and OFA) were obtained from the t-score map from subtracting the object conditions from the face conditions. Similarly, the scene-selective areas (TOS and PPA) were obtained from the t-score map from subtracting the object conditions from the scene conditions. For each ROI, we identified the peak voxel in the expected anatomical location in a statistical map thresholded at p < 0.05 (uncorrected). We then grew the complete ROI by adding the contiguous voxel that was most selective for the localizing contrast iteratively until the ROI comprised 100 voxels (for implementation) 2 .
To improve the stability of the estimates and our sensitivity to any variations in the data across layers, we combined the TOS, PPA, OFA, and FFA regions to generate a pooled category-selective ROI. This approach is further justified in section "Laminar Analysis of Real 7T Data." Note that we flipped the sign of the contrast vector to align with the region's expected preference (e.g., F 45 +F 135 -H 45 -H 135 for FFA and H 45 +H 135 -F 45 -F 135 for PPA). All subsequent analyses concern this pooled category-selective ROI unless specifically noted otherwise.
We co-registered the 3T localizer session to the main experiment 7T session. This was done by first co-registering the 3T functional images to the 3T structural image using the SPM coreg function. The 3T structural image was then coregistered to the 7T structural image, again using the SPM coreg function. The transformations from both co-registration steps were then applied to the ROI mask images. No further transformation was necessary since the BBR realignment process realigns the functional 7T images to the 7T structural image. Note that the registration of the 3T localizer data to the 7T fMRI session results in an upsampling of the voxels from 3 mm isotropic resolution to 0.8 mm isotropic resolution to be congruent with the 7T fMRI data. This also results in the number of voxels in each ROI to increase from 100 voxels to an average of 2500 voxels.

Cortical-Depth Definition
The gray matter-white matter (GM-WM) and gray mattercerebrospinal fluid (GM-CSF) boundaries were obtained from the FreeSurfer's (version 6.0.0) reconstruction of each participant's 7T structural image (Fischl et al., 2002). These boundaries were visually inspected to ensure that the segmentation was accurate. In cases of poor segmentation, the realignment between the structural and the FreeSurfer segmentation template was manually adjusted prior to repeating the FreeSurfer reconstruction. The boundaries were exported to CBStools (version 3.0.8) (Bazin et al., 2012) and used to generate three equivolume (Leprince et al., 2015) segmentations of the GM. Each GM voxel was then assigned to one of the three layers using a winner-takes-all approach, in which the voxel is assigned to the layer with which it has the largest overlap.

Computational Simulations
Our computational simulations were carried out in Matlab (2019a, The MathWorks, Natwick, MA, United States); the code is available on github 3 .
For clarity, we describe our simulations in terms of categorical attentional selectivity for a face-selective area such as the FFA. However, the model is a general account for how fMRI responses arise as a function of neural modulations of interest and layer bias of no interest, so is applicable to other scenarios.
We model the effects of attention on neuronal responses, and how such modulations manifest in fMRI activations. In our model, the signal component in N = 2500 fMRI voxels is expressed as a sum over neuronal populations that are purely selective for houses or for faces. Neurons fire at rate f = 1 when their preferred category is present in the stimulus display, and f = 0 to the presence of the non-preferred category. We generate the number of neurons n that prefer each stimulus category for each voxel by sampling a halfnormal distribution, and control category selectivity by varying the standard deviation of the distribution over categories. At the population level, for an ROI that prefers faces, we set the standard deviation of that distribution to 1.1 for faces and 0.5 for houses; for an un-selective ROI (see section "Simulating the Effects of No Region-Mean Preference"), we equate the standard deviations to 0.5 for faces and houses. To simulate inter-subject variability, the specific standard deviation used for each subject was sampled from a truncated Gaussian with mean equal to the above population values and standard deviation of 0.25 and values restricted between 0 and double that of the population value. Although selectivity at the voxel level in this formulation arises from differences in the relative frequency of face and house neurons, one could also interpret these standard deviation parameters as controlling the firing rates of equally sized populations.
In our model, attention operates as a scale factor a that increases the responses of neurons coding the attended category, while leaving the unattended category unaffected. We simulate layer differences in the strength of attentional modulation by varying this a parameter as a function of the layer l. The signal component of the response R at voxel v across all categories c can be expressed as where l(v) indicates the (dominant) layer captured by voxel v. We modeled the neural response R(v) for the duration of each block, based on the timings of conditions in our experimental paradigm, and convolved these responses with a canonical hemodynamic response function to generate a timeseries for each voxel, h(v,t).
We added two sources of noise: A physiological noise source, which has a low-dimensional structure over voxels and which scales with layer depth, and a white thermal noise source, which is constant in magnitude over the simulated volume. The rationale for the physiological noise is to model known lowdimensional fMRI noise processes such as heartbeat, respiration and residual head motion (Liu, 2016), which voxels exhibit to different extents. Formally, the physiological noise, E p , was modeled by generating 20 independent Gaussian noise vectors for the entire timecourse and projecting a randomly weighted combination of the vectors onto each voxel (Huang et al., 2018). This physiological noise was scaled to match a random zero-mean Gaussian distribution with standard deviation of σ p . The resulting timeseries was then multiplied by a scalar L Bias to capture the bias toward superficial layers. Lastly, we added thermal noise, E t , from Rician distribution (Gudbjartsson and Patz, 1995) with standard deviation of σ t , which is independent of superficial bias. This was done for all 2500 voxels to generate a simulated timecourse, y(v,t), for voxel v in layer l. Thus: Having generated synthetic BOLD timeseries for each voxel, one can estimate the response to each condition by applying the general linear model (GLM), as is standard in fMRI analysis. To simulate the high-pass filtering commonly performed on real data to remove low-frequency noise in fMRI, first order sinusoidal and linear detrending was also performed. The GLM is based on the block timings and a canonical HRF, and produces a parameter estimate β for each condition and each voxel. The final contrast estimate (for a face-selective ROI) for the TaskD+ and TaskD− contrasts is then the subtraction of β for blocks when faces were unattended from that when blocks were attended.
To calculate the attentional contrast for our TaskD+ condition (when both positive and negative diagonals contain faces and houses, and the category being attended is equally often faces or houses), we take the simulated fMRI response during blocks where the participant is attending to the faces and subtract blocks where the participant is attending to the houses. The response for the TaskD+ contrast is therefore: where f = face, h = house, while in our TaskD− contrast, where faces or houses are only presented on the attended diagonal, the response is: We explored a wide range of physiological and thermal noise parameters, and the final values of σ p = 11 and σ t = 15 were chosen by visually comparing scatterplots of TaskD+ against TaskD− for our simulated dataset against the experimental fMRI dataset for the pooled ROI (Figure 2). Furthermore, the combination of noise parameters and standard deviation values for the half-normal distributions were chosen such that the distribution of t-statistic for the simulated contrast was similar to the experimental dataset (see Supplementary Figure 2). Similarly, an attentional modulation factor of a = 3 was used to reflect the estimate generated from the experimental fMRI dataset. To simulate weaker attentional modulation in the middle layer, an attentional modulation factor of a = 2 was used for the middle layers. Note that while the exact parameter values were coarsely estimated, the results of our simulation hold across a wide range of parameters (Supplementary Figure 3) and are thus not restricted to the specific values used in this simulation. The seven attentional modulation metrics were evaluated using simulated datasets with a V-shaped attentional modulation profile across layers of a l = 1...3 = [323]. This profile indicates stronger attentional modulation in the superficial and deep layers, as might be expected from stronger feedback connectivity in these layers (Rockland and Pandya, 1979;Rockland, 2017). We repeated the simulation 10,000 times to calculate the accuracy and reliability of the seven attentional modulation metrics.

Attentional Modulation Metrics
We were interested in validating the accuracy and reliability of the following seven metrics of attentional modulation, which were applied to both the simulated and real data. The metrics were analyzed in two ways. Firstly, the variation of each metric across layers was plotted and compared to the "ground truth" attentional modulation graph. Secondly, for each metric, we quantified the contributions of the superficial bias and attentional modulation toward the laminar profile. This was achieved by correlating the simulated data with either a [1 0 −1] contrast vector (for superficial bias) or [0.5 −1 0.5] (for attentional modulation) to determine the contributions of the two factors. The ideal laminar profile correlates with attentional modulation and not with superficial bias.

Ratio Metrics
The first group of attentional modulation metrics can be classified as ratio metrics; these metrics take the ratio between two experimental conditions (in this case, TaskD+ and TaskD−) to remove the superficial bias across layers. This method is motivated by the assumption that the baseline parameters, including baseline blood volume, oxygen extraction fraction, T2 * relaxation time and echo time, have a multiplicative effect on BOLD sensitivity, as suggested in Kashyap et al. (2017) and conforms to the single vascular compartment BOLD signal models (such as in Buxton et al., 2004). Thus, we can formulate the BOLD signal change as δS = L(l) × R, where L is a function of the baseline physiological parameters that influence the BOLD response and R is the actual change in CBV and concentration of deoxygenated hemoglobin in response to neural activity. In our simulation, L bias (l) is used to model L(l). We assume that L(l) is constant within each cortical layer but varies across cortical layers, while R is the quantity of interest. Thus, by taking a ratio of the signal changes for two contrast estimates for each voxel, we can remove the dependence of the contrast on the baseline parameters: Note that while the attentional modulation, a, is defined to be greater than 1, the ratio metric of TaskD+/TaskD− in our simulation and experiment should always be less than 1. We refer to this ratio metric of TaskD+/TaskD− as selectivity, S, to differentiate it from the attentional modulation. The selectivity looks at the magnitude of top-down attentional modulation in TaskD+ as a fraction of both top-down attentional modulation and bottom-up stimulus activation in TaskD−. If we assume that our model is accurate, the relationship between selectivity and attentional modulation is:

Ratio of individual voxels (voxel ratio)
Given the above argument, one possible approach is to calculate the ratio of two values the GLM contrast for the TaskD+ condition relative to that for the TaskD− condition, for each voxel separately, and then average these ratios across voxels in a layer: We refer to this metric as the "voxel ratio" throughout. However, this method has a high sensitivity to voxels where the TaskD− contrast is close to zero (e.g., because no attentional effect, or owing to random noise or fMRI susceptibility artifacts), which can produce extreme outliers. Assuming both TaskD+ and TaskD− contrasts follow a Gaussian distribution, the resultant voxel ratio would have a heavy-tailed distribution, on which statistical tests are difficult. As such, we do not expect the voxel ratio to generate stable estimates but include it for completeness.

Ratio of entire ROI (ROI ratio)
To reduce the effect of outlier voxel values, one can first average (or sum) responses across voxels in a layer, before taking the ratio of the averages in each condition (cf. Liu et al., 2020): We refer to this metric as the "ROI ratio" for simplicity. A similar approach was adopted by Kashyap et al. (2017), where they compared the average of the activation peak against the average post-stimulus undershoot. However, by taking the average prior to the ratio, this method discards any pattern FIGURE 2 | Scatterplots of the voxel responses (contrast estimates from the GLM) to TaskD+ against TaskD-comparing the simulated data with a = 3 (A) against the real 7T data (B). The scatterplots look similar, suggesting that the simulation is capturing the behavior of real voxels. Note that the methods we evaluate are invariant to the absolute scale of the response, so these do not require matching; rather, we are only interested in the relation of the values across task types, i.e., the gradient of the best fit line (indicated by the red line), which is comparable across simulated (0.65) and real (0.72) data.
information that might be present within the voxels (we return to this point below).

Deming regression
Here, we chose not to average across voxels, but to regress one condition against the other condition and estimate the gradient of the best fit line. Importantly, we used Deming regression to estimate this gradient, rather than ordinary least squares, because Deming regression accounts for errors in both variables (Adcock, 1878). We theorize that this gradient would provide a stable estimate of the selectivity. A key benefit of Deming regression is that it does not require a regional-mean activation level difference between the two conditions, unlike the ROI ratio approach above. This makes the method potentially attractive for cases where voxels within an ROI have strong but opposing selectivity, as might for instance be expected in retinotopic maps of the visual field. In our study, we assume the errors in both variables have similar variances (i.e., δ = 1) as data for both TaskD− and TaskD+ were acquired during the same scanner session and in an interleaved manner across runs. Thus, we can assume that the variance of the noise is similar across TaskD+ and TaskD−. Note that this special case of Deming regression can also be referred to as Orthogonal regression.

Z-Scoring of Data
Rather than taking ratios or gradients of the estimated responses (i.e., contrasts of GLM Betas), another approach is to normalize the original data (y(v,t) in Eq. 2) before fitting the GLM, for example by using Z-scoring (Lawrence et al., 2019), in an attempt to equate the scaling across layers. Here, after Z-scoring the data and fitting the GLM, we simply calculated the contrast estimate for the TaskD+ condition. Z-scoring assumes that the noise and the signal of interest have a linear correlation. However, given that this is not the case in our simulation (physiological noise scales with the signal but thermal noise does not), we expect Z-scoring to perform worse than the ratio approaches.

L2 Normalization
The last method included in our analysis is L2 normalization, as suggested in Kay et al. (2019). Similar to Z-scoring, this method seeks to eliminate the multiplicative effect of superficial bias by normalizing the response. However, the normalization (scaling factor) is estimated across parameter estimates (from a GLM fit) rather than across scans (i.e., the original timeseries, as in Z-scoring). The scaling factor (the square root of the sum of squared values) is also relative to zero, which depends on the baseline in the GLM, rather than being relative to the mean, as in Z-scoring.

Multivoxel Pattern Analysis
The methods described above that first average over voxels within an ROI are only able to estimate a layer's univariate activation. An alternative approach is to examine the multivariate information across voxels, e.g., by comparing the ability to classify whether faces or houses were attended. This was the approach taken by Muckli et al. (2015). The logic underlying this approach is that cross-validation performance will be unaffected by superficial bias as long as the effect of interest and the noise scale together with overall signal magnitude. Like Z-scoring above, any additive (thermal) noise components will invalidate this assumption and reintroduce sensitivity to layer depth.
Here, we compared the ability of two methods of classification, support vector machines (SVMs) classification and linear discriminant contrast (LDC) to discern between attending to houses vs. faces in the TaskD+ condition. SVM is a robust method that has been widely used in fMRI analysis (Hoeft et al., 2011;Meier et al., 2012;Weygandt et al., 2012;Abdulkadir et al., 2013) and have also been used previously for laminar analysis (Muckli et al., 2015). LDC is often preferable for fMRI decoding since it is a continuous measure that does not exhibit ceiling effects (Zhu et al., 2008;Misaki et al., 2010;Yoon et al., 2012;Kriegeskorte and Diedrichsen, 2016;Huang et al., 2018). Furthermore, unlike SVM, it does not require multiple instances for effective training, and therefore avoids issues to do with small numbers and temporal autocorrelation of single-trial/block estimates from the same run. Previous work demonstrated that LDC outperforms SVM in terms of detecting a difference in discriminability (Huang et al., 2018) and in terms of overall reproducibility of pairwise discriminants over splits in the data (Walther et al., 2016). Both SVM and LDC used leave-one-out cross-validation across the four runs of each task.

SVM
We used the "fitcsvm" classifier in the Matlab Bioinformatics toolbox, with default settings (specifically, a linear kernel with C = 1). To obtain multiple patterns for training, each block within a run was modeled with a separate regressor in a new GLM, producing 60 block estimates for training and 20 for testing on each fold of leave-one-run-out cross-validation. The final result is mean classification accuracy (% of test blocks correct).

LDC
Linear discriminant contrast is a continuous statistic related to Fisher's linear discriminant. First, the training data is used to generate a set of weights to maximize the distance between the conditions of interest (e.g., houses vs. faces). This set of weights is referred to as the discriminant. The LDC quantifies the difference between the two conditions in the testing data, measured on this discriminant.
As LDC does not require multiple training patterns, all blocks of the same type (within the training set) were modeled with a single regressor in the GLM. The contrast between attending to faces vs. houses generates a distance metric, which is normalized using the sparse covariance matrix of the noise residuals (Ledoit and Wolf, 2003) to produce a weights vector. The dot product of the weights vector with the pairwise contrast estimate from the held-out test run produces the LDC test statistic. The final result is the average LDC across the four folds of leave-one-out cross validation runs and normalized by the square root of the number of voxels (which differed by ROI in the real data).

Comparison of the Seven Attentional Modulation Metrics
For our main simulations, we simulated an attentional modulation that is strongest in the superficial and deep layers and weaker in the middle layer. The simulated fMRI responses were then measured according to the seven different metrics and compared against the ground truth profile to verify their accuracy and precision. Selectivity estimates using Deming regression, Voxel ratio and ROI ratio were plotted on the same axis as they generate commensurate estimates; the remaining three metrics were plotted on individual axes and scaled to best match the ground truth profile. In addition, we also plotted the raw contrast estimates and "ground truth" attentional modulation as a baseline comparison. The "ground truth" attentional modulation ( Figure 3A) is obtained from the attentional modulation [a(c,l(v))] that is used as a model input while the raw contrast estimates (Figure 3B) is given by R D + (v) in Eq. 4 and reflects the response obtained by fitting a GLM to the simulated data and contrasting the blocks, identical to what is done to the real fMRI data in section "Laminar Analysis of Real 7T Data." As can be seen in Figure 3, both Deming regression and ROI ratio were able to replicate the attentional modulation profile with high precision: The variability across iterations was low, as the percentile error bars illustrate. Though the voxel ratio recovered the V-shaped profile, it had very large error bars, owing to extremely high values for some voxels with a denominator (TaskD−) close to zero.
Z-scoring the data before fitting the GLM produced a slanted V profile, indicating contributions from superficial bias as well as attentional modulation. L2 normalization performed a bit better, being able to retain the V shaped profile, but still showing a significant difference between the deep and superficial layers due to superficial bias. Furthermore, the L2 normalization is also substantially noisier than both Deming and ROI ratio metrics.
The SVM and LDC multivoxel classification methods produced similar "slanted V" profiles, with residual effects of superficial bias. The LDC method exhibited particularly dramatic superficial bias, with strong discriminant values for the superficial layer where the response magnitude was greatest.

Quantifying the Relative Contributions of Superficial Bias and Attentional Modulation to Laminar Profile
We correlated the laminar profile of each metric (normalized by their mean) with either a [1 0 −1] or a [0.5 −1 0.5] vector to obtain a summary measure of the contributions of superficial bias and attentional modulation across layers. This was done by multiplying the estimates of each layer with the corresponding vector element and summing the result. These summary measures indicate the reliability of any apparent layer differences and provide a means to compare the magnitude of attentional modulation and superficial bias effects within each metric.
As shown in Figure 4, both Deming regression and ROI ratio were highly correlated with the ground-truth attentional modulation effect, with almost zero contribution of superficial bias. Both metrics also exhibited low variability over iterations, as indicated by the 25-percentile error bars. By contrast, the voxel ratio was highly variable. While relatively stable over iterations, the Z-scoring metric was substantially influenced by the nuisance superficial bias effect. This superficial bias was less pronounced after L2 normalization, but still present. Similar effects were observed for the SVM metric. The LDC metric had the lowest variability over iterations, but exhibited the strongest influence of superficial bias out of all evaluated metrics.
The SVM and LDC metrics exhibited distinct effects even though both are based on linear discriminants. We reasoned that this could reflect compressive effects of close-to-ceiling performance for the SVM (Figure 3F). This could obscure a superficial bias effect since the highest accuracy is in the  superficial layer, where the bias is strongest. As a continuous metric, LDC does not exhibit ceiling effect. This can be demonstrated by repeating the simulation with increased noise levels (such as σ p = 20 and σ t = 30, Supplementary Figure 4), under which the profiles for the SVM and LDC metrics were comparable, confirming that the apparent differences between the SVM and LDC metrics in Figures 3, 4 reflect an SVM ceiling effect.

Simulating the Effects of No Region-Mean Preference
Both the Deming regression and ROI Ratio showed similar results in our initial simulations. However, Deming regression is potentially sensitive to local variations within an ROI, even if there is no global preference of that ROI. To illustrate this, we repeated the simulation with the same underlying [3 2 3] attentional modulation ( Figure 5A) but removed the global preference for houses by sampling the density of house responsive cells and face responsive cells from the same distribution (folded normal distribution with mean 0 and standard deviation 0.7). Thus, each voxel has an equal chance of preferring faces or houses, with a resultant nearzero global preference for either house or faces (on average across simulations). As Figure 5B shows, there was no distinguishable variation in the measured response. However, Figure 5C shows that only Deming regression was able to recover the V-shape with sufficient sensitivity (i.e., relative to the error bars).
It is interesting to note that, while the ROI ratio (like the Voxel ratio) is extremely noisy when there is no global preference, there is still a hint of a V-shape in the means in Figure 5C. We suspect this occurs because, while the overall preference averages to zero across all simulations, by chance, most individual simulations still have some global preference for either faces or houses. These small, but persistent, global preferences would still be sufficient for a ratio metric to recover the attentional modulation to some extent. However, as the percentile error bars illustrate, these ratio metrics are likely to be too variable to be useful when global preferences are near zero. This result demonstrates that Deming regression is the only method that can robustly detect layer specific modulations even in the absence of regionaverage differences.

Laminar Analysis of Real 7T Data
Initial laminar analysis of the 7T data showed strong selectivity in the superficial layers with a constant decrease toward deeper layers, consistent across all ROIs. In Figure 6, we plotted the contrast estimates for the TaskD+ condition for the ROIs to illustrate the consistency of the superficial bias profile across the ROIs. For the scene-selective TOS and PPA, the contrast estimate is defined as taking the house attention condition and subtracting the face attention condition, while for the faceselective OFA and FFA, the contrast estimate is defined as taking the face attention condition and subtracting the house attention condition. This definition is chosen to align with the category preference of the ROI so as to generate a positive contrast for all ROIs. This reduction in selectivity with layer depth is consistent with previous evidence for a superficial bias in GE sequences (Polimeni et al., 2010;De Martino et al., 2013). In order to obtain maximally robust estimators for the following analyses, we pooled the voxels from TOS, PPA, OFA, and FFA into a single pooled "category-selective" ROI. This step can be justified by the considerable variability across the six participants in the context of relatively higher similarity of selectivity profiles across the four ROIs. The pooled ROI exhibited substantially reduced between-participant variability (Figure 7A), while preserving the superficial bias profile we observed in the individual ROIs. In the following analyses, we focus on this pooled ROI, since we had no hypotheses concerning regional specificity of these effects. Furthermore, repeating the analysis on individual ROIs (Supplementary Figure 4) yielded similar, but substantially noisier, results as compared to the pooled analysis (Figure 7).

Laminar Analysis of 7T Data Using Attentional Modulation Metrics
The attentional modulation of the voxels as estimated with Deming regression was constant across layers ( Figure 7B). Considered together with the simulation results, this indicates that the increased selectivity in superficial layers that we observed in Figure 6 can be explained by GE superficial bias. The ROI ratio metric demonstrated a similar lack of variation across layers, and comparable inter-subject variability. The voxel ratio was highly unstable across participants (see inset panel for full data range). The remaining four measures (Z-scoring, L2 normalization, SVM classification and LDC) exhibited stronger effect in superficial layers (Figures 7C-F), consistent with the sensitivity to superficial bias that we observed in our simulations.
Taken together, our results suggest that the superficial bias, as shown by the decrease in measured response across layers (Figure 7A), is successfully removed by both Deming regression and ROI ratio metrics (Figure 7B), giving rise to the flat selectivity profiles. Other methods were unsuccessful in removing the superficial bias.

DISCUSSION
A key challenge for interpreting layer analysis of high-resolution GE EPI fMRI data is that signal magnitude decreases with layer depth. This study investigates methods for correcting such superficial bias effects. We used computational simulations to evaluate the ability of seven different metrics to recover an attentional modulation layer effect of interest in the presence of a superficial bias nuisance effect. Two of the evaluated methods were proposed by us (Deming regression, LDC) and the remaining have been previously used in the literature (Voxel and ROI Ratio, Z-scoring, L2 normalization, and SVM). Only the ROI Ratio and Deming Regression metrics were able to recover the attentional modulation layer effect accurately and precisely. While the remaining metrics were able to detect some underlying differences in attentional modulation across layers, they were either noisier than the aforementioned two metrics (in the case of the Voxel Ratio) or retained a substantial component of residual superficial bias (in the case of Z-scoring and L2 normalization, or the multivoxel methods of SVM and LDC). Such partial correction for layer bias is particularly concerning because it might lead to mistaken inferences. Thus, the main contribution of our study is to demonstrate that Deming regression and ROI ratio are the most promising metrics for layer analysis of GE fMRI data.
Although the Deming regression and ROI ratio metrics performed similarly in the context of a region-mean activationlevel difference between the conditions, we found that Deming Regression was better able to extract layer-profiles when we simulated a scenario where activation-level differences were near zero. This property may be useful in early visual areas such as V1, where an ROI might not show a global preference for two orientations of a visual stimulus, even though voxels within that ROI often show a local preference for one or other orientation (Kamitani and Tong, 2005;Alink et al., 2013). Conversely, our simulations suggest that the ROI ratio metric produces slightly less variable estimates when ROIs do exhibit strong activationlevel differences. In summary, we recommend Deming regression as a general solution for correcting superficial bias in GE fMRI, although studies focused exclusively on effects carried by the regional mean may realize a small improvement by adopting the ROI ratio metric instead.
The substantial residual superficial bias in the Z-scoring, L2 normalization, SVM and LDC metrics can be explained by the characteristics of fMRI noise, which likely exhibit components that are both additive and multiplicative with respect to layer depth in GE fMRI data. For instance, Z-scoring assumes a linear relationship between the noise and the contrast of interest, and normalizes the contrast by dividing it by an estimate of noise from the variance in the data. However, since our model includes both thermal (laminar invariant) and physiological (laminar dependent) noise sources, the variance in the data is not a perfect reflection of the superficial bias. Thus, Z-scoring is unable to fully correct the superficial bias in our simulations. Similarly, L2 normalization assumes that averaging the voxel response across time can accurately capture the superficial bias, an assumption that would only hold true without additive noise. Finally, multivariate pattern analysis methods (SVM and LDC) fail to account for the superficial bias in the data because relatively greater contrast to noise ratios (due to superficial bias) leads to more robust contrast and better classification/LDC values. These metrics are only expected to correct layer bias successfully if all noise components scale with layer depth; a noise model that would be implausible for high-resolution fMRI where substantial additive thermal noise is inevitable.
Interestingly, LDC and SVM showed different sensitivities to superficial bias vs. attentional modulation in the simulations (Figure 4). As we have observed previously (Huang et al., 2018), LDC results had smaller variability across iterations, reflecting greater stability in the estimates. However, LDC also demonstrated higher sensitivity to the laminar bias. We believe that this is primarily caused by the performance of the SVM classifier being close to 100% in the superficial layer. This ceiling effect results in a non-linear relationship between the functional activation and classification accuracy and partially masks the superficial bias. By contrast, LDC is expected to scale linearly with response magnitude (Arbuckle et al., 2019), which in this context means it is better able to capture the full extent of the superficial bias. A repeated analysis with higher noise levels confirmed that SVM and LDC metrics performed more similarly when the SVM classifier's performance was brought down from ceiling levels.
Our numerical simulations (see text footnote 3) may be useful to test yet other approaches to correcting for superficial bias. Nonetheless, the code makes several simplifying assumptions for ease of calculation and generalization, which we list here for clarity. Firstly, we assumed that neurons are purely responsive to faces or houses only, and that the voxel BOLD response is a simple sum of populations of face-and house-cells within the voxel. Secondly, we assumed that attention modulates responsive cells only by applying a gain factor and there is no constant additive component of attention. Thirdly, superficial bias was modeled as a gain factor on responses. Fourthly, we only modeled two sources of noise: at the level of the true hemodynamic response and at the level of measurement of that response, with only the former noise term being scaled by the laminar bias. Moreover, while we introduced some spatial covariance in this noise across voxels (which could reflect, for example, effects of head motion), we did not explore the degree of this spatial covariance, nor explore effects of the temporal autocorrelation known to exist in fMRI data (e.g., from physiological noise sources). While it is not obvious that changes in these assumptions would affect the current conclusions, this may be worth exploring in future simulations. Note also that we set the parameters of the simulation to approximately match the ratio of activations in the two conditions from the voxels in real 7T data. Supplementary Figure 3 shows that the same pattern of results emerges across a range of values for the signal and noise within voxels. One might also wonder whether the results depend on the number of voxels, e.g., for smaller ROIs. When we explored a range 2500 down to 100 voxels, we did not observe any differences in the relative performance of the metrics (only the expected increase in overall variance for all metrics). Thus while other parameter settings could produce different results, we think that it is unlikely that the other four metrics would outperform ROI ratio and Deming regression within the range of realistic parameters.
A final key limitation of our numerical simulations is that they are based on a multiplicative scaling model of superficial bias (Kashyap et al., 2017;Huber, 2020). While we believe this is more plausible than the linear-offset model discussed by Huber in the above reference, there are more sophisticated models that consider leakage effects (Markuerkiaga et al., 2016;Havlicek and Uludag, 2020). Based on our anatomical understanding of the basis for superficial bias, it is likely that a more complex model consisting of both multiplicative effects (variations in BOLD response parameters across layers) and leakage effects (presence of draining veins across layers) is needed to fully capture the intricacies of the superficial bias. However, the implementation of such a model, specifically the leakage effects, would require making specific assumptions of the vascular architecture that are likely to vary across different ROIs, and are not universally applicable. Moreover, modeling the leakage effects requires substantially more parameters to be defined, such as the radius of ascending veins, volume of capillaries, baseline intravascular and extravascular relaxation rate and their signal ratio, relative amounts of CBV in the microvasculature and ascending veins, among many others. These parameters either have to be obtained experimentally through additional measurements, or assumed using a priori knowledge of the vascular structure and associated parameters. In contrast, the multiplicative model, and in turn the ratio and Deming Regression methods, relies on fewer assumptions, and is likely to generalize across a variety of brain regions. This simple multiplicative model can serve as a preliminary filter: methods that fail to account for the multiplicative effect of superficial bias is unlikely to perform well in a more complex model with both multiplicative and leakage effects, even if they successfully account for leakage effects.
While more complicated models and methods may be necessary to extract laminar profile for responses of a single condition vs. baseline, we suspect that when the interest is in the difference between two or more conditions, taking an ROI ratio or estimating the slope of a Deming Regression is a simpler and effective way of removing the scaling component of the superficial bias effect. However it must be kept in mind that leakage effects might not be corrected by these methods as they are not simulated in the model. This limitation is particularly pressing when differences between layers are not found, since this can potentially reflect leakage.
We also applied the seven correction methods to the GE-fMRI 7T data acquired on the same paradigm. The uncorrected data showed the characteristic increase in BOLD signal toward superficial layers (Kok et al., 2016;de Hollander et al., 2020). This superficial profile remained after Z-scoring, L2 normalization and applying SVM or LDC, while the Voxel Ratio produced an extremely variable estimate, suggestive of greatest modulation in the middle layer. However, the Deming Regression or ROI Ratio metrics produced a largely flat profile across layers, with low variability across participants compared to the other metrics. These results are broadly consistent with the numerical simulations, and suggest that apparent modulations in selectivity across layers in the initial regional-mean contrasts can be explained in terms of superficial bias in the GE-fMRI signal, rather than a difference in attentional modulation as such. However, the small sample size of the initial study reported here prevents us from drawing strong inferences about the magnitude of layer-specific effects of attention in the population.
Our failure to observe a layer-specific effect of attention contrasts with other findings using similar manipulations of "top-down" processes (Muckli et al., 2015;Kok et al., 2016;Lawrence et al., 2019). These studies are quite heterogeneous in experimental design and have yet to be independently replicated, so completely consistent findings may not be expected. However, we note that some of these results were obtained using metrics that our numerical simulations suggest do not successfully correct superficial bias, which may provide a further explanation for any discrepancies. In particular, we believe that inadequate correction for superficial bias may explain the superficial profile reported by Lawrence et al. (2019) for spatial attention with a Z-scoring metric, and the superficial profile reported by Muckli et al. (2015) for SVM classification of visual content outside the voxelwise receptive field. Thus, it would be difficult to draw any conclusions as to the presence of any neuronal response differences across layers based on those results. By contrast, Kok et al. (2016) reported relatively stronger selectivity for illusory contours in deep layers, a result that runs opposite to the expected direction of a superficial bias effect. Such a pattern is not expected by superficial bias-instead the selectivity estimates in Kok et al. (2016) would likely be stronger after correction for superficial bias. In general, we urge caution in interpreting GE fMRI layer analyses where superficial bias has not been appropriately corrected.
For completeness, we would also like to highlight that this is a rapidly growing field and newer methodologies, such as phase regression (Stanley et al., 2021), are constantly emerging to address the problem of superficial bias in GE fMRI. In their paper, Stanley et al. (2021) utilized the phase differences to estimate the contribution from vessels, and were able to generate similar laminar profiles to those observed using SE fMRI.

CONCLUSION
In this study, we demonstrate that Deming regression and ROI ratio can adjust for superficial bias across cortical layers. This is important because there is a growing wealth of high resolution, GE-fMRI data that can be used for laminar analysis, but a lack of consensus on the optimal way to analyze these data, which could partially explain the variance in conclusions drawn from these data. By proposing robust yet simple analysis methods to remove the superficial bias, such as the use of Deming Regression, we hope these will help reconcile results across different studies, and allow researchers to probe deeper into the function of neuronal activity in different cortical layers.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://github.com/ MRC-CBU/LaminafMRIsimulations/releases/tag/1.2.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Cambridge Psychology Research Ethics Committee. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the relevant individuals for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
PH, MC, RH, and JC designed the experiments and analyzed the data. CR and CTR set up the acquisition protocols and assisted with data acquisition. PH wrote the manuscript. All authors reviewed the manuscript.

FUNDING
This research was funded by the National Institute for Health Research (NIHR) Biomedical Research Centre (BRC). The BRC is a partnership between Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge, funded by the NIHR. CTR is funded by the Wellcome Trust and the Royal Society (098436/Z/12/B). MC and JC were supported by MRC Intramural grant SUAG/058 G101400. RH was supported by MRC Programme Grant SUAG/046 G101400.

ACKNOWLEDGMENTS
The authors are grateful to all the radiographers at MRC Cognition and Brain Sciences Unit, University of Cambridge and the Wolfson Brain Imaging Centre, University of Cambridge for their assistance in performing the scans and to the volunteers for participating in the experiment. The authors would also like to thank Kendrick N. Kay for helpful comments on a previous version of the manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.

2021.715549/full#supplementary-material
Supplementary Figure 1 | Plots of the simulated measured response (A) and attentional modulation metrics (Deming regression, Voxel ratio, ROI ratio, Z-scoring, L2 norm, SVM classification, and LDC, B-F) across different layers where voxels with the bottom 30% tSNR were excluded. These bars represent the median of all six participants, with each set of joint circles represent an individual subject. The same color represents the same participant throughout all plots. The axis range of panel (B) has been restricted due to extreme outliers for the voxel ratio. Note that these results are similar to that of Figure 7, suggesting that the exclusion of low tSNR voxels does not change the findings.
Supplementary Figure 2 | Comparison of mean (A) and standard deviation (B) of t-statistic of the voxels in the ROI for real and simulated data. The error bars indicate the standard deviation of the summary metrics of the voxels across participants/iterations. Figure 3 | Plots of the simulated response and attentional modulation metrics (Deming regression, Voxel ratio, ROI ratio, Z-scoring, L2 norm, SVM classification, and LDC) across different layers for different simulation parameters [reduced signal (A), increased signal (B), reduced noise (C), and increased noise (D)]. The general trend is similar to that of the parameters used in the main paper (which were σ face = 1.1, σ house = 0.5, σ phsiological = 11, and σ thermal = 15). Key differences are that Deming regression becomes noisier with low signal or high noise while L2 normalization is able to remove the majority of superficial bias at high SNR. . These bars represent the median of all six participants, with each set of joint circles represent an individual subject. The same color represents the same participant throughout all plots. Note that the results are similar to that of Figure 7, except with increase variance between participants. These results justify the pooling of the ROIs to reduce the impact of noise.