What kind of noise is brain noise: anomalous scaling behavior of the resting brain activity fluctuations

The study of spontaneous fluctuations of brain activity, often referred as brain noise, is getting increasing attention in functional magnetic resonance imaging (fMRI) studies. Despite important efforts, much of the statistical properties of such fluctuations remain largely unknown. This work scrutinizes these fluctuations looking at specific statistical properties which are relevant to clarify its dynamical origins. Here, three statistical features which clearly differentiate brain data from naive expectations for random processes are uncovered: First, the variance of the fMRI mean signal as a function of the number of averaged voxels remains constant across a wide range of observed clusters sizes. Second, the anomalous behavior of the variance is originated by bursts of synchronized activity across regions, regardless of their widely different sizes. Finally, the correlation length (i.e., the length at which the correlation strength between two regions vanishes) as well as mutual information diverges with the cluster's size considered, such that arbitrarily large clusters exhibit the same collective dynamics than smaller ones. These three properties are known to be exclusive of complex systems exhibiting critical dynamics, where the spatio-temporal dynamics show these peculiar type of fluctuations. Thus, these findings are fully consistent with previous reports of brain critical dynamics, and are relevant for the interpretation of the role of fluctuations and variability in brain function in health and disease.


INTRODUCTION
It is now recognized that important information can be extracted from the brain spontaneous activity, as exposed by recent analysis (Biswal et al., 1995;Fox and Raichle, 2007;Smith et al., 2009). For instance, the structure and location of large-scale brain networks can be derived from the interaction of cortical regions during rest which closely match the same regions responding to a wide variety of different activation conditions Smith et al., 2009). These so-called resting state networks (RSNs) can be reliably computed from the fluctuations of the blood oxygenated level dependent signal (BOLD) signals of the resting brain, with great consistency across subjects (Xiong et al., 1999;Cordes et al., 2000;Beckmann et al., 2005) even during sleep (Fukunaga et al., 2006) or anesthesia (Vincent et al., 2007).
In the same direction, the information content of the brain BOLD signal's variability per se is receiving increasing interest. Recently (Garret et al., 2010) it was shown in a group of subjects of different age, that the BOLD signal variability (standard deviation) is a better predictor of the subject age than the average. Furthermore, additional work focused on the relation between the fMRI signal variability and a task performance, concluded that faster and more consistent performers exhibit significantly higher brain variability across tasks than the poorer performing subjects (Garrett et al., 2011). Overall, these results suggest that the understanding of the brain resting dynamics can benefit from a detailed study of the BOLD variability per se.
In this work we characterize the statistical properties of the spontaneous BOLD fluctuations and discuss its possible dynamical mechanisms. The paper is organized as follow: in the next section the origin of the data is described as well the pre-processing of the signal. The definitions of regions of interest is described as well as how to construct subsets of different sizes, needed to compute fluctuations. The results section starts with the analysis of the average spontaneous fluctuations for each RSN, which identify anomalous scaling of the variance as a function of the number of elements. Next, this anomaly is explored to determine its origins by studying in detail the temporal correlations in clusters of different sizes. Finally the analysis of the correlation length is described, revealing a distinctive divergence with the size of the cluster considered. The paper close with a discussion of the relevance of the uncovered anomalous scaling for the current views of large scale brain dynamics. For clarity of presentation, the calculations that are not central to the main message of the paper, are presented separately in an Appendix.
whole-body scanner with echo-planar imaging capability and the standard radio-frequency head coil. Subjects were scanned following a typical brain resting state protocol  lying in the scanner and asked to keep their mind blank, eyes closed, and avoid falling asleep. All participants gave written informed consent to procedures approved by the IRB Committee of the University of Islas Baleares (Mallorca, Spain) who approved the study.

IMAGE PRE-PROCESSING AND ANALYSIS
In each subject, 240 BOLD images, spaced by 2.5 s, were obtained from 64 × 64 × 49 voxels of dimension 3.4375 mm × 3.4375 mm × 3 mm. Pre-processing was performed using FMRIB Expert Analysis Tool [FEAT, Jezzard et al., 2001, http:// www.fmrib.ox.ac.uk/fsl], involving motion correction using MCFLIRT; slice-timing correction using Fourier-space timeseries phase-shifting; non-brain removal using BET; spatial smoothing using a Gaussian kernel of full-width-half-maximum 5 mm. Brain Images were normalized to standard space with the MNI 152 (average brain image at Montreal Neurological Institute) template using FLIRT (http://www.fmrib.ox.ac.uk/ analysis/research/flirt) and resampled to 2 × 2 × 2 mm resolution. Data was band pass filtered (0.01 Hz-0.1 Hz) using a zero lag filter to avoid scanner drift and high frequency artifacts.

CHOICE OF REGIONS OF INTEREST
It is known that the brain activity fluctuations at rest exhibit large-scale spatial correlations. The presence of these robust correlations is reflected on the coherent activity which determine the spatial domains of the RSN. Therefore, our analysis is focussed on the statistical analysis of the RSN fluctuations. At least since Beckmann et al. (2005). Probabilistic Principal Component Analysis (PICA) is used to identify the eight most relevant RSN. Each component corresponds to a characteristic time series, and its respective spatial Z-score map. Under a Gaussian/Gamma mixture model these Z-maps were thresholded in order to find the locations of the voxels which significantly contribute to each of the eight time-courses [see Figure 6 in Beckmann et al. (2005)] and used to define the clusters here. This is illustrated in Figure 1A, where the depicted regions correspond to the territory covered by each of the RSN extracted in Beckmann et al. (2005) using ICA techniques. For each independent component Z-map we arbitrarily set a threshold that segment the map into isolated regions of different sizes (see Figure 1B). The criteria to select regions is arbitrary, but the present results are independent of the selection criteria, as long as the regions belong to the same RSN. Alternatively, functional areas (such as Brodmann areas) can be used to define clusters of different sizes (as for a portion of the results in Figure 3). We proceed by using a spatial mask for each of the eight networks to extract the time series of the BOLD fMRI time series. The masks, in Figure 1, correspond to the eight most important RSN, namely the visual medial (box a) and lateral (b) cortical areas, the auditory (c), sensory motor (d), default mode (e), executive control (f), and the frontoparietal right (g) and left (h) regions. Each network is comprised by a variable number of spatial clusters, each cluster composed by a variable number of voxels. For instance the visual RSN (VIS) includes just three relatively large clusters, each one composed by thousand of voxels, contrasting with the Fronto-Parietal Left (FPL) network which involves seven clusters with sizes ranging from a few up to thousands of voxels. Table 1 shows the thresholds used in each independent component and how many regions have been defined. The results presented in this paper are independent of the particular value of threshold used.

RESULTS
To analyze the noise properties, we look at the behavior of the variance and correlations under various manipulations of the size of the ensemble of voxels where these fluctuations occurs. This is a common strategy in other statistical physics problems where very distinctive scaling behavior can be observed depending of the type of fluctuations the system is able to exhibit (Stanley, 1987).

ANOMALOUS SCALING OF THE VARIANCE
We start by studying the fluctuations of the BOLD signal around its mean. The signal of interest, for the 35 RSN clusters, is defined as where x i represents the position of the voxel i that belongs to the cluster H of size N H . These signals will be used to study the correlation properties of the activity in each cluster. The mean activity of each h cluster is defined as and its variance is defined as where B = 1 T T t=1 B(t) and T the number of temporal points. Please notice that the average subtracted in Equation 1 is the mean at time t (computed over N voxels) of the BOLD signals, not to be confused with the BOLD signal averaged over T temporal points.
Since the BOLD signal fluctuate widely and the number N of voxels in the clusters can be very large, one might expect that the aggregate of Equation 1 obeys the law of the large numbers. If this was true, the variance of the mean field σ 2 B(t) in Equation 3 would decrease with N as N −1 . In other words one would expect a smaller amplitude fluctuation for the average BOLD signal recorded in clusters [i.e., B(t)] comprised by large number of voxels compared with smaller clusters. However, the data in Figure 2 shows otherwise, the variance of the average activity remains approximately constant over a change of four orders of magnitude in cluster' sizes. The strong departure from the N −1 decay is enough to disregard further statistical testing. Nevertheless, we test a null hypothesis recomputing the variance for artificially constructed clusters having similar number of voxels but   composed of the randomly reordered B k (t) BOLD raw time series (panels in Figure 2A denoted "Shuffled"). As expected, in this case the variance (plotted using squares symbols in Figure 2B) obeys the N −1 law (dashed line in Figure 2B). The variance of the average BOLD signal is directly proportional to the coordination between the voxels involved. In particular, under the hypothesis that the BOLD signal of voxel k, B k (t), is a stationary stochastic process (indexed by time t) with E(B k (t)) = μ k , and Var(B k (t)) = σ 2 k , the variance of the average signal is maximum in the case where there exist perfect coordination (i.e., all BOLD signals are perfectly synchronized). In this last case, the value of ) as a function of N.
The inset of Figure 3 (circles) shows that this maximum value does not depend on N, i.e., the mean value of the variance of the BOLD signal from a region does not depend on its size. Now we ask how far from its maximum value is the observed variance of the BOLD average signal. In particular, we compute the quotient for this purpose. As it is shown in Figure 3 (empty circles) the value of q decreases rather slowly with the size of the cluster. In order to distinguish how much of the constancy of the variance demonstrated up until now is related with the fact that the time series belong to clusters that are independent components (Beckmann et al., 2005) we repeated the scaling analysis using clusters defined by the Brodmann areas. The results in Figure 3 confirm the same anomalous scaling behavior demonstrated for the regions selected from the RSN networks, as shown by the values ofσ 2 B(t) and q for the Brodmann areas (filled triangles). As before, we control the expected 1/N scaling for independent time series by computing the quotient q for clusters of various sizes constructed from a random selection of voxels. This is shown by the filled square points in Figure 3.

TEMPORAL FLUCTUATIONS AND SPATIAL CORRELATIONS
For spatio-temporal signals the relationship between the temporal fluctuations of the average signal and its space correlation function is well defined (Ross, 1996). In our case, for the normalized (see Appendix) BOLD signals, Z i (t) (Var(Z k (t)) = 1 and E(Z k (t)) = 0), the relationship is: Where C is the mean spatial correlation, ρ i,j the correlation between voxels i and j, and σ 2 is the variance of the average signal (defined in Equation 3). Equation 6 shows that the variance of the mean activity depends on the size of the region, and on C , which is determined by the shape of the correlation function, C(r) (see Appendix for a formal discussion).
Equation 6 suggest that it can be productive to investigate the correlations properties of the BOLD data. The point to clarify is whether the average spatial correlation C is constant throughout the entire recordings or, alternatively, its average value is the product of a combination of some instances of high spatial coordination intermixed with moments of dis-coordination. The relevance of this distinction, which will be further discussed latter, is to establish up to which point correlations are dictated by the structural (i.e., fixed) connectivity or by the dynamics. In order to answer this question we study the mean correlation ( C ) as a function of time for regions of interest of various sizes. In particular, we compute this value using Equation 7 but estimating ρ i,j for non-overlapping periods of 10 temporal points. Figure 4 shows the behavior of C over time for four different cluster's sizes. Notice that, in all cases, there instances of large correlation followed by moments of week coordination, as those indicated by the arrows in the uppermost panel. We have verified that this behavior is not sensitive to the choice of the length of the window in which C is computed (see the Appendix). These bursts keep the variance of the correlations almost constant (i.e., in this example, there is a minor decrease in variance (by a factor of 0.4) for a huge increase in size (by a factor of 170). This peculiar behavior of the correlation is observed for any of the cluster sizes as shown in the bottom panel of Figure 4 where the variance of C is approximately constant, despite the four order of magnitude increase in sizes.
The results of these calculations implies that independently of how large the size of the cluster considered, there is always an instance in which a large percentage of voxels are highly coherent and another instance in which each voxels activity is relatively independent.
A very metaphorical way to visualize the behavior of the correlations is to think of the patterns of spontaneous activity as "clouds" of relatively higher activity moving slowly throughout the brain's cortex. Thus, the moments of large coordination shown in Figure 4 correspond to the passage of a "cloud" throughout the entire region, regardless of how large the region is.

DIVERGENCE OF THE CORRELATION LENGTH
The results in the previous paragraphs indicate that the anomalous scaling of the variance can be related to dynamical changes in the correlations. A straightforward approach to understand the correlation behavior commonly used in large collective systems (Cavagna et al., 2010) is to determine the correlation length at various system's sizes. The correlation length is the average distance at which the correlations of the fluctuations around the mean crosses zero. It describes how far one has to move to observe any two points in a system behaving independently of each other. Notice that, by definition, the computation of the correlation length is done over the fluctuations around the mean, and not over the raw BOLD signals, otherwise global correlations may produce a single spurious correlation length value commensurate with the brain size. Thus, we start by computing for each voxel BOLD time series their fluctuations around the mean of the cluster that they belong. Recall the expression in Equation 1: where B is the BOLD time series at a given voxel and x i represents the position of the voxel i that belongs to the cluster H of size N H . By definition the mean of the BOLD fluctuations of each cluster vanishes, Next we compute the average correlation function of the BOLD fluctuations between all pairs of voxels in the cluster considered, which are separated by a distance r: 10) where u is a unitary vector, and . w represent averages over w. The typical form we observe for C(r) is shown in the top panel of Figure 5. The first striking feature to note is the absence of a unique C(r) for all clusters. Nevertheless, they are qualitatively similar, being at short distances close to unity, to decay as r increases, and then becoming negative for longer voxel-to-voxel distances. Such behavior indicates that within each and any cluster, on the average, the fluctuations around the mean are strongly positive at short distance and strongly anti-correlated at larger distances, whereas there is no range of distance for which the correlation vanishes.
The most notorious result is the fact that correlations decay with distance slower in larger clusters than in relatively smaller clusters, giving rise to the family of curves shown in Figure 5 (top panel). This is condensed in the calculation of the correlation length ξ, which is the zero of the correlation function, C(r = ξ) = 0 (as in the example shown by the arrow in Figure 5, top). The correlation length diverges with the size of the cluster, as demonstrated in the middle panel of Figure 5. This divergence extends up to the size of the brain, as shown by the ξ values (red squares in middle panel of Figure 5) computed for the eight unpartitioned RSN. Note that while the existence of a zero crossing in C is warranted by the subtraction of the mean cluster activity (in Equation 8), its divergence with cluster size is not.

MUTUAL INFORMATION
Although the present observations can be appropriately described solely in terms of correlations, the same concept can be also casted in terms of information measures, which are often used to estimate the degree of coherence between regions or neural structures. The mutual information between any two X and Y time series from different brain voxels is defined as: where H(X) is the entropy of X and H(X|Y ) is the entropy of X given Y computed as usual (Press et al., 1988). In principle, given the behavior observed for the correlations, these information measures should exhibit scale-invariant scaling as well. This is confirmed by the results in Figure 6, which demonstrate that the average mutual information is not affected by the size of the cluster considered, since information decays slower in larger clusters. This analysis shows that, as was shown for the

DISCUSSION
In this work, key statistical properties of the brain BOLD signal variability were investigated. The results are relevant to the understanding of the brain spontaneous activity fluctuations in health and disease. The three most relevant findings that we may discuss are: • the variance of the average BOLD fluctuations computed from ensembles of widely different sizes remains constant, (i.e., anomalous scaling); • the analysis of short-term correlations reveals bursts of high coherence between arbitrarily far apart voxels indicating that the variance' anomalous scaling has a dynamical (and not structural) origin; • the correlation length measured at different regions increases with region's size, as well as its mutual information.
Concerning the constant variance of the BOLD activity, the present results imply that the usual framework in which the BOLD signal and noise are discussed need to be reconsidered. For instance, it is commonplace to consider that the non-coherent part of the activity (i.e., the noise) can be averaged out by enlarging the spatial (i.e., more voxels) or temporal (i.e., more samples) scale. The presence of anomalous scaling implies that signal and noise in the brain are at least ill defined and that filtering by averaging (to improve its quality) signals with anomalous variance, by definition, can be anomalous as well. The anomalous scaling also has implication for the monitoring of the RSNs activity, a topic that has received wide attention recently for its potential to track healthy or pathological conditions. The results here imply that, under these anomalous conditions, the signal of a few voxels can be, asymptotically, as representative and informative as the average of the entire RSN. It need to be noted, that the anomalous scaling discussed here due to the emergence of collective dynamics is not new, Kaneko (1990) demonstrated the breach of law of large numbers in numerical models more than two decades ago. The second finding, showing that the observed dynamical short-term changes in the correlations drives up the variance, is relevant for the interpretation of the brain functional connectivity. The evaluation of functional connectivity between regions often uses the average correlation, and the results in Figure 4 show that, despite the relatively weak average functional connectivity values, it can be instances in which the correlation reaches high levels. In other words, under the demonstrated anomalous scaling conditions, the usual pairwise measures has inherent limitations for the proper interpretation of these collective states. In passing, it need to be noted that these instances of high coherence were recently confirmed using a different method, which demonstrate avalanches of activity encompassing relatively large regions of each RSN (Tagliazucchi et al., 2012). Of course, the role of these epochs of transient synchronous states in driving perception, awareness, and consciousness are consistent with the views championed by Varela and coworkers more than a decade ago (Varela et al., 2001) as discussed recently (Werner, 2010).
The third result concerning the divergence of the correlation length with increasing cluster size is perhaps the most telling one, because is in contrast with the prevailing viewpoints about brain functional connectivity. Indeed it is implicit in the interpretation of functional connectivity studies the notion that brain activity propagates between and across brain regions. However, for such propagating waves a constant correlation length (i.e., its wavelength) is always expected, which is not what it is consistently found in the present data. The divergence of correlations with size (and its associated anomalous scaling) suggests, in addition, that our current mathematical approaches to model cortical dynamics could be ill-fated. The issue is that most of the large scale models [for superb reviews see Rolls and Deco, 2010;Sporns, 2010] are defined by an adjacency matrix specifying the "structural connectivity" between a large number of regions and some kind of neural dynamics assigned to each node (i.e., cortical region). Lets imagine that such model is scaled up by increasing the number of regions an order of magnitude, while the correlation length of the activity fluctuations is measured as in the experiments here. A reasonable conjecture is that current largescale brain models would have problems to replicate the present findings, since anomalous scaling only appears at criticality (discussed below) while current models are purposely tuned to the ordered regime.
Finally, an important question is concerned with the origin of the statistical properties unveiled in this work. We suggest that a candidate explanation which is able to unify all the observations presented here can be found in the context of critical phenomena (Stanley, 1987;Bak, 1996;Christensen and Moloney, 2005). It is well known that dynamical systems composed of very large number of interacting non-linear elements, under some conditions, exhibit emergent collective behavior with ubiquitous properties (Anderson, 1972). Examples of emergent phenomena sharing common features are the collective dynamics of birds in a flock (Cavagna et al., 2010), spins of a magnet (Stanley, 1987), water molecules in the atmosphere (Peters and Neelin, 2006), peoples financial decisions (Lux and Marchesi, 1999), or ants traffic in a foraging swarm (Rauch et al., 1995;Beekman et al., 2001). In all these cases, each agent in isolation may have its own stereotypical behavior, but when placed to interact in very large numbers, and under certain conditions, the entire system will drift toward a type of collective dynamics which lies in between complete order and complete disorder. At this point [known as an order-disorder phase transition (Stanley, 1987;Chialvo, 2010)] the collective dynamics of the system exhibit distinctive universal properties. Amongst them, the most significant common features include the divergence of correlations, the anomalous scaling, and the presence of moments of high coordination seen here for the RSN fluctuations. Since the emergence of these properties require conditions near an order-disorder phase transition, its observation it is often considered a distinctive signature of critical dynamics, as reported recently by Cavagna et al. for sterling flock dynamics (Cavagna et al., 2010). In particular, it is known that only near a critical point ξ can grow with system size, where the collective global effects overcomes the individuals dynamics, resulting in the emergence of correlated domains of arbitrary size, where information propagates equally well up to the size of the entire system.
In summary, the analysis of the BOLD' fluctuations of the resting brain shows anomalous statistical properties, bursts of highly correlated states and divergence of correlation length, which are dynamical properties known to be found only near a critical point of a phase transition. These findings are fully consistent with previous reports of large-scale brain critical dynamics (Fraiman et al., 2009;Kitzbichler et al., 2009;Chialvo, 2010;Expert et al., 2011;Tagliazucchi et al., 2012) and may be one answer to the question in the title in the sense that brain noise corresponds, rigorously speaking, to the type of (spatial and temporal) fluctuations only observed in systems near criticality. This view may be relevant for the interpretation of the role of fluctuations and variability in brain function in health and disease.

APPENDIX
Additional information is provided here to supplement the main results. The first item is concerned with the robustness of the short-term correlations presented in Figure 4. The second point deals with the generality of the divergence of correlations and the last one discuss formally the presence of long-range correlations in the fMRI data.

SHORT-TERM CORRELATIONS
As discussed in Figure 4, the presence of bursts of high and low correlations observed throughout clusters of very different size is the dynamical base for the violation of the law of the large numbers. It is then important to demonstrate that the estimation of the short-term correlations' variance is robust. For that, we recomputed the results in Figure 4 for various window lengths. This is presented in Figure A1 which shows that the variance of C is independent of N regardless of the window length at which it is estimated.

ξ SCALING
The divergence of correlation length discussed in Figure 5 predicts a functional dependence ξ ∼ dN 1/3 , i.e., ξ grows linearly with the average cluster' diameter d. The results in Figure A2, obtained from the analysis of fMRI data from four different subjects, confirm such scaling relation.

LONG-RANGE CORRELATIONS
In spatio-temporal data it is well known the relationship between the temporal fluctuations of a mean magnitude and the space correlation function. Let suppose we want to study a brain region (our clusters in the main text) of N voxels. Denote a voxel of the region as i which is characterized by its position in space ( − → r i ), and by its dynamics represented in the BOLD signal [B i (t)]. In addition, to simplify the notation we are going to work here with normalized BOLD signals, where has now zero mean and variance one. The average signal over the region, which is: fluctuates in time. Our interest here are the fluctuations of Z(t).
It can be shown, using the definition of the variance of a sum of random variables, that the variance of the average signal of the cluster is: where C is the mean spatial correlation, Var(Z) = 1 T T t = 1 (Z(t) − Z) 2 and Z = 1 T T t = 1 Z(t). Since we are interested also on how correlations affect variance, let consider some cases. If there exist null variability between all the voxels in the region, that is all voxels of the region do exactly the same in time, the left term of Equation 14 remains equal to one no matter the size (N) of the region is. In any other case Var(Z) will be less than one. The variance of the mean activity depends on the size of the region, and on C , which is determined by the shape of the correlation function, C(r). Therefore, in order to understand the asymptotic behavior of Var(Z) with N we need to make some hypothesis over C(r).
First, the mean correlation, is approximated by its continuous version where r * is the radius of the spherical region under study, and V its volume. Now, we discuss some hypothesis about the asymptotic behavior of C(r). For example, if there exist an exponential decay, then the mean correlation satisfies: In the case where long-range correlations are present, 10 1 10 2 10 3 10 1 10 2 10 3 10 1 10 2 10 3 FIGURE A2 | Estimation of the correlation length ξ divergence. In all cases the results are very close to the scaling found in Figure 5; ξ ∼ dN 1/3 . the mean correlation satisfies: Putting all together in Equation 21, the spatial decay of the fMRI data correlations will be given by the product of N by Var(Z) as a function of N, leading to two very different asymptotic statistical behavior: For long-range correlations NVar(Z) ∼ N 1−α/3 For short-range correlations NVar(Z) ∼ k.
(21) Figure A3 shows N.Var(Z) as a function of N for brain data. The straight line in the log-log plot confirm that in the brain data there exist long range correlations. In particular, we obtain a exponent α = 0.9 (for C(r) ∼ 1 r α ) which agrees very well with the result recently obtained by Expert et al. (2011). For completeness we plot also the results of numerical calculations using an exponential correlation function which clearly depart from the brain data. Black circles correspond to brain data and red squares to the results from an exponential interaction model with the same geometry for each cluster. In the exponential case as N grows the average correlation converges, meanwhile for the brain data it continues growing demonstrating the presence of long-range correlations in the data. The black line corresponds to a power law fit y = kx −0.7 . From Equation 21 we obtain an exponent α = 0.9. The inset corresponds to the variance of the mean activity as a function of N.