Empirical and Theoretical Aspects of Generation and Transfer of Information in a Neuromagnetic Source Network

Variability in source dynamics across the sources in an activated network may be indicative of how information is processed within a network. Information-theoretic tools allow one not only to characterize local brain dynamics but also to describe interactions between distributed brain activity. This study follows such a framework and explores the relations between signal variability and asymmetry in mutual interdependencies in a data-driven pipeline of non-linear analysis of neuromagnetic sources reconstructed from human magnetoencephalographic (MEG) data collected during a face recognition task. Asymmetry in non-linear interdependencies in the network was analyzed using transfer entropy, which quantifies predictive information transfer between the sources. Variability of the source activity was estimated using multi-scale entropy, which quantifies the rate at which information is generated. The empirical results are supported by an analysis of synthetic data based on the dynamics of coupled systems with time delay in coupling. We found that the amount of information transferred from one source to another was correlated with the difference in variability between the dynamics of these two sources, with the directionality of net information transfer depending on the time scale at which the sample entropy was computed. The results based on synthetic data suggest that both time delay and strength of coupling can contribute to the relations between the variability of brain signals and the information transfer between them. Our findings support previous attempts to characterize the functional organization of the activated brain based on a combination of non-linear dynamics and temporal features of brain connectivity, such as time delay.


INTRODUCTION
Recently, significant progress has been made in showing that cognitive operations result from the generation and transformation of cooperative modes of neural activity (Bressler, 1995, 2002; McIntosh, 1999). Specifically, progress in this field has been based on a principle that emphasizes the integrative capacity of the brain in terms of ensembles of coupled neural systems (Nunez, 1995; Jirsa and McIntosh, 2007). In turn, we have witnessed advances both in modeling efforts to explore brain integration and in the collection of empirical evidence in support of this integration.
From a theoretical point of view, neural ensembles can be represented by single oscillators (Haken, 1996). Further, different neural ensembles can be coupled through long-range connections, forming a large-scale network of coupled oscillators. Because sources are separated in space and transmission speeds are limited, communication between brain regions may involve time delays. Thus, the coupling between two nodes in a brain network can be characterized by its connection strength, directionality, and time delay. In turn, time delays in coupling can influence the dynamical properties of coupled oscillatory models (Niebur et al., 1991). Encouraging results were obtained in modeling resting-state network dynamics, wherein time delays play a crucial role in generating realistic fluctuations in brain signals (Ghosh et al., 2008; Deco et al., 2009).
At the same time, from the perspective of empirical analysis, recently developed non-linear tools have been able to characterize the variability of local brain dynamics and the interactions of distributed brain activity (see Stam, 2005 for a review). Information-theoretic techniques provide a model-free non-linear approach to address both issues (Pereda et al., 2005; Vakorin et al., 2011).
First, such techniques can be used to characterize the variability in brain signals as a consequence of more complicated neural processing. A typical application is a comparative analysis of different groups, for example, in brain development or clinical versus normal populations (Stam, 2005), or of different conditions within the same group (Lippé et al., 2009). Traditionally, the analysis is performed at the level of electroencephalographic (EEG) or magnetoencephalographic (MEG) scalp measurements, which, owing to volume conduction, do not directly represent localized brain regions in the vicinity of a given sensor (Nunez and Shrinivasan, 2005). Translation to source space is a logical extension, and it has recently been shown that entropy-based techniques are sensitive enough to discriminate the variability of neural activity within a network of sources (Mišić et al., 2010; Vakorin et al., 2010b).

Second, a number of studies have explored methods of assessing linear and non-linear interactions between the dynamics of neuronal sources reconstructed using beamformers (Hadjipapas et al., 2005; Vakorin et al., 2010b; Wibral et al., 2011). Analyses of asymmetries in non-linear interdependency between different brain areas, in both normal and clinical populations, may provide insight into the processing and integration of information in a neuronal network. The time course of one process may predict the time course of another process better than the other way around. This enhancement in predictive power can characterize the coupling between the two processes (Blinowska et al., 2004; Hlavackova-Schindler et al., 2007). This idea was originally proposed by Granger (1969), who used autoregressive models to describe both the interaction between the processes and the time courses of the processes themselves.

Frontiers in Systems Neuroscience www.frontiersin.org
A non-linear extension of the framework of predicting the future of one system from the past and present of another is based on estimating information transfer with information-theoretic tools. Two measures can be used, namely transfer entropy (Schreiber, 2000) or conditional mutual information (Palus et al., 2001), which are essentially equivalent to each other under certain conditions. Transfer entropy has been applied to EEG (Chavez et al., 2003; Vakorin et al., 2010a) and MEG data (Wibral et al., 2011), as well as to functional magnetic resonance imaging data (Hinrichs et al., 2006).
Differences in signal variability among the brain areas constituting a network activated by a cognitive or perceptual task can be indicative of how that task is processed in the brain (Mišić et al., 2010; Vakorin et al., 2010b). In this study, we explored empirical aspects of the relations between the complexity of individual sources constituting a network and the exchange of information between them. The analysis was performed under the assumption that the neuronal ensembles activated in performing the task can be represented by non-linear dynamic systems interacting with each other.
The first part of this study presents a data-driven pipeline for non-linear analysis of neuromagnetic sources reconstructed from human MEG data collected during a face recognition task. Specifically, we first computed the asymmetries in mutual interdependencies between the original MEG sources, using conditional mutual information as a measure of information transfer. We then estimated the variability of the MEG sources using sample entropy. Sample entropy was designed, in essence, as an approximation to the Kolmogorov entropy (Richman and Moorman, 2000), which can be interpreted as the mean rate of information generated by a dynamic system (Kolmogorov, 1959). Sample entropy can be used to infer the presence of non-linear effects. In practice, however, sample entropy is sensitive not only to non-linear deterministic effects but also to linear stochastic effects such as auto-correlation. A number of studies indicate that information averaged over a longer time horizon can reflect non-linear determinism with higher confidence (Govindan et al., 2007; Kaffashi et al., 2008). Multi-scale entropy is an approach in which sample entropy is estimated at different time scales (Costa et al., 2002). In this study, we explored how the differences in variability of the source dynamics, estimated at fine and coarse time scales, can be explained, in a statistical sense, by an asymmetry in the amount of information transferred from one source to another. In the second part of this study, using synthetic data based on a model of coupled non-linear oscillators with time delay in coupling, we demonstrated how the effects found in the MEG data may arise from time-delayed interactions.

PARTICIPANTS
Twenty-two healthy adults (20-41 years, mean = 25.7 years, 9 female) took part in the study. None of the participants wore any metallic implants or had metal in their dental work, and all reported normal or corrected-to-normal vision. Experiments were performed with the informed consent of each individual and with the approval of the Research Ethics Board at the Hospital for Sick Children.

STIMULI AND TASK
Participants performed a one-back task in which they judged whether the currently viewed stimulus was the same as the previous one. The stimulus set comprised 240 grayscale photographs of unfamiliar faces of young adults (2.4° × 3° visual angle) with neutral expressions. All faces were without glasses, earrings, jewelry, or other paraphernalia. Male and female faces were equiprobable. In each block of trials, one-third of the faces immediately repeated. Thus, there were 120 new faces that either did or did not repeat on the subsequent trial (N1 and N2, 60 trials each), as well as 60 repeated faces (R) per block (180 faces in total). Upright faces were presented in one block and inverted faces in the other, with the order of the two blocks counterbalanced across participants. For more information on stimulus control, please see Taylor et al. (2008). The six resulting conditions are coded as invN1, invN2, invR, upN1, upN2, and upR.

MEG SIGNAL ACQUISITION
The MEG was acquired in a magnetically shielded room at the Hospital for Sick Children. Head position relative to the MEG sensor array was determined at the start and end of each block using three localization coils placed at the nasion and bilateral preauricular points prior to acquisition. Motion tolerance was set to 0.5 cm. Surface magnetic fields were recorded using a 151-channel whole-head CTF system (MEG International Services, Ltd., Coquitlam, BC) at a sampling rate of 625 Hz, with a bandpass of DC to 100 Hz. Data were epoched into segments from −100 to 1500 ms, time-locked to stimulus onset. Structural magnetic resonance imaging (MRI) data were also acquired for each participant. Following the MEG recording session, the three localization coils were replaced by MRI-visible markers, and 3D SPGR (T1-weighted) anatomical images were acquired using a 1.5-T Signa Advantage system (GE Medical Systems, Milwaukee, WI).

EXTRACTION OF NEUROMAGNETIC SOURCES
Individual anatomical MR images were warped into a common Talairach space using a non-linear transform in SPM2. Latencies of interest were chosen from the group-average event-related fields (ERFs). Source analysis was performed using event-related beamforming (ERB; Robinson and Vrba, 1999; Sekihara et al., 2001; Cheyne et al., 2007), a 3D spatial filtering technique used to estimate instantaneous source power at desired locations in the brain. To model the forward solution for the beamformer, multiple-sphere models were fit to the inner skull surface of each participant's MRI using BrainSuite software (Shattuck and Leahy, 2002). Activity at each target source was estimated as a weighted sum of the surface field measurements. The weights and the orientation of the source dipole were optimized in the least-squares sense, such that the average power originating from all other locations was maximally attenuated without any change to the power of the forward solution associated with the target source. The weights were then used to compute single-trial time series for each source. Two prominent peaks sensitive to facial orientation were observed at 100 ms and 150 ms after stimulus onset (Figure 1A) and were localized bilaterally to the primary visual cortex (Figure 1B, sources 1 and 2) and bilaterally to the fusiform gyrus (Figure 1B, sources 3 and 4), respectively. A third, less prominent peak was observed at 220 ms (Figure 1C) and was most affected by the memory manipulation (i.e., it differed most between the first presentation of a face and its repeat). To avoid any confounding interaction between the effects of face inversion and working memory, the N2-R difference waves were computed and localized separately for upright and inverted faces (Figure 1D, sources 5 and 6, respectively). Both were localized to the anterior cingulate cortex.
Thus, neuromagnetic activity was extracted from all six source locations in all six conditions. For the purpose of this paper, the sources were coded as follows: (1) VIS_L; (2) VIS_R; (3) FUS_L; (4) FUS_R; (5) ACC_UP; (6) ACC_INV.
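The weight optimization described above has the form of a minimum-variance (LCMV-type) scalar beamformer. The Python sketch below is only an illustration under stated assumptions: it uses the textbook closed form w = C⁻¹l / (lᵀC⁻¹l), picks the dipole orientation as the eigenvector of LᵀC⁻¹L with the smallest eigenvalue (which maximizes the output power), and adds a small diagonal regularization. The function name, its arguments, and the regularization level are hypothetical, and the ERB implementation of Cheyne et al. (2007) may differ in detail.

```python
import numpy as np

def lcmv_weights(leadfield, cov, reg=1e-6):
    """Hypothetical scalar minimum-variance beamformer for one target location.

    leadfield: (n_sensors, 3) forward fields of unit dipoles along x, y, z.
    cov:       (n_sensors, n_sensors) sensor covariance of the data.
    reg:       assumed Tikhonov regularization, as a fraction of mean sensor power.
    """
    n = cov.shape[0]
    cov_reg = cov + reg * np.trace(cov) / n * np.eye(n)
    cov_inv = np.linalg.inv(cov_reg)
    # Output power for orientation eta is 1 / (eta^T L^T C^-1 L eta); it is
    # maximized by the eigenvector with the smallest eigenvalue (eigh sorts ascending).
    gram = leadfield.T @ cov_inv @ leadfield
    _, vecs = np.linalg.eigh(gram)
    orientation = vecs[:, 0]
    l = leadfield @ orientation                 # scalar forward field
    # Minimize output variance w^T C w subject to the unit-gain constraint w^T l = 1.
    w = cov_inv @ l / (l @ cov_inv @ l)
    return w, orientation
```

With such weights, a single-trial source time series would be obtained as `w @ trial` for each (n_sensors × n_times) trial array, which is how the weighted sum of surface measurements described above can be read.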

INFORMATION GENERATED BY A SYSTEM
Many complex biophysiological phenomena are due to non-linear effects. Recently, there has been increasing interest in studying complex neural networks in the brain, specifically by applying concepts and time series analysis techniques derived from non-linear dynamics (see Stam, 2005 for a comprehensive review of non-linear dynamical analysis of EEG/MEG). Various statistics quantifying signal variability based on the presence of non-linear deterministic effects were developed to compare and distinguish time series. Among others, sample entropy was developed as a measure of signal regularity (Richman and Moorman, 2000). Sample entropy was proposed as a refined version of approximate entropy (Pincus, 1991) that excludes self-matches of signal patterns. In turn, approximate entropy was devised as an attempt to estimate the Kolmogorov entropy (Grassberger and Procaccia, 1983), the rate of information generated by a dynamic system, from short and noisy clinical time series.
One approach to non-linear analysis consists of reconstructing the dynamical systems underlying EEG or MEG time series through time delay embedding. Specifically, let $\mathbf{x}_t$ denote the delay vectors describing the recent history of the observed process $x_t$:

$$\mathbf{x}_t = \left(x_t,\; x_{t-\tau},\; \ldots,\; x_{t-(d-1)\tau}\right),$$

where d is the embedding dimension and τ is the embedding delay, measured in multiples of the sampling interval.
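The delay vectors defined above can be built with a few array slices. This is a minimal sketch; the function name is hypothetical, and row i corresponds to the first time index for which the full history of length (d − 1)τ is available.

```python
import numpy as np

def delay_embed(x, d, tau=1):
    """Rows are delay vectors (x_t, x_{t-tau}, ..., x_{t-(d-1)tau})."""
    x = np.asarray(x, dtype=float)
    start = (d - 1) * tau            # earliest t with a complete history
    if start >= x.size:
        raise ValueError("time series too short for this embedding")
    # Column j holds the series lagged by j * tau, aligned on the same t.
    return np.column_stack([x[start - j * tau: x.size - j * tau] for j in range(d)])
```

For example, `delay_embed(np.arange(6), d=2)` yields the rows (1, 0), (2, 1), ..., (5, 4), i.e., each point paired with its immediate predecessor, as for the embedding window of 2 used later in the pipeline.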
For estimating the sample entropy of a time series $x_t$, two multidimensional representations of $x_t$ are used, defined by two sets of embedding parameters: {d, τ} and {d + 1, τ}. Typically, the embedding delay τ is kept equal to 1, measured in data points of the time series whose sample entropy is to be estimated. Sample entropy is the negative natural logarithm of the conditional probability that two delay vectors (points in a multidimensional state space) that are close in the d-dimensional space (meaning that the distance between them is less than the scale length r) will remain close in the (d + 1)-dimensional space. A greater likelihood of remaining close results in smaller values of the sample entropy statistic, indicating fewer irregularities. Conversely, higher values are associated with signals that have more variability and less regular patterns in their state-space representations.
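FIGURE 1 | [...] was not localized directly from the surface field ERFs, but rather at the latency at which the difference in global field power (GFP) was greatest between the N2 and R conditions (C).

Formally, with τ = 1, let $\mathbf{x}_i^{(d)} = (x_i, x_{i+1}, \ldots, x_{i+d-1})$, $i = 1, \ldots, N - d$, denote the d-dimensional template vectors, and define

$$B_i^{d}(r) = \frac{1}{N-d-1}\,\#\left\{\, j \neq i : \left\|\mathbf{x}_j^{(d)} - \mathbf{x}_i^{(d)}\right\| < r \,\right\},$$

where j goes from 1 to N − d, and ||·|| stands for the maximum norm distance between two state vectors. Then, averaging across the N − d templates gives

$$B^{d}(r) = \frac{1}{N-d}\sum_{i=1}^{N-d} B_i^{d}(r).$$

Similarly, $A_i^{d}(r)$ is defined by counting the matches among the (d + 1)-dimensional vectors $\mathbf{x}_i^{(d+1)}$, and its average across the same N − d points is

$$A^{d}(r) = \frac{1}{N-d}\sum_{i=1}^{N-d} A_i^{d}(r).$$

Sample entropy is defined as

$$\mathrm{SampEn}(d, r, N) = -\ln\frac{A^{d}(r)}{B^{d}(r)}.$$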
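The counting procedure above can be sketched directly. This is an illustration, not the code used in the study: it assumes τ = 1 and a tolerance r = 0.2 times the standard deviation of the series (a common default; the value used here is not stated in this section), and the function name is hypothetical. Because both counts use the same N − d templates, the normalizing factors of $A^{d}(r)$ and $B^{d}(r)$ cancel in the ratio.

```python
import numpy as np

def sample_entropy(x, d=2, r_factor=0.2):
    """SampEn(d, r, N) = -ln(A / B) with tau = 1 and r = r_factor * std(x) (assumed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * np.std(x)

    def count_matches(m):
        # The same N - d templates are used for m = d and m = d + 1.
        templates = np.array([x[i:i + m] for i in range(n - d)])
        count = 0
        for i in range(len(templates) - 1):
            # Maximum-norm distance to later templates only: this excludes
            # self-matches and counts each unordered pair once.
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist < r))
        return count

    b = count_matches(d)        # pairs close in d dimensions
    a = count_matches(d + 1)    # pairs that remain close in d + 1 dimensions
    if a == 0 or b == 0:
        return np.inf           # undefined when no matches are found
    return -np.log(a / b)
```

As expected from the interpretation above, white noise yields a clearly larger value than a regular sinusoid of the same length.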
Multi-scale entropy (MSE) was proposed to estimate the sample entropy of finite time series at different time scales (Costa et al., 2002). First, multiple coarse-grained time series are constructed from the original signal by averaging the data points of the original time series within non-overlapping windows of increasing length. Specifically, the amplitude of the coarse-grained time series $y^{(\theta)}(t)$ at time scale θ is calculated according to

$$y^{(\theta)}(t) = \frac{1}{\theta}\sum_{i=(t-1)\theta+1}^{t\theta} x_i, \qquad 1 \le t \le \left\lfloor N/\theta \right\rfloor,$$

wherein the fluctuations at scales smaller than θ are eliminated. The window length, measured in data points, represents the scale factor, θ = 1, 2, 3, .... Note that θ = 1 represents the original time series, whereas a relatively large θ produces a smooth signal containing mostly the low-frequency components of the original signal.
To obtain the MSE curve, sample entropy is computed for each coarse-grained time series.
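The coarse-graining formula and the resulting MSE curve can be sketched as follows. The function names are hypothetical; the entropy estimator is passed in as an argument so that any sample-entropy implementation can be plugged in, and trailing points that do not fill a whole window are dropped, matching the ⌊N/θ⌋ limit in the formula.

```python
import numpy as np

def coarse_grain(x, theta):
    """Average x within non-overlapping windows of theta points (scale factor theta)."""
    x = np.asarray(x, dtype=float)
    n_windows = x.size // theta                 # floor(N / theta)
    return x[:n_windows * theta].reshape(n_windows, theta).mean(axis=1)

def multiscale_entropy(x, entropy_fn, scales=range(1, 21)):
    """MSE curve: entropy_fn applied to each coarse-grained series."""
    return np.array([entropy_fn(coarse_grain(x, theta)) for theta in scales])
```

For instance, `coarse_grain([1, 2, 3, 4, 5, 6, 7], 2)` returns [1.5, 3.5, 5.5]; with scales 1-20 (as in the pipeline below), the coarsest series is twenty times shorter than the original, which is one reason short epochs limit the usable scale range.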

INFORMATION TRANSFER
A number of studies have used information-theoretic tools to characterize coupled systems (see Pereda et al., 2005 for a comprehensive review). Within this approach, predictive information transfer is a key concept used to define asymmetries in mutual interdependence (Palus et al., 2001; Lizier and Prokopenko, 2010). Information transfer $I_k(x \to y)$ is defined as the conditional mutual information $I(x_t; y_{t+k} \mid y_t)$ between the past and present of one system, $x_t$, and the future of another system, $y_{t+k}$, conditioned on the past and present of the second system, $y_t$ (Palus et al., 2001). The subscript k designates the dependence of the conditional mutual information on the latency k, which is typically measured in data points. Thus, $I(x_t; y_{t+k} \mid y_t)$ can be considered a function of the latency between the past and present of the first system and the future of the second one. It can be expressed in terms of the individual entropies $H(\cdot)$ and the joint entropies $H(\cdot,\cdot)$ and $H(\cdot,\cdot,\cdot)$ as follows:

$$I_k(x \to y) = I(x_t; y_{t+k} \mid y_t) = H(x_t, y_t) + H(y_{t+k}, y_t) - H(x_t, y_{t+k}, y_t) - H(y_t).$$

In a similar way, we can define the transfer of information from the past and present of the second system, $y_t$, to the future of the first one, $x_{t+k}$:

$$I_k(y \to x) = I(y_t; x_{t+k} \mid x_t) = H(y_t, x_t) + H(x_{t+k}, x_t) - H(y_t, x_{t+k}, x_t) - H(x_t).$$

The quantities $I(x_t; y_{t+k} \mid y_t)$ and $I(y_t; x_{t+k} \mid x_t)$ are closely related to the statistic termed transfer entropy, a measure of the deviation from the independence property for coupled systems evolving in time (Schreiber, 2000). It can be shown that, under proper conditions, the transfer entropy is equivalent to the conditional mutual information:

$$T_k(x \to y) = I(x_t; y_{t+k} \mid y_t), \qquad T_k(y \to x) = I(y_t; x_{t+k} \mid x_t).$$

Net transfer entropy, or net information transfer, $T(x \to y) = T_k(x \to y) - T_k(y \to x)$, can be used to infer the directionality of the dominant transfer of information between coupled systems. A positive $T(x \to y)$ implies that the system $x_t$ has a higher predictive power for the time course of $y_t$ than vice versa.
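To make the entropy decomposition above concrete, the toy sketch below evaluates it for discrete (binary) series with a plug-in histogram estimator. This is an illustration only: it is not the correlation-integral estimator used in this study, it reduces "past and present" to the single current symbol, and the function names are hypothetical. When y simply copies x with a one-step lag, $I_1(x \to y)$ approaches ln 2 (one bit), whereas $I_1(y \to x)$ stays near zero.

```python
import numpy as np
from collections import Counter

def plugin_entropy(*columns):
    """Plug-in Shannon entropy (nats) of the joint distribution of discrete columns."""
    counts = np.array(list(Counter(zip(*columns)).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def information_transfer(x, y, k=1):
    """I_k(x -> y) = H(x_t, y_t) + H(y_{t+k}, y_t) - H(x_t, y_{t+k}, y_t) - H(y_t)."""
    x = np.asarray(x)
    y = np.asarray(y)
    x_now, y_now, y_fut = x[:-k], y[:-k], y[k:]
    return (plugin_entropy(x_now, y_now) + plugin_entropy(y_fut, y_now)
            - plugin_entropy(x_now, y_fut, y_now) - plugin_entropy(y_now))

rng = np.random.default_rng(1)
x = rng.integers(0, 2, 5000)
y = np.roll(x, 1)                  # y_{t+1} = x_t: x drives y with a one-step lag
net = information_transfer(x, y) - information_transfer(y, x)   # close to ln 2
```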
In estimating transfer entropy, the key issue is the estimation of the entropies themselves. The straightforward approach is to divide the state space into bins, i = 1, 2, 3, ..., of some size δ and to calculate the entropy of the multidimensional dynamics by constructing a multidimensional histogram that estimates the probability of being in the ith bin. This study took another approach, proposed by Prichard and Theiler (1995) and tested using linear and non-linear models (Chavez et al., 2003; Gourévitch et al., 2007). Specifically, individual and joint entropies H(x) are approximated through the corresponding correlation integral $C_q(\mathbf{x}, r)$, computed as

$$C_q(\mathbf{x}, r) = \left[\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{N-1}\sum_{j \neq i}\Theta\left(r - \left\|\mathbf{x}_i - \mathbf{x}_j\right\|\right)\right)^{q-1}\right]^{1/(q-1)},$$

where N is the number of data points and Θ is the Heaviside function; the entropy is then estimated as $H_q(\mathbf{x}) \approx -\ln C_q(\mathbf{x}, r)$. The correlation integral $C_q(\mathbf{x}, r)$ is a function of a scale parameter r, which, in general, can be related to the bin size δ, and of the integral order q. The second-order (q = 2) correlation integral, as used in this study, is interpreted as the likelihood that the distance between two randomly chosen delay vectors (points in the multidimensional state space) is smaller than r.
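Substituting $H \approx -\ln C_2$ into the entropy decomposition of the conditional mutual information gives $I(x_t; y_{t+k} \mid y_t) \approx \ln\left[C_2(x_t, y_{t+k}, y_t)\,C_2(y_t) / \left(C_2(x_t, y_t)\,C_2(y_{t+k}, y_t)\right)\right]$. The sketch below implements this with the maximum norm. It is a simplification under stated assumptions: scalar states (embedding dimension 1 instead of the d = 2 used in the pipeline), z-scored signals so that a single r = 0.5 applies to every axis, and hypothetical function names.

```python
import numpy as np

def correlation_integral(points, r):
    """C_2(r): fraction of distinct pairs of points closer than r (maximum norm)."""
    pts = np.asarray(points, dtype=float).reshape(len(points), -1)
    n = len(pts)
    dist = np.max(np.abs(pts[:, None, :] - pts[None, :, :]), axis=2)
    n_close = np.count_nonzero(dist < r) - n     # remove the n self-pairs (i == j)
    return n_close / (n * (n - 1))

def transfer_entropy_ci(x, y, k=1, r=0.5):
    """I(x_t; y_{t+k} | y_t) from correlation integrals, with H ~ -ln C_2."""
    x = (np.asarray(x, dtype=float) - np.mean(x)) / np.std(x)
    y = (np.asarray(y, dtype=float) - np.mean(y)) / np.std(y)
    x_past, y_past, y_fut = x[:-k], y[:-k], y[k:]
    c_xyz = correlation_integral(np.column_stack([x_past, y_fut, y_past]), r)
    c_xz = correlation_integral(np.column_stack([x_past, y_past]), r)
    c_yz = correlation_integral(np.column_stack([y_fut, y_past]), r)
    c_z = correlation_integral(y_past, r)
    # H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), each H replaced by -ln C_2.
    return np.log(c_xyz * c_z / (c_xz * c_yz))
```

The pairwise distance arrays grow as N², which is adequate for single MEG epochs of about a thousand samples but would need a neighbor-search structure for much longer series.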

PIPELINE OF THE ANALYSIS
The dynamics of the networks consisting of six sources were identified for 22 participants in 6 conditions, as described above. To determine the optimal embedding parameters for reconstructing the delay vectors from the observed time series, we applied the information criterion proposed by Small and Tse (2004). For most of the time series, with a few exceptions, the embedding window was estimated to be equal to 2, which implies an embedding dimension d = 2 (a two-dimensional system) and an embedding delay τ = 1. For each subject and condition, sample entropy was computed at scales 1-20 for all single trials. The information rate produced by the system underlying the observed signal was computed by averaging the sample entropy statistic across trials, as well as over a range of scale factors. Specifically, the information rate at fine time scales was estimated by averaging over scale factors 1-5, whereas the information rate of the coarse-grained time series was computed by averaging over scale factors 16-20. Thus, for a network of six sources, each source was associated with two values: the information rate at fine and at coarse time scales. For the purpose of this study, we use the terms variability, sample entropy, and information rate interchangeably.
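The reduction of single-trial MSE curves to the two information rates can be sketched as follows, assuming the sample entropies of one source are stored as a trials × scales array whose column j holds scale factor j + 1. The array layout and the function name are hypothetical; the scale ranges 1-5 and 16-20 are those given in the text.

```python
import numpy as np

def information_rates(mse_trials, fine=(1, 5), coarse=(16, 20)):
    """Return (fine-scale rate, coarse-scale rate) for one source.

    mse_trials: array (n_trials, n_scales); column j holds scale factor j + 1.
    fine, coarse: inclusive ranges of scale factors to average over.
    """
    mse_mean = np.asarray(mse_trials, dtype=float).mean(axis=0)   # average over trials
    fine_rate = mse_mean[fine[0] - 1: fine[1]].mean()
    coarse_rate = mse_mean[coarse[0] - 1: coarse[1]].mean()
    return fine_rate, coarse_rate
```

The difference in sample entropy for a pair of sources is then simply the difference between their fine (or coarse) rates; trials with undefined sample entropy would need to be excluded first, for example with `np.nanmean`.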
For the same networks, transfer entropy was computed as a function of the latency between the past of the dynamics of one source and the future of the dynamics of another source (k = 1, 2, ..., 50), for all ordered pairs of sources (30 connections in total) and for all single trials. Following Palus et al. (2001), the transfer entropy was averaged across the latency k to decrease the variability of the estimated statistics and to increase the robustness of the results. Note that, as the MEG epochs were relatively short, the transfer entropy was computed only at time scale θ = 1, which corresponds to the original time series. For each trial and pathway, the information transfer was estimated in both directions, $I_k(x \to y)$ and $I_k(y \to x)$, as described above. The net information transfer was computed as the difference between the two amounts of transfer entropy, averaged across trials. Thus, for a network of six sources, each pathway between two sources was associated with a value of the net information transfer, reflecting the asymmetry in predictive power between the source activities.
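The aggregation step can be sketched as below, assuming the single-trial estimates are stored in a hypothetical array indexed as (trial, source, target, lag). Because averaging is linear, averaging over lags and trials before or after taking the directional difference gives the same result; the output is an antisymmetric matrix whose entry [i, j] is the net information transfer from source i to source j.

```python
import numpy as np

def net_transfer_matrix(te):
    """Net information transfer between all pairs of sources.

    te: array (n_trials, n_sources, n_sources, n_lags); te[:, i, j, :] holds
        I_k(i -> j) for lags k = 1..n_lags (the diagonal is ignored).
    """
    mean_te = np.asarray(te, dtype=float).mean(axis=(0, 3))   # average trials and lags
    net = mean_te - mean_te.T                                  # I(i -> j) - I(j -> i)
    np.fill_diagonal(net, 0.0)
    return net
```

For six sources, the 30 off-diagonal entries correspond to the 30 ordered connections, each unordered pair appearing twice with opposite signs.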

MEG DATA
In Figure 2, the relations between asymmetry in mutual interdependence and variability are shown across subjects, separately for each condition. Specifically, the figure shows the net information transfer between two sources as a function of the difference in sample entropy computed at fine (Figure 2A) and coarse (Figure 2B) time scales. Each point is associated with one subject and a pair of sources. Correlations between the two variables are given at the top of the corresponding plots. In all cases, the correlations are relatively strong (on the order of 0.5-0.8) and statistically significant, with p-values less than 0.001. Positive correlations in Figure 2A imply that a system with higher variability can better predict the behavior of a system with lower variability than the other way around. Conversely, negative correlations observed in Figure 2B support the conclusion that, at coarse time scales, more information is transferred from sources with lower variability to sources with higher variability than vice versa.

FIGURE 2 | Net information transfer between sources within the same network versus the difference in sample entropy, computed (A) at fine time scales; (B) at coarse time scales.
Each point is associated with one subject (22 in total) and one connection (out of 30 possible pathways between 6 sources). The top of each plot shows the correlation value r between the two measures (significant for all the conditions with p-values less than 0.001). A positive correlation implies that the net information is transferred from a source with higher sample entropy to a source with lower sample entropy. Negative correlations imply that more information is transferred toward a system with a higher sample entropy.

In addition to the relations between information transfer and complexity, it may be important to explore the connectivity maps of the networks of neuromagnetic sources in the context of the latencies between the peaks of the event-related fields (ERFs). Figure 3 illustrates the measures of transfer entropy for a pair of sources as functions of the latency k between the future of one signal and the past of the other. Figure 4 shows the reconstructed connectivity patterns masked by the bootstrap ratio maps, computed separately for the six conditions. The significance of the couplings was estimated by bootstrapping the subjects (selection with replacement). A bootstrap ratio threshold of 3.0 was used to define the connections that were robust across subjects; treating the ratio as a z-score, this threshold corresponds roughly to a 99% confidence interval.
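A minimal sketch of the subject-level bootstrap, under the assumption that the statistic of interest is the group mean of one pathway's net information transfer and that the bootstrap ratio is that mean divided by the standard deviation of the bootstrap means (a common convention; the exact statistic bootstrapped in this study is not specified here). The function name, the number of resamples, and the seed are arbitrary, hypothetical choices.

```python
import numpy as np

def bootstrap_ratio(values, n_boot=1000, seed=0):
    """Observed group mean divided by the bootstrap standard error of the mean.

    values: one value per subject (e.g., net transfer entropy of one pathway).
    Subjects are resampled with replacement n_boot times.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)          # one mean per bootstrap sample
    return values.mean() / boot_means.std(ddof=1)
```

A connection would then be retained if the absolute bootstrap ratio of its net information transfer exceeds the threshold of 3.0.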
Connections can essentially be divided into two groups. One group comprises connections in which the asymmetry in predictive power is directed from the right to the left hemisphere: VIS_R → VIS_L, FUS_R → FUS_L, and FUS_R → VIS_L. The other group comprises connections whose net information transfer is directed from sources with shorter ERF peak latencies to sources with longer ones, such as VIS_R → FUS_L, VIS_R → ACC_UP, or FUS_R → ACC_UP.

SYNTHETIC DATA
In the previous section, we considered some empirical aspects of the interplay between sample entropy (information rate) and transfer entropy (information transfer) in the pairwise relations between the neuromagnetic sources. In this section, we propose that such an interplay might be explained by coupling parameters, such as time delay or coupling strength, that characterize coupled non-linear dynamic systems. Our objective is to reproduce, with a simple computational model of interacting sources, the same pattern of relationships between variability computed at different time scales and asymmetry in mutual interdependence between the original time series. Specifically, we consider a model of coupled oscillators with time delay in coupling and show that such a model can potentially explain the peculiarities observed in Figure 2. The simulated model is based on unidirectionally coupled chaotic Rössler oscillators. Hadjipapas et al. (2009) used coupled Rössler systems to study collective dynamics in oscillatory networks as a simple case of periodic systems perturbed by noise of a deterministic rather than stochastic nature. Rössler systems are relatively simple non-linear systems able to generate self-sustained non-periodic oscillations. In turn, oscillatory behavior and rhythms of the brain have been extensively studied as a plausible mechanism for neuronal synchronization (Varela et al., 2001). In this context, coupled Rössler oscillators can be viewed as a prototypical example of oscillatory networks. In the model equations (Eqs. 12), ω1 = ω2 = 0.99 are the natural frequencies of the oscillators, ε is the coupling strength, and T denotes the delay in coupling.
In the model, the first system, whose dynamics are determined by the three variables $(x_1, y_1, z_1)$, is the response driven by the second system, described by $(x_2, y_2, z_2)$. Further analysis assumes that only the variables $x_1(t)$ and $x_2(t)$ can be observed. Our specific goal is threefold: (i) to reconstruct the directionality of coupling between $x_1(t)$ and $x_2(t)$, (ii) to analyze the complexity of these signals, and (iii) to explore the relations between complexity and causal information. Numerical solutions of Eqs. (12) were obtained using the dde23 MATLAB function (The MathWorks, Natick, MA), with subsequent resampling of the time series at a fixed step of 0.1. The dynamics were solved on the interval [0, 600], and the interval [0, 300] was discarded to avoid transient effects. Thus, each time series had 3000 data points.
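Because Eqs. (12) are not reproduced in this section, the sketch below uses an assumed, commonly used form of unidirectionally delay-coupled Rössler oscillators: the response equation for $x_1$ receives a diffusive term ε(x₂(t − T) − x₁(t)), with the standard parameters a = 0.15, b = 0.2, c = 10. These parameters, the coupling form, the initial state, the constant initial history, and the forward Euler scheme (step 0.005, used here in place of dde23's adaptive Runge-Kutta method) are all assumptions; only ω1 = ω2 = 0.99, the interval [0, 600], the discarded transient [0, 300], and the output step of 0.1 come from the text. The function name is hypothetical.

```python
import numpy as np

def simulate_delayed_rossler(eps, T, omega1=0.99, omega2=0.99,
                             a=0.15, b=0.2, c=10.0, h=0.005,
                             t_end=600.0, t_discard=300.0, dt_out=0.1,
                             init=(1.0, 1.0, 0.0, 0.5, 0.0, 0.0)):
    """Hypothetical sketch: system 2 drives system 1 through x2(t - T).

    Returns an array (n_samples, 2) with the observed signals x1 and x2,
    sampled every dt_out after discarding the transient t <= t_discard.
    """
    n_steps = int(round(t_end / h))
    n_discard = int(round(t_discard / h))
    lag = int(round(T / h))                 # coupling delay in integration steps
    keep_every = int(round(dt_out / h))     # resampling factor
    x1, y1, z1, x2, y2, z2 = init
    x2_hist = [x2]                          # x2 at every step, needed for the delay
    samples = []
    for n in range(n_steps):
        # Assumed constant initial history: x2(t - T) = x2(0) while t < T.
        x2_del = x2_hist[max(n - lag, 0)]
        dx1 = -omega1 * y1 - z1 + eps * (x2_del - x1)   # assumed coupling form
        dy1 = omega1 * x1 + a * y1
        dz1 = b + z1 * (x1 - c)
        dx2 = -omega2 * y2 - z2
        dy2 = omega2 * x2 + a * y2
        dz2 = b + z2 * (x2 - c)
        # Forward Euler step (a simplification of the adaptive dde23 solver).
        x1, y1, z1 = x1 + h * dx1, y1 + h * dy1, z1 + h * dz1
        x2, y2, z2 = x2 + h * dx2, y2 + h * dy2, z2 + h * dz2
        x2_hist.append(x2)
        step = n + 1
        if step > n_discard and step % keep_every == 0:
            samples.append((x1, x2))
    return np.array(samples)
```

With the defaults, each call returns 3000 samples per signal, matching the length stated above; how the 20 realizations per (ε, T) pair differed (e.g., in initial conditions) is not specified here, so varying `init` would be one possible, assumed choice.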
For a given pair of parameters, ε and T, the signals were generated 20 times. Analyses of sample entropy and transfer entropy were performed as in the pipeline for the MEG data described above. The only difference was that, for the synthetic data, the network consisted of two systems, and realizations of the model played the role of trials. Transfer entropy between the two systems was computed for all realizations as a function of the latency between the past of system #1 and the future of system #2. The latency varied from 1 to 100 data points, which corresponded to 0.1 to 10 time units of the model. To obtain a value of the net information transfer, the difference between the two amounts of transfer entropy was averaged across realizations and over the latency range. For the same data, sample entropy was computed for scale factors 1-20. As in the MEG data analysis, variability was then summarized at fine and coarse time scales, and the differences in sample entropy between the two systems were examined at fine and coarse (Figure 5D) time scales as functions of the time delay T. Note that, with real data, such relations cannot be observed, as the true values of T are typically unknown (see, however, Prokhorov and Ponomarenko, 2005; Silchenko et al., 2010; Vicente et al., 2011 for attempts to recover time delays in coupling). What we can observe are the correlations between the net transfer entropy and the differences in sample entropy, shown in Figures 5C,E. The results revealed a relatively strong and robust linear correlation between the two statistics in Figure 5C, similar to what we saw for the MEG data in Figure 2A. However, the correlation observed in Figure 5E is close to zero and statistically insignificant, contrary to Figure 2B. Similar to the time delay, the coupling parameter ε turned out to explain, to some degree, the results in Figure 2.
As expected, the net transfer entropy was found to be a monotonically increasing function of the coupling strength ε, as shown in Figure 6A. Also, the difference in coarse-grained sample entropy was, to a first approximation, a linear function of ε, as shown in Figure 6D. In turn, this led to a negative correlation between the complexity difference and the net transfer entropy for all values of the coupling parameter, as plotted in Figure 6E, in good accordance with the results observed in Figure 2B. The influence of ε on the fine-grained sample entropy was ambiguous, as shown in Figures 6B,C. It is worth noting that Figures 2A,B are consistent with Figures 6C,E, respectively, only for weak coupling.

FIGURE 6 | Effects of the strength of coupling (parameter ε) on the relations between differences in sample entropy between coupled Rössler oscillators and net information transfer between them.
Specifically, net transfer entropy (A) and differences in sample entropy estimated at fine (B) and coarse (D) time scales are given as functions of the coupling strength ε. Complexity and transfer entropy were estimated based on the signals $x_1$ and $x_2$, according to model (12), for different values of the parameter ε with a fixed T. As in Figure 5, only the relations illustrated in (C,E) can be observed in the MEG data analysis (Figure 2). Note that Figures 2A,B are consistent with (C,E), respectively, only for relatively weak couplings, with ε < 0.08 (B).

CONCLUSION AND DISCUSSION
In this paper, we examined relations between signal variability and asymmetry in mutual interdependencies between activated neuromagnetic sources. Variability was quantified based on sample entropy (Richman and Moorman, 2000), which is ultimately interpreted as the average rate of information generated by a dynamic system (Grassberger and Procaccia, 1983; Pincus, 1995). Using the concept of multi-scale entropy (Costa et al., 2002), we examined variability at fine and coarse resolutions of the same time series. Interdependencies between source dynamics were estimated using the conditional mutual information between the past and present of one signal and the future of another signal, conditioned on the past and present of the second signal (Palus et al., 2001). The asymmetry in information transfer represents the difference in the ability of two sources to predict each other's activity. The analyses of signal variability and information transfer were performed under the assumption that the neuronal ensembles involved in performing a task can be described by coupled non-linear dynamic systems (Haken, 1996). Noise can be present at different levels of the non-linear models describing the observed time series. For the purpose of this study, we differentiate three types of noise-like activity. First, there is internal noise, which is an inherent component of a model and is part of the input entering the non-linear deterministic system. Second, there is the variability generated by the non-linear dynamic system itself. Finally, observational noise can be mixed with the output of the system.
This study focuses on exploring the variability in non-linear dynamics and describes this variability in relation to the transfer of information in functional networks. Typically, one assumes that a non-linear system is observed in different states, and the goal is to describe these differences. Two different initial conditions may be indistinguishable at a given experimental precision, yet they may evolve into distinguishable states after some finite time. Thus, one could say that a system that is sensitive to initial conditions produces information (Eckmann and Ruelle, 1985).
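To make the link between sensitivity to initial conditions and information production concrete, the sketch below iterates the logistic map at r = 4 (a standard textbook example chosen for illustration, not a model used in this study) from two initial conditions separated by 10⁻¹⁰, and reports the first step at which they differ by more than an assumed measurement precision of 10⁻³. It also estimates the Lyapunov exponent as the orbit average of ln|f′(x)|; for this map the exponent equals ln 2, and so does the Kolmogorov-Sinai entropy, i.e., the rate of information production. Function names are hypothetical.

```python
import numpy as np


def logistic_orbit(x0, n, r=4.0):
    """Iterate f(x) = r x (1 - x) n times from x0; returns n + 1 states."""
    xs = np.empty(n + 1)
    xs[0] = x0
    for i in range(n):
        xs[i + 1] = r * xs[i] * (1.0 - xs[i])
    return xs


def steps_until_distinguishable(x0, delta=1e-10, precision=1e-3, n=200, r=4.0):
    """First step at which two orbits started delta apart differ by more than precision."""
    sep = np.abs(logistic_orbit(x0, n, r) - logistic_orbit(x0 + delta, n, r))
    above = np.nonzero(sep > precision)[0]
    return int(above[0]) if above.size else None


def lyapunov_exponent(x0=0.3, n=100_000, r=4.0, transient=1000):
    """Orbit average of ln|f'(x)| = ln|r (1 - 2x)|; equals ln 2 for r = 4."""
    xs = logistic_orbit(x0, n + transient, r)[transient:]
    return float(np.mean(np.log(np.abs(r * (1.0 - 2.0 * xs)))))


print("steps until distinguishable:", steps_until_distinguishable(0.3))
print("Lyapunov exponent:", lyapunov_exponent(), "vs ln 2 =", np.log(2))
```

Since |f′(x)| ≤ 4 on [0, 1], the separation can grow by at most a factor of 4 per step, so at least 12 steps are needed to amplify 10⁻¹⁰ beyond 10⁻³; on average the separation roughly doubles per step, consistent with an information production rate of about one bit per iteration.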
Sample entropy, which was used as a measure of variability, is closely related to the mean rate of information generated by the dynamic system underlying the observed signals. In practice, however, both linear stochastic and non-linear deterministic effects can contribute to estimates of sample entropy. A number of studies indicate that averaging the information rate over a longer time horizon reduces linear effects, in particular those associated with observational noise, and shifts the focus to signal variability due to the underlying non-linear determinism (Govindan et al., 2007; Kaffashi et al., 2008). Down-sampling of the original time series, as used in the multi-scale entropy approach, can be viewed as a way to extend the period over which the information generated by a system is averaged.
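Sample entropy and the coarse-graining step of multi-scale entropy can be sketched as follows. This minimal illustration uses the usual definition SampEn = -ln(A/B) (Richman and Moorman, 2000), where B and A count pairs of templates of length m and m + 1 that match within a tolerance r under the maximum-norm distance, excluding self-matches. The defaults m = 2 and r = 0.2 × SD of the original series, kept fixed across scales, are common choices assumed here rather than the parameters of this study, and the function names are hypothetical. Coarse-graining averages non-overlapping windows of length τ, so each coarse-grained sample summarizes τ original samples.

```python
import numpy as np


def sample_entropy(x, m=2, r=0.2):
    """SampEn = -ln(A / B) with absolute tolerance r and maximum-norm distance.

    B (A) counts template pairs of length m (m + 1) that match within r,
    excluding self-matches; N - m templates are used for both lengths.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)

    def count_matches(length):
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return float("inf") if a == 0 or b == 0 else float(-np.log(a / b))


def coarse_grain(x, scale):
    """Average non-overlapping windows of length `scale` (multi-scale entropy step)."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) // scale
    return x[: n_windows * scale].reshape(n_windows, scale).mean(axis=1)


def multiscale_entropy(x, scales=(1, 2, 4, 8), m=2, r_factor=0.2):
    """SampEn of coarse-grained series; tolerance fixed from the original series' SD."""
    tol = r_factor * np.std(x)
    return [sample_entropy(coarse_grain(x, s), m=m, r=tol) for s in scales]


rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
sine = np.sin(2 * np.pi * np.arange(2000) / 50)
print("SampEn noise:", sample_entropy(noise, r=0.2 * np.std(noise)))
print("SampEn sine: ", sample_entropy(sine, r=0.2 * np.std(sine)))
print("MSE noise:   ", multiscale_entropy(noise))
```

For uncorrelated noise the estimate decreases with scale, because averaging shrinks the variance while the tolerance stays fixed; this is the known behavior of white noise under multi-scale entropy (Costa et al., 2002).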
The first part of our analysis was based on the dynamics of neuromagnetic sources reconstructed from MEG data collected during a face recognition task. In the second part, we extended our empirical findings with an analysis of synthetic data based on the dynamics of coupled non-linear oscillators with time delay in coupling. We found that the relations between the sample entropy of neuromagnetic source activity and the net information transfer between sources depend on the time scale at which sample entropy is computed. Specifically, more information is transferred from a source with higher sample entropy at coarse time scales but lower sample entropy at fine time scales.
Under certain conditions, analysis of the synthetic data offered a potential explanation of our empirical findings. Specifically, a study of a system of two coupled oscillators with time delay in coupling revealed the same relations between the difference in sample entropy and the asymmetry in information transfer. In particular, we found that the interplay between sample entropy based on fine-grained signals and information transfer can be explained, in a statistical sense, by the variability in the time delay in coupling. In contrast, correlations between information transfer and sample entropy computed at coarse time scales were not significant. In addition, we found that the variability in the coupling strength can contribute to the observed relations between sample entropy based on coarse-grained signals and information transfer. Given that coarse scales are expected to better reflect non-linear effects, these results indicate that the variability of the signals due to non-linear determinism becomes more diversified as a result of the propagation of information in the network. In other words, propagation of information in a network may be described as an accumulation of complexity (variability) in the brain signals. Similar results were reported by Mišić et al. (2011), who showed that the variability of a region's activity systematically varied according to its topological role in functional networks. Specifically, the rate at which information was generated was largely predicted by graph-theoretic measures characterizing the importance of a given node in a functional network, such as node centrality or the efficiency of information transfer.
It is worth discussing the differences between an analysis of transfer entropy, as performed in this study, and an analysis of causal relationships between source activities. Lizier and Prokopenko (2010) suggested distinguishing between information transfer and causal effects. Information transfer is defined as conditional mutual information, representing the average information contained in the future of one process about the past of another process, but not contained in the past of the first process itself. In contrast, a causal effect can be viewed as information flow, quantifying the deviation of one process from causal independence from another process, given a set of variables that may affect these two processes of interest. Along a similar line of reasoning, Valdes-Sosa et al. (2011) differentiate between predictive capacity across temporally distinct events and the effects of a controlled intervention on the target process. Observing activity at a network node may potentially indicate its effects at remote nodes. However, identifying a physical influence upon a node in a given network requires that any other physical influence this node receives be excluded.
In this context, it should be emphasized that this study focuses on predictive information transfer rather than on information flow. Using the bivariate variant of information transfer, rather than the multivariate version, imposes several limitations. First, it is impossible to distinguish between direct and indirect connections (Gourévitch et al., 2007); confounding effects of indirect connections on the estimation of transfer entropy were considered in Vakorin et al. (2009). Second, bivariate estimates of directionality in the case of mutually interdependent sources may produce spurious results (Blinowska et al., 2004). With regard to this study, it should be noted that the issue associated with common sources is less of a problem in MEG than in EEG, as neuromagnetic signals suffer much less from volume conduction (Hämäläinen et al., 1993). In general, however, choosing an optimal set of variables constituting a network to be analyzed in a multivariate way remains an open issue. For example, it has been shown that transfer entropy, which, unlike autoregressive models, does not in general require a model of the interactions between network nodes, remains sensitive to model misspecification: excluding a node from the analysis or adding one affects the estimation of transfer entropy and the robustness of the results (Vakorin et al., 2009).
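The limitation of bivariate estimates with respect to indirect connections can be illustrated with a toy chain x → y → z, in which x influences z only through y. For brevity, the sketch uses the linear-Gaussian form of transfer entropy, TE = ½ ln(σ²_restricted / σ²_full), which for Gaussian variables equals half the Granger causality statistic. This is a swapped-in estimator, not the non-parametric one used in the study, and the function names and coupling coefficients are hypothetical. The bivariate estimate reports a clear x → z transfer, whereas conditioning on the past of y removes it.

```python
import numpy as np


def lag_matrix(v, p, start):
    """Columns v[t-1], ..., v[t-p] for t = start, ..., len(v) - 1."""
    n = len(v)
    return np.column_stack([v[start - k: n - k] for k in range(1, p + 1)])


def residual_variance(target, regressors):
    """Variance of least-squares residuals of target on [1, regressors]."""
    design = np.column_stack([np.ones(len(target)), regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.var(target - design @ coef))


def gaussian_te(source, target, p=2, conditions=()):
    """Linear-Gaussian TE source -> target, optionally conditioned on other signals.

    TE = 0.5 * ln(var_restricted / var_full); the restricted regression omits
    the source's past, the full one includes it.
    """
    y = target[p:]
    base = [lag_matrix(target, p, p)] + [lag_matrix(c, p, p) for c in conditions]
    var_restricted = residual_variance(y, np.column_stack(base))
    var_full = residual_variance(y, np.column_stack(base + [lag_matrix(source, p, p)]))
    return 0.5 * np.log(var_restricted / var_full)


# Toy chain x -> y -> z (hypothetical coefficients): x reaches z only via y.
rng = np.random.default_rng(0)
n = 20000
x = rng.standard_normal(n)
y = np.zeros(n)
z = np.zeros(n)
y[1:] = 0.9 * x[:-1] + 0.4 * rng.standard_normal(n - 1)
z[1:] = 0.9 * y[:-1] + 0.4 * rng.standard_normal(n - 1)

te_bivariate = gaussian_te(x, z)                      # picks up the indirect path
te_conditional = gaussian_te(x, z, conditions=(y,))   # conditioning on y's past removes it
print(f"bivariate TE x->z:   {te_bivariate:.3f}")
print(f"conditional TE x->z: {te_conditional:.3f}")
```

The same toy setup also hints at the misspecification issue mentioned above: whether y is included in the conditioning set changes the x → z estimate from clearly positive to essentially zero.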

Frontiers in Systems Neuroscience
www.frontiersin.org