Canonical information flow decomposition among neural structure subsets

Takahashi, Daniel Y.; Baccalá, Luiz A.; Sameshima, Koichi

doi:10.3389/fninf.2014.00049

ORIGINAL RESEARCH article

Front. Neuroinform., 30 May 2014

Volume 8 - 2014 | https://doi.org/10.3389/fninf.2014.00049

This article is part of the Research TopicInformation-based methods for neuroimaging: analyzing structure, function and dynamicsView all 16 articles

Canonical information flow decomposition among neural structure subsets

Daniel Y. Takahashi¹

Luiz A. Baccalá²^*

Koichi Sameshima³

¹Psychology Department, Neuroscience Institute, Princeton University, Princeton, NJ, USA
²Telecommunications and Control Department, Escola Politécnica, University of São Paulo, São Paulo, Brazil
³Department of Radiology and Oncology, Faculdade de Medicina, University of São Paulo, São Paulo, Brazil

Partial directed coherence (PDC) and directed coherence (DC) which describe complementary aspects of the directed information flow between pairs of univariate components that belong to a vector of simultaneously observed time series have recently been generalized as bPDC/bDC, respectively, to portray the relationship between subsets of component vectors (Takahashi, 2009; Faes and Nollo, 2013). This generalization is specially important for neuroscience applications as one often wishes to address the link between the set of time series from an observed ROI (region of interest) with respect to series from some other physiologically relevant ROI. bPDC/bDC are limited, however, in that several time series within a given subset may be irrelevant or may even interact opposingly with respect to one another leading to interpretation difficulties. To address this, we propose an alternative measure, termed cPDC/cDC, employing canonical decomposition to reveal the main frequency domain modes of interaction between the vector subsets. We also show bPDC/bDC and cPDC/cDC are related and possess mutual information rate interpretations. Numerical examples and a real data set illustrate the concepts. The present contribution provides what is seemingly the first canonical decomposition of information flow in the frequency domain.

1. Introduction

Human behavior is primarily thought as a property that emerges from the interaction of several brain areas, body parts, and the environment. Understanding how these elements dynamically interact is one of major themes of systems neuroscience. Several multivariate time series methods—old and new—have been introduced to describe the interdependence between brain areas using signal modalities like EEG, BOLD signals, MEG and LFP—and are collectively called connectivity measures. Partial directed coherence (PDC) (Baccalá and Sameshima, 2001) and directed coherence/directed transfer function (DC/DTF) (Kamiński and Blinowska, 1991) are two examples of such connectivity measures. Both describe complementary aspects (see Baccalá and Sameshima, 2014 for an in depth discussion) of how information flows between pairs of univariate time series components that belong to a multivariate vector of simultaneously observed time series (Takahashi et al., 2010). Recently, PDC and DC have been generalized (as bPDC/bDC, respectively) to describe how subsets (blocks) of components within a time series vector interrelate (Takahashi, 2009; Faes and Nollo, 2013). This is specially important for neuroscience applications as one often wants to investigate the interaction between sets of time series that are circumscribed to an observed region of interest (ROI) with respect to another physiologically relevant ROI (Nedungadi et al., 2011). The potential relevance of this type of question alone justifies looking for their deeper meaning in terms of information theoretical quantities.

Despite their practical importance, bPDC/bDC suffer from the limitation that several time series within a given subset may be irrelevant or interact in opposition to one another thereby posing interpretation difficulties. Also, in several situations, a researcher may be interested in just the few “best” descriptions of interaction between two sets of time series but not in the total amount of information flowing between them. For a more concrete example, assume that two brain areas interact and that bPDC is large. In this situation, it does not straightforwardly follow that all brain region components are interacting in the same way, or even whether some such components may be ignored. One way to address this limitation is to decompose bPDC/bDC into different components weighed according to relevance.

The aim of this article is twofold: (a) to provide a proper information theoretic interpretation for bPDC/bDC and (b) to introduce a canonical decomposition of information flows, henceforth termed, respectively, canonical PDC/DC (cPDC/cDC). These new decompositions allow us to closely mimic classical canonical correlation analysis so that different dynamically relevant interaction modes between brain areas can be exposed. Due to PDC interpretability in terms of Granger causality (Baccalá and Sameshima, 2014), a consequence of the present formulation is that cPDC represents a long sought frequency domain counterpart to time domain canonical decompositions of Granger causality (Sato et al., 2010; Ashrafulla et al., 2013).

The article is organized as follows. We first introduce the background and notation necessary for the rest of the article (section 2). In the results section (section 3), we first show that both bPDC and bDC between two subsets of processes are block coherences between suitably defined underlying processes. Then, we demonstrate that such coherences are nothing but monotonic transformations of the mutual information rate between the respective processes (Gelfand and Yaglom, 1959; Takahashi et al., 2010; Nedungadi et al., 2011) leading immediately to their interpretability as mutual information rates. Next, we introduce cPDC and cDC and prove that they are the non-zero eigenvalues of the matrices whose determinants underlie the respective bPDC and bDC definitions (section 4). Using simulated examples and publicly available data we illustrate the usefulness of cPDC/cDC (section 5) followed by a brief discussion (section 6). Proof details are left to the Appendix.

2. Background

Let X₁, …, X_K be K distinct multivariate time series vectors with dimension M₁, …, M_K. Using ^T to indicate matrix transposition, let X(t) = [X₁(t)^T, …, X_K(t)^T]^T for each time t ∈ ℤ be a second order stationary time series with spectral density matrix S(ω) at each frequency ω ∈ [−π, π). To justify our formal computation, we assume that S(ω) is uniformly bounded from below and above and invertible at all frequencies (Hannan, 1970). This is called the boundedness condition which guarantees that the following autoregressive (AR) representation of X holds in the mean square sense

\begin{matrix} X (t) = \sum_{l = 1}^{+ \infty} A (l) X (t - l) + ϵ (t), & (1) \end{matrix}

where ϵ(t) = [ϵ₁(t)^T … ϵ_K(t)^T]^T stands for a zero mean innovation process, i.e., 𝔼[ϵ(t)ϵ(t)^T] = Σ and 𝔼[ϵ(t)ϵ(l)^T] = 0 for l ≠ t. For l ≥ 1, A(l) are (M₁ + … + M_K)²-dimensional matrices. Let A_pq(l) for p, q ∈ {1, …, K} and l ≥ 1 be M_p × M_q-dimensional matrices so that A(l) has the following structure

A (l) = [\begin{matrix} A_{11} (l) & \dots & A_{1 M} (l) \\ ⋮ & ⋱ & ⋮ \\ A_{M 1} (l) & \dots & A_{M M} (l) \end{matrix}]

We define $\bar{A} (ω) = I - \sum_{l \geq 1} A (l) e^{- \sqrt{- 1} ω l}$ .

Under the boundedness condition, the following moving average (MA) mean square sense representation for the process X also holds

\begin{matrix} X (t) = \sum_{l = 0}^{+ \infty} H (l) ϵ (t - l), & (2) \end{matrix}

where H(l) for l ≥ 0 are (M₁ + … + M_K)²-dimensional matrices. Let $\bar{H} (ω) = \sum_{l \geq 0} H (l) e^{- \sqrt{- 1} ω l}$ . We have that Ā*(ω) = H⁻¹(ω) for all ω ∈ [−π, π). The superscript * indicates the matrix complex conjugate.

Let P(ω) = S⁻¹(ω). bPDC from the multivariate process X_j to the process X_i at frequency ω, denoted π^(b)_ij(ω), is defined (Takahashi, 2009; Faes and Nollo, 2013) by

\begin{matrix} π_{i j}^{(b)} (ω) = 1 - \det ​ (P_{j j} (ω) - {\bar{A}}_{i j}^{*} (ω) Σ_{i i}^{- 1} {\bar{A}}_{i j} (ω)) \det ​ {(P_{j j} (ω))}^{- 1}, & (3) \end{matrix}

where det indicates the determinant and the subscript indices relate to the natural block structure associated with the matrices.

Let Θ = Σ⁻¹. bDC from the multivariate process X_j to the process X_i at frequency ω, denoted γ^(b)_ij(ω), is defined (Takahashi, 2009; Faes and Nollo, 2013) by

\begin{matrix} γ_{i j}^{(b)} (ω) = 1 - \det ​ (S_{i i} (ω) - {\bar{H}}_{i j} (ω) Θ_{j j}^{- 1} {\bar{H}}_{i j}^{*} (ω)) \det {(S_{i i} (ω))}^{- 1} . & (4) \end{matrix}

Note that the present bDC definition differs slightly from the one in Faes and Nollo (2013). We removed the unnecessary condition of strict causality, i.e., diagonality of Σ, simply by substituting Σ⁻¹_jj by Θ⁻¹_jj in their definition of bDC as it is more suited for formulating information theoretic results as shown ahead.

Consider a second-order stationary multivariate process W(t) = [Y(t)^T Z(t)^T]^T. The block coherence between Y and Z at frequency ω is defined as (Nedungadi et al., 2011)

\begin{matrix} C_{Y Z}^{(b)} (ω) = 1 - \det (S_{W W} (ω)) \det {(S_{Y Y} (ω))}^{- 1} \det {(S_{Z Z} (ω))}^{- 1} ​ ​ . & (5) \end{matrix}

Observe that we used the process name in the subscript of the power spectrum S to indicate the corresponding spectral density matrices. In the rest of the article, we will use interchangeably the process name or the corresponding indices in the subscript whenever there is no ambiguity.

Another important definition is that of mutual information rate (MIR) between two multivariate strictly stationary processes Y and Z is

\begin{matrix} {MIR}_{Y Z} = \lim_{t \to + \infty} \frac{1}{t} 𝔼 [\log \frac{d ℙ (Y (1), \dots, Y (t), Z (1), \dots, Z (t))}{d ℙ (Y (1), \dots, Y (t)) d ℙ (Z (1), \dots, Z (t))}] ​ . & (6) \end{matrix}

The classical relationship between block coherence (Equation 5) and mutual information rate (Equation 6) follows from

Theorem. (Gelfand and Yaglom, 1959; Pinsker, 1964) If Y and Z are jointly stationary Gaussian processes satisfying the boundedness condition, we have that the MIR between Y and Z is given by

\begin{matrix} {MIR}_{Y Z} = - \frac{1}{4 π} \int_{- π}^{π} \log (1 - C_{Y Z}^{(b)} (ω)) d ω . & (7) \end{matrix}

Now, following Takahashi et al. (2010), we define, for i ∈ {1, …, K}, the partialized process η_i by

\begin{matrix} η_{i} (t) = X_{i} (t) - 𝔼 [X_{i} (t) | {X_{j} (l), j \neq i, l \in ℤ}], & (8) \end{matrix}

where 𝔼[⊙|⊙] henceforth denotes the best linear conditional predictor. Likewise the partialized innovation process ζ_i for i ∈ {1, …, K} is

\begin{matrix} ζ_{i} (t) = ϵ_{i} (t) - 𝔼 [ϵ_{i} (t) | {ϵ_{j} (t), j \neq i}] . & (9) \end{matrix}

Observe that both partialized process and partialized innovation process were defined in Takahashi et al. (2010) but for the special case of scalar η_i and ζ_i.

3. Relation Between bPDC/bDC and Mutual Information Rate

Our first result establishes the relationship between bPDC and block coherence and is analogous to Theorem 1 in Takahashi et al. (2010).

Theorem 1. Let X satisfy the boundedness condition. For all i, j ∈ {1, …, K} and all frequencies ω ∈ [−π, π) we have that

\begin{matrix} π_{i j}^{(b)} (ω) = C_{ϵ_{i} η_{j}}^{(b)} (ω) . & (10) \end{matrix}

A straightforward corollary is

Corollary 1. Let X be a stationary Gaussian process and satisfy the boundedness condition. For all i, j ∈ {1, …, K} we have that

\begin{matrix} {MIR}_{ϵ_{i} η_{j}} = - \frac{1}{4 π} \int_{- π}^{π} \log (1 - π_{i j}^{(b)} (ω)) d ω . & (11) \end{matrix}

4. Canonical PDC and DC

Canonical correlation is a classical method developed initially by Hotelling (1936) to address the relationship between random vectors. Brillinger (1981) generalized the method for time series and gave an excellent account of the relationship between canonical correlation analysis and different ideas in multivariate statistics. Our formulation of canonical coherence is equivalent to the definition introduced by Brillinger (1981).

Let Y and Z be respectively M_Y- and M_Z-dimensional jointly second order stationary processes. To better understand the relationship between Y and Z, we can ask the following question: Which components of Y and Z are most representative of the interaction between the processes? One way to formalize this is to consider filtering matrices B_Y(l) (1 × M_Y) and B_Z(l) (1 × M_Z), for all l ∈ ℤ and define the scalar processes b_Y and b_Z by

\begin{matrix} b_{Y} (t) = \sum_{l \in ℤ} B_{Y} (l) Y (t - l) & (14) \end{matrix}

and

\begin{matrix} b_{Z} (t) = \sum_{l \in ℤ} B_{Z} (l) Z (t - l), & (15) \end{matrix}

so that C_{b_Yb_Z}(ω) is maximized for all ω ∈ [−π, π). If furthermore Y and Z are jointly stationary Gaussian processes, then this is equivalent to maximizing MIR_{b_Yb_Z}.

Following the above idea, we define the first canonical coherence between Y and Z at frequency ω by

\begin{matrix} C_{Y Z}^{(c_{1})} (ω) = \sup_{B_{Y}, B_{Z}} C_{b_{Y} b_{Z}} (ω) . & (16) \end{matrix}

Assume that the supremum (Equation 16) is achieved for b_Y and b_Z, which we call first canonical time series. Consider the residual processes Y¹(t) = Y(t) − 𝔼[Y(t)| {b_Y(l), l ∈ ℤ}] and Z¹(t) = Z(t) − 𝔼[Z(t)| {b_Z(l), l ∈ ℤ}]. Observe that Y¹ and Z¹ are uncorrelated to the processes b_Y and b_Z, respectively. The second canonical coherence C^(c₂)_YZ(ω) is defined recursively on the residues by C^(c₂)_YZ(ω) = C^(c₁)_Y¹Z¹(ω).

Analogously, for 2 ≤ m ≤ min{M_Y, M_Z}, considering the residual processes

\begin{array}{l} Y^{m} (t) = Y^{m - 1} (t) - 𝔼 ​ [Y^{m - 1} (t) | {{\bar{b}}_{Y^{k}} (l), l \in ℤ, \\ k \in {1, \dots, m - 1}}] \end{array}

and

\begin{array}{l} Z^{m} (t) = Z^{m - 1} (t) - 𝔼 ​ [Z^{m - 1} (t) | {{\bar{b}}_{Z^{k}} (l), l \in ℤ, \\ k \in {1, \dots, m - 1}}] ​, \end{array}

one may define the m-th canonical coherence as

\begin{matrix} C_{Y Z}^{(c_{m})} (ω) = C_{Y^{m - 1} Z^{m - 1}}^{(c_{1})} (ω) . & (17) \end{matrix}

In this way, it is possible to construct a hierarchy of coherences where each element captures the dependence structure that is not explained by the other elements.

Finally, we introduce cPDC and cDC. For m ≤ min{M_i, M_j}, the m-th canonical PDC from j to i at frequency ω denoted π^(c_m)_ij(ω) is defined by

\begin{matrix} π_{i j}^{(c_{m})} (ω) = C_{ϵ_{i} η_{j}}^{(c_{m})} (ω) . & (18) \end{matrix}

Similarly, the m-th canonical DC from j to i at frequency ω denoted γ^(c_m)_ij(ω) is defined by

\begin{matrix} γ_{i j}^{(c_{m})} (ω) = C_{X_{i} ζ_{j}}^{(c_{m})} (ω) . & (19) \end{matrix}

At first sight, it is unclear whether the canonical PDC and DC exist at all or even if they are uniquely defined. More importantly, nor is it obvious that it is possible to compute them. Despite these initial uncertainties, we show next that canonical coherences are consistently defined as the non-null eigenvalues of some specific matrices.

Let λ^m(Q) denote its m-th eigenvalue from matrix Q ordered from its largest to its smallest value. The following theorem furnishes a practical way to calculate cPDC and cDC.

Theorem 3. Under the boundedness condition for X the following identities hold:

\begin{matrix} π_{i j}^{(c_{m})} (ω) = λ^{m} ({\bar{A}}_{i j}^{*} (ω) Σ_{i i}^{- 1} {\bar{A}}_{i j} (ω) P_{j j}^{- 1} (ω)) & (20) \end{matrix}

and

\begin{matrix} γ_{i j}^{(c_{m})} (ω) = λ^{m} (S_{i i}^{- 1} (ω) {\bar{H}}_{i j} (ω) Θ_{j j}^{- 1} {\bar{H}}_{i j}^{*} (ω)) ​ . & (21) \end{matrix}

Furthermore it is possible to relate bPDC/bDC and cPDC/cDC via

Theorem 4. Under the same conditions of Theorem 3 the following identities hold:

\begin{matrix} π_{i j}^{(b)} (ω) = 1 - \prod_{m = 1}^{\min {M_{i}, M_{j}}} (1 - π_{i j}^{(c_{m})} (ω)) & (22) \end{matrix}

and

\begin{matrix} γ_{i j}^{(b)} (ω) = 1 - \prod_{m = 1}^{\min {M_{i}, M_{j}}} (1 - γ_{i j}^{(c_{m})} (ω)) ​ . & (23) \end{matrix}

A simple consequence of Equations (22), (23) is that for stationary Gaussian processes satisfying the boundedness condition, we now have a decomposition of the mutual information rates

\begin{matrix} {MIR}_{ϵ_{i} η_{j}} = \sum_{m = 1}^{\min {M_{i}, M_{j}}} - \frac{1}{4 π} \int_{- π}^{π} \log (1 - π_{i j}^{(c_{m})} (ω)) d ω & (24) \end{matrix}

and

\begin{matrix} {MIR}_{X_{i} ζ_{j}} = \sum_{m = 1}^{\min {M_{i}, M_{j}}} - \frac{1}{4 π} \int_{- π}^{π} \log (1 - γ_{i j}^{(c_{m})} (ω)) d ω . & (25) \end{matrix}

Note how the quantities being summed in Equations (24), (25) are formally themselves contributions to the mutual information written in terms of their canonical coherence contributions.

5. Illustrations

5.1. Simulated Models

Example 1. To provide insight into cPDC behavior, we begin with a very simple example that can be fully and explicitly solved.

Let a vector of observed time series [Y₁, Y₂, Y₃, Y₄] be a real valued autoregressive process of order p = 1 and Σ = I. The autoregressive coefficients of the model are described by

\begin{matrix} A (1) = (\begin{matrix} .5 & f & 0 & 0 \\ e & .5 & 0 & 0 \\ a & b & .5 & h \\ c & d & g & .5 \end{matrix}) ​, & (26) \end{matrix}

as in Figure 1.

FIGURE 1

Figure 1. Connectivity diagram for Example 1. The number of canonical components depends on the value of ad − bc.

By adopting time series blocks as X₁ = [Y₁ Y₂] and X₂ = [Y₃ Y₄], when e = f = g = h = 0, direct computation shows that the canonical PDC from block X₂ to X₁ is zero, i.e., π^(c₁)₁₂(ω) = π^(c₂)₁₂(ω) = 0 for all ω (reflecting the nullity of the 2 × 2 A(l) right side upper block), whereas the coupling in the opposite direction contributes two distinct components:

\begin{matrix} π_{21}^{(c_{1})} (ω) = \frac{a^{2} + b^{2} + c^{2} + d^{2} + \sqrt{{(a^{2} + b^{2} + c^{2} + d^{2})}^{2} - 4 {(a d - b c)}^{2}}}{2.5 - 2 \cos (ω)} & (27) \end{matrix}

and

\begin{matrix} π_{21}^{(c_{2})} (ω) = \frac{a^{2} + b^{2} + c^{2} + d^{2} - \sqrt{{(a^{2} + b^{2} + c^{2} + d^{2})}^{2} - 4 {(a d - b c)}^{2}}}{2.5 - 2 \cos (ω)} . & (28) \end{matrix}

For ad = bc—i.e., if the lower left 2× 2 block determinant of A(l) is zero as well, the total number of non-zero cPDC components reduces to just 1.

Even if e, f, g, h are non-zero, i.e., regardless of intrablock dynamics, a = b = 0 suffices to produce the single non-zero π^(c₁)₂₁(ω) component (shown in Figure 2A) since block X₁ interacts with block X₂ exclusively through Y₄, i.e., π^(c₂)₂₁(ω) ≡ 0. In this case, since only Y₄ is directly impacted by the interaction, only one combined source of variance exists even though two links exist between the blocks. Likewise if b = d = 0, even though two links leave X₁, there is only one dynamical component that counts.

FIGURE 2

Figure 2. Illustrative plots of the observations in Example 1. (A) cPDC₂₁ results for e = f = g = h = 0 in Example 1 revealing just one non-zero component under the ad = bc condition. (B) cPDC₂₁ when b = c = 0 and non-zero a and c in Example 1 leading to two non-zero components for any non-zero values of the e, f, g, h coefficients (the graph shown was produced using a = 0.5, b = 0, c = 0, d = 1, e = 0.3, f = −0.1, g = 0.3, h = 0.4).

This contrasts with the situation when b = c = 0 where two non-zero π^(c₂)₂₁(ω) coexist (Figure 2B) regardless of the values of e, f, g, h which, nonetheless, contribute to the relative size of the components.

Example 2. In the next example, a 10-variate time series (Y₁, …, Y₁₀) follows the connectivity diagram represented in Figure 3. The multivariate time series is divided into four blocks (X₁, X₂, X₃, and X₄), where X₄ only sends information and X₃, which is an integrative block, only receives information. Block X₁ has two functionally distinct internal parts, and only one is reached by outside influence. The scenario is fairly complicated and we next illustrate cPDC/cDC usefulness for understanding the underlying dynamic interaction between blocks.

FIGURE 3

Figure 3. Connectivity diagram for Example 2 portraying how the bock sets. Note the effect of the value of the a parameter on cDC (Figure 5 versus Figure 6).

To help interpret the results, we begin by describing the non-zero model coefficients and their dynamical effects. Observe that the model subscript indices in this example indicate the corresponding scalar process and not the block number.

1. Block X₁ = [Y₁ Y₂ Y₃ Y₄ Y₅]

\begin{matrix} \begin{array}{l} A_{1, 1} (1) = 1.98 c o s (π / 50), A_{1, 1} (2) = - {(.99)}^{2}, \\ (low frequency oscillator in Y_{1}) \end{array} & (29) \end{matrix}

\begin{matrix} A_{2, 3} (1) = 1, & (30) \end{matrix}

\begin{matrix} \begin{array}{l} A_{3, 3} (1) = 1.98 c o s (π / 2), A_{3, 3} (2) = - {(.99)}^{2}, \\ (oscillator at midband (π /2) in Y_{3}) \end{array} & (31) \end{matrix}

\begin{matrix} \begin{array}{l} A_{5, 4} (1) = . 99, A_{4, 5} (1) = -. 99, \\ (oscillator at midband in [Y_{4} Y_{5}]) \end{array} & (32) \end{matrix}

\begin{matrix} A_{8, 2} (1) = 1, A_{8, 2} (3) = 1, (midband notch) & (33) \end{matrix}

\begin{matrix} A_{6, 3} (1) = 1, A_{6, 3} (3) = 1, (midband notch) & (34) \end{matrix}

\begin{matrix} A_{9, 1} (1) = 1, A_{9, 5} (1) = 1 . & (35) \end{matrix}

2. Block X₂ = [Y₆ Y₇]

\begin{matrix} \begin{array}{l} A_{6, 7} (1) = . 99, A_{7, 6} (1) = -. 99, \\ (oscillator identical to the [Y_{4} Y_{5}]) \end{array} & (36) \end{matrix}

\begin{matrix} A_{9, 6} (1) = 1 . & (37) \end{matrix}

3. Block X₃ = [Y₈ Y₉]

\begin{matrix} A_{8, 8} (1) = - 1, A_{9, 8} (1) = .5 . & (38) \end{matrix}

4. Block X₄ = [Y₁₀]

\begin{matrix} \begin{array}{l} A_{10, 10} (1) = 1.98 c o s (2 π / 3), A_{10, 10} (2) = - {(.99)}^{2}, \\ (high frequency oscillator in Y_{10}) \end{array} & (39) \end{matrix}

\begin{matrix} A_{4, 10} (1) = a, & (40) \end{matrix}

\begin{matrix} A_{7, 10} (1) = 1. & (41) \end{matrix}

The resulting cPDC components can be appreciated in Figure 4 for |a| = 1. Among their interesting features is the existence of the notch filtered link from X₁ to X₂ and to X₃ at midband. The effects of the low frequency dynamics due to Y₁ and the midband resonance due to [Y₄ and Y₅] manifests itself as the strongest component from X₁ to X₃. Likewise the single link effect from X₂ to X₃ is readily apparent as the higher frequency resonances from X₄ toward both X₁ and X₂. Both X₃ components are identically equal to 1 since nothing leaves the block.

FIGURE 4

Figure 4. The cPDC for Example 2 reflects the existence of the notch connecting filters from X₁ to X₂ and to X₃. The intrinsic dynamics of the oscillators from a subregion of X₁ into X₃ is apparent in the resonances of the largest cPDC component. The resonance within block X₂ manifest itself in the single non-zero component into X₃ while the effect of X₄ reaches symmetrically into X₁ and X₂ via its single dynamic component. In this and following two figures, each subfigure may contain up to five cPDC/cDC components, given by min(M_i, M_j) as in Equations (22)/(23), represented in red, blue, yellow, green, or black lines in decreasing order of magnitude.

The corresponding cDCs are portrayed in Figure 5 for a = −1 with no signal reachability from X₄ to X₃. This contrasts markedly with Figure 6 for a = 1 where X₄'s indirect effects on X₃ are not balanced out.

FIGURE 5

Figure 5. cDC for Example 2 for a = −1 leading to a cancelation of the effect of X₄ on X₃ as the signal travels indirectly through two exactly identical structures but with opposite phases before reaching X₃. The notch filtering action is also apparent from the cDCs from X₁ to X₂ and X₃.

FIGURE 6

Figure 6. cDC for Example 2 with a = 1 which differs from Figure 5 in the effect from X₄ to X₃ which no longer cancels out.

The effects of the notch connections are readily apparent in both cases. For example, the power associated with the notch frequencies are the local components to X₂ and X₃ and cannot be attributed to outside influence. For block X₁ only one of the five components is different from 1 reflecting the contribution coming from X₄.

5.2. Empirical Data

This example is based on EEG data borrowed from Sameshima et al. (2014) (Ex. 7.7), which describes a left mesial temporal ictal episode monitored using an extended 10–20 system. The midline electrodes were excluded and left (L) and right (R) side electrodes were grouped as to whether they were frontal (F), central (C), parietal (P), temporal (T) or occipital (O) leading to the canonical PDCs portrayed in Figure 7 where the most important connecting blocks share a dominant low pass frequency canonical component of fairly identical shape pointing to the existence of a shared dominant connectivity dynamics behind the observation, see Figure 8A. Their connectivity is further summarized in Figure 8B.

FIGURE 7

Figure 7. cPDC from the Empirical Data example (section 5.2) from the left mesial ictal episode where the largest components are represented either in red or green. cPDC values in red were arbitrarily considered significant and were pictorially summarized in Figure 8B.

FIGURE 8

Figure 8. (A) This corresponds to Figure 7.13 from Sameshima et al. (2014) showing the gPDC connectivity graph (see arrows) and the scalp electrodes grouping sets corresponding to frontal (LF, RF), temporal (LT, RT), central (LC, RC), pariental (LP, RP), and occipital (LO, RO) areas. The midline electrodes in gray were not considered in this analysis. (B) Diagram for the significant first cPDC components in Figure 7 (red lines) showing scalp electrode set connections shown in (A). Notice some divergences between gPDC and cPDC graphs possibly due the lack of proper rigorous statistics usage for cPDC significance level estimation, for instance, there is cPDC from RO to LO (B), but gPDC O2 to O1 is absent (A), while there is gPDC from C4 to T1 without corresponding cPDC from RC to LT.

6. Discussion

We showed that bPDC/bDC introduced in Takahashi (2009) and Faes and Nollo (2013) are block coherences between properly chosen vector time series. When the time series are Gaussian, this implies that bPDC/bDC represent mutual information rates between well defined underlying vector time series. This fully generalizes the results presented in Takahashi et al. (2010). To enhance the understanding of the possibly complex interaction between multiple time series and overcome some bPDC/bDC limitations, we showed that the latter can be decomposed in canonical terms that we call cPDC/cDC. These decompositions represent the various different modes of interaction whereby sets of time series interact. We introduced an explicit way to compute these new quantities and proved some of their properties. The usefulness of cPDC/cDC was illustrated by three examples.

6.1. bPDC and BDC as Block Coherences

Takahashi et al. (2010) showed that PDC from the j-th scalar time series to the i-th scalar time series is the coherence between the i-th innovation process and the j-th partialized process with a similar result for DC. It is natural to ask whether an analogous result holds for bPDC and bDC. We showed that this is indeed the case where bPDC/bDC represent block coherences relating subsets of adequately defined innovations/partialization processes (Takahashi, 2009; Nedungadi et al., 2011). At first sight these identities may seem surprising as both bPDC and bDC are fully multivariate and directional measures of dependence, whereas block coherences are at once block-pairwise and symmetric measures of dependence. Yet careful reading of Theorems 1 and 2 highlights that bPDC/bDC from j to i and bPDC/bDC from i to j are, in general, block coherences between distinct pairs of vector processes which explains their asymmetric nature and lends them their directed connectivity character. Also, we note that for both bPDC and bDC, the coherences involve innovation process subsets which explains their fully multivariate characteristic as measures. Another interesting observation is that since the innovation processes are uncorrelated to the past of the partialized processes by construction, in the case of bPDC only innovations in the past of the partialized process contribute to the coherence which explains why bPDC is a directed measure of dependence. An analogous observation holds for bDC. In the Gaussian case, the bPDC/bDC representation as a block coherence allows relating them to the mutual information rate between suitably chosen time series. Formally this justifies the idea that these quantities are de facto measures of information flow. For an interesting comparison between bPDC/bDC and Geweke's measure of linear feedback see Faes and Nollo (2013). As a small note for the reader, we observe that our definition of bPDC/bDC is slightly more general than the one proposed by Faes and Nollo (2013) because the covariance matrix of the innovations does not need to be diagonal as they assumed.

6.2. Canonical Decomposition of Directional Measures

Given a pair of random vectors, it is natural to ask how to measure/represent dependence between them. In statistics, there are two main methods, both inspired by the basic Pearson correlation, to address this. The first one generalizes Pearson correlation directly using the determinants of the covariance matrix between and within each set of random variables. For time series, the equivalent measure in the frequency domain is the block coherence and the directed versions are bPDC and bDC. A second generalization rests on the idea of canonical correlation introduced by Hotelling (1936). There are several generalizations of canonical correlation for time series taylored specifically to infer Granger causality in the time domain (Sato et al., 2010; Wu et al., 2011), but, to the best of our knowledge, cPDC and cDC are the first proposals of canonical measures of directed dependence in the frequency domain.

One advantage of cPDC/cDC over bPDC/bDC is that canonical decomposition allows inferring the various different existing modes of interaction between sets of time series in close analogy to what is done for classical canonical correlation and principal component analyses. One should expect this to be useful when several signals are redundant, generated by similar mechanisms, or when there are several time series that do not significantly contribute to the interaction between sets of time series, e.g., when there are many brain areas that are not interacting with each other during some specific behavior. Besides, as we show in Theorem 4, we can recover the bPDC/bDC from the cPDC/cDC.

6.3. Interpreting cPDC/cDC

The main practical interest of cPDC/cDC is to allow the simplification of connectivity interpretations whilst giving new insights into the dynamical interaction between neural structures. We illustrated the achievable simplification using an EEG data set from an epileptic patient. We also showed how cPDC is related to the number of “modes” of interaction between sets of time series through the simple numerical Example 1 and via the slightly more complex Example 2. We expect that cPDC/cDC together with bPDC/bDC become useful tools for handling high dimensional data sets that are increasingly being recorded by several researchers.

We propose that a reasonable way to understand the usefulness of cPDC/cDC is to make an analogy with classical principal component and canonical correlation analyses. Therefore, similar heuristics could be applied in practical situations, for example, to decide the number of different components to include in the interpretation. The canonical time series b_Y and b_Z from section 4 (see also Brillinger, 1981) are analogous to the canonical variables from the classical canonical correlation analysis and can play a similar role for result interpretation.

Finally we remark that the computational procedures used for the present paper will be made available the PDC homepage at http://www.lcs.poli.usp.br/~baccala/pdc/canon together with the data used in section 5.2.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

CNPq Grants 307163/2013-0 to Luiz A. Baccalá and 309381/2012-6 to Koichi Sameshima are also gratefully acknowledged and to NAPNA—Núcleo de Neurociência Aplicada from the University of São Paulo. Part of this work took place during FAPESP Grant 2005/56464- 9 (CInAPCe). Daniel Y. Takahashi was partially supported by Pew Latin American Fellowship and Ciência sem Fronteiras Fellowship-CNPq grant (246778/2012-1).

References

Ashrafulla, S., Haldar, J. P., Joshi, A. A., and Leahy, R. M. (2013). Canonical Granger causality between regions of interest. Neuroimage 83, 189–199. doi: 10.1016/j.neuroimage.2013.06.056

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Baccalá, L. A., and Sameshima, K. (2001). Partial directed coherence: a new concept in neural structure determination. Biol. Cybern. 84, 463–474. doi: 10.1007/PL00007990

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Baccalá, L. A., and Sameshima, K. (2014). “Multivariate time series brain connectivity: a sum up,” in Methods in Brain Connectivity Inference Through Multivariate Time Series Analysis, eds K. Sameshima and L. A. Baccalá (Boca Raton: CRC Press), 245–251. doi: 10.1201/b16550-18

CrossRef Full Text

Brillinger, D. R. (1981). Time Series: Data Analysis and Theory. Classics in applied mathematics. Vol. 36. San Francisco, CA: SIAM, Society for Industrial and Applied Mathematics; Holden-Day.

Faes, L., and Nollo, G. (2013). Measuring frequency domain Granger causality for multiple blocks of interacting time series. Biol. Cybern. 107, 217–232. doi: 10.1007/s00422-013-0547-5

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Gelfand, I. M., and Yaglom, A. M. (1959). Calculation of amount of information about a random function contained in another such function. Am. Math. Soc. Transl. Ser. 2, 3–52.

Hannan, E. J. (1970). Multiple Time Series (Wiley Series in Probability and Mathematical Statistics). New York, NY: Wiley. doi: 10.1002/9780470316429

CrossRef Full Text

Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28, 321–377. doi: 10.2307/2333955

CrossRef Full Text

Kamiński, M., and Blinowska, K. J. (1991). A new method of the description of the information flow in brain structures. Biol. Cybern. 65, 203–210. doi: 10.1007/BF00198091

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Lütkepohl, H. (1996). Handbook of Matrices. Chichester: John Wiley.

Nedungadi, A. G., Ding, M., and Rangarajan, G. (2011). Block coherence: a method for measuring the interdependence between two blocks of neurobiological time series. Biol. Cybern. 104, 197–207. doi: 10.1007/s00422-011-0429-7

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Pinsker, M. S. (1964). Information and Information Stability of Random Variables and Processes. San Francisco, CA: Holden-Day.

Sameshima, K., Takahashi, D. Y., and Baccalá, L. A. (2014). “Asymptotic PDC properties,” in Methods in Brain Connectivity Inference through Multivariate Time Series Analysis, eds K. Sameshima and L. A. Baccalá (Boca Raton: CRC Press), 113–131. doi: 10.1201/b16550-9

CrossRef Full Text

Sato, J. R., Fujita, A., Cardoso, E. F., Thomaz, C. E., Brammer, M. J., and Amaro, E. Jr. (2010). Analyzing the connectivity between regions of interest: an approach based on cluster Granger causality for fMRI data analysis. Neuroimage 52, 1444–1455. doi: 10.1016/j.neuroimage.2010.05.022

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Takahashi, D. Y. (2009). Medidas de Fluxo de Informação com Aplicação em Neurociência. Ph.D. thesis, University of São Paulo. Available online at: http://www.teses.usp.br/teses/disponiveis/95/95131/tde-07062011-115256/en.php

Takahashi, D. Y., Baccalá, L. A., and Sameshima, K. (2010). Information theoretic interpretation of frequency domain connectivity measures. Biol. Cybern. 103, 463–469. doi: 10.1007/s00422-010-0410-x

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

Wu, G. R., Chen, F., Kang, D., Zhang, X., Marinazzo, D., and Chen, H. (2011). Multiscale causal connectivity analysis by canonical correlation: theory and application to epileptic brain. IEEE Trans. Biomed. Eng. 58, 3088–3096. doi: 10.1109/TBME.2011.2162669

Pubmed Abstract | Pubmed Full Text | CrossRef Full Text

A. Appendix

A.1. Proof of Theorems 1 and 2 and Corollaries 1 and 2

The proofs in this section follow the pattern of those in Takahashi et al. (2010). The chief difference lies in the care needed regarding the order of the products between the defining matrices. Here we exhibit the main proof ingredients for reader convenience, with further details available in Takahashi (2009) and Takahashi et al. (2010).

Proof of Theorem 1 and Corollary 1. Let W = [Y^T Z^T]^T be a second order stationary process satisfying the boundedness condition, using the following well known identity for determinants (Lütkepohl, 1996)

\begin{matrix} \begin{array}{l} \det (S_{W} (ω)) = \det (S_{Z Z} (ω) \\ - S_{Z Y} (ω) S_{Y Y}^{- 1} (ω) S_{Y Z} (ω)) \det (S_{Y Y} (ω)) \end{array} & (A 1) \end{matrix}

leads to

\begin{matrix} \begin{array}{l} C_{Y Z}^{(b)} (ω) = 1 - \det (S_{Z Z} (ω) \\ - S_{Z Y} (ω) S_{Y Y}^{- 1} (ω) S_{Y Z} (ω)) \det (S_{Z Z}^{- 1} (ω)) \end{array} & (A 2) \end{matrix}

\begin{matrix} \begin{array}{l} = 1 - \det (S_{Y Y} (ω) \\ - S_{Y Z} (ω) S_{Z Z}^{- 1} (ω) S_{Z Y} (ω)) \det (S_{Y Y}^{- 1} (ω)), \end{array} & (A 3) \end{matrix}

under Equation (5).

Rewrite bPDC as

\begin{matrix} \begin{array}{l} π_{i j}^{(b)} (ω) = 1 - \det (P_{j j}^{- 1} (ω) \\ - P_{j j}^{- 1} (ω) {\bar{A}}_{i j}^{*} (ω) Σ_{i i}^{- 1} {\bar{A}}_{i j} (ω) P_{j j}^{- 1} (ω)) \det (P_{j j} (ω)), \end{array} & (A 4) \end{matrix}

so that using the following identities proved in Takahashi et al. (2010)

\begin{matrix} P_{j j} (ω) = S_{η_{j} η_{j}}^{- 1} (ω), & (A 5) \end{matrix}

\begin{matrix} S_{ϵ_{i} η_{j}} (ω) = {\bar{A}}_{i j} (ω) S_{η_{j} η_{j}} (ω), & (A 6) \end{matrix}

and for all ω ∈ [−π, π)

\begin{matrix} S_{ϵ_{i} ϵ_{i}} (ω) = Σ_{i i}, & (A 7) \end{matrix}

back substituted into Equation (A4) leads to

\begin{matrix} \begin{array}{l} π_{i j}^{(b)} (ω) = 1 - \det (S_{η_{j} η_{j}} (ω) \\ - S_{η_{j} ϵ_{i}} (ω) Σ_{i i}^{- 1} S_{ϵ_{i} η_{j}} (ω)) \det (S_{η_{j} η_{j}}^{- 1} (ω)), \end{array} & (A 8) \end{matrix}

so that using Equation (A2) shows that the right-hand side of Equation (A8) actually is C^(b)_{ϵ_iη_j}(ω) as we set out to prove. Corollary 1 is immediate from Theorem 1 and Equation (7). □

Proof of Theorem 2 and Corollary 2. Theorem 2 is obtained by rewriting C^(b)_{X_iζ_j}(ω) using Equation (A3) noting that

\begin{matrix} S_{X_{i} ζ_{j}} (ω) = {\bar{H}}_{i j} (ω) S_{ζ_{j} ζ_{j}} (ω) & (A 9) \end{matrix}

and for all ω ∈ [−π, π)

\begin{matrix} S_{ζ_{j} ζ_{j}} (ω) = Θ_{j j}^{- 1} . & (A 10) \end{matrix}

Corollary 2 follows from Theorem 2 and Equation (7). □

A.2. Proof of Theorems 3 and 4

Brillinger, (1981, chapter 10) introduced the idea of canonical coherence for time series. We restate his result under our notation as the following theorem.

Theorem 5 (Brillinger, Theorem 10.3.2) Let X and Y be m₁ and m₂-dimensional time-series jointly satisfying the boundedness condition. For m ≤ min{m₁, m₂}, the following identity holds:

\begin{matrix} C_{X Y}^{(c_{m})} (ω) = λ^{m} (S_{Y Y}^{- 1} (ω) S_{Y X} (ω) S_{X X}^{- 1} (ω) S_{X Y} (ω)) & (A 11) \end{matrix}

\begin{matrix} = λ^{m} (S_{X X}^{- 1} (ω) S_{X Y} (ω) S_{Y Y}^{- 1} (ω) S_{Y X} (ω)) . & (A 12) \end{matrix}

Proof of Theorem 3. From Equations (18), (A11), we have

\begin{matrix} C_{ϵ_{i} η_{j}}^{(c_{m})} (ω) = λ^{m} (S_{η_{j} η_{j}}^{- 1} (ω) S_{η_{j} ϵ_{i}} (ω) S_{ϵ_{i} ϵ_{i}}^{- 1} (ω) S_{ϵ_{i} η_{j}} (ω)) . & (A 13) \end{matrix}

Now, from Equations (A5), (A6), and (A7) it follows that

\begin{matrix} S_{η_{j} η_{j}}^{- 1} (ω) S_{η_{j} ϵ_{i}} (ω) S_{ϵ_{i} ϵ_{i}}^{- 1} (ω) S_{ϵ_{i} η_{j}} (ω) = {\bar{A}}_{i j}^{*} (ω) Σ_{i i}^{- 1} {\bar{A}}_{i j} (ω) P_{j j}^{- 1} (ω), & (A 14) \end{matrix}

which proves Equation (20).

To prove Equation (21), we use Equations (19), (A12) to obtain

\begin{matrix} C_{X_{i} ζ_{j}}^{(c_{m})} (ω) = λ^{m} (S_{X_{i} X_{i}}^{- 1} (ω) S_{X_{i} ζ_{j}} (ω) S_{ζ_{j} ζ_{j}}^{- 1} (ω) S_{ζ_{j} X_{i}} (ω)) . & (A 15) \end{matrix}

Finally, from Equations (A9), (A10), we have

\begin{matrix} S_{X_{i} X_{i}}^{- 1} (ω) S_{X_{i} ζ_{j}} (ω) S_{ζ_{j} ζ_{j}}^{- 1} (ω) S_{ζ_{j} X_{i}} (ω) = S_{i i}^{- 1} (ω) {\bar{H}}_{i j} (ω) Θ_{j j}^{- 1} {\bar{H}}_{i j}^{*} (ω), & (A 16) \end{matrix}

which concludes the proof. □

Proof of Theorem 4. Rewrite bPDC as

\begin{matrix} 1 - π_{i j}^{(b)} (ω) = \det (I - {\bar{A}}_{i j}^{*} (ω) Σ_{i i}^{- 1} {\bar{A}}_{i j} (ω) P_{j j}^{- 1} (ω)) . & (A 17) \end{matrix}

Now, Equation (22) is a straightforward consequence of the relationship between eigenvalues and the determinant of a matrix. A similar argument proves Equation (23). □

Keywords: directed connectivity measures, canonical decomposition, frequency domain, information flow, generalized coherence

Citation: Takahashi DY, Baccalá LA and Sameshima K (2014) Canonical information flow decomposition among neural structure subsets. Front. Neuroinform. 8:49. doi: 10.3389/fninf.2014.00049

Received: 17 January 2014; Accepted: 23 April 2014;
Published online: 30 May 2014.

Edited by:

Daniele Marinazzo, University of Gent, Belgium

Reviewed by:

Katerina Hlavackova-Schindler, University of Life Sciences, Austria
Angeliki Papana, University of Macedonia, Greece

Copyright © 2014 Takahashi, Baccalá and Sameshima. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Luiz A. Baccalá, Telecommunications and Control Department, Escola Politécnica, University of São Paulo, Av. Prof. Luciano Gualberto, trav. 3,#158, São Paulo, SP 05508-900, Brazil e-mail:YmFjY2FsYUBsY3MucG9saS51c3AuYnI=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.