Information theoretic measures of causal influences during transient neural events

Introduction: Transient phenomena play a key role in coordinating brain activity at multiple scales, however their underlying mechanisms remain largely unknown. A key challenge for neural data science is thus to characterize the network interactions at play during these events. Methods: Using the formalism of Structural Causal Models and their graphical representation, we investigate the theoretical and empirical properties of Information Theory based causal strength measures in the context of recurring spontaneous transient events. Results: After showing the limitations of Transfer Entropy and Dynamic Causal Strength in this setting, we introduce a novel measure, relative Dynamic Causal Strength, and provide theoretical and empirical support for its benefits. Discussion: These methods are applied to simulated and experimentally recorded neural time series and provide results in agreement with our current understanding of the underlying brain circuits.


Introduction
During both wakefulness and sleep, the mammalian brain is able to implement numerous functions key to our survival with extraordinary reliability.This implies precise coordination of transient mechanisms at multiple spatiotemporal scales ensuring both the synergy between brain regions contributing to the same task, and the non-interference between network activities in charge of different functions.Evidence for such transient mechanisms is provided by the variety of neural events that can be observed in brain activity across multiple structures.Such phenomena may occur in response to stimuli, as has been observed for gamma oscillations [Tallon-Baudry andBertrand, 1999, Fries, 2015], and may play a role in the dynamic encoding of information.However, key phenomena can also occur spontaneously, as exemplified by the variety of events occurring during sleep.These include sharp-wave ripple (SWR) complexes that occur in the hippocampus during the same sleep stages, and take the form of a slow deflection (the sharp wave, SW) superimposed with a fast short-lived oscillation (the ripple).SWR has been extensively studied and a large set of evidence supports its key role in episodic memory consolidation and the recall of previous experiences [Ego-Stengel and Wilson, 2010, Diba and Buzsaki, 2007, Lee and Wilson, 2002].
In order to understand how these transient phenomena operate mechanistically, causality measures based on observed neural time series can be very helpful to quantify the underlying transient influences between brain structures.Several measures of causality have been proposed, starting in the econometrics literature with Granger causality (GC) [Granger, 1969], relying on vector auto-regressive models.This measure can be generalized to an information-theoretic quantity: Transfer Entropy (TE) [Schreiber, 2000].In the present work, we focus on "model-free" quantities such as TE that are defined independently of the specific functional relationships entailed by a particular model of the dynamics.TE and GC have been used to assess the significance of causal links, but also the "strength" of these links.However, whether they are appropriate quantities to measure such strength is debated [Stokes andPurdon, 2017, Janzing et al., 2013].
Structural Causal Models (SCM) also allow causality measures to be evaluated by their ability to reflect putative interventions targeting a specific mechanism composing the system under consideration.In this context, the relevance of causality measures has been investigated in Ay and Polani [2008], which discusses how to account for the effect of knockout experiments, and introduces a measure of information flow, emphasizing its desirable properties.Janzing et al. [2013] provides interesting theoretical justifications for this kind of measure and extends it to define causal strength (CS) of an arbitrary set of arrows in a graphical model.With respect to TE, information flow and CS have the benefit to be local, in the sense that it depends only on the direct causes of the observed effects and their associated mechanisms.This makes CS a good candidate to measure transient connectivity changes during non-stationary neural events, as they would be able to restrict themselves to causal influences that take place at a specific time, associated to specific arrows in the "unrolled" causal graph describing time-varying interactions.
However, we will argue that such a measure may not reflect well a key element for neuroscientists: the role played by transient dynamics occurring in the "source" region in driving events in the target region.Based on the potential outcome framework [Rubin, 1974], causal reasoning has also been used to provide intuitive measures of the causal impact of a specific phenomenon happening at a given time point [Brodersen et al., 2015], by comparing it to a scenario where this phenomenon does not happen.This inspired us to take into account the peri-event change of signals compared to a pre-event stage as another component of causal influence.Therefore, in this paper, we use the lens of interventions in SCMs to propose a principled quantification of the strength of causal interactions in peri-event time series, i.e. dataset collected specifically around the times of occurrence of an identified phenomenon in neural signals.Based on information theoretic analyses, we assess the relevance and issues raised by a time-varying implementation of GC, TE and causal strength (DCS), and extend DCS to a novel measure, the relative DCS (rDCS), to quantify causal influences reflected by both the connectivity and the event-related change at the cause.We show theoretically that rDCS is effective in uncovering dynamic causal influences for task-dependent events that are often accompanied with a deterministic component, as well as for spontaneous events.We also demonstrate how choices made for aligning peri-event time series collected across multiple occurrences of these events may bias causality measures, and we propose a proper way to align the detected events to recover the ground truth causal direction for a uni-directionally coupled system.The benefits of rDCS over TE and DCS is demonstrated by both simulated toy models and neurophysiological recordings of SWRs.Overall, our results suggest that rDCS helps better understand the causal interactions between transient dynamical events, and thus uncover elementary mechanisms that shape brain activities.

Methods
2.1 General principles for the analysis of event-related causal interactions

Interventions in SCMs
One key question in causality is estimating the effect of manipulations of the system of interest from data, which boils down to comparing two "worlds" or scenarios [Shpitser and Pearl, 2008]: the original world where no intervention is performed, and the post-intervention world.
Both original and post-intervention worlds typically cannot be measured simultaneously (e.g."treatment" and "no treatment" in the same patient).However, estimating their differences arguably forms the basis of causal investigations in empirical sciences, for example by performing randomized experiments on multiple instances of a system designed with mutually exclusive treatments to infer the outcome of manipulations of this system.However, even performing carefully controlled experiments on close to identify instances of a system is often challenging in reality, as many physical and physiological phenomena cannot be easily reproduced or manipulated.This is typically the case for spontaneous transient neural events investigated in this paper, where neurophysiological experimental techniques limit the understanding and control of their conditions of occurrence, as well as the ability to precisely modify some aspects of network activity to test assumptions on the underlying mechanisms.
Under additional assumptions, the framework of SCMs (as briefly introduced in Supplementary Section A), can be leveraged to infer the outcome of manipulations based on observational data only.Assuming those assumptions are met (which is out of the scope of the present work), the SCM inferred from data can be modified using a family of operations named interventions to model the pseudo-manipulation of the system described by the SCM [Pearl, 2000, Peters et al., 2017].Intervening typically refers to modifying the structural equation associated to one node in the SCM, to study the modifications it brings about in the system.When interventions are performed, the only affected mechanistic relations (represented by arrows in an SCM) are the ones between the intervened nodes and their parent nodes.For instance, one can impose a fixed deterministic value to a node, or that this node's variable is drawn from a given distribution, independently from other variables in the SCM [Janzing et al., 2013, Correa and Bareinboim, 2020, Peters et al., 2017, Chapter 3].Both such interventions lead to an intervened causal graph where the arrows between the node intervened upon and its parents are removed.
Importantly, while an intervention modifies the graph associated to an SCM, the variables' joint distribution can still be obtained by exploiting the intervention knowledge, observational data and prior assumptions related to the unaffected conditionals.Mathematically, for an SCM where a directed acyclic graph (G) is described by the following structural equations PA j are the variables indexed by the set of parents of vertex j in G. Intervening on V k consists in replacing its structural assignment by a new one: The resulting modified distribution Peters et al. [2017, chapter 6]).Meanwhile, other structural equations and the distribution of their associated exogenous variables are kept unchanged.
As an example, Figure 1A shows two uni-directionally coupled brain regions where transient events are observed and the corresponding SCM.To obtain an intervened system mimicking the experimental ablation of anatomical connectivity, Figure 1B shows the intervention performed in the SCM: cut the causal arrow from X 2 t−1 to X 1 t and feed X 1 t with an independent copy of X 2 t−1 (denoted as X 2 t−1 ).The rationale behind this operation is that we want to suppress the dependency between the two nodes while maintaining the same level of input activity in the target node.In a context where nodes correspond to single neurons, this can be thought of as a proxy for the experiment of cutting the axon of t with an independent copy of a reference state of the cause node X 2 t .(D) (upper) A time course of observed peri-event signals of Region 1 (X 1 t , red) and Region 2 (X 2 t , grey) reflecting the actual condition.The blue dashed time course represents the post-intervention scenario where X 1 t evolves without the influence from X 2 t .The interval marked by grey dashed lines refers to the reference state before the occurrence of events in X 2 t .(lower) A proper causality measure should quantify the difference between the original and post-intervention scenarios at each time point.afferent neurons, while injecting a current to maintain the baseline level of excitation in the target neuron, such that it is kept in naturalistic conditions.

Both activity in the source region and connectivity causally influence the target region
At first glance, the aforementioned scenario seems to straightforwardly contrast the causal effect we wish to measure with a reasonable baseline.However the operation of feeding the effect node X 1 t with an independent copy of the cause node X 2 t−1 at the same time t − 1 still implicitly incorporates the influence of the event-related transient changes undergone by X 2 at the time t − 1 on X 1 t , as the distribution of X 2 t−1 may strongly differ from what it is during baseline activity (before the event onset).By removing the causal link in the intervened SCM, we are measuring the influence of connectivity on the target region at the time the event happens in the "cause region", but do not contrast this influence to a situation where the event would not have happened.We thus argue that a better reference scenario for testing the influence of an event in a source region on a target region would both "remove the connectivity between two regions" and also "remove the event-related changes in the cause region" (Figure 1C(upper)).This would account for both the cases of stimulus-triggered events and spontaneous events, as addressed in Section 2.4.2 and Section 2.5.We refer to the causal impact and regression discontinuity methodologies to justify how to implement this in the next section.

Causal impact and regression discontinuity to remove the existence of cause events
Using the potential outcome framework [Rubin, 1974], Brodersen et al. [2015] introduced a Bayesian approach to quantify the causal impact of an event at a given time point n on an observed time series {y k }.It relies on observed pre-event data {y 1 , ..., y n }, covariates and priors on time series parameters to extrapolate a distribution of potential outcome sample paths { ỹn+1 , ..., ỹm } under the counterfactual scenario that no event occurred.Comparing the posterior predictive density of these unobserved counterfactual responses to the observed time course y n+1 , ..., y m (under intervention) thus allows quantifying the effect of the event.Such contrasting strategy is also present in a variant of regression discontinuity designs in economics and social sciences.In particular, regression discontinuity in time assesses causal effects by comparing outcomes' distributions on time intervals before and after the onset of a policy change [Hausman and Rapson, 2018].
Both approaches contrast the properties of the models over different time intervals, one before the event and one after.Inspired by these works, we suggest estimating the post-interventional scenario based on the dynamics of the intervened distributions and observed distributions at reference time points where the intervention had not yet occurred (Figure 1D(upper)).
Specifically, we propose that a reference scenario where both connectivity and the influence of transient events on the cause are removed can be approximated by replacing again the variable fed into the target node, as described above, but this time it should be replaced by an independent copy of the activity in the cause region at a reference time point where no event has occurred yet (Figure 1C,D).We will elaborate on implementation aspects in Section 2.4.

Candidate time-varying causality measures
We now present the time-varying versions of commonly-adopted causality measures and discuss their properties in the context of transient event-based causality analysis, in light of the above principles.The candidate measures include time-varying extensions of Granger causality (GC), Transfer Entropy (TE) and Causal Strength (CS) [Janzing et al., 2013], to address the non-stationarity of transient events.To make the comparison quantitative, time series are modeled as linear vector auto-regressive (VAR) model, that we specify with time-inhomogeneous (or time-varying) coefficients to match our context.In the SCM framework, a bi-variate time-varying VAR model can be represented by the causal graph of Figure 2A.

Granger causality
Granger causality (GC), as well as its information-theoretic extension, Transfer Entropy (TE) is based on Wiener's principle of causality, based on which Granger [1969] defines (Granger-)causality from X 2 to X 1 if knowledge of X 2 p,t , in addition to X 1 p,t , will allow better prediction of X 1 t .This can be interpreted as a comparison between two prediction scenarios: • Scenario 1: predict X 1 t with both X 1 p,t and X 2 p,t , • Scenario 2: predict X 1 t with only X 1 p,t , where X 1 p,t and X 2 p,t refer to the respective p previous past points of each time series, without further specification, such that in our notation p can be potentially infinite.
The VAR model describing the first scenario is referred to as the full model [Geweke, 1984], where the first variable X 1 is dependent on both variables X 1 and X 2 : where the t subscript in all parameters (a t , b t , k 1 t , σ 2 1,t ) comes from our time-inhomogeneity assumptions and is not standard in the GC literature.An estimate of the innovation variance of X 1 t (σ 2 1,t in Eq. 1) is the mean squared residual error (σ 1,t ) of the forecast of X 2 t under the assumption that both X 1 p,t and X 2 p,t contribute to X 1 t .Under Scenario 2 where X 1 t is predicted only by X 1 p,t , we have a reduced model (2)  2) model defined in Eq. 1 with uni-directional coupling from X 2 to X 1 .(B) Conditioning on both past states of X 1 and X 2 blocks all paths from X 1 t−3 to X 1 t .Blue nodes represents conditioned nodes while blue arrows marks blocked paths.Orange arrows marks the unblocked paths.(C) Conditioning on past states of X 1 alone blocks all paths from X 1 t−3 to X 1 t in the uni-directional case.Color codes are the same as (B).(D) Conditioning on past states of X 1 alone does not block all paths from X 1 t−3 to X 1 t in the bi-directional case.Color codes are the same as (B).(E) The intervention implemented in devising CS is to break the causal arrows and send an independent copy X 2 p,t to X 1 t at each time point.This diagram applies to both CS and DCS (Section 2.2.3).(F) The intervention implemented in devising rDCS is to break the causal arrows and send an independent copy of the stationary state X 2 p,t ref to X 1 t at each time point.
where the model order p , the coefficient a , the innovations mean k 1 and innovations variance σ 2 1 are different from the corresponding terms in Eq. 1 and should be re-estimated.
If X 2 Granger-causes X 1 , then the full model should fit the data more accurately compared to the reduced model as measured by the estimated variance σ 2 1,t , which should be larger than the one of σ 2 1,t .Then the Granger causality can be defined as the log ratio of the residual variance between the reduced model and the full model, which leads to estimating the magnitude of Granger causality as where the factor 1/2 is chosen for consistency with TE (see Section 2.2.2).While the above linear VAR model is the most widely used, Granger causality has been extended to non-linear models following the same predictive approach (e.g.Marinazzo et al. [2008Marinazzo et al. [ , 2011]], Diks and Wolski [2016]).

Transfer Entropy
TE is an information-theoretic implementation of Wiener's principle, where the performance of prediction between the above two scenarios is quantified with conditional entropy.In information theory, the conditional entropy H(X|Y ) = E y [H(X|Y = y)], measures the amount of information needed to describe the outcome of random variable X given that the value of another random variable Y .In the context of Wiener's principle, this can be used as a generalized way of quantifying the quality of the prediction of future values based on past ones: the larger H(future|past), the worse the quality of the prediction is.
Transfer Entropy (TE) quantifies to which amount X 2 is Granger causes X 1 and is defined as Interestingly, using the Kullback-Leibler (KL) divergence D KL between two probability densities D KL (p||q) = p(x) log p(x) q(x) dx ,TE can be rewritten as an expected KL-divergence between the corresponding conditional probabilities, thereby contrasting the two above mentioned scenarios: As noticed in Barnett et al. [2009], under stationary Gaussian VAR assumptions the analytic expression of Gaussian entropy applied to Eq. 4 leads to GC(X 2 t → X 1 t ) = TE(X 2 t → X 1 t ) in the limit of unbiased variance estimation, such that TE appears as a strict generalization of GC, and can be estimated by GC in the context of Gaussian VAR models.TE and GC statistics are two commonly used measures of causal strength for investigating interactions between brain regions (e.g., Wibral et al. [2013], Besserve et al. [2010Besserve et al. [ , 2015]]).Based on the observational conditional distribution of the neural signals being analyzed, these two measures estimate a quantity that is easily interpretable from a forecasting perspective.However, they have some limitations with regard to their interpretability as interventions in the SCM framework and in the time varying setting that interests us in this paper.
A key issue is that the reduced model ignores but does not remove the influence of past values of X 2 (X 2 p,t ) on X 1 t by marginalizing with respect to them.It can be shown that such change does not preserve the SCM structure, and leads to violations of the Markov properties due to the implicit dependency on the mechanisms relating X 2 p,t and X 1 p,t , which manifest themselves through the p(X 2 p,t |X 1 p,t ) term of the marginalization equation [Ay andPolani, 2008, Janzing et al., 2013]: As a consequence, the reduced model cannot be interpreted as an intervention on the original SCM that would result in a model where arrows associated to the causal influence of interest would be removed.
Besides, TE estimation is non-local.While one can exploit classical model order selection techniques (e.g.Akaike Information Criterion [Akaike, 1974[Akaike, , 1998] and Bayesian Information Criterion (BIC, Gideon et al. [1978], Shao et al. [2022]) to select the best order for the full model, in case of bi-directional coupling, the reduced model of Eq. 2 is misspecified (in a generic case) for any finite order.This can be easily seen by exploiting the d-separation criterion (see Supplementary Section A), as illustrated in Figure 2. Figure 2B shows the estimation in the full model, where conditioning on both X 1 p,t and X 2 p,t blocks all the paths from X 1 t−3 to X 1 t such that X 1 t−3 and X 1 t are conditionally independent.For such a uni-directionally-coupled system, a finite order for the reduced model also guarantees such conditional independence, as seen in Figure 2C where all paths are blocked by conditioning.However, in the same system with bi-directional coupling, for any k > p (i.e.k > 2), there is always a path from 2D shows, 2 paths from X 1 t−3 to X 1 t are not blocked by conditioning on X 1 p,t .Under faithfulness assumptions, this implies that there is conditional dependence between X 1 t and its remote past samples, no matter how many finite past states we are conditioning on.This further implies that to minimize the forecast error of X 1 t in the reduced model one should ideally exploit the past information of this time series up to p = +∞.This issue has been both raised and addressed in the literature, in particular by resorting to Autoregressive Moving Average models and state space models for defining an appropriate reduced model (e.g.[Barnett andSeth, 2015, Solo, 2016]).However, this remains an important limitation when extending TE to time-varying versions, where the model is assumed to be stationary at best locally in time.For example, when defining a non-stationary VAR model as Eq. 1, we assume a different linear model in each 1-point time window.The non-locality of TE is particularly problematic for such a time-varying model assumption because of the implicit influence of past activities on this quantity.

Dynamic causal strength
To overcome the limitation of TE and GC, Ay and Polani [2008] has proposed a measure of information flow to quantify the influence of some variables on others in a system, which has been further studied and generalized in Janzing et al. [2013] as a measure of the Causal Strength (CS) of an arbitrary set of arrows in a graphical model.In the present paper, we define CS in the context of time-inhomogeneous vector autoregressive processes and their associated unrolled causal graph, and thus call it Dynamic Causal Strength (DCS).
DCS can be naturally defined using the SCM interventional formalism (Pearl [2000], Peters et al. [2017], see also Supplementary and Section 2.1.1).Briefly, interventions are performed on nodes in order to remove the specific arrows from the causal graph whose influence we wish to quantify.In agreement with Ay and Polani [2008] and Janzing et al. [2013], in the context of inhomogeneous VAR models (as illustrated in Figure .2A), an appropriate intervention to address the causal inference from X 2 t to X 1 t can be designed as the following soft intervention (shown in Figure . 2E): replace the arrow X 2 p,t → X 1 t by an arrow injecting instead X 2 p,t , an independent copy of X 2 p,t with the same marginal.Importantly, compared to Janzing et al. [2013] but in line with Ay and Polani [2008], we propose to replace the multivariate vector X 2 p,t with a copy without enforcing independence between the components of this vector, in order to preserve the dependency between the successive past time points of X 2 , as those are often strongly correlated in practice.The intervention distribution p DCS models the post-interventional world after removing the causal arrow from X 2 p,t to X 1 t and results in the entailed conditional probability which does not depend on p(X 2 p,t |X 1 p,t ) anymore, in comparison to Eq. 6. DCS then quantifies the KL divergence between the distributions of X A parametric formulation under linear Gaussian model assumptions is given in Supplementary Section D.5.

Near deterministic behavior of TE and DCS
The analysis of transient neural events leads us to analyze signals that have limited stochasticity in several respects: on the one hand, strongly synchronized oscillatory signals can be represented by VAR models with low innovation variance, relative to the variance of the measured signal.Moreover, when a study focuses on a reproducible type of transient pattern, it often has a reproducible component, with little variability across collected trials.Such a situation can be modeled with a time-varying deterministic innovation, exhibiting strong variation of its mean across time, but no or little variance.We investigate the theoretical properties of TE and DCS in this regime, showing a benefit of DCS with respect to TE, but also remaining limitations.

TE behavior for strongly synchronized signals
Besides, it has also been pointed out that the definition of TE in Eq. 5 has some other non-intuitive implications [Ay andPolani, 2008, Janzing et al., 2013].In particular, there are situations in which TE(X 2 → X 1 ) almost vanishes, although the influence is intuitively clear.How frequent are the practical situations in which we have these detrimental effects is unclear; however, theoretical analysis suggests that this can happen when the time series are strongly correlated.
To see this, we can derive with Eq. 5 in the case where X 2 is a deterministic function of X 1 such that TE vanishes.Take the special case where X 2 t is proportional to X 1 t such that X 2 t = kX 1 t , representing a time-wise synchronization of the two signals, the conditional variance will be However, a strong correlation between two observed time series does not necessarily imply that causal interactions between them are weak, from an SCM perspective.We will investigate this case in Section 3.1 and compare with the results of DCS to show that DCS does not suffer from this non-intuitive vanishing problem.

Insensitivity of TE and DCS to deterministic perturbations
While several intuitive properties make DCS a good candidate to quantify causal influences, we exhibit a counterintuitive property common to TE and DCS in the context of peri-event time series.Transient neural events are mainly investigated in two types of analyses: 1) stimulus-triggered (or response-triggered) data that are temporally aligned by task (or response) onset and 2) event-triggered data where occurrences of a type of brain-activity pattern are detected along the time course of the recordings (manually or algorithmically) and used to create peri-event trials.
In both cases, neural activities are likely to have a deterministic component appearing in the peri-event ensembles, due the similarity of the response to successive stimuli in case 1), or due to the similarity of the neural patterns detected in the recordings in case 2).Here we will show that, in a linear setting, TE and DCS are insensitive to such a deterministic component.Specifically, TE and DCS values are unaffected by interventions on the innovations' mean at any time point.
When varying α, this models a (soft) intervention on the second time series.Then it can be easily shown that the expected time course of X 1 is This witnesses the causal influence of X 2 t0 on values of X1 t at subsequent times, which for large α results in large deviations from the baseline expectation of X 1 t for t prior to t 0 .Intuitively, one may expect that a quantification of the magnitude (strength) of the causal influence of X 2 on X 1 should be larger for larger α, as a transient of larger magnitude propagates from X 2 to X 1 .From a neuroscientific perspective, this could model an experimental setting where one brain region is electrically stimulated with increasing strength to detect whether it is anatomically connected to another.Obviously, the magnitude of the stimulation is expected to be critical to elicit a response in the target region.However, TE and DCS actually turn out to be insensitive to such stimulation.
We will show this in the more general setting of the VAR(p) model of Eq. ( 1) and Supplementary Eq.S7.Consider the intervention at time t 0 that transforms η t0 to η t0 + α.To compute the intervention distribution of the new variables denoted ( X1 , X2 ) changes with respect to the distribution of the original variables, we can examine the difference with respect to (X 1 , X 2 ) that has the same innovations, except for η 1 t0 for which we remove a constant α. (X 1 , X 2 ) is then distributed according to the original distribution (before intervention), and the difference which is a deterministic difference equation with a unique solution making X and X coincide before the intervention 1 (U t , V t ).As a consequence, the intervention distribution P is a shifted version of the original distribution: which implies the same for conditional marginal distributions, e.g.
As a consequence TE on the intervention distribution writes The same reasoning can be applied to DCS leading to invariance as well (see Supplementary Section B).
As a consequence, such deterministic causal influences cannot be detected by TE or DCS for a broad class of models.This result is not what we would expect from a measure of influence, because in the above example of Eq. 8, setting a large α intuitively leads to a large influence of X 1 on X 2 provided c = 0. Provided that TE and DCS can be made arbitrarily small by reducing Σ t , TE and DCS would detect no influence despite this strong effect on the mean of X 2 t .As elaborated above, this is in contrast to what would be expected in the neuroscientific context, and directly relates to the observational, event-related setting that we investigate: the deterministic component is due to the alignment of the data with respect to an event of interest, and we do not have a different condition to contrast the occurrence of this event with what would have happened in its absence.This analysis calls for building a synthetic baseline condition that would allow deterministic changes to be detected.

Motivation
Therefore, following the guidelines for event-based causality (presented in Section 2.1), we propose a novel measure, the relative Dynamic Causal Strength (rDCS), as a modification of DCS.This measure aims at taking into account the influence of event-based changes in the cause signals independent from the connectivity (the mechanism), and notably those driven by deterministic exogenous inputs.In the specific problem we are investigating, the cause is the past states of X 2 as X 2 p,t , while the mechanism can be represented by the model in Eq. 1 and symbolized by the corresponding causal arrow in the SCMs.In the measures we have introduced so far, DCS only exploits the case where the causal arrow is deleted as a post-intervention scenario but does not address the change in the cause itself.
In the case where X 2 experiences a deterministic exogenous input in a transient window, the cause increases significantly; thus, intuitively, the causal effect should also be enhanced even if the causal arrow remains the same (i.e., the coefficient b stays unchanged).Apart from intervening on the causal arrow, further intervention can be implemented on the cause node to construct a post-intervention scenario where the cause receives no time-varying innovations.Therefore, inspired by causal impact (Section 2.1.3)which characterizes the difference between the current state and a baseline state, we propose (additionally to DCS) to replace the marginal of X 2 p,t by the marginal X 2 p,t ref for a reference period t ref .
The reference period t ref is typically chosen to be a stationary period before the occurrence of the transient deterministic perturbations and statistics of X 2 p,t ref can be averaged by statistics of X 2 p,t within this period.This leads to the relative Dynamical Causal Strength (rDCS) The implementation of rDCS given a VAR model is derived in Supplementary Section D.6.
Intuitively, the relativeness originates from the comparison between the current past states X 2 p,t and the reference past states X 2 p,t ref .
It is then natural to predict that in the uni-directional case, rDCS(X 2 → X 1 ) = DCS(X 2 → X 1 ) for any reference time t ref if X 2 is stationary because stationarity implies that the marginal distributions of X 2 p,t ref and X 2 p,t are identical.As a particular case, this result implies that a transient loss of causal link from X 2 to X 1 will lead to rDCS = 0, while for a stationary bivariate system, DCS = rDCS is constant.

Sensitivity of rDCS to deterministic perturbations
The definition of rDCS implies sensitivity to deterministic perturbations.Indeed, taking the example in Section 2.3.2, the reference state X 2 p,t ref is unaffected by the deterministic perturbation.Consequently, the translational invariance does not hold for the intervention distribution because rDCS( because the denominators do not allow equating the integrated terms by change of variables in the generic case.Therefore rDCS is capable of uncovering transient causal influences between stimulus-triggered events exhibiting a deterministic waveform.

Alignment for spontaneous events
The relevance of peri-event time-varying causal analysis using the proposed rDCS, as well as TE and DCS, depends on the modeling assumptions of peri-event data.In particular, we assume that the samples at a given peri-event time point are sampled i.i.d.across trials form the same ground truth distribution [Shao et al., 2022] associated to the neural events we want to study.This is easily satisfied for stimulus-evoked events, as addressed in Section 2.3.2 and Section 2.4.2, with a intrinsic reference time for occurrence (i.e., the triggering time).However, analyzing spontaneous events, like the transient events during sleep, requires a detection procedure to locate their occurrence.This commonly involves the procedures of transforming the original signal into a detection signal that amplifies the event-related features (i.e. by filtering or template matching) and finding the events where the detection signal is over a certain threshold.Specifically, the events detected in this way are often aligned by the local peaks of peri-event signals, and this alignment may not reflect perfectly the ground truth distribution of the events.
While alignment may seem a trivial step at first blush, its influence on VAR model estimation turns out to be critical.This could lead to biased estimation of event statistics and peri-event dynamics (due to selecting only data over threshold and gathering local maxima together), resulting in misleading characterization of causal interactions (e.g.wrong causal directions).Thus we will discuss here how to align the events appropriately such that the causal direction between transient events can be better identified with the proposed rDCS.
To model the effect of a threshold-based detection and alignment, based on Bareinboim and Pearl [2012] and Bareinboim et al. [2014], we can modify the SCM in Figure 2A to incorporate an additional node S representing the selection variable, which is a binary variable indicating the time window is selected if and only if S = 1 (see Supplementary Section A for background).Typically S is defined by testing whether a continuous random variable goes over a predefined threshold.This continuous RV is itself a function of the time series values in a sliding time window, corresponding for example to a measure of the match between the time series and a predefined template.We can a priori choose S to depend either on the cause variable X 2 t (Figure 3A) or on the effect variable X 1 t (Figure 3B).We look at the peri-event time t relative to the reference time t.This selection based on variable S is a priori not perfect, in the sense that it will not recover exactly the set of peri-event time series that we initially wish to analyze, i.e. those associated to a biologically relevant pattern of activity.Assuming that the detection method (e.g. the detection template) is well chosen, and the detection threshold is high enough, selection based on S = 1 will typically "over-select", i.e. excluding some peri-event time series that would actually be relevant for our analysis.Figure 3C(left, upper right) illustrates how thresholding selects only subset of peri-event trajectory samples at t = 0 in a simulated scenario.This over-selection can then be modeled as sampling peri-event data from a conditional peri-event distribution p(X|S), while we are interested in analyzing a ground truth distribution p(X).This conditioning may induce a so-called selection bias in the estimation of quantities we are interested in, notably the conditional distributions that enter the calculations of TE, DCS and rDCS.The impact of such bias on those quantities as been investigated in Bareinboim and Pearl [2012], Bareinboim et al. [2014] within the SCM framework, as we describe in the following.
For simplicity and consistency with the Results section, we will restrict ourselves to models with a unidirectional causal effect (either X 1 → X 2 or X 2 → Z 1 ) and that S is only dependent on a finite number of negative peri-event times (t ≤ 0) as in the case of a causal Finite Impulse Response (FIR) filter (for other cases, refer to Supplementary Section C.2. Figure 3A, B illustrate in this setting that the causal arrow (X 2 → X 1 ) can be recovered at any peri-event time only when the selection node depends on the cause variable (see Supplementary Section C.2 for justification).Specifically, this means that P X 1 t | X 1 p,t , X 2 p,t , S = P X 1 t | X 1 p,t , X 2 p,t for the SCM in Figure 3A.For the opposite direction, P X 2 t | X 1 p,t , X 2 p,t , S = P X 2 t | X 2 p,t for negative peri-event time t ≤ 0. For the case where S depends on the effect variable, P X 1 t | X 1 p,t , X 2 p,t , S = P X 1 t | X 1 p,t , X 2 p,t and P X 2 t | X 1 p,t , X 2 p,t , S = P X 2 t | X 2 p,t for negative peri-event time t ≤ 0 (see also Supplementary Section C.2.The S-dependent and S-independent conditionals are visualized in Figure 3D for an example VAR(1) model, as described in Section 2.3.2,where the innovations η 1 t and η 2 t are drawn from a uniform-distribution. Similarly, the conditional model of the post-intervention scenario for rDCS with selection node depending on the cause Therefore, the KL divergence in Eq. 9 can always be estimated correctly when selecting based on the cause values, while this does not hold for other directions or selecting on the effect.As rDCS is defined as the expectation over the KL divergence over the past states X 1 p,t and X 2 p,t (Eq.9), the estimated rDCS(X 2 t → X 1 t ) also depends on the unbiased sampling of the joint probability of X 1 p,t and X 2 p,t .We argue here that this is normally satisfied as the threshold is set for a filtered detection signal but not for the original ones such that the latter are not affected.In the most extreme case where the detection signal is the observation itself (as seen in Figure 5), the estimated rDCS(X 2 t → X 1 t ) might be slightly amplified because the selected distribution tends to be biased towards larger values (also see Eq. 9), while DCS(X 2 t → X 1 t ) might be underestimated according to Eq. 7. Thus its directionality is still preserved.Next, we would like to clarify the relationship between the selection node, thresholding and alignment.Assume that there is a hidden state underlying the observed signals which are accessible from the data.The perfect alignment (considered ground truth) refers to the condition where the hidden states are identical for all trials at each peri-event time t in an extracted event ensemble, as shown in Figure 3C(left) for t = 0.In the S-dependent SCM, the aforementioned selection bias due to thresholding is confined for samples at each peri-event time t , as shown in Figure 3C(upper right).We refer to this alignment situation as (events) aligned by single-time selection of the selected variable.However, as the hidden state is not known, by thresholding over the whole observed signal one often detects neighboring time points or points at neighboring local maxima (e.g., Figure 3C(lower right)).Selecting all these points is smoothing or pooling over all these neighboring points, and the corresponding alignment is named as (events) aligned by smoothed selection of the detected signal or variable.Notably, practically, a common way is to select the local peaks as the reference points, which can be understood as a non-uniform subsampling of the smoothed samples.Thus in the presented experiments, we will illustrate the results with the local peak alignment such that it is more accessible by the readers.
Therefore, we propose that the selection bias-related recoverability theory can be applied to event ensembles aligned by local peaks over the threshold.In this case, while event ensembles are aligned by the true cause variable, the strength of connectivity (the arrow X 2 t → X 1 t ) is not affected by thresholding (i.e., by the selection node) and so is the rDCS in the causal direction (rDCS(X 2 t → X 1 t )).Although rDCS is biased for the other direction when aligned by the effect variable, for uni-directionally coupled systems the bias is small such that the contrast between two directions is preserved.As the true causal direction is unknown, we thus propose further that, to investigate the dominant causal direction between two event ensembles, we should focus on comparing the causality measures (TE, DCS and rDCS) for each direction when the events are aligned on the putative cause, i.e.,

Results
In this section, we first focus on illustrating the properties of TE, DCS and rDCS with simulated toy models.The problem of vanishing TE occurring with synchronized signals and the benefits of DCS in the same situation will be investigated in Section 3.1.Next, we simulate a simple uni-directionally coupled VAR(4) system with rhythmic perturbations of the cause variable to generate transient events, where we will show that rDCS is able to reflect the change of causal effects due to this perturbation while TE and DCS fail.We also study the influence of the alignment method in the same example, as well as in empirical data of SWRs from uni-directionally coupled brain regions.

The case of strongly-correlated signals
As mentioned in Section 2.2.3, TE does not capture well causal influences when the cause and effect signals are strongly correlated with each other, contray to DCS.Here, to illustrate such contrast, we simulate a bivariate dynamical system in the form of two synchronized continuous harmonic oscillators x(t) and y(t), with uni-directional coupling (i.e., x(t) driving y(t)): In this system, x(t) is designed as an under-damped oscillator (ζ x = 0.015722) which approximately oscillates at a period T x = 200 samples corresponding to natural (angular) frequency ω x = 2π/T x = 0.0314 rad/sample.To achieve synchrony, y(t) is also designed as an under-damped oscillator ( ζ y = 0.2) whose intrinsic oscillation gradually vanishes and finally follows the oscillation of x(t) with a coupling strength of c = 0.098.For y(t), T y = 20, ω y = 2π/T y = 0.314.We also add small Gaussian innovations to both oscillators: n x ∼ N (0, 0.02), n y ∼ N (0, 0.005).Adding this noise allows fitting a VAR model to the the signals to assess the causal interactions with TE and DCS.VAR parameter estimation would fail with deterministic signals by causing the covariance matrix estimates to be singular.
Using the Euler method with a time step of 1 and random initial points (N (0, 1)), we simulated 2000 trials of this uni-directionally coupled system with 1000-point length.We discarded the first 500 points to ensure that the time series reach a sufficient level of synchronization.We can see this system as a stationary VAR(2) process because numerical simulation with the Euler method generates data with its past two states.
Figure 4 (left panel) shows the results of time-varying TE and DCS for assessing the causal effects between x(t) and y(t).Calculation is performed in both the ground truth direction (x(t) → y(t)) and the opposite direction.We first look at the control experiment.Consistent with the system's stationarity, TE is constant in both directions while being higher in the ground-truth direction.DCS in the ground-truth direction stays at a relatively high level, despite some small oscillation under a frequency similar to the intrinsic oscillation frequency of x(t).
With respect to the detection of causal direction, both measures are able to detect the correct direction (i.e., causation for x(t) → y(t) is much larger than in the opposite direction).It is also reasonable that DCS in both directions is higher than TE, according to its definition in section 2.2.3.However, from the control experiment, we cannot conclude that the smaller TE values are due to its definition or due to the strong synchrony in the signals.
Therefore, we introduced a transient decrease of the noise variance in the cause signals (x(t)).The logic of designing this transient change is the following: the level of synchronization will increase with weaker noise, but the system and input magnitude remain the same because the contribution of the noise change to the signal amplitude is negligible; thus if TE is insensitive to the level of synchronization of signals, its values are expected to stay constant.However, as the results show in Figure 4 (right panel), there is a transient decrease of TE during the interval where noise variance is decreased, suggesting that TE performs poorly in the cases where the cause and effect signals are strongly synchronized.

The case of deterministic perturbations
In this section, we directly address the benefits of rDCS over TE and DCS when applied to signals driven by deterministic perturbations.To illustrate this specific property, we designed some simple transient events perturbing the innovation parameters of a stationary VAR process with uni-directional coupling.The events are generated by feeding the cause signal with innovations with non-zero time-varying means, such that both signals will exhibit temporal oscillations.We refer to these events as perturbation events in the following sections.These perturbations intrinsically define a hidden state that parametrizes the ground truth distribution of peri-event data.We exploit the hidden state and demonstrate that the proposed alignment method in Section 2.5 is efficient for recovering the time-varying causal direction between the two variables.
We enforce non-stationarity of η 2 t , the innovations of the ground truth cause process {X 2 t }.Both innovations η 1 t and η 2 t are drawn from a Gaussian distribution with unit variance (with no correlation in between, i.e., t is non-zero and time-varying.We designed the timevarying profile of k 2 t as a Morlet-shaped waveform to mimic the oscillatory properties of neural event signals: , where α = 2/25 is a constant controlling the event duration, and H = 4 is the amplitude of the highest peak in the center of the event.The total duration of the Morlet-shaped waveform is 101 ms.The innovation's mean designed for X 2 is shown in Figure 5B (top left panel).
We generated this bi-variate VAR(4) process for 1300s consisting of 5000 trials of perturbation events by transiently varying η 2 t , detecting event occurrence based on the cause X 2 t , as illustrated in Figure 5A.The central peaks of these Morlet events are used as the ground-truth reference points for which peri-event time t = 0, and used to extract a dataset of multi-trial events ensemble with a 200-ms peri-event window such that t range from -99ms to +100ms (i.e.there is no alignment procedure that could lead to selection bias, see Section 2.5).The event waveforms of the cause variable X 2 t and the effect variable X 1 t are illustrated in Figure 5B (bottom left, middle left).The whole process is repeated 100 times to obtain variabilities plotted in the figure.t (left) and bi-variate ensembles of the other two selections aligned by X 1 t (middle, right).Thin blue line represents the threshold in X 1 t .(Bottom) Same settings as in (middle) but aligned by X 2 t .(C) Example elements of coupling strength in the ground truth directions X 2 t → X 1 t (red) and the opposite direction X 1 t → X 2 t (blue) for 3 types of event ensembles aligned by putative cause.(D) TE (left), DCS (middle) and rDCS (right) for all 3 types of event ensembles aligned by putative cause.

Effect of trial alignment on model estimation and causality measures
As the innovation is designed as deterministic (i.e., identical) across trials, with this alignment, the event ensembles obtained by the ground truth reference points define a dataset with ground-truth alignment.We can compare the VAR model estimation and causality measures resulting from this dataset to the outcomes obtained by aligning events based on the detection of either variable X 2 or X 1 , as discussed in Section 2.5.
To validate the theory related to the recoverability of ground truth conditional probabilities in the presence of selection bias due to the event detection procedure, we test the single-time selection case where biased selection is only performed for samples at peri-event time t = 0 (as Figure 3C(upper right)), thus not changing the hidden state alignment (Figure 5B(upper middle)).The reference points are decided as all ground truth reference points higher than a threshold d 0 , where d 0 is 3 times the standard deviation of the whole signal.The selected event ensembles of both variables are shown in Figure 5A(middle middle) for thresholding over X 1 t and Figure 5A(middle right) for thresholding over X 2 t .Notably, this kind of selection is only feasible when the hidden state in known, which is not realistic practically for real data.
Therefore, to demonstrate the appropriateness of the proposed alignment (i.e., aligning by all data points of one variable over the threshold, or the local peaks ), we assume the signal itself as the detection signal and d 0 as the threshold.We obtain an event ensemble by selecting local peaks for points over 3 times d 0 as new reference points, which is shown in Figure 5B(middle right and bottom right).This can be seen as a smoothed version of the real events, which is also confirmed by checking the aligned hidden states (Figure 5B(upper right)).
While inferring VAR model parameters of the event ensembles according to Shao et al. [2022], the true model order (4) can be recovered for all five ensembles.Figure 5C demonstrates the recoverability of conditional models for ensembles aligned by the putative cause.One of the 4 coupling strengths from the putative cause to the putative effect is plotted as red curves.As described in the simulation procedure in Section 3.2.1, the coupling strength is constant over time, which is reflected in Figure 5C(left).Consistent with the theory in Section 2.5, biased selection of event trials on the samples at t = 0 leads to unbiased estimation of the coupling strength X 2 t → X 1 t aligned by the cause X 2 (denoted also as "| X 2 " in Figure 5C(middle)).By comparison, the coupling strength in the other direction is slightly biased at negative peri-event times (t ) but still relatively close to its true value (0).This contrast holds for alignment with local peaks over threshold, as seen in Figure 5C(right).
Figure 5D(upper, middle, lower) shows the corresponding results of how causality measures perform in the three alignment scenarios.During the periods where no transient events occur, all three measures are able to infer a timeinvariant causal effect in the ground-truth direction (X 2 → X 1 ) compared to the opposite direction.Besides, in line with theoretical predictions, DCS is higher than TE and is equal to rDCS.During the perturbation events, in the ground truth direction TE and DCS remain constant and rDCS exhibit a rhythmic pattern.These results match the theoretical predictions: TE and DCS measures the connectivity strength, which does not change, while rDCS measures the combined causal effect related to the connectivity and the event-based changes at the cause while yielding larger variations transmitted to the effect node.
In the case where event ensembles are aligned by single-time selection of the cause X 2 t , TE and DCS of the ground truth direction is underestimated while rDCS is slightly overestimated around t = 0, which is consistent with the theories in Section 2.5.A bias appears in the opposite direction while aligned by the effect, but the direction is detected correctly.The case of local peak alignment shows similar results, except the peak amplitude of the smoothed rDCS is less amplified.Thus, this simulational experiment of perturbation events demonstrates the effectiveness of rDCS in reflecting the causal influence when the cause is perturbed by a deterministic exogenous input compared to TE and DCS, validating that rDCS is a better measure to address event-based causal interactions.We show that in practice, detecting via thresholding and aligning the event ensemble with the local peaks of the putative cause is a good way to recover the ground truth event-based causality given uni-directional connections: the trick is to address the coupling when the events are aligned by the putative cause, then a specific feature of invariance in the presence of selection bias can be applied.This property will be further tested on real data in the next section.

Validation on SWRs-based causality between CA3 and CA1 regions
Sharp Wave-Ripple (SWR) events, hypothesized as a key element in implementing memory consolidation in the brain, have been reported in the electrophysiological recordings within the hippocampus of both macaques and rodents.In this section we detect SWRs in an experimental dataset to investigate the behavior of TE, DCS and rDCS in a neuroscientific context where the event-hosting brain regions are uni-directionally coupled.
SWRs are primarily generated in the CA1 area of the hippocampus.The somas of CA1 pyramidal cells are located in the in the pyramidal layer ('pl') while their dendritic trees are rooted in the stratum radiatum ('sr').It is hypothesized that the dendritic trees receive strong excitatory inputs from the pyramidal cells in CA3 which generate post-synaptic activities in the dendritic trees.This results in LFP activities in low frequencies (0-30Hz, due to the sharp-wave) and in gamma band (30-80Hz, due to CA3 oscillations).Then the dendritic activities propagate to the soma, where recurrent interactions between inhibitory and excitatory cells generate a very fast oscillation, the ripples (80-250Hz).
We applied the event-based causality analysis to an open source dataset where electrophysiological recordings in the CA3 and CA1 regions of rodent hippocampus have been performed with 4 shanks of 8 channels simultaneously in each region [Mizuseki et al., 2014].In agreement with the SWR generation mechanism explained in the above paragraph, anatomical studies [Csicsvari et al., 2000] support uni-directional anatomical coupling between these two regions within the hippocampal formation, i.e., CA3→CA1.The analysis is based on two Local Field Potential (LFP) data sessions recorded from the rat named 'vvp01' with a sampling rate of 1252Hz.An example trace of a channel pair of both CA3 and CA1 regions is shown in Figure 6A.As SWRs are more challenging to observe during behavioral sessions, we perform our analysis only on a session of sleep which lasts 4943.588s.
Following Mizuseki et al. [2009], we detect SWRs by applying an 49-ordered FIR filter in the frequency band [140,230]Hz to each channel of signals in both regions.Similar to Section 3.2, we set a threshold over the mean of the filtered signals (5 SD) to locate the events and align them according to the local peak points over threshold in the filtered signals of either region in a channel pair.
Figure 6A(lower) shows the case aligned on the CA3 signals.The peri-event window for display has been chosen to be [-119.8, 119.8]ms, while VAR model estimation and the BIC-based model order selection are performed according to [Shao et al., 2022].For each channel pair, we obtain two bi-variate event ensembles, thus extracting 2*1024 ensembles for all channel pairs for each alignment condition.The event waveforms and statistics of an example channel pair for different alignments are illustrated in Figure 6B.SWR-based causality measures shown in Figure 6C compare the alignment by the putative cause and by the putative effect.The reference states used for estimating rDCS are the averaged states over the first 16ms time points in the window.The standard deviation plotted in the figure originates from 100 times bootstrapped ensembles and the variability is averaged over 1024 channel pairs.In line with the theoretical predictions, the ground truth direction (CA3→CA1) is well recovered when using an alignment by the putative cause, but not when aligning by the putative effect.TE, DCS and rDCS in the opposite of the truth direction are not significantly different from zero, which is consistent with the uni-directionality of anatomical connections posited by anatomical studies.Significantly stronger causal influences in the ground truth direction are shown by TE, DCS and rDCS before the alignment point (t = 0), matching the hypothesized SWR generating mechanism that the CA3 region drives the SWR interactions in CA1 region.The lack of difference between the two directions in more stationary states might be explained by the ineffectivity of causal measures based on linear VAR models to capture non-linearity [Shajarisales et al., 2015].The transient increase in the non-ground truth direction when using alignment on the putative cause might be explained by the selection bias elaborated on in Section 2.5.

Discussion
In summary, we have discussed the benefits and shortcomings of two time-varying causality measures (TE and DCS) in characterizing causal interactions based on peri-event data.To address their insensitivity to deterministic perturbations, we proposed a novel measure, the rDCS, that implements an intervention on both the cause and the mechanism in the SCM framework.We compared the performance of these causality measures on perturbation events with innovations with time-varying means and and electrophysiological recordings of hippocampal SWRs.The benefits of rDCS is supported by the perturbation events presented in Section 3.2.As causality analysis of transient events aims at uncovering the network mechanisms underlying these phenomena (e.g., addressing whether one event drives the other), we argue for the use of rDCS as it provably captures causal influences due to event-related changes in the cause that propagate to target regions through anatomical connections, even if these changes have little variability across trials.
Our results of the performance of rDCS are tested with SWRs events recorded within the hippocampus.However, potentially this measure can be applied to all multiple trial event pairs recorded in two brain regions, for example, ponto-geniculate-occipital waves simultaneously recorded in the thalamus and cortex.The effect of time-varying non-zero-meaned innovations might reflect the exogenous variables un-included in the model, and thus might be helpful in understanding the mechanism of underlying event generation.
However, we show that the data preparation procedure (i.e., the detection procedure and the alignment of the events) potentially affects the detection of causal effects and the quantification of their strength, consistent with predictions of [Bareinboim and Pearl, 2012] and [Bareinboim et al., 2014].In particular, while the directionality of peri-event causal interactions is expected to be ab intrinsic property of the underlying mechanisms, we showed that the inferred causal direction is dependent on the alignment methods (i.e., the event-triggering region).We hypothesize that this is due to selection bias, which is supported by our theoretical investigations.
Actually, aligning neural events based on the activity in a single region is a common practice in event-related brain research.Our results thus suggest that causality analysis of such peri-event data collection methods may be affected by a selection bias, which should be controlled for.We have demonstrated empirically that for uni-directional coupling, computing rDCS on peri-event datasets triggered by the putative cause is normally effective in revealing the true causal direction.Thus, as a future direction, it is imperative to devise a correction approach for the selection bias such that peri-event dynamics can be accurately recovered to achieve a better characterization of causal influences between transient neural events.Definition 2 (d-separation) A path p in graph G is said to be d-separated by a set of nodes Z if either: (1) p contains a chain i → m → j or a fork i ← m → j such that the middle node m is in Z, or (2) p contains a collider i → m ← j such that the middle node m is not in Z and such than no descendant of m is in Z.
Z is said to d-separate X from Y in G if and only if Z blocks every path from a node in X to a node in Y .This property is denoted X ⊥ ⊥ G Y |Z.
Proposition 2 (Global Markov property) For a given SCM (S, P N , G) and subsets of nodes X, Y , Z in G, then Note that d-separation only implies conditional independence.The reverse implication requires an additional assumption regarding the relation between the graph and the random variable called faithfulness (see for example [Peters et al., 2017, chapter 6]).As an example, under the faithfulness assumption, the graphical model of Figure AA entails that X and Z are dependent given Y .

A.1 Interventions
The SCM framework allows modeling interventions performed on a system.Classical interventions in an SCM amount to impose a fixed value to a random variable, but the framework allows a much broader class of modifications of the For positive t , S only depends on the indirect cause for all arrows such that P The reasoning is similar for Figure S3A, which represents filtering the original signal and detect events with template matching approaches.For Figure 3B and Figure S3B, S can never be understood as purely dependent on the cause for negative t .For example, for the causal arrow in P X Thus none of the conditionals can be recoverable for negative t , but the opposite for positive t .

D.1 Time-varying VAR(p) model
Following the introduction, the candidate measures include Granger causality (GC), Tranfer Entropy (TE) and Causal Strength (CS) [Janzing et al., 2013], where we extend them into a time-varying version to address the nonstationarity of transient events.In addition to their potential information theoretic formulations, the measures will also be formulated in a time-varying bivariate system consisting two variables X 1 and X 2 .The system is assumed as a inhomogeneous bivariate p-ordered Vector Autoregressive (VAR(p)) model.At each time t, the current state is a linear function of the past (lagged) p states gathered in the vector and the exogeneous inputs as the innovation term in the following form

D.2 Derivation of KL-divergence between two uni-variate Gaussians
Time-varying TE, DCS and rDCS are formulated as the KL divergence between the corresponding actual and counterfactual conditions.Thus we first present the KL divergence between two uni-variate Gaussian variables.
We denote the gaussian for the actual condition as p(x) = N (µ a , σ 2 a ), and the counterfactual gaussian as q(x) = N (µ c , σ 2 c ).Then the KL divergence between p(x) and q(x) is The actual condition defined for TE, DCS and rDCS is that the current state of X 1 t is dependent on the past of both X 1 and X 2 , denoted as N (µ a , σ 2 a ) = p(X 1 t |X 1 p,t , X 2 p,t ) .
The dynamics of X 1 t is described by the structural equation Therefore, the time-varying conditional mean and variance can be derived as

D.4 Transfer Entropy
For TE, as the conditional probability representing the "counterfactual" condition takes the form Resulting from the same model in Eq.S7, the mean and variance can be derived as Therefore, the expression of time-varying TE should be

D.5 Dynamic Causal Strength
For DCS, the counterfactual condition is with X 1 t = a t X 1 p,t + b t X 2 p,t + η 1 t , and X 2 p,t is a random sample drawn from the distribution p(X 2 p,t ).Then the conditional mean is: The conditional variance is:  The conditional mean and variance for the counterfactual condition should be revised as:

Figure 1 :
Figure 1: Analysis of event-based causality via interventions in SCMs.(A) (upper) A diagram representing two brain regions with uni-directional connectivity from Region 2 to Region 1. Region 2, as the "cause region", exhibits transient events (grey) that influence Region 1 repetitively.(lower) An SCM underlying the diagram, where X 1 t and X 2 t denotes states of Region 1 and Region 2. (B) (upper) An experimental manipulation of the two-region diagram in (A): cutting the anatomical connectivity.(lower) A corresponding intervention of the SCM in (A) represents cutting the causal arrow and feeding the effect node X 1 t with an independent copy of the cause node X 2 t .(C) (upper) Another experimental manipulation of the two-region diagram in (A): cutting the anatomical connectivity and removing the event-based signal changes at Region 2. (lower) The corresponding intervention of the SCM in (A) represents cutting the causal arrow and feeding the effect node X 1t with an independent copy of a reference state of the cause node X 2 t .(D) (upper) A time course of observed peri-event signals of Region 1 (X 1 t , red) and Region 2 (X 2 t , grey) reflecting the actual condition.The blue dashed time course represents the post-intervention scenario where X 1 t evolves without the influence from X 2 t .The interval marked by grey dashed lines refers to the reference state before the occurrence of events in X 2 t .(lower) A proper causality measure should quantify the difference between the original and post-intervention scenarios at each time point.

Figure 3 :
Figure 3: Illustration of selection bias due to thresholding and alignment.(A) SCM of a bi-variate VAR(2) model with uni-directional coupling from X 2 to X 1 and a selection node S depending on states of the cause variable before peri-event time (t <= 0).The selection node S represents partial selection of samples due to thresholding of the filtered cause signal (as the detection signal).Orange arrows makes the recoverable arrows with the current selection node, while purple arrows indicates the unrecoverable ones.(B) The same SCM as in (A) with the selection node depending in a similar way on the effect signal.(C) An example event ensemble for the cause variable X 2 t in (A-B) and the detection threshold.(D) Zoomed event ensenbles for (C) (left) and histograms for selected samples compared to the full sample (right).Upper panel illustrates selection bias at ground truth peri-event time t = 0. Lower panel shows selection bias at the peri-event time t = 0 for detected events aligned by the peak.The aligned state t = 0 is a smoothed state of neighboring states of the ground truth t = 0. (E) Illustration of recoverability when aligning by the cause.Subplots show joint distributions of the lagged variables and the putative effect variable of a VAR(1) model with uniformly distributed innovations, with left column for the ground-truth alignment, the middle column for aligning by the cause and right column for aligning by the effect.The conditional is only recoverable for the top middle panel.

Figure 4 :
Figure 4: TE fails when the signals are strongly synchronized.(A) Control experiments where synchrony is not changed.(top) Example trace of the bivariate signal in the control experiment.(middle) Time-varying design of innovation's variance for both variables in the control experiment.(bottom) Time-varying TE and DCS results in the control experiment.(B) TE underperforms during transient increased synchrony induced by a tiny change in noise variance.The transient change can be seen as an event.Subfigure designs are the same as (A).

Figure 5 :
Figure 5: Causal analysis for simulated perturbation events with non-zero innovations.(A) Example signal traces of the bi-variate VAR(4) system (black).Blue and red traces mark two example events detected by thresholding over the cause X 2 t .Blue and red dots show other reference points.(B) (Top) Hidden states for ground-truth alignment (left), single time selection of the ground truth event ensembles due to thresholding (middle) and events aligned by local peaks over threshold (right).(Middle) ground

Figure 6 :
Figure 6: Event-based causal analysis for SWRs in rodent hippocampal CA3 and CA1 regions.(A) Examples signal traces of the original signals and bandpass filtered signals of CA3 and CA1 regions (black).Blue and red traces mark two example events detected by thresholding over the cause CA3 and aligned by the local peak.Blue and red dots show other reference points.(B) Event waveforms of SWR event ensembles at CA3 (left) and CA1 (right) regions aligned by CA3 (upper) and CA1 (lower) signals.Shades repensent the ensemble standard error averaged over 1024 channel pairs.(C) Peri-event causality measured by TE (upper), DCS (middle) and rDCS (lower) for event ensembles aligned by the putative cause (left) and the putative effect (right).Shades reflect standard deviation of 100 repeated bootstrapped ensembles.
Figure S1: (A) Example of graphical representations of structural causal models.(B) Graphical representation including a sampling node, indicating that sample selection is a function of X only.

Figure
Figure S3: (A) SCM of a bi-variate VAR(2) model with uni-directional coupling from X 2 to X 1 and a selection node S depending on states of the cause variable before peri-event time (t <= 0).The selection node S represents partial selection of samples due to thresholding of the detection signal obtained with template matching with the original signal.Orange arrows makes the recoverable arrows with the current selection node, while purple arrows indicates the unrecoverable ones.(B) The same SCM as in (A) with the selection node depending in a similar way on the effect signal.

D
KL (p||q) = − p(x) log q(x)dx + p(x) log p(x)dx = [log(p(x)) − log(q(x))] p(x)dx µ c ) 2 = (X − µ a + µ a − µ c ) 2 = (X − µ a ) 2 + 2(X − µ a )(µ a − µ c ) + (µ a − µ c ) X − µ a ) 2 + 2(µ a − µ c )E p [X − µ a ] + (µ a − µ c) mean and variance for the actual condition are the model parameters.The column sign suggests that the VAR model can be interpreted as a data generation mechanism, which is critical to the SCM framework (Section A).Elements of the coefficient matrix a t ,b t ,c t and d t are all p-dimensional vectors.Expanding the vector version of Eq.S3 using Eq.S2, the equation can be rewritten as: a t X 1 p,t + b t X 2 p,t + η 1 t |X 1 p,t , X 2 p,t ] = a t X 1 p,t + b t X 2 p,t + k 1 b t X 2 p,t + η 1 t |X 1 p,t ] = a t X 1 p,t |X 1 p,t + b t E[X 2 p,t |X 1 p,t ] + k 1Plugging the expressions of µ a , µ c , σ 2 a and σ 2 c into Eq.S6, the KL divergence can be derived asTE(X 2 t → X 1 t ) = D KL (N (µ a , σ 2 a )||N (µ c , σ 2 Cov[X 2 p,t |X 1 p,t ]b t + σ 1,t σ 1,t t ,X 2 p,t ) [(b t (X 2 p,t − E[X 2 p,t ])) 2 ] = b t E[(X 2 p,t − E|X 2 p,t ])(X 2 p,t − E|X 2 p,t ]) ]b t = b t Cov[X 2 p,t ]b tTherefore the expression of DCS can be obtained by plugging-in the means and variances of the two Gaussian Cov[X 2 p,t ]b t + σ 1,t σ 1,tD.6 Relative Dynamic Causal StrengthThe relative Causal Strength is 'relative' in the sense that the intervention involves an independent copy of a baseline state X 2 p,tref , instead of the lagged state X 2 p,t , such that X 1 t = a t X 1 p,t + b t X 2 p,tref + η 1 t for the counterfactual condition.