Matlab Open Source Code: Noise-Assisted Multivariate Empirical Mode Decomposition Based Causal Decomposition for Causality Inference of Bivariate Time Series

Causality inference has arrested much attention in academic studies. Currently, multiple methods such as Granger causality, Convergent Cross Mapping (CCM), and Noise-assisted Multivariate Empirical Mode Decomposition (NA-MEMD) are introduced to solve the problem. Motivated by the researchers who uploaded the open-source code for causality inference, we hereby present the Matlab code of NA-MEMD Causal Decomposition to help users implement the algorithm in multiple scenarios. The code is developed on Matlab2020 and is mainly divided into three subfunctions: na_memd, Plseries, and cd_na_memd. na_memd is called in the main function to generate the matrix of Intrinsic Mode Functions (IMFs) and Plseries can display the average frequency and phase difference of IMFs of the same order in a matrix which can be used for the selection of the main Intrinsic Causal Component (ICC) and ICCs set. cd_na_memd is called to perform causal redecomposition after removing the main ICC from the original time series and output the result of NA-MEMD Causal Decomposition. The performance of the code is evaluated from the perspective of executing time, robustness, and validity. With the data amount enlarging, the executing time increases linearly with it and the value of causal strength oscillates in an ideally small interval which represents the relatively high robustness of the code. The validity is verified based on the open-access predator-prey data (wolf-moose bivariate time series from Isle Royale National Park in Michigan, USA) and our result is aligned with that of Causal Decomposition.


INTRODUCTION
In early studies, the definition of Noise-assisted Multivariate Empirical Mode Decomposition (NA-MEMD) based Causal Decomposition was given by Ur Rehman and Mandic (2011), She et al. (2017), and Zhang et al. (2017Zhang et al. ( , 2021, dealing with the cause-effect relationship observed across from two time series signals in the real-world situations (Adarsh and Janga Reddy, 2019). The study provides Matlab Open Source Code of NA-MEMD Causal Decomposition and helps scholars and researchers (particularly in function connectivity Mueller et al., 2013;Meshi et al., 2016, signal detection and processing Abbate et al., 1997;Song andQue, 2006, andStatistical Causality Cox, 1992;Cox and Wermuth, 2004) to determine the causality occurring at stochastic, deterministic and complex dynamical (nonlinear deterministic) processes on the basis of time series (Small, 2008). The development of such a theoretical framework has been arising since the publication of Granger causality (Granger, 1969;Kamiński et al., 2001;Seth, 2007). It defines that variable X is considered to be the Granger cause of variable Y if X helps to explain future changes in Y. The test of Granger causality introduces F-test to quantify the autoregressive property between X and Y by solving a best least-square problem. Granger causality was fundamentally based on the hypothesis of uncoupling cause-effect and, thus, would be applicable to stochastic processes. For multivariate time series, Looney et al. (2018) propose a novel multivariate sample entropy that can handle the analysis of within-and cross-channel dynamics. It is also the only method to identify synchronized regularity dynamics. In addition, inspired by Takens' Embedding Theorem (Noakes, 1991), Sugihara et al. proposed Convergent Cross Mapping (CCM) (Sugihara et al., 2012;Krakovská et al., 2015), which held that if the projections of X and Y in a certain dimension, i.e., X ′ and Y ′ respectively, existed the causal relationship, then X and Y would have CCM causality. It was assumed that a cause-effect relation was embedded in a complex dynamical process which was also likely to be a linear/nonlinear deterministic system. CCM accommodated the inseparability/coupling of causal effects. Not surprisingly, Yang et al. (2018) established Causal Decomposition and further confirmed the utility of Hilbert-Huang Transform (HHT) (Huang, 2014a,b) in causality inference. Cause-effect mutual information was assumed to be carried over the instantaneous phase of the observed time series. However, standard methods such as Standard Fourier, wavelet, and Hilbert may encounter problems in analyzing real-world data. First, the approaches depending on predefined function heavily depend on the data length and stationarity and the real-world data which are often short and intermittent may hinder the analysis process. Second, among the standard methods, integral transforms pursue more frequency concepts than the temporal concept which may increase the frequency resolution but lose information in the time domain. Third, for nonmonocomponent data, the direct implementation of analytic signal representation results in negative IFs which have no physical justification. Next, accurate observation and measurement should be conducted on the corresponding scales rather than globally. However, correlation, coherence, and synchrony measure cannot be performed on certain scales . Causal Decomposition was based on phase dependency, which was distinguished from the prediction paradigm such as Granger causality, CCM, Mutual Information from Mixed Embedding (Kugiumtzis, 2013;Jia et al., 2019), Dynamic Causal Modeling (Friston et al., 2003(Friston et al., , 2013, and Transfer Entropy (Staniek and Lehnertz, 2008;Bossomaier et al., 2016). It was suitable for complex dynamical processes, acquired in the manner of time series in certain similar time scales. The development of the MEMD theoretical framework has been started by Altaf et al. (2007) who put forward a method that develops a mathematical approach to adapt EMD to both real and complex domains. The so-generated Intrinsic Mode Functions (IMFs) can be used in processing both real and complex signals. Another method proposed by Tanaka and Mandic (2007) also provides an insight into extending standard EMD to the complex domain. It takes advantage of both positive and negative frequency components of signals to generate complex-valued IMFs. Since the EMD is initially limited to real-valued time series, an extension of the EMD framework to bivariate time series is designed by extracting zero-mean rotating components (Rilling et al., 2007). In order to make EMD compatible with trivariate signals, ur Rehman and Mandic (2009) comes out with a theoretical framework that projects local mean envelop to multiple directions in three-dimensional spaces adapting the rotation property of quaternions. In order to handle the causality analysis in multiscale time series, like multiple physiological time series (e.g., EEG, EMG, and ECG) composed network (Bashan et al., 2012;Faes et al., 2017), we recently presented NA-MEMD Causal Decomposition (Zhang et al., 2021) and pointed out its potential to the causality inference in a complex dynamical process. In Zhang et al. (2017), multichannel EMG signals are processed by EEMD, MEMD, and NA-MEMD, and their outcomes are quantitively assessed by comparing their number of IMFs, mode-alignment, and mode-mixing. It has been justified that NA-MEMD has a relatively outstanding performance in processing brain-muscle signals compared with EEMD and MEMD. Many studies have contributed to algorithm implementations related to causality analyzes in open-source scenarios. C and C++ code of EMD using Matlab Coder TM was introduced in R2018a in MathWorks (Huang, 2022). Furthermore, Rehman and Mandic generated the algorithm of Multivariate empirical mode decomposition (MEMD) (Rehman and Mandic, 2010). Then, Zhang et al. (2017) published a Matlab toolbox (Wen et al., 2022) of NA-MEMD in 2017. In the causality analysis, Mønster provided a Convergent Cross Mapping algorithm in MATLAB in 2018 (Jakubik, 2022). Yang published the proposed Causal Decomposition approach in GitHub (Yang, 2022), which then was exchanged to MathWorks in 2020. In the study, we hereby present the Matlab code package for NA-MEMD Causal Decomposition used in the preliminary article by Zhang et al. (2021), offering the configuration specification details required in the data analyzes and tests, and providing the functional specification of line-by-line codes in the workflow.

Background Theory
The overall procedure for NA-MEMD Causal Decomposition is illustrated in Figure 1 by 5 steps. It first adds multichannel auxiliary white noise (shown in step a-2) to original bivariate time series signals (shown in step a-1), followed by NA-MEMD to obtain two corresponding IMFs sets (shown in steps b-1 and b-2) as well as their phase coherence. Next, the main ICC (framed by the purple rectangles in steps c and d) and ICCs sets (framed by the yellow rectangles in steps c and d) are selected according to average frequency and phase difference. Only the IMFs with identical frequency scales and small phase differences can be regarded as ICCs. The main ICC is usually selected as the first ICC in the set. Removing the main ICC from IMFs sets and adding the remaining signals in each IMFs setup, two pairs of signals are formed by one modified signal without the main ICC and the other original one (shown in step d).
The phase coherence of the two pairs is calculated. Combined with the value of the phase coherences of two redecomposed signal pairs and the original signal pair, the value of Absolute Causal Strengths (ACSs) and Relative Causal Strengths (RCSs) can be obtained by NA-MEMD Causal Decomposition (shown in step e).

Function na_memd
The procedure for function na_memd is illustrated in Figure 2. The function enables to form a matrix, giving Intrinsic Mode Functions (IMFs) decomposed from NA-MEMD with causaleffect time series defined by input matrix input_data. Relevant variables about function na_memd are listed in Table 1. The function na_memd is invoked when variables input_data, ave_noise, level_noise, noise_channel_num, and en_num are set. Then, function size returns the number of row elements of matrix input_data to variable len_inp, and the number of column elements of matrix input_data to variable wid_inp. level_noise and noise_channel_num are used to generate the random Gaussian-White noise time series. After then, function memd is run if vector/matrix inp_noise is appended to matrix input_data, which is named matrix input_cha_noi. It repeats for constant en_num times (i is the loop variable). For each repetition, a three-dimension array imfs is returned which contains all IMFs decomposed from matrix input_cha_noi. In With it increasing, the running time will be prolonged dramatically. The first two columns refer to av_freq1 and av_freq2. The third column represents difference. With it increasing, the running time will be prolonged dramatically.
causal_matrix Data type: Matrix Used to output the relative causal strengths and the absolute strengths.
The first two columns represent the relative causal strengths. The latter two columns represent the absolute causal strengths.
order to facilitate the ensemble process of NA-MEMD, array imfs is reshaped into cell imf _result. Considering repetitions in-between, if the matrix sizes in cell imf _result perhaps are inconsistent, the residual is embedded with a pseudo-monotone variation. In order to guarantee the consistency of the size in cell imf _result, the residuals can be ignored together. Matrix elements of imf _result, sum_imf _result, and imf _result are first initialized to be zero, which stand for reshaped IMF data, the summation of the reshaped IMF, and the average value of the summation respectively. The noise data are generated by random alternating noise components determined by parameter level_noise. This process is repeated by times of en_num. For each loop, the noise is appended to matrix input_data which is matrix input_cha_noi. Then matrix imfs is calculated by performing MEMD on input_cha_noi with function memd. Next, imfs is reshaped into imf _result for further summation which data is stored in sum_imf _result. The average value is also calculated and assigned to ave_imf _result.

Function PLseries
The workflow for function PLseires is shown in Figure 3. The function outputs a matrix that shows the average frequency and their phase difference of IMFs decomposed from NA-MEMD by matrices imf 1 and imf 2 ( Table 2), respectively. The row of imf 1 and imf 2 stand for the index of IMFs and the column of imf 1 and imf 2 refers to the length of the time series. The output matrix is defined as matrix peakMatrix. The function PLseries is called when matrix imf 1 and imf 2 are input. The elements in vectors av_fre1, av_fre2, and difference are initialized to be zero and their length of the row is also set to be consistent with that of imf 1 and imf 2 (their rows are in the same length). After that, for each IMF, Hilbert Transform is used to calculate its instantaneous phase. To alleviate the effect of phase winding, the function unwarp is called to obtain the real phase which is defined as vectors un_ang1 and un_ang2. Then the average value of the unwrapped phase is calculated and stored in vectors av_ang1 and av_ang2, respectively. Their difference is stored in vector difference. According to vectors un_ang1 and un_ang2, the phase information is converted to frequency information which is stored in vectors fre1 and fre2, respectively. Their average values are calculated and are stored in vectors av_fre1 and av_fre2. The final output matrix peakMatrix is the combination of vectors av_fre1, av_fre2, and difference which are in the first, the second, and the third column, respectively.

Function causal_decomposition
The workflow for function causal_decomposition is demonstrated in Figure 4. The function generates a matrix causal_matrix to provide values of RCSs and ACSs according to the times series inputted by matrix input referred to in Table 3.
The raw data of bivariate time series are row-by-row loaded to input which is subsequently defined as vectors s1 and s2. Those two decomposed IMF sets are then assigned as matrices imfs1 and imfs2, respectively. Intrinsic Causal Components (ICCs) sets, vector ICC, are manually reviewed and selected by function plot_and_pick. Based on ICC, the number ICC_main is confirmed to indicate RCSs and ACSs of raw bivariate time series of input. Main ICC is selected manually by function plot_and_pick, and is defined as vector imfss1 and imfss2.
Their phase coherence and variance are calculated by function phasefcimf and function nvar, respectively. After obtaining the main ICC, it is removed from the corresponding inputs s1 and s2, and the rest are defined as vectors s1r and s2r. They are used to replace the corresponding signal in input to form two new input

Test for Time Series Data
To guarantee the experimental efficiency in the real application, variable noise_channel_num and en_num and the data length should be modified. The input data is the Gaussian-White noise with a mean value of 0 and variance of 1, and the experiment is performed on Matlab 2020b on the fully powered laptop.

Variable noise_channel_num
The relationship between noise_channel_num and executing time is demonstrated below in Table 4.
In this test, en_num and data length are determined to be 3 and 500, respectively. It is clear that executing time has a linearly increasing trend with noise_channel_num from the recorded data in Table 4.

Variable en_num
The relationship between en_num and executing time is demonstrated below in Table 5.
In this test, noise_channel_num and data length are fixed to be 3 and 500, respectively, and with en_num increasing, executing time increases linearly along with en_num according to Table 5.

Data Length
The relationship between data length and executing time is demonstrated below in Table 6.
In this test, noise_channel_num and en_num are both fixed to be 3, and with data length increasing, executing time increases linearly along with data length referred to in the data in Table 6.

Robustness and Validity Test
Since the random noise is involved in NA-MEMD, it is likely that the results of different execution will be slightly different from one another. Therefore, the consistency of the output should be tested. In this test, level_noise, noise_channel_num, en_num,

Test number
RCSs of X-to-Y (C x y) RCSs Y-to-X (C y x) ACSs of X-to-Y (C x y) ACSs Y-to-X (C y x) and data length are set to be 0.001, 3, 5, and 61, respectively, and the open-access predator-prey data is provided by Vucetich and Peterson (2012). The result is shown in Table 7. In Table 7, since the X-to-Y causality value and Y-to-X causality value sum to one, only the X-to-Y causal value is considered in the stability discussion.
From the statistical point of view, the mean value of C xy is 0.6384 and the variance is 0.004712 which is less volatile. Combined with the actual causal process demonstrated by the causality values by Yang et al. (2018), all of them indicate strong and valid causality since our results are close to that of Yang et al. (2018), which demonstrates the outcome of the procedure is consistent with the algorithm.

The Approach to Exert Noise in Original Signals in na_memd
Generally, input parameters level_noise, en_num, and noise_channel_num are used to determine the total amount of noise appended to the original signal. The description of the relevant variables can be checked in Table 3. Before setting the parameters, the noise level in the original input data should be evaluated. If the input has already been overwhelmed by noise, both the two parameters are supposed to be set higher to better remove the noise in the original data. However, with the Gaussian-White noise increasing, the effective data in the input is also likely to be attenuated by the appended noise. As a result, it is of significance to choose a proper value of level_noise and noise_channel_num to avoid excessive Gaussian-White noise. One method of removing the original noise without doing damage to the effective data is to reduce the product of level_noise and noise_channel_num and to increase en_num.

The Method of Selecting ICCs Set From IMFs
For ICCs, the ICCs set and the main ICC need to be selected for further operation. Among all the IMFs, ICCs are the ones whose average frequencies and phases are on the same scale. IMFs in different frequency scales or with large phase differences should be excluded from the ICCs set. In ICCs set, the main ICC is the one with the smallest average phase difference. Besides, the phase diagram of IMFs generated by Matlab can also be used to discriminate ICCs from all IMFs. Assist from both data in peakMatrix and visual aid of signal diagram is necessary for ICCs discrimination.

Calculation of RCSs and ACSs
In this code, it is recommended that only RCSs are used to detect the causal-effect relationship between bivariate time series signals.
Absolute Causal Strengths is defined as the variance weighted Euclidean distance (De Leeuw and Pruzansky, 1978) between the re-decomposed IMFs set and the original IMFs set. Its value can reflect the relative cause-effect relationship between multiple signals. The signals with higher ACS can be regarded as the cause and the ones with smaller ACS can be considered as the effect, respectively. Specifically, if only two signals are considered, RCS can also be calculated to illustrate their cause-effect relationship. It normalizes the ACS into an interval between zero and one, and we use the value of RCS to interfere with the causaleffect relationship. When the value of RCS is larger than 0.5, it represents the relationship of cause, and when the value of RCS is smaller than 0.5, it represents the relationship of effect. When the value is equal to 0.5, the relationship can be either reciprocal causation or an irrelevant relationship. When the ACS of two candidate signals is negligible, RCS will converge to 0.5. The threshold value is normally set as 0.05 (Yang et al., 2018).

The Reason for Linear Increase in Executing Time
In this code, the ensemble process is designed as simple loops for multiple noise appending rather than nested loops or complex function iteration. According to ur , compared with EEMD, NA-MEMD applies an ensemble algorithm that massively reduces the reconstruction error. As a result, the impact of the number of noise channels on creating error can be negligible. However, ur  recommend the number to be 2 which is suitable for the test situation and can be adjusted accordingly. Therefore, this experiment can change the noise_channel_num and observe differences. As a result, the executing time is in direct proportion to the total data size. When noise_channel_num, en_num, and data length increase, respectively, the total data size inflates linearly. Therefore, the executing time would have a tendency of linear increase with those variables.

The Determination for System Robustness
For any existing causality pairs, their cause-effect relationship is stable. RCS between them is considered a constant. As a result, it is appropriate to use the variance of relative causality data to represent the robustness of the system. If the variance has a small value which means RSC is less likely to fluctuate between the ideal causality strength, the system has relatively high robustness. If the variance is measured high, then it proves that the system is more susceptible to the external or internal environment.

CONCLUSION
A code package of NA-MEMD has been proposed to facilitate the potential users to apply the algorithm in an effective way. The users need to select the ICCs set and main ICC to check the causal-effect relationship according to the printed frequencyphase matrix and the signal figures. Based on MEMD, the crucial step is to append multi-channel random Gaussian-White noise to the original signal repeatedly to attenuate self-carrying noise. By adjusting the number of noise channels, the intensity of appended noise per channel, and the cycle index, it has been shown that the users can alleviate the influence of original noise without damaging the original data. As a result, NA-MEMD Causal Decomposition has potential for applications in analyzing bivariate and multiscale time series signals.

DATA AVAILABILITY STATEMENT
The code of NA-MEMD Causal Decomposition for causality inference of bivariate time series will be available through open-source platform github, https://github.com/AaronLi43/ ginkgo_glasgow. The demo data used in section RESULT is Wolf and moose field data which are available online at the United States Isle Royale National, https://isleroyalewolf.org/ data/data/home.html. Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
YZ, GW, and ZL found the potential utility of the theoretical framework of NA-MEMD Causal Decomposition in function connectivity, signal detection and processing, Statistical Causality, implemented Matlab open-source code for NA-MEMD Causal Decomposition, tested the efficiency, robustness, validity of the code, carried out the results and discussions, and drafted and revised the manuscript. MX participated in Section Result in part. SS verified and proved the theoretical part of NA-MEMD Causal Decomposition. DY and PX supervised the project and checked the paper quality. PX advised the submission procedure and the manuscript quality linked toward its extensions to fMRI studies. All authors contributed to the article and approved the submitted version.