Quantification of Phase-Amplitude Coupling in Neuronal Oscillations: Comparison of Phase-Locking Value, Mean Vector Length, and Modulation Index

Phase-amplitude coupling is a promising construct to study cognitive processes in electroencephalography (EEG) and magnetencephalography (MEG). Due to the novelty of the concept, various measures are used in the literature to calculate phase-amplitude coupling. Here, performance of the three most widely used phase-amplitude coupling measures – phase-locking value (PLV), mean vector length (MVL), and modulation index (MI) – is thoroughly compared with the help of simulated data. We combine advantages of previous reviews and use a realistic data simulation, examine moderators and provide inferential statistics for the comparison of all three indices of phase-amplitude coupling. Our analyses show that all three indices successfully differentiate coupling strength and coupling width when monophasic coupling is present. While the mean vector length was most sensitive to modulations in coupling strengths and width, biphasic coupling can solely be detected by the modulation index. Coupling values of all three indices were influenced by moderators including data length, signal-to-noise-ratio, and sampling rate when approaching Nyquist frequencies. The modulation index was most robust against confounding influences of these moderators. Based on our analyses, we recommend the modulation index for noisy and short data epochs with unknown forms of coupling. For high quality and long data epochs with monophasic coupling and a high signal-to-noise ratio, the use of the mean vector length is recommended. Ideally, both indices are reported simultaneously for one data set. Highlights mean vector length is most sensitive for differentiating coupling strength modulation index is most robust to differences in data length, sampling rate and SNR phase-locking value and mean vector length cannot detect biphasic phase-amplitude coupling


Introduction 54
Phase-amplitude coupling is a promising method to study cognitive processes (Jensen, 2006;Jensen and 55 Lisman, 1998;Lisman and Jensen, 2013;Vosskuhl et al., 2015). There is no convention yet of how to 56 calculate phase-amplitude coupling, but instead much heterogeneity of phase-amplitude calculation 57 methods used in the literature. Most of these are reasonable measures from a theoretical point of view. 58 To provide empirical evidence for choosing one of these measures over another, this work thoroughly 59 compares the performance of the three most widely used phase-amplitude coupling measures with the 60 help of simulated EEG data. The measures are the phase-locking value (PLV) by Mormann et al. (2005), 61 mean vector length (MVL) by Canolty et al. (2006), and modulation index (MI) by Tort et al. (2008). 62 From a historical viewpoint, the first amplitude modulations that have been detected are amplitude 63 fluctuations of specific frequency bands, becoming apparent in the fast Fourier transform (FFT) of 64 constituents of these signals (Burgess and Ali, 2002;Novak et al., 1992;Pfurtscheller, 1976). Because 65 the FFT approach can solely reveal that the amplitude of a higher frequency oscillates at a lower 66 frequency (characteristic of one signal), these amplitude modulations should not be misinterpreted to 67 account for true temporal coupling between the instantaneous phase of the lower frequency and the 68 amplitude envelope of the higher frequency (association between two signals and definition of phase-69 amplitude coupling). Neither the lower frequency itself nor its instantaneous phase are extracted in this 70
Several of these phase-amplitude coupling measures were compared with the help of simulated and real 81 data in four reviews. Tort et al. (2010) executed the most extensive comparison so far, including most 82 of the above listed measures and evaluating their performance pertaining to tolerance to noise, amplitude 83 independence (independence from the amplitude of the amplitude-providing frequency band), 84 sensitivity to multimodality, and sensitivity to modulation width. The modulation index, introduced by 85 the same group (Tort et al., 2008), is well-rated in all aspects while, amongst others, the phase-locking 86 value has poor ratings in all aspects. The mean vector length has good ratings in some aspects (e. g. 87 tolerance to noise), but weaknesses in others (e. g. amplitude dependence). 88 Penny et al. (2008) introduced the GLM approach and compared it to the phase-locking value, mean 89 vector length, and envelope-to-signal correlation in respect to noise level, coupling phase, data length, 90 sample rate, signal non-stationarity, and multimodality. They found that the methods discriminated 91 between data simulated with and without coupling to different extents, ranging from below chance level modulation index, cross-frequency coherence), they found that "no one measure unfailingly out-101 performed the others" (Onslow et al., 2011, p. 56). They concluded that each measure seems to be 102 particularly suited for specific data conditions. Mean vector length for example is suitable for noisy data, 103 exploratory analyses (analysing a broad frequency spectrum) and when the power of the amplitude 104 providing frequency band is low. 105 The above cited reviews do not point to a single optimal measure for calculating phase-amplitude 106 coupling. They rather show that mostbut not allof the used measures perform well and are equally 107 affected by various confounders. Despite the availability of manifold measures, 79 % of studies use the phase-locking value adapted for phase-amplitude coupling, mean vector length, or modulation index 109 (Hülsemann, 2016). Why is this the case? The phase-locking value is derived from a long-used, phase-110 phase coupling measure that is easily adapted for the purpose of phase-amplitude measurement. Its 111 familiarity in the scientific community might have promoted its application. Possibly the predominant 112 application of mean vector length is due to its mathematical directness. The modulation index is 113 The majority of reviews used very straightforward data simulation methods. Oftentimes, a sinusoidal 115 oscillation is constructed at a lower phase-providing frequency and at a higher amplitude-providing 116 frequency. Phase-amplitude coupling is introduced by multiplying both signals (cf. Onslow et al., 2011, 117 p. 52). Amplitude is then extracted from the so constructed signal and phase is extracted from the pure 118 sinusoidal oscillation of the lower frequency. White noise is added to both signals. There are two pitfalls 119 in this approach. Both sinusoidal signals reflect a plain prototype of phase-amplitude coupling, but in 120 real neuronal data, pure sinusoidal oscillation cannot be filtered; rather, frequency bands containing 121 different amounts of various frequencies are extracted. Second, white noise is added to the simulated 122 data, even though it is known that not white noise but Brownian noise is inherent to brain dynamics (He 123 et al., 2010;Miller et al., 2009). 124 Because none of the hitherto existing reviews simultaneously meet the requirements of realistic 125 simulation of EEG data, providing inferential statistics for comparison of the measures, investigating 126 moderators of phase-amplitude coupling, and including the three most widely used measures (phase-127 locking value, mean vector length, and modulation index), a new comparison of these methods is 128 presented here. We aim to combine the best aspects of all previous reviews. EEG data is simulated rather 129 realistically according to the procedure described by Kramer and Eden (2013). The influence of several 130 moderators (multimodality, data length, sampling rate, noise level, modulation strength, and modulation

Simulation of EEG Data and Implementation of Phase-Amplitude Coupling 138
A characteristic of natural EEG data is the proportionality of its frequency spectrum to a power law P(f) 139 ~ (1/f β ). Namely, the higher the frequency f, the weaker the amplitude P(f). The exponent β defines the 140 strength of the amplitude decrease. White noise is defined by β = 0, pink noise by β = 1 and Brownian 141 (red) noise by β = 2. Different investigations have shown that the frequency spectrum of human brain 142 activity relates to Brownian (red) noise, with 2 < β < 3 (He et al., 2010;Miller et al., 2009). Because of 143 this, Brownian noise was generated using MATLAB code provided by Zhivomirov (2013), in order to 144 simulate EEG data ( Figure 1A). 145 Simulated data was then filtered at a low phase-providing frequency, from here on referred to as phase 146 time series, with a narrow bandwidth of 2 Hz. The same data was filtered at a high amplitude-providing In a final step, additional noise was added to the phase and amplitude time series. Therefore, Brownian 171 noise of the same length was simulated, band-pass filtered at the same frequencies as the phase and 172 amplitude time series, and added to the original phase and modulated amplitude time series, respectively. 173 Frequency matched noise is disruptive to the modulated phase-amplitude coupling and therefore allows 174 to check for the robustness of the phase-amplitude coupling measures. 175 Subsequently, phase and amplitude were extracted from the correspondent time series via Hilbert 176 transform, using the Signal Processing Toolbox of MATLAB (The MathWorks, Inc). Then continuous 177 phase and amplitude time series were segmented. This was done to introduce data discontinuities, which 178 are present in real data as well. Filtering, Hilbert transform, and phase or amplitude extraction were 179 always conducted on continuous data, to prevent filtering or other artefacts in the later analysed data 180

epochs. 181
Data sets with a length of 42, 105, and 180 seconds were simulated. This amount of data is sufficient to 182 simulate 30 trials with a length of 400, 2500 and 5000 milliseconds plus additional 30 seconds to 183 introduce data discontinuities when segmenting the data. These parameters were chosen to mirror typical 184 properties of event-related EEG data: (1) at least 30 trials per unique condition for which phase-185 amplitude coupling will be calculated (Luck, 2014), (2) trial length between 400 and 5000 milliseconds, 186 and (3) data discontinuities between trials. Sampling rate was set to 1000 Hz (Cohen, 2014). In addition, 187 simulated data was resampled to 500 Hz in order to investigate the influence of sampling rate. Noise 188 was scaled by the factor 0.9, 1.0, and 1.1 in order to simulate different signal-to-noise ratios. Scaling Hanning Window ranged between 22.5 % and 27.5 % of one low frequency cycle to modulate different 194 "widths" of phase-amplitude modulation. This width is equivalent to about a quarter of one cycle and 195 therefore covers the peak (or trough) phases of that low frequency cycle. At these phases, amplitude of 196 the higher frequency was increased. All parameters were realised for mono-and biphasic coupling 197 (factor multimodality). 198 199

Measuring Phase-Amplitude Coupling 200
To calculate phase-amplitude coupling, first, raw data is band-pass filtered in the frequency bands of 201 interest. Second, the real-valued band-pass filtered signal is transformed into a complex-valued analytic 202 signal. Finally, phase or amplitude is extracted from the complex-valued analytic signal. All these steps 203 can essentially be implemented in MATLAB with four lines of code: phase can be inferred from the phase lag. The phase-locking value is calculated by the following 225 where n is the total number of data points, t is a data point, θlt is the phase angle of the lower frequency 20° each, which many authors follow. The amount of bins can influence the results, as will be explained below. The average amplitude of the amplitude-providing frequency in each phase bin of the phase-277 providing frequency is computed and normalized by the following formula: 278 where ā is the average amplitude of one bin, k is the running index for the bins, and N is the total amount 280 of bins; p is a vector of N values. With the help of these calculations, one obtains the data for the phase-281 amplitude plot, which depicts the actual phase-amplitude coupling graphically ( Figure 1B, right panels). 282 Subsequently Shannon entropy is computed; a measure that represents the inherent amount of 283 information of a variable. If Shannon entropy is not maximal, there is redundancy and predictability in 284 the variable. Shannon entropy is maximal, if the amplitude in each phase bin is equal (uniform 285 distribution, Figure 1B, right upper panel). Shannon entropy is computed by the following formula: 286 where p is the vector of normalized averaged amplitudes per phase bin and N is the total amount of bins. 288 It does not matter which logarithm base is used if permutation testing is applied later on (Cohen, 2014). Phase-amplitude coupling is defined by a distribution that significantly deviates from the uniform 294 distribution. Kullback-Leibler distance, a measure for the disparity of two distributions is calculated by 295 the following formula: 296 where U is the uniform distribution, X is the distribution of the data, N is the total amount of bins, and 298 H(p) is the Shannon entropy according to equation 4. The uniform distribution is represented by log(N). 299 The final raw modulation index is calculated by the following formula: The more phase-amplitude coupling there is in the data, the more the given phase-amplitude plot 332 deviates from the uniform distribution and the higher the modulation index becomes. where CV denotes coupling value, μ denotes the mean and σ denotes the standard deviation (S. D.). 347 Only when the observed phase-locking value is larger than 95 % of shuffled values (which are expected 348 to be uncorrelated), it is defined as significant.

Specificity of Phase-Amplitude Coupling Measures 396
Phase-amplitude coupling values did not differ depending on method, data length, sampling rate, or 397 noise level. Because of the high number of simulations (n = 10 000), some main effects and interactions became significant. However, all effect sizes were below ω² < .01, therefore these differences are 399 negligible. 400 Figure 2 shows the phase-amplitude coupling value distribution for the phase-locking value, the mean 401 vector length, and the modulation index. When setting the critical z-value for the phase-locking value 402 at 1.86, for the mean vector length at 1.84, and for the modulation index at 1.92 five percent of the 403 simulated data were classified as containing coupling (false positive). Thus, these values were defined 404 as critical z-values. This implies that the mean vector length is most specific, directly followed by the 405 phase-locking value. The modulation index is least specific compared to the two other methods. 5.43 ± .07) compared to medium (2500 ms; mean ± S.E: 4.70 ± .06) and long epochs (5000 ms; mean ± 420 S.E: 4.78 ± .09). Medium and long epochs did not differ in their false positive rates.
The main effect was qualified by a method by data length interaction (F(4,396) = 36.34, p < .01, ω² = 422 .14, Dunncrit = .23). This revealed that the above-described pattern was driven by the phase-locking value 423 and mean vector length. There were no differences in false positive rate within the modulation index.
The stronger the coupling, the larger phase-locking value, mean vector length, and modulation index 450 are. As Tort et al. (2010) has shown, this behaviour is not inherent to all phase-amplitude coupling 451 measures. Since researchers do not only want to prove the existence of phase-amplitude coupling, but 452 also differentiate its strength, a measure that can do this is indispensable. Of all three methods, mean 453 vector length differentiates best between the different factor levels of modulation strength. 454 455

Effect of modulation width on phase-amplitude coupling measures 456
Coupling values of all methods increased with increasing modulation width (F(2,198) = 110.11, p < .01, 457 ω² = .42). The interaction method by modulation width became significant (F(4,396) = 70.18, p < .01, 458 ω² = .24; Figure 3B). Post hoc t-tests showed that all factor levels within a method differed significantly 459 from each other (all p's < .01). The effect of modulation width was most pronounced for the phase-460 locking value (.14 < ω² < .72) and the mean vector length (.14 < ω² < .71). The modulation index was 461 least sensitive to modulation width (.15 < ω² < .52). 462 The broader the coupling width, the larger phase-locking value, mean vector length, and modulation 463 index are. Of all three methods, phase-locking value and mean vector length differentiate best between 464 the different factor levels of modulation width.  Figure 4A). 485 That is, multimodality influences the three methods very differently. Phase-locking value and mean 486 vector length cannot find biphasic coupling as it was implemented here (amplitude of the higher 487 frequency was increased at peak and trough of the lower frequency). Because of the mathematic 488 construct of the mean vector length (equation 2, Figure 1B) Figure 4B) became significant. Post hoc t-tests showed that all 506 factor levels within a method differed significantly from each other (all p's < .01). The data length effect 507 was most pronounced for mean vector length (.82 < ω² < .94), and phase-locking value (.80 < ω² < .94). 508 The modulation index was least affected by data length (.65 < ω² < .75). 509 Overall, the longer the data, the larger phase-locking value, mean vector length, and modulation index 510 are. This association was found in the data presented here, but must not generally apply. Here coupling was simulated continuously into the data. If coupling is transient and does not proportionally vary with 512 data length, this relationship does not need to apply. Penny et al. (2008) showed, that coupling strength 513 decreases for phase-amplitude coupling, which was simulated transiently. Potentially, the general rule 514 is that the longer the data epochs where coupling occurs, the stronger the phase-amplitude coupling 515 values. This should be tested in a follow-up analysis. This analysis further showed that a minimal data 516 length is required for finding coupling, which should exceed at least 400 milliseconds per trial when 517 including 30 trials (also see Cheng et al., 2018). None of the methods were able to detect coupling in The factor sampling rate stands out because of its comparatively small effect size. A second set of data 530 was simulated testing phase-locking value, mean vector length, and modulation index at 16 -18 Hz for 531 the modulating frequency and 202 -238 Hz for the modulated frequency (for detailed results see 532 Hülsemann, 2016). This analysis showed that sampling rate is indeed important, but only if the 533 investigated upper frequency band approaches the Nyquist frequency. Of all three methods, modulation 534 index is least affected from the confounding factor sampling rate.  The interaction method by noise became significant (F(4,396) = 251.00, p < .01, ω² = .53; Figure 4D).
Post hoc t-tests showed that all factor levels within a method differed significantly from each other (all 540 p's < .01). The effect of noise was most pronounced for the modulation index (.65 < ω² < .76) and the 541 mean vector length (.55 < ω² < .84). The phase-locking value was least affected by noise (.51 < ω² < 542 .81). 543 Overall, the noisier the data, the lower phase-locking value, mean vector length, and modulation index 544 are. This aspect is not desired but plausible. Noise obscures the relation between the phase of the lower 545 frequency and amplitude of the higher frequency. The data as a whole contains phase-amplitude 546 coupling to a lesser extent, as the relative amount of noise compared to the relative amount of signal 547 increases. Of all three methods, phase-locking value is least affected from the confounding factor noise.

Interaction Effects 566
Conducting a 6-way ANOVAs for each method separately (see Hülsemann, 2016 for detailed results), 567 revealed ordinal interaction for all factors (multimodality, data length, sampling rate, noise, modulation 568 strength, and modulation width). Especially multimodality and data length interacted with the remaining 569 factors, as well as interacted with each other and the remaining factors. Sampling rate only showed 570 significant interactions, when analysing frequencies close to the Nyquist frequency. All interactions had 571 a monotone pattern, following the pattern of each main effect. For example, mean vector length 572 increased the longer the data, but it increased less when also noise increases ( Figure 5). This pattern was 573 true for each added factor. Phase-locking value and mean vector length did not find biphasic coupling 574 at all. Because of this, for these two methods, the described main effect and interaction patterns are only 575 valid for monophasic, but not for biphasic coupling. For the modulation index the pattern was true for 576 mono-and for biphasic coupling. Comparing all three methods it becomes evident that the modulation index is least affected by the 590 confounding factors multimodality, data length and sampling rate. However, it is alsolike the phase-591 locking valueless sensitive to variation in modulations strength compared with the mean vector length. 592 The modulation index is especially less sensitive to modulation width compared to the mean vector 593 length and phase-locking value. Mean vector length and modulation index are similarlyand stronger 594 than the phase-locking valueaffected by the confounding factor noise. 595 596 597

Conclusion 598
In conclusion, for long data epochs, recorded at high sampling rates, with a high signal-to-noise ratio, 599 the use of the mean vector length is recommended, because it is more sensitive to modulation strength 600 and width than both other methods. For noisier data, shorter data epochs, recorded at a lower sampling 601 rate, the use of the modulation index is recommended, as it is least influenced by the confounding factors compared with both other methods. If it is not clear whether cross-frequency coupling will be mono-or 603 biphasic, the modulation index should be used, even though literature suggests that biphasic coupling 604 can be neglected. 605 The phase-locking value does not stand out in comparison to the two other measures. Its usage is 606 potentially problematic because phase information is extracted from the amplitude envelope of a signal. 607 Phase information can only be correctly extracted from truly oscillating signals; this must not be 608 necessarily the case for an amplitude envelope. So far, no review evaluated this measure explicitly as The modulation index is quantitatively larger than the phase-locking value and mean vector length. 615 However, even despite substantial quantitative differences in values, the qualitative decision for 616 significance of phase-amplitude coupling is the same for all three methods in our simulation. 617 Nevertheless, comparison of coupling strengths between the methods is problematic and this lack of 618 comparability provides another reason for reporting both, mean vector length and modulation index. 619 In contrast to mean vector length, the false positive rate of the modulation index is not affected by any 620 confounding factor. However, this advantage against mean vector length is counteracted by one 621 disadvantage against the mean vector length: calculation of the modulation index includes Shannon's 622 Entropy. The entropy value depends on the amount of bins as well as amount of data squeezed into the 623 same amount of bins. This is an undesirable degree of freedom, which is not present when calculating 624 the mean vector length. 625 Due to the dependency on confounding variables (e. g. data length), comparing absolute coupling 626 strengths across studies might be difficult even if using the same method. Comparisons within one study, 627 on the other hand, can be done with confidence. Nevertheless, one should make sure that signal-to-noise 628 ratio is comparable within all experimental conditions and over the course of the experiment.