Challenges and perspectives in recurrence analyses of event time series

The analysis of event time series is challenging in general, and most time series analysis tools are of limited use for this kind of data. Recurrence analysis, a powerful concept from nonlinear time series analysis, provides several opportunities to work with event data, even for the most challenging task of comparing event time series with continuous time series. Here, the basic concept is introduced, the challenges are discussed, and the future perspectives are summarized.


INTRODUCTION
The study of event time series is of general interest in data analysis and modelling because events are ubiquitous in almost all scientific fields, for example in financial transactions, customer interactions, life-threatening cardiac events, system failures, or natural phenomena. Event series can consist of single, discrete events, binary events, or events with different amplitudes, e.g., events extracted from data with heavy-tailed distributions, short-term extreme events, or anomalies in time series. In neuroscience, event series are also called "spike trains" (Harris et al., 2002). A time series is generally denoted by a set of ordered pairs $\{(t_i, x_i)\}$ of time $t_i$, with $t_{i+1} > t_i$, and corresponding data value $x_i$, with sampling index $i$ (usually with constant sampling time $t_{i+1} - t_i = \text{const.}$, i.e., an equidistant time axis). An event series, instead, is considered as a series of event times, defined by the specific time or timestamp associated with each single event, resulting in a set of event time points $\{t_i\}$. As events can also have an amplitude, a definition of an event time series as a set of tuples of time and event strength $\{(t_i, x_i)\}$ is also possible. Because events usually do not occur at regular intervals, such event time series are usually irregularly sampled, i.e., $t_{i+1} - t_i \neq \text{const.}$ The alternative is to use a regularly sampled, discretised time axis with binary (or amplitude) values at those points in time where an event happens (this is similar to categorical data, another class of discrete data, which does not necessarily represent separated single events). However, this approach is usually limited and not appropriate for many research questions, because the timing of events often does not fit the sampling points and, even more importantly, the time series can be filled with many zeros. Standard time series analysis tools have their limits when analysing such data.

RECURRENCE ANALYSIS OF EVENT TIME SERIES
A recurrence plot (RP) is the graphical representation of the recurrence matrix, which marks all pairs of time indices $(i, j)$ of a data sequence at which the values or states $\vec{x}_i$ and $\vec{x}_j$ are similar:

$$R_{i,j} = \Theta\bigl(\varepsilon - d(\vec{x}_i, \vec{x}_j)\bigr), \tag{1}$$

with a similarity measure $d(\cdot,\cdot)$ and the Heaviside function $\Theta(\cdot)$, which sets $R_{i,j} = 1$ if the similarity value $d(\cdot,\cdot)$ falls below the threshold $\varepsilon$ (Marwan et al., 2007). For dynamical systems with continuous change of the state variables, i.e., $\vec{x}_i \in \mathbb{R}^m$ (with $m$ the dimension of the system), the similarity between states is often defined by the Euclidean norm, $d(\vec{x}_i, \vec{x}_j) = \lVert \vec{x}_i - \vec{x}_j \rVert$ (Marwan et al., 2007). For discrete data with regular sampling (equidistant time instances), e.g., categorical data, the recurrence matrix $\mathbf{R}$ can be simply defined by the Kronecker delta function, $R_{i,j} = \delta(x_i, x_j)$, which is one if both arguments are identical (Groth, 2005; Bandt et al., 2008; Faure and Lesne, 2010; Leonardi, 2018). This approach works well for discrete data, such as categorical data or symbolic sequences, with applications, e.g., in life science to detect atrial fibrillation or congestive heart failure (Caballero-Pintado et al., 2018; Pérez-Valero et al., 2019), to measure synchronisation in an epileptic brain (Groth, 2005), or in engineering to optimise manufacturing networks (Donner et al., 2008). This concept is easily extended to bivariate analysis. Cross-RPs, $CR^{x,y}_{i,j} = \Theta\bigl(\varepsilon - d(\vec{x}_i, \vec{y}_j)\bigr)$, and joint-RPs, $JR^{x,y}_{i,j} = \Theta\bigl(\varepsilon_x - d(\vec{x}_i, \vec{x}_j)\bigr)\,\Theta\bigl(\varepsilon_y - d(\vec{y}_i, \vec{y}_j)\bigr)$, are two basic concepts for measuring different aspects of synchronisation (Marwan et al., 2007). To adapt the cross-RP to discrete data, we can simply use the Kronecker delta, $CR_{i,j} = \delta(x_i, y_j)$ (Lira-Palma et al., 2018). The joint-RP even allows us to measure the synchronisation between different types of data, such as discrete and continuous data (Kodama et al., 2021),

$$\mathbf{JR}^{x,\vec{y}} = \mathbf{R}^{x} \circ \mathbf{R}^{\vec{y}}, \tag{2}$$

where $\circ$ is the Hadamard product of the RP of the discrete system $x_i$, $R^{x}_{i,j} = \delta(x_i, x_j)$, and the RP of the continuous system $\vec{y}_i$, $R^{\vec{y}}_{i,j} = \Theta\bigl(\varepsilon - d(\vec{y}_i, \vec{y}_j)\bigr)$.
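The three constructions above (thresholded distances for continuous states, the Kronecker delta for categorical data, and the joint-RP as an element-wise product) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names and the toy data are hypothetical, the Euclidean norm is used for the continuous states, and a strict inequality is chosen for "falls below the threshold".

```python
import math


def recurrence_matrix(states, eps):
    """R[i][j] = 1 if the Euclidean distance between states i and j is below eps."""
    n = len(states)
    return [[1 if math.dist(states[i], states[j]) < eps else 0
             for j in range(n)] for i in range(n)]


def categorical_recurrence_matrix(symbols):
    """R[i][j] = 1 if symbols i and j are identical (Kronecker delta)."""
    return [[1 if a == b else 0 for b in symbols] for a in symbols]


def joint_recurrence_matrix(rp_x, rp_y):
    """Element-wise (Hadamard) product of two recurrence matrices of equal size."""
    if len(rp_x) != len(rp_y):
        raise ValueError("both recurrence matrices must have the same size")
    return [[a * b for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(rp_x, rp_y)]


# Toy data (hypothetical): a symbolic sequence and a 2D continuous trajectory
symbols = ["a", "b", "a", "b", "a"]
states = [(0.0, 0.0), (1.0, 0.0), (0.1, 0.0), (2.0, 0.0), (0.05, 0.0)]

rp_x = categorical_recurrence_matrix(symbols)
rp_y = recurrence_matrix(states, eps=0.5)
jrp = joint_recurrence_matrix(rp_x, rp_y)

for row in jrp:
    print(row)
```

A joint recurrence is marked only where both systems recur at the same pair of times, which is why both recurrence matrices must have the same size.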
This concept reaches its limits when considering event time series that consist of rare events with many zeros between them or, even more limiting, that consist only of the event times $\{t_i\}$ or have strongly non-equidistant time instances ($t_{i+1} - t_i \neq \text{const.}$). For this kind of data, the similarity measure $d(\cdot,\cdot)$ has to be replaced by a specific metric that measures the coincidence of event sequences. Several such measures (event metrics) are available, mainly developed in neuroscience (Ciba et al., 2020). A widely used measure is event synchronisation, which allows events with varying delays to be considered as coinciding (Quian Quiroga et al., 2002). This measure has been successfully applied to investigate, e.g., the spatio-temporal relationships between extreme rainfall events (Boers et al., 2016). Another candidate is the edit distance, an extension of the Levenshtein distance (Masek and Paterson, 1980; Victor and Purpura, 1997). The distance is the minimum cost needed to transform one event sequence into another with a limited set of operations (Fig. 1). The edit distance is a metric and has been successfully integrated with recurrence analysis (Suzuki et al., 2010; Banerjee et al., 2021).
The edit distance measure is the (minimum) sum of the costs of the transform operations addition, deletion, and shifting applied to modify a sequence $S_i = \{t^{(i)}_a\}$ into a sequence $S_j = \{t^{(j)}_b\}$:

$$d(S_i, S_j) = \min_{C} \Bigl\{ \lambda_s \bigl( N_i + N_j - 2 N_{(i,j)} \bigr) + \lambda_0 \sum_{(a,b) \in C} \bigl| t^{(i)}_a - t^{(j)}_b \bigr| \Bigr\}, \tag{3}$$

where $a$ and $b$ are indices of the events in segments $S_i$ and $S_j$; $N_i$ and $N_j$ are the numbers of events in segments $S_i$ and $S_j$, respectively; $N_{(i,j)}$ is the number of events in $S_i$ and $S_j$ to be shifted, which form the set $C$; $\lambda_s$ is the cost of deletion/insertion; and $\lambda_0$ is the cost assigned for shifting events in time. Thus, the first summand corresponds to deletion and insertion operations and the second summand to the shifting of the events (Fig. 1). Extensions of this cost function include costs for amplitude changes or a modification of the shifting term by a continuous cost function, allowing a more intuitive interpretation in terms of a delay (Suzuki et al., 2010; Banerjee et al., 2021). To apply the edit distance for recurrence analysis, the event series has to be divided into sequences $S_i$ defined by a time window of length $T_w$. The window can be moved in smaller steps $s < T_w$, resulting in overlapping time intervals. In order to get reliable costs $d(S_i, S_j)$, the resulting sequences $S_i$ and $S_j$ should contain at least one event (i.e., should not be empty).
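The cost minimisation and the windowing described above can be sketched as follows. The sketch is not taken from the cited works: it assumes that the minimum over all pairings can be found with a standard dynamic-programming recursion of the kind used for spike-train edit distances, that event times within each window are measured relative to the window start, and that amplitudes are ignored. The names `cost_indel` (for $\lambda_s$), `cost_shift` (for $\lambda_0$), and `sliding_windows` are hypothetical.

```python
def edit_distance(seq_a, seq_b, cost_indel=1.0, cost_shift=1.0):
    """Minimum cost to transform the sorted event times seq_a into seq_b.

    cost_indel is the cost of deleting or inserting one event,
    cost_shift the cost per unit time of shifting one event.
    """
    n_a, n_b = len(seq_a), len(seq_b)
    # g[a][b]: minimum cost of transforming the first a events of seq_a
    # into the first b events of seq_b
    g = [[0.0] * (n_b + 1) for _ in range(n_a + 1)]
    for a in range(1, n_a + 1):
        g[a][0] = a * cost_indel          # delete all a events
    for b in range(1, n_b + 1):
        g[0][b] = b * cost_indel          # insert all b events
    for a in range(1, n_a + 1):
        for b in range(1, n_b + 1):
            shift = cost_shift * abs(seq_a[a - 1] - seq_b[b - 1])
            g[a][b] = min(g[a - 1][b] + cost_indel,    # delete event a
                          g[a][b - 1] + cost_indel,    # insert event b
                          g[a - 1][b - 1] + shift)     # shift a onto b
    return g[n_a][n_b]


def sliding_windows(event_times, t_start, t_end, window, step):
    """Cut an event series into windows of length `window`, moved by `step`.

    Event times are returned relative to the start of their window.
    """
    segments = []
    k = 0
    while t_start + k * step + window <= t_end:
        left = t_start + k * step
        segments.append([t - left for t in event_times
                         if left <= t < left + window])
        k += 1
    return segments


# Toy example (hypothetical): strictly periodic events with period 3
events = [1.0, 4.0, 7.0, 10.0, 13.0]
segments = sliding_windows(events, 0.0, 15.0, window=6.0, step=3.0)
costs = [[edit_distance(si, sj) for sj in segments] for si in segments]
```

With these default costs, shifting an event is cheaper than deleting and re-inserting it only if the shift is smaller than $2\lambda_s/\lambda_0$; larger time differences are treated as a deletion plus an insertion.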
This edit distance measure has been used as a simple synchronisation measure between event series to study the stimulus responses in neuron spike trains (Victor and Purpura, 1997), as a similarity measure between extreme rainfall data to reconstruct climate networks (Agarwal et al., 2022), and to create regularly sampled time series from non-regularly sampled time series (TACTS approach) (Ozken et al., 2015; Eroglu et al., 2016). It has also been used as a distance measure for computing RPs directly from event data (Fig. 2), e.g., to study stock exchange data (Suzuki et al., 2010) or flood events (Banerjee et al., 2021), or to allow the calculation of RPs directly from irregularly sampled palaeoclimate data (Ozken et al., 2018; Ozdes and Eroglu, in press). Integrating the edit distance metric into the RP definition, Eq. (1), makes all the applications of recurrence based time series analysis available for event time series.
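Once the pairwise costs $d(S_i, S_j)$ are available, the recurrence matrix follows by thresholding them as in Eq. (1). A common choice, also used for Fig. 2, is to select $\varepsilon$ such that a prescribed recurrence rate is obtained. The following sketch assumes a precomputed square distance matrix; the function names are hypothetical, the main diagonal is counted in the recurrence rate, and "less than or equal" is used so that the selected quantile itself counts as recurrent. Other conventions are equally possible.

```python
def threshold_for_recurrence_rate(dist, rate):
    """Return the threshold eps such that about `rate` of all pairs recur."""
    values = sorted(d for row in dist for d in row)
    # index of the quantile, kept within the valid range
    k = min(len(values), max(1, round(rate * len(values))))
    return values[k - 1]


def recurrence_matrix_from_distances(dist, eps):
    """R[i][j] = 1 if the distance between segments i and j does not exceed eps."""
    return [[1 if d <= eps else 0 for d in row] for row in dist]


def recurrence_rate(rp):
    """Density of recurrence points in the recurrence matrix."""
    n = len(rp)
    return sum(sum(row) for row in rp) / (n * n)


# Toy distance matrix (hypothetical): d(i, j) = |i - j|
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
eps = threshold_for_recurrence_rate(dist, rate=0.6)
rp = recurrence_matrix_from_distances(dist, eps)
print(eps, recurrence_rate(rp))
```

Because distances can be tied, the achieved recurrence rate may deviate slightly from the requested one.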

CHALLENGES
Despite the recent advances in recurrence analysis of event time series, there are still several challenges.
Event time series can have missing data that are not easy to detect. For example, data on landslide events are mainly available at sites where they affect infrastructure (Steger et al., 2021), but their statistical analysis with respect to, e.g., climate change would require reliable event series (Alvioli et al., 2018). Missing or sparse data can, therefore, bias the results of any analysis. Handling such data is a subject of research in time series analysis in general, including interpolation, modelling, or advanced data reconstruction methods (Alavi et al., 2006; Facchini and Mocenni, 2011; Sarafanov et al., 2022), but these methods are mostly not applicable to event data.
The process underlying the studied system could be non-stationary (e.g., life-threatening cardiac arrhythmias or seizures; Marwan et al., 2002; Steriade, 2000), meaning that the statistical properties of the event series may change over time. For example, the distribution of events could change, or events may be sparse, so that some periods of time contain no events. This can make it difficult to apply event based recurrence analysis (e.g., using the edit distance). In particular, if the time interval defined by the length $T_w$ is too small, many sequences $S_i$ could be empty, resulting in undefined costs $d(S_i, S_j)$. The selection of the time interval length $T_w$ is, thus, crucial. For simple periodically recurring events, the choice might be easy, but it is not straightforward if multiple time scales are present (Banerjee et al., 2021).
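A simple diagnostic for this problem is to count, for several candidate window lengths, how many windows would be empty. The sketch below is only an illustration; the function name, the half-open window convention, and the toy event series are assumptions.

```python
def empty_window_fraction(event_times, t_start, t_end, window, step):
    """Fraction of sliding windows [left, left + window) that contain no event."""
    n_windows = 0
    n_empty = 0
    k = 0
    while t_start + k * step + window <= t_end:
        left = t_start + k * step
        has_event = any(left <= t < left + window for t in event_times)
        n_windows += 1
        if not has_event:
            n_empty += 1
        k += 1
    return n_empty / n_windows if n_windows else float("nan")


# Toy event series (hypothetical) with clustered events and long gaps
events = [2.0, 3.0, 4.0, 20.0, 21.0, 40.0]
for window in (2.0, 5.0, 10.0, 20.0):
    frac = empty_window_fraction(events, 0.0, 50.0, window, step=1.0)
    print(f"T_w = {window:5.1f}: {frac:.2f} of windows are empty")
```

In this toy example the fraction of empty windows vanishes only once the window is longer than the longest gap between events, which illustrates why a single window length can be hard to choose when the event rate varies strongly.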
The number of events in an interval can also change due to sampling issues. This is a common problem in palaeoclimate research, where the deposition rate in sediments varies over time, leading to palaeoclimate time series with non-equidistant sampling in general (Rehfeld et al., 2011; Breitenbach et al., 2012; Braun et al., 2022). Event based metrics, such as event synchronisation, event coincidence analysis, or the edit distance cost, depend on the number of events in the interval and produce different types of biases, which impact the results of the quantitative analysis and call for correction schemes (Wolf et al., 2020; Braun et al., 2022).
In general, the comparison of event time series with continuous time series is very challenging. Such problems occur, e.g., in climate research when studying recurring patterns of special weather phenomena (e.g., atmospheric rivers) or extreme events (such as heavy precipitation or river floods) with respect to large scale climate phenomena, such as El Niño/Southern Oscillation or the North Atlantic Oscillation (Miller et al., 2019; Mundhenk et al., 2018). The RP approach offers a promising way by modifying Eq. (2) to

$$\mathbf{JR}^{S,\vec{y}} = \mathbf{R}^{S} \circ \mathbf{R}^{\vec{y}}, \qquad R^{S}_{i,j} = \Theta\bigl(\varepsilon_S - d(S_i, S_j)\bigr), \tag{4}$$

where $\circ$ is the Hadamard product, $\mathbf{R}^{S}$ is the RP of the event series based on an event metric $d(S_i, S_j)$ such as the edit distance, and $\mathbf{R}^{\vec{y}}$ is the RP of the continuous system (Kodama et al., 2021). However, event series often consist of far fewer events than the number of sampling points of the continuous time series, resulting in RPs of rather different size and making it impossible to apply Eq. (4) directly. An approach to match the event based RP with that of the continuous data would be required.
Finally, the uncertainty of the timing of events (timing jitter) strongly affects any measure of coincidence. Timing jitter is expected to be a common problem when measuring real-world event series. This challenge might be addressed by evaluating the sensitivity of the results to the jitter using specific models.
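One conceivable form of such a sensitivity test is a small Monte Carlo experiment: the event times are perturbed repeatedly with random jitter and the coincidence measure is recomputed for each realisation. The sketch below rests on assumptions that are not part of the text above: the jitter is modelled as independent Gaussian noise with standard deviation `sigma`, and `coincidence_rate` is a deliberately simple, hypothetical stand-in for an event metric (it is not the event coincidence analysis or event synchronisation of the cited literature).

```python
import random
import statistics


def coincidence_rate(events_a, events_b, tolerance):
    """Fraction of events in events_a with an event of events_b within tolerance."""
    if not events_a:
        return float("nan")
    hits = sum(1 for ta in events_a
               if any(abs(ta - tb) <= tolerance for tb in events_b))
    return hits / len(events_a)


def jitter_sensitivity(events_a, events_b, tolerance, sigma,
                       n_realisations=500, seed=1):
    """Mean and spread of the coincidence rate under Gaussian timing jitter."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    rates = []
    for _ in range(n_realisations):
        jittered = [t + rng.gauss(0.0, sigma) for t in events_a]
        rates.append(coincidence_rate(jittered, events_b, tolerance))
    return statistics.mean(rates), statistics.pstdev(rates)


# Toy example (hypothetical): two identical, perfectly coinciding event series
events = [10.0, 20.0, 30.0, 40.0, 50.0]
for sigma in (0.0, 0.2, 2.0):
    mean_rate, spread = jitter_sensitivity(events, events,
                                           tolerance=0.5, sigma=sigma)
    print(f"sigma = {sigma:3.1f}: mean = {mean_rate:.2f}, spread = {spread:.2f}")
```

Comparing the jitter level at which the measure starts to degrade with the expected dating or measurement uncertainty gives a rough indication of how much the results can be trusted.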
The extension of the edit distance can also take amplitude variations into account. However, this mixes two different aspects of the data: the temporal pattern of event sequences and amplitude differences. The optimal choice of the corresponding cost parameters, which are needed to balance these two aspects, is then less clear.

DISCUSSION
Prospective methodological developments will have to address several important challenges in order to study interesting research questions related to (discrete) event data.
For recurrence analysis of event data, so far only the edit distance metric has been applied. It would be important to also test and compare other measures, such as the Needleman-Wunsch distance, event synchronisation, event coincidence analysis, or the ARI-SPIKE-distance (Needleman and Wunsch, 1970; Quian Quiroga et al., 2002; Wolf et al., 2020; Ciba et al., 2020). Specific discrete data might call for distance metrics that consider amplitude differences, e.g., edit distance or longest common subsequence (Bergroth et al., 2000).
Data with missing events is a general problem. Different strategies might be considered to solve this challenge, including correction and gap filling schemes (Braun et al., 2022; Facchini and Mocenni, 2011). Correction schemes are also important for data with non-stationarities (varying sparsity of events). Such correction schemes need further development to be applicable in a more general way (e.g., independent of the event distribution) and to be more computationally efficient.
Events can exhibit temporal dependencies, meaning that the likelihood of an event may depend on the occurrence of previous events. The RQA measures could be used to study temporal dependencies in event series (Banerjee et al., 2021). In general, diagonal lines in an RP represent the tendency of current neighbours in phase space to remain neighbours in the near future and, thus, represent serial dependence. The RQA measure determinism quantifies the fraction of recurrence points forming such diagonal lines and can, thus, be used as an indicator of serial dependence.
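The determinism measure can be computed directly from a recurrence matrix by collecting the lengths of all diagonal lines. The following sketch assumes the usual conventions that the main diagonal (line of identity) is excluded and that only lines of at least `l_min = 2` points count as diagonal lines; the function names are hypothetical.

```python
def diagonal_line_lengths(rp):
    """Lengths of all diagonal lines in a square recurrence matrix.

    The main diagonal (line of identity) is skipped.
    """
    n = len(rp)
    lengths = []
    for k in range(-(n - 1), n):
        if k == 0:
            continue
        run = 0
        # walk along the diagonal with offset k
        for i in range(max(0, -k), min(n, n - k)):
            if rp[i][i + k]:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths


def determinism(rp, l_min=2):
    """Fraction of recurrence points that belong to diagonal lines >= l_min."""
    lengths = diagonal_line_lengths(rp)
    total = sum(lengths)
    if total == 0:
        return float("nan")  # no recurrence points off the main diagonal
    return sum(length for length in lengths if length >= l_min) / total


# Toy recurrence matrix (hypothetical): strictly periodic recurrences, period 3
periodic = [[1 if (i - j) % 3 == 0 else 0 for j in range(9)] for i in range(9)]
print(determinism(periodic))
```

For an RP derived from event sequences, the same function applies unchanged, since it only operates on the binary matrix.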
The classification of dynamical processes from event series, based on duration, frequency of events, or their characteristics (e.g., shape), will be another interesting application, which will also involve machine learning approaches. The combination of machine learning with recurrence analysis is currently a strongly developing field with applications mainly in classification and prediction, using RPs and RQA measures as inputs to machine learning workflows (Marwan and Kraemer, in press). A typical example is to convert time series into images using the RP approach, which are then fed into the machine learning workflow for classification (Estebsari and Rajabi, 2020). RPs of event series can be used in a similar manner for such classification tasks. Other characteristics of event series (like serial dependence) would be accessible to machine learning approaches through the RQA measures (Mohebbi and Ghassemian, 2011; Malekzadeh et al., 2021; Yang et al., 2018).
The detection of interdependencies or synchronisation of (sub-)systems represented by different kinds of data (e.g., event data with continuous time series) is an important methodological challenge. New approaches based on RPs seem to be promising, including the concept of joint-RPs (Kodama et al., 2021) and the comparison of the probability of recurrences (Nkomidio et al., 2022). The advantage is that the comparison is based on the recurrence structure, which would allow comparing time series of different kinds (e.g., event series vs. continuous data). This requires further developments to match the size of event based RPs with those of the continuous data, e.g., by considering coarse-graining, interpolation, or specific window selection schemes (for event-sequence based metrics like the edit distance) (Banerjee et al., 2021).
RP based analysis can be used to infer coupling directions or even causal links between different systems (Ramos et al., 2017; Peluso et al., 2020). Thus, the next step would be to test the potential of this approach for causality testing with event data.
RPs also allow patterns or regularities to be identified in challenging data, such as event series, including the estimation of the power spectral density of event series (Kraemer et al., 2022). The most obvious way to derive a spectrum from an RP is to use the probability of recurrence after lag $\tau$, which is simply the density of recurrence points along the diagonals (with distance $\tau$ from the main diagonal). This probability of recurrence is related to the auto-correlation (Marwan and Kurths, 2002; Zbilut and Marwan, 2008). Using the edit distance measure, the temporal dependency structure within the event series can be visualised and quantified with this approach. Finally, the power spectrum can be estimated from this probability of recurrence, either by applying the Fourier transform or any other advanced decomposition (Zbilut and Marwan, 2008; Kraemer et al., 2022).
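As a minimal illustration of this procedure, the sketch below computes the probability of recurrence for each lag as the density of recurrence points along the corresponding diagonal and then applies a plain discrete Fourier transform to the mean-removed result. It is a simplification under stated assumptions: the function names are hypothetical, only the upper diagonals are used, and the plain Fourier transform stands in for the more advanced decompositions mentioned above.

```python
import cmath
import math


def recurrence_probability(rp, max_lag):
    """Density of recurrence points along the diagonal at each lag 0..max_lag."""
    n = len(rp)
    return [sum(rp[i][i + tau] for i in range(n - tau)) / (n - tau)
            for tau in range(max_lag + 1)]


def power_spectrum(series):
    """Squared magnitude of the DFT of the mean-removed series (f = 0..m/2)."""
    m = len(series)
    mean = sum(series) / m
    centred = [v - mean for v in series]
    spectrum = []
    for f in range(m // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * f * k / m)
                    for k, c in enumerate(centred))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Toy recurrence matrix (hypothetical): periodic recurrences with period 4
n = 40
rp = [[1 if (i - j) % 4 == 0 else 0 for j in range(n)] for i in range(n)]

p_tau = recurrence_probability(rp, max_lag=19)
spectrum = power_spectrum(p_tau)

# Frequencies (in cycles per len(p_tau) lags) carrying most of the power
peaks = [f for f, s in enumerate(spectrum) if s > 0.5 * max(spectrum)]
periods = [len(p_tau) / f for f in peaks]
print(peaks, periods)
```

In this toy case the spectral peaks sit at the fundamental frequency of the period-4 recurrence and at its harmonic, as expected for a spike-like probability of recurrence.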
The uncertainty of the timing of events (timing jitter) needs to be considered in the analysis, leading to new concepts such as Monte Carlo based ensemble approaches or Bayesian approaches. A recently proposed concept combines a Bayesian approach with RPs to derive an RP which explicitly represents the uncertainties of the timing of data points (Goswami et al., 2018). The resulting recurrence matrix contains the probabilities of recurrences instead of the binary information of recurrences. The recurrence quantification of such a matrix is still a subject of future research.
Although various distance measures for event based RP computation are available, the edit distance, the only one applied so far, already offers several interesting directions for future research, for example the choice of an optimal window length $T_w$ or of the cost parameters $\lambda_s$ and $\lambda_0$. Including a cost for amplitude differences requires an optimal choice of the corresponding parameters, which calls for systematic studies to provide guidance on balancing the differences in the temporal and amplitude domains.
Recurrence analysis as a concept is a rather novel approach, with many interesting and powerful developments and extensions in the last two decades (Marwan and Kraemer, in press). It is also a promising concept for studying different aspects related to (discrete) event time series, where other methods have their limits.
Figure 1. Edit distance as cost-based similarity between event sequences $S_i$ and $S_j$ from an event series (left). Events can be shifted, added or deleted, and their amplitude adjusted. All these operations have costs. The minimum cost is used as the distance (right).

Figure 2. Example of a recurrence plot using edit distance. (A) The maxima (red dots) of the x-variable of the Rössler system (Rössler, 1976) are used to mimic sparse (or extreme) events. (B) Recurrence plot calculated from the (x, y, z)-variables of the Rössler system. (C) Recurrence plot derived from the events in (A) using the edit distance as defined by Eq. (3). The periodic occurrence of the events is clearly indicated by the periodic line structures in the edit distance recurrence plot. The empty bars around t = 55 s and t = 100 s indicate the parts of the dynamics with abrupt changes where no maximum values appear. The edit distance is calculated using overlapping windows of length $T_w = 15$ s and a moving step of $s = 1$ s. The recurrence threshold $\varepsilon$ is selected to ensure a recurrence rate (recurrence point density) of 15%.