Real-time changepoint detection for noisy biological signals

Background: Experimentally and clinically collected time series data are often contaminated with significant confounding noise, creating short, non-stationary time series. This noise, due to natural variability and measurement error, poses a challenge to conventional changepoint detection methods. Results: We propose a novel, real-time changepoint detection method for effectively extracting important time points in non-stationary, noisy time series. We validated our method with three simulated time series, a widely used benchmark data set, two geological time series, and two physiological data sets, and compared it to three existing methods. Our method demonstrated significantly improved performance over existing methods in several cases. Conclusion: Our method is able to effectively discern meaningful changes from system noise and natural fluctuations, accurately detecting changepoints. The ability of the method to extract meaningful changepoints with minimal user interaction opens new possibilities in clinical monitoring dealing with Big Data.


Bayesian online change point detection
We begin with a review of the Bayesian online changepoint detection (BOCPD) algorithm [17]. BOCPD combines a predictive model of future observations of the time series; an integer quantity r_t, known as the run length, i.e. the time since the last changepoint; and a hazard function p(r_t | r_{t-1}), which gives the probability of a changepoint occurring given the time since the last changepoint. Bayes' rule is used to compute the posterior distribution of the run length as

p(r_t | y_{1:t}) = p(r_t, y_{1:t}) / p(y_{1:t}),

where y_{1:t} is the vector of past observations of the time series, p(r_t, y_{1:t}) is the joint likelihood of the run length and the observations, calculated at each step, and p(y_{1:t}) is the marginal likelihood of the observations. The joint distribution p(r_t, y_{1:t}) is updated with each new observation using a message passing algorithm,

p(r_t, y_{1:t}) = Σ_{r_{t-1}} p(r_t | r_{t-1}) p(y_t | r_{t-1}, y^{(r)}) p(r_{t-1}, y_{1:t-1}),

where p(r_t | r_{t-1}) is the hazard function, p(y_t | r_{t-1}, y^{(r)}) is the predictive model given the observations y^{(r)} since the last changepoint, and p(r_{t-1}, y_{1:t-1}) is the result of the previous iteration of the algorithm.

Gaussian processes
We implemented a recently developed predictive model using Gaussian processes (GP) [18]. A GP is a Gaussian distribution over functions; that is, the values of the function at any finite set of inputs follow a multivariate Gaussian distribution [26].
GPs are flexible and expressive priors over functions, allowing patterns and features to be learned from observed data. A GP is completely specified by a mean function, µ(x), and a positive definite covariance function, or kernel, k(x, x'), where x and x' are different inputs. The covariance function determines properties of the functions drawn from the GP, such as smoothness and shape. In our work, we make use of the rational quadratic covariance function,

k(x, x') = (1 + (x − x')² / (2αℓ²))^(−α),

where α is a shape parameter, and ℓ is the input scale, or characteristic distance between inputs x and x', which determines the smoothness of the functions. The observed inputs are collected into a matrix X, and the covariance function is used to generate a covariance matrix K. Using the regression model y = f(x) + ε_t, where ε_t ∼ N(0, σ_t²) and f is generated by a GP with mean function µ and covariance function k, f ∼ GP(µ, k), the distribution specified over functions is the multivariate Gaussian y ∼ N(µ(X), K + σ_t² I). To predict future observations of the time series, we appeal to Bayes' rule, using the GP as a prior over functions. By the rules for conditioning Gaussian distributions [26], the predictive distribution for the next function value f* given input x* is

p(f* | x*, X, y) = N(f* | µ*, σ*²),
µ* = K* K⁻¹ y,
σ*² = K** − K* K⁻¹ K*ᵀ,

where K* is the covariance between the new input x* and the past inputs X, k(x*, X), and K** is the covariance between the new inputs, k(x*, x*).
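These predictive equations can be sketched as follows; the hyperparameter values (α, ℓ, noise level) are illustrative placeholders, not the trained values used in the paper:

```python
import numpy as np

def rq_kernel(x1, x2, alpha=2.0, ell=1.0):
    """Rational quadratic covariance k(x, x') = (1 + (x - x')^2 / (2 alpha ell^2))^(-alpha)."""
    d2 = (np.asarray(x1)[:, None] - np.asarray(x2)[None, :]) ** 2
    return (1.0 + d2 / (2.0 * alpha * ell ** 2)) ** (-alpha)

def gp_predict(X, y, x_star, noise=0.1, alpha=2.0, ell=1.0):
    """Predictive mean mu* = K* K^-1 y and variance sigma*^2 = K** - K* K^-1 K*^T
    for a zero-mean GP regression model y = f(x) + eps, eps ~ N(0, noise^2)."""
    K = rq_kernel(X, X, alpha, ell) + noise ** 2 * np.eye(len(X))
    K_star = rq_kernel(np.atleast_1d(x_star), X, alpha, ell)
    K_ss = rq_kernel(np.atleast_1d(x_star), np.atleast_1d(x_star), alpha, ell)
    L = np.linalg.cholesky(K)                        # stable inversion of K
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))  # a = K^-1 y
    v = np.linalg.solve(L, K_star.T)
    mu_star = K_star @ a
    var_star = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mu_star, var_star
```

The Cholesky factorization avoids forming K⁻¹ explicitly, which is the standard numerically stable way to evaluate these expressions.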

Simulation
To test the efficacy of the Delta point method, we produced 1000 simulations of three different time series, each 500 data points in length. Each time series was designed to simulate changepoints that may be seen in real-world settings, and to contain a specific changepoint of greater interest than the others. Series 1 has two changepoints, with the changepoint of interest occurring at t = 150. This series simulated the change from a scaled random walk to an autoregressive model, and then back to a scaled random walk, driven by Gaussian white noise ε_t ∼ N(0, 1); this is a relatively subtle changepoint to detect, and was inspired by [16]. Series 2 is likewise driven by Gaussian white noise ε_t ∼ N(0, 1). Series 3, also with Gaussian white noise ε_t ∼ N(0, 1), simulated the accumulation of some product in a system through the introduction of a trend; this is a difficult changepoint to detect, as the noise added to the data obscured the introduction of the trend.
The BOCPD method learned parameter values through training, so we only list the values used to initiate the method. For Series 1, Series 2, and Series 3, we used a training set of 200 data points, taken from the beginning of the time series. The Gaussian process model used a non-biased parameter initialization, with an assumed standard normal distribution prior for all three series. The hazard rate parameter used for initial training for each time series was θ_h = −3.982. The Delta point interval length for each time series was set at 40 for training, to protect against BOCPD declaring too many erroneous changepoints. The techniques TY, LS, and L all require a threshold value above which a changepoint will be declared. We performed cross-validation of several threshold values for each method, choosing the value for each time series that allowed the most accurate detection of the changepoint of interest. We select the changepoint declared by each method that is closest to the significant changepoint described above.
The results of each method are displayed in Table 1. For Series 1, the Delta point method significantly (p < 0.001) outperformed methods TY and LS in the mean absolute difference (abs. diff.) of detection time, and had a significantly lower MSE. This difference in performance was confirmed by a two-sided t-test with the null hypothesis that the other methods' mean abs. diff. does not differ from that of the Delta point method. In our simulations, method L had a slightly smaller mean abs. diff. (8.953) than the Delta point method (9.718); however, its distribution of declared changepoints had a larger standard deviation (12.783 compared to 9.881), and the Delta point method had a lower MSE. The two-sided t-test confirmed that the two methods had statistically indistinguishable performance (p = 0.135). For Series 2, the Delta point method significantly (p < 0.001) outperformed all methods. For Series 3, method TY performed the best, with the lowest mean absolute difference.
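The significance comparison above can be reproduced in outline with a two-sided t-test; the arrays below are synthetic stand-ins drawn to match the reported means and standard deviations, not the actual simulation output:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
# Synthetic absolute detection differences over 1000 runs, matched to the
# reported summary statistics (illustrative only).
delta_point = rng.normal(9.718, 9.881, 1000)
method_l = rng.normal(8.953, 12.783, 1000)

# Two-sided Welch t-test; H0: the two mean abs. diffs. are equal.
t_stat, p_value = ttest_ind(delta_point, method_l, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Welch's variant (`equal_var=False`) is the safer choice here, since the two methods' detection-time distributions have clearly different standard deviations.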

Donoho-Johnstone Benchmark
To further analyze the performance of the Delta point method, we tested it and the existing methods on the Donoho-Johnstone Benchmark non-stationary time series [19]. The Donoho-Johnstone Benchmark is a classic collection of four non-stationary time series designed as a test for neural network curve fitting. The curves are known as the Block, Bump, Doppler, and Heavisine, and are each 2048 data points in length. We adapted the curves by introducing noise to test changepoint detection. As the Delta point method is designed for piecewise locally stationary time series rather than time-varying periodic data, we did not test with the Doppler and Heavisine curves.
For training for the Delta point method, we used a standard normal distribution prior for the Gaussian process, and hazard rate parameter θ_h = −3.982. We set the Delta point interval to 10 for the Bump curve due to its high variability, and 50 for the Block curve, as this curve had more well-defined changes. The training set consisted of the first 800 data points of each curve, corresponding to the rule of thumb of using the first 35% to 40% of a time series for training [18].
The results of the methods for the Bump and Block curves are displayed in Table 2. The Delta point method performed very well in these cases, declaring the changepoints of interest.

Well-log and Nile recordings
The well-log data set consists of 4050 nuclear magnetic resonance measurements obtained during the drilling of a well [20]. These data are used to deduce the geophysical structure of the rock surrounding the well. Variations in the mean reflect differences in stratification of the Earth's crust [17]. The well-log data set is a well-studied time series for changepoint detection [20,17,18]. We selected the largest jump in the mean of the time series as the most significant changepoint (i = 1070).
The Nile river time series consists of a record of the lowest annual water levels between 622 and 1284 CE, recorded on the Island of Roda, near Cairo, Egypt [21]. The Nile river data set has been used extensively in changepoint detection [18], making it an effective benchmark for the Delta point method. Geophysical records suggest the installation of the Nilometer in 715 CE, a primitive device for more accurate water level measurements. As such, we selected this as the changepoint of significance.
For training for the Delta point method for both time series, we used a standard normal distribution prior for the Gaussian process, and hazard rate parameter θ_h = −3.982. For the well-log data, the Delta point interval was chosen as 30 due to the length of the time series and sensor noise; for the Nile river time series, the Delta point interval was chosen to be 50, as the curve is smoother. The training set consisted of the first 1000 data points for the well-log series, and the first 250 data points for the Nile river set.
The results of each method are displayed in Table 3. The Delta point method performed better than the other methods TY, LS, and L for the well-log data set.

ECG recordings
To determine the effectiveness of the Delta point method in detecting a significant changepoint in short time series, we tested each method's accuracy in detecting the QRS complex, measuring the difference between the labelled beginning of the complex and the detected time.
As each time series is short, and the QRS complex rapidly begins and ends in the recording, accurate detection of the changepoint was considered very important.
For training for the Delta point method, we used a standard normal distribution prior for the Gaussian process, and hazard rate parameter θ_h = −3.982. Due to the short nature of these time series, the training set was selected to be the first 30 data points; the training set never included the QRS complex in any of the 100 instances. The Delta point interval was set to 5: the QRS complex is very short, and the time series changes rapidly when it occurs, so a shorter interval performed best.
The performance of each method is displayed in Table 4 and Table 5.
We also computed Bland-Altman plots for the experimental time series to compare the Delta point method to each other method. In Figure 7(A), we display the Bland-Altman plot comparing the Delta point method and method TY, with mean difference 6.93 ± 89.03.
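The quantities behind such a plot, the mean difference (bias) and the 95% limits of agreement, can be computed as below; the function and variable names are our own, not from the paper's code:

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman statistics for two paired measurement series:
    per-pair means, per-pair differences, the mean difference (bias),
    and the 95% limits of agreement (bias +/- 1.96 * sd of differences)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    mean = (a + b) / 2.0          # x-axis of the Bland-Altman plot
    diff = a - b                  # y-axis of the Bland-Altman plot
    bias = diff.mean()
    sd = diff.std(ddof=1)         # sample standard deviation
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return mean, diff, bias, limits
```

Plotting `diff` against `mean` with horizontal lines at the bias and the two limits reproduces the standard Bland-Altman layout.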

Discussion
The detection of changepoints in a non-stationary time series is a well studied problem, which has produced many techniques [14,8,15,17,16]. Some of the first work in changepoint detection is due to [14], where changes were detected by comparing probability distributions of time series samples over past and present intervals. This work was extended by Takeuchi and Yamanishi (method TY) [15], who proposed a scoring procedure, along with outlier detection, to compare past and present probability distributions in real time. The scoring method is based on an autoregressive model.

Although method LS had a lower MSE for this series, its mean detection difference is closer to 0. In Series 3, method L performed the best, with the smallest mean absolute difference in detection time and MSE. Series 3 consisted of the introduction of a linear trend to the autoregressive moving average model, where the introduction of the trend was obscured by added noise. Since method L compares density ratios of the time series, its good performance on this series is likely due to noticing these changing ratios before the other methods noticed the trend.
For the clinical ECG data set, ECGFiveDays, the Delta point method performs significantly better than methods TY and LS, however has an indistinguishable performance difference with method L, although the Delta point method has the lowest MSE. Due to the rapidly varying nature of the time series when the QRS complex begins, the ability of method L to compare density ratios between components of the time series is beneficial and improves its performance compared to other methods.
With regards to the fetal sheep experimental data set, the early detection of acidemia is better than late detection from a clinical perspective. Hence, we defined the clinical ROI according to expert physician input. The 20 minute window before the expert-labelled sentinel point provides adequate warning to clinicians to increase monitoring, or expedite delivery, while the 3 minute window posterior to the expert-labelled sentinel point is sufficiently close to be included in the experimental procedure. In clinical settings, we believe that earlier detection is better, as it provides longer decision making time, and justification for increased monitoring.
The novelty of the current work is that our method permits statistical-level predictions about concomitant changes in individual bivariate time series, simulated or physiological, such as the HRV measure RMSSD and ABP in an animal model of human labour. Our method is able to predict cardiovascular decompensation by identifying ABP responses to UCO, a sensitive measure of acidosis. These predictions are reliable even when the signals are noisy: to mimic the online recording situation, no artefact correction for RMSSD was undertaken here, as is usually done for offline HRV processing.
Tables

Table 1. Simulation.

Table 3. Comparison of the detected changepoint of importance in nuclear magnetic resonance measurements from a rock drill, used to detect changes in rock stratification, and in the lowest annual water levels of the Nile River from 622-1284 CE. The changepoint of importance is selected as the first significant jump in the mean, indicating the presence of a change in the ground rock, and the installation of the Nilometer, respectively.