Single and Multiple Change Point Detection in Spike Trains: Comparison of Different CUSUM Methods

In a natural environment, sensory systems are faced with ever-changing stimuli that can occur, disappear, or change their properties at any time. For the animal to react adequately, its sensory systems must be able to detect changes in external stimuli based on the neuronal responses. Since the nervous system has no prior knowledge of the stimulus timing, changes in the stimulus need to be inferred from changes in neuronal activity, in particular increases or decreases of the spike rate, changes in its variability, and shifted response latencies. From a mathematical point of view, this problem can be rephrased as detecting changes in the statistical properties of a time series. In neuroscience, the CUSUM (cumulative sum) method has been applied to recorded neuronal responses for detecting a single stimulus change. Here, we investigate the applicability of the CUSUM approach for detecting single as well as multiple stimulus changes that induce increases or decreases in neuronal activity. Like the nervous system, our algorithm relies exclusively on previous neuronal population activity, without using knowledge about the timing or number of external stimulus changes. We apply our change point detection methods to experimental data obtained by multi-electrode recordings from turtle retinal ganglion cells, which react to changes in light stimulation with a range of typical neuronal activity patterns. We systematically examine how variations of the mathematical assumptions (Poisson, Gaussian, and Gamma distributions) used for the algorithms may affect the detection of an unknown number of stimulus changes in our data, and we compare these CUSUM methods with the standard Rate Change method. Our results suggest which versions of the CUSUM algorithm could be useful for different types of specific data sets.


PERISTIMULUS TIME HISTOGRAM (PSTH)
For constructing a smoothed PSTH y_t, the spikes of all simultaneously recorded cells were pooled into a combined spike train. We did not pool over stimulus presentations but calculated a PSTH for every trial. For every time point t, y_t depends exclusively on the previous population activity. Let C be the number of cells and {x_1, ..., x_K} (in ms) the spike times of all cells in one trial. We used two different smoothing methods, rectangular and 'half Gaussian' smoothing. A rectangular smoothed PSTH (spikes/sec) with a bandwidth ∆ (in ms) is defined as

y_t = (1000/(C · ∆)) · #{k : t − ∆ ≤ x_k < t}.

A 'half Gaussian' smoothed PSTH (spikes/sec) with a bandwidth ∆ is defined analogously, with the rectangular window replaced by the past-facing half of a Gaussian kernel of width ∆, so that y_t again depends only on spikes before t. Since both methods yielded similar results, only the results of the rectangular smoothed PSTH are shown in this article.
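The rectangular smoothing step can be sketched as follows. The causal window reaching back ∆ ms from t and the normalization by the cell count C and the bandwidth ∆ follow the description above; the factor of 1000 (ms to sec), the function name, and the data layout are illustrative assumptions:

```python
from bisect import bisect_left

def rect_psth(spike_times_ms, n_cells, bandwidth_ms, t_grid_ms):
    """Rectangular-smoothed PSTH in spikes/sec.

    Counts pooled spikes in the causal window [t - bandwidth, t[, so that
    y_t depends only on past population activity, then normalizes by the
    number of cells and the window length (factor 1000 converts ms to sec).
    """
    spikes = sorted(spike_times_ms)
    psth = []
    for t in t_grid_ms:
        # number of spikes x with t - bandwidth <= x < t
        n = bisect_left(spikes, t) - bisect_left(spikes, t - bandwidth_ms)
        psth.append(1000.0 * n / (n_cells * bandwidth_ms))
    return psth
```

For example, three spikes from one cell inside a 10 ms window yield 3/(1 · 0.010 s) = 300 spikes/sec at that time point.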

Poisson Process
In neuroscience, a spike train is often assumed to be a Poisson process, where the occurrence of a single spike depends solely on time and is independent of other spikes (see, e.g., ?). The probability mass function of the Poisson distribution is given by

P(y = k) = (µ^k/k!) · e^(−µ),   k = 0, 1, 2, ...,

where the expected value µ equals the variance. If a spike train is regarded as a Poisson process, this implies three mathematical assumptions.
1. There is at most one spike at any time point (a neuron cannot generate two spikes at the same time).
2. The spike counts in disjoint intervals are independent of each other.
3. The spike count y_t in an interval [t, t + a] is Poisson distributed with the probability mass function

P(y_t = k) = ((µ · a)^k/k!) · e^(−µ · a),

with µ being the average spike rate.
In particular, the probability that k spikes occur in an interval [t_i, t_i + a] of length a is the same for all t_i. We assume a Poisson process for the given spike train. Spike times are converted to an unsmoothed PSTH y_t with a bin size b; the value y_t is the number of spikes in [t − b/2, t + b/2[.
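A minimal sketch of this binning step, using the bin size b and the half-open interval [t − b/2, t + b/2[ from the text (function name and data layout are illustrative):

```python
from bisect import bisect_left

def unsmoothed_psth(spike_times_ms, bin_ms, t_grid_ms):
    """Unsmoothed PSTH: y_t = number of spikes in [t - b/2, t + b/2[."""
    spikes = sorted(spike_times_ms)
    # bisect_left gives the count of spikes strictly below each bin edge,
    # so the right edge t + b/2 is excluded, matching the half-open interval
    return [bisect_left(spikes, t + bin_ms / 2.0) - bisect_left(spikes, t - bin_ms / 2.0)
            for t in t_grid_ms]
```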

Additive shift of the expected value
Under the assumption of additive shifts, the general hypothesis of Eq. (1) (see Methods Section 2.2.1) is transformed into the following hypothesis:

H_0: µ_t = µ_0   versus   H_1: µ_t = µ_0 + δ,

where µ_t is the spike rate at time point t and µ_0 the previous spike rate, which is calculated with the maximum likelihood estimator

µ_0 = (1/R) · Σ_{i=t−R}^{t−1} y_i,

where R is the length of the reference window. The logarithmic residuals s_t from Eq. (2) (Methods Section 2.2.1 and Tables 1, 3) are

s_t = y_t · ln((µ_0 + δ)/µ_0) − δ.

This leads to the recursive sum S_t with S_0 = 0 and S_t = max(0, S_{t−1} + s_t).
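As a sketch of the resulting procedure for detecting a rate increase: the Poisson log-likelihood-ratio residual and the max(0, ·) recursion are the standard CUSUM ingredients described in this section, while the alarm/restart logic, function name, and parameter handling are illustrative assumptions:

```python
import math

def poisson_cusum_additive(y, R, delta, alpha):
    """One-sided CUSUM for an additive rate increase under a Poisson model.

    Residual:  s_t = y_t * ln((mu0 + delta) / mu0) - delta
    Recursion: S_0 = 0, S_t = max(0, S_{t-1} + s_t); alarm when S_t > alpha.
    """
    S, alarms = 0.0, []
    for t in range(R, len(y)):
        mu0 = sum(y[t - R:t]) / R        # ML estimate from the reference window
        if mu0 <= 0:
            continue                      # log-ratio undefined for a zero rate
        s = y[t] * math.log((mu0 + delta) / mu0) - delta
        S = max(0.0, S + s)
        if S > alpha:
            alarms.append(t)              # change point detected
            S = 0.0                       # restart the statistic after a detection
    return alarms
```

Before a change, the residuals are negative on average and S_t hovers near zero; after an upward rate step, they turn positive and S_t drifts up until it crosses the threshold α.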

Multiplicative shift of the expected value
Under the assumption of multiplicative shifts, the general hypothesis of Eq. (1) (see Methods Section 2.2.1) is transformed into the following hypothesis:

H_0: µ_t = µ_0   versus   H_1: µ_t = δ · µ_0.

µ_0 is calculated in the same way as before (Eq. (S6)). The log-likelihood ratio at time point t is calculated as

s_t = y_t · ln(δ) − (δ − 1) · µ_0.

The recursive form of the CUSUM procedure (Eq. (2), Methods Section 2.2.1) can now be determined as S_0 = 0 and S_t = max(0, S_{t−1} + s_t).

Normal Distribution
With the assumption of normally distributed spike rates, the values y_1, y_2, ..., y_n of a PSTH are assumed to obey independent Gaussian distributions with mean µ and variance σ². The maximum likelihood estimators for µ and σ² are the empirical mean and the empirical variance (Eq. (S17)):

µ̂ = (1/n) · Σ_{i=1}^{n} y_i,   σ̂² = (1/n) · Σ_{i=1}^{n} (y_i − µ̂)².

Additive shift of the expected value
Under the assumption of additive shifts, the general hypothesis of Eq. (1) (see Methods Section 2.2.1) is transformed into the following hypothesis:

H_0: µ_t = µ_0   versus   H_1: µ_t = µ_0 + δ.

The logarithmic residuals are

s_t = (δ/σ²) · (y_t − µ_0 − δ/2).

This leads to the recursive sum S_t with S_0 = 0 and S_t = max(0, S_{t−1} + s_t).

Multiplicative shift of the expected value
Under the assumption of multiplicative shifts, the general hypothesis of Eq. (1) (see Methods Section 2.2.1) is transformed into the following hypothesis:

H_0: µ_t = µ_0   versus   H_1: µ_t = δ · µ_0.

The logarithmic residuals s_t are

s_t = ((δ − 1) · µ_0/σ²) · (y_t − (δ + 1) · µ_0/2).

The cumulative sum S_t was calculated by S_0 = 0 and S_t = max(0, S_{t−1} + s_t).

Gamma Distribution
Under the assumption of the Gamma distribution, the measured data points y_1, y_2, ..., y_n are independently Gamma distributed with shape parameter k and mean µ, with density

f(y) = (k/µ)^k · y^(k−1) · e^(−k·y/µ) / Γ(k),   y > 0.

The maximum likelihood estimator for µ is the empirical mean. Since a closed form for the maximum likelihood estimator of k does not exist, an approximation was used (?).
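One widely used closed-form approximation of the ML shape estimate (which may or may not be the approximation the authors cite) is based on the statistic s = ln(mean(y)) − mean(ln(y)):

```python
import math

def gamma_shape_approx(y):
    """Approximate ML estimate of the Gamma shape parameter k.

    Uses the common closed-form approximation
        s = ln(mean(y)) - mean(ln(y)),
        k ~ (3 - s + sqrt((s - 3)^2 + 24 s)) / (12 s),
    which is accurate for the s values arising from positive data.
    """
    n = len(y)
    mean_y = sum(y) / n
    mean_log = sum(math.log(v) for v in y) / n
    s = math.log(mean_y) - mean_log
    return (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
```

Note that s = 0 (all observations equal) makes the expression degenerate, so constant windows need special handling in practice.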

Additive shift of the expected value
Under the assumption of additive shifts, the general hypothesis of Eq. (1) (see Methods Section 2.2.1) is transformed into the following hypothesis:

H_0: µ_t = µ_0   versus   H_1: µ_t = µ_0 + δ.

The logarithmic residuals s_t are

s_t = k · [ln(µ_0/(µ_0 + δ)) + y_t · (1/µ_0 − 1/(µ_0 + δ))].

The cumulative sum S_t was calculated by S_0 = 0 and S_t = max(0, S_{t−1} + s_t).

Multiplicative shift of the expected value
Under the assumption of multiplicative shifts, the general hypothesis of Eq. (1) (see Methods Section 2.2.1) is transformed into the following hypothesis:

H_0: µ_t = µ_0   versus   H_1: µ_t = δ · µ_0.

The logarithmic residuals s_t are

s_t = k · [(y_t/µ_0) · (1 − 1/δ) − ln(δ)].

The cumulative sum S_t was calculated by S_0 = 0 and S_t = max(0, S_{t−1} + s_t).

Optimization of the thresholds α
For each tested parameter combination, the thresholds α_in and α_de were computed to maximize the value of P (see Eq. (6) in Methods Section 2.4). Initial values were set for α_in and α_de, and the values were then shifted upwards and downwards, first with a large step size (e.g., a step size of 10), until the best performance was reached. This procedure was repeated with shrinking step sizes until the best performance was reached with a step size of 0.25. The optimization was done independently for each of the 10 iterations of the stimulus protocol.
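The search over a single threshold can be sketched as a coarse-to-fine hill climb; the performance function, the starting value, and the exact step schedule from 10 down to 0.25 are placeholders following the description above:

```python
def optimize_threshold(perf, alpha0, steps=(10.0, 5.0, 2.0, 1.0, 0.5, 0.25)):
    """Coarse-to-fine 1-D search for a threshold alpha.

    For each step size (large to small), shift alpha up or down as long as
    the performance perf(alpha) improves, then move on to the next finer
    step, ending with a step of 0.25.
    """
    best_a, best_p = alpha0, perf(alpha0)
    for step in steps:
        improved = True
        while improved:
            improved = False
            for cand in (best_a - step, best_a + step):
                p = perf(cand)
                if p > best_p:
                    best_a, best_p = cand, p
                    improved = True
    return best_a, best_p
```

Because each step only accepts strict improvements, the procedure terminates for any bounded performance function.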

Optimization of the parameter combinations
The basic idea of optimizing the parameter combinations was to go from coarse to fine. First, a coarse grid of parameter combinations was tested. Then every parameter of the best combinations was varied repeatedly on a finer grid until the best performance was achieved. For the single change point case, four parameters had to be optimized (bandwidth ∆, relative shifts δ_in and δ_de, and length of the reference window R). For example, the initial set of tested bandwidths was {1, 5, 10, 30, 50, 70} ms and of reference windows R {50, 100, 150, 200} ms. If, e.g., a bandwidth of 30 ms and a reference window length of 150 ms achieved a good result, the next set of tested bandwidths was {20, 25, 30, 35, 40} ms, combined with reference windows {100, 125, 150, 175, 200} ms. Often, several parameter combinations achieved good results. Therefore, all parameter combinations that had a performance P > 1.3 or were among the best 20 were investigated further.
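The coarse-to-fine idea for several parameters can be sketched as a grid search that re-centers and refines around the best combination. The example grids below match those in the text, while the refinement rule (halving the grid spacing each round), the fixed number of rounds, and the performance function are illustrative assumptions:

```python
import itertools

def refine_grid(perf, grids, shrink=0.5, rounds=3):
    """Coarse-to-fine grid search over parameter combinations.

    Evaluates all combinations on the current grids, then re-centers each
    parameter's grid on the best value found so far with a finer spacing,
    repeating for a fixed number of refinement rounds.
    """
    best, best_p = None, float("-inf")
    for _ in range(rounds):
        for combo in itertools.product(*grids):
            p = perf(combo)
            if p > best_p:
                best, best_p = combo, p
        # build finer grids centered on the current best combination
        new_grids = []
        for values, center in zip(grids, best):
            step = (max(values) - min(values)) / max(len(values) - 1, 1) * shrink
            new_grids.append([center - step, center, center + step])
        grids = new_grids
    return best, best_p
```

In practice one would keep not just the single best combination but, as described above, all combinations above a performance cutoff for further inspection.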