An Improved Unscented Kalman Filter Based Decoder for Cortical Brain-Machine Interfaces

Brain-machine interfaces (BMIs) seek to connect brains with machines or computers directly, for application in areas such as prosthesis control. For this application, the accuracy of the decoding of movement intentions is crucial. We aim to improve accuracy by designing a better encoding model of primary motor cortical activity during hand movements and combining this with decoder engineering refinements, resulting in a new unscented Kalman filter based decoder, UKF2, which improves upon our previous unscented Kalman filter decoder, UKF1. The new encoding model includes novel acceleration magnitude, position-velocity interaction, and target-cursor-distance features (the decoder does not require target position as input; rather, target position is decoded). We add a novel probabilistic velocity threshold to better determine the user's intent to move. We combine these improvements with several other refinements suggested by others in the field. Data from two Rhesus monkeys indicate that the UKF2 generates offline reconstructions of hand movements (mean CC 0.851) significantly more accurately than the UKF1 (0.833) and the popular position-velocity Kalman filter (0.812). The encoding model of the UKF2 could predict the instantaneous firing rate of neurons (mean CC 0.210), given kinematic variables and past spiking, better than the encoding models of these two decoders (UKF1: 0.138, p-v Kalman: 0.098). In closed-loop experiments where each monkey controlled a computer cursor with each decoder in turn, the UKF2 facilitated faster task completion (mean 1.56 s vs. 2.05 s) and a higher Fitts's law bit rate (mean 0.738 bit/s vs. 0.584 bit/s) than the UKF1. These results suggest that the modeling and decoder engineering refinements of the UKF2 improve decoding performance. We believe they can be used to enhance other decoders as well.

In the Kalman filter framework, the goal is to estimate or track some variables of interest which cannot be directly observed. These variables can, however, be observed indirectly through measurements that have a known relationship with the variables of interest. In the version presented here, time is discretized uniformly and the function describing the temporal evolution of the variables of interest is known. Both the observations and the temporal evolution are contaminated with noise, and the noise is assumed to have known covariance (across dimensions) and to be independent across time.
In neural decoding for cursor or prosthesis control, the variables of interest are kinematics and potentially other variables to decode. The observations are binned spike counts of the recorded population or some other quantification of neural activity.
Let the true state (kinematics) at time t be the m-dimensional column vector $x_t$. Let the observations (spike counts) at time t be the n-dimensional column vector $y_t$. Then we assume we have knowledge of how the states relate to the observations:

$$y_t = h(x_t) + v_t \qquad (1)$$

where $h$ is a function called the observation model, and $v_t$ is zero mean random noise, independent across time, with known covariance matrix R. In the standard way of applying the Kalman filter to neural decoding, the observation model is the neural encoding model, which generates predictions of firing rate given kinematics and other variables of interest. For the Kalman filter, $h$ is a linear function and is written as a multiplication with a weight matrix H, i.e. $h(x_t) = H x_t$. For the unscented Kalman filter, $h$ can be non-linear, though non-linear functions which change rapidly carry a higher approximation penalty. We can fit the neural encoding model, that is, fit the weights in the matrix H or the parameters of the function $h$, by using training data that consist of recordings of neural activity and concurrent kinematics.
We also assume we know how the state evolves with time:

$$x_t = f(x_{t-1}) + w_t \qquad (2)$$

where $f$ is a function called the transition model, and $w_t$ is zero mean random noise, independent across time, with known covariance matrix Q. The function $f$ is again linear for Kalman filtering, and it is written as a multiplication with a weight matrix F, i.e. $f(x_{t-1}) = F x_{t-1}$. For the UKF1 and UKF2, $f$ is also a linear function, as we do not focus on sophisticated modeling of kinematics.
The above equations describe the modeling framework for Kalman filtering. During operation, the filter does not have access to the true states, but generates estimates of them. These estimates have some uncertainty, so we represent them with a probability distribution, which we parameterize by its mean and covariance. Let us denote the (m x 1) mean vector of the estimate of the state variables at time t as $\hat{x}_t$, and the (m x m) covariance matrix of this estimate as $P_t$.
The equations implementing the Kalman filter, without control inputs, are (Welch and Bishop 2006):

$$\hat{x}_t^- = F \hat{x}_{t-1}$$
$$P_t^- = F P_{t-1} F^T + Q$$
$$K_t = P_t^- H^T \left( H P_t^- H^T + R \right)^{-1}$$
$$\hat{x}_t = \hat{x}_t^- + K_t \left( y_t - H \hat{x}_t^- \right)$$
$$P_t = (I - K_t H) P_t^-$$

where $\hat{x}_t^-$ (m x 1) and $P_t^-$ (m x m) are intermediate (a priori) variables, $K_t$ (m x n) is the Kalman gain at time t, and I is the m-dimensional identity matrix. These equations are executed once per time step (corresponding to one bin). The estimated mean state $\hat{x}_t$ is used as the decoder output at step t.
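For concreteness, one linear Kalman filter step as described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation; the function and variable names are ours.

```python
import numpy as np

def kalman_step(x_prev, P_prev, y, F, H, Q, R):
    """One Kalman filter time step (predict, then update).
    x_prev: (m,) posterior mean at t-1; P_prev: (m, m) posterior covariance;
    y: (n,) observed spike counts at time t."""
    # Predict: propagate the estimate through the linear transition model.
    x_pred = F @ x_prev                      # a priori mean
    P_pred = F @ P_prev @ F.T + Q            # a priori covariance
    # Update: fold the observation in via the Kalman gain.
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain (m x n)
    x_new = x_pred + K @ (y - H @ x_pred)    # posterior mean (decoder output)
    P_new = (np.eye(len(x_prev)) - K @ H) @ P_pred
    return x_new, P_new
```

With an observation noise covariance R much smaller than the state uncertainty, the posterior mean closely tracks the observation, as expected from the gain formula.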
The equations implementing the UKF1 and UKF2 start the same as the Kalman filter:

$$\hat{x}_t^- = F \hat{x}_{t-1}$$
$$P_t^- = F P_{t-1} F^T + Q$$

Then the sigma points $\chi_t^{(0)}, \ldots, \chi_t^{(2m)}$, a set of deterministic sampling points, are generated from the distribution parameterized by $\hat{x}_t^-$ and $P_t^-$. The purpose of the sigma points is this: a key step in Kalman filtering is the computation of the expected observations given the current estimate of the state. The filter needs both the mean and the covariance matrix of the expected observations. In the Kalman filter equations above, $H \hat{x}_t^-$ is the mean and $H P_t^- H^T + R$ is the covariance. These calculations are closed-form because the observation model is linear. When the observation model is non-linear, there is no general closed-form method for arbitrary non-linear functions. One way to approximate the mean and covariance of the expected observations is via sampling. The unscented transform is a deterministic sampling procedure that does this, and it is the heart of the unscented Kalman filter. The procedure first computes a set of sigma points, then passes these sigma points through the non-linear observation function, and finally computes the mean and covariance from the resulting transformed points.
The sigma points are:

$$\chi_t^{(0)} = \hat{x}_t^-$$
$$\chi_t^{(i)} = \hat{x}_t^- + \left( \sqrt{(m + \kappa) P_t^-} \right)_i, \quad i = 1, \ldots, m$$
$$\chi_t^{(i+m)} = \hat{x}_t^- - \left( \sqrt{(m + \kappa) P_t^-} \right)_i, \quad i = 1, \ldots, m$$

where $\chi_t^{(0)}, \ldots, \chi_t^{(2m)}$ are a total of 2m+1 column vectors of length m, $\kappa$ is a scaling parameter, the subscript outside the parentheses indicates which row should be taken from the matrix inside the parentheses, and the square root is the matrix square root. The matrix square root is computed using the Cholesky decomposition for robustness.
Next, the sigma points are evaluated through the observation model:

$$\gamma_t^{(i)} = h\left( \chi_t^{(i)} \right), \quad i = 0, \ldots, 2m \qquad (13)$$

where $\gamma_t^{(0)}, \ldots, \gamma_t^{(2m)}$ are 2m+1 column vectors of length n. Evaluating the observation model, or neural encoding model, consists of the following steps. (1) Generate features, including the non-linear terms in the neural encoding model. For the UKF2, augment the kinematic features with the spike counts from the previous time step (we include these when referring to features in the following). (2) Subtract from each feature the mean calculated for that feature from the training data. (3) Divide each feature by the standard deviation calculated for that feature from the training data. (4) Multiply the features by the coefficients fitted from the training data. (5) Add the products together to arrive at a (centered and normalized) spike count prediction for each unit. The multiply and add in steps 4 and 5 are implemented using matrix multiplication. The predicted spike count for each unit is an element of the column vector $\gamma_t^{(i)}$. Repeat the above steps for each $\chi_t^{(i)}$ to get the set of 2m+1 vectors $\gamma_t^{(i)}$. Then we find the mean of the $\gamma_t^{(i)}$, the covariance of the $\gamma_t^{(i)}$, the cross-covariance between the $\chi_t^{(i)}$ and the $\gamma_t^{(i)}$, and the Kalman gain:

$$\bar{y}_t = \sum_{i=0}^{2m} w^{(i)} \gamma_t^{(i)}$$
$$P_{yy} = \sum_{i=0}^{2m} w^{(i)} \left( \gamma_t^{(i)} - \bar{y}_t \right) \left( \gamma_t^{(i)} - \bar{y}_t \right)^T + R$$
$$P_{xy} = \sum_{i=0}^{2m} w^{(i)} \left( \chi_t^{(i)} - \hat{x}_t^- \right) \left( \gamma_t^{(i)} - \bar{y}_t \right)^T$$
$$K_t = P_{xy} P_{yy}^{-1}$$
$$\hat{x}_t = \hat{x}_t^- + K_t \left( y_t - \bar{y}_t \right)$$
$$P_t = P_t^- - K_t P_{yy} K_t^T$$

where the $w^{(i)}$ are the fixed weights of the unscented transform, chosen so that the weighted sample mean and covariance of the sigma points match $\hat{x}_t^-$ and $P_t^-$. The above equations execute one time step of the UKF1 or UKF2.
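The unscented measurement update can be sketched as follows. This is a minimal illustration using the standard Julier-Uhlmann sigma points and weights; the exact scaling constants used by the UKF1/UKF2 may differ, and the function and variable names are ours.

```python
import numpy as np

def ukf_update(x_pred, P_pred, y, h, R, kappa=1.0):
    """Unscented-transform measurement update (a sketch).
    h: observation (encoding) model mapping an (m,) state to an (n,)
    predicted spike-count vector; kappa: sigma-point scaling parameter."""
    m = len(x_pred)
    # Sigma points: the mean, plus/minus columns of a scaled Cholesky factor.
    S = np.linalg.cholesky((m + kappa) * P_pred)            # matrix square root
    sigma = np.vstack([x_pred, x_pred + S.T, x_pred - S.T]) # (2m+1, m)
    w = np.full(2 * m + 1, 1.0 / (2 * (m + kappa)))         # sigma-point weights
    w[0] = kappa / (m + kappa)
    # Pass every sigma point through the (possibly non-linear) encoding model.
    gamma = np.array([h(s) for s in sigma])                 # (2m+1, n)
    y_mean = w @ gamma                                      # predicted obs. mean
    dg, dx = gamma - y_mean, sigma - x_pred
    P_yy = dg.T @ (dg * w[:, None]) + R                     # obs. covariance
    P_xy = dx.T @ (dg * w[:, None])                         # cross-covariance
    K = P_xy @ np.linalg.inv(P_yy)                          # Kalman gain
    x_new = x_pred + K @ (y - y_mean)                       # posterior mean
    P_new = P_pred - K @ P_yy @ K.T                         # posterior covariance
    return x_new, P_new
```

A useful sanity check on this construction: when `h` is linear, the unscented update reproduces the standard Kalman filter update exactly.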
During offline reconstructions, we use the position and velocity outputs of the tap with zero temporal offset, i.e. at time t, we use the position and velocity estimates corresponding to time t. During closed-loop control with the UKF1, we use the position output with zero temporal offset. During closed-loop control with the UKF2, we use the position and velocity outputs of the tap with a temporal offset of 2, i.e. at time t, we use the position and velocity estimates corresponding to time t - 2, as input to the position-velocity mixture scheme.

Ridge Regression Parameter Selection
The best ridge regression parameter per neuron (unit) was found by search. We set an upper limit (16, arbitrary units) to prevent under-fitting and to limit search time. The search optimized the predicted residual sum of squares (PRESS) statistic. PRESS is the leave-one-out cross-validation error of a linear regression, which can be calculated quickly from the hat matrix (Tarpey 2000):

$$\mathrm{PRESS} = \sum_{i=1}^{N} \left( \frac{e_i}{1 - h_{ii}} \right)^2 \qquad (20)$$

where $e_i$ is the residual for data point i, N is the number of data points, and $h_{ii}$ is the ith diagonal entry of the hat matrix of the regression. The hat matrix for the ridge regression is $X \left( X^T X + \lambda I \right)^{-1} X^T$, where X is the design matrix, $\lambda$ is the ridge regression parameter, and I is the identity matrix. Once the regularization parameter was decided, the entire training data set was used for a final fit.
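The PRESS computation via the hat matrix can be sketched as below. This is a minimal illustration of the formula, not the authors' code; for ridge regression the shortcut $e_i / (1 - h_{ii})$ is exactly the leave-one-out residual, which the test verifies by brute force.

```python
import numpy as np

def press(X, y, lam):
    """PRESS (leave-one-out error) of ridge regression via the hat matrix.
    X: (N, p) design matrix; y: (N,) targets; lam: ridge parameter."""
    N, p = X.shape
    # Hat matrix of the ridge regression: X (X'X + lam I)^{-1} X'.
    Hhat = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    e = y - Hhat @ y                       # in-sample residuals
    return np.sum((e / (1.0 - np.diag(Hhat))) ** 2)

def best_lambda(X, y, grid):
    """Pick the ridge parameter minimizing PRESS over a candidate grid."""
    return min(grid, key=lambda lam: press(X, y, lam))
```

In practice, forming the full hat matrix is only needed for its diagonal, so larger problems would compute the diagonal directly; the version above keeps the correspondence with equation (20) obvious.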

Model Fitting Details
In closed-loop experiments, we pre-processed the hand movement training data for the UKF2 by applying temporal smoothing. Smoothing was performed by averaging in a moving window of width 5 (bins) centered on the output bin. Smoothing was applied to the position data before taking the gradient to compute the velocity data, and to the velocity data before taking the gradient to compute the acceleration data. For the Kalman filter during closed-loop experiments, instead of smoothing, the training data were adjusted using intention estimation (Gilja et al. 2012; Fan et al. 2014). For the UKF1, no smoothing or intention estimation was used for closed-loop experiments, as in Li et al. (2009).
For offline reconstructions and encoding model predictions, training and testing data hand positions were smoothed (before computing velocity) using a moving window of width 11. This was performed for all decoders.
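The smooth-then-differentiate pipeline described above can be sketched as follows, a minimal illustration under the assumption that edge bins average over a shrunken window (the paper does not specify its edge handling; function names are ours).

```python
import numpy as np

def smooth(x, width):
    """Centered moving average over `width` bins; edge bins use a shrunken
    window (an assumption -- the paper's edge handling is unspecified)."""
    kernel = np.ones(width)
    num = np.convolve(x, kernel, mode="same")
    den = np.convolve(np.ones_like(x), kernel, mode="same")
    return num / den

def kinematics_from_position(pos, width, dt):
    """Smooth positions, differentiate to velocity, smooth the velocity,
    differentiate again to acceleration -- the order described in the text."""
    p = smooth(pos, width)
    v = np.gradient(p, dt)                  # velocity from smoothed position
    a = np.gradient(smooth(v, width), dt)   # acceleration from smoothed velocity
    return p, v, a
```

Here `width=5` would correspond to the closed-loop pre-processing and `width=11` to the offline analyses; `dt` is the bin duration in seconds.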
For all decoders, we transformed both the kinematic and spike count data to zero mean and unit variance. The shifting and scaling parameters of this transformation were computed from the training data only. Additionally, for the unscented Kalman filters, the non-linear features computed from the state variables were also transformed to zero mean and unit variance. This was performed as a step after feature generation, both during parameter fitting and during filter operation (which used the training data's statistics for shifting and scaling). By making the inputs zero mean, we avoided the need for a bias term in our regressions. By making all features unit variance, we mitigated unequal treatment of features due to scaling differences when using ridge regression.
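This train-statistics-only normalization can be sketched as below (a minimal illustration; the class and method names are ours, and the zero-variance guard is our own defensive assumption).

```python
import numpy as np

class Standardizer:
    """Zero-mean/unit-variance transform whose parameters come from the
    training data only, then get reused on new data during operation."""
    def fit(self, X):
        # X: (samples, features). Statistics are computed per feature.
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0)
        self.std[self.std == 0] = 1.0   # guard constant features (assumption)
        return self

    def transform(self, X):
        # Applies the *training* statistics, whatever X is (train or live data).
        return (X - self.mean) / self.std
```

During filter operation, each feature vector computed from a sigma point would be passed through `transform` before the fitted coefficients are applied, matching steps 2 and 3 of the encoding model evaluation.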
After fitting the observation model (encoding model) coefficients using ridge regression, we found the observation model noise covariance matrix R by dividing the matrix of summed squared residuals by the degrees of freedom (the number of data points minus the number of parameters).

FIT Kalman Filter
In our online comparisons, the Kalman filter used the position-as-feedback refinement of Gilja et al. (2012). The refinement is implemented by: (1) setting the a priori estimate error covariance values ($P_t^-$) for position to zero before the calculation of the Kalman gain; (2) setting the state transition model noise covariance (Q) entries for position to zero, except for the diagonal entries, which receive a small value ($10^{-4}$) for numerical stability. These two changes cause the Kalman gain to be calculated under the assumption that there is extremely small uncertainty in the position. This means the filter attributes the innovation (actual spike counts minus predicted spike counts) to errors in velocity only, and thus updates the velocity only. We did not use a precisely zero value for the positional uncertainty, to avoid singular matrix inversions. This refinement was used only for the FIT Kalman filter and only during closed-loop neural control of the cursor.
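The two covariance modifications can be sketched as below. This is a minimal illustration of the refinement as described, not the authors' code; `pos_idx` names the position entries of the state vector, and the gain itself would be computed by the usual filter equations afterwards.

```python
import numpy as np

def apply_position_feedback(P_pred, Q, pos_idx, eps=1e-4):
    """Position-as-feedback refinement (after Gilja et al. 2012), a sketch.
    Zeros the a priori error covariance rows/columns for position, and pins
    the transition-noise entries for position to a small diagonal value, so
    the subsequent Kalman gain attributes the innovation to velocity only."""
    P = P_pred.copy()
    Qm = Q.copy()
    P[pos_idx, :] = 0.0
    P[:, pos_idx] = 0.0
    Qm[pos_idx, :] = 0.0
    Qm[:, pos_idx] = 0.0
    Qm[pos_idx, pos_idx] = eps   # small diagonal value for numerical stability
    return P, Qm
```

With the position rows of $P_t^-$ zeroed, the corresponding rows of the Kalman gain are exactly zero, so the measurement update leaves the position untouched.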
Intention estimation is implemented as a pre-processing step before parameter fitting. The velocity vectors in the data for fitting the Kalman filter were rotated so that: (1) they pointed towards the target, when the cursor's center was outside the target; (2) they were zero, when the cursor's center was inside the target. This refinement was used in the FIT Kalman filter in closed-loop control as well as in encoding model analysis, where it was applied to both training and testing data for the Kalman Intention Estimation condition (only).
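A sketch of this velocity re-aiming rule, as a minimal implementation of the two cases described above (our own function and argument names; 2-D vectors assumed):

```python
import numpy as np

def reaim_velocity(vel, cursor, target, target_radius):
    """Intention estimation pre-processing (after Gilja et al. 2012).
    Keeps each training velocity's speed but re-aims it at the target;
    zeroes it when the cursor's center is inside the target."""
    offset = target - cursor
    dist = np.linalg.norm(offset)
    if dist <= target_radius:            # cursor center inside the target
        return np.zeros_like(vel)
    speed = np.linalg.norm(vel)          # preserve the recorded speed
    return speed * offset / dist         # same speed, pointed at the target
```

Applied to every time bin of the fitting data, this replaces the recorded velocities before the Kalman filter parameters are fit.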
During closed-loop decoding, the position variable in the Kalman filter state integrates the previous time step's velocity estimates via the transition model. To include the most recent velocity estimate in the position output, the most recent decoded velocity is added to the most recent decoded position estimate to obtain the final positional output of the FIT Kalman filter. The Kalman filter parameters, both in offline reconstructions and in closed-loop experiments, are fit on neural and hand movement data with zero temporal offset; as in Gilja et al. (2012), we did not search for the temporal offset with the strongest encoding.

Behavioral Tasks
We used two behavioral tasks. The first was the center-out cursor movement task (Figure 2B). Each trial consisted of the appearance of a circular target, a reach toward the target, and a hold period. Successful holds were rewarded with a small amount of water. Each trial had either a peripheral or a central target, and successive trials, if successfully completed, alternated between central and peripheral targets, forming center-out-and-back movements. The central target was always at the screen center (position (0, 0)). Peripheral targets could appear at any location on a ring of fixed distance around the screen center, with the exact location drawn from a uniform distribution on angle. Targets were 5 cm in diameter (screen size) and the cursor was visually 2 cm in diameter, though it was treated as a point when deciding whether it was inside the target (i.e. logically 0 cm in diameter).
The distance from screen center to the center of peripheral targets was 10 cm during control of the cursor by hand movements and 8 cm during control of the cursor by neural activity. This was to increase the chances that hand movements in the training data covered cursor locations encountered during neural control, to make training data as representative as possible.
The hold time was fixed at 500 ms for all sessions. The inter-trial interval was zero after successful trials; a 200 ms inter-trial interval was used to penalize failed trials. A trial was considered a failure if the target was not entered within 5000 ms of its appearance. If the target was entered but the hold time was insufficient, the failure timer was restarted from zero the moment the cursor exited the target; thus it was possible for successful trials to be longer than 5000 ms.
Monkeys also performed a moving-target pursuit task with the joystick, for offline analysis and reconstructions only. In this task, the 6 cm diameter target moved continuously, following either a Lissajous curve (Figure 2C) or a smoothed point-to-point trajectory (Figure 2D). The monkey had to keep the cursor within the moving target to receive periodic water rewards. The Lissajous curve was described by the following equations:

$$x_t = A \sin(2 \pi f_x t + \phi), \qquad y_t = A \sin(2 \pi f_y t)$$

where $x_t$ and $y_t$ are the target x and y axis positions, respectively, in screen coordinates at time t (in seconds), and the fixed parameters were the frequencies $f_x$ and $f_y$ (in Hz), the phase $\phi$, and the amplitude A (in cm). The temporal frequency was different for the x and y axes, making the values of the two coordinates uncorrelated. Our previous study showed that the UKF1's movement model alone could not reconstruct the Lissajous curve (Li et al. 2009), so even though the curve is deterministic, neural decoding was required to trace it with a cursor under UKF1 control.
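Generating such a target trajectory is straightforward; the sketch below uses illustrative placeholder values for the amplitude, frequencies, and phase, not the values used in the experiments.

```python
import numpy as np

def lissajous(t, amp, fx, fy, phase):
    """Lissajous target trajectory: sinusoids of different temporal
    frequency on the x and y axes, which decorrelates the coordinates.
    t: array of times in seconds; amp in cm; fx, fy in Hz."""
    x = amp * np.sin(2 * np.pi * fx * t + phase)
    y = amp * np.sin(2 * np.pi * fy * t)
    return x, y
```

Sampling the curve at the decoder's bin rate yields the per-bin target positions used for offline analysis.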
For the point-to-point trajectory, the target moved smoothly between waypoints drawn uniformly from the screen workspace. The waypoints were visible to the monkey, so that it could plan movements. The target's acceleration and velocity were limited to ≤ 3 cm/s² and ≤ 6 cm/s (screen units), respectively.