Decoding Kinematic Information From Primary Motor Cortex Ensemble Activities Using a Deep Canonical Correlation Analysis

The control of arm movements through intracortical brain–machine interfaces (BMIs) mainly relies on the activities of the primary motor cortex (M1) neurons and mathematical models that decode their activities. Recent research on decoding process attempts to not only improve the performance but also simultaneously understand neural and behavioral relationships. In this study, we propose an efficient decoding algorithm using a deep canonical correlation analysis (DCCA), which maximizes correlations between canonical variables with the non-linear approximation of mappings from neuronal to canonical variables via deep learning. We investigate the effectiveness of using DCCA for finding a relationship between M1 activities and kinematic information when non-human primates performed a reaching task with one arm. Then, we examine whether using neural activity representations from DCCA improves the decoding performance through linear and non-linear decoders: a linear Kalman filter (LKF) and a long short-term memory in recurrent neural networks (LSTM-RNN). We found that neural representations of M1 activities estimated by DCCA resulted in more accurate decoding of velocity than those estimated by linear canonical correlation analysis, principal component analysis, factor analysis, and linear dynamical system. Decoding with DCCA yielded better performance than decoding the original FRs using LSTM-RNN (6.6 and 16.0% improvement on average for each velocity and position, respectively; Wilcoxon rank sum test, p < 0.05). Thus, DCCA can identify the kinematics-related canonical variables of M1 activities, thus improving the decoding performance. Our results may help advance the design of decoding models for intracortical BMIs.

The control of arm movements through intracortical brain-machine interfaces (BMIs) mainly relies on the activities of the primary motor cortex (M1) neurons and mathematical models that decode their activities. Recent research on decoding process attempts to not only improve the performance but also simultaneously understand neural and behavioral relationships. In this study, we propose an efficient decoding algorithm using a deep canonical correlation analysis (DCCA), which maximizes correlations between canonical variables with the non-linear approximation of mappings from neuronal to canonical variables via deep learning. We investigate the effectiveness of using DCCA for finding a relationship between M1 activities and kinematic information when non-human primates performed a reaching task with one arm. Then, we examine whether using neural activity representations from DCCA improves the decoding performance through linear and non-linear decoders: a linear Kalman filter (LKF) and a long short-term memory in recurrent neural networks (LSTM-RNN). We found that neural representations of M1 activities estimated by DCCA resulted in more accurate decoding of velocity than those estimated by linear canonical correlation analysis, principal component analysis, factor analysis, and linear dynamical system. Decoding with DCCA yielded better performance than decoding the original FRs using LSTM-RNN (6.6 and 16.0% improvement on average for each velocity and position, respectively; Wilcoxon rank sum test, p < 0.05). Thus, DCCA can identify the kinematics-related canonical variables of M1 activities, thus improving the decoding performance. Our results may help advance the design of decoding models for intracortical BMIs.

INTRODUCTION
The primary motor cortex (M1) is robustly linked to the kinematic parameters of the upper limbs (Humphrey, 1972;Humphrey and Corrie, 1978;Georgopoulos et al., 1982Georgopoulos et al., , 1986Sergio et al., 2005;Schwartz, 2007;Aggarwal et al., 2008;Vargas-Irwin et al., 2010). This concept provides a basis for decoding information in an intracortical brain-machine interface (BMI), which often harnesses M1 activities to infer continuous movement parameters in order to enable the neural control of external effectors. Intracortical BMIs have largely relied on functional relationships between M1 activities and kinematics (Moran and Schwartz, 1999b;Paninski et al., 2003;Wu et al., 2004;Shanechi et al., 2016;Vaskov et al., 2018). For instance, a number of BMIs have been developed based on a finding that a population vector constructed from the firing activities of a neuronal ensemble can predict the kinematic variables of arm movements, such as direction, speed, position, and velocity (Georgopoulos et al., 1986Flament and Hore, 1988;Schwartz et al., 1988;Moran and Schwartz, 1999b;Paninski et al., 2003;Wang et al., 2007). In addition to the population vector, neural representations capturing the shared variability in the population's neural activity have been demonstrated to be effective in predicting behavioral covariates Shenoy et al., 2013;Cunningham and Yu, 2014;Kao et al., 2015). These neural representations can be acquired through unsupervised learning techniques such as principal components analysis (PCA) (Ames et al., 2014;Kaufman et al., 2014), factor analysis (FA) , and a linear dynamical system (LDS) based latent-state estimation (Kao et al., 2015) and are known to allow a decoder to guarantee stable outputs Kao et al., 2013). Such neural representations of neural population activity could help enhance decoding kinematic variables. Decoding models for intracortical BMIs are broadly categorized into two categories. The first category is a generative method that operates based on the generation of neuronal firing activities from kinematic states described by encoding models. The second category is a direct method that operates based on a direct input-output function approximation from neuronal firing activities to kinematic variables (Chapin et al., 1999;Sussillo et al., 2012;Dethier et al., 2013;Ahmadi et al., 2019).
Generative decoding methods designed for BMIs include a population vector algorithm (Georgopoulos et al., 1986;Schwartz and Moran, 2000;Van Hemmen and Schwartz, 2008), a Kalman filter (KF) (Wu et al., 2004(Wu et al., , 2006Gilja et al., 2012;Golub et al., 2014), a point process-based adaptive filter (Wang et al., 2009;Shanechi et al., 2014), and a particle filter (Gao et al., 2001), to name a few. These methods infer kinematic information from observed neuronal activities via encoding models. The performance of generative decoding methods thus substantially depends on the assumptions and appropriateness of encoding models. Furthermore, direct decoding methods designed for BMIs include the Wiener filter (Chapin et al., 1999;Serruya et al., 2002;Fagg et al., 2009;Chhatbar and Francis, 2013;Willett et al., 2013), support vector regression (Kim et al., 2006;Xu et al., 2011), and artificial neural networks (ANNs) (Wessberg et al., 2000;Sanchez et al., 2003). Particularly, ANNs can serve as a direct approximator of a non-linear functional relationship between M1 activities and kinematic variables. Various types of ANNs have been suggested to decode M1 activities, including time-delay neural networks (Kim et al., 2003;Wang et al., 2005), recurrent neural network (RNN) (Sussillo et al., 2012), and echostate network (Rao et al., 2005). Furthermore, owing to recent breakthroughs in deep learning, using deep neural networks (DNNs) for decoding M1 activities has become plausible (Sussillo et al., 2012;Ahmadi et al., 2019). For instance, a long short-term memory RNN (LSTM-RNN), one of the non-linear models harnessing temporal information in past neural activities, outperformed other decoding models for BMIs (Ahmadi et al., 2019). Despite their high performance, the intricate architectures of DNNs often require a much larger training data to achieve a successful decoding process. Furthermore, recent efforts to record a larger number of neuronal activities (e.g., >1,000 units) demand effective representational spaces of neuronal ensemble activities, which will also reduce the burden of training DNNs (Marblestone et al., 2013).
Considering the advantage of DNNs as a universal nonlinear approximator, in the present study, we propose a novel approach for decoding M1 activities to estimate limb kinematics by exploring a joint representational space between M1 activities and kinematics. In this joint space, the representation variables of a neuronal ensemble and kinematic parameters are created in a way to maximize coupling between neuronal and kinematic representation variables. Among the many ways of doing so, we leveraged methods, such as a canonical correlation analysis (CCA), to maximize correlations between these variables. As one of the multivariate statistical methods, CCA maximizes correlations between joint (canonical) variables. A conventional linear canonical correlation analysis (LCCA) builds a linear mapping between a neuronal ensemble and canonical variables (Hotelling, 1936;Anderson, 1984;Friman et al., 2007). However, more informative neuronal canonical variables can be extracted from neuronal ensemble activities by using a non-linear method. A recently developed deep canonical correlation analysis (DCCA) allows us to examine this possibility by approximating a non-linear mapping from neuronal ensemble activities to canonical variables with a DNN (Andrew et al., 2013). Previous non-invasive brain-computer interface (BCI) studies showed the effectiveness of DCCA as a means of feature extraction from electroencephalogram associated with various covariates of interest, such as eye movements and visual stimulus frequencies (Vu et al., 2016;Qiu et al., 2018;Liu et al., 2019). For example, Vu et al. successfully improved the performance of the steady-state visual evoked potential-based BCI using DCCA-based feature extraction (Vu et al., 2016). Although DCCA suffers from the same difficulty in interpreting neural activities as DNNs, canonical variables estimated by DCCA may effectively represent kinematicsrelated neuronal ensemble activities. Consequently, decoding these canonical variables may achieve a similar or superior performance to decoding original firing rates (FRs) while keeping a decoding model concise.
Based on this hypothesis, the present study aims to investigate how hand velocity information is represented in canonical variables found by LCCA or DCCA and to compare those representations with other neural representations (PCA, FA, and LDS) extracted from naïve ensemble FRs (Z E-FR ). Moreover, we aim to investigate the performance of decoding hand velocity information from the five types of neuronal representations (E-FR, PCA, FA, LCCA, and DCCA) using one of the two types of decoders, i.e., LKF and LSTM-RNN. Additionally, we examine whether DCCA yields better velocity decoding performance compared to a neural dynamical filter (NDF), which is a linear mapping model to predict kinematic variables from LDS-based latent states (Kao et al., 2015). In this study, we apply various decoding methods to the data of M1 firing activity and hand movements in two non-human primates that performed a 2D reaching task with one arm.

Datasets
The two datasets used in this study are available on a neuroscience data depository, called the Collaborative Research in Computational Neuroscience (Flint et al., 2012;Lawlor et al., 2018;Perich et al., 2018). Each dataset includes the cortical firing activity and hand movement recordings, which were acquired from a non-human primate performing an arm reaching task on 2D spaces with one arm (see Figure 1A). The dataset CRT (center-out reaching task) of Flint et al. includes M1 activities for monkeys to perform a center-out reaching task to acquire eight different targets that were placed at 45 • intervals around a circle with their home placed on the center (Flint et al., 2012). The dataset SRT (sequential reaching task) of Lawlor et al. (2018) includes M1 and dorsal premotor cortical activities during a sequential reaching task, where a series of targets were randomly displayed on 2D spaces Perich et al., 2018). All cortical activities were extracellularly recorded by a 128-channel acquisition system (Cerebus, Blackrock Microsystems, Inc., Salt Lake City, UT, United States) through 96-channel silicon microelectrode arrays chronically implanted in the arm area of M1.
In this study, we analyzed only M1 activities to develop and test decoders. The spike trains of each neuron were binned with a non-overlapping window of 50 ms to maximize the mutual information between neural FRs and kinematics (Paninski et al., 2003;Suminski et al., 2010). FRs were estimated by spike counts within the bin divided by its size (i.e., 50 ms). We also square-root-transformed the FRs in each bin to make them more normally distributed for linear decoding models (Schwartz and Moran, 1999). Then, we performed a Gaussian kernel smoothing process to reduce temporal noise of individual unit activities, where the kernel standard deviation (SD) was determined according to Yu et al. (2009) (SD = 80 ms in the dataset CRT, SD = 140 ms in the dataset SRT). An instantaneous hand position was converted into the velocity and its absolute value (speed). This kinematic combination (velocity and speed) is shown to be appropriate predictors for establishing tuning models (Rasmussen et al., 2017). Using the velocity and speed, we calculated the goodness of fit (r 2 ) of a linear tuning model for each neuron, which was designed based on the cosine tuning model (Moran and Schwartz, 1999a), expressed by: z (t) = β 0 +β 1 v(t) + β 2 v (t) + (t) where z(t) is FRs, β 0 , β 1 , and β 2 are model coefficients, and v(t) and v (t) denote a vector of velocity and its norm (speed) at time t, respectively. Then, we selected the neurons with r 2 > 0.01, where the threshold of r 2 (>0.01) was empirically determined. A total of 155 and 63 neurons passed these criteria in the datasets CRT and SRT, respectively. The datasets CRT and SRT included 175 and 496 successful trials, respectively, in which animals successfully acquired the targets during the tasks. To build and validate decoders, the first 75% of the trials were used for training and the remaining 25% of the trials were used for testing: the training and testing sets of the dataset CRT contained 131 and 44 trials, respectively, and those of the dataset SRT contained 372 and 124 trials, respectively. Every parameter estimation of the models built in this study (see below) was performed using the training set only.

Linear Canonical Correlation Analysis
Canonical correlation analysis is one of the multivariate statistical methods that extracts joint canonical variables from random vectors z and x. In this study, z and x correspond to the FRs [z 1 , z 2 , . . . , z M ] T ∈ R m×1 from m neurons and the hand kinematics [x 1 , x 2 , x 3 ] T ∈ R 3×1 , where x 1 and x 2 denote the velocity of the x-and y-coordinates, respectively, and x 3 denotes the speed. An LCCA seeks linear mappings from z and x to canonical variables by maximizing correlations between canonical variables (Hotelling, 1936;Anderson, 1984;Friman et al., 2007). The canonical coefficients {α, β} on these linear mappings are defined as where ρ(·) denotes a function of the correlation between canonical variables. Z and X are the covariance matrices of centralized dataz andx, respectively, and ZX is the sample cross-covariance matrix. To make the canonical coefficients scale-free, the denominator is constrained to have unit variance, such that The singular value decomposition is used to derive α * and β * . Using these variables, the canonical variables of z and x can be estimated byû By using the eigenvectors corresponding to the largest eigenvalues, we repeatedly computed a pair of canonical variables, {û Z ,û X }, until the number of pairs equals m or 3. For convenience, we call the linear neural (û Z ) and kinematic (û X ) canonical variables Z LCV and X LCV , respectively.

Deep Canonical Correlation Analysis
A DCCA is one of the advanced CCA methods based on DNNs. DCCA finds non-linear mappings from z and x to canonical FIGURE 1 | Simulation overview for assessing the effects of DCCA on two decoders. (A) Behavioral tasks for each dataset. The left panel denotes a center-out reaching task which the monkey C performed, and the right panel is a sequential reaching task which the monkey M performed. (B) The schematic diagram depicts the DCCA between firing rates and kinematic variables. The left inputs (L-input) of the networks indicate the naïve firing rates and the right inputs (R-input) of the networks denote the kinematic variables: x-and y-velocity, and speed. A dotted-line box between the networks denotes a canonical correlation analysis (CCA) between the left-canonical variables (Z DCV ) and the right-canonical variables (X DCV ). (C) The block diagram depicts a simulation paradigm for a comparative study of decoding. (D) Prediction errors for the state dimensionalities (q) of each dataset. The filled circle denotes proper dimensionality corresponding to the minimum prediction error for each dimensionality reduction method . Each color code denotes the dimensionality reduction method.
variables through stacked non-linear transformation layers, as shown in Figure 1B (Andrew et al., 2013). The non-linear mappings f l z (z) and f l x (x) are defined as l denotes a matrix of weights at the l-th layer, u l is a vector of biases at the l-th layer, and σ(·) is a non-linear function. A parameter set θ, which includes W and b, is estimated by maximizing correlations between functional outputs as follows: To seek θ * Z and θ * X , the backpropagation algorithm is used to optimize parameters W and b based on the gradient of ρ(·). The parameters in each layer are initialized in advance through a pretraining process using a denoising autoencoder (Vincent et al., 2008). The deep neural canonical variables can be computed aŝ o Z = f Z (z, θ Z ), and the deep kinematic canonical variables can be computed asô X = f X (x, θ X ). In that case, we call the deep neural (ô Z ) and kinematic (ô X ) canonical variables Z DCV and X DCV , respectively.

Neural Representation Extraction via Unsupervised Learning Methods
For the purpose of comparison, we also extracted lowdimensional representations of neural population firing activity using several methods, including PCA, FA, and LDS, which

Principal Component Analysis
We applied principal component analysis (PCA) to the FR data of all neuronal units. A Gaussian kernel smoothing process was used as preprocessing for FRs before applying PCA to avoid a case where neurons with highly fluctuating firing rates influenced decoding ). Then, we extracted principal components (PCs) of FR using PCA from the training data. Note that PCA was performed on a single trial basis rather than trial-averaged data in order to focus on covariance between neuronal units in single trials. To determine the number of PCs that would be included in the set of neural representations, we followed the procedure proposed by Yu et al. (2009). Briefly, using the eigenvectors obtained from the training set, we extracted all PCs (i.e., as many as neuronal units) for the testing set, which were then sorted according to the magnitude of corresponding eigenvalues in a descending way. Afterward, we selected the first 5 PCs and reconstructed FRs from them. The mean absolute error between true FRs and reconstructed FRs was calculated. We kept adding the next 5 PCs, reconstructing FRs and calculating error in the same way as above. As a result, the minimum reconstruction error was achieved with the first 5 PCs for both datasets of CRT and SRT (Figure 1D), which constituted neural representations by PCA, denoted as Z PCA . Note that the smoothing process was applied again to the final set of PCs before decoding.

Factor Analysis
A factor analysis (FA) allows us to find low-dimensional latent factors to elucidate shared variability among the population activities ). Again, we performed the smoothing to FRs before applying FA. To estimate latent factors from FRs, we adopted the FA method proposed by Yu et al., which adjusted FA for neural data Kao et al., 2015). Then, in a similar way to PCA, we determined the number of factors included in a set of neural representations using the reconstruction error of the testing set. We found the minimum error with 20 factors for both CRT and SRT datasets, which were further used as the set of neural representations by FA, denoted as Z FA .

Linear Dynamical System for M1 States
Observed neuronal population activity can be interpreted as a noisy observation of low-dimensional and dynamical neural states Kao et al., 2015). Using the LDS-based neural state estimation approach proposed by Kao et al. (2015), we estimated dynamic neural latent states from the population activity. We determined the dimensionality of neural states using the procedure above based on reconstruction error. We set the dimensionality to 20 for both CRT and SRT datasets, with which the minimum reconstruction error was achieved.

Neural Representation Analysis
We first examined Pearson correlations between canonical variables; ρ(û Z ,û X ) or ρ(ô Z ,ô Z ). Both canonical variables of neural FRs (û Z orô Z ) are supposed to adequately represent kinematic information because they are highly correlated with the canonical variables of kinematic parameters (û X orô X ), provided that the linear or non-linear mappings of LCCA or DCCA are appropriately built. To validate this assumption, we performed a tuning analysis of not only the neural canonical variables but also other neural representations using a linear regression model, in which the tuning quality of each neural representation with respect to velocity and speed was analyzed.
The temporal linear regression model of a single neural representation (z) to the kinematic parameters (x) was given as z(t) = β 0 + βx(t) + (t) where β 0 and β denote coefficients and (t) is the error term at time t (Schwartz andMoran, 1999, 2000;Paninski et al., 2003;Rasmussen et al., 2017). The tuning quality of a neural representation was assessed by the goodness-of-fit (r 2 ) of the tuning model. In addition to this, we also computed the decoding performance using each neural representation in the training data with a linear Kalman filter. The decoding performance was measured by the mean absolute error between actual and decoded velocity from the training data. Finally, we assessed the predictive performance of each of the five neural representations above during training using both the goodness-of-fit of the tuning model and the training error.

Kalman Filter
A linear Kalman filter (LKF) is one of the popular generative decoding methods based on the linear dynamical system (Wu et al., 2006). LKF follows a first-order Markov rule, such that a state vector (velocity and speed) x t at time t evolves from x t −1 at time t−1. In this study, the state vector corresponds to the kinematic parameter vector. The system model, which describes the state transition, and the observation model, which describes the generation of observation o t from x t , are given by where A denotes the system model parameter matrix, H is the observation model parameter matrix, and Q and V are the process and observation noise following a Gaussian distribution, respectively. The neural observation vector o t can be either the Z E-FR vector (z t ) or the vector of the other neural representations.
To predict x t , we initialized x 0 = 0 at the beginning of every trial  after converging the Kalman gain to its steady state in advance (Dethier et al., 2013).

Long Short-Term Memory in Recurrent Neural Networks
An LSTM-RNN based on an RNN architecture has been well suited in predicting kinematics from neuronal activities (Ahmadi et al., 2019). The components of LSTM-RNN are enumerated as follows: c is a memory cell, f is a forget gate, and i and o are input and output gates, which correspond to R l , where l denotes the number of hidden units. LSTM-RNN operates by regulating the information flow with these gates via the cell. Given W as a matrix of weights with respect to the recurrent connection or input/output, h as a vector of the hidden layer, and b as a vector of biases, each gate can be calculated by where the input vector y is either the Z E-FR vector (z t ) or the vector of the other neural representations at time t and σ sigmoid (·) denotes the sigmoidal activation function. The subscripts indicate the corresponding gates and their recurrent connection. The information flow of the cell memory can be updated by c where σ tanh (·) denotes the hyperbolic tangent function and ⊗ denotes the element-wise product. To train LSTM-RNN, we utilized the Adam optimizer built-in MATLAB deep learning toolbox. The hyperparameters of LSTM-RNN were optimized by the Bayesian optimizer in the same way as DCCA. The Bayesian optimizer performed an objective function evaluation 500 times. In our analysis, we set the gradient decay factor as 0.95 and the squared gradient decay factor as 0.99. Then, the training batches were shuffled at every epoch for the training efficiency. Table 2 shows the optimized hyperparameters for LSTM-RNN.

Decoding Performance Evaluation
To evaluate the effects of CCA on decoding, we composed three representations of neuronal activities: Z E-FR , Z PCA , Z FA , Z LDS , Z LCV , and Z DCV (see Figure 1C). In this study, we performed decoding to predict the hand velocity X VEL from each neural representation using LKF and LSTM-RNN. For the evaluation of the decoding performance, we measured the decoding error by the Euclidean distance between the actual and predicted kinematic parameters. The decoding error was measured for the hand velocity v and hand position p, which were reconstructed from the cumulated velocity for each trial. The decoding error of the i-th trial was calculated as where e(t) is an absolute instantaneous error, e (t) = |v (t) − v (t) | or e (t) = |p (t) −p (t) | at time t, and n is the number of samples in the i-th trial. To compare the decoding performance between the neural representations (Z E-FR , Z PCA , Z FA , Z LDS , Z LCV , and Z DCV ), we applied the Friedman test to evaluate the effects of decoder inputs in accordance with the types of decoders (LKF and LSTM-RNN). For the Friedman test, the dependent variables consist of a decoding error, and the factors include the decoder input and decoder type. We also performed a post hoc statistical analysis using the Bonferroni correction (p < 0.05).

RESULTS
First, we investigated correlations between the neural and kinematic canonical variables. Figure 2 depicts correlations between the canonical variables, each extracted from firing rates (Z) and kinematics (X), respectively. The canonical variables were obtained from the testing set either by using LCCA or DCCA. Correlations were calculated between the corresponding pairs of neural and kinematic canonical variables, where a total of three pairs were determined by the number of kinematic parameters. DCCA resulted in higher correlations than LCCA for every dataset: the correlation coefficients for the dataset CRT ranged from 0.93 to 0.95 using DCCA and from 0.84 to 0.90 using LCCA (p < 0.01, Wilcoxon rank sum test), and those for the dataset SRT ranged from 0.81 to 0.89 using DCCA and from 0.71 to 0.86 using LCCA (p < 0.01, Wilcoxon rank sum test). Next, we examined the tuning of neuronal FRs and neural representations (Z E-FR , Z PCA , Z FA , Z LDS , Z LCV , and Z DCV ) concerning kinematic parameters (X VEL ) using the testing set. The quality of tuning was measured by the r 2 of the linear regression model with X VEL as the regressors (see Section "Neural Representation Analysis"). Figure 3 shows the examples of the actual values of Z E-FR , Z LCV , and Z DCV and the estimated values by the linear velocity tuning model. For Z E-FR , we selected the neuron whose FRs was most accurately estimated by the model (i.e., the highest value of r 2 ). Among the neural representations analyzed here, the linear model tracked the variation of Z DCV most accurately yielding the highest r 2 (Friedman test, Bonferroni correction, p < 0.05). Notably, the linear model can estimate even timeinvariant parts of Z DCV (see the bottom row of Figure 3), which often spanned over multiple trials, even though X VEL varied during these periods. Figures 4A,B depict the distributions of r 2 for Z E-FR , Z LCV , and Z DCV in the datasets CRT and SRT, respectively. The mean values of r 2 for Z DCV (0.93 in the dataset CRT and 0.74 in the dataset SRT) and Z LCV (0.85 in the dataset CRT and 0.67 in the dataset SRT) were considerably higher than those for Z PCA (0.40 in the dataset CRT and 0.30 in the dataset SRT), Z FA (0.17 in the dataset CRT and 0.13 in the dataset SRT), and Z LDS (0.14 in the dataset CRT and 0.17 in the dataset SRT) (Friedman test, multiple comparison with Bonferroni correction, p < 0.05). Moreover, the neural canonical variables found by DCCA (Z DCV ) was more tuned to velocity than those by LCCA (Z LCV ) (Wilcoxon rank sum test: p = 0.02 in the dataset CRT, p = 0.04 in the dataset SRT). Moreover, the neural canonical variables found by DCCA (Z DCV ) was more tuned to velocity than those by LCCA (Z LCV ). Figures 4C,D depict the topographical maps of the neural canonical variables showing high r 2 in the 2D velocity space. Although Z LCV and Z DCV were created to maximize correlations with the canonical variables of kinematics, not kinematics per se, they showed tuning with the actual velocity.
We then examined both the training error and average r 2 of each neural representation, as shown in Figure 5. It reveals that Z DCV yielded not only the highest r 2 but also the lowest training error (0.09 in the dataset CRT, 0.12 in the dataset SRT). Although we also observed relatively low training error using neural representations of FA and LDS, the average r 2 of them were not high compared to those of CCA.
The decoding performance was evaluated for each combination of neural representations and decoders (see  See Tables 3, 4). The standard deviations (STDs) of the actual velocity and position in the dataset CRT are X = 0.24 and Y = 0.26 for velocity and X = 1.82, and Y = 1.76 for position, and those in the dataset SRT are X = 0.21 and Y = 0.20 for velocity and X = 1.66 and Y = 1.52 for position. For LKF, the decoding error is less than the STDs of the X-and Y-axes of the actual velocity by 5.7 and 4.2% on overage, respectively. Moreover, the decoding error is less than the STDs of the actual position by 72.1 and 69.1%. For LSTM-RNN, the decoding error is less than the STDs of the actual velocity by 5.7 and 4.6%, and the decoding error is less than those of the actual position by 72.3 and 70.0%.  Figure 8A depicts the comparison of the decoding error for the hand velocity across different neural representations and decoders. For the dataset CRT, the one-way Friedman test revealed the main effect of neural representation (Z FR , Z PCA , Z FA , Z LDS , Z LCV , and Z DCV ) on the decoding error when using LKF (χ 2 = 166.6, p < 0.01) or when using LSTM-RNN (χ 2 = 128.1, p < 0.01). When using LKF, a post hoc multiple comparison test with Bonferroni correction showed lower decoding error with Z DCV than other neural representations (ps < 0.01) except for Z LDS (p = 0.25). When using LSTM-RNN, it also showed lower decoding error with Z DCV than other neural representations (ps < 0.01). For the dataset SRT, the Friedman  test revealed the main effect of neural representation on the decoding error when using LKF (χ 2 = 75.8, p < 0.01) or when using LSTM-RNN (χ 2 = 25.7, p < 0.01). When using LKF, the post hoc test showed lower decoding error with Z DCV than other neural representations (ps < 0.01) except for Z LDS (p ∼ = 1). When using LSTM-RNN, it showed lower decoding error with Z DCV than Z E-FR (p < 0.01) only, while showing no difference between Z DCV and other representations (ps > 0.05). Figure 8B depicts the comparison of the error between true and reconstructed hand positions. For the dataset CRT, the Friedman test revealed the main effect of neural representation on the position error when using LKF (χ 2 = 71.9, p < 0.01) or when using LSTM-RNN (χ 2 = 80.7, p < 0.01). When using LKF, the post hoc test showed lower error with Z DCV than neural representations (ps < 0.01). When using LSTM-RNN, it showed lower error with Z DCV than other neural representations except for Z LCV (p = 0.53). For the dataset SRT, the Friedman test revealed the main effect of the neural representation on the position error when using LKF (χ 2 = 33.1, p < 0.01) or when using LSTM-RNN (χ 2 = 13.6, p < 0.01). When using LKF, the post hoc test showed lower error with Z DCV than other neural representations except for Z LDS (p = 0.06). When using LSTM-RNN, it showed lower error with Z DCV than Z LDS (p < 0.01), whereas there was no difference between Z DCV and others. Moreover, we evaluated the possible interaction effects of neural representations and decoder types using a two-way Friedman test (Figure 9). For the dataset CRT, the two-way Friedman test revealed the main effects of decoder [χ 2 (1) = 116.9, p < 0.01] and neural representation [χ 2 (2) = 261.9, p < 0.01] on the velocity decoding error ( Figure 9A). The post hoc test with Bonferroni correction showed lower error using LSTM-RNN than using LKF for all neural representations (p < 0.01). For all decoders, the decoding error of velocity with Z DCV was smaller than any other neural representations (ps < 0.01). For the dataset SRT, the two-way Friedman test revealed the main effect of decoder [χ 2 (1) = 175.4, p < 0.01] and neural representation [χ 2 (2) = 59.0, p < 0.01]. The post hoc test showed lower error using LSTM-RNN than using LKF (p < 0.01). For all decoders, the decoding error of velocity with Z DCV was smaller than Z E-FR and Z PCA (ps < 0.01). Figure 9B depicts the same two-way statistical analysis on the error between true and reconstructed hand positions. For the dataset CRT, the two-way Friedman test revealed the main effects of decoder [χ 2 (1) = 4.4, p < 0.05] and neural representation [χ 2 (2) = 143.1, p < 0.01] on the position error. The post hoc test showed no difference between decoders (p = 0.3). For all decoders, the position error with Z DCV was smaller than any other neural representations (ps < 0.01). For the dataset SRT, it showed the main effects of decoder [χ 2 (1) = 14.3, p < 0.01] and neural representation [χ 2 (2) = 28.2, p < 0.01]. The post hoc test showed no difference between decoders (p = 0.1). For all decoders, the position error with Z DCV was smaller than any other neural representations (ps < 0.05).

DISCUSSION
In this study, we proposed a method to identify low-dimensional representations of M1 neuronal FR activities using canonical correlation analyses. Furthermore, we applied those canonical variables to the decoding models to predict the arm movements of non-human primates and compared the effect of the neural representations in terms of decoding performance. As expected, we confirmed that the canonical variables found by DCCA were well tuned to the hand velocity. Decoding arm movement information using canonical variables estimated by DCCA resulted in a superior performance to either cases using LCCA-estimated canonical variables or using the other neural representations regardless of the decoder type, i.e., LKF or LSTM-RNN. In particular, the performance of LKF was significantly more improved using DCCA than decoding FRs using LSTM-RNN. These findings suggest that we can design a simple linear decoder (LKF) with DCCA while achieving performance as good as using relatively complex DNNs.
The improvement of decoding M1 activities using LCCA or DCCA may be partly because canonical variables found by them showed superior tuning to velocity over the other neural representations, including individual neuronal FRs (Figure 3). Therefore, the LKF, drawing heavily on the quality of observation models, can benefit from the extracted canonical variables even when LCCA greatly reduced the number of neural variables. Particularly, DCCA-estimated neural canonical variables showed better tuning indices (r 2 ) than LCCA-estimated neural canonical variables, which subsequently led to a better decoding performance of DCCA. Meanwhile, training error that directly reflects the learning quality of the decoding model revealed superior over the other neural representations along with r 2 . This finding indicates that non-linear projections may be more suitable to extract joint canonical variables between highdimensional neural activities and low-dimensional kinematic parameters. However, DCCA cannot provide direct links between canonical variables and individual neurons, which LCCA can do.
Besides better characteristics of the canonical variables, there could be another reason why DCCA improved decoding using LKF while the other neural representations did not. PCA is known to have difficulty distinguishing between changes in the underlying neural state, which becomes limitations to decoding kinematic information from noisy firing activity . Although FA also is a useful frame to extract independent and shared variability across neurons, it follows the assumption that the noise variance of an individual neuron is fixed over time . Above all, since these approaches (including LDS) aim to extract the latent states of population activity without kinematic information, it is difficult to extract elaborate components related to complex movement. This could be a reasonable reason why DCCA yielded a better performance on decoding models than the neural representations via unsupervised learning methods.
As for decoding methods, a DNN, represented by LSTM-RNN here, efficiently decoded neuronal population firing patterns because it can effectively process neuronal temporal dynamics through memory cells in a relatively concise network architecture. Furthermore, a state-space model, such as LKF, shows an advantage of representing temporal dynamics of kinematics in its system model, but its first-order linear system model may not be sufficient to elucidate the kinematic dynamics of arm movements. In addition, a direct decoding model, such as LSTM-RNN, can be free from any statistical assumption on data, which is often necessary in a generative model, such as LKF. Our results showing the superior performance of LSTM-RNN over LKF are in line with those of previous reports (e.g., Ahmadi et al., 2019).
In addition to direct mapping to velocity through the decoders, a more straightforward linear mapping could be taken into account; for example, we can simply reconstruct velocity from the canonical kinematic representations (X LCV or X DCV ), which were estimated from the corresponding neural representations (Z LCV or Z DCV ). To test whether how this simple mapping worked, we attempted to reconstruct velocity only through LCCA and DCCA without explicit decoders as follows. First, we estimated X LCV (or X DCV ) from Z LCV (or Z DCV ) by linear regression such as where X k and Z k represent the k-th canonical variable, respectively, e represents residual error and α 0 and α 1 are canonical coefficients. Second, we reconstructed velocity from the estimated X LCV (or X DCV ) during testing. For LCCA, the reconstruction of velocity was straightforward simply by inverting linear mapping between X LCV and velocity. For DCCA, velocity was reconstructed by the inverse of activation function (here, a logit function) and the linear model between the layers, which was expressed as: where β l,l−1 represents coefficients between the outputs of layer l and l−1, and W l is a matrix of the weight in l-th layer. We observed that the reconstructed velocity with this procedure exhibited lower performance than directly decoding Z (Z LCV or Z DCV ) into velocity using LKF by 9.9% on average (11.4% for LCCA, and 8.3% for DCCA). Apparently, this analysis verified that direct reconstruction of velocity through mappings built by CCA was poorer than those from the proposed decoding methods to predict velocity from neural representations using LKF or LSTM-RNN. We can expect that the dimensionality of neuronal populations will increase further as the neurotechnology of large-scale neuronal recordings advances in the near future. Such a development will raise an issue of how efficiently we should design a decoder for intracortical BMIs. Our results suggest that DCCA, along with other dimensionality reduction techniques, can provide advantages to construct a compact but informative feature space for effective decoding. Unlike unsupervised dimensionality reduction techniques without kinematic information, DCCA can find a low-dimensional space to maximize correlations with target kinematic parameters, increasing a chance to improve predicting kinematic parameters such as velocity from neural activities. It has been well known that decoding velocity information of a prosthetic device from neural activity can be useful for BMIs in clinical environments (Kim et al., 2008;Wodlinger et al., 2015). Therefore, we suggest that our proposal method can be preferred if one considers the efficiency and performance of BMIs.
Although this study shows the feasibility of the improvement of decoding for BMIs using the proposed method, we have not validated it in an online BMI control paradigm, which should be conducted in future work. When applying the current decoding method to online BMIs in humans with tetraplegia, where the kinematic information of limbs is not available, we should consider how to extract kinematics of a target prosthetic device. To address this issue, many previous human BMI studies employed a training paradigm in which participants imagined limb movements following the instructed motion of an object shown on the screen. Then, a decoding algorithm could be built by associating M1 activities elicited during movement imagination with the kinematics of the object (Hochberg et al., 2006;Kim et al., 2008;Aflalo et al., 2015;Jarosiewicz et al., 2015;Wodlinger et al., 2015). Although there could exist a substantial gap between the true kinematics and the output of the decoding algorithm initially built in this way, the BMI performance could be further increased by repeatedly updating the same decoding algorithm through "closed-loop" training. Importantly, most decoding algorithms used in human BMIs have been initially developed in the preliminary non-human primate studies. Therefore, we believe that our decoding algorithm based on deep CCA in non-human primates can benefit human BMIs in a similar fashion.

AUTHOR CONTRIBUTIONS
M-KK and S-PK conceived the research. M-KK conducted the analysis and wrote the manuscript. J-WS and S-PK provided edited the manuscript. All authors contributed to the article and approved the submitted version.