Online adaptive group-wise sparse Penalized Recursive Exponentially Weighted N-way Partial Least Square for epidural intracranial BCI

Moly, Alexandre; Aksenov, Alexandre; Martel, Félix; Aksenova, Tetiana

doi:10.3389/fnhum.2023.1075666

ORIGINAL RESEARCH article

Front. Hum. Neurosci., 06 March 2023

Sec. Brain-Computer Interfaces

Volume 17 - 2023 | https://doi.org/10.3389/fnhum.2023.1075666

This article is part of the Research TopicSensorimotor Decoding: Characterization and Modeling for Rehabilitation and Assistive TechnologiesView all 7 articles

Online adaptive group-wise sparse Penalized Recursive Exponentially Weighted N-way Partial Least Square for epidural intracranial BCI

¹Université Grenoble Alpes, CEA, LETI, Clinatec, Grenoble, France
²Independent Researcher, Montpellier, France

Introduction: Motor Brain–Computer Interfaces (BCIs) create new communication pathways between the brain and external effectors for patients with severe motor impairments. Control of complex effectors such as robotic arms or exoskeletons is generally based on the real-time decoding of high-resolution neural signals. However, high-dimensional and noisy brain signals pose challenges, such as limitations in the generalization ability of the decoding model and increased computational demands.

Methods: The use of sparse decoders may offer a way to address these challenges. A sparsity-promoting penalization is a common approach to obtaining a sparse solution. BCI features are naturally structured and grouped according to spatial (electrodes), frequency, and temporal dimensions. Applying group-wise sparsity, where the coefficients of a group are set to zero simultaneously, has the potential to decrease computational time and memory usage, as well as simplify data transfer. Additionally, online closed-loop decoder adaptation (CLDA) is known to be an efficient procedure for BCI decoder training, taking into account neuronal feedback. In this study, we propose a new algorithm for online closed-loop training of group-wise sparse multilinear decoders using L_p-Penalized Recursive Exponentially Weighted N-way Partial Least Square (PREW-NPLS). Three types of sparsity-promoting penalization were explored using L_p with p = 0., 0.5, and 1.

Results: The algorithms were tested offline in a pseudo-online manner for features grouped by spatial dimension. A comparison study was conducted using an epidural ECoG dataset recorded from a tetraplegic individual during long-term BCI experiments for controlling a virtual avatar (left/right-hand 3D translation). Novel algorithms showed comparable or better decoding performance than conventional REW-NPLS, which was achieved with sparse models. The proposed algorithms are compatible with real-time CLDA.

Discussion: The proposed algorithm demonstrated good performance while drastically reducing the computational load and the memory consumption. However, the current study is limited to offline computation on data recorded with a single patient, with penalization restricted to the spatial domain only.

1. Introduction

Brain–computer interfaces (BCIs) are systems that create a new communication pathway between the brain and an effector without neuromuscular activation. Motor BCIs aim to allow users with severe motor impairments to regain limb mobility by controlling orthoses and prostheses or by recovering motor control over their own limbs, e.g., using electrical stimulation (Mak and Wolpaw, 2009). Most BCI systems include a neural signal acquisition system, a transducer, an effector, and a feedback system (Schwartz et al., 2006). The transducer is typically composed of a neural feature extraction block and a decoder, with optional pre- and post-processing blocks. Decoders are generally user-specific and data-driven. They are created through a supervised tuning of parameters on the training dataset. Once the decoder is established, the transducer can be applied in real time to translate the user's intention into the control command of the effector.

As BCI systems function in real time, computing time and resource management are crucial aspects of such systems (Haufe et al., 2014). High-resolution neuronal activity recording systems are generally required to achieve high-dimensional control of complex effectors. It results in a large volume of data that needs to be processed.

Furthermore, in motor BCI, a high decision rate (8–10 Hz) is necessary to control complex effectors such as robotic arms, an exoskeleton, and so on (Marathe and Taylor, 2015; Shanechi et al., 2017). In addition to high computing power requirements and computing time, the high dimensionality of the feature space presents challenges such as the “curse of dimensionality” during decoder training (Bellman, 1961; Bishop, 2006; Nicolas-Alonso and Gomez-Gil, 2012; Remeseiro and Bolon-Canedo, 2019). The feature space often contains irrelevant and/or redundant features. Moreover, computational load is critical for the potential development of portable BCIs.

Reducing the dimensionality of the neuronal feature space is one approach to addressing these issues. Dimension reduction algorithms have been widely employed in numerous studies on BCI.

Both projection and feature selection methods were applied for dimensional reduction, in both online BCI experiments and offline analysis.

Projections algorithms were often used (Kim et al., 2006; Marathe and Taylor, 2013; Haufe et al., 2014; Bundy et al., 2016; Hsu et al., 2016; Sannelli et al., 2016; Schaeffer and Aksenova, 2016; Eliseyev et al., 2017; Jiang et al., 2017; Seifzadeh et al., 2017; Sreenath and Ramana, 2017; Bousseta et al., 2018; Choi et al., 2018; Lotte et al., 2018; Palmer and Hirata, 2018; Khan et al., 2019; Jafarifarmand and Badamchizadeh, 2020). They project the feature space into a space of a lower dimension by using a linear or non-linear combination of the initial features. The principal and independent component analyses (PCA and ICA), spatio-spectral decomposition (SSD), common spatial pattern (CSP), or partial least squares (PLS) (Kim et al., 2006; Marathe and Taylor, 2013; Haufe et al., 2014; Bundy et al., 2016; Hsu et al., 2016; Sannelli et al., 2016; Schaeffer and Aksenova, 2016; Eliseyev et al., 2017; Jiang et al., 2017; Seifzadeh et al., 2017; Sreenath and Ramana, 2017; Bousseta et al., 2018; Choi et al., 2018; Lotte et al., 2018; Palmer and Hirata, 2018; Khan et al., 2019; Jafarifarmand and Badamchizadeh, 2020) algorithms and variants were applied in BCI research. However, such methods may not improve the computing time as they do not optimize the feature extraction step. The irrelevant and/or redundant features are still computed.

The feature selection family regroups filter-based, wrapper-based, and embedded techniques (Bolón-Canedo et al., 2013; Khaire and Dhanalakshmi, 2019). The filter-based methods rank and select features independently without considering the decoder. Effective in computation time, these methods tend to select highly correlated (redundant) features. The wrapper-based techniques incorporate supervised learning algorithms to evaluate the possible interactions between the features. These techniques add features to the selected subset iteratively and evaluate the subset by combining it with the trained decoder (Lotte et al., 2018). These methods are effective but require a great deal of computing time. On the other hand, embedded techniques integrate the feature selection process directly into the decoding algorithm, thus combining the advantages of both the filter-based and wrapper-based methods (Khaire and Dhanalakshmi, 2019). Embedded feature selection is a promising approach as it is directly performed during the model learning process.

Embedded feature selection in BCI is based on regularization/penalization methods (Cincotti et al., 2008; Lotte and Guan, 2011; Flamary and Rakotomamonjy, 2012; Eliseyev and Aksenova, 2016; Mishra et al., 2018; Nagel and Spüler, 2019). Regularization strategies incorporate a penalty term into the model parameter optimization process to restrict the degree of freedom of the model. Numerous regularization strategies were designed, such as L₀, L₁ (Lasso), L₂ (Ridge), elastic net regularization, and so on. The L₁ regularization process adds a penalty term equal to the sum of the absolute values of model coefficients. L₂ regularization integrates a penalty term equal to the sum of squared values, whereas elastic net regularization is a combination of both L₁ and L₂ penalizations (Bishop, 2006). L₀ (Sreeja and Himanshu, 2019), L₁ (Lotte and Guan, 2011; Eliseyev et al., 2012; Flamary and Rakotomamonjy, 2012; Zhang et al., 2013; López-Larraz et al., 2014), L₂ (Cincotti et al., 2008; Flamary and Rakotomamonjy, 2012; Seifzadeh et al., 2017; Nagel and Spüler, 2019) regularization, and an elastic net (Kim et al., 2018; Peterson et al., 2019) were applied in the BCI field to improve the neural signal decoding performance or other properties, e.g., prediction smoothness or sparsity. Less common regularization methods designed and/or applied in BCI are regularization using polynomial regression (Eliseyev and Aksenova, 2016), sparse regularization based on automatic relevance determination (ARD) (Toda et al., 2011; Nakanishi et al., 2017), and Kullback–Leibler regularization in the Riemannian mean (Mishra et al., 2018). Among regularization techniques, L_p regularization, 0 ≤ p ≤ 1, using a penalty equal to the L_p (0 ≤ p ≤ 1) norm/pseudo-norm of the model coefficients, is known to be effective in discarding irrelevant/correlated features, promoting a sparse solution (Bishop, 2006; Hastie et al., 2015).

Regularization/penalization is generally performed in a single-wise manner. Features are regularized independently and are not evaluated as belonging to a group of features. However, there are many applications, particularly in BCI, with structurally grouped input features. This approach allows the simultaneous setting of zero of the model coefficients within a group, which can be beneficial in cases where the input features are structurally grouped, such as excluding sensors (Eliseyev et al., 2017). For such applications, a single-wise sparsity-promoting penalty may be suboptimal. Group-wise regularization algorithms perform the feature selection process by grouping and applying penalization to the groups at once (Martínez-Montes et al., 2008; Eliseyev et al., 2012; Giordani and Rocci, 2013; Zhang et al., 2013; Hastie et al., 2015). This grouping can cluster features across various modalities, such as electrodes and frequency bands (van Gerven et al., 2009). Group-wise sparsity is ideal for naturally structured data, allowing for the elimination of variables (such as electrodes or frequency) from the signal processing workflow and reducing computational costs. Additionally, it may simplify the model's interpretation.

Despite being potentially beneficial for studies on BCI, group-wise penalization has rarely been applied in this field. The natural structure of BCI features space grouping features over modalities, such as electrodes, frequency bands, and time delay, which are heavily exploited (van Gerven et al., 2009; Eliseyev et al., 2012; Motrenko and Strijov, 2018; Wu et al., 2019). For the penalization of such groups, data may be viewed in the form of a tensor (Hastie et al., 2015; Eliseyev and Aksenova, 2016). Tensors, or multiway arrays, are higher-order generalizations of vectors and matrices (for more details, see Ref. Hsu et al., 2016). Tensors have several dimensions, also known as ways of analysis or modes. In BCIs, recorded signals are primarily analyzed in spatial, frequency, and temporal domains. To extract neural features, each epoch of neural signal recording is commonly mapped to temporal-frequency-spatial (or frequency-spatial) space (e.g., Chao et al., 2010; Schaeffer and Aksenova, 2016; Eliseyev et al., 2017; Choi et al., 2018). Tensor-based analyses, which present features in matrix form using tensor unfolding, are reported to be beneficial for BCI decoding as they preserve the natural structure of data (Zhao et al., 2013; Cichocki et al., 2015; Zhang et al., 2015; Eliseyev et al., 2017).

The tensors of data in BCI often benefit from tensor decomposition techniques. Slice-wise data representation can be obtained through sparsity-promoting penalization of tensor decomposition. Regularized PARAFAC and Tucker decomposition are algorithms designed for group-wise tensor penalization. Slice-wise sparsity-promoting penalization is added to the N-way Partial Least Square (NPLS) in van Gerven et al. (2009), Eliseyev et al. (2012), Motrenko and Strijov (2018), and Wu et al. (2019). These techniques have been used in a few offline BCI studies (Martínez-Montes et al., 2008; Eliseyev et al., 2012) and in other fields (Giordani and Rocci, 2013; Kim et al., 2013, 2014; Hervás et al., 2019). In Eliseyev et al. (2012), the L₁-Regularized N-PLS algorithm was shown to be superior to its non-penalized version by suppressing noisy/irrelevant electrodes. However, these algorithms were not adapted for online decoder training in closed-loop use.

Most of the presented feature-dimensional reduction algorithms were tested offline. Additionally, feature selection performed in an offline preliminary study (Brunner et al., 2006; Huang et al., 2009; Spüler et al., 2012; Marathe and Taylor, 2013; Bousseta et al., 2018; Cantillo-Negrete et al., 2018; Kim et al., 2018; Nagel and Spüler, 2019) may not be optimal when using a CLDA strategy (Schlögl et al., 2010; Clerc et al., 2016). CLDA involves training the decoder online on data acquired during closed-loop BCI control sessions. Decoders trained in this manner have been reported to outperform decoders trained offline using data from open-loop BCI experiments (Jarosiewicz et al., 2013). Adaptive/incremental learning algorithms are particularly suited for the CLDA strategy. These algorithms continuously update the model, using only the latest data block and relevant statistics on the older signals, while not retaining the whole signals in memory (Schlögl et al., 2010; Brandman et al., 2018; Lotte et al., 2018). Adaptive/incremental learning algorithms are beneficial for the CLDA training strategy as they allow higher decoder update rates and are compatible with long decoder learning periods, which are generally necessary for high-dimensional control. However, only a few adaptive dimensional reduction algorithms were proposed. The adaptive dimensional reduction algorithms applied in the BCI (Zhao et al., 2008; Ang et al., 2011; Song and Yoon, 2015; Woehrle et al., 2015; Hsu et al., 2016; Mobaien and Boostani, 2016; Sannelli et al., 2016; Chen and Fang, 2017; Lotte et al., 2018) and other (Dagher, 2010) fields are primarily based on projection strategies (adaptive CSP, PCA, ICA, and xDAWN algorithms) and were only tested offline. Similarly, a few adaptive feature selection algorithms were proposed. Filter methods were tested on BCI simulations using mutual information (Oliver et al., 2013) or during online BCI experiments based on the Fisher score (Faller et al., 2012). The wapper-based strategy was optimized using parallel computation for the online BCI classifier (Mend and Kullmann, 2012), whereas embedded methods using semi-supervised feature selection (Long et al., 2011) and a weighting feature algorithm (Andreu-Perez et al., 2018) have been designed and used during online BCI applications. An adaptive genetic algorithm was proposed for adaptive channel selection in Moro et al. (2017). However, these methods have been applied to simple online binary classification only.

In BCI studies, adaptive regularization algorithms were tested offline (Roijendijk et al., 2016; Mishra et al., 2018; Sharghian et al., 2019). Adaptive algorithms with an L₁ norm regularization strategy were reported in an adaptive logistic regression (Sheikhattar et al., 2015), a kernel least squares (Yang et al., 2019), and recursive least squares algorithms (Chen et al., 2012) in other fields. Only a few feature selection methods were integrated into adaptive algorithms for online incremental calibration during real-time BCI experiments and were generally restricted to binary classification and EEG-based experiments (Long et al., 2011; Faller et al., 2012; Mend and Kullmann, 2012; Moro et al., 2017; Andreu-Perez et al., 2018). Computational complexity and difficulty integrating dimensional reduction methods into real-time algorithms may explain the lack of proposed solutions.

In the study, an adaptive algorithm promoting group-wise model sparsity, L_p-penalized recursive exponentially weighted N-way Partial Least Square (PREW-NPLS), is proposed. L_p, p = 0, 0.5, 1 norm/pseudo-norm penalty is applied to feature groups corresponding to the slices of data tensor related to the mode of analysis (e.g., spatial, frequency, temporal). The algorithm was tested with data recorded during BCI sessions of left/right arm 3D translations of a virtual avatar by a tetraplegic patient during the clinical trial NCT02550522 (ClinicalTrials.gov) conducted at the Grenoble Alpes University Hospital (CHUGA) (Benabid et al., 2019; Moly et al., 2022). The datasets were recorded during online closed-loop experiments using the REW-NPLS decoder that was previously integrated into the BCI system. To reproduce online experimental conditions and to ensure that the proposed algorithm is compatible with real-time CLDA, a pseudo-online simulation was conducted using the same parameters (buffer size, batch training, and so on) and the same model application procedure as it was used in real time. The comparison study was restricted to penalization according to spatial modality, which is the most critical at BCIs due to data recording/transfer limitations. For each type of penalization, a set of models/penalization parameters were evaluated. The PREW-NPLS decoders highlighted equivalent or better decoding performance compared to the generic REW-NPLS algorithm for the majority of the penalization parameters. The sparsest solutions allowed the removal of up to 75% of the electrodes without decreasing performance. L₀-PREW-NPLS and L₁-PREW-NPLS are sufficiently computationally efficient for online closed-loop decoder adaptation at a high-frequency rate.

2. Methods

2.1. Generic partial least squares and family

Algorithms of the PLS family are widely used in BCI studies due to their stability in the case of high-dimensional data and in the presence of correlated and/or irrelevant variables. In motor BCI, the algorithms of the PLS family were applied in both continuous and discrete BCIs. Offline hand/fingers trajectory decoding (Chao et al., 2010; Chen et al., 2013; Eliseyev and Aksenova, 2014; Bundy et al., 2016; Schaeffer and Aksenova, 2016; Schaeffer, 2017; Choi et al., 2018), real-time hand translation/wrist rotation control (Benabid et al., 2019; Moly et al., 2022), error potential (ERP) detection (Rouanne et al., 2021) from ECoG, and EEG/MEG-based classification (Trejo et al., 2006; Eliseyev et al., 2017; Maleki et al., 2018) using the PLS algorithm were reported in preclinical and clinical studies. In Eliseyev and Aksenova (2014), GAM-PLS (generalized additive model—the partial least square) is reported to outperform the generic PLS in the presence of artifacts for 3D hand trajectory decoding from ECoG data in non-human primates (NHP). In Schaeffer (2017), PLS outperformed principle component regression and demonstrated comparable results with Lasso regression for hand trajectory and 1D finger trajectory decoding from ECoG in preclinical and clinical experiments. In Rouanne et al. (2021), NPLS demonstrated comparable results with Logistic Regression, SVM, MLP, and CNN for ERP detection from ECoG recordings in the sensory-motor cortex of tetraplegics, etc.

The generic PLS is a linear regression algorithm based on the iterative projection of input and output variables into the latent variable spaces of dimension f. The hyperparameter f is generally estimated through cross-validation in the preliminary study. Projectors are set to maximize the covariance between the input and latent output variables. The generic PLS is an offline algorithm. For online data stream modeling, recursive PLS (RPLS) and recursive exponentially weighted PLS (REW PLS) (Helland et al., 1992; Dayal and MacGregor, 1997; Qin, 1998) were developed. All the aforementioned PLS algorithms are vector-input-vector-output algorithms. N-way Partial Least Square (NPLS) is a generalization of the conventional PLS for tensor data (Bro, 1996, 1998). The NPLS algorithm projects the input and output tensors into a low-dimensional space of latent variables using a low-rank tensor decomposition. The recursive N-way PLS (RNPLS) (Helland et al., 1992) and recursive exponentially weighted N-way PLS (REW-NPLS) (Dayal and MacGregor, 1997) are generalizations of the adaptive RPLS and REW PLS algorithms to tensor variables and allow online tensor data stream learning of the regression model. RNPLS still requires fixing the hyperparameter f from the offline preliminary study, whereas REW-NPLS proposes a recursive validation procedure for the online optimization of the hyperparameter, enabling a fully adaptive algorithm (Dayal and MacGregor, 1997). In addition, the REW-NPLS algorithm is more computationally effective than the RPLS algorithm (Dayal and MacGregor, 1997).

The adaptive REW-NPLS algorithm has been tested offline in BCI studies for trajectory decoding from ECoG signals and for classification from MEG data, demonstrating similar or better results compared to other algorithms from the PLS family designed for offline use (Eliseyev et al., 2017). It was applied in real time for closed-loop adaptation of 3D hand translation/wrist rotation decoders in tetraplegics (Benabid et al., 2019; Moly et al., 2022). Finally, REW-NPLS was tested in the simulation of auto-adaptive continuous (bi-directional cursor control) and discrete multiclass motor imagery (MI) BCI in a tetraplegic patient (Rouanne et al., 2022). In the (Sliwowski et al., 2022) offline study, ANN algorithms were reported to outperform the REW-NPLS decoder. However, these algorithms cannot be applied in real time under CLDA.

Fully adaptive REW-NPLS is compatible with CLDA. However, it may be further improved by integrating real-time adaptive dimension reduction and promoting group-wise decoder sparsity using regularization. Group-wise sparsity, e.g., in the spatial dimension, may allow the elimination of irrelevant or highly correlated electrodes, decreasing computational time and memory consumption at the BCI use stage. This may be critical for portable BCI systems. A sparse solution may be advantageous for small training data sets, preventing overtraining. As BCI decoders require regular updates due to neuronal signal non-stationarity, reducing the decoder training time is desirable for real-life scenarios.

2.2. REW-NPLS

NPLS (Bro, 1996, 1998) estimates a linear relationship between a tensor of independent (input) and a tensor of dependent (output) variables. Given X_t $\in ℝ^{I_{1} \times \dots \times I_{m}}$ and Y_t $\in ℝ^{J_{1} \times \dots \times J_{n}}$ the m and n order tensors of the input and output variables at time t, Y_t = Beta_t+bias+D_t, where Beta and bias are the tensors of parameters and their associated bias, ${\underline{D}}_{t} \in ℝ^{J_{1} \times \dots \times J_{n}}$ is the tensor of noise. The parameters are estimated from the training dataset ${\underline{X}, \underline{Y}}$ , $\underline{X} \in ℝ^{L \times I_{1} \times \dots \times I_{m}}$ , $\underline{Y} \in ℝ^{L \times J_{1} \times \dots \times J_{n}}$ , L is the training dataset size. NPLS constructs the linear regression iteratively by projecting tensors of observation $\underline{X}$ and $\underline{Y}$ to the space of latent variables using tensor decomposition: $\underline{X} = \sum_{f_{i} = 1}^{f} r_{f_{i}} \circ w_{f_{i}}^{1} \circ \dots \circ w_{f_{i}}^{m} + {\underline{E}}_{x}$ , $\underline{Y} = \sum_{f_{i} = 1}^{f} u_{f_{i}} \circ q_{f_{i}}^{1} \circ \dots \circ q_{f_{i}}^{n} + {\underline{E}}_{y}$ . Here, “∘” is the outer product operator (Eliseyev et al., 2017), $R_{f} = [r_{1}, \dots, r_{f}] \in ℝ^{L \times f}$ and $U_{f} = [u_{1}, \dots, u_{f}] \in ℝ^{L \times f}$ are matrices of the latent variables, $W^{f_{i}} = {w_{f_{i}}^{i}}_{i = 1}^{m}$ , $w_{f_{i}}^{i} \in ℝ^{I_{i}}$ and $Q^{f_{i}} = {q_{f_{i}}^{i}}_{i = 1}^{n}$ , $q_{f_{i}}^{i} \in ℝ^{J_{i}}$ , f_i = 1, …, f are the projection vectors constructed at iteration f_i, and E_x, E_y are the residual tensors. The set of projectors is constructed iteratively, increasing latent variable space dimensions.

REW-NPLS (Eliseyev et al., 2017) update a set of F (F ∈ ℕ^* is the fixed upper bound latent space dimension) models ${{\underline{B e t a}}_{U I}^{f}, {\underline{b i a s}}_{U I}^{f}}_{f = 1}^{F}$ using current blocks of tensors of observation {X_UI, Y_UI} and previously computed models. Here, UI ∈ ℕ is the updated iteration number, $ {\underline{B e t a}}_{U I}^{f} \in ℝ^{(I_{1} \times \dots \times I_{M}) \times (J_{1} \times \dots \times J_{N})}$ , $ {\underline{b i a s}}_{U I}^{f} \in ℝ^{J_{1} \times \dots \times J_{N}}$ are current update of model coefficients, and ${\underline{X}}_{U I} \in ℝ^{Δ L \times I_{1} \times \dots \times I_{M}}$ , ${\underline{Y}}_{U I} \in ℝ^{Δ L \times J_{1} \times \dots \times J_{N}}$ are the current input and output tensors of observations. ΔL ∈ ^* is the number of samples recorded between the update blocks UI − 1 and UI.

The REW-NPLS algorithm is a generalization of the REW-PLS algorithm to tensor variables and belongs to the family of kernel PLS algorithms (Eliseyev et al., 2017). It evaluates a set of projectors and model coefficients from covariance tensors $\underline{X Y} = \underline{X} \times_{1} \underline{Y}$ , $\underline{X Y} \in ℝ^{(I_{1} \times \dots \times I_{m}) \times (J_{1} \times \dots \times J_{n})}$ , and $\underline{X X} = \underline{X} \times_{1} \underline{X}$ , $\underline{X X} \in ℝ^{(I_{1} \times \dots \times I_{m}) \times (I_{1} \times \dots \times I_{m})}$ . Here, “ × _k” is the k-mode tensor product (Eliseyev et al., 2017). First, a set of input variable projectors W are evaluated from the covariance tensor XY. The projectors are estimated using a rank one decomposition of the tensor $\underline{V} = \underline{X Y} \in ℝ^{I_{1} \times \dots \times I_{m}}$ in the case of single output. For higher dimensions, the eigenvector with the largest eigenvalue is computed from the covariance tensor XY to decrease the dimension of the tensor to decompose: e = eig(XY^TXY), $\underline{V} = r e s h a p e (X Y \cdot e), \underline{V} \in^{I_{1} \times \dots \times I_{m}}$ . Here, $X Y \in ℝ^{(I_{1} \cdot \dots \cdot I_{m}) \times (J_{1} \cdot \dots \cdot J_{n})}$ is the unfolded tensor XY, and e = eig() is an eigenvector with the largest eigenvalue. Output projectors and the model parameters Beta and bias are computed using W and covariance tensors XX and XY. The projectors sets and model parameters are evaluated sequentially, increasing the latent variables' space dimension in the internal REW-NPLS iterations f.

Finally, at the UIth update, the covariance tensors XX_UI and XY_UI are computed from the previous XX_UI−1 and XY_UI−1 tensors, and the current block of observations {X_UI, Y_UI}:

{\underline{X X}}_{U I} = μ_{1} {\underline{X X}}_{U I - 1} + {\underline{X}}_{U I} \times_{1} {\underline{X}}_{U I},

{\underline{X Y}}_{U I} = μ_{1} {\underline{X Y}}_{U I - 1} + {\underline{X}}_{U I} \times_{1} {\underline{Y}}_{U I}

μ₁ is a forgetting factor. A set of projectors is evaluated using a rank-one tensor decomposition (Eliseyev et al., 2017) of the current tensor V_UI.

In the REW-NPLS algorithm, only the covariance tensors, the normalization coefficients, and the current model are stored together with the current block of observations collected since the previous update.

2.3. PARAFAC procedure in REW-NPLS

Several tensor decomposition strategies were designed: the Parallel factor analysis (PARAFAC), Tucker, multilinear SVD decomposition, and so forth (Cichocki et al., 2015). Similar to generic NPLS, the REW-NPLS algorithm employs PARAFAC tensor decomposition (Eliseyev et al., 2017). PARAFAC or CANDECOMP/PARAFAC (CP), also known as polyadic decomposition (PD), can be considered the generalization of principal component analysis (PCA) and singular value decomposition (SVD) to the tensor case (Sheikhattar et al., 2015; Sharghian et al., 2019). This method represents a M-order tensor $\underline{V} \in ℝ^{I_{1} \times \dots \times I_{m}}$ as the linear combination of vectors' outer products (rank-one tensors) such as follows:

\underline{V} = \sum_{r = 1}^{R} ρ_{r} w_{r}^{1} \circ w_{r}^{2} \circ \dots \circ w_{r}^{m} + \underline{E},

‖ w_{r}^{i} ‖ = 1, ​​ ​​ r = 1, \dots, R; ​​ ​​ i = 1, \dots, m .

Here, 1 ≤ i ≤ m corresponds to the i^th mode/dimension of the tensor variable, “∘” is the (vector) outer product of the decomposition factors (projectors) $w_{r}^{i} \in ℝ^{I_{i}}$ , R ∈ ℕ is the number of rank-one tensors used for decomposition, ρ_r is the weight associated with each rank-one tensor of the decomposition and $\underline{E} \in ℝ^{I_{1} \times \dots \times I_{m}}$ is the residual tensor (Kolda and Bader, 2009). PARAFAC evaluates the projectors, minimizing the residuals.

Similar to generic NPLS, only one step of PARAFAC (R = 1) is applied to the current tensor decomposition at each internal iteration f_i = 1, …, f of REW-NPLS:

\underline{V} = ρ_{r} w^{1} \circ w^{2} \circ \dots \circ w^{m} + \underline{E},

‖ w^{1} ‖ = 1, ​​ ​​ i = 1, \dots, m .

To solve the optimization problem, the alternating least squares (ALS) algorithm is employed in REW-NPLS. ALS optimizes one projector iteratively at a time and fixes others, reducing, at each iteration of ALS, the optimization problem to a least-squares linear regression (Kolda and Bader, 2009; Cichocki et al., 2015; Pereira Da Silva et al., 2015).

REW-NPLS includes several iterations inside other iterative procedures: ALS iterations for PARAFAC (R = 1) decomposition, internal REW-NPLS iterations increasing latent variables space dimension, and, finally, the update iterations UI.

2.4. L_p-Penalized REW-NPLS (PREW-NPLS)

Sparse input variables projectors may result in a sparse model. Variable excluded from all the projectors $W^{f_{i}}$ , f_i = 1, …, f is excluded from consideration. As for the tensor data, the projection is made according to the mode of analysis (e.g., special, frequency, or temporal). The sparsity of the projectors may allow excluding the slices of data from the model (e.g., exclude non-informative or redundant electrodes). To achieve sparse projectors, the proposed PREW-NPLS algorithm employed a penalized version of the PARAFAC (R = 1), introducing sparsity-promoting penalization. L_p, p = 0, 0.5, 1 penalization, being the classic lasso regularization (L₁) or less conventional L₀ and L_0.5 penalization is studied. This section describes L_p-Penalization in PARAFAC (R = 1) and its integration into the REW-NPLS algorithm to build a new sparsity-promoting adaptive PREW-NPLS algorithm.

2.4.1. L_p-penalized PARAFAC (R = 1)

To simplify the notations without losing generality, a case of a three-order tensor is considered in the section. Given a three-order tensor, one step of PARAFAC (R = 1) applied at each iteration of REW-NPLS solves the optimization problem:

\begin{array}{l} \min_{w^{1}, w^{2}, w^{3}} ‖ \underline{V} - ρ w^{1} \circ w^{2} \circ w^{3} ‖^{2}, & (1) \end{array}

‖ w^{1} ‖ = ‖ w^{2} ‖ = ‖ w^{3} ‖ = 1,

$\underline{V} \in ℝ^{I_{1} \times I_{2} \times I_{3}}$ and $w^{i} \in {ℝ^{*}}^{I_{i}}$ , i = 1, 2, 3. As the norms of the projectors are arbitrary values in (1), the decomposition vectors are evaluated, minimizing $\underline{V} - w^{1} \circ w^{2} \circ w^{3}^{^{2}}$ before being normalized. The optimization problem (1) is solved using the ALS procedure. At each step, ALS fixes two of the three vectors w¹, w², w³ reducing the problem to a linear least-squares optimization, e.g. w², w³ are fixes to approximate w¹ and then w² is evaluated fixing w¹, w³ etc. until convergence (Uschmajew, 2015):

\min_{w^{1}} {‖ {\underline{V}}_{(1)} - w^{1} {(w^{3} \otimes w^{2})}^{T} ‖}^{2},

\min_{w^{2}} {‖ {\underline{V}}_{(2)} - w^{2} {(w^{3} \otimes w^{1})}^{T} ‖}^{2},

\min_{w^{3}} {‖ {\underline{V}}_{(3)} - w^{3} {(w^{2} \otimes w^{1})}^{T} ‖}^{2} .

Here, ${\underline{V}}_{(i)} = (v_{1}^{1} \dots v_{1}^{I_{1}}) \in R^{I_{1} \times I_{2} I_{3}}$ is the tensor $\underline{V}$ unfolded according to i-the direction and ⊗ is the Kronecker product. Taking into account that ${(w^{2} \otimes w^{1})}^{T} \in R^{I_{1} I_{2}}, {(w^{3} \otimes w^{1})}^{T} \in R^{I_{1} I_{3}}$ and ${(w^{3} \otimes w^{2})}^{T} \in R^{I_{2} I_{3}}$ are vectors, the optimization tasks are separated into element-wise optimizations:

\begin{array}{l} \min_{w_{j}^{1}} {‖ v_{1}^{j} - w_{j}^{1} {(w^{3} \otimes w^{2})}^{T} ‖}^{2} j = 1, \dots, I_{1}, & (2) \end{array}

\begin{array}{l} \min_{w_{j}^{2}} {‖ v_{2}^{j} - w_{j}^{2} {(w^{3} \otimes w^{1})}^{T} ‖}^{2} j = 1, \dots, I_{2}, & (3) \end{array}

\begin{array}{l} \min_{w_{j}^{3}} {‖ v_{3}^{j} - w_{j}^{3} {(w^{2} \otimes w^{1})}^{T} ‖}^{2} j = 1, \dots, I_{3} . & (4) \end{array}

Here, $w^{1} = {(w_{1}^{1}, \dots, w_{I_{1}}^{1})}^{T} \in {ℝ^{*}}^{I_{1}}$ , $w^{2} = {(w_{1}^{2}, \dots, w_{I_{2}}^{2})}^{T} \in {ℝ^{*}}^{I_{2}}$ , and $w^{3} = {(w_{1}^{3}, \dots, w_{I_{3}}^{3})}^{T} \in {ℝ^{*}}^{I_{3}}$ . The least square (LS) solutions of (2)–(4) are:

\begin{array}{l} {(w_{j}^{1})}_{L S} = \frac{v_{1}^{j} (w^{3} \otimes w^{2})}{{‖ w^{3} \otimes w^{2} ‖}^{2}}, j = 1, \dots, I_{1}, & (5) \end{array}

\begin{array}{l} {(w_{j}^{2})}_{L S} = \frac{v_{2}^{j} (w^{3} \otimes w^{1})}{{‖ w^{3} \otimes w^{1} ‖}^{2}}, j = 1, \dots, I_{2}, & (6) \end{array}

\begin{array}{l} {(w_{j}^{3})}_{L S} = \frac{v_{3}^{j} (w^{2} \otimes w^{1})}{{‖ w^{2} \otimes w^{1} ‖}^{2}}, j = 1, \dots, I_{3} . & (7) \end{array}

In this study, sparse promoting penalization using L_p, p = 0, 0.5, 1 norm/pseudo norm is proposed to be integrated into the cost function of PARAFAC (R = 1) to provide a slice-wise sparsity to the solution. The optimization task (1) is replaced by:

\begin{array}{l} {‖ \underline{V} - \hat{\underline{V}} ‖}^{2} + P (w^{1}, w^{2}, w^{3}) \to m i n, & (8) \end{array}

\underline{\hat{V}} = ρ w^{1} \circ w^{2} \circ w^{3},

P (w^{1}, w^{2}, w^{3}) = λ_{1} {‖ w^{1} ‖}_{q, L_{1}} + λ_{2} {‖ w^{2} ‖}_{q, L_{2}} + λ_{3} {‖ w^{3} ‖}_{q, L_{3}},

‖ w^{1} ‖ = ‖ w^{2} ‖ = ‖ w^{3} ‖ = 1.

Here, ${‖ w^{i} ‖}_{p, ℒ_{i}}$ for p = 0 , 0.5, 1 and i = 1, 2, 3 is denoted as

{‖ w^{i} ‖}_{0, L_{i}} = \sum_{k \in L_{i}} (1 - δ_{0, w_{k}^{i}}),

{‖ w^{i} ‖}_{1, L_{i}} = \sum_{k \in L_{i}} | w_{k}^{i} |,

and

{‖ w^{i} ‖}_{\frac{1}{2}, L_{i}} = \sum_{k \in L_{i}} \sqrt{| w_{k}^{i} |} .

0 < λ_i ≤ 1 are regularization coefficients, the Kronecker delta $δ_{0, w_{k}^{i}} = 1$ if $w_{k}^{i} = 0$ , and $δ_{0, w_{k}^{i}} = 0$ otherwise. Notably, only a part of the indices defined by a set $ℒ_{i} \subset {1, 2, \dots, I_{i}}$ , i = 1, 2, 3 are penalized, resulting in selective penalization (Lutay and Khusainov, 2021). A set $ℒ_{i}$ may vary depending on the REW-NPLS iteration. Selective penalization is introduced for the integration of L_p. -Penalized PARAFAC (R = 1) into the REW-NPLS algorithm.

Similar sparse promoting penalization in PARAFAC (R = 1) was considered in Eliseyev et al. (2012). However, this study was limited to L₁-norm penalization. Integrated with the conventional non-adaptive NPLS algorithm, the optimization problem was solved using time consuming numerical optimization, which was then applied offline. In the current manuscript, a more general case of L_p, p = 0, 0.5, 1 norm/pseudo norm penalization is considered. In addition, selective penalization is applied for efficient integration of penalized PARAFAC into the REW-NPLS algorithm. Finally, an efficient optimization procedure compatible with real-time online applications is proposed.

The same ALS strategy is applied to complete the optimization task (8). ALS fixed all projectors at each step except one, leading to three successive optimization tasks. The least square (LS) sotion $w_{L S}^{i}$ of the non-regularized problem is used as an initial approximation. Notably, unlike in non-regularized optimization, due to the penalization terms, the norms of the projectors are not arbitrary values anymore. Therefore, the normalization of a current estimate is added to ALS optimization iterations:

\begin{array}{l} \min_{{\tilde{w}}^{1}} ({‖ {\underline{V}}_{(1)} - {\tilde{w}}^{1} {(w^{3} \otimes w^{2})}^{T} ‖}^{2} + λ_{1} {‖ {\tilde{w}}^{1} ‖}_{q, L_{1}}), w^{1} = {\tilde{w}}^{1} / ‖ {\tilde{w}}^{1} ‖, & (9) \end{array}

\begin{array}{l} min_{{\tilde{w}}^{2}} ({‖ {\underline{V}}_{(2)} - {\tilde{w}}^{2} {(w^{3} \otimes w^{1})}^{T} ‖}^{2} ​ + ​ λ_{2} {‖ {\tilde{w}}^{2} ‖}_{q, L_{2}}), ​ w^{2} ​ = ​ {\tilde{w}}^{2} / ‖ {\tilde{w}}^{2} ‖, & (10) \end{array}

\begin{array}{l} min_{{\tilde{w}}^{3}} ({‖ {\underline{V}}_{(3)} - {\tilde{w}}^{3} {(w^{2} \otimes w^{1})}^{T} ‖}^{2} ​ + ​ λ_{3} {‖ {\tilde{w}}^{3} ‖}_{q, L_{3}}), ​ w^{3} ​ = ​ {\tilde{w}}^{3} / ‖ {\tilde{w}}^{3} ‖ . & (11) \end{array}

It should be noted that all considered regularization functions are decomposed as a sum of element-wise functions. Consequently, similarly to (3)–(5) optimization tasks, (9)–(11) are split into element-wise optimizations:

\begin{array}{l} \underset{w_{j}^{1}}{min ​​ ​​} ({‖ v_{1}^{j} - w_{j}^{1} {(w^{3} \otimes w^{2})}^{T} ‖}^{2} + λ_{1} \times g_{p} (w_{j}^{1})), j = 1, ​​ \dots, I_{1}, & (12) \end{array}

\begin{array}{l} \underset{w_{j}^{2}}{min ​​ ​​} ({‖ v_{2}^{j} ​ - w_{j}^{2} {(w^{3} \otimes w^{1})}^{T} ‖}^{2} ​ + ​ λ_{2} \times g_{p} (w_{j}^{2}) ​), j = 1, ​ \dots, ​ I_{2}, & (13) \end{array}

\begin{array}{l} \underset{w_{j}^{3}}{min ​​ ​​} ({‖ v_{3}^{j} - w_{j}^{3} {(w^{2} \otimes w^{1})}^{T} ‖}^{2} ​ + λ_{3} \times g_{p} (w_{j}^{3})), j = 1, ​ \dots, ​ I_{3}, & (14) \end{array}

\begin{array}{l} g_{p} (w_{j}^{i}) = {\begin{matrix} \begin{matrix} \begin{matrix} \begin{matrix} 1 - δ_{0, w_{j}^{i}}, & if p = 0 and w_{j}^{i} \in ℒ_{i} \end{matrix} \\ \begin{matrix} | w_{j}^{i} |, & if p = 1 and w_{j}^{i} \in ℒ_{i} \end{matrix} \end{matrix} \\ \begin{matrix} \sqrt{| w_{j}^{i} |}, & if p = 1 / 2 and w_{j}^{i} \in ℒ_{i} \end{matrix} \end{matrix} \\ \begin{matrix} 0, & otherwise \end{matrix} \end{matrix} . & (15) \end{array}

The particular cases of L₀, L_0.5, L₁ penalization are given below. Details are presented in the Appendix.

L₀ -penalization. In the case of L₀-penalization, the penalization term reflects the number of non-zero coefficients. Considering one of the optimization iterations of ALS, e.g. (12), the solution is an element-wise hard thresholding of the least square solution ${(w_{j}^{1})}_{L S}, j = 1, \dots, I_{1}$ (see Appendix):

{(w_{j}^{1})}_{L_{0}} = {\begin{cases} 0 if j \in L_{1} and ​​ ​​ {(w_{j}^{1})}_{L S} \leq T h r e s h o l d_{L_{0}} \\ {(w_{j}^{1})}_{L S} otherwise \end{cases},

where

T h r e s h o l d_{L_{0}} = \frac{\sqrt{λ_{1}}}{| w^{3} \otimes w^{2}} | .

L_0.5-penalization. In the case of L_0.5 penalization and considering one of the optimization steps of ALS, e.g., (12), the cost function CostF_{L_0.5} to minimize takes the following form:

\begin{array}{l} C o s t F_{L_{0.5}} (w_{j}^{1}) = {‖ v_{1}^{j} - w_{j}^{1} {(w^{3} \otimes w^{2})}^{T} ‖}^{2} + λ_{1} \sqrt{| w_{j}^{1} |} & (16) \end{array}

or, equivalently,

\begin{array}{l} C o s t F_{L_{0.5}} (w_{j}^{1}) = {‖ w^{3} \otimes w^{2} ‖}^{2} {({(w_{j}^{1})}_{L S} - w_{j}^{1})}^{2} + λ_{1} \sqrt{| w_{j}^{1} |}, & (17) \end{array}

with solution,

{(w_{j}^{1})}_{L_{0.5}} = {\begin{cases} 0, i f j \in L_{1} a n d {(w_{j}^{1})}_{L S} \leq T h r e s h o l d_{L_{0.5}} \\ ​ a r g m i n (C o s t F_{L_{0.5}} (0), C o s t F_{L_{0.5}} (ℬ \cdot {(w_{j}^{1})}_{L S})), \\ i f j \in L_{1} a n d {(w_{j}^{1})}_{L S} > T h r e s h o l d_{L_{0.5}} \\ {(w_{j}^{1})}_{L S}, o t h e r w i s e, \end{cases}

where

T h r e s h o l d_{L_{0.5}} = \frac{3}{4} {(\frac{λ_{1}}{‖ w^{3} \otimes w^{2} ‖^{2}})}^{^{2} ​​ ╱ ​ ​_{3}},

and $B$ is the biggest root of the cubic equation,

x {(1 - x)}^{2} = C,

C = \frac{λ_{1}^{2}}{‖ 16 w^{3} \otimes w^{2} ‖^{4} {({(w_{j}^{1})}_{L S})}^{3}},

in the interval [0; 1] (see Appendix).

L₁ -penalization. Finally, in the case of L₁ penalization, considering one optimization step of ALS optimization, e.g., (12), the solution is an element-wise soft-thresholding of the least square solution ${(w_{j}^{1})}_{L S} j = 1, \dots, I_{1}$ (see Appendix):

{(w_{j}^{1})}_{L_{1}} = {\begin{cases} 0, i f j \in L_{1} a n d {(w_{j}^{1})}_{L S} \leq T h r e s h o l d_{L_{1}} \\ s i g n ({(w_{j}^{1})}_{L S}) (| {(w_{j}^{1})}_{L S} | - T h r e s h o l d_{L_{1}}), \\ i f j \in L_{1} a n d {(w_{j}^{1})}_{L S} > T h r e s h o l d_{L_{1}} \\ {(w_{j}^{1})}_{L S} o t h e r w i s e \end{cases}

where

T h r e s h o l d L_{1} = \frac{λ_{1}}{{‖ w^{3} \otimes w^{2} ‖}^{2}} .

2.4.2. Integration to PREW-NPLS

A Penalized PARAFAC (R = 1) is used in the REW-NPLS algorithm to extract a set of projectors ${w_{f}^{1} \in ℝ^{I_{1}}, w_{f}^{2} \in ℝ^{I_{2}}, \dots, w_{f}^{m} \in ℝ^{I_{m}}}_{f = 1}^{F}$ . For f = 1, all projector elements can be potentially penalized: $ℒ_{j, 1} = {1, 2, \dots, I_{j}}$ , j = 1, …m. After the first set of projectors' extraction, non-zero elements of the projectors correspond to tensor slices already included in the decoding model with non-zero coefficients. For the next iterations, corresponding indexes were removed from a set of indexes to be penalized, resulting in a sequence $ℒ_{j, 2} \subset ℒ_{j, 1}$ . A scheme representing the PREW-NPLS algorithm in the case of penalizing one data tensor direction is shown in Figure 1.

FIGURE 1

Figure 1. Penalized REW-NPLS (PREW-NPLS) algorithm. For the model update iteration UI, an updated training data set is collected. The update loop includes the recursive validation procedure for the evaluation of optimal latent variable space dimension f (number of factors) and penalized NPLS loop, which includes the sequential tensors XY and XX decomposition. Each step of decomposition uses Rank 1 PARAFAC, which employs penalized alternating least square procedure. Selective sparsity-promoting penalization with penalization parameter λ is applied, e.g., to the spatial dimension. Part of the weights is set to zero due to penalization (shown in red). This results in zero slices in the decoding model (shown in red). A set of updated models for all parameters f are stored up to the maximum f value, 100 in the current setting.

2.5. Experiments

This study relies on the neural signal dataset recorded during the online closed-loop BCI clinical experiments. The ≪ BCI and Tetraplegia ≫ clinical trial (NCT02550522, ClinicalTrials.gov) (University Hospital, Grenoble, 2015) was approved by French authorities: National Agency for the Safety of Medicines and Health Products (Agence nationale de sécurité du médicament et des produits de santé, ANSM) with the registration number 2015-A00650-49 and the Ethic Committee for the Protection of Individuals (Comité de Protection des Personnes, CPP) with the registration number 15-CHUG-19. The research activities were carried out in accordance with the guidelines and regulations of the ANSM and the CPP. The patient provided informed consent for the clinical trial and publication as well as to publish the information/image(s) in an online open-access publication. Details of the clinical trial protocol are available in Benabid et al. (2019).

The participant was a 29-year-old right-handed male with traumatic sensorimotor tetraplegia caused by a complete C4–C5 spinal cord injury two years prior to the study. He underwent bilateral implantation of two chronic wireless WIMAGINE implants (Benabid et al., 2019) for ECoG signal recording on 21 June 2017. Two WIMAGINE recording systems were surgically implanted into the skull near the sensory-motor cortex (SMC) through a 25-mm radial craniotomy. Before the surgery, the patients' SMC was clearly localized using functional imaging. Details are provided in Benabid et al. (2019). The WIMAGINE device is made up of an active implantable medical device composed of 64-plane platinum-iridium 90/10 electrodes with a 2.3 mm diameter and a 4–4.5 mm inter-electrode distance (Sauter-Starace et al., 2019). The recorded signals were low- and high-pass filtered, with a bandwidth range of 0.5Hz to 300Hz, using analog low-pass filters as well as a digital low-pass FIR filter embedded into the implant hardware. The digitized ECoG data from 32 electrodes from each implant (Figure 2) were radio transmitted to a custom-designed base station at a 586 Hz sampling rate.

FIGURE 2

Figure 2. Experimental design. (A) The patient underwent bilateral implantation of two chronic wireless WIMAGINE implants. The recording systems were epidurally implanted within a 25 mm radial craniotomy in front of the sensory-motor cortex. The WIMAGINE implant is composed of 64 plane electrodes with a 2.3 mm diameter and a 4–4.5 mm inter-electrode distance. The digitized ECoG data from 32 electrodes from each implant (shown in red) are radio transmitted to a base station at the 586 Hz sampling rate. Reference electrodes are shown in green. (B) The database was recorded during the online closed-loop BCI experiments of alternative upper limbs BCI control. The virtual avatar was used as visual feedback. (C) The data sets were recorded in the period of 468 to 666 days after implantation and includes 43 sessions in general. The first six experimental sessions were used for incremental real-time decoder training. Separate decoders were identified to control different tasks: alternative 3D reaching tasks, and wrists rotation. They were mixed together using the Markov Mixture of expert approach (Moly et al., 2022). The decoders were tested without re-calibration with 37 experimental sessions distributed over 5–203 days after the last model update. Only 3D reaching tasks of both hands are included in the current study. The 3D reaching tasks are composed of a series of trials in which the sequentially proposed targets must be reached (pursuit tasks). The cursor position is not reset during the task, between tasks, and during the idle state.

Since the implantation, the patient had been trained to control multiple real and virtual effectors, such as games that were specially created for BCI training, a wheelchair, an exoskeleton, an avatar, and so on, using a custom-made BCI platform (Benabid et al., 2019). The database used in the study was recorded during the online closed-loop BCI experiments of upper limbs BCI control. The BCI sessions included alternative active states (AS) of the 3D reaching task for each hand, 1D wrist rotation of each hand, and the idle state (IS) (Benabid et al., 2019). The patient aimed to reach the proposed targets or rotate the wrist to specific angles following pursuit tasks. A pursuit task session was composed of successive tasks, e.g., a left-hand 3D reaching task, a right-hand 3D reaching task, IS, and so on. Each task is composed of several trials in which the cursor must reach the proposed targets. The cursor position is not reset between tasks, during the task, or during the idle state. Twenty-two targets were symmetrically distributed in two cubes in front of the patient. The virtual avatar was used as visual feedback (Benabid et al., 2019). In this study, only the left- and right-hand 3D translation trials were used for algorithm evaluation. The experimental paradigm is illustrated in Figure 2.

Data were recorded in the period of 468–666 days after implantation and included 43 sessions. The average session duration was 29 ± 8 min. During the experiments, the first six experimental sessions were used for incremental real-time decoder updates. REW NPLS decoders for each active state task were integrated into the BCI clinical trial platform. The decoders were initialized with a zero. The total training time of the decoding models was 3 h and 37 min, including a total of 189 and 194 trials of the left- and right-hand translation, respectively. The decoders were calibrated during experimental sessions in late September 2018 and were tested without re-calibration in experiments from early October 2018 to mid-March 2019, with 37 experimental sessions distributed over 5–203 days after the last model update session (468–666 days after implantation; Figure 2).

2.6. Application, feature extraction

During the experimental sessions, neural signal epochs of 1s for 64 recording channels $X_{t} \in ℝ^{586 x 64}$ with sliding steps of 100 ms used for feature extraction. The epochs were mapped into the feature space using a complex continuous wavelet transform (CCWT) (Morlet) with a frequency range of 10–150 Hz and a 10-Hz step. CCWT is a common feature extraction strategy that is widely used in the field of BCIs (Chao et al., 2010; Shimoda et al., 2012; Schaeffer and Aksenova, 2016; Eliseyev et al., 2017; Choi et al., 2018). The absolute value of CCWT was decimated by averaging along the temporal modality to obtain a 10-point description of 1s time epoch for each frequency band and for each channel, resulting in the temporal-frequency-spatial neural feature tensor ${\underline{X}}_{t} \in ℝ^{10 x 15 x 64}$ . The observations of neural features X_t and the movement features y_t recorded during the decoder update/calibration sessions were used for the decoding model identification. The movement features (optimal prediction) at time t is defined as the 3D Cartesian vector between the current effector position and the target position (Benabid et al., 2019). The control command ŷ_t for 3D hand translation is defined as the cartesian increments of the current position and is sent to the effector at 10 Hz.

2.7. Comparison study

To compare the proposed PREW-NPLS and the generic, non-penalized REW-NPLS algorithms, the decoding of the 3D reaching task for each hand is considered. The comparison study was restricted to penalization according to spatial modality. Three types of penalization L_p, p = 0, 0.5, 1 were tested. For L_p, p = 0.5, 1 penalties, a set of models were evaluated with the penalization parameter λ going from 0 to 0.5 with 0.02 steps. In the case of the L₀-PREW-NPLS, preliminary results highlighted that the studied λ range was not relevant. Therefore, the models with the penalization parameter λ going from 0 to 0.05 with 0.002 steps were estimated. The cases with λ= 0 correspond to the generic, non-penalized REW-NPLS algorithm.

The performance of algorithms was evaluated offline in a pseudo-online manner. Pseudo-online simulation uses the procedure and parameters of online stream data processing. The dataset was recorded during the online closed-loop BCI experiments using the REW-NPLS decoder previously integrated into the BCI system. As the recorded data integrate the neuronal feedback into real-time decoding, the presented offline comparison study is not fully generalizable to the online case. Nevertheless, it allows the characterization of, to some extent, the studied algorithms.

To be as close as possible to the settings of the online experiments, in the comparison study, the penalized models were calibrated on the same experiments that were used for decoder training during the online closed-loop experiments.

The predicted trajectories performed during the online closed-loop experiments are related to the decoding model used during the experiments. Consequently, sample-based indicators were used to compare the predictions of the tested algorithms in the offline study. The cosine similarity indicator was based on the comparison between the predicted directions ŷ_t and the optimal direction y_t was employed:

C o s S i m_{t} = \frac{y_{t} \cdot {\hat{y}}_{t}}{‖ y_{t} ‖ ‖ {\hat{y}}_{t} ‖} .

Here, “·” defined the dot product y_t and ŷ_t, and they are the optimal and predicted output. The optimal 0 output is defined as the 3D Cartesian vector between the current position and the target. Notably, CosSim_t[−1, 1] represent how direct the movement is to the final destination. A mean cosine similarity of 1 corresponds to a direct and short trajectory. CosSim performance criterion is evaluated as the median, 25th (Q₁) and 75th (Q₃) percentiles of the CosSim_t over samples and is notated as CosSim = median (Q₁−Q₃).

The PREW-NPLS algorithms converge to sparse solutions, fixing non-relevant model coefficients to exactly zero. The decoding performance, therefore, is not the only relevant indicator. Considering a penalized model with the penalization restricted to the ith dimension, the model Sparsity index for i^th dimension is introduced as the percentage of slices according to ith dimension of the tensor of model coefficients fixed fully to zero. For example, if the model is penalized according to the spatial dimension, minimizing the number of electrodes involved, Sparsity index for the spatial dimension corresponds to the percentage of electrodes fully excluded from the decoder (zero model coefficient for these electrodes for all frequencies and all time delays).

The significance of the differences in the cosine similarity between the REW-NPLS and PREW-NPLS algorithms was computed for the left and right-hand 3D translation studies and for each penalization parameter λ. The statistical analysis was performed with the non-parametric paired Wilcoxon signed rank test with (α_{multi−class} = 0.00161) and without (α = 0.05) the Bonferroni correction.

3. Results

The cosine similarity performance and the Sparsity of models with L_p, p = 0, 0.5, 1 penalization according to spatial modality for the left and right-hand 3D movement tasks are presented depending on the penalization coefficient λ in Figure 3. The generic REW-NPLS (λ = 0) performance is presented in the first position of each sub-figure.

FIGURE 3

Figure 3. The cosine similarity and the model sparsity for the L_p penalized REW-NPLS algorithm for 3D reaching tasks decoding. The cosine similarity and the model sparsity for the L₀-PREW-NPLS (A), L_0.5-PREW-NPLS (B), and L₁-PREW-NPLS (C) estimated for the left-hand 3D reaching task control. The cosine similarity and the model sparsity for the L₀-PREW-NPLS (D), L_0.5-PREW-NPLS (E), and L₁-PREW-NPLS (F) estimated for the right hand 3D reaching task control. The cosine similarly is summarized using a box plot where the red line is the median the blue lines indicate the 25th and 75th percentiles (Q₁ and Q₃). Additionally, the whiskers show the upper and lower extreme cosine similarity obtained for the data set. The generic REW-NPLS algorithm results are presented in the first box plot of each sub-plot corresponding to λ = 0.

The generic REW-NPLS (λ = 0) performance highlighted a CosSim = 0.223 (0.158 − 0.266) (median = 0.223, Q₁ = 0.158, Q₃ = 0.266), for the left hand, and CosSim = 0.127 (0.047 − 0.155) (median = 0.127, Q₁ = 0.047, Q₃ = 0.155) for the right-hand 3D translation decoding.

The penalized decoders showed relevant performance for different penalization parameters λ. Sparsity indexes generally increase with higher penalization coefficients. Decoding performance for all penalty types and penalization parameters were highly variable between sessions (high inter-session variability). Decoding of the right-hand translation stresses worse cosine similarity than decoding of the left-hand, but PREW-NPLS allowed for better decoding performance than generic REW-NPLS.

L_0 PREW-NPLS algorithm demonstrated improved performance compared to the generic REW-NPLS for different parameters λ in both hand translation and decoding tasks (Figures 3A, D). For a minor penalization, λ = 0.01, low sparsity was observed: Sparsity = 0% for the left hand, and Sparsity = 4.68% corresponding to only three electrodes removed for the right hand. Additionally, the performance reached CosSim = 0.252 (0.165 − 0.296) for the left hand and CosSim = 0.157 (0.1018 − 0.203) for the right hand translation, resulting in a median movement of the cosine similarity of 13 and 24%, respectively.

Performance improvement was demonstrated for sparser solutions using different penalization parameters. For a moderate penalty, λ = 0.026, cosine similarity CosSim = 0.248 (0.173 − 0.288) was achieved with a Sparsity = 56.25% for the left hand. For the right hand, with λ = 0.018, cosine similarity CosSim = 0.157 (0.0989 − 0.185) was achieved with Sparsity = 37.5%. A sparser solution with Sparsity = 45.31% was obtained for the right hand for λ = 0.024, with CosSim = 0.153 (0.0786 − 0.198).

For a strong penalty (0.04 ≤ λ ≤ 0.046), L_0 PREW-NPLS demonstrated for the left hand a similar decoding performance CosSim = 0.248 (0.162 − 0.294) with 40 electrode parameters set to the 0 value, Sparsity = 62.5%. For the right hand, L_0 PREW-NPLS models converged to a highly sparse solution with λ> 0.046 with 48 electrodes over 64 removed (Sparsity = 75%), which highlighted decoding performance similar to the generic REW-NPLS [CosSim = 0.128 (0.058 − 0.168)].

Similarly, the L_0.5 PREW-NPLS decoder demonstrated better decoding performance with sparser solutions for several penalization parameters compared to the generic REW-NPLS decoder (Figures 3B, E).

In particular, for a moderate penalty, λ = 0.22, for the left hand, 18 electrode parameter weights were set to zero (Sparsity = 28.13%), achieving higher performance CosSim = 0.253 (0.189 − 0.301) compared to the generic REW-NPLS model. Similarly, for the right hand, for penalization λ = 0.16, a sparse model (Sparsity = 54.69%) was identified, increasing the decoding performance CosSim = 0.150 (0.0881 − 0.176). Sparsest models were obtained for λ = 0.26, 0.28 with Sparsity = 79.69% and Sparsity = 78.13%, respectively.

With higher penalization, for the left hand, the performance CosSim = 0.245 (0.156 − 0.2838) was achieved with Sparsity = 35.94% for λ = 0.3. For λ ≥ 0.32, the sparsity converged to Sparsity = 28.13% with decoding performance similar to the generic REW-NPLS model: CosSim = 0.217 (0.143 − 0.261). For the right hand, for λ ≥ 0.36, the models converged to the same solutions with Sparsity = 68.75% (44 electrodes removed) and a cosine similarity CosSim = 0.131 (0.0835 − 0.186) slightly superior compared to the generic REW-NPLS decoder.

L_1 PREW-NPLS decoder demonstrated similar results (Figures 3C, F). For low penalty, λ = 0.12, Sparsity = 0%, the L_1 PREW-NPLS model highlighted CosSim = 0.253 (0.151 − 0.286) for the left hand. For the right hand, the models with small penalization parameters λ = 0.04, 0.06 and 0.1 with zero sparsity (Sparsity = 0 %) highlighted a cosine similarity CosSim = 0.154 (0.0915 − 0.202), CosSim = 0.158 (0.0791 − 0.184) and CosSim = 0.164 (0.0959 − 0.191) representing an improvements of 21, 24, and 29%, respectively.

For higher penalization parameters, Sparsity = 29.69% was reached for λ = 0.20 with a decoding performance of CosSim = 0.249(0.162 − 0.295) for the left hand. Similarly, for the right hand, the decoding performance CosSim = 0.154 (0.101 − 0.192) and CosSim = 0.152 (0.0872 − 0.197) with 33 (51.56%) and 44 (68.75%) electrode parameters weights set to zero was achieved for λ = 0.22, 0.26, respectively.

Finally, for the strong penalization, λ ≥ 0.34, 41 electrode parameter weights were set to zero leading to CosSim = 0.245 (0.173 − 0.283) for the left-hand. For penalization parameters λ ≥ 0.38, the models stabilized to a solution with Sparsity = 68.75% with a decoding performance CosSim = 0.131 (0.0835 − 0.186) for the right hand.

The REW-NPLS and the L_p PREW-NPLS model parameter weights were shown in the temporal, frequency, and spatial domains in Supplementary Figures 1, 2 for the left- and right-hand translation models, respectively. The presented models were the ones with the maximal penalization coefficient λ=0.06,0.4 and 0.4 for the L_0-, L_0.5,- and L_1 PREW-NPLS algorithms, respectively. The spatial parameter weights are presented in Supplementary Figures 3, 4 on a map with the electrode locations relative to the sensory (SS) and motor (MS) sulci.

The results of the statistical analysis performed with the non-parametric paired Wilcoxon signed rank test with and without the Bonferroni correction are presented in Supplementary Tables 1, 2 for the left- and right-hand translation and decoding for each of the PREW-NPLS algorithms. Numerous models highlighted the statistically significant performance improvement of the proposed PREW-NPLS algorithms compared to the generic REW-NPLS for all the penalization types.

4. Discussion

Among the various potential applications, the functional compensation/restoration of individuals suffering from severe motor disabilities has always been a focus of BCI research. The primary challenge of motor BCIs is the high-dimensional control of complex effectors. To achieve this objective, high-resolution neuronal activity recording is generally required, which results in the high-dimensional data flow being processed in real time with a high decision rate of at least (8–10 Hz). Moreover, closed-loop decoder training is reported to be more effective (Jarosiewicz et al., 2013) compared to decoders trained classically using training data recorded in an open-loop setting. CLDA BCIs use incremental learning/adaptive algorithms capable of updating a decoder in real time during BCI sessions, increasing the computational load. In addition to high computing time and power requirements, high-dimensional feature space may lead to numerous issues, complicating decoder training. Efficient feature selection decreases the feature space dimension and may allow for improving the generalization ability of the decoder by excluding irrelevant, redundant, or noisy variables. As neuronal activity features are naturally grouped (e.g., by the sensors and the frequency bands), group-wise feature selection may be beneficial, which can help fully exclude non-informative or redundant electrodes or/and frequency bands from consideration. A group-wise feature selection procedure embedded in adaptive/incremental learning of the decoder is proposed in the study. It was based on sparsity-promoting penalization of the data tensor decomposition integration to the REW-NPLS algorithm. L_p, p = 0, 0.5, 1 norm/pseudo-norm penalties were studied. Sparsity-promoting penalties were applied to feature groups corresponding to slices of the data tensor related to the mode of analysis (e.g., spatial).

The proposed algorithms were tested with a dataset of left/right arms 3D translations of a virtual avatar controlled by the tetraplegic subject. The studied models were trained on the same data (the first six sessions), which were used for decoder training during the online experiments. The number of training sessions was relatively small (14%) and focused at the beginning of the experiments. This may partially explain the high inter-session variability of the decoding performance for all the algorithms (depicted in Supplementary Figure 5). However, L_p-PREW-NPLS algorithms highlighted equal or better decoding performance compared to the generic REW-NPLS decoder with sparse solutions, with up to two-thirds of the electrode parameter weights set to zero for both tasks of the left and the right-hand translation. Computational load measurements demonstrated the expected decrease (close to linear) in memory consumption and computational time during the model execution stage, depending on the model sparsity for 3D hand translation. 25, 50, and 75% of sparsity in the spatial dimension results in an average of 25, 50, and 75% decreases in memory consumption and a 23, 45, and 67% decrease in computational time.

The PREW-NPLS decoders highlighted equal or better decoding performance compared to the generic REW-NPLS algorithm for the majority of the penalization parameters. For example, for the right-hand decoding, the L₁-PREW-NPLS decoder with a moderate penalization parameter λ = 0.26, highlighted a significant (p−value = 10⁻⁶) cosine similarity improvement of 21, 116, and [[Mathtype-mtef1-eqn-317.mtf]] for the median and the 25th and 75th percentiles, respectively, with fewer than half of the electrodes, maintained at a non-zero value (33 electrodes set to zero). For the small penalization parameters (λ = 0.1), the model converged to a non-sparse solution with a significant cosine similarity improvement (p−value = 10⁻⁶), leading to a median, 25th and 75th percentile enhancement of 24, 104, and 23%, respectively. The sparsest solution with the L₁-PREW-NPLS algorithm removed 75% of the electrodes without decreasing the cosine similarity indicator and reducing the features space from 10 × 15 × 64 = 9, 600 features to 10 × 15 × 16 = 2, 400 features.

Decoding performance improvements were more evident for the 3D right-hand translation models than the left-hand 3D translation. As the 3D right-hand translation models demonstrated lower decoding performance than left-hand translation models, a sparse decoder is more effective for a less performant neural signal decoder in this study.

An increase in the training dataset size may potentially improve decoder performance and limit overtraining. Supplementary Figure 6 shows the cosine similarity and model sparsity using L₁-PREW-NPLS with larger training datasets: 30 and 50% of the whole data (13 and 22 sessions, respectively). It results in fewer sparse models and an improvement in the decoder's performance (Supplementary Figure 6). However, in BCI research, the recording of data is highly expensive, limiting access to big datasets. The decoder training method with shorter recordings is highly profitable for real-life BCI applications.

The relative efficiency of penalization types is unclear in the current study. The maximum improvement in performance was achieved using L₁ penalization, but the results are comparable to other penalization types. Application L_0.5 penalty is computationally heavier compared to L₀ or L₁ penalties without evident advantages. A complimentary comparative study may allow clarification of the relative efficiency of the proposed penalizations.

L₀-PREW-NPLS and L₁-PREW-NPLS algorithms are sufficiently computationally effective to be applied for the online closed-loop decoder adaptation. To reproduce the online experimental conditions, a pseudo-online simulation was conducted using the same parameters (buffer size, batch training, and so on) as used in real time to ensure the compatibility of the proposed algorithm with the real-time application.

Finally, reducing the feature space dimension may lead to more interpretable models. Several activation patterns were discernible. In the spatial domains, the electrodes close to the motor and sensory sulci exhibit important parameter weights.

The current study is limited to offline simulations using datasets recorded from previous online closed-loop experiments with the REW-NPLS decoder integrated into the BCI system. As the neuronal signal integrates the neuronal feedback to the decoder, which was applied in real time, the offline comparison study is not fully generalizable. However, conducting real experiments to compare all algorithms is too costly. In addition, as the neuronal feedback is related to the real-time decoder output, the comparison results are likely to be biased in favor of the decoder applied during real-time experiments rather than the newly tested algorithms.

The study presented in the study is limited to penalization in the spatial domain, which is crucial for decreasing data transfer. In order to evaluate the influence on the decoding performance of penalization applied to other modalities, further research needs to be conducted to include decoders that promote sparsity in the frequency and temporal domains.

This study reports a case study. The results are obtained from the dataset, which was based on a single-subject clinical trial. Further tests with more subjects are required to support the conclusions. The authors plan to contribute to conducting clinical trials to assess the proposed algorithm in real-time experiments.

Data availability statement

The data analyzed in this study is subject to the following licenses/restrictions. Data used for this study are from recordings from the clinical trial (ClinicalTrials.gov, NCT02550522) and are not publicly available. Part of the dataset may be provided upon reasonable request. Requests to access these datasets should be directed at: tetiana.aksenova@cea.fr.

Ethics statement

The studies involving human participants were reviewed and approved by National Agency for the Safety of Medicines and Health Products (Agence nationale de sécurité du médicament et des produits de santé: ANSM), registration number 2015-A00650-49, and the Committee for the Protection of Individuals (Comité de Protection des Personnes—CPP), registration number 15-CHUG-19. The patients/participants provided their written informed consent to participate in this study.

Author contributions

AM, AA, and TA: designed the algorithm. AM and AA: helped with using software. AM and FM: analyzed the data. AM, AA, FM, and TA: wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

CLINATEC^® is a Laboratory of CEA-Grenoble and has statutory links with the University Hospital of Grenoble (CHUGA) and with the University Grenoble Alpes (UGA). This study was funded by CEA (recurrent funding) and the French Ministry of Health (Grant PHRC-15-15-0124), Institut Carnot, and Fonds de Dotation CLINATEC^®. The Foundation Philanthropique Edmond J. Safra is a major funding institution for the CLINATEC^® Edmond J. Safra Biomedical Research Center.

Acknowledgments

We thank Thomas Costecalde, Serpil Karakas, Olivier Faivre, Alexandre Verney, Benoit Milville, Guillaume Charvet, Stéphan Chabardes, Alim-Louis Benabid, and all the team members working at CLINATEC^® for the help in the experimental setup and clinical trial experiments.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2023.1075666/full#supplementary-material

Supplementary Table 1. Significance of the performance (CosSimmedian) differences between the REW-NPLS and the L_0, L_0.5, and L_1 PREW-NPLS algorithms for the left hand 3D translation study using the non-parametric paired Wilcoxon signed rank test with and without the Bonferroni correction (α = 0.05 and α_Bonferroni = 0.00161).

Supplementary Table 2. Significance of the performance (CosSimmedian) differences between the REW-NPLS and the L_0, L_0.5, and L_1 PREW-NPLS algorithms for the right hand 3D translation study using the non-parametric paired Wilcoxon signed rank test with and without the Bonferroni correction (α = 0.05 and α_Bonferroni = 0.00161).

Supplementary Figure 1. Parameter weights of the generic REW-NPLS and L_p, p = 0, 0.5, 1 PREW-NPLS for the left-hand 3D translation. Absolute values of model coefficients are projected into spatial, frequency, or temporal domains. The parameter weights related to the three controlled axes (y₁, y₂, and y₃) are represented using blue, orange, and yellow lines, respectively.

Supplementary Figure 2. Parameter weights of the generic REW-NPLS and L_p, p = 0, 0.5, 1, PREW-NPLS for the right-hand 3D translation. Absolute values of model coefficients are projected into spatial, frequency, or temporal domains. The parameter weights related to the three controlled axes (y₁, y₂, and y₃) are represented using blue, orange, and yellow lines, respectively.

Supplementary Figure 3. Parameter weights of the models identified using generic REW-NPLS and L_p, p = 0, 0.5, 1, PREW-NPLS algorithms (the left-hand 3D translation study) projected into the spatial domain depending on the electrode locations in the implant. The sensory sulcus (SS) and motor sulcus (MS) are shown by yellow and red curves respectively.

Supplementary Figure 4. Parameter weights of the models identified using generic REW-NPLS and L_p, p = 0, 0.5, 1, PREW-NPLS algorithms (the right-hand 3D translation study) projected into the spatial domain depending on the electrode locations in the implant. The sensory sulcus (SS) and motor sulcus (MS) are shown by yellow and red curves respectively.

Supplementary Figure 5. The cosine similarity for the L_p penalized REW-NPLS algorithms (p = 0, 0.5, 1), for different λ for 37 test sessions. Cosine similarity of decoding of left-hand 3D translation is depicted in subplots (A–C). Cosine similarity of decoding of right-hand 3D translation is depicted in subplots (D–F).

Supplementary Figure 6. The cosine similarity and the model sparsity for the L_1 penalized REW-NPLS algorithm for 3D reaching tasks using 30% (A, C) and 50% (B, D) of the data set for the decoder training. The cosine similarly is summarized using a box plot where the red line is the median the blue lines indicate the 25th and 75th percentiles (Q₁ and Q₃). Additionally, the whiskers show the upper and lower extreme cosine similarity obtained for the data set. The generic REW-NPLS algorithm results are presented in the first box plot of each sub-plot corresponding to λ = 0.

References

Andreu-Perez, J., Cao, F., Hagras, H., and Yang, G. (2018). A self-adaptive online brain–machine interface of a humanoid robot through a general type-2 fuzzy inference system. IEEE Trans. Fuzzy Syst. 26, 101–116. doi: 10.1109/TFUZZ.2016.2637403

Online adaptive group-wise sparse Penalized Recursive Exponentially Weighted N-way Partial Least Square for epidural intracranial BCI

1. Introduction

2. Methods

2.1. Generic partial least squares and family

2.2. REW-NPLS

2.3. PARAFAC procedure in REW-NPLS

2.4. Lp-Penalized REW-NPLS (PREW-NPLS)

2.4.1. Lp-penalized PARAFAC (R = 1)

2.4.2. Integration to PREW-NPLS

2.5. Experiments

2.6. Application, feature extraction

2.7. Comparison study

3. Results

4. Discussion

Data availability statement

Ethics statement

Author contributions

Funding

Acknowledgments

Conflict of interest

Publisher's note

Supplementary material

References

2.4. L_p-Penalized REW-NPLS (PREW-NPLS)

2.4.1. L_p-penalized PARAFAC (R = 1)