
ORIGINAL RESEARCH article

Front. Signal Process., 30 November 2022
Sec. Biomedical Signal Processing
Volume 2 - 2022 | https://doi.org/10.3389/frsip.2022.1019253

Subject-invariant feature learning for mTBI identification using LSTM-based variational autoencoder with adversarial regularization

Shiva Salsabilian, Laleh Najafizadeh*
  • Integrated Systems and NeuroImaging Laboratory, Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, United States

Developing models for identifying mild traumatic brain injury (mTBI) has often been challenging due to large variations in data across subjects, which make it difficult for mTBI-identification models to generalize to data from unseen subjects. To tackle this problem, we present a long short-term memory-based adversarial variational autoencoder (LSTM-AVAE) framework for subject-invariant mTBI feature extraction. In the proposed model, first, an LSTM variational autoencoder (LSTM-VAE) combines the representation learning ability of the variational autoencoder (VAE) with the temporal modeling characteristics of the LSTM to learn latent space representations from neural activity. Then, to detach the subject’s individuality from the neural feature representations and make the model suitable for cross-subject transfer learning, an adversary network is attached to the encoder in a discriminative setting. The model is trained using a leave-one-subject-out approach. The trained encoder is then used to extract the representations from the held-out subject’s data. The extracted representations are then classified into normal and mTBI groups using different classifiers. The proposed model is evaluated on cortical recordings of Thy1-GCaMP6s transgenic mice obtained via widefield calcium imaging, prior to and after inducing injury. In the cross-subject transfer learning experiments, the proposed LSTM-AVAE framework achieves classification accuracies of 95.8% and 97.79%, without and with utilizing the conditional VAE (cVAE), respectively, demonstrating that the proposed model is capable of learning invariant representations from mTBI data.

1 Introduction

Mild traumatic brain injury (mTBI) is a common form of brain injury and a growing public health problem. mTBI can have long-lasting effects on patients’ cognitive abilities and social functioning. The diagnosis of mTBI, especially at its early stages, has remained challenging despite the negative effects the injury has on patients’ quality of life (Iverson et al., 2000; Kou and Iraji, 2014; Schmid et al., 2021). The main reasons for this include the rapid recovery of symptoms (e.g., loss of consciousness, confusion, disorientation) and the inability of imaging methods (e.g., computerized tomography (CT) or magnetic resonance imaging (MRI)) to detect injury at the mild level (Ruff et al., 2009). Hence, developing accurate mTBI diagnostic methods is essential for an early diagnosis of mTBI and for providing proper and timely treatments to patients (Eierud et al., 2014; Levin and Diaz-Arrastia, 2015). One major challenge in mTBI identification is the undesirable variability in the data obtained from subjects. Such variability poses difficulties for accurate mTBI diagnosis by preventing mTBI-identification models from being generalizable and transferable to data from new subjects. Therefore, a robust feature extractor for learning accurate subject-invariant injury-related features is desired. In addition, only limited data from mTBI subjects are often collected or available, posing the problem of inadequate data for training reliable models.

Recently, transfer learning has been utilized to extract features by leveraging knowledge across domains, tasks, or subjects. Cross-subject transfer learning aims at discovering and exploiting invariant and generalizable features across subjects. For example, transfer learning approaches such as learning population-level common spatial base dictionaries (Morioka et al., 2015), spectral transfer using information geometry (Waytowich et al., 2016), and regularizing classifiers (Fazli et al., 2009) or feature extractors (Lotte and Guan, 2010) have been proposed for brain-computer interfaces (BCIs). In (Bethge et al., 2022a), a multi-source learning framework based on maximum mean discrepancy (MMD) alignment of electroencephalography (EEG) data for emotion classification was presented. In (Peterson et al., 2021), to classify arm movements, a decoder model based on the Hilbert transform was trained using pooled electrocorticography (ECoG) data and tested on ECoG or EEG data from unseen subjects.

Invariant representation learning in neural networks, based on the idea of a latent space shared across subjects, was presented in (Louppe et al., 2017; Xie et al., 2017). In (Angjelichinoski et al., 2020), linear classifiers are trained with multiple transfer functions to transfer data from one subject to another. The idea of generative models, algorithms that learn the posterior distribution of data via Bayesian rules, has also been recently applied. Using deep generative models, the original data are refined and converted into features that minimize the intra-class variations and increase the inter-class variations in the dataset. The most extensively used deep generative learning models include the variational autoencoder (VAE) and generative adversarial networks (GAN) (Goodfellow et al., 2014). The advantage of the VAE is its smooth latent representation learning and the ability it brings to control the distribution of the latent space, which can be combined with feature learning methods. The principle of GAN has also been applied to transfer learning to address the data variability problem for domain- and subject-invariant feature learning (Ming et al., 2019; Wu et al., 2020; Özdenizci et al., 2020; Salsabilian and Najafizadeh, 2021a). For example, in (Li et al., 2019), to generalize models for EEG emotion recognition across subjects and sessions, the marginal distribution is adapted in the early layers of the neural networks using adversarial training. In another work (Özdenizci et al., 2020), subject variability for motor imagery decoding is decreased using an adversarial inference approach. A subject adaptation network inspired by GAN was proposed in (Ming et al., 2019) to align the distribution of data from different subjects. Autoencoder-based neural representation learning models have also recently adopted adversarial regularization for feature disentanglement. For example, subject-invariant representations were learned via a conditional variational autoencoder (cVAE) and an adversarial network from unseen users’ EEG data in motor imagery BCIs (Özdenizci et al., 2019). In (Han et al., 2020, 2021), disentangled adversarial autoencoder (AE) and rateless AE (RAE) feature extractors were proposed to extract nuisance-robust universal features from physiological signals for stress level assessment, demonstrating improvements in cross-subject transfer analysis.

In this paper, we present an mTBI-identification model using cross-subject transfer learning and adversarial networks. The proposed method consists of a long short-term memory-based variational autoencoder (LSTM-VAE) representation learning model with an attached adversarial network. In the proposed model, the adversary network is utilized as a constraint on the latent representations so that the learned representations are invariant to cross-subject variability. The model can therefore learn the common structure of the data shared among subjects, making it suitable for cross-subject feature learning and mTBI identification. The LSTM-VAE model combines the representation learning abilities of the VAE with the temporal modeling capabilities of the LSTM. The adversarial network is attached to the encoder to ensure that the latent representation contains minimal subject-specific information. After training, the trained encoder is used as a feature extractor, and a separate classifier learns to predict mTBI or normal class labels given the latent representation obtained from the trained encoder. We evaluate the proposed model using cortical activity recordings of Thy1-GCaMP6s transgenic mice that were obtained via widefield calcium imaging, before and after inducing injury.

The rest of the paper is organized as follows. Methods, including the description of the dataset as well as the proposed framework, are presented in Section 2. Results are discussed in Section 3, and the paper is concluded in Section 4.

2 Methods

In this section, we first describe the details of the experimental procedure, data collection, and preprocessing steps. Then, we present the proposed LSTM-based adversarial variational autoencoder (AVAE) and other feature extraction models that we used for comparison.

2.1 Experimental procedure

Cortical recordings from Thy1-GCaMP6 transgenic mice were acquired in the Department of Cell Biology and Neuroscience of Rutgers University using widefield optical imaging. All procedures were approved by the Rutgers University Institutional Animal Care and Use Committee. Animal models of mTBI, in comparison to studying mTBI in human patients, offer the opportunity of having control over the experimental parameters and conditions, such as maintaining the same site of injury and similar injury severity levels across subjects. Among the animal models, mice have been widely used (Morganti-Kossmann et al., 2010; Marshall and Mason, 2019). The mouse model mimics many features of TBI seen in humans, including cell death, neuroinflammation, and changes in behavior (Morganti-Kossmann et al., 2010; Wiltschko et al., 2015; Ellenbroek and Youn, 2016; Marshall and Mason, 2019). An important similarity between mice and humans is in their genetic makeup, suggesting that findings from mouse studies can often be related to humans (Breschi et al., 2017; Beauchamp et al., 2022). Nevertheless, while mouse and human brains exhibit similarities, there are differences between mice and humans that must be considered when using them as models for brain injuries. Among these is the quick healing of mice from injuries, which should be taken into consideration when studying the long-term effects of brain injury and recovery in mouse vs. human TBI models (You et al., 2007; Cortes and Pera, 2021). Widefield calcium imaging in animals enables recording of neural activity with high temporal and spatial resolution. This imaging technique has been used to study the relationship between cortical activity and behavior (Zhu et al., 2017; Salsabilian et al., 2018; 2020b; Lee et al., 2020; Salsabilian and Najafizadeh, 2021b), as well as to investigate the brain’s functional changes in response to injury (Cramer et al., 2019; Salsabilian et al., 2020a; Koochaki et al., 2020; Salsabilian and Najafizadeh, 2020; Salsabilian and Najafizadeh, 2021a; Koochaki and Najafizadeh, 2021; Cramer et al., 2022).

The data acquisition process, the experimental setting, and the injury procedure were described previously in (Zhu et al., 2018; Salsabilian et al., 2019; Salsabilian and Najafizadeh, 2020). In summary, Thy1-GCaMP6 transgenic mice were prepared with a transparent skull and a fixation post to record cortical Ca2+ transient activity (Lee and Margolis, 2016; Salsabilian et al., 2020b). The left hemisphere and a portion of the right hemisphere were visualized using a custom-designed microscope. The excitation light was filtered (479/40 nm; Chroma) and reflected by a dichroic mirror (Q470/lp, Chroma). Filtered fluorescence emission was captured through a 100 × 100 pixel sensor at 100 frames per second with a MiCam Ultima CMOS camera (BrainVision). On the day of injury, a small craniotomy (1 mm diameter) was made over the motor cortex region of the left frontal bone, leaving the dura intact. Trauma was then induced in the motor cortex through the craniotomy by activating a controlled cortical impact device, with its parameters set to cause mild injury.

Spontaneous cortical activity from 12 animals was acquired in two sessions, one prior to and one after inducing the injury. Data obtained from the sessions prior to and after inducing injury are referred to here as normal and mTBI data, respectively. Each recording session included 8 trials, each with a duration of 20.47 s.

2.1.1 Preprocessing

Relative GCaMP6s fluorescence calcium signal changes (ΔF/F%) were computed for every pixel by subtracting the baseline from the pixel’s fluorescence intensity and then dividing by the baseline. The baseline for each pixel was defined as the average of the fluorescence intensities of that pixel over the first 49 frames. We selected twenty-five 5 × 5-pixel regions of interest (ROIs) or channels (i.e., C = 25) distributed over the cortex based on their location according to S1 (Salsabilian et al., 2019). We obtained timeseries from each ROI by calculating the average pixel intensities within the ROI.
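
As an illustration of this step, the minimal sketch below computes ΔF/F% and the ROI timeseries. It is not the authors’ code; it assumes the raw recording is available as a NumPy array of shape (frames, height, width), and the names `raw` and `roi_centers` are ours.

```python
# Minimal ΔF/F% and ROI-averaging sketch (illustrative, not the authors' code).
import numpy as np

def delta_f_over_f(raw, n_baseline=49):
    """Compute ΔF/F% per pixel, with the baseline defined as the mean of
    the first `n_baseline` frames of that pixel."""
    baseline = raw[:n_baseline].mean(axis=0)        # (H, W)
    return 100.0 * (raw - baseline) / baseline      # (frames, H, W)

def roi_timeseries(dff, roi_centers, half=2):
    """Average each 5x5-pixel ROI into one channel timeseries."""
    series = []
    for (r, c) in roi_centers:                      # 25 ROI centers
        patch = dff[:, r - half:r + half + 1, c - half:c + half + 1]
        series.append(patch.mean(axis=(1, 2)))      # (frames,)
    return np.stack(series)                         # (C=25, frames)
```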

2.1.2 Dataset preparation

A sliding window with a duration of T = 400 and a step size of w = 20 time points is moved over the timeseries in each trial (Figure 1). This duration was found to be optimal for capturing the necessary information required for classification. The data within each window form a sampled data matrix $X \in \mathbb{R}^{C \times T}$. By collecting data from all the trials for subject i (i ∈ {1, …, 12}), the dataset $\{(X_n^i, y_n^i)\}_{n=1}^{N_i}$ is formed, where $X_n^i$ denotes the data from the C = 25 ROIs under window n (n = 1, …, $N_i$, with $N_i$ = 1328 being the total number of windows), and $y_n^i \in \{0, 1\}$ represents the mTBI or normal class label of the data under window n. Note that of the total $N_i$ = 1328 data samples, 664 samples per subject belong to each class label (y = 0 or 1), resulting in balanced classes.
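
The windowing can be sketched as follows; this is an illustrative sketch under the stated parameters, and the function name and array layout are ours.

```python
# Slice one trial's (C, T_total) timeseries into overlapping windows.
import numpy as np

def sliding_windows(timeseries, T=400, w=20):
    """Return the sampled data matrices X of shape (C, T), stepped by w."""
    C, T_total = timeseries.shape
    starts = range(0, T_total - T + 1, w)
    return np.stack([timeseries[:, s:s + T] for s in starts])  # (N, C, T)
```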


FIGURE 1. Schematic representation of the proposed model architecture for mTBI subject-invariant feature extraction. Cortical activity from C = 25 channels is collected. For each subject $s_i$, a sliding window with a duration of T and step size of w time points is used, and the sampled data matrices $\{(X_n^i, y_n^i)\}$ are collected. The label $y_n^i \in \{0, 1\}$ represents the mTBI or normal class. The autoencoder maps the data $X_n^i$ to the latent representation $z_n^i$. The adversarial network is trained to minimize the subject dependency in the representations.

To evaluate the subject-invariant feature learning performance of the models in a cross-subject transfer learning experiment, the subject attribute $s_i$ is defined as the subject’s one-hot encoded label, i.e., a zero vector of size 1 × 12 with a 1 at the $i$th index. The goal is to learn subject-invariant features and an mTBI discriminative model that predicts the class label $y_n^i$ from the observation $X_n^i$, and is robust to subject variability. To achieve this goal, we impose the requirement that the latent representations of the autoencoders be independent of the subject attribute $s_i$.

2.2 Feature extraction models

We now present the proposed subject-invariant feature extraction approach and the mTBI-identification model. The proposed model aims to achieve discriminative properties that are robust to subject variability. The proposed method consists of two components: an LSTM-based variational autoencoder and an adversarial network. The autoencoder is first trained to minimize the reconstruction error of the decoder to ensure that the latent representation contains enough information to allow for the reconstruction of the input. Next, the latent representation is refined to include minimal subject-dependent information by preventing the adversary from predicting the correct subject attribute through adversarial regularization. With this approach, the latent representations capture discriminative properties corresponding to the structure of the data that is common among subjects, and the model therefore becomes robust to subject variability. Finally, a separate classifier is trained to predict mTBI or normal class labels, given the latent representation obtained from the trained encoder as the feature extractor. In addition, we incorporate the conditional VAE (cVAE) in the decoder architecture of the proposed model, to further explore the benefits of removing subject-dependent information from learned representations.

We compare the performance of the proposed model with that of different variants of autoencoder models for feature extraction. The considered models are a simple autoencoder (AE), a variational autoencoder (VAE), a supervised VAE (SVAE), and an adversarial variational autoencoder (AVAE). The schematic illustration of the proposed model and the other considered autoencoder feature extraction models is shown in Figure 2. Except for the SVAE, all the autoencoder models are trained using the mean squared error (MSE) loss.


FIGURE 2. Autoencoder-based data representation learning models: (A) simple autoencoder (AE), (B) variational autoencoder (VAE), (C) supervised VAE (SVAE), (D) adversarial VAE (AVAE), (E) proposed LSTM-AVAE.

2.2.1 Autoencoder (AE)

The AE (Figure 2A) reconstructs the input data X as $\hat{X}$ by learning how to effectively compress and encode the data, usually in an unsupervised manner. The AE loss, $\mathcal{L}_{AE}$, is defined as

$\mathcal{L}_{AE} = \| X - \hat{X} \|^2. \quad (1)$
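
A minimal PyTorch sketch of such an autoencoder trained with the loss in Eq. 1 is given below; the layer sizes are illustrative and not the paper’s architecture.

```python
# Simple fully connected autoencoder trained with the MSE loss of Eq. 1.
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, d_in, d_z=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                     nn.Linear(128, d_z))
        self.decoder = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(),
                                     nn.Linear(128, d_in))

    def forward(self, x):
        z = self.encoder(x)      # compressed code
        return self.decoder(z)   # reconstruction x_hat

loss_fn = nn.MSELoss()           # L_AE = ||X - X_hat||^2
```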

2.2.2 Variational autoencoder (VAE)

The VAE (Figure 2B) is a probabilistic variant of the AE that uses a variational lower bound of the marginal likelihood, based on Bayesian inference, to identify multivariate patterns in data. The latent variable z is a stochastic variable given the input data X. The probabilistic encoder approximates the posterior $q_\phi(z|X)$, and the generative decoder represents the likelihood of generating the data X through the conditional probability $p_\theta(X|z)$. To make the model trainable, the reparameterization approach described in (Doersch, 2016) was used. The loss function of the VAE, $\mathcal{L}_{VAE}(\theta, \phi)$, the negative of the evidence lower bound, is defined as

$\mathcal{L}_{VAE}(\theta, \phi) = -\mathbb{E}\left[\log p_\theta(X|z)\right] + D_{KL}\left(q_\phi(z|X) \,\|\, p(z)\right), \quad (2)$

where ϕ and θ are the encoder and decoder parameters, respectively. The first term in (Eq. 2) represents the autoencoder reconstruction loss, and the second term is the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951). The KL divergence measures how much the posterior distribution $q_\phi(z|X)$ of the latent variable deviates from the prior distribution p(z). Minimizing the KL divergence regularizes the latent space.
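
The VAE objective in Eq. 2 can be sketched as follows, using the standard reparameterization z = μ + σ·ε with ε ∼ N(0, I); the architecture details are illustrative, not the paper’s.

```python
# VAE with Gaussian posterior and the loss of Eq. 2 (illustrative sketch).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, d_in, d_z=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU())
        self.mu = nn.Linear(128, d_z)
        self.logvar = nn.Linear(128, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(),
                                 nn.Linear(128, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = ((x - x_hat) ** 2).sum(dim=1)                         # -E[log p(X|z)] (Gaussian)
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)  # D_KL(q || N(0, I))
    return (recon + kl).mean()
```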

2.2.3 Supervised variational autoencoder (SVAE)

The SVAE (Figure 2C) differs from the VAE only in that the data label $y_n^i$ is also used as input to the decoder. As such, the model is trained in a supervised manner, which improves the learning process. The label $y_n^i$ is concatenated with the latent variable $z_n^i$ and used as input to the decoder. The decoder estimates the probability that $X_n^i$ is generated for a given latent variable $z_n^i$ and label $y_n^i$. In the training phase, the SVAE is first trained as a VAE. Next, the network is fine-tuned with the binary cross-entropy reconstruction loss of the training data sample $(X_n^i, y_n^i)$.
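
The decoder input of the SVAE can be sketched as follows; the dimensions and layer sizes are illustrative, not the paper’s.

```python
# SVAE decoder input: the class label y is concatenated with the latent z.
import torch
import torch.nn as nn

d_z, d_in = 10, 25 * 400
decoder = nn.Sequential(nn.Linear(d_z + 1, 128), nn.ReLU(),
                        nn.Linear(128, d_in))

z = torch.randn(32, d_z)                 # latent samples from the encoder
y = torch.randint(0, 2, (32,))           # mTBI / normal class labels
x_hat = decoder(torch.cat([z, y.float().unsqueeze(1)], dim=1))  # p(X | z, y)
```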

2.2.4 Adversarial variational autoencoder (AVAE)

In order for the model to be generalizable across subjects, representations should be invariant to the subject attribute $s_i$. To achieve this goal, we utilize an adversary network (Makhzani et al., 2015) parameterized by $q_\psi(\cdot)$. The adversary network is attached to the encoder to enforce the latent representations to include minimal subject-dependent information (Figure 2D). The adversary network is trained to maximize the likelihood $q_\psi(s_i|z)$, which maximizes its ability to predict the subject attribute $s_i$. The VAE is simultaneously trained based on two objectives: the decoder’s reconstruction loss is minimized to ensure that the latent representations include sufficient information for minimizing the input reconstruction error, and the latent representation is enforced to include minimal subject-dependent information by preventing the adversary network from predicting the correct subject attribute. This leads to a model capable of extracting discriminative features that are common across subjects. The AVAE network is trained simultaneously on these objectives with the loss function $\mathcal{L}_{AVAE}(\theta, \phi, \psi)$, defined as

$\arg \min_{\phi, \theta} \max_{\psi} \mathcal{L}_{AVAE}(\theta, \phi, \psi), \quad (3)$
$\mathcal{L}_{AVAE}(\theta, \phi, \psi) = -\mathbb{E}\left[\log p_\theta(X|z)\right] + D_{KL}\left(q_\phi(z|X) \,\|\, p(z)\right) + \lambda \, \mathbb{E}\left[\log q_\psi(s_i|z)\right],$

where λ ≥ 0 denotes the weight parameter that adjusts the impact of the adversary network. Note that the AVAE is equivalent to the VAE when λ = 0. At each iteration, first, the log-likelihood (the max objective) is maximized and the parameters of the adversary network are updated. Then, the parameters of the autoencoder in the min objective are updated by back-propagating the overall loss.
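
The alternating min-max updates of Eq. 3 can be sketched as follows. The `vae` interface returning (x_hat, mu, logvar, z) and the `adversary` module mapping z to subject logits are hypothetical stand-ins for the paper’s networks.

```python
# One alternating update of the adversarial objective in Eq. 3 (sketch).
import torch
import torch.nn.functional as F

def train_step(x, subj, vae, adversary, opt_vae, opt_adv, lam=0.1):
    # (1) max over psi: train the adversary to predict the subject from z.
    with torch.no_grad():
        _, _, _, z = vae(x)
    adv_loss = F.cross_entropy(adversary(z), subj)  # = -E[log q_psi(s|z)]
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # (2) min over phi, theta: reconstruct well while fooling the adversary.
    x_hat, mu, logvar, z = vae(x)
    recon = ((x - x_hat) ** 2).sum(dim=1).mean()
    kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)).mean()
    # Subtracting lam * CE equals adding lam * E[log q_psi(s|z)] to the loss,
    # so minimizing it strips subject information from z; only opt_vae steps.
    loss = recon + kl - lam * F.cross_entropy(adversary(z), subj)
    opt_vae.zero_grad()
    loss.backward()
    opt_vae.step()
```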

2.2.5 LSTM-AVAE

In the proposed LSTM-AVAE model (Figure 2E), we combine the timeseries representation learning advantages of the VAE with the temporal modeling capabilities of the LSTM to extract features from data. Bidirectional recurrent neural networks (RNN) have been shown to be advantageous over unidirectional RNNs (Yu Y. et al., 2019), and hence, here, we use a variant of the VAE with its encoder and decoder implemented by bidirectional LSTM networks.

In the proposed model, a bidirectional LSTM layer has two sets of LSTM cells. For each input data sample $X_n^i \in \mathbb{R}^{C \times T}$, a sliding window with size l = 20 and step size r = 5 moves over the data, creating the subsequent input sample points for the bidirectional LSTM encoder. The model is trained on the obtained data points. The LSTM encoder iterates through each point in both forward and backward manners. The decoder, which has the same architecture as the encoder, reconstructs the input sequence in reverse order (Srivastava et al., 2015). In the bidirectional LSTM, the final hidden state is obtained as $H_f = H_t^f \oplus H_1^b$, where $H_t^f$ and $H_1^b$ are the final hidden states resulting from the forward and backward passes, respectively, and ⊕ denotes concatenation (Yu W. et al., 2019). In the LSTM-VAE model, the final hidden state $H_f$ is used to define the posterior approximation function $q_\phi(z|X)$. Thus, the LSTM-VAE learns the compressed information of the input sequence as a region of the latent space. The latent variable z, randomly sampled from the posterior $q_\phi(z|X)$, is fed into the decoder’s LSTM. Similar to the AVAE, the adversary network is attached to the encoder to detach subject variability from the learned representations. After the LSTM-VAE model is trained, the model parameters are fine-tuned with the adversary network, following the training procedure of the AVAE model.
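
A minimal sketch of such a bidirectional LSTM encoder is given below. It assumes each timestep fed to the LSTM is one length-l window flattened to a vector; the hidden sizes are illustrative, not the paper’s.

```python
# Bidirectional LSTM encoder producing the latent Gaussian parameters (sketch).
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, d_in, d_hidden=64, d_z=10):
        super().__init__()
        self.lstm = nn.LSTM(d_in, d_hidden, batch_first=True,
                            bidirectional=True)
        self.mu = nn.Linear(2 * d_hidden, d_z)
        self.logvar = nn.Linear(2 * d_hidden, d_z)

    def forward(self, x):                      # x: (batch, seq_len, d_in)
        _, (h, _) = self.lstm(x)               # h: (2, batch, d_hidden)
        h_f = torch.cat([h[0], h[1]], dim=1)   # concat forward/backward states
        return self.mu(h_f), self.logvar(h_f)

# e.g., 77 length-l=20 windows across C=25 channels per data sample:
mu, logvar = BiLSTMEncoder(d_in=25 * 20)(torch.randn(4, 77, 25 * 20))
```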

The details of the architectures for the LSTM, the encoder, the decoder, and the adversary network are summarized in Table 1. The number of hidden layers is set to 1; as a result of the bidirectional structure, the total number of hidden layers is doubled. The number of LSTM hidden units was searched over, and the results with the highest accuracy are reported. A learning rate of 0.001, a batch size of 32, and a regularization drop-out ratio of 0.2 were selected. For the encoder architecture, 20 temporal and 10 spatial convolutional units are used, embedding the temporal and spatial filtering. The last fully connected layer at the output of the encoder generates the $d_z$-dimensional latent parameter vector. Our experiments with different activation functions showed that, on average, adding an activation function to the last layer and using the ReLU activation function offered slightly better results. The classifier takes the representation z as input to a fully connected layer with a softmax unit for class label discrimination. The adversary network is realized as a fully connected layer with 8 softmax units for subject discrimination, producing normalized log-probabilities that are used to calculate the losses. We used a temporal convolution kernel size of 20 and a spatial convolution kernel size of 30.


TABLE 1. Network architectures and parameters.

2.3 Classification

In the proposed model, after training the autoencoder, the trained encoder with frozen network weights is utilized as a static feature extractor. The feature representations are sampled from the learned encoder outputs $\mu_x$ and $\sigma_x$. An independent classifier is attached to the encoder and is trained to estimate the class label $y_n^i$ given the input representation $z_n^i$ (Figure 3).


FIGURE 3. Classification model architecture.

The classifier is optimized to minimize the cross-entropy loss, $\mathcal{L}_C(z; \gamma)$, with parameters γ, defined as

$\mathcal{L}_C(z; \gamma) = -\mathbb{E}\left[\log p_\gamma(\hat{y}|z)\right]. \quad (4)$

After training, new input data $X_n^i$ passes through the trained encoder, and the extracted feature representation $z_n^i$ is used as input for the classifier to predict the class category $\hat{y}_n^i$.
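
The classification stage can be sketched as follows, as a minimal illustration of Eq. 4 with a stand-in encoder; the dimensions are ours.

```python
# Frozen feature extractor + softmax classifier trained with cross-entropy.
import torch
import torch.nn as nn

d_z = 10
encoder = nn.Linear(25 * 400, d_z)        # stand-in for the trained encoder
for p in encoder.parameters():            # freeze the feature extractor
    p.requires_grad = False

classifier = nn.Linear(d_z, 2)            # softmax over {normal, mTBI}
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()                # minimizes -E[log p_gamma(y_hat|z)]

x = torch.randn(32, 25 * 400)             # placeholder input batch
y = torch.randint(0, 2, (32,))            # placeholder class labels
loss = ce(classifier(encoder(x)), y)
opt.zero_grad()
loss.backward()
opt.step()
```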

We considered the multilayer perceptron (MLP) (realized as a single layer with 15 neurons), nearest neighbors (NN), linear discriminant analysis (LDA), linear regression (LR), and support vector machine (SVM) classifiers.

3 Results and discussion

The goal of this work is to utilize subject-invariant feature extraction to develop models for accurate mTBI identification. The performance of the models was evaluated in a cross-subject transfer learning task with a leave-one-subject-out training and testing approach. That is, data from one subject was held out for testing, and data from the remaining subjects were used for training. The training and validation sets were formed by randomly selecting 80% and 20%, respectively, of the data from each of the remaining subjects. This procedure was repeated for each held-out subject, and the accuracy averaged over all cross-subject runs was computed. The autoencoder models were first trained using the training data, which were normalized to have zero mean. To prevent overfitting, the validation set was used for early stopping: the training process was terminated if the performance on the validation set decreased compared to the previous training epoch. Following the training of the autoencoder models, the trained encoders with frozen weights were used as feature extractors. Next, utilizing the feature representations from the training set and their corresponding class labels y, a separate classifier was trained as shown in Figure 3. To train the classifiers, the binary cross-entropy loss was minimized with the class labels y. In the last step, the held-out subject’s data were used to assess the models’ cross-subject transfer learning performance. We repeated the described training and testing procedures for all subjects and averaged the accuracy results over all cross-subject runs. The average accuracy results for different feature extraction models using the considered classifiers are reported in Figure 4. To be consistent, these results are based on setting the dimension of the latent representation feature, $d_z$, to 10 for all models. This value was selected based on further investigation of the effects of $d_z$ on the models’ performance, as discussed in Section 3.3.
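
The leave-one-subject-out protocol can be sketched with scikit-learn’s LeaveOneGroupOut split; the features, labels, and subject indices below are random placeholders standing in for the extracted latent representations.

```python
# Leave-one-subject-out evaluation sketch with placeholder latent features.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
z = rng.standard_normal((12 * 100, 10))       # latent features (placeholder)
y = rng.integers(0, 2, 12 * 100)              # class labels (placeholder)
subjects = np.repeat(np.arange(12), 100)      # subject attribute per sample

accs = []
for tr, te in LeaveOneGroupOut().split(z, y, groups=subjects):
    clf = MLPClassifier(hidden_layer_sizes=(15,), max_iter=500)
    clf.fit(z[tr], y[tr])                     # train on 11 subjects
    accs.append(clf.score(z[te], y[te]))      # test on the held-out subject
print(f"mean cross-subject accuracy: {np.mean(accs):.2%}")
```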


FIGURE 4. Average accuracy results of different models and classifiers in subject-invariant feature extraction for mTBI identification.

Overall, comparing the performance of all the classifiers, we observe that, on average, the MLP classifier offers the highest accuracy across the various models. The highest accuracy (95.8%) is achieved by the proposed LSTM-AVAE using this classifier.

3.1 Impact of variational inference in feature learning

To assess the impact of variational inference in learning features for mTBI identification, we compare the results of the AE and the VAE. From Figure 4, one can observe that the VAE achieves higher accuracy than the AE for all the considered classifiers, suggesting that variational representation learning allows for better feature extraction, ultimately resulting in a more accurate mTBI-identification model. This is due to the additional tunable parameters in the VAE model, in comparison to the AE model, which provide more control over the latent distribution learning ability (Higgins et al., 2016b). It has also been shown that VAE models are capable of learning representations with disentangled factors (Higgins et al., 2016a) due to the isotropic Gaussian priors on the latent variable, a known strength of Bayesian models. The better performance of VAE models compared to AE models has also been shown previously in other applications such as anomaly detection, object identification, and BCIs (Dai et al., 2019; Tahir et al., 2021; Zhou et al., 2021). In addition, comparing the results of the VAE with the SVAE suggests the added value of supervised learning in training better models.

3.2 Impact of adversary network

The positive impact of adversarial networks in learning generalizable representations that are domain-, task-, subject-, and source-invariant has recently been shown in many applications, such as drug molecular analysis (Hong et al., 2019), decoding brain states (Du et al., 2019), brain lesion segmentation (Kamnitsas et al., 2017), and evaluating subjects’ mental states (Bethge et al., 2022b). These methods learn representations that are independent of nuisance variables such as subject-specific or task-specific variations. In such approaches, there is a trade-off between enforcing representations that are independent of the nuisance variable via the adversary and retaining enough information for successful data reconstruction by the decoder. In our model architecture, this trade-off can be balanced via the weight parameter λ.

To investigate the impact of the adversary network on subject-invariant feature learning in cross-subject mTBI identification, we varied the weight parameter λ, which adjusts the impact of the adversary network in feature learning, over λ ∈ {0, 0.01, 0.05, 0.1, 0.2, 0.5}. The AVAE and the proposed LSTM-AVAE models with the MLP and LR classifiers were considered for performance comparison. Note that for λ = 0, the models are equivalent to the simple VAE and LSTM-VAE models, respectively. We computed the accuracy and F1 score of the classifiers for each λ value, and the results are summarized in Table 2.


TABLE 2. Comparing the impact of the adversary network on the model accuracy (MLP: multilayer perceptron, LR: linear regression).

From Table 2, one can observe that for both models and both considered classifiers, adding the adversary network (i.e., λ ≠ 0) increases the accuracy, further emphasizing the positive impact of the inclusion of the adversary network in subject-invariant feature learning. Moreover, it can be seen that the LSTM-AVAE with the MLP classifier and λ = 0.1 achieves the highest classification accuracy of 95.8%, which signifies the robustness of the proposed method in cross-subject transfer learning for mTBI identification.

Moreover, we performed a repeated measures analysis of variance (ANOVA) to statistically compare the performances of the adversary and non-adversary LSTM models, i.e., the LSTM-AVAE and the LSTM-VAE. We compared the accuracy results for each held-out subject (test data) across different repeated runs using the MLP classifier. Results, as shown in Figure 5, indicate a significant increase in accuracy with adversarial training (p = 0.02), rejecting the null hypothesis that the two models perform equally across subjects and runs.


FIGURE 5. Statistical analysis using the ANOVA test comparing the adversary (LSTM-AVAE) and non-adversary (LSTM-VAE) models (p = 0.02).
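
Such a repeated measures test can be sketched with statsmodels; the per-subject accuracy values below are placeholders, not the paper’s results.

```python
# Repeated measures ANOVA comparing per-subject accuracies of two models.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "subject": list(range(12)) * 2,
    "model": ["LSTM-VAE"] * 12 + ["LSTM-AVAE"] * 12,
    "accuracy": np.concatenate([rng.normal(0.90, 0.02, 12),    # placeholders
                                rng.normal(0.96, 0.02, 12)]),
})
print(AnovaRM(df, depvar="accuracy", subject="subject",
              within=["model"]).fit())
```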

3.3 Impact of feature dimension

To investigate the effect of the feature dimension $d_z$ on the models’ cross-subject classification performance and to determine a proper feature dimension, we considered $d_z$ ∈ {3, 5, 7, 8, 9, 10, 11, 12, 13, 15, 18, 20}. The VAE, AVAE, and proposed LSTM-AVAE models were trained with latent features of these dimensions and their optimized parameters, and the corresponding leave-one-subject-out accuracy averaged over all subjects using the MLP classifier was computed. The results are shown in Figure 6. As can be seen, the highest accuracy is obtained when $d_z$ = 10. Increasing the feature size beyond $d_z$ = 10 does not improve the accuracy of the models, suggesting that a higher feature dimension does not provide additional information to the models. Moreover, the results illustrate that the LSTM-AVAE provides higher accuracy than the VAE and AVAE for all feature dimensions, indicating the importance of including temporal information for mTBI identification, as discussed further in the next section.


FIGURE 6. Classification accuracy results of the LSTM-AVAE and the VAE models (λ = 0.1) for different dimensions of the latent feature, $d_z$, using the MLP classifier.

3.4 Impact of temporal information for mTBI identification

The significance of considering temporal dependency in the analysis of brain activity has been previously noted (Linkenkaer-Hansen et al., 2001; Cornblath et al., 2020; Gu et al., 2021). For example, the added value of considering temporal dependencies has been shown in applications such as emotion recognition (Alhagry et al., 2017), gait decoding from EEG data (Tortora et al., 2020), and discriminating motor tasks from EEG data using LSTM recurrent neural networks (Shamsi et al., 2021). Recently, it has been shown that considering the temporal dependency in GCaMP brain dynamics also improves the performance of the analysis. For example, considering temporal data in studies such as (Salsabilian and Najafizadeh, 2021b; Perich et al., 2021) has improved behavior decoding and modeling performance using GCaMP data.

Moreover, combining VAE models with LSTM models has been shown to improve performance in several studies. For example, in (Niu et al., 2020), the added value of considering temporal information was shown using an LSTM-based VAE-GAN network for timeseries anomaly detection. In (Park et al., 2018), an LSTM-based VAE detector improved the performance of a robot-assisted feeding system.

In the case of brain injury, functional connections in the brain may be disrupted, and considering temporal information may indeed help to identify these disruptions and alterations in the flow of brain communication. To investigate whether including temporal dependencies can lead to more precise predictions of mTBI, we used an LSTM network to introduce temporal dependency into the VAE model. In the presented LSTM-VAE structure, the model projects the multivariate timeseries into the latent space representations at each time step, and the decoder uses the latent space representations to reconstruct the input. In this approach, the temporal dependency between the points in each data sample (i.e., X) is processed by the LSTM in the VAE model.

To assess the impact of capturing the spatio-temporal features of the data for mTBI identification, we compare the results of the VAE to the LSTM-VAE and the results of the AVAE to the LSTM-AVAE in Figure 4. It can be observed that although temporal and spatial convolution layers are included in the structure of the autoencoders (see Table 1), all classifiers achieve higher accuracy with the LSTM-based models (LSTM-VAE and LSTM-AVAE) than with their non-LSTM counterparts (VAE and AVAE). This result demonstrates that the LSTM-based models are effective in extracting and learning the temporal dependencies in the neural data that are informative for mTBI identification. The highest accuracy of 95.8% is achieved by the proposed LSTM-AVAE using the MLP classifier.

3.5 Subject-specific performance

The accuracy results of the LSTM-AVAE model with the MLP classifier, $d_z$ = 10, and λ = 0.1 obtained for each subject are reported in Table 3. We observe that the proposed model achieves high accuracy for all subjects, with minimum and maximum mean accuracies of 91.96% and 98.79%, respectively.


TABLE 3. Subject-specific classification accuracy results of LSTM-AVAE with MLP classifier.

3.6 Impact of conditional decoder

Inspired by recent studies suggesting that experimental results benefit from utilizing the conditional VAE (cVAE) (Sohn et al., 2015) to remove the impact of a nuisance variable from the learned representations (e.g., (Özdenizci et al., 2019)), here, we adopt the cVAE in the proposed architecture to explore its impact in further removing subject dependency from mTBI latent representations during training of the encoder. In the cVAE, the decoder is conditioned on the nuisance variable as an additional input besides the latent representations. In this way, as the nuisance variable is already given to the decoder, the encoder is expected to learn only representations that are invariant to the nuisance variable. The loss function of the cVAE, $\mathcal{L}_{cVAE}(\theta, \phi)$, is given by

$\mathcal{L}_{cVAE}(\theta, \phi) = -\mathbb{E}\left[\log p_\theta(X|z, s)\right] + D_{KL}\left(q_\phi(z|X) \,\|\, p(z)\right). \quad (5)$
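
Conditioning the decoder can be sketched as follows, as a minimal illustration of Eq. 5; the dimensions and layer sizes are ours.

```python
# cVAE decoder: the one-hot subject attribute s is concatenated with z,
# so the encoder need not encode subject identity.
import torch
import torch.nn as nn

d_z, n_subjects, d_in = 10, 12, 25 * 400
decoder = nn.Sequential(nn.Linear(d_z + n_subjects, 128), nn.ReLU(),
                        nn.Linear(128, d_in))

z = torch.randn(32, d_z)                          # latent representations
s = nn.functional.one_hot(torch.randint(0, n_subjects, (32,)),
                          n_subjects).float()     # subject one-hot labels
x_hat = decoder(torch.cat([z, s], dim=1))         # p_theta(X | z, s)
```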

Considering that only the encoder is used for classification, conditioning the decoder on the subject variable s does not affect the rest of the modeling and classification chain. Here, we consider the LSTM-A-cVAE model, obtained by conditioning the decoder of the LSTM-AVAE model on the subject label $s_i$. We compare the performance of the LSTM-A-cVAE with the LSTM-AVAE considering the LR and MLP classifiers with λ = {0.1, 0.2}. The training and testing procedures were kept the same as described earlier. Results are presented in Table 4. We observe that conditioning the VAE on the subject variable increased the accuracy by 2% on average in cross-subject transfer learning, suggesting the added value of the cVAE in removing subject-dependent information from representations for cross-subject mTBI identification.


TABLE 4. Impact of conditional decoder on accuracy.

4 Conclusion

In this paper, by taking advantage of adversary networks and proposing an LSTM-AVAE model, we addressed the problem of subject variability, which imposes challenges in extracting injury-related features for accurate mTBI diagnosis. The proposed model considers the temporal dependency of neural data and learns representations from it, while the attached adversary network disentangles subject-related information from the learned representations, making the model suitable for cross-subject feature extraction. The experimental results demonstrated the benefits of the proposed LSTM-AVAE model for accurate mTBI identification, showing the model’s ability to extract robust subject-invariant features. The proposed approach can be generalized to other domains to learn subject-invariant features.

Data availability statement

The data analyzed in this study is subject to the following licenses/restrictions: The dataset was collected in another research laboratory. Requests to access these datasets should be directed to laleh.najafizadeh@rutgers.edu.

Ethics statement

The animal study was reviewed and approved by the Rutgers University Institutional Animal Care and Use Committee.

Author contributions

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Funding

This work was supported by the National Science Foundation (NSF) under award 1605646, and the New Jersey Commission on Brain Injury Research (NJCBIR) under award CBIR16IRG032.

Acknowledgments

The authors thank Prof. David J. Margolis and Dr. Christian R. Lee, with the Department of Cell Biology and Neuroscience at Rutgers University, for providing the experimental data.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alhagry, S., Fahmy, A. A., and El-Khoribi, R. A. (2017). Emotion recognition based on EEG using LSTM recurrent neural network. Int. J. Adv. Comput. Sci. Appl. 8. doi:10.14569/ijacsa.2017.081046

Angjelichinoski, M., Choi, J., Banerjee, T., Pesaran, B., and Tarokh, V. (2020). Cross-subject decoding of eye movement goals from local field potentials. J. Neural Eng. 17, 016067. doi:10.1088/1741-2552/ab6df3

Beauchamp, A., Yee, Y., Darwin, B., Raznahan, A., Mars, R. B., and Lerch, J. P. (2022). Whole-brain comparison of rodent and human brains using spatial transcriptomics. bioRxiv.

Bethge, D., Hallgarten, P., Grosse-Puppendahl, T., Kari, M., Mikut, R., Schmidt, A., et al. (2022a). “Domain-invariant representation learning from EEG with private encoders,” in ICASSP 2022 IEEE international conference on acoustics, speech and signal processing (IEEE), 1236–1240.

Bethge, D., Hallgarten, P., Özdenizci, O., Mikut, R., Schmidt, A., and Grosse-Puppendahl, T. (2022b). Exploiting multiple EEG data domains with adversarial learning. arXiv preprint arXiv:2204.07777.

Breschi, A., Gingeras, T. R., and Guigó, R. (2017). Comparative transcriptomics in human and mouse. Nat. Rev. Genet. 18, 425–440. doi:10.1038/nrg.2017.19

Cornblath, E. J., Ashourvan, A., Kim, J. Z., Betzel, R. F., Ciric, R., Adebimpe, A., et al. (2020). Temporal sequences of brain activity at rest are constrained by white matter structure and modulated by cognitive demands. Commun. Biol. 3, 1–12. doi:10.1038/s42003-020-0961-x

Cortes, D., and Pera, M. F. (2021). The genetic basis of inter-individual variation in recovery from traumatic brain injury. npj Regen. Med. 6, 5–9. doi:10.1038/s41536-020-00114-y

Cramer, J. V., Gesierich, B., Roth, S., Dichgans, M., Duering, M., and Liesz, A. (2019). In vivo widefield calcium imaging of the mouse cortex for analysis of network connectivity in health and brain disease. Neuroimage 199, 570–584.

Cramer, S. W., Haley, S. P., Popa, L. S., Carter, R. E., Scott, E., Flaherty, E. B., et al. (2022). Wide-field calcium imaging reveals widespread changes in cortical connectivity following repetitive, mild traumatic brain injury in the mouse. Sci. Rep. 11, 1. doi:10.1038/s41598-021-02371-3

Dai, M., Zheng, D., Na, R., Wang, S., and Zhang, S. (2019). EEG classification of motor imagery using a novel deep learning framework. Sensors 19, 551. doi:10.3390/s19030551

Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

Du, C., Jinpeng, L., Lijie, H., and Huiguang, H. (2019). Brain encoding and decoding in fMRI with bidirectional deep generative models. Engineering 5, 948–953.

Eierud, C., Craddock, R. C., Fletcher, S., Aulakh, M., King-Casas, B., Kuehl, D., et al. (2014). Neuroimaging after mild traumatic brain injury: Review and meta-analysis. NeuroImage Clin. 4, 283–294. doi:10.1016/j.nicl.2013.12.009

Ellenbroek, B., and Youn, J. (2016). Rodent models in neuroscience research: Is it a rat race? Dis. models Mech. 9, 1079–1087. doi:10.1242/dmm.026120

Fazli, S., Popescu, F., Danóczy, M., Blankertz, B., Müller, K.-R., and Grozea, C. (2009). Subject-independent mental state classification in single trials. Neural Netw. 22, 1305–1312. doi:10.1016/j.neunet.2009.06.003

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27.

Gu, H., Schulz, K. P., Fan, J., and Yang, Y. (2021). Temporal dynamics of functional brain states underlie cognitive performance. Cereb. Cortex 31, 2125–2138. doi:10.1093/cercor/bhaa350

Han, M., Özdenizci, O., Koike-Akino, T., Wang, Y., and Erdoğmuş, D. (2021). Universal physiological representation learning with soft-disentangled rateless autoencoders. IEEE J. Biomed. Health Inf. 25, 2928–2937. doi:10.1109/jbhi.2021.3062335

Han, M., Özdenizci, O., Wang, Y., Koike-Akino, T., and Erdoğmuş, D. (2020). Disentangled adversarial autoencoder for subject-invariant physiological feature extraction. IEEE Signal Process. Lett. 27, 1565–1569. doi:10.1109/lsp.2020.3020215

Higgins, I., Matthey, L., Glorot, X., Pal, A., Uria, B., Blundell, C., et al. (2016a). Early visual concept learning with unsupervised deep learning. arXiv preprint arXiv:1606.05579.

Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., et al. (2016b). beta-VAE: Learning basic visual concepts with a constrained variational framework.

Hong, S. H., Ryu, S., Lim, J., and Kim, W. Y. (2019). Molecular generative model based on an adversarially regularized autoencoder. J. Chem. Inf. Model. 60, 29–36. doi:10.1021/acs.jcim.9b00694

Iverson, G. L., Lovell, M. R., Smith, S., and Franzen, M. D. (2000). Prevalence of abnormal CT-scans following mild head injury. Brain Inj. 14, 1057–1061. doi:10.1080/02699050050203559

Kamnitsas, K., Baumgartner, C., Ledig, C., Newcombe, V., Simpson, J., Kane, A., et al. (2017). “Unsupervised domain adaptation in brain lesion segmentation with adversarial networks,” in International conference on information processing in medical imaging (Springer), 597–609.

Koochaki, F., and Najafizadeh, L. (2021). “A convolutional autoencoder for identification of mild traumatic brain injury,” in 10th international IEEE/EMBS conference on neural engineering (NER), 412–415.

Koochaki, F., Shamsi, F., and Najafizadeh, L. (2020). “Detecting mTBI by learning spatio-temporal characteristics of widefield calcium imaging data using deep learning,” in 42nd annual international conference of the IEEE engineering in medicine & Biology society (EMBC), 2917–2920.

Kou, Z., and Iraji, A. (2014). Imaging brain plasticity after trauma. Neural Regen. Res. 9, 693. doi:10.4103/1673-5374.131568

Kullback, S., and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Stat. 22, 79–86. doi:10.1214/aoms/1177729694

Lee, C. R., and Margolis, D. J. (2016). Pupil dynamics reflect behavioral choice and learning in a go/nogo tactile decision-making task in mice. Front. Behav. Neurosci. 10, 200. doi:10.3389/fnbeh.2016.00200

Lee, C. R., Najafizadeh, L., and Margolis, D. J. (2020). Investigating learning-related neural circuitry with chronic in vivo optical imaging. Brain Struct. Funct. 225, 467–480.

Levin, H. S., and Diaz-Arrastia, R. R. (2015). Diagnosis, prognosis, and clinical management of mild traumatic brain injury. Lancet Neurology 14, 506–517. doi:10.1016/s1474-4422(15)00002-2

Li, J., Qiu, S., Du, C., Wang, Y., and He, H. (2019). Domain adaptation for EEG emotion recognition based on latent representation similarity. IEEE Trans. Cogn. Dev. Syst. 12, 344–353. doi:10.1109/tcds.2019.2949306

Linkenkaer-Hansen, K., Nikouline, V. V., Palva, J. M., and Ilmoniemi, R. J. (2001). Long-range temporal correlations and scaling behavior in human brain oscillations. J. Neurosci. 21, 1370–1377. doi:10.1523/jneurosci.21-04-01370.2001

Lotte, F., and Guan, C. (2010). Regularizing common spatial patterns to improve BCI designs: Unified theory and new algorithms. IEEE Trans. Biomed. Eng. 58, 355–362. doi:10.1109/tbme.2010.2082539

Louppe, G., Kagan, M., and Cranmer, K. (2017). Learning to pivot with adversarial networks. Adv. Neural Inf. Process. Syst. 30.

Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. (2015). Adversarial autoencoders. arXiv preprint arXiv:1511.05644.

Marshall, J. J., and Mason, J. O. (2019). Mouse vs man: Organoid models of brain development & disease. Brain Res. 1724, 146427. doi:10.1016/j.brainres.2019.146427

Ming, Y., Ding, W., Pelusi, D., Wu, D., Wang, Y.-K., Prasad, M., et al. (2019). Subject adaptation network for EEG data analysis. Appl. Soft Comput. 84, 105689. doi:10.1016/j.asoc.2019.105689

Morganti-Kossmann, M. C., Yan, E., and Bye, N. (2010). Animal models of traumatic brain injury: Is there an optimal model to reproduce human brain injury in the laboratory? Injury 41, S10–S13. doi:10.1016/j.injury.2010.03.032

Morioka, H., Kanemura, A., Hirayama, J.-i., Shikauchi, M., Ogawa, T., Ikeda, S., et al. (2015). Learning a common dictionary for subject-transfer decoding with resting calibration. NeuroImage 111, 167–178. doi:10.1016/j.neuroimage.2015.02.015

Niu, Z., Yu, K., and Wu, X. (2020). LSTM-based VAE-GAN for time-series anomaly detection. Sensors 20, 3738. doi:10.3390/s20133738

Özdenizci, O., Wang, Y., Koike-Akino, T., and Erdoğmuş, D. (2020). Learning invariant representations from EEG via adversarial inference. IEEE Access 8, 27074–27085. doi:10.1109/access.2020.2971600

Özdenizci, O., Wang, Y., Koike-Akino, T., and Erdoğmuş, D. (2019). “Transfer learning in brain-computer interfaces with adversarial variational autoencoders,” in 2019 9th international IEEE/EMBS conference on neural engineering (NER) (IEEE)–210.

Park, D., Hoshi, Y., and Kemp, C. C. (2018). A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder. IEEE Robot. Autom. Lett. 3, 1544–1551. doi:10.1109/lra.2018.2801475

Perich, M. G., Arlt, C., Soares, S., Young, M. E., Mosher, C. P., Minxha, J., et al. (2021). Inferring brain-wide interactions using data-constrained recurrent neural network models. bioRxiv, 1–37.

Peterson, S. M., Steine-Hanson, Z., Davis, N., Rao, R. P., and Brunton, B. W. (2021). Generalized neural decoders for transfer learning across participants and recording modalities. J. Neural Eng. 18, 026014. doi:10.1088/1741-2552/abda0b

Ruff, R. M., Iverson, G. L., Barth, J. T., Bush, S. S., Broshek, D. K., Policy, N., et al. (2009). Recommendations for diagnosing a mild traumatic brain injury: A national academy of neuropsychology education paper. Arch. Clin. Neuropsychol. 24, 3–10. doi:10.1093/arclin/acp006

Salsabilian, S., Bibineyshvili, E., Margolis, D. J., and Najafizadeh, L. (2019). “Quantifying changes in brain function following injury via network measures,” in 41st annual international conference of the IEEE engineering in medicine and Biology society (EMBC) (Berlin, Germany), 5217–5220.

Salsabilian, S., Bibineyshvili, E., Margolis, D. J., and Najafizadeh, L. (2020a). “Study of functional network topology alterations after injury via embedding methods,” in Optics and the brain (Washington, D.C.: Optical Society of America), BW4C–3.

Salsabilian, S., Lee, C. R., Margolis, D. J., and Najafizadeh, L. (2018). “Using connectivity to infer behavior from cortical activity recorded through widefield transcranial imaging,” in Optics and the brain (Hollywood, FL: Optical Society of America), BTu2C–4.

Salsabilian, S., and Najafizadeh, L. (2021b). “A variational encoder framework for decoding behavior choices from neural data,” in 43rd annual international conference of the IEEE engineering in medicine & Biology society (EMBC).

Salsabilian, S., and Najafizadeh, L. (2021a). “An adversarial variational autoencoder approach toward transfer learning for mTBI identification,” in 10th international IEEE/EMBS conference on neural engineering (NER), 408–411.

Salsabilian, S., and Najafizadeh, L. (2020). “Detection of mild traumatic brain injury via topological graph embedding and 2D convolutional neural networks,” in 42nd annual international conference of the IEEE engineering in medicine and Biology society (EMBC).

Salsabilian, S., Zhu, L., Lee, C. R., Margolis, D. J., and Najafizadeh, L. (2020b). “Identifying task-related brain functional states via cortical networks,” in IEEE international symposium on circuits and systems (ISCAS), 1–4.

Schmid, W., Fan, Y., Chi, T., Golanov, E., Regnier-Golanov, A. S., Austerman, R. J., et al. (2021). Review of wearable technologies and machine learning methodologies for systematic detection of mild traumatic brain injuries. J. Neural Eng. 18, 041006. doi:10.1088/1741-2552/ac1982

Shamsi, F., Haddad, A., and Najafizadeh, L. (2021). Early classification of motor tasks using dynamic functional connectivity graphs from EEG. J. Neural Eng. 18, 016015. doi:10.1088/1741-2552/abce70

Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. Adv. Neural Inf. Process. Syst. 28.

Srivastava, N., Mansimov, E., and Salakhudinov, R. (2015). “Unsupervised learning of video representations using LSTMs,” in International conference on machine learning (Lille, France: PMLR), 843–852.

Tahir, R., Sargano, A. B., and Habib, Z. (2021). Voxel-based 3D object reconstruction from single 2D image using variational autoencoders. Mathematics 9, 2288. doi:10.3390/math9182288

Tortora, S., Ghidoni, S., Chisari, C., Micera, S., and Artoni, F. (2020). Deep learning-based BCI for gait decoding from EEG with LSTM recurrent neural network. J. Neural Eng. 17, 046011. doi:10.1088/1741-2552/ab9842

Waytowich, N. R., Lawhern, V. J., Bohannon, A. W., Ball, K. R., and Lance, B. J. (2016). Spectral transfer learning using information geometry for a user-independent brain-computer interface. Front. Neurosci. 10, 430. doi:10.3389/fnins.2016.00430

Wiltschko, A. B., Johnson, M. J., Iurilli, G., Peterson, R. E., Katon, J. M., Pashkovski, S. L., et al. (2015). Mapping sub-second structure in mouse behavior. Neuron 88, 1121–1135. doi:10.1016/j.neuron.2015.11.031

Wu, F., Jing, X.-Y., Wu, Z., Ji, Y., Dong, X., Luo, X., et al. (2020). Modality-specific and shared generative adversarial network for cross-modal retrieval. Pattern Recognition 104.

Xie, Q., Dai, Z., Du, Y., Hovy, E., and Neubig, G. (2017). Controllable invariance through adversarial feature learning. Adv. Neural Inf. Process. Syst. 30.

You, Z., Yang, J., Takahashi, K., Yager, P. H., Kim, H.-H., Qin, T., et al. (2007). Reduced tissue damage and improved recovery of motor function after traumatic brain injury in mice deficient in complement component c4. J. Cereb. Blood Flow. Metab. 27, 1954–1964. doi:10.1038/sj.jcbfm.9600497

Yu, W., Kim, I. Y., and Mechefske, C. (2019a). Remaining useful life estimation using a bidirectional recurrent neural network based autoencoder scheme. Mech. Syst. Signal Process. 129, 764–780. doi:10.1016/j.ymssp.2019.05.005

Yu, Y., Si, X., Hu, C., and Zhang, J. (2019b). A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 31, 1235–1270. doi:10.1162/neco_a_01199

Zhou, Y., Liang, X., Zhang, W., Zhang, L., and Song, X. (2021). VAE-based deep SVDD for anomaly detection. Neurocomputing 453, 131–140. doi:10.1016/j.neucom.2021.04.089

Zhu, L., Lee, C. R., Margolis, D. J., and Najafizadeh, L. (2018). Decoding cortical brain states from widefield calcium imaging data using visibility graph. Biomed. Opt. Express 9, 3017–3036. doi:10.1364/boe.9.003017

Zhu, L., Lee, C. R., Margolis, D. J., and Najafizadeh, L. (2017). “Probing the dynamics of spontaneous cortical activities via widefield Ca+2 imaging in GCaMP6 transgenic mice,” in Wavelets and sparsity XVII (San Diego, CA: International Society for Optics and Photonics), 10394, 103940C.

Keywords: variational autoencoder (VAE), adversarial regularization, cross-subject transfer learning, mild TBI (mTBI), LSTM-long short-term memory

Citation: Salsabilian S and Najafizadeh L (2022) Subject-invariant feature learning for mTBI identification using LSTM-based variational autoencoder with adversarial regularization. Front. Sig. Proc. 2:1019253. doi: 10.3389/frsip.2022.1019253

Received: 14 August 2022; Accepted: 14 November 2022;
Published: 30 November 2022.

Edited by:

Ozan Özdenizci, Graz University of Technology, Austria

Reviewed by:

Ye Wang, Mitsubishi Electric Research Laboratories (MERL), United States
David Bethge, Ludwig Maximilian University of Munich, Germany
Xiaoxi Wei, Imperial College London, United Kingdom

Copyright © 2022 Salsabilian and Najafizadeh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Laleh Najafizadeh, laleh.najafizadeh@rutgers.edu
