Discriminant Subspace Low-Rank Representation Algorithm for Electroencephalography-Based Alzheimer’s Disease Recognition

Alzheimer’s disease (AD) is a chronic progressive neurodegenerative disease that often occurs in the elderly. Electroencephalography (EEG) signals have a strong correlation with neuropsychological test results and brain structural changes, and EEG has therefore become an effective aid to the early diagnosis of AD by capturing abnormal brain activity. Because raw EEG signals have weak amplitude, strong background noise, and high randomness, research on intelligent AD recognition based on machine learning is still in the exploratory stage. This paper proposes the discriminant subspace low-rank representation (DSLRR) algorithm for EEG-based AD and mild cognitive impairment (MCI) recognition. Subspace learning and low-rank representation are flexibly integrated into one feature representation model. On the one hand, based on the low-rank representation, graph discriminant embedding is introduced to constrain the representation coefficients, so that the robust representation coefficients can preserve the local manifold structure of the EEG data. On the other hand, least squares regression, principal component analysis, and global graph embedding are introduced into the subspace learning to make the model more discriminative. The objective function of DSLRR is solved by the inexact augmented Lagrange multiplier method. The experimental results show that the DSLRR algorithm has good classification performance, which is helpful for in-depth research on AD and MCI recognition.


INTRODUCTION
Alzheimer's disease (AD) is a disease characterized by memory loss, slow and gradual changes in brain function, and manifestations of intellectual decline (Zhang et al., 2021). With the advance of global aging, AD has become a major public health problem worldwide. Existing treatments for AD can only temporarily relieve memory and cognitive symptoms; they cannot cure the disease. To obtain disease-controlling treatments, there is an urgent need to classify the course of AD for early diagnosis. Notably, the National Institutes of Health revised the clinical diagnostic criteria for AD, setting out research guidelines for early diagnosis and treatment (Cummings, 2021). The progression of AD is mainly divided into three stages: the first is the early preclinical stage with no symptoms; the second is the intermediate stage with mild cognitive impairment (MCI); and the final is the dementia stage with overt symptoms (Mirzaei and Adeli, 2022).
More researchers are studying methods that can sensitively and conveniently monitor AD, involving cognitive neuropsychological tests, biochemical detection, neuroimaging, and so on. In recent years, electroencephalography (EEG) has become an important tool for studying human brain activity (Ghorbanian et al., 2015). Noninvasive EEG imaging methods are directly related to neural local field potentials and have a high temporal resolution. The millisecond-level temporal resolution and direct electrophysiological information provided by EEG can accurately reflect cognitive behaviors related to human neural activity. Therefore, more studies are beginning to use EEG for the diagnosis and prediction of early AD. For example, EEG spectral studies have revealed that diffuse EEG slow waves are a major feature of AD. EEG studies of AD patients have shown that reduced power in the alpha (8-15 Hz) band and increased power in the delta (0.5-4 Hz) band are significant features of AD (Fröhlich et al., 2021). Increased power in the theta (4-8 Hz) band and decreased power in the beta (15-30 Hz) band have also been shown to be useful for detecting the transition from MCI to AD (Maturana-Candelas et al., 2020). Recently, machine learning technology has been widely used in the analysis of brain imaging data, which has greatly promoted the development of cognitive neuroscience. Most of the research revolves around feature extraction and classifier optimization. In terms of feature extraction, Wen et al. (2020) first converted EEG signals into multispectral images and then used a deep convolutional neural network (CNN) learning model for EEG classification. Similarly, Ieracitano et al. (2019a) rendered the power spectral density of the EEG as a spectrogram and converted EEG signal classification into a CNN-based image classification problem. Ieracitano et al. (2019b) concatenated the continuous wavelet transform features and bispectral features of EEG signals to fuse the two types of features. The advantage of this algorithm is that the fused features achieve higher accuracy than either type of feature alone. The disadvantage is that the correlation between features is not sufficiently considered. At the same time, the dimension of the fused features increases greatly, which can easily cause over-fitting.
In terms of classification algorithms, Miltiadous et al. (2021) compared six classification algorithms for EEG analysis for frontotemporal dementia in AD and verified the effectiveness of these algorithms. This study provided solutions for the early diagnosis of frontotemporal dementia. Anuradha and Jamal (2021) detected the progression of AD by detecting abnormal behavior in EEG. The authors used a feed-forward artificial neural network as a classifier to perform EEG feature analysis on abnormal and normal subjects and obtained a classification accuracy of 94.4%. Ge et al. (2020) exploited the robust biomarkers in EEG, combined linear discriminant analysis as a classifier, and proposed a systematic identification framework based on signal processing and computer-aided techniques for the detection of AD. Araujo et al. (2022) developed an intelligent system that can distinguish various stages of AD through EEG signals. The system used wavelet packet to extract multi-band features of EEG signals and used multiple machine learning methods as classification models.
Electroencephalography signals can reflect the functional state of the brain and the activity of brain physiological structures. The difficulties in classifying EEG signals using machine learning algorithms are as follows. First, the amplitude of EEG signals is usually around 50 µV; the signals are very weak, and their background noise is usually very strong. Second, EEG signals have strong randomness: during acquisition, EEG signals are affected not only by external stimuli but also by interference from the subject's blinking and other movements. Therefore, it is still a challenging task to use machine learning methods to identify AD from EEG signals. To solve this problem, researchers usually reduce the dimension of the high-dimensional EEG data and extract a small amount of the most valuable compact information, which not only saves storage space and processing time but also enables learning a robust model (Lei et al., 2021). Subspace learning and low-rank representation can achieve this goal well. Subspace learning is a well-known dimension reduction method in machine learning. Its main goal is to adopt appropriate strategies to map the high-dimensional original data into a low-dimensional subspace, reducing the data dimension. Low-rank representation (LRR) can effectively separate the noise in EEG signals to recover clean data and obtain an accurate subspace segmentation of the data.
Inspired by the strong theory of subspace learning and low-rank representation, this paper proposes a discriminant subspace low-rank representation (DSLRR) learning algorithm for EEG-based AD recognition. On the one hand, based on the low-rank representation, DSLRR utilizes the supervised information and local manifold information through least squares regression (LSR) and graph discriminant embedding. On the other hand, DSLRR introduces principal component analysis (PCA) and a global structure-preserving constraint into subspace learning. The optimization adopts a strategy of alternating parameter updates using the inexact augmented Lagrange multiplier method. Our contributions are as follows: (1) The DSLRR algorithm combines subspace learning and low-rank representation in a flexible manner. (2) By introducing global graph embedding and a PCA term, the data projection can preserve the global structure information of the EEG data in the discriminant subspace. (3) The learned low-rank representation coefficients can effectively avoid the negative effects of the redundant features and noise in the original data. (4) By introducing LSR and graph discriminant embedding, the learned low-rank representation coefficients explicitly contain the intrinsic local manifold structure and discriminant information of the EEG data. Experiments on four EEG datasets verify that the DSLRR algorithm can be effectively used for the recognition of AD, MCI, and healthy controls (HC).

BACKGROUND

Electroencephalography Dataset for Alzheimer's Disease and Mild Cognitive Impairment Recognition
The EEG data were obtained from 109 participants recruited at the IRCCS Centro Neurolesi Bonino-Pulejo in Italy, including 23 HC, 49 AD, and 37 MCI subjects (Fiscon et al., 2018). The age and gender distributions of the participants are shown in Figure 1. The EEG data were collected from 2012 to 2013. Scalp electrode positions were determined using the international 10-20 system, and EEG data from 19 electrodes were collected. The sampling frequency was 256 or 1,024 Hz, and the acquisition time of the EEG signals was 300 s. To reduce the effect of artifacts, the EEG signals from 60 to 240 s were selected, and all recordings were normalized to a sampling frequency of 256 Hz. Feature extraction adopted the fast Fourier transform, which divided the 180 s of data into six epochs of 30 s and extracted 16 Fourier coefficients per electrode. Therefore, 304 features (19 electrodes × 16 Fourier coefficients) were available for each sample.
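As a rough sketch of this feature-extraction pipeline (the exact windowing, normalization, and coefficient selection used in the original study may differ; the function name is ours), the 19 × 16 = 304-dimensional feature vectors could be computed as:

```python
import numpy as np

def extract_fft_features(eeg, fs=256, epoch_s=30, n_coeffs=16):
    """Split each channel into fixed-length epochs and keep the first
    n_coeffs FFT magnitude coefficients per channel for each epoch."""
    n_channels, n_samples = eeg.shape
    epoch_len = fs * epoch_s
    n_epochs = n_samples // epoch_len
    feats = np.zeros((n_epochs, n_channels * n_coeffs))
    for e in range(n_epochs):
        seg = eeg[:, e * epoch_len:(e + 1) * epoch_len]
        spec = np.abs(np.fft.rfft(seg, axis=1))[:, :n_coeffs]
        feats[e] = spec.ravel()
    return feats

# 19 channels x 180 s at 256 Hz -> 6 epochs, 304 features each
rng = np.random.default_rng(0)
eeg = rng.standard_normal((19, 256 * 180))
F = extract_fft_features(eeg)
print(F.shape)  # (6, 304)
```

Each 30 s epoch then serves as one 304-dimensional sample, which matches the six-epoch, 304-feature layout described above.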

Subspace Learning
We have a labeled dataset with n samples Y = [y_1, ..., y_n] ∈ R^{d×n}, where y_i represents the ith training sample, and its class label matrix is Ȳ = [ȳ_1, ..., ȳ_n] ∈ R^{C×n}. The dimension of each sample is d, and the n samples are divided into C classes. When the dimensionality of the original EEG data is high, the computational and storage costs will be very large. Thus, a common solution is to project the high-dimensional data into a low-dimensional space (Lei et al., 2021). Let Q ∈ R^{d×C} be the projection matrix; the projected data can be represented as

Q^T Y ∈ R^{C×n}. (1)

Generally speaking, the premise of manifold subspace learning is that the data exist in the high-dimensional space in the form of a manifold embedded from a low-dimensional space. The key point of manifold learning is to ensure that the low-dimensional data reflect the inherent structural information contained in the high-dimensional space. As a commonly used manifold learning algorithm, locality-preserving projection (LPP) preserves the local neighbor relationship of the data by using an adjacency graph and affinity matrix (Weng and Shen, 2008). The LPP algorithm consists of three steps.
Step 1 is to construct an adjacency graph. For example, we construct an adjacency graph using the k-nearest neighbor algorithm: each point is connected to its k nearest neighbors, which are known as its neighbor nodes.
Step 2 is to assign weights to each edge. In the adjacency graph, the affinity matrix represents the similarity between sample points, which can generally be calculated using the two-value method, the cosine distance, or a Gaussian kernel function. For example, the affinity matrix E constructed by the two-value method can be defined as follows:

e_ij = 1, if y_j ∈ N_k(y_i) or y_i ∈ N_k(y_j); e_ij = 0, otherwise,

where N_k(y_i) represents the k nearest neighbor nodes of y_i.

Step 3 is to compute the projection. LPP seeks the projection Q that minimizes Σ_{i,j} e_ij ||Q^T y_i − Q^T y_j||²_2, so that samples that are neighbors in the original space remain close after projection; this reduces to a generalized eigenvalue problem.
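Steps 1 and 2 above can be sketched together as follows: a minimal NumPy construction of the symmetrized two-value (0/1) k-nearest-neighbor affinity matrix (the function name is ours; a real implementation would use a more efficient neighbor search):

```python
import numpy as np

def knn_affinity(Y, k=5):
    """Two-value affinity matrix E for LPP: e_ij = 1 when y_j is among
    the k nearest neighbors of y_i, symmetrized by the 'or' rule."""
    n = Y.shape[1]                                  # Y is d x n, samples in columns
    d2 = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)
    E = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]            # skip the point itself
        E[i, idx] = 1
    return np.maximum(E, E.T)                       # symmetrize: OR of neighborhoods

rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 12))                    # 12 samples of dimension 4
E = knn_affinity(Y, k=3)
```

The symmetrization via `np.maximum` implements the "y_j ∈ N_k(y_i) or y_i ∈ N_k(y_j)" rule in the definition above.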

Low-Rank Representation
Low-rank representation aims to exploit the sparsity of matrix singular values to model high-dimensional data drawn from multiple subspaces (Li et al., 2017; Jiang et al., 2021). Given a dataset Y, the LRR algorithm regards the input data itself as a dictionary and uses the bases in the dictionary to linearly represent the sample points while minimizing the rank of the coefficient matrix. The optimization problem of LRR can be described as follows:

min_L rank(L), s.t. Y = YL, (2)

where L ∈ R^{n×n} is the matrix of representation coefficients of Y, which reflects the global correlation between the original data samples.
In theory, the coefficient matrix L obtained by LRR should be a block-diagonal matrix. That is to say, each block corresponds to a subspace, the number of blocks represents the number of data subspaces, and the size of a block corresponds to the dimension of the corresponding subspace. Eq. (2) is not a convex optimization problem because the rank function is discrete. Replacing rank(L) with the nuclear norm, Eq. (2) can be transformed into the convex optimization problem:

min_L ||L||_*, s.t. Y = YL, (3)

where ||·||_* is the nuclear norm (the sum of the singular values). Considering the noise or sparse errors in Y, LRR enhances the model's robustness by modeling a separate error term, and the problem of LRR can be written as:

min_{L,S} ||L||_* + θ||S||_{2,1}, s.t. Y = YL + S, (4)

where S ∈ R^{d×n} is the sparse component of Y and θ is the regularization parameter. Obviously, LRR decomposes the data Y into the low-rank component YL and the sparse component S. The former generally represents the main features contained in Y, and the latter generally represents the redundant features and noise information contained in Y. In the clean-data scenario, S reduces to the reconstruction error. Therefore, L can accurately indicate the subspace segmentation of Y, which ensures the robustness of the learned model. However, LRR ignores the role of local structure information in the data and does not exploit the supervised information in the training data. Therefore, LRR cannot reflect the intra-class similarity and inter-class dissimilarity in the low-rank representation.
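The key fact behind the relaxation from Eq. (2) to Eq. (3) is that the rank of a matrix is the number of nonzero singular values, while the nuclear norm is their sum (the tightest convex surrogate of rank on the unit ball). A small numerical check:

```python
import numpy as np

rng = np.random.default_rng(1)
# A rank-2 matrix built as the product of an 8x2 and a 2x8 factor
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 8))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

rank = int(np.sum(s > 1e-10))            # rank = number of nonzero singular values
nuclear = s.sum()                        # ||A||_* = sum of singular values
print(rank)  # 2
```

Minimizing the sum of singular values (a convex function) instead of counting them is what makes Eq. (3) tractable.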

Discriminant Margin Term on Representation Coefficients
To learn discriminant low-rank representations, we introduce graph discriminant embedding (Huang et al., 2018) into our algorithm, which combines supervised information to define intra-class and inter-class graph affinity matrices. We assume that if two EEG samples are close in the original space, their representation coefficients should also be close to each other. The compactness between samples of the same class and the separability between samples of different classes are important knowledge for discriminant low-rank representations. To this end, we define affinity matrices E^com and E^sep to represent the similarity relationships within a class and between classes, respectively:

e^com_ij = exp(−||y_i − y_j||²_2 / t), if y_j ∈ N̄_k(y_i) or y_i ∈ N̄_k(y_j); e^com_ij = 0, otherwise, (5)

e^sep_ij = exp(−||y_i − y_j||²_2 / t), if y_j ∈ Ñ_k(y_i) or y_i ∈ Ñ_k(y_j); e^sep_ij = 0, otherwise, (6)

where N̄_k(·) and Ñ_k(·) represent the k-nearest neighbor samples of the same class and of different classes, respectively. The parameter t (t > 0) is a weight parameter used to adjust the correlation between two samples. We set t = 1 in this study.
Then we define the discriminant margin term ς_1(L) on the representation coefficients:

ς_1(L) = Tr(L U L^T), (7)

where U = E^com − E^sep + εI and ε is a very small positive constant. Eq. (7) encodes the intra-class compactness and the inter-class dissimilarity in the representation coefficients. Its essence is to exploit the local structure information of the representation coefficients. In addition, Eq. (7) can reduce the influence of the redundant information and noise of the original data.
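The intra-class and inter-class affinities of Eqs (5) and (6) can be sketched as follows, assuming the heat-kernel weighting exp(−||y_i − y_j||²/t) suggested by the role of t above (the function name is ours, and t = 1 as in the paper):

```python
import numpy as np

def class_affinities(Y, labels, k=3, t=1.0):
    """Heat-kernel affinities over intra-class (E_com) and inter-class
    (E_sep) k-nearest neighbors; symmetrized by the 'or' rule."""
    n = Y.shape[1]                                  # Y is d x n
    d2 = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)
    E_com, E_sep = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]                      # exclude the sample itself
        diff = np.where(labels != labels[i])[0]
        for idx, E in ((same, E_com), (diff, E_sep)):
            nn = idx[np.argsort(d2[i, idx])[:k]]    # k nearest within the group
            E[i, nn] = np.exp(-d2[i, nn] / t)
    return np.maximum(E_com, E_com.T), np.maximum(E_sep, E_sep.T)

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 16))                    # 16 samples, 2 classes
labels = np.repeat([0, 1], 8)
E_com, E_sep = class_affinities(Y, labels)
```

By construction E_com connects only same-class pairs and E_sep only different-class pairs, which is what makes U = E^com − E^sep + εI encode the margin between the two neighborhood structures.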

Global Structure Term on Projection
We adopt the affinity matrix E to represent the correlation between two samples using supervised information. The element e_ij in E is computed as:

e_ij = 1, if y_i and y_j are of the same class; e_ij = 0, otherwise. (8)

To preserve the global discriminant information of the original data in the subspace, we introduce the global structure term on the projection:

ς_2(Q) = Σ_{i,j} e_ij ||Q^T y_i − Q^T y_j||²_2 − β Tr(Q^T YY^T Q), (9)

where β is the regularization parameter.
The first component Σ_{i,j} e_ij ||Q^T y_i − Q^T y_j||²_2 in Eq. (9) is the global structure-preserving component on the projection. Obviously, when this component reaches its minimum, samples of the same class will be as close as possible in the projection subspace. The second component Tr(Q^T YY^T Q) in Eq. (9) is the PCA component on the projection. Its goal is to ensure that the projected data in the low-dimensional subspace can depict the inherent structure information contained in the original space.
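The pairwise sum in the first component has the standard graph-Laplacian trace form Σ_{i,j} e_ij ||Q^T y_i − Q^T y_j||²_2 = 2 Tr(Q^T Y (D − E) Y^T Q), with D the diagonal degree matrix of E, which is why the Q-subproblem later admits a closed-form solution. A quick numerical verification of the identity:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, c = 6, 10, 3
Y = rng.standard_normal((d, n))
Q = rng.standard_normal((d, c))
labels = rng.integers(0, 2, n)

E = (labels[:, None] == labels[None, :]).astype(float)  # supervised affinity, Eq. (8)
D = np.diag(E.sum(axis=1))                              # degree matrix
P = Q.T @ Y                                             # projected data

# Left side: explicit pairwise sum
pair = sum(E[i, j] * np.sum((P[:, i] - P[:, j]) ** 2)
           for i in range(n) for j in range(n))
# Right side: Laplacian trace form
trace = 2 * np.trace(Q.T @ Y @ (D - E) @ Y.T @ Q)
print(np.isclose(pair, trace))  # True
```

Because the whole term is quadratic in Q, minimizing it reduces to linear algebra rather than a combinatorial search over pairs.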

Least Squares Regression Term
As an effective supervised learning method, LSR learns a linear projection that transforms the samples to the label space and uses the regression vectors as the data representation in the label space (Zhao et al., 2022). Therefore, we try to find the projection matrix with the help of LSR in the low-rank representation framework. Different from traditional projection methods that operate on the original data, the DSLRR algorithm only uses the clean data representation YL to learn the projection matrix, so that the projection is not affected by the redundant information of the EEG data. This idea is formulated as:

ς_3(Q, L) = γ||Ȳ − Q^T YL||²_F + η||Q||²_F, (10)

where γ and η are regularization parameters. Eq. (10) minimizes the least squares loss between the regression results and the corresponding regression targets. In addition, within the low-rank representation framework, a compact representation of the data can be learned through the subspace projection.
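Considered on its own with the clean representation fixed (X standing in for YL), Eq. (10) is an ordinary ridge-regularized least squares problem with the closed-form solution Q = (XX^T + ηI)^{-1} X Ȳ^T. A sketch (function name and data are ours for illustration):

```python
import numpy as np

def lsr_projection(X, Ybar, eta=0.1):
    """Closed-form solution of min_Q ||Ybar - Q^T X||_F^2 + eta ||Q||_F^2:
    setting the derivative to zero gives (X X^T + eta I) Q = X Ybar^T."""
    d = X.shape[0]
    return np.linalg.solve(X @ X.T + eta * np.eye(d), X @ Ybar.T)

rng = np.random.default_rng(3)
d, n, C = 8, 20, 2
X = rng.standard_normal((d, n))      # stands in for the clean part Y L
labels = rng.integers(0, C, n)
Ybar = np.eye(C)[labels].T           # one-hot label matrix, C x n
Q = lsr_projection(X, Ybar)
print(Q.shape)  # (8, 2)
```

In DSLRR itself, L is also a variable, so this closed form appears only inside the alternating updates rather than as a one-shot solution.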

The Objective Function
We integrate Eqs (7), (9), and (10) into one learning model and obtain the objective function of the DSLRR algorithm:

min_{Q,L,S} ||L||_* + θ||S||_{2,1} + µ Tr(L U L^T) + α(Σ_{i,j} e_ij ||Q^T y_i − Q^T y_j||²_2 − β Tr(Q^T YY^T Q)) + γ||Ȳ − Q^T YL||²_F + η||Q||²_F, s.t. Y = YL + S, (11)

where α and µ are regularization parameters. From Eq. (11), we can see that the DSLRR algorithm combines subspace learning and low-rank representation in one learning model. Based on low-rank representation learning, a compact and discriminant low-rank representation is reinforced by graph discriminant embedding. Based on subspace learning, a discriminant projection is obtained by LSR, global structure preservation, and PCA.

Optimization
There are three unknown variables {Q, L, S} in Eq. (11). To make Eq. (11) separable, a relaxation matrix (denoted J here) is introduced to represent L, and the constraint V = Q^T YL is substituted into Eq. (11). Eq. (11) can then be rewritten as:

min_{Q,L,S,J,V} ||J||_* + θ||S||_{2,1} + µ Tr(L U L^T) + α(Σ_{i,j} e_ij ||Q^T y_i − Q^T y_j||²_2 − β Tr(Q^T YY^T Q)) + γ||Ȳ − V||²_F + η||Q||²_F, s.t. Y = YL + S, J = L, V = Q^T YL. (12)

We optimize the variables with the inexact augmented Lagrange multiplier algorithm in an iterative optimization strategy (Kang et al., 2015). The augmented Lagrangian of Eq. (12) has the following form:

min_{Q,L,S,J,V} ||J||_* + θ||S||_{2,1} + µ Tr(L U L^T) + α(Σ_{i,j} e_ij ||Q^T y_i − Q^T y_j||²_2 − β Tr(Q^T YY^T Q)) + γ||Ȳ − V||²_F + η||Q||²_F + ⟨τ_a, Y − YL − S⟩ + ⟨τ_b, V − Q^T YL⟩ + ⟨τ_c, L − J⟩ + (δ/2)(||Y − YL − S||²_F + ||V − Q^T YL||²_F + ||L − J||²_F), (13)

where δ is a trade-off (penalty) parameter and τ_a, τ_b, and τ_c are the Lagrange multipliers.
1) Optimize Q, while fixing the other variables. Dropping the terms irrelevant to Q, Eq. (13) reduces to a quadratic subproblem in Q (14). Setting its first derivative with respect to Q to zero yields the closed-form solution of Q (15).

2) Optimize the relaxation matrix J, while fixing the other variables. Eq. (13) reduces to:

min_J ||J||_* + (δ/2)||J − (L + (1/δ)τ_c)||²_F. (16)

We use the singular value thresholding operator (Cai et al., 2010; Li et al., 2017) to solve Eq. (16). We apply the singular value decomposition to L + (1/δ)τ_c as L + (1/δ)τ_c = HΣG^T, where Σ is the diagonal matrix whose elements are the singular values {σ_k}, 1 ≤ k ≤ p, and p is the rank. The solution is then computed as J = HΣ_{1/δ}G^T, in which Σ_{1/δ} = diag({σ_k − 1/δ}_+), where "+" means the positive part.

3) Optimize L, while fixing the other variables. Eq. (13) reduces to a quadratic subproblem in L. Setting its first derivative with respect to L to zero gives a linear system, from which we obtain the closed-form solution of L (22).

4) Optimize S, while fixing the other variables. Eq. (13) reduces to:

min_S θ||S||_{2,1} + (δ/2)||S − M||²_F, where M = Y − YL + (1/δ)τ_a.

According to the theory of Liu et al. (2013), the ith column of S is obtained by

s_i = ((||m_i||_2 − θ/δ)/||m_i||_2) m_i, if ||m_i||_2 > θ/δ; s_i = 0, otherwise, (24)

where m_i is the ith column vector of the matrix M.
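The two proximal operators used in steps 2 and 4 are standard and can be sketched as follows (function names are ours; `svt` is the singular value thresholding of Cai et al. (2010), and `l21_shrink` is the column-wise shrinkage of Liu et al. (2013)):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of
    tau * nuclear norm. Soft-thresholds the singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def l21_shrink(M, tau):
    """Column-wise shrinkage: the proximal operator of tau * ||.||_{2,1}.
    Columns with norm <= tau are zeroed; the rest are scaled down."""
    out = np.zeros_like(M)
    norms = np.linalg.norm(M, axis=0)
    keep = norms > tau
    out[:, keep] = M[:, keep] * (1 - tau / norms[keep])
    return out
```

In the updates above, step 2 computes J = svt(L + τ_c/δ, 1/δ) and step 4 computes S = l21_shrink(Y − YL + τ_a/δ, θ/δ).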

Testing
Given test EEG data Y_test, we first compute its low-rank representation L_test using Eq. (11), while setting the parameters γ = 0, α = 0, and µ = 0. Second, we construct the new training set YL and the new test set Y_test L_test. Third, we train a classifier on the training set YL and use it to predict the labels of Y_test L_test. In this study, we used the nearest neighbor (NN) algorithm as the classifier. The whole training and testing procedure for EEG data recognition is summarized in Algorithm 1.

Algorithm 1. The training and testing procedure of DSLRR.
// Train the DSLRR model
Repeat
Optimize Q using Eq. (15) with J, L, and S fixed;
Optimize J using Eq. (16) with Q, L, and S fixed;
Optimize L using Eq. (22) with J, Q, and S fixed;
Optimize S using Eq. (24) with J, L, and Q fixed;
Update the Lagrange multipliers;
Until Eq. (13) converges
// Compute the low-rank representation of the test data
Compute L_test by solving Eq. (11) with γ = 0, α = 0, and µ = 0;
Obtain the new test data Y_test L_test;
// Train a classifier and predict the class label
Train a classifier using the training data YL (such as the NN classifier or a support vector machine);
Test and output the class labels of Y_test L_test using the trained classifier.
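The final classification step, 1-nearest-neighbor on the low-rank features, can be sketched as follows (the function name is ours; `train_feats` stands in for YL and `test_feats` for Y_test L_test, both with samples in columns):

```python
import numpy as np

def nn_classify(train_feats, train_labels, test_feats):
    """1-nearest-neighbor: assign each test column the label of the
    closest training column under squared Euclidean distance."""
    d2 = ((test_feats[:, :, None] - train_feats[:, None, :]) ** 2).sum(axis=0)
    return train_labels[np.argmin(d2, axis=1)]

# Two well-separated classes in 2-D (columns are samples)
train = np.array([[0.0, 0.0, 5.0, 5.0],
                  [0.0, 1.0, 5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
test = np.array([[0.2, 4.8],
                 [0.1, 5.2]])
pred = nn_classify(train, labels, test)
print(pred)  # [0 1]
```

Any other classifier (e.g., a support vector machine, as the algorithm notes) could replace this step, since DSLRR only changes the features, not the classifier.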
Due to the limited number of training EEG samples, we expand the EEG data with a data augmentation strategy. The numbers of EEG samples in HC, MCI, and AD are 69, 74, and 98, respectively. In this section, experiments are conducted on four EEG datasets for AD and MCI recognition, namely, (1) AD & HC, (2) MCI & HC, (3) HC & (MCI+AD), and (4) MCI & AD. The ratio of the two classes of samples is 1:1. We randomly select 50 samples from each class for model training, and the remaining samples are used for testing. We perform our experiments 10 times and record the classification performance in terms of accuracy, sensitivity, specificity, precision, F-measure, G-mean, and Jaccard index. All experiments are conducted in MATLAB on a Windows machine.

Classification Results
The classification results on the four EEG datasets are reported in Tables 1-4, where the best results are highlighted in bold. All four datasets are binary classification problems. According to the results in Tables 1-4, we can see that: (1) Patients with AD have shown clinical symptoms, and the EEG differentiation between AD patients and healthy people is the most significant, so the differences between their EEG features are the most obvious. Therefore, the classification performance on the AD & HC dataset is high. Although the symptoms of MCI are not as significant as those of AD, MCI converts to AD with a certain probability. The difference between MCI EEG features and those of healthy people is also significant, so the classification performance on the MCI & HC dataset is also high. In addition, AD and MCI are mixed into one class in the third dataset, HC & (AD+MCI), which is significantly distinguishable from healthy EEG signals; therefore, its classification performance is expectedly high. The classification accuracy of the DSLRR algorithm is 97.74% on AD & HC, 95.61% on MCI & HC, and 98.42% on HC & (MCI+AD). The classification accuracy on these three datasets is above 95%. The experimental results illustrate that DSLRR can effectively identify MCI and AD from HC.
(2) Compared with the first three datasets, the difference between the EEG features of MCI and AD is relatively small. Therefore, the classification performance of every algorithm decreases to a certain extent on the MCI & AD dataset. However, we can see that the DSLRR algorithm still achieves the best values of accuracy, F-measure, G-mean, and Jaccard index. On the one hand, through the joint learning of subspace and low-rank representation, the DSLRR algorithm can learn a robust and discriminant projection subspace. On the other hand, by making full use of the Laplacian manifold and LSR technologies, the DSLRR algorithm can exploit the structure knowledge and manifold structure information of EEG signals. Furthermore, the property that the columns of the low-rank coefficient matrix L sum to 1 has a positive effect on the classification.
(3) The LRR algorithm can describe the correlation structure of the data, and its coefficient matrix is low-rank. However, this algorithm does not consider the local structural characteristics of the data and often cannot effectively exploit the discriminant information in the data. In this case, the LRR algorithm is not directly applicable to EEG classification for AD recognition. The JSLC algorithm achieves good results on the four datasets. JSLC is a low-rank representation model based on dictionary learning, which integrates the discriminant information of samples into dictionary learning and can also eliminate the influence of noise on the classification model. This result shows that joint learning of low-rank representation and subspace learning is an effective means of solving EEG classification. The NRLRL algorithm conducts low-rank learning in the original data space. Its classification performance is lower than that of DSLRR on all four datasets, which further shows that more data dimensions do not necessarily improve model performance. Due to the redundant information and noise in EEG data, it is effective to obtain a compact and discriminant feature representation through subspace learning and low-rank representation.

Ablation Experiment
The DSLRR algorithm integrates the discriminant margin term, the global structure term, and the LSR term on the basis of the LRR algorithm. To verify the role of these terms, we performed ablation experiments on the four EEG datasets. The purpose of the discriminant margin term is to use supervised information to establish a graph embedding and thus improve the distinguishing ability of the model. To verify its effect, we remove this term from Eq. (11) by setting the parameter µ = 0. The purpose of the global structure term is to preserve the structure information of the data in the subspace. To verify its effect, we remove this term from Eq. (11) by setting the parameter α = 0. The purpose of the LSR term is to use the least squares constraint to exploit the discriminant information in the data. Similarly, to verify its effect, we remove this term from Eq. (11) by setting the parameter γ = 0. The classification accuracy, F-measure, and G-mean of DSLRR in the ablation experiments on the four EEG datasets are shown in Figures 2-4, respectively. From the results in Figure 2, we can see that if any one of the three terms is removed from Eq. (11), the classification accuracy on the four EEG datasets decreases to varying degrees. This is because each term makes a corresponding contribution to the EEG classification task, which also illustrates the necessity of the coexistence of the three terms from another perspective. The results in Figures 3 and 4 further verify this conclusion. Therefore, the lack of any term degrades the performance of the DSLRR algorithm.

Parameter Analysis
To show the convergence of the DSLRR algorithm, we plot its convergence curves in Figure 5. As shown in Figure 5, the DSLRR algorithm converges within a few iterations on all four EEG datasets. The results show that the running time of the DSLRR algorithm is acceptable, which indicates its high practical value. We plot the classification accuracy of the DSLRR algorithm with different numbers of nearest neighbors k in Figure 6. Figure 6 visually shows that the classification is mildly sensitive to k. The DSLRR algorithm achieves good classification accuracy when the parameter k is in {5, 7, 9}. When k < 5 or k > 9, the classification accuracy is slightly lower. Therefore, we fix k = 7 in the experiments.

CONCLUSION
With the emergence of global aging, the prediction and diagnosis of AD have attracted extensive attention. In recent years, EEG technology has been developed and has become an important means to detect abnormal brain activity in patients with AD. To realize the early diagnosis of AD, we propose the DSLRR learning algorithm. The DSLRR algorithm inherits the advantages of low-rank representation, removes redundant information and noise, and improves the discriminant ability of low-rank representation through graph discriminant embedding. Meanwhile, based on subspace learning, the DSLRR algorithm introduces LSR and global structure preserving constraints to further improve the discriminative ability of the model. Extensive experimental results on real EEG data verify the effectiveness of the DSLRR algorithm.
In the future, we will continue to explore our work in the following aspects. First, the DSLRR algorithm is essentially a linear learning method. The brain is a nonlinear system with the ability of self-adaptation and self-regulation. Under some internal or external stimuli, the regulation functions of biological tissue inevitably affect the electrophysiological signals, so that neurons exhibit chaotic discharge phenomena, which present nonlinear characteristics. This limits the performance of the DSLRR algorithm on complex EEG data. To this end, we will consider introducing a nonlinear learning model to improve the stability and accuracy of the DSLRR algorithm, so that it is better suited to various complex application scenarios. Second, the DSLRR algorithm is designed for EEG classification using single-feature information. At present, feature processing and feature extraction technologies are mature, and the obtainable feature information is correspondingly more diverse. In the next stage, we will extend the proposed algorithm to multi-feature scenarios to form a richer AD recognition system. Third, with the popularization of EEG acquisition equipment, using existing labeled samples to analyze unlabeled samples from multiple domains is a difficult problem in EEG-based AD recognition. We will use transfer learning technology to extend our algorithm in the future, to further enhance its generalization ability.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. The EEG dataset analyzed in this study can be found in: https://github.com/tsyoshihara/Alzheimer-s-Classification-EEG.

AUTHOR CONTRIBUTIONS
TT, XG, and JX conceived and developed the model and wrote the manuscript. HL and GZ ran the experiment and analyzed the results. All authors read, edited, and approved the manuscript.

FUNDING
This work was supported in part by the Science and Technology Project of Changzhou city under grant CE20215032.