Relevant Characteristics Analysis Using Natural Language Processing and Machine Learning Based on Phenotypes and T-Cell Subsets in Systemic Lupus Erythematosus Patients With Anxiety

Anxiety is frequently observed in patients with systemic lupus erythematosus (SLE), and the immune system could act as a trigger for anxiety. To identify abnormal T-cell and B-cell subsets in SLE patients with anxiety, patient disease phenotype data were extracted from electronic lupus symptom records using natural language processing. The Hospital Anxiety and Depression Scale (HADS) was used to classify patients, and 107 patients who met the study requirements were selected. Peripheral blood was then collected from the two patient groups for multicolor flow cytometry experiments. The characteristics of 75 T-cell and 15 B-cell subsets were compared between SLE patients with anxiety (n = 23) and without anxiety (n = 84) using four machine learning methods. Thirteen T-cell subsets differed significantly between the two groups. Furthermore, BMI, fatigue, depression, unstable emotions, CD27+CD28+ Th/Treg, CD27−CD28− Th/Treg, CD45RA−CD27− Th, and CD45RA+HLADR+ Th cells may be important characteristics distinguishing SLE patients with and without anxiety. The findings not only point out differences in T-cell subsets between SLE patients with and without anxiety, but also imply that T cells might play an important role in patients with anxiety disorder.


INTRODUCTION
Systemic lupus erythematosus (SLE) is the most common complex autoimmune disease, characterized by chaos in the immune system and by dysfunction and disordered proportions of immune cells (1). Patients with autoimmune diseases appear to be at increased risk of psychotic disorders (2). An epidemiologic study has revealed striking links between several autoimmune diseases and psychosis (3). Compared with patients with other autoimmune diseases, such as primary Sjögren's syndrome (sicca syndrome, PSS), SLE patients are more likely to suffer from anxiety disorders (12 vs. 4%) (4). Anxiety is a common and major stress disorder in SLE patients and can result in a worse prognosis, more serious mental illness, and even suicide (5).
Increasing evidence shows that anxiety in SLE patients may reflect a cognitive dissonance caused by abnormal activation of the immune system (6), and can be characterized by high levels of pro-inflammatory cytokines (7), such as TNF, IFN-γ, IL-10, and IL-6 (8,9). Some cytokines are significantly increased not only in the serum of anxious patients (10,11) but also in SLE patients with anxiety (12,13). However, cytokines are produced by a variety of cells, including T cells, B cells, endothelial cells, and many other immune and non-immune cell types (14,15), so cytokine levels may not directly reflect the relationship between the disordered proportions of immune cells and mental illness. Research in psychoneuroimmunology has demonstrated that the status of the immune system, especially the proportion of immune cells, can influence psychological stress (16), and that immune-brain interactions play vital roles in the initiation and development of psychiatric disorders (17). An increase in CD4+ T cells and damage to the amygdala have been found in anxious mice (14), and a decrease in CD3+ T cells may indicate an improvement of cognition in SLE mice (13). Thus, investigating the associations between T-cell subsets and anxiety in SLE patients could advance the study of the mechanisms underlying psychotic disorders and ongoing inflammatory processes in SLE patients. In addition, we performed a conventional analysis of B-cell subsets.
In this study, we surveyed the distribution of 75 T-cell subpopulations and 15 B-cell subpopulations in 107 SLE patients using flow cytometry and investigated their differences between patients with anxiety (n = 23) and without anxiety (n = 84). Moreover, machine learning methods were used to build models combining clinical information, laboratory indicators, and disease phenotypes in order to select important characteristics distinguishing SLE patients with anxiety (SLE-A group) from SLE patients without anxiety disorders (SLE-NA group). Our results demonstrated that several characteristics, including BMI, fatigue, depression, unstable emotions, CD27+CD28+ Th/Treg, and CD27−CD28− Th/Treg, among others, can be used as objective indexes for identifying SLE patients with anxiety and can compensate for the subjective bias of scale-based surveys.

PATIENTS AND METHODS
The study design and analysis plan are shown in the flow diagram in Figure 1.

Sample Collection
All procedures in this trial, including sample collection, processing, freezing, laboratory analysis, etc., were performed according to the principles of laboratory practice with established standard operating procedures and protocols in the research center of clinical medicine at the Affiliated Hospital of Nantong University. The study, including the human tissue collection, was approved by the Ethics Committee of the Affiliated Hospital of Nantong University (2017-K003), and written informed consent was obtained from all of the participants, according to the Declaration of Helsinki. The study was conducted during 2019-2020.
In total, 107 SLE patients who met the diagnostic criteria of the American College of Rheumatology (v1997, v2012) were enrolled in this study from the Affiliated Hospital of Nantong University, China. The exclusion criteria were: other autoimmune diseases; active infection (including hepatitis B or C virus, Epstein-Barr virus, human immunodeficiency virus, or Mycobacterium tuberculosis infection); other severe mood disorders; a family history of genetic diseases; cognitive impairment or inability to understand the researchers' words; and major personal or family events in the past 2 months. The Hospital Anxiety and Depression Scale (HADS) was used to assess the mental status of the patients (anxiety was diagnosed by a HADS score ≥ 8).

Clinical Information
Clinical information, such as age, gender, body mass index (BMI), and disease course, was collected by interviewing patients. Compared with the Self-Rating Anxiety/Depression Scale and the Beck Depression Inventory (BDI), HADS has the advantages of lower time cost and higher accuracy in assessing mental illness in rheumatic disease (18,19) and has better reliability and validity in SLE patients with anxiety (20). It was also used to assess depression (depression was diagnosed by a HADS score ≥ 8). The Pittsburgh Sleep Quality Index (PSQI) was used to evaluate sleep quality (a score of more than 5 points indicated a sleep disorder). The Multidimensional Fatigue Inventory (MFI-20) was used to evaluate fatigue (the higher the score, the more tired the patient felt). The Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) was used to evaluate SLE activity (a score of more than 4 points indicated active disease). All clinical information is summarized in Table 1.
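The cut-offs described above can be written down as a small rule set. The following is a minimal sketch in Python (the study itself used R); the class and field names are hypothetical, and only the thresholds (HADS ≥ 8, PSQI > 5, SLEDAI > 4) come from the text.

```python
from dataclasses import dataclass


@dataclass
class ScaleScores:
    """Questionnaire scores for one patient (field names are hypothetical)."""
    hads_anxiety: int
    hads_depression: int
    psqi: int
    sledai: int


def classify(scores: ScaleScores) -> dict:
    """Apply the cut-offs stated in the text to one patient's scores."""
    return {
        "anxiety": scores.hads_anxiety >= 8,        # HADS anxiety score >= 8
        "depression": scores.hads_depression >= 8,  # HADS depression score >= 8
        "sleep_disorder": scores.psqi > 5,          # PSQI above 5 points
        "active_disease": scores.sledai > 4,        # SLEDAI above 4 points
    }


# Illustrative values only, not patient data.
example = classify(ScaleScores(hads_anxiety=8, hads_depression=7, psqi=5, sledai=5))
print(example)
```

Note that the HADS cut-offs are inclusive, whereas the PSQI and SLEDAI cut-offs are strict, which matters for patients scoring exactly at the boundary.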

Laboratory Indicators
Laboratory indicators, including routine blood tests, biochemical tests of liver and renal function, complement 3/4, C-reactive protein (CRP), and erythrocyte sedimentation rate (ESR), were recorded. Because not all patients underwent the same examinations, we consulted a specialist and removed most of the indicators with missing values and the indicators considered unimportant. All laboratory indicators are listed in Supplementary Table 1.

Disease Phenotypes
We collated the electronic records whose content corresponded to part of the SLE Symptom Checklist (SCC) (21), which recorded the main phenotypes of the 107 patients, and removed obvious noise that interfered with the word segmentation statistics. Three of these patients had no records. The R package jiebaR, an efficient Chinese word segmentation package for R, was used to segment the electronic phenotype records. The main phenotypes of SLE patients were included in a custom dictionary. We extracted the keywords tagged as nouns or adjectives from each electronic record in each group, counted the keyword frequencies in the two groups, and tested the differences statistically. All disease phenotypes are listed in Table 2.
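The keyword extraction step can be illustrated with a small, self-contained sketch. It is written in Python and uses simple forward maximum matching against a custom dictionary as a stand-in for jiebaR, so it is not the segmentation algorithm used in the study. The dictionary entries, part-of-speech tags, and example records are hypothetical, and counting each keyword once per record is an assumption.

```python
from collections import Counter

# Hypothetical custom dictionary: term -> part-of-speech tag
# ("n" noun, "a" adjective, "v" verb).
CUSTOM_DICT = {
    "乏力": "a", "食欲": "n", "减退": "v",
    "情绪": "n", "不稳": "a", "眼部": "n", "不适": "a",
}
KEEP_TAGS = {"n", "a"}  # keep nouns and adjectives only, as in the text


def segment(text, dictionary):
    """Forward maximum matching: take the longest dictionary term at each position."""
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary term matches.
            if piece in dictionary or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def keyword_counts(records, dictionary):
    """Count, for each kept keyword, the number of records that mention it."""
    counts = Counter()
    for text in records:
        kept = {t for t in segment(text, dictionary) if dictionary.get(t) in KEEP_TAGS}
        counts.update(kept)
    return counts


# Hypothetical records for one group.
counts = keyword_counts(["乏力,情绪不稳", "食欲减退,乏力"], CUSTOM_DICT)
print(counts.most_common())
```

Running the same function on the records of each group gives two frequency tables that can then be compared statistically.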

Flow Cytometry
For each SLE patient, 5-10 ml of peripheral blood was collected in a heparin sodium anticoagulant tube and transported to the laboratory at 4 °C. Peripheral blood mononuclear cells (PBMCs) were isolated by Lymphoprep (Axis-Shield) density gradient centrifugation. The antibodies used to identify immune cell types by flow cytometry (FACS Fortessa, BD, USA) are listed in Supplementary Table 4. Red blood cells were lysed with Red Blood Cell Lysis Buffer (BD, USA). Fixation Buffer (BioLegend, USA) was added to fix the PBMCs, and MACS buffer (1× PBS with 1% FBS and 2.5 ml EDTA; GIBCO, USA) was used to wash out excess reagent throughout the experiment. FlowJo V9 was used for data analysis. All results are provided in Supplementary Tables 2, 3.

Statistical Analysis
All analyses were performed in R unless otherwise stated. Differences between the SLE-A and SLE-NA groups were evaluated with an independent-samples t-test when the samples of both groups followed normal distributions with equal variance; otherwise, the Mann-Whitney U test was used. P < 0.05 was considered statistically significant.
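As an illustration of the non-parametric fallback, the sketch below computes a two-sided Mann-Whitney U test in Python using only the standard library. It relies on the normal approximation with a tie correction and no continuity correction, which is an assumption; the R routine used in the study may compute exact P values for small samples.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided P value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sd_u == 0:  # every value is tied, so there is no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / sd_u
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative values only, not study data.
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_stat, round(p_value, 4))
```

A subset would be flagged as differing between groups when the returned P value is below 0.05.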

Model Construction
We matched clinical information, laboratory indicators, disease phenotypes, and cell subpopulation data to construct five types of original data sets. We then compared the original data sets with data sets rebalanced by random oversampling or by the synthetic minority oversampling technique (SMOTE) (22), which were used to compensate for the imbalance in the number of cases, across several models, including Lasso regression (LR) (23), Random Forest (RF) (22), and XGBoost (24). When applying oversampling and SMOTE, the group sizes were always kept in proportion, with the SLE-NA group at 84 cases (+10 cases) and the SLE-A group at 70 cases (±10 cases). Each data set was subjected to five-fold cross-validation, and the mean of the five areas under the receiver operating characteristic curve (AUC) was used as the criterion to initially screen for the best combination of data set, sampling method, and machine learning model. The data set was then rebalanced with the best sampling method and divided into training and testing data at a ratio of 7:3, and the final model was rebuilt. The top 25 characteristics were selected according to the ranking of characteristic weights. During data processing, missing values in each data set were replaced with the median value. The main R packages are listed in Supplementary Table 5.
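The two preprocessing ideas in this section, median imputation and SMOTE, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R implementation used in the study: the function names are hypothetical, `k = 5` neighbours and Euclidean distance are assumed defaults, and the toy feature values are invented.

```python
import random
from statistics import median


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def smote(minority, target_size, k=5, seed=0):
    """Grow the minority class to target_size rows by interpolating between a
    randomly chosen row and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    rows = [list(r) for r in minority]
    synthetic = []
    while len(rows) + len(synthetic) < target_size:
        i = rng.randrange(len(rows))
        base = rows[i]
        # Indices of the k nearest other minority rows (squared Euclidean distance).
        nearest = sorted(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, rows[j])),
        )[:k]
        mate = rows[rng.choice(nearest)]
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return rows + synthetic


# Toy minority class with 23 rows and two features, grown to 70 rows.
minority = [[float(i), float(i % 5)] for i in range(23)]
balanced = smote(minority, target_size=70)
print(len(balanced), impute_median([1.0, None, 3.0]))
```

Because each synthetic row lies on a line segment between two real minority rows, the generated values stay within the observed range of each feature, unlike random oversampling, which only duplicates existing rows.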

RESULTS
SLE Patients With Anxiety Had Higher BMIs and Associated Symptoms of Depression and Fatigue as Well as Unstable Emotions and Weakness
In this study, 107 SLE patients were recruited during 2019-2020, 23 of whom had anxiety disorders (SLE-A group). The proportion of SLE patients with anxiety was higher than that reported for other diseases (21 vs. 4%, Table 1), which was consistent with a previous study (4). No statistically significant differences were observed between the SLE-A and SLE-NA (84 SLE patients without anxiety) groups in age, gender, education level, marital status, place of residence, SLEDAI score, or laboratory indicators (see Supplementary Table 1). BMI, income, sleep disorders, depression, and fatigue differed significantly between the two groups (see Table 1). The SLE-A group had a higher BMI (P < 0.05) and a higher income than the SLE-NA group, and was more prone to sleep disorders and to psychological problems such as depression and fatigue. Some disease phenotypes were also signs of psychological problems. For example, patients in the SLE-A group more often felt weak, had eye discomfort, had reduced appetite, and showed unstable emotions (see Table 2), which suggests that SLE patients with anxiety may have distinctive external manifestations.

Dramatic Alterations of γδ2T Cells and 12 Th and Treg Cell Subsets in SLE-A Patients
Previous studies demonstrated that dysregulated immune responses and abnormal subpopulations of immune cells play a critical role in the mental complications of SLE patients. We therefore asked whether T-cell subsets were involved in anxiety disorder in SLE patients. The populations of the major T-cell types, such as αβT and γδ1T cells, showed no statistically significant differences between the two groups, with the exception of γδ2T cells (see Supplementary Table 3 and Table 3). Notably, the abundance of CD27+CD28+ and CD45RA+HLADR+ Th cells in the SLE-A group was significantly lower than that in the SLE-NA group, while the proportions of CD27−CD28−, CD45RA−HLADR−, and CD45RA−CD27− Th cells were significantly increased. In addition, the proportion of CD27+CD28+ Treg cells in the SLE-A group was significantly reduced compared with the SLE-NA group. At the same time, four Treg subsets were significantly increased in the SLE-A group: CD27−CD28−, CD45RA−HLADR−, CD45RA−CD27−, and PD1−CD28− Treg cells. Overall, the CD27+CD28+ Th/Treg subpopulations were significantly reduced in the SLE-A group, while the abundance of CD27−CD28− Th/Treg cells was greatly increased (see Table 3). We also investigated the subpopulations of the main B-cell types and found no differences between the two groups (see Supplementary Table 2).

BMI, Fatigue, Depression, Unstable Emotions, CD27+CD28+ Th/Treg, CD27−CD28− Th/Treg, CD45RA−CD27− Th, and CD45RA+HLADR+ Th Cells May Be Important Characteristics of SLE Patients With Anxiety
As described above, 13 T-cell subsets differed significantly between the SLE-A and SLE-NA groups, and we further explored their ability to predict anxiety in SLE patients. Clinical information, laboratory indicators, and disease phenotypes were also considered. We combined the different types of data and used machine learning methods (LR, RF, and XGBoost) with different sampling methods (oversampling and the SMOTE algorithm) to choose the best data set, sampling method, and model. Every data set was evaluated on each model by five-fold cross-validation, and the result was the mean of the five AUC values. XGBoost performed best (AUC = 0.88) on the data set combining cell subsets with clinical information, balanced by oversampling. The same AUC of 0.88 was obtained by XGBoost on the data set combining cell subsets with clinical information, laboratory indicators, and disease phenotypes, balanced by either the SMOTE algorithm or oversampling. These three results were the highest among all models (see Table 4). We thus initially selected the best sampling methods (oversampling and the SMOTE algorithm), the best data sets (cell subsets combined with clinical information, and cell subsets combined with clinical information, laboratory indicators, and disease phenotypes), and the best model (XGBoost). The rebalanced data sets were then no longer evaluated by five-fold cross-validation but were split into training and testing data at a ratio of 7:3. The data set combining cell subsets with basic clinical information, laboratory indicators, and disease phenotypes, balanced by SMOTE, performed best (see Table 5). The AUC of XGBoost was 0.922, which was much higher than that of the models built on cell subsets combined with clinical information and balanced by oversampling (AUC = 0.866) and on cell subsets combined with basic clinical information, laboratory indicators, and disease phenotypes and balanced by oversampling (AUC = 0.815). We selected the top 25 characteristics of the model. Finally, 22 characteristics differed between the SLE-A and SLE-NA groups according to the independent-samples t-test and related tests, and XGBoost analysis further narrowed these 22 characteristics down to 10. BMI, fatigue, depression, unstable emotions, and CD27+CD28+ Th/Treg, CD27−CD28− Th/Treg, CD45RA−CD27− Th, and CD45RA+HLADR+ Th cells may be important characteristics distinguishing the groups (see Table 6).
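The model-screening criterion, the mean AUC over five cross-validation folds, can be sketched in plain Python as follows. The study used R with LR, RF, and XGBoost; here `fit_first_feature` is a hypothetical stand-in for model training, the stratified fold assignment is an assumption (the text does not say how folds were drawn), and the toy data are invented.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def stratified_folds(labels, k, rng):
    """Deal the indices of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def mean_cv_auc(features, labels, fit, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k AUC values."""
    folds = stratified_folds(labels, k, random.Random(seed))
    aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        aucs.append(auc_score([labels[i] for i in held_out],
                              [predict(features[i]) for i in held_out]))
    return sum(aucs) / len(aucs)


def fit_first_feature(train_features, train_labels):
    """Hypothetical stand-in for model training: score a row by its first feature."""
    return lambda row: row[0]


# Toy, perfectly separable data: 10 negative and 10 positive cases.
toy_features = [[float(i)] for i in range(20)]
toy_labels = [0] * 10 + [1] * 10
mean_auc = mean_cv_auc(toy_features, toy_labels, fit_first_feature)
print(mean_auc)
```

The same `auc_score` function can be applied once to the held-out 30% of a 7:3 split to obtain a single test AUC for the final model.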

DISCUSSION
Psychoneuroimmunology research has revealed strong associations between dysfunctions of the immune system and mental disorders (16). It has been suggested that several psychiatric disorders, such as schizophrenia, be classified as autoimmune diseases, based on the associations between the remitting-relapsing phenotype of the illness and the activation-repression of immunological processes (25). Our findings showed that the prevalence of anxiety in SLE patients was 21%, much higher than that in normal adults (10-14%) (26), and that these patients were more likely to have symptoms of fatigue and depression. Anxiety, depression, and fatigue often co-exist as manifestations of each other (27,28). Previous studies have shown that fatigue is strongly associated with anxiety and depression in SLE and RA patients (29-31). Increasing evidence for the role of inflammation in mental illness and for the link between autoimmune diseases and mental illness is helping to expand the field of immunopsychiatry and is having an impact on patient treatment and outcomes. Among patients with depression and anxiety characterized by inflammatory phenotypes, those with autoimmune diseases are at high risk (4,32,33). In particular, SLE patients with anxiety and depression have specific disease phenotypes that involve multiple organs (34). This supports at least one point: in SLE patients with anxiety, depression, fatigue, and anxiety should perhaps be managed not as separate entities but as a whole. Similar to patients with generalized anxiety disorder (35), SLE patients with anxiety could also experience unstable emotions. In addition, we found that almost all SLE patients were women, and the average BMI of the SLE-A group was in the overweight or obese range. Obesity is an important risk factor for anxiety and depression (36), and BMI is one of the predictors of severe anxiety symptoms in women, a result that did not appear in male controls (37).
Interestingly, half of the SLE-NA group had a lower income than the SLE-A group, which was mainly attributable to the smaller number of patients in the SLE-A group and to the fact that some patients did not disclose their actual income. Statistically significant differences in 13 T-cell subsets between the SLE-A and SLE-NA groups were found in our study (see Table 3). Among these immune cells, γδ2T cells are involved in the development of autoimmune diseases, including SLE (38). Our results suggest that γδ2T cells may also be involved in psychiatric disorders. Meanwhile, CD4+CD27+CD28+ Th and Treg cells were significantly reduced in SLE patients with anxiety. Previous studies have reported that the number of CD4+CD27+CD28+ cells in elderly patients with psychosis is lower than that in young patients (39). Our findings suggest that the proportion of these cells might also be involved in the anxiety complications of SLE. Other immune components, such as CD27−CD28−, CD45RA−CD27−, and CD45RA−HLADR− Th cells, might be risk factors for psychiatric disorders associated with autoimmune diseases. The altered cell subsets were concentrated entirely in CD4+ T cells rather than CD8+ T cells, which was similar to the results of a previous study (14).
Our final set of important characteristics comprised BMI, fatigue, depression, unstable emotions, and the proportions of CD27+CD28+ Th/Treg, CD27−CD28− Th/Treg, CD45RA−CD27− Th, and CD45RA+HLADR+ Th cells (see Table 6). An overweight body shape and negative emotions contribute to an extremely poor mental state, which may cause a lack of confidence and increase social fear and psychological burden. When patients are under long-term adverse psychological stress, their bodies remain in a state of systemic low-grade inflammatory activation for a long time (40), which damages the amygdala, the first unit of information processing, resulting in disruption of the blood-brain barrier (BBB) and increased permeability (41). Amygdala injury in SLE patients with mood disorders has also been reported previously (42). The CD27+CD28+ Th/Treg and CD27−CD28− Th/Treg subsets were important characteristics in our results. We considered CD27−CD28− Th/Treg and CD45RA−CD27− Th to be subsets containing effector memory Th/Treg cells and effector Th cells (43-46), while CD27+CD28+ Th/Treg cells are generally considered naive Th/Treg cells, and CD45RA+HLADR+ Th cells were classed as memory stem Th cells. The double-negative subsets, which are at the end of T-cell development, were significantly increased in the SLE-A group, whereas the double-positive subsets, which are at the beginning of T-cell development, were significantly decreased. Effector Treg cells, similar to Th cells, secrete cytokines including IL-10, IFN-γ, IL-6, IL-1β, and IL-35 (8,9). IFN-γ has been reported to correlate positively with anxiety, but not with depression, in SLE patients (12). The levels of cytokines such as IFN-γ, IL-6, IL-1α, and IL-12, but not serum IL-10, were higher in patients with generalized anxiety disorder than in the control group (10,11).
These peripheral cytokine signals continuously stimulate the endothelial cells of the BBB and enter the brain, and excessive or long-lasting inflammatory cytokine activity can disrupt the expression of pro-inflammatory and anti-inflammatory phenotypes in various nerve cells, thus inducing anxiety- and depression-like behavior (47). Therefore, we speculate that long-term adverse psychological stress may lead to BBB disruption in SLE patients with anxiety, and that effector memory Th/Treg cells and effector Th/Treg cells secrete more IFN-γ and IL-6 across the BBB, enhancing the central inflammatory response and thus causing anxiety.

CONCLUSION
Although we used SMOTE to compensate for the imbalance of the data sets and used machine learning to further select important characteristics, the synthetic data, which were generated randomly by a computer algorithm, might deviate somewhat from the actual situation. However, the final results of this study built on a first step in which statistical methods were applied to the original data to identify significant differences, which gives the subsequent results a more specific reference value for other studies. In short, our research made full use of clinical information, laboratory indicators, and disease phenotypes combined with cell subset data to study the relationship between T cells, SLE, and anxiety through machine learning. Our findings indicate that T-cell subsets closely related to SLE (CD27+CD28+ Th/Treg, CD27−CD28− Th/Treg, CD45RA−CD27− Th, and CD45RA+HLADR+ Th) may be involved in anxiety in SLE patients. This finding indirectly supplements the results of previous studies: as in anxious mice, CD4+ T cells may play an equally important role in SLE patients with anxiety (14). BMI, fatigue, depression, and unstable emotions also suggest that SLE patients with anxiety have complex and multiple psychological problems, which should be considered as a whole during subsequent treatment.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the Affiliated Hospital of Nantong University (2017-K003). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
Z-fG, X-mZ, and CD: conception and design. X-xG, YJ, TF, and YY: analysis and interpretation of data. Z-fG: project administration. Z-fG and X-xG: drafting or revising the article critically for important intellectual content. X-xG, YJ, TF, X-mZ, TL, YY, RL, WZ, J-xG, RZ, CD, and Z-fG: final approval of the version to be submitted. TL and YY: supervision. RL, WZ, J-xG, and RZ: validation. X-xG, TL, and YY: software. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We are very grateful to Professor Qiong Zhang for his valuable suggestions, comments, and revision of the manuscript.