A predictive model for disease severity among COVID-19 elderly patients based on IgG subtypes and machine learning

Objective Due to the increased likelihood of progression to severe pneumonia, the mortality rate of elderly patients infected with coronavirus disease 2019 (COVID-19) is high. However, there is a lack of models based on immunoglobulin G (IgG) subtypes to forecast the severity of COVID-19 in elderly individuals. The objective of this study was to create and verify a new algorithm for identifying elderly individuals with severe COVID-19.
Methods In this retrospective study, laboratory data were gathered from 103 individuals with confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. These individuals were randomly split into a training cohort (80%) and a testing cohort (20%). Furthermore, 22 elderly COVID-19 patients from two other centers formed an external validation cohort. Differential indicators were analyzed through univariate analysis, and variable selection was performed using least absolute shrinkage and selection operator (LASSO) regression. The severity of COVID-19 in elderly patients was predicted using five machine learning algorithms. The area under the curve (AUC) was used to evaluate the performance of these models. Calibration curves, decision curve analysis (DCA), and Shapley additive explanations (SHAP) plots were used to interpret and evaluate the model.
Results The logistic regression model, with four principal variables that could predict the probability of COVID-19 severity, was chosen as the best machine learning model. In the training cohort, the model achieved an AUC of 0.889, while in the testing cohort, it obtained an AUC of 0.824. The calibration curve demonstrated excellent consistency between actual and predicted probabilities. According to the DCA curve, the model provided significant clinical advantages. Moreover, the model performed effectively in the external validation cohort (AUC=0.74).
Conclusion The present study developed a model that can distinguish between severe and non-severe patients of COVID-19 in the elderly, which might assist clinical doctors in evaluating the severity of COVID-19 and reducing the bad outcomes of elderly patients.


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus has given rise to a worldwide pandemic known as coronavirus disease 2019 (COVID-19). The trends of the pandemic vary among different countries and regions. Clinical experience has shown that COVID-19 is a highly heterogeneous disease, with clinical severity ranging from asymptomatic and mild disease to severe pneumonia, acute respiratory distress syndrome (ARDS), and even death (1,2). The first report of SARS-CoV-2 infections in the population was from China. Initial findings from China suggested that older age is associated with a higher likelihood of contracting and suffering from COVID-19. Immunological senescence and inflammation play an important role in making older patients more prone to severe outcomes of COVID-19 (1,3).
IgG antibodies, also known as immunoglobulin G, offer a prominent means of protection against contagious illnesses. Antigen-IgG immune complexes can be formed when IgG antibodies bind directly to pathogens. During an infection, the inflammatory response is directed by these immune complexes. Following viral infection, IgG-mediated effector control is initiated as reactive antibodies bind to viral particles (4). Chakraborty et al. (5) found that a greater number of individuals with severe COVID-19 have increased levels of particular pro-inflammatory antibody variants. These variants are identified by the presence of IgG 3 and IgG 1 antibodies with F0N0 glycoform modification.
Data mining algorithms and predictive analysis form the theoretical core of machine learning, which identifies features in data, builds models from them, and then applies these models to new data to make predictions (6). Machine learning (ML) is of great value in medical research, and a number of studies have used it as a tool to predict COVID-19 (7-10). Nevertheless, some studies require medical imaging such as CT and X-ray, the parameters are relatively complex, and exposure to ionizing radiation is unavoidable (11,12). In addition, there is currently a lack of prediction models that consider IgG subtypes in COVID-19 patients, with the majority of existing models concentrating on the severity of the disease in ordinary individuals rather than the elderly population (13,14). Table 1 summarizes recent work on COVID-19 using machine learning algorithms.
As age increases, the probability of infection and the mortality rate of COVID-19 also increase. The elderly are particularly vulnerable to COVID-19 infection due to their weakened immune systems and the presence of other chronic diseases such as hypertension and diabetes. The research question of this study thus highlights the therapeutic significance of early identification of elderly people at risk of COVID-19-related death. Because the immune response has such a large influence, it is also important to investigate immunological antibodies to distinguish between non-severe and severe COVID-19 cases and to provide tailored treatment approaches.
Therefore, this study developed a model utilizing IgG subtypes and machine learning to help clinicians distinguish the severity of COVID-19 in elderly patients.
2. This study focuses on elderly patients over the age of 60 rather than ordinary individuals.
3. In this study, five machine learning algorithms are compared to predict the severity of elderly COVID-19 patients, and the logistic regression model demonstrated the highest prediction performance among them.
The structure of this research has been organized as follows. Section 2 shows the methods including patient involvement and dataset selection. Section 3 presents screening variables and optimal machine learning models to predict the severity of COVID-19 in elderly patients. Section 4 discusses the results. Section 5 discusses the limitations. Section 6 summarizes the article and the prospect of the next step.

Ethics statement
This study was approved by the Ethics Committee of the First Affiliated Hospital of Zhejiang Chinese Medical University with approval number 2023-KLS-034-01.

Patient involvement
According to the standards of the China Novel Coronavirus Infection Diagnosis and Treatment Program (Trial 10th Edition) and the clinicians' diagnoses (15), we searched for patients with non-severe and severe COVID-19 (age ≥60 years) diagnosed from 1 to 16 January 2023 in Zhejiang Provincial Hospital of Chinese Medicine (Hubin). The elderly patients with COVID-19 were divided into two groups, namely, non-severe and severe groups. In total, 41 cases classified as non-severe and 62 cases classified as severe were included in the study. Patients in the severe group progressed to severe or critical COVID-19 or died of pneumonia-related causes while hospitalized, whereas patients in the non-severe group remained in non-severe states (mild or moderate COVID-19) while hospitalized. Furthermore, 22 cases from Zhejiang Provincial Hospital of Chinese Medicine (Qiantang and Xixi) were collected from 1 to 16 January 2023 as an external validation cohort.
Mild cases mainly manifested as respiratory tract infection, with symptoms such as dry throat, sore throat, cough, and fever. In moderate pneumonia, imaging findings show characteristics of COVID-19 pneumonia, and abnormal clinical symptoms can be observed. Patients are determined to have severe pneumonia if they meet any of the following criteria: (1) a notable rise in respiration rate, with RR ≥30/min; (2) oxygen saturation of 93% or lower while at rest; (3) a PaO2/FiO2 ratio of 300 mmHg or less (1 mmHg = 0.133 kPa); and (4) significant advancement of pulmonary lesions by more than 50% within 24-48 h, as observed through pulmonary imaging. Critical pneumonia occurs when the disease progresses rapidly with any of the following criteria: (1) respiratory insufficiency requiring mechanical ventilation, (2) shock, and (3) organ failure requiring monitoring in the ICU setting. The exclusion criterion was other viral pneumonia.
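The "any of the following" rule for severe pneumonia can be written down as a short check. The sketch below is only an illustration of the four thresholds listed above; the class name, field names, and example values are hypothetical and are not taken from the study's data or software.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Measurements used by the severe-pneumonia criteria (field names are hypothetical)."""
    respiratory_rate: float         # breaths per minute
    resting_spo2: float             # oxygen saturation at rest, percent
    pao2_fio2: float                # PaO2/FiO2 ratio, mmHg
    lesion_progression_pct: float   # growth of pulmonary lesions within 24-48 h, percent


def severe_criteria_met(a: Assessment) -> list[str]:
    """Return the severe-pneumonia criteria that this assessment satisfies."""
    met = []
    if a.respiratory_rate >= 30:
        met.append("RR >= 30/min")
    if a.resting_spo2 <= 93:
        met.append("resting SpO2 <= 93%")
    if a.pao2_fio2 <= 300:
        met.append("PaO2/FiO2 <= 300 mmHg")
    if a.lesion_progression_pct > 50:
        met.append("lesion progression > 50% in 24-48 h")
    return met


def is_severe(a: Assessment) -> bool:
    """A patient counts as severe if at least one criterion is met."""
    return len(severe_criteria_met(a)) > 0


# Hypothetical example patients, for illustration only.
stable = Assessment(respiratory_rate=20, resting_spo2=97, pao2_fio2=380, lesion_progression_pct=10)
hypoxic = Assessment(respiratory_rate=24, resting_spo2=92, pao2_fio2=310, lesion_progression_pct=20)
print(is_severe(stable), is_severe(hypoxic))
```

Returning the list of satisfied criteria, rather than only a flag, keeps the reason for a classification visible, which matters when group assignment also depends on the clinicians' diagnoses.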

Data collection
Detailed information on the baseline population characteristics (age, gender, and comorbidities) and clinical laboratory data of these patients was gathered from their electronic medical records. Laboratory data include routine blood examinations, C-reactive protein, coagulation indicators, cytokines, and IgG subtypes. After enrollment, 103 elderly individuals were randomly assigned to the training cohort (80%) and the testing cohort (20%). By fixing the random seed (random seed=1), the present study ensures that the random allocation is repeatable, allowing the research results to be reproduced when needed. The best model hyperparameters were selected by grid search with fivefold cross-validation. In the fivefold cross-validation, the dataset was split into five parts of approximately equal size; in each round, one of the five parts was used for testing and the remaining four parts for training, and the process was repeated five times so that each part served once as the test set. The models were constructed in the training cohort using laboratory tests and machine learning techniques and subsequently verified in the testing cohort. The final selected optimal model was then validated in the external validation cohort.
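The allocation and cross-validation scheme described above can be sketched with the Python standard library alone. This is a minimal illustration, not the platform's actual procedure: the function names are hypothetical, the split is not stratified by severity, and a seed of 1 here will not reproduce the study's exact partition because the random number generator differs.

```python
import random


def split_train_test(n_samples, test_fraction=0.2, seed=1):
    """Shuffle sample indices with a fixed seed and hold out a test fraction."""
    rng = random.Random(seed)          # fixed seed makes the shuffle repeatable
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (training, held_out) index lists; each fold is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]   # k parts of near-equal size
    for i in range(k):
        held_out = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out


train_idx, test_idx = split_train_test(103)
splits = list(k_fold(train_idx, k=5))
print(len(train_idx), len(test_idx), [len(held_out) for _, held_out in splits])
```

In a grid search, each candidate hyperparameter setting would be scored on the five held-out folds and the setting with the best average score retained; the testing cohort stays untouched until the final model is chosen.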

Statistical analysis
Analyses were performed utilizing SPSS 26.0 and R 4.3.1 software. Frequencies and percentages were used to present categorical variables, while mean ± standard deviation or median and interquartile range (IQR) were used for continuous variables. The χ² test was used to analyze count data, while the independent samples t-test or Wilcoxon test was used to analyze measurement data.
Significant differences between severe and non-severe groups were identified through a univariate analysis, followed by the utilization of least absolute shrinkage and selection operator (LASSO) regression to select the factors associated with COVID-19 severity. Using a fixed random seed, we selected 80% of the patients for deriving the optimal model (training cohort), whereas the other 20% of patients were allocated to the validation cohort. Subsequently, the present study established predictive models using meaningful factors identified through LASSO regression. In both the training and validation cohorts, calibration plots were utilized to graphically evaluate calibration, while a receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were employed to assess discrimination. The interpretation of the feature ranking was done using Shapley additive explanations (SHAP) plots. Statistical significance was defined as a p-value <0.05.

Machine learning
For the development of an ML-based algorithm, we used the Deepwise & Beckman Coulter DxAI platform, an online statistics tool. The platform can automatically select machine learning models, display the analysis data, and generate an online analysis page.

Demographic characteristics
The present study first compared IgG subtypes between elderly COVID-19 patients and healthy individuals 60 years of age and older. As can be seen in Table 2, there were significant differences in four subtypes of IgG between the two groups (p<0.05).
In order to conduct a more in-depth investigation, this study explored the distribution of IgG subtypes among elderly COVID-19 patients, distinguishing between those with severe symptoms and those with non-severe symptoms. The demographic characteristics of these patients are summarized in Table 3. This study included 41 patients (39.81%) classified as non-severe and 62 (60.19%) classified as severe. There were 43 men (69.35%) and 19 women (30.65%) in the severe group, while there were 22 men (53.66%) and 19 women (46.34%) in the non-severe group. As shown in Table 3, there was no statistically significant difference in gender between the non-severe and severe groups (p=0.106), so the groups were comparable in this respect. In terms of age, the severe group had a significantly higher median age than the non-severe group (median, 84.50: 75.00; p<0.001). The proportion of severe cases was higher among men than among women (43/65 vs. 19/38), although this difference did not reach statistical significance. This trend is in line with the findings reported by Jin et al. (16), who described worse outcomes and deaths in men with COVID-19. The most prevalent comorbidity among severe patients was hypertension (66.13%), followed by diabetes (32.26%). Additionally, coronary heart disease, anemia, tumors, and COPD were present in 20.97%, 16.13%, 12.90%, and 9.68% of severe patients, respectively.
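The gender comparison can be checked directly from the counts given above. The sketch below computes a Pearson χ² test on the 2×2 table of gender by severity; it assumes no continuity correction was applied (the source does not say), and the function name is hypothetical. With one degree of freedom the p-value has a closed form through the complementary error function, so no statistics library is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: severe, non-severe; columns: men, women (counts from the text).
chi2, p_value = chi_square_2x2(43, 19, 22, 19)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under this assumption the result agrees with the reported p=0.106, which supports reading the gender difference as not statistically significant.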

Comparison of biomarkers between nonsevere and severe COVID-19 patients
When comparing biomarkers between the two groups, the present study included each IgG subtype, formed pairwise ratios between the subtypes, and also compared each subtype with IgG Sum, yielding several new indicators. As shown in Table 4, except for IgG 1/IgG 4, LY #, and HGB, the severe COVID-19 group exhibited significantly elevated levels of IL-2, IL-6, IgG 2/IgG 1, IgG Sum/IgG 1, IgG 2/IgG Sum, CRP, PT, INR, DD, WBC, NE #, NLR, RDW, and PDW in comparison to the non-severe COVID-19 group (p<0.05).
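The derived indicators can be illustrated with a small helper that builds every pairwise ratio and every ratio against IgG Sum. This is a sketch under two stated assumptions: IgG Sum is taken to be the sum of the four subtypes (the source does not define it), and the input values are hypothetical rather than patient data. The function name is also hypothetical.

```python
from itertools import permutations


def igg_ratio_features(subtypes):
    """Build pairwise subtype ratios and ratios against IgG Sum.

    `subtypes` maps a subtype name to its measured level.
    Assumption: IgG Sum is the sum of the supplied subtypes.
    """
    total = sum(subtypes.values())
    features = {"IgG Sum": total}
    # Ordered pairs, so both IgG 2/IgG 1 and IgG 1/IgG 2 are produced.
    for numerator, denominator in permutations(subtypes, 2):
        features[f"{numerator}/{denominator}"] = subtypes[numerator] / subtypes[denominator]
    for name, value in subtypes.items():
        features[f"IgG Sum/{name}"] = total / value
        features[f"{name}/IgG Sum"] = value / total
    return features


# Hypothetical subtype levels, for illustration only.
example = {"IgG 1": 6000.0, "IgG 2": 3000.0, "IgG 3": 250.0, "IgG 4": 500.0}
features = igg_ratio_features(example)
print(features["IgG Sum/IgG 1"], features["IgG 2/IgG 1"])
```

Because a ratio and its reciprocal carry the same information, a feature-screening step would normally keep only one of each pair.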

The correlation between biomarkers and COVID-19 severity in two groups
The present study collected 46 features from elderly individuals diagnosed with COVID-19, and after excluding unrelated and redundant features, 18 features were retained for LASSO regression analysis. To screen for factors associated with the severity of COVID-19, an analysis using LASSO regression was conducted. The results of 103 elderly patients showed that age, IL-2, IgG Sum/IgG 1, DD, LY #, NLR, and PDW were considered to be relevant factors affecting the severity of COVID-19 (Figure 1). Additionally, the present study generated correlation heatmaps and determined feature importance using the correlation factors chosen through LASSO regression.
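According to the Figure 1 caption, the LASSO tuning parameter was chosen by the 1-SE standard under 10-fold cross-validation. The sketch below illustrates that rule as it is commonly defined: take the largest lambda whose mean cross-validated error lies within one standard error of the minimum. The function name and the error values are hypothetical; the study's actual cross-validation output is not reproduced here.

```python
def select_lambda(lambdas, cv_mean_error, cv_std_error):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean CV error; lambda_1se is the largest
    lambda whose mean CV error is within one standard error of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean_error[i])
    threshold = cv_mean_error[best] + cv_std_error[best]
    within_one_se = [lam for lam, err in zip(lambdas, cv_mean_error) if err <= threshold]
    return lambdas[best], max(within_one_se)


# Hypothetical cross-validation results, for illustration only.
lambdas = [0.005, 0.01, 0.02, 0.04, 0.08, 0.16]
cv_mean_error = [1.10, 1.05, 1.02, 1.04, 1.09, 1.30]
cv_std_error = [0.05, 0.05, 0.05, 0.05, 0.06, 0.07]

lambda_min, lambda_1se = select_lambda(lambdas, cv_mean_error, cv_std_error)
print(lambda_min, lambda_1se)
```

A larger lambda shrinks more coefficients to zero, so the 1-SE choice trades a small loss in cross-validated fit for a sparser set of predictors.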

Areas under ROC
In Figure 2, the ROC curves and AUC are depicted, representing various biomarkers with significant differences between the two groups in predicting severe COVID-19 in elderly patients. Among them, NLR was the most efficient (AUC=0.790), followed by DD and LY # (AUC=0.760).
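For a single biomarker, the AUC equals the probability that a randomly chosen severe patient has a higher value than a randomly chosen non-severe patient, with ties counted as one half. The sketch below computes it that way; the function name and the NLR values are hypothetical and serve only to show the calculation.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical NLR values; label 1 = severe, 0 = non-severe.
nlr = [12.1, 8.4, 3.2, 9.9, 2.5, 7.0, 6.0, 3.9]
severity = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc_from_scores(nlr, severity))
```

For a marker that falls in severe disease, such as LY #, the values would be negated before scoring so that a higher score still points toward the severe group.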

Correlation heatmaps and feature importance of biomarkers
After analyzing the importance of various features, the present study ultimately selected four indicators based on the number of

Comparison of machine learning algorithms and identification of the optimal model
The AUCs of five machine learning algorithms for fivefold cross-validation on the training cohort are shown in Table 5. In the testing cohort, the five machine learning algorithms achieved AUCs of 0.735 for eXtreme gradient boosting (XGBoost), 0.866 for logistic regression, 0.781 for random forest, 0.812 for adaptive boosting (AdaBoost), and 0.856 for support vector machines (SVMs). The logistic regression model demonstrated the highest prediction performance among these models.

Analysis and assessment of machine learning model
On the basis of the results shown in Table 6 and Figure 4, it can be observed that the logistic regression model exhibited a strong discriminatory ability in distinguishing between the two groups. In the testing cohort, the model demonstrated AUC, accuracy, specificity, and positive predictive value exceeding 80% (Figures 4A, B). Moreover, the calibration curve demonstrated close agreement between actual and predicted probabilities, indicating excellent calibration of the model. According to Figures 4C, D, the DCA curve indicated a strong clinical benefit of the model. Figure 5A shows the relationship between the observed values of the four most relevant features that we selected and the SHAP values. As shown in Figure 5B, the interpretation of feature ranking for the logistic regression model, as per the SHAP algorithm, indicates that age, DD, IL-2, and IgG Sum/IgG 1 were the most influential characteristics for predicting outcomes of elderly patients. The greater the mean absolute Shapley value of a feature, the greater the importance of that clinical feature for the model prediction. Using SHAP force plots, the study can visualize the Shapley value for each feature as a force that increases (positive) or decreases (negative) the prediction relative to its baseline value. Figure 5 shows the individual force plots for a severe patient with COVID-19 (Figure 5C) and a non-severe patient with COVID-19 (Figure 5D). The predicted probability for the severe patient was 0.759. Features with positive contributions, shown in red, push the model score up, while features with negative contributions, shown in blue, push the model score down. The length of the arrow helps to visualize the extent of the impact on the prediction. The longer the arrow, the greater the impact on the prediction of COVID-19 severity.
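For a logistic regression model, Shapley values have a simple closed form in log-odds space when features are treated as independent: each feature's contribution is its coefficient times the difference between the patient's value and the background mean. The sketch below illustrates this additive decomposition, which is what a force plot draws. All coefficients, the intercept, the background means, and the patient values are hypothetical; they are not the fitted model from this study.

```python
import math

# Hypothetical model parameters and background means, for illustration only.
coefficients = {"age": 0.08, "DD": 0.30, "IL-2": 0.50, "IgG Sum/IgG 1": 1.20}
intercept = -9.0
background_means = {"age": 78.0, "DD": 1.5, "IL-2": 1.2, "IgG Sum/IgG 1": 1.6}


def log_odds(x):
    """Linear predictor of the logistic regression model."""
    return intercept + sum(coefficients[name] * x[name] for name in coefficients)


def linear_shap(x):
    """Per-feature contribution in log-odds: coefficient * (value - background mean)."""
    return {name: coefficients[name] * (x[name] - background_means[name]) for name in coefficients}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


patient = {"age": 86.0, "DD": 3.0, "IL-2": 2.0, "IgG Sum/IgG 1": 1.9}
base_value = log_odds(background_means)      # model output for an "average" patient
contributions = linear_shap(patient)
predicted_probability = sigmoid(base_value + sum(contributions.values()))

for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f}")
print(f"predicted probability: {predicted_probability:.3f}")
```

The contributions sum exactly to the difference between the patient's log-odds and the baseline log-odds, so positive values correspond to the red arrows in a force plot and negative values to the blue ones.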

External validation of logistic regression model
A total of 22 elderly COVID-19 patients were collected from two other centers as an external validation cohort. As demonstrated in Figure 6, the AUC of the newly built model in this external validation cohort was 0.74.

Compared with different levels of clinicians
The present study compared the performance of the logistic regression model with that of four clinicians (two junior clinicians and two senior clinicians) in predicting the severity of elderly COVID-19 patients. Figure 7 demonstrates the performance comparison between the logistic regression model and the human diagnosis of elderly COVID-19 patients. The logistic regression model had an accuracy of 0.875, which is higher than that of senior clinicians (0.8375) and junior clinicians (0.7375). The newly built model also performed better than human classifiers in terms of F1-score, recall, and precision.
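The four metrics used in this comparison all derive from a confusion matrix. The sketch below shows the standard definitions; the function name is hypothetical and the counts are invented so that the accuracy works out to 0.875. They are not the study's confusion matrix, which is not given in the text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)        # share of predicted-severe cases that are truly severe
    recall = tp / (tp + fn)           # share of truly severe cases that are detected
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only: 9 true positives, 1 false positive,
# 1 false negative, 5 true negatives.
metrics = classification_metrics(tp=9, fp=1, fn=1, tn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting recall alongside accuracy matters here because a missed severe case (false negative) is clinically more costly than a false alarm.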

Discussion
COVID-19 is spreading throughout the world at a high speed. Although the majority of individuals have mild symptoms and a favorable prognosis, COVID-19 can progress to ARDS and possibly death. The risk of contracting COVID-19 is higher among the elderly, and they experience more severe symptoms compared to other age groups (17,18). Effective COVID-19 treatments are still lacking (19,20). Currently, several models have been suggested for forecasting the severity of COVID-19, with the majority concentrating on ordinary patients, while limited emphasis has been placed on elderly patients (13,21,22). Therefore, a predictive model for monitoring disease progression and forecasting the severity of COVID-19 in elderly individuals is urgently needed.
In recent years, machine learning has developed rapidly and has been widely used in predicting human diseases (23,24), recognizing medical images (25,26), and analyzing clinical laboratory data (27). ML can help humans efficiently process large amounts of clinical data and look for connections between different laboratory results. As medical laboratory practitioners, we asked how machine learning could help clinicians differentiate the severity of COVID-19 in elderly patients.
In this study, age, IL-2, IgG Sum/IgG 1, and DD were identified and utilized in the development of the model. Through evaluation using the AUC value, calibration plot, and DCA plot, the model demonstrated good discrimination and calibration in predicting severe and non-severe cases of COVID-19 in elderly patients. This indicates strong performance and high clinical utility. Furthermore, the model performed effectively in both the testing cohort and the external validation cohort.
Patients with comorbidities have been shown to be more likely to present with severe pneumonia (28). The present study found no statistically significant difference in tumor, diabetes, hypertension, coronary heart disease, COPD, and anemia between the two groups of different severity (p>0.05). In this study, 63.11% of patients were male, which was similar to the proportion of men (67.68%) reported by Chen et al. (2). Additionally, it was observed that severe patients tended to be significantly older compared to non-severe patients.
Among the common laboratory abnormalities, this study observed an increased total leukocyte count, increased NE #, and decreased LY # in severe patients. Pneumonia progression in elderly individuals with COVID-19 was influenced by elevated NLR and age, as reported in a previous study (29). This corresponds with the results of the present study. The differences in NE #, LY #, and NLR were statistically significant compared with the non-severe group (p<0.001), while the difference in total leukocyte count was also statistically significant (p=0.011). RDW reflects the variation in size among red blood cells; Lee et al. (30) found a potential association between it and the risk of death in COVID-19 patients, while the present study reveals that RDW was greater in severe elderly patients compared to non-severe individuals (median, 14.05: 13.20, p<0.01), suggesting that elevated RDW levels are associated with adverse outcomes in elderly patients. Notably, PDW was a significant indicator of severe cases of COVID-19. PDW depicts the distribution of PLT volume; when PLT is excessively consumed, the bone marrow produces abundant immature PLT that is larger than mature PLT. PDW is also significantly associated with sepsis and other severe illnesses, which are closely linked to poor COVID-19 outcomes and death (31,32). In this study, the severe group showed a larger PDW, with a mean of 17.35 versus 16.88, which was significantly different from the non-severe group (p=0.001).
FIGURE 3 (A) Feature importance of seven parameters selected by LASSO regression. (B) Heatmap of correlation of four parameters, where one variable is plotted on the x-axis and the other on the y-axis for both severe elderly and non-severe elderly patients; antique white for positive correlation and black for negative correlation.
During the stage of systemic inflammation in COVID-19, inflammatory biomarkers such as IL-2, IL-6, and CRP increase dramatically. This stage represents the most severe manifestation of cytokine storms, and excessive inflammation may lead to multiple organ dysfunction (33-35).
According to recent research, IL-6 has been identified as a predictive factor for the early detection of COVID-19 patients who are at a heightened risk of disease progression (36,37). Elevated IL-2 levels observed in individuals with COVID-19 could potentially suggest the activation of T cells (38). In this study, the levels of IL-2, IL-6, and CRP in the severe group were significantly higher than those in the non-severe group (p<0.01). Research has indicated that individuals with severe COVID-19 often experience a high prevalence of coagulation abnormalities (39). Recent pathological results show that immune thrombosis in these patients gathers inflammatory cells such as lymphocytes and neutrophils, and the immune thrombosis can develop into serious complications, which are strongly associated with the severity of the disease and mortality rates (40-42). In the present study, the levels of DD and PT were markedly higher in severe patients than in non-severe individuals (p<0.01), consistent with the findings of Huang et al. (41) and Wang et al. (43). Elderly patients exhibit a continual inflammatory response and compromised coagulation after being infected with SARS-CoV-2, as evidenced by elevated levels of coagulation and inflammatory markers. Severe patients exhibited a greater degree of inflammation.
Among the antibodies produced in response to infection, IgG antibodies are the most prominent signature. This antibody not only marks the later stages of infection but also remains in the body for at least 6 months (44). IgG 1 is the most common IgG subtype, and viral infection usually induces both IgG 1 and IgG 3 (45). A few studies have reported the emergence of IgM and IgG antibodies after SARS-CoV-2 infection and suggested the application of serologic tests in the diagnosis of COVID-19 (46,47). However, there is limited documentation regarding the IgG subtypes that are generated following SARS-CoV-2 infection (48,49). Husain et al. (39) reported that abnormalities in IgG subtypes may be prevalent among severely ill COVID-19 patients, which should be further examined, as they could serve as an indicator of disease severity and a potential target for therapy. In the present study, significant variations in IgG subtypes were observed between healthy individuals and elderly COVID-19 patients (p<0.05). The study included IgG Sum/IgG 1 in the LASSO regression, which indicated that it outperformed individual IgG subtypes as a predictor of COVID-19 severity in elderly patients. Another important finding of the study was that IgG Sum/IgG 1 showed a more significant difference between the two groups than the IgG subtypes alone (p=0.009). The data show that the IgG 1 level of severe patients is significantly lower than that of non-severe patients (median, 5,614.50: 6,645.00), and IgG 3 levels are higher than in non-severe patients (median, 247.50: 203.00), and this variation could be attributed to the

Limitations
There were some limitations in this study. First of all, this study included only 103 elderly individuals diagnosed with COVID-19, a sample size that may be considered small. In further research, we will recruit more participants and draw the sample from multiple sources to improve the generalization and performance of the model in different settings. Second, this model was built and verified using data from China. Patients from diverse nations and races need to be included in future studies to confirm the results. Moreover, there may be some inevitable bias, and clinicians' assessment of disease severity may be subjective, potentially leading to some overlap between the severity groups. Finally, the outcomes of elderly COVID-19 patients may have varied across hospitals and time points during the peak of the COVID-19 outbreak in the current year. In the future, we will optimize the model and correct its defects based on the present study.

Conclusion
In conclusion, a model based on machine learning for predicting the severity of COVID-19 was constructed. Four indicators (age, DD, IL-2, and IgG Sum/IgG 1) were selected to construct the model. Five machine learning models (XGBoost, AdaBoost, SVM, logistic regression, and random forest) were used on the same dataset to predict the severity of COVID-19 in elderly patients. The logistic regression model demonstrated the best prediction performance among them. In addition, the present study conducted external validation of the model using data from two other centers. The model demonstrates excellent discrimination and calibration, which makes it readily applicable in clinical practice; it may predict outcomes as early as admission and could assist clinicians in estimating COVID-19 severity and improving elderly patient outcomes. In further research, we will collect further data and conduct a multi-center study to enhance the generalization of the model. In addition, we are working on developing an online website or an applet plugin based on our model to facilitate its use by clinical practitioners. This will provide an efficient and user-friendly interface for doctors to input patient data and obtain predictive results from the model.

FIGURE 1 Predictors selection using LASSO regression analysis and 10-fold cross-validation. (A) Bias selection of the tuning parameter (lambda) in LASSO regression based on the minimum standard (left dashed line) and 1-SE standard (right dashed line). (B) A joint plot was created based on the log-likelihood. In this study, the selection of predictive factors was based on the 1-SE standard (right dashed line), resulting in the selection of seven non-zero factors. LASSO, least absolute shrinkage and selection operator; SE, the standard error.

FIGURE 2 ROC curves for different biomarkers in predicting severe COVID-19 in elderly patients.

FIGURE 4 Performance of the prediction model. (A) The training cohort's ROC curve; (B) the testing cohort's ROC curve; (C) calibration curve analysis; (D) decision curve analysis.

FIGURE 5 The logistic regression model utilizing the SHAP algorithm. (A) The SHAP value, which indicates the level of impact on the result, is represented on the abscissa for each feature. A sample is represented by each dot. As the color becomes more red, the feature's value increases, while a bluer color indicates a lower value. (B) The SHAP analysis revealed the ranking of feature importance. IL-2, interleukin 2; SHAP, Shapley additive explanations. (C) The SHAP force plot for severe patients with COVID-19. (D) The SHAP force plot for non-severe patients with COVID-19.

FIGURE 6 ROC for external validation of logistic regression model.

FIGURE 7 The overall performance of the logistic regression model versus human diagnosis in predicting the severity of elderly COVID-19 patients.

TABLE 1
Survey on existing machine learning algorithms.

TABLE 2
Comparison of IgG subtypes between elderly COVID-19 patients and healthy individuals.
elderly individuals affected by COVID-19. The feature importance of age, IL-2, IgG Sum/IgG 1, DD, LY #, PDW, and NLR is shown in Figure 3A. Age, IL-2, IgG Sum/IgG 1, and DD are the top four of the seven indicators. Then, the correlations among these four indicators were examined. As shown in Figure 3B, age, IL-2, IgG Sum/IgG 1, and DD showed a low correlation, which could help prevent the model from overfitting.
a Patients with one of the following: tumor, hypertension, diabetes, COPD, anemia, or coronary heart disease.b Any type of tumor.COPD, chronic obstructive pulmonary disease.

TABLE 4
Comparison of biomarkers between non-severe and severe COVID-19 patients.

TABLE 5
Diagnostic efficacy of five classifiers in the training and testing cohorts for fivefold cross-validation.

TABLE 6
Diagnostic efficacy of logistic regression model in the training and testing cohorts for fivefold cross-validation.