A generalizable and easy-to-use COVID-19 stratification model for the next pandemic via immune-phenotyping and machine learning

Introduction: The coronavirus disease 2019 (COVID-19) pandemic has affected billions of people worldwide, and the lessons learned should be consolidated to better prepare for the next pandemic. Early identification of high-risk patients is important for appropriate treatment and distribution of medical resources. A generalizable and easy-to-use COVID-19 severity stratification model is vital and may provide references for clinicians.
Methods: Three COVID-19 cohorts (one discovery cohort and two validation cohorts) were included. Longitudinal peripheral blood mononuclear cells were collected from the discovery cohort (n = 39, mild = 15, critical = 24). The immune characteristics of COVID-19 and critical COVID-19 were analyzed by comparison with those of healthy volunteers (n = 16) and patients with mild COVID-19 using mass cytometry by time of flight (CyTOF). Subsequently, machine learning models were developed based on immune signatures and the most valuable laboratory parameters that performed well in distinguishing mild from critical cases. Finally, single-cell RNA sequencing data from a published study (n = 43) and electronic health records from a prospective cohort study (n = 840) were used to verify the role of crucial clinical laboratory and immune signature parameters in the stratification of COVID-19 severity.
Results: Patients with COVID-19 showed disturbed glucose and tryptophan metabolism in two major innate immune clusters. Critical patients were further characterized by significant depletion of classical dendritic cells (cDCs), regulatory T cells (Tregs), and CD4+ central memory T cells (Tcm), along with increased systemic interleukin-6 (IL-6), interleukin-12 (IL-12), and lactate dehydrogenase (LDH). The machine learning models based on the levels of cDCs and LDH showed great potential for predicting critical cases. Model performance in severity stratification was validated in two cohorts (AUC = 0.77 and 0.88, respectively) infected with different strains in different periods. The reference limits of cDCs and LDH as biomarkers for predicting critical COVID-19 were 1.2% and 270.5 U/L, respectively.
Conclusion: Overall, we developed and validated a generalizable and easy-to-use COVID-19 severity stratification model using machine learning algorithms. The levels of cDCs and LDH will assist clinicians in making quick decisions during future pandemics.



Introduction
The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has affected more than 770 million people worldwide and caused approximately 7.0 million deaths (1). Although COVID-19 no longer constitutes a public health emergency of international concern, the whole world should review the lessons learned to prepare for the next pandemic (2). Better allocation of limited health resources, prediction of disease trajectories, and improvement of patient outcomes are essential during a pandemic. Therefore, the identification of critical patients is helpful for clinical management. Patients with critical COVID-19 have poor short- and long-term outcomes, including high in-hospital mortality and more post-acute COVID-19 syndromes (3). To improve preparedness and resilience to emerging threats, it is necessary to develop a generalizable COVID-19 severity stratification model that provides references for guiding clinical management during the next pandemic.
Current COVID-19 stratification models are primarily based on a series of clinical manifestations, including vital signs, medical history, arterial blood gas results, laboratory tests, and chest imaging abnormalities (4,5). In 2020, an easy-to-use COVID-19 severity score model was developed using eight commonly available parameters, which showed excellent performance in the identification of high-risk patients (6). However, the pathophysiology underlying these markers, which can predict the prognosis of COVID-19, remains unclear. COVID-19 is characterized by a dysfunctional immune response against SARS-CoV-2 (7,8). Immune-related biomarkers contribute to the understanding of disease progression and optimal treatments. Evidence suggests that severely ill patients show lymphocyte exhaustion (9-11), expansion of monocytes (12,13), and cytokine storm (high levels of interleukin-6 [IL-6], C-reactive protein [CRP], and interferons) (14). By combining clinical manifestations and immunological biomarkers, a pathophysiology-based model will provide novel perspectives for clinical severity stratification.
Overall, we aimed to establish a generalizable COVID-19 severity stratification model using machine learning methods. We also aimed to elucidate the key immune signatures of patients with critical COVID-19 using mass cytometry by time of flight (CyTOF). By combining immune signatures and clinical parameters, the machine learning model is expected to improve our understanding of critical COVID-19 and provide references for quick decision-making during future pandemics.

Study design
To prepare for the next COVID-19 pandemic, we established a clinical severity stratification model using machine learning with immune signatures. Three COVID-19 cohorts (one discovery cohort and two validation cohorts) and 16 age- and sex-matched healthy volunteers (negative for SARS-CoV-2 and virus-specific Immunoglobulin M [IgM] and Immunoglobulin G [IgG], as indicated by the reverse transcription-polymerase chain reaction [RT-PCR] test) were included in this study. According to the clinical severity classification criteria (Supplementary Table S1), which were modified from World Health Organization guidelines (2), patients in the discovery cohort were classified into mild and critical cases. We screened potential variables by longitudinally comparing the levels of anti-SARS-CoV-2 antibodies, inflammatory cytokines, plasma complement components, and cellular immune signatures between critical and mild cases. A self-designed 42-parameter panel, including nine energy metabolism enzymes, was applied to profile immune signatures using CyTOF technology. The most clinically relevant immune signatures and plasma parameters were introduced into machine learning.

Discovery cohort and sample collection
Patients who met the following inclusion criteria and were admitted to our surgical intensive care unit (ICU) between December 2021 and December 2022 were enrolled in the discovery cohort (n = 39, with 59 samples). Inclusion criteria were adults aged >18 years, first diagnosed with SARS-CoV-2 genome positivity using RT-PCR test in the previous 96 h, and sufficient remaining blood after regular laboratory tests on the first day post-admission.
The exclusion criteria were as follows: age < 18 years; pregnancy; breastfeeding; existence of any pre-existing and transmissible diseases, such as human immunodeficiency virus, tuberculosis, and syphilis; mental illnesses; or taking psychotropic drugs. Basic information included comorbidities, in-hospital mortality, Murray lung injury score, and length of mechanical ventilation (Table 1).
Longitudinal (on days 1, 3, and 7 post-admission) blood samples were collected for analysis. Briefly, 2 mL peripheral blood samples were collected and delivered immediately to the lab at 4°C to obtain plasma and peripheral blood mononuclear cells (PBMCs). To avoid omitting potentially important information, both the absolute cell counts and the cell proportions relative to PBMCs at all sampling points were analyzed in the present study.
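The two readouts mentioned above (relative proportion and absolute count per cluster) can be illustrated with a minimal Python sketch. The study's own analysis was done in R, and the text does not state how absolute counts were derived; the sketch assumes they are the cluster fraction multiplied by the sample's PBMC count per millilitre. The function name `cluster_summary` and the toy labels are hypothetical.

```python
from collections import Counter

def cluster_summary(cluster_labels, pbmc_per_ml):
    """Return percent of PBMCs and estimated cells/mL for each cluster.

    Assumption (not stated in the paper): absolute count equals the
    cluster fraction times the total PBMC count per millilitre.
    """
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return {
        cluster: {
            "percent_of_pbmc": 100.0 * n / total,
            "cells_per_ml": pbmc_per_ml * n / total,
        }
        for cluster, n in counts.items()
    }

# Toy sample: 100 gated cells with made-up cluster assignments.
toy_labels = ["C07"] * 2 + ["C12"] * 18 + ["C03"] * 80
summary = cluster_summary(toy_labels, pbmc_per_ml=1_000_000)
print(summary["C07"])
```

Reporting both values matters when total PBMC counts fall, as a cluster can keep its percentage while its absolute count drops.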

Validation cohort 1
To verify the key role of the most important immune subset (here, cDCs (C07)) in clinical severity stratification, we used publicly available data from Stephenson et al. (15). Briefly, single-cell data from mild (n = 26) and critical (n = 17) cases recruited from Addenbrooke's Hospital, Royal Papworth Hospital, and University College London (UCL) Hospital were downloaded from https://covid19cellatlas.org/. The proportion of classical dendritic cells (cDCs) to PBMCs was extracted using the R package Seurat (4.0). According to the authors' description, all patients were SARS-CoV-2 antigen-positive without active hematological malignancy or cancer, known immunodeficiency, sepsis from any cause, or blood transfusion within 4 weeks.

Validation cohort 2
To verify the role of the most important systemic parameter (here, lactate dehydrogenase (LDH)) in clinical severity stratification, all patients with complete clinical data admitted to other ICUs in our institution (Peking University Third Hospital) between December 2021 and December 2022 were retrospectively reviewed (n = 840). Inclusion and exclusion criteria were the same as those for the discovery cohort.

Mass cytometry
PBMCs were isolated from peripheral blood using Ficoll density gradient centrifugation. The cell pellets were combined with 5 mL of fluorescence-activated cell sorting (FACS) buffer (1× phosphate-buffered saline supplemented with 0.5% bovine serum albumin) and centrifuged at 400×g for 5 min at 4°C. The supernatant was discarded and the cell pellets were resuspended in FACS buffer. Samples were examined only if the viability rate was greater than 85% and the number of cells was not less than 3×10^6.

CyTOF data acquisition
Before acquisition, PBMCs were washed twice with CSB and resuspended at a concentration of 1.1×10^6 cells/mL in the Cell Acquisition Solution (Fluidigm) containing 10% EQ Four Element Calibration Beads (Fluidigm). PBMCs were acquired using a Helios CyTOF Mass Cytometer (Fluidigm) equipped with a SuperSampler fluidics system (Victorian Airships), and data were collected as .fcs files, as previously described.

CyTOF data analysis
After acquisition, data were concatenated using the fcs concatenation tool from Cytobank and manually gated to retain live, singlet, and valid immune cells. CytoNorm was used in two steps according to the instructions provided in the R library CytoNorm to normalize the data (16). For the downstream analysis, the fcs files were loaded into R. The signal intensities for each channel were arcsinh-transformed with a cofactor of 5 (x_transf = asinh(x/5)). To visualize high-dimensional data, t-distributed stochastic neighbor embedding (t-SNE) (17) and flow self-organizing map (FlowSOM) (18) algorithms were applied to all samples. Approximately 10,000 cell events from each sample were pooled and included in the t-SNE analysis, with a perplexity of 30 and theta of 0.5. The R package Rtsne, a Barnes-Hut implementation of t-SNE, was used in this study. To study the developmental trajectory of natural killer (NK) cells and classical monocytes, dynamic immunometabolic states and cell transitions were analyzed using the Monocle algorithm (19). Data are displayed using the ggplot2 R package.
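The arcsinh transformation stated above, x_transf = asinh(x/5), can be written out as a short Python sketch. The study performed this step in R; the function name `arcsinh_transform` and the sample intensities below are illustrative assumptions, not part of the published pipeline.

```python
import math

def arcsinh_transform(intensities, cofactor=5.0):
    """Apply x -> asinh(x / cofactor) to each raw channel intensity.

    The cofactor of 5 follows the value given in the text; it sets where
    the transform switches from roughly linear to roughly logarithmic.
    """
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(x / cofactor) for x in intensities]

# Made-up raw ion counts for one marker channel.
raw = [0.0, 5.0, 50.0, 500.0]
transformed = arcsinh_transform(raw)
print(transformed)
```

The transform keeps zero at zero, behaves almost linearly for small counts, and compresses large counts, which is why it is commonly applied before clustering and t-SNE.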

Machine learning strategies
Because the target variable for model training (clinical severity) was labelled by clinical experts, supervised learning methods were more appropriate than unsupervised, semi-supervised, or reinforcement learning methods. After comparing the advantages of different supervised methods (20-30) (Supplementary Table S3), we employed AdaBoost, Back Propagation, Gradient Boosting Decision Tree, Random Forest, and Support Vector Machine algorithms to construct classifiers for discriminating patients with critical COVID-19 from those with mild disease. The important immune and systemic features (cDCs and LDH) were introduced to the models as inputs. Five-fold cross-validation (with four folds for training and one fold for validation) and external validation were performed. For five-fold cross-validation, all the training data were randomly split into five parts; each part in turn served as the validation set while the other four were used for training. We performed the five-fold cross-validation five times and adopted the averaged AUC values. For the external validation, the Back Propagation algorithm was used.

The statistical analyses were performed using Prism v.9.0 (GraphPad Software). For comparison between two groups, gender, mortality, and comorbidities were evaluated using the chi-square test, other clinical characteristics were evaluated using the unpaired two-tailed Student's t-test, and data without a normal distribution were evaluated using the Mann-Whitney U-test. Data are presented as median with interquartile range.
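The repeated five-fold cross-validation described in this section can be sketched in plain Python as follows. This is a schematic under stated assumptions, not the authors' code: the folds are stratified by class (the text only says the data were randomly split), `fit_direction` is a toy stand-in for the five classifiers, and the LDH-like values are synthetic.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a critical case (label 1) scores above
    a mild case (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case from each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_direction(train_x, train_y):
    """Toy model (assumption): learn whether higher or lower values mark
    critical cases and return a scoring function."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x

def repeated_kfold_auc(xs, ys, fit, k=5, repeats=5, seed=0):
    """Average validation AUC over `repeats` runs of k-fold CV."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(ys) if y == 1]
    neg_idx = [i for i, y in enumerate(ys) if y == 0]
    if len(pos_idx) < k or len(neg_idx) < k:
        raise ValueError("each class needs at least k cases")
    fold_aucs = []
    for _ in range(repeats):
        rng.shuffle(pos_idx)
        rng.shuffle(neg_idx)
        # Deal each class into k folds so every fold contains both classes.
        folds = [pos_idx[f::k] + neg_idx[f::k] for f in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(xs)) if i not in held]
            score = fit([xs[i] for i in train], [ys[i] for i in train])
            fold_aucs.append(auc([score(xs[i]) for i in held_out],
                                 [ys[i] for i in held_out]))
    return sum(fold_aucs) / len(fold_aucs)

# Synthetic, perfectly separable values: 15 mild (0) and 24 critical (1).
ldh = [180 + 5 * i for i in range(15)] + [300 + 10 * i for i in range(24)]
severity = [0] * 15 + [1] * 24
mean_auc = repeated_kfold_auc(ldh, severity, fit_direction)
print(mean_auc)
```

Each of the 25 validation folds contributes one AUC, and the mean is the reported figure; swapping `fit_direction` for a real classifier leaves the splitting and averaging unchanged.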

Statistical analysis
Statistical analyses were performed using the R software (v.4.0.4). The normality of patient data was tested using the Shapiro-Wilk normality test. Statistically significant differences between phenotypes were calculated using two-sided multiple Student's t-tests for variables with a normal distribution and Wilcoxon rank-sum tests for other variables. Spearman's correlation analysis was performed on significantly different clusters, cytokines, and clinical indicators to assess their correlations using the R package stats (4.1.0). Receiver operating characteristic (ROC) analysis was performed with the R package pROC (1.16.2), and a heatmap was generated with the R package ggplot2 (4.0.5).
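As an illustration of the Spearman analysis mentioned above, the following Python sketch computes the rank correlation from first principles (ranks with ties averaged, then a Pearson correlation of the ranks). The study used the R package stats; the helper names and the paired cDC and LDH values here are made up for the example.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Made-up paired measurements: cDC percentage falls as LDH rises.
toy_cdc = [2.1, 1.8, 1.5, 0.9, 0.6]
toy_ldh = [190, 210, 260, 310, 420]
rho = spearman(toy_cdc, toy_ldh)
print(rho)
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed distributions typical of cytokine and enzyme levels.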

Basic information and systemic inflammatory responses of the discovery cohort
A total of 39 individuals diagnosed with COVID-19 (15 mild and 24 critical cases) admitted to our ICU were included in cohort 1 as the discovery cohort to determine potential predictive parameters. As shown in Table 1, the basic information of the critical and mild cases was comparable. The Murray lung injury score, length of mechanical ventilation, and length of ICU stay were significantly higher in critical cases (Table 1). Longitudinal comparisons of inflammatory cytokines, antibodies, and complement components revealed that systemic IL-6, IL-12, and LDH levels were important in distinguishing mild cases from critical cases. The variation trends in these parameters were consistent across all sampling points (Table 2).

Cellular immunometabolic characteristics of patients with COVID-19 differed from those of healthy volunteers
To acquire a full landscape of the immune signatures of PBMCs and identify the potentially important clusters for the stratification of COVID-19, we performed CyTOF analysis with a 42-parameter panel (consisting of 33 surface markers and 9 intracellular metabolic markers) (Supplementary information, Figure S1). The obtained data were subjected to the FlowSOM clustering algorithm and t-distributed stochastic neighbour embedding (t-SNE) analysis, which enabled the identification of distinct clusters representing different immune cell types. According to the dimensional reduction results of the marker expression levels, 34 clusters were obtained (Figure 1A). Then, to provide a reference for similar studies that may apply different panels, we further classified these 34 clusters into eleven commonly studied major immune cell populations (CD4+ T, CD8+ T, γδT, DPT, DNT, pDC, cDC, NK, NKT, B, and monocytes) (Supplementary information, Table S4; Figure S2).
We found that the composition of PBMCs in patients with COVID-19 varied significantly from that in healthy volunteers. The total counts of PBMCs (per millilitre of peripheral blood) and the counts of the main immune cell types, such as T, B, and NK cells, decreased significantly in patients with COVID-19. However, the number of monocytes increased (Supplementary information, Figure S2). Comparison of the percentages of all 34 defined clusters further confirmed that fifteen immune cell subsets differed significantly between COVID-19 patients and healthy volunteers (Figures 1B, C). Most of these subsets were adaptive immune cell subsets and were significantly decreased in COVID-19. In addition, variations in two major innate immune cell subsets (NK cells (C03) and classical monocytes (C12), with average percentages of more than 5% in healthy volunteers) were also found (Figures 1B, C). As host innate immunity is the first line of defense, we further investigated the metabolic status of these two subsets. As shown in Figure 2, the metabolic markers participating in glucose metabolism (such as CS, GLS, PFKFB3, and PDk1_pS241) and tryptophan metabolism (IDO1 and KAT1) were significantly altered in both NK cells (C03) and classical monocytes (C12). The developmental trajectories further demonstrated that under COVID-19, NK cells gradually transformed from C01 to C03, namely, from a relatively steady metabolic state to a disturbed state with decreased oxidative phosphorylation but boosted glycolysis and tryptophan catabolism (Figures 2A-C). For classical monocytes, C12 gradually transformed to C09, namely, to tryptophan exhaustion (Figures 2D-F).

Distinct cellular immune signatures of critical COVID-19 were identified compared with mild cases
As described in the Methods section, to identify the important clusters distinguishing critical cases from mild cases, we compared the cell counts and percentages of each cluster within PBMCs at all sampling points. In total, five candidate clusters were found: the differences in cDCs (C07), Tregs (C20), CD4+ Tcm (C24), pDCs (C05), and DPT (C29) were shared between the analysis of all sampling points and the analysis of first-day samples (Figure 3A). As the percentages of pDCs and DPT were below 0.5%, they were not considered in subsequent analyses. Next, we investigated whether these clusters were associated with clinical parameters and prognosis. The results demonstrated that the counts of cDCs, Tregs, and CD4+ Tcm were significantly decreased in the critical cases and in patients who ultimately died (Figures 3B, C). Their levels were positively or negatively correlated with systemic parameters, lung injuries, and the length of mechanical ventilation (Figures 3D-F, and Supplementary information, Table S5). Within each severity group, the longitudinal analysis showed that the counts of these three clusters were not significantly different among sampling points (Supplementary information, Figure S3). These findings indicated that altered cDCs, Tregs, and CD4+ Tcm were stable and sensitive predictive biomarkers because their levels were not significantly influenced by sampling timing or transient relief of the condition. Specifically, cDCs were the most important cluster, negatively correlated with LDH and positively correlated with IL-2, IL-12, TNF-α, and MIP-1α (Figure 3E). Receiver operating characteristic analysis further revealed that cDCs as a single variable were effective in predicting critical COVID-19 (Figure 3G). The level of LDH was the most important systemic parameter because of its strong negative correlation with cDCs, Tregs, and CD4+ Tcm (Figure 3E).

Development and validation of clinical severity stratification models based on the immune signatures and plasma parameters of patients with critical COVID-19
Considering the potential of machine learning for disease severity stratification, we developed clinical severity stratification models based on important key clusters (cDCs, Tregs, and CD4+ Tcm) and systemic parameters (LDH, IL-6, IL-12). As expected, machine learning models with six parameters as inputs showed good performance in predicting clinical severity (Figure 4A). Among these parameters, cDCs and LDH were the most important immune signature and systemic signature, respectively (Figure 4B). The models using cDCs or LDH as an individual input also performed well, with an average AUC of approximately 0.8 in the discovery cohort (Figures 4C, D). The validation of machine learning models with a single input (with the Back Propagation algorithm) further demonstrated that the clinical severity stratification model based on cDCs alone had an AUC of 0.77 (Figure 4E), and the model based on systemic LDH had an AUC of 0.88 (Figure 4F). Notably, patients in validation cohort 1 were recruited in 2020 and infected with a different strain compared with the patients in the discovery cohort.
These results indicate that our models, each based on a single biomarker (cDCs or LDH), performed well in COVID-19 severity stratification, with good robustness and generalization.

FIGURE 1 CyTOF analysis of peripheral immune cell subsets in patients with COVID-19 and healthy volunteers.

Reference limits of cDCs and LDH as biomarkers for predicting critical COVID-19
To provide a detailed reference for clinicians in quick decision-making during the next pandemic, we analyzed the effect of cDCs and LDH in severity prediction in the validation cohorts and sought the optimal reference limits. In validation cohort 1 (adopted from Stephenson et al.'s published work (15)), the proportion of cDCs decreased in critically ill participants across the three UK centers (Supplementary information, Figures S4A-C). The percentage of cDCs showed good performance in predicting clinical severity (AUC = 0.74, Figure 4G). The optimal cutoff point was 1.2%, and the sensitivity was 0.93 (95% CI 0.70-0.99). In validation cohort 2 (from Peking University Third Hospital), similar to the findings in the discovery cohort, a significant increase in LDH (Supplementary information, Figure S4D) and a predictive effect of LDH were found (AUC = 0.89, Figure 4H). The cutoff point was 270.5 U/L and the sensitivity was 0.92 (95% CI 0.86-0.95). Accordingly, the reference limits of cDCs and LDH for critical COVID-19 were less than 1.2% and more than 270.5 U/L, respectively.

FIGURE 4 The predictive effects of cDCs and LDH on COVID-19 severity stratification.
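To make the reference limits concrete, the following Python sketch applies the two reported thresholds as single-biomarker rules and shows one common way an optimal cutoff can be chosen from labelled data. The text does not say which criterion produced the reported cutoffs; maximizing Youden's J (sensitivity + specificity - 1) is an assumption here, and all function names and toy values are hypothetical.

```python
CDC_LIMIT_PERCENT = 1.2    # reported limit: critical if cDCs are below this
LDH_LIMIT_U_PER_L = 270.5  # reported limit: critical if LDH is above this

def flag_by_cdc(cdc_percent_of_pbmc):
    """Single-biomarker rule using the reported cDC reference limit."""
    return cdc_percent_of_pbmc < CDC_LIMIT_PERCENT

def flag_by_ldh(ldh_u_per_l):
    """Single-biomarker rule using the reported LDH reference limit."""
    return ldh_u_per_l > LDH_LIMIT_U_PER_L

def sens_spec(values, labels, cutoff, higher_is_critical=True):
    """Sensitivity and specificity of a cutoff (label 1 = critical)."""
    flagged = [(v > cutoff) if higher_is_critical else (v < cutoff)
               for v in values]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    tn = sum(1 for f, y in zip(flagged, labels) if not f and y == 0)
    return tp / labels.count(1), tn / labels.count(0)

def youden_cutoff(values, labels, higher_is_critical=True):
    """Scan midpoints between distinct values and keep the one that
    maximizes Youden's J (assumed criterion, not stated in the paper)."""
    distinct = sorted(set(values))
    best_cutoff, best_j = None, -1.0
    for lo, hi in zip(distinct, distinct[1:]):
        cutoff = (lo + hi) / 2
        sens, spec = sens_spec(values, labels, cutoff, higher_is_critical)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Made-up LDH values (U/L) with one mild case overlapping the critical range.
toy_ldh = [150, 200, 250, 300, 280, 320, 330, 400, 500]
toy_severity = [0, 0, 0, 0, 1, 1, 1, 1, 1]
cutoff, j = youden_cutoff(toy_ldh, toy_severity)
print(cutoff, j, flag_by_ldh(300.0), flag_by_cdc(0.8))
```

The two rules are kept separate because the validated models each used a single biomarker; combining them into one decision would be a further assumption.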

Discussion
Since the beginning of the SARS-CoV-2 pandemic, numerous researchers have provided important perspectives on the underlying mechanisms of COVID-19 and have developed severity stratification models (31). To provide novel insights and better preparation for the next pandemic, we developed a severity stratification model with good generalizability based on the pathophysiology of COVID-19. Through integrative analysis of immune signatures and clinical manifestations in critical participants, we found that cDCs and systemic LDH levels were the most important factors that determined severity stratification (Figure 3G). The key roles of the two indicators were validated using two cohorts. Notably, the machine learning models based on the levels of cDCs and LDH showed great potential for predicting critical cases in cohorts infected with different strains (Figures 4E, F). The reference limits of cDCs and LDH as biomarkers for predicting critical COVID-19 were 1.2% and 270.5 U/L, respectively (Figures 4G, H).
According to the current World Health Organization criteria, critical and severe COVID-19 are identified by a bundle of clinical features, including chest imaging characteristics, arterial blood gas parameters, and other clinical symptoms and signs (2). A progressive decrease in peripheral blood lymphocytes and increases in IL-6, CRP, procalcitonin, and D-dimer are considered biomarkers for COVID-19 severity based on guidelines (32). In the present study, we found that LDH showed great potential in the early identification of patients with critical COVID-19 (33-36). Although LDH is considered a nonspecific biomarker of inflammation, its elevation is associated with poor outcomes, possibly reflecting the severity of lung damage (37,38). Furthermore, a large meta-analysis suggested that increased LDH levels following infection correlated with the post-acute respiratory sequelae of COVID-19, showing great potential in predicting long-term COVID-19 (39).
Profound alterations in immunity take place during COVID-19 infection, and the depletion and dysfunction of lymphocytes have been described as the most classical signatures of critical COVID-19 in most articles. Although we also observed decreased Tregs and CD4+ central memory T cells in critical cases, the counts of cDCs contributed the most to predicting clinical severity. Several studies have demonstrated the reduction and dysfunction of cDCs in critical COVID-19 (40,41); our findings are consistent with these results and further emphasize the key role of cDCs in severity stratification models. As highly efficient antigen-presenting cells, DCs are the key link between innate and adaptive immunity. Several ongoing clinical trials are assessing the safety and efficacy of DC-based vaccines against SARS-CoV-2 (42,43). DCs can activate T cell responses and protect adjacent cells by secreting type I interferons (44). However, some limitations of DC-based vaccines, such as toxicity, allergenicity, and the possibility of DC phenotype alterations, have not been resolved (42). Therefore, further studies on DCs as treatable traits are required.
Research has demonstrated that comorbidities have an impact on the severity of COVID-19 (45). SARS-CoV-2 is more likely to affect older men with comorbidities (46), and comorbidities are more common in patients with severe COVID-19 than in those with mild disease (45). Patients with diabetes, cardiovascular diseases, and respiratory diseases are more likely to present more severe symptoms and complications (33,47). However, our patients with COVID-19 were all from a specialty ICU and tended to have a poor underlying functional status and more comorbidities (Table 1). Accordingly, our conclusions may not be as applicable to those without comorbidities or with a healthy status. This is a limitation of our study, and future studies are encouraged to address this issue.
In summary, we established a severity stratification model for COVID-19 based on integrative analysis of immune signatures and clinical laboratory parameters. This machine learning model was validated in two cohorts infected with different strains, demonstrating its generalizability and robustness. We hope that our analysis will be beneficial for the early identification of high-risk patients with COVID-19 and provide some references for the next pandemic.

FIGURE 2 Cellular immunometabolic characteristics of COVID-19-specific immune subsets. (A) Monocle 2 trajectory analysis of NK cells. The monocle plot displays NK cells color-coded by different NK cell clusters. The arrow indicates the pseudotime trajectory of NK cells from a healthy state to COVID-19 infection. C01 was localized at the beginning of the pseudotime trajectory, whereas C03 was at the end of the trajectory. (B) Boxplots showing the density of the cellular metabolic markers (CS, GLS, IDO, KAT1, and PFKFB3) of C01 and C03. (C) Monocle 2 trajectory analysis of cellular metabolic markers of NK cells. Each dot represents one cell and colors represent the expression levels of indicated markers. (D) Monocle 2 trajectory analysis of classical monocytes. The monocle plot displays classical monocytes color-coded by different classical monocyte clusters. The arrow indicates the pseudotime trajectory of classical monocytes from a healthy state to COVID-19 infection. (E) Boxplots showing the density of the cellular metabolic markers (CS, GLS, IDO, KAT1, and PFKFB3) of C12 and C09. (F) Monocle 2 trajectory analysis of cellular metabolic markers of classical monocytes. Each dot represents one cell, and colors represent the expression levels of indicated markers. The center, box and whiskers of the boxplot represent the median, IQR and 1.5 × IQR, respectively. The t-test was used for normally distributed data and the Mann-Whitney U-test was used for non-normally distributed data.

FIGURE 3 Immune and clinical characteristics of patients with critical COVID-19. (A) The candidate clusters distinguishing patients with critical COVID-19 from mild ones. (B, C) Boxplots depicting the cell counts of significantly different clusters between patients with mild and critical COVID-19 (B), and between patients who survived and those who died (C). (D) Heatmap showing Spearman's correlations between the counts of critical COVID-19 key immune clusters and clinical laboratory parameters in all samples. Colors represent Spearman's correlation coefficient. (E, F) Scatterplots showing correlations between the counts of critical COVID-19 key immune clusters and critical clinical laboratory parameters (E), Murray scores, and length of mechanical ventilation days (F). (G) ROC analysis predicting COVID-19 severity using the counts of critical COVID-19-specific clusters and the level of LDH. The center, box and whiskers of the boxplot represent the median, IQR and 1.5 × IQR, respectively. The t-test was used for normally distributed data and the Mann-Whitney U-test was used for non-normally distributed data.

FIGURE 4 (A) Performances of COVID-19 severity stratification models based on the six candidate indicators (C07, C20, C24, LDH, IL-6, and IL-12) using five different machine learning algorithms in the discovery cohort. (B) The bar charts showing the contributions of six indicators in Ada, RF, and GBDT, as well as the averaged contributions of the six indicators across the three models. (C, D) Performances of COVID-19 severity stratification models based on the counts of cDCs (C07) (C) and the level of LDH (D) using five different machine learning algorithms in the discovery cohort. Each dot represents an AUC value of 5-fold cross-validation, and the bar shows the averaged AUC values from 5 runs. (E, F) Performances of COVID-19 severity stratification models based on the cDCs (C07) in validation cohort 1 (E), and LDH in validation cohort 2 (F) by the Back Propagation algorithm. (G, H) ROC analysis of cDCs (G) and LDH (H) for the COVID-19 stratification in the validation cohorts.