- 1Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou, China
- 2Department of Clinical Laboratory, Institute of Translational Medicine, Renmin Hospital of Wuhan University, Wuhan, China
- 3Department of Critical Care Medicine, Renmin Hospital of Wuhan University, Wuhan, China
- 4Department of Statistics, University of Wisconsin, Madison, WI, United States
- 5School of Economics and Management, and MOE Social Science Laboratory of Digital Economic Forecasts and Policy Simulation, University of Chinese Academy of Sciences, Beijing, China
- 6AMSS Center for Forecasting Sciences, Chinese Academy of Sciences, Beijing, China
Background: Sepsis biomarker research over the past 30 years has been plagued by the use of wrong animal models and inappropriate patient selections, leading to the failure of translating findings into precision medicine. Thousands of sepsis-related gene biomarkers have been published, but this excess hinders medical advancement because (1) an overwhelming number of genes make targeted drug development and precision medicine unfeasible; (2) many biomarkers lack cross-cohort validation, rendering them clinically unhelpful. Our goal is to identify a highly informative, single-digit set of sepsis biomarkers to advance precision medicine.
Methods: We conducted large-scale research on heterogeneous populations, including patients with sepsis, severe sepsis, and septic shocks, and collected plasma samples from 32 sepsis patients and 18 healthy controls at Renmin Hospital of Wuhan University, China. RNA was isolated using the HYCEZMBIO Serum/Plasma RNA Kit, and RT-qPCR was performed on the Roche Light Cycler 480 platform. An AI-based max-logistic competing classifier was applied across 11 cohorts with thousands of samples, using both self-designed and public datasets to identify the most critical sepsis biomarkers.
Results: Our analysis highlights CKAP4, FCAR, and RNF4 as key genetic drivers in sepsis-related variations. In whole blood, NONO is crucial for immune response, while in plasma, PLEKHO1 and BMP6 reveal further genetic heterogeneities. Pediatric patients also exhibit significant contributions from RNASE2 and OGFOD3. These genes form the most effective miniature set of biomarkers.
Conclusion: Achieving 99.42% accuracy across cohorts, this miniature set outperforms larger published gene sets. These findings provide critical insights for personalized risk assessment, targeted drug development, and tailored treatments for both adult and pediatric sepsis patients.
Introduction
Sepsis is a major global health issue characterized by life-threatening organ failures and high mortality rates. In 2017, a global study estimated 48.9 million sepsis cases and 11 million sepsis-related deaths worldwide (1). That same year, the WHO highlighted sepsis as a critical health priority, predicting its incidence to rise with the aging global population (2). Survivors often face long-term impairments and increased mortality rates post-discharge (3). Sepsis is particularly burdensome in low to middle-income countries, contributing to over 40% of all-cause mortality in some areas (1).
Sepsis results from an abnormal host response to infection, leading to multiple organ dysfunction (MODS) or multiple organ failure (MOF). Many patients suffer from long-term complications even the acute infection is treated with antibiotics (1). Unlike localized infections, sepsis causes a systemic, maladaptive response, often resulting in remote multiorgan failures (4). Factors like infection source and pathogen- or patient-specific variables influence its manifestation (5). Research indicates a typical pathophysiological pattern in sepsis, with over 80% of the transcriptomic response in leukocytes being independent of the infection source or pathogen (6). This suggests a shared transcriptomic pattern among sepsis patients, offering a potential target for personalized strategies. Although an updated international consensus on sepsis was introduced in 2017, the genomic pathology remains unclear (7). However, advances in sequencing technology and analytical platforms bring hope for future genomic studies on sepsis.
Traditional transcriptomic studies have used classic approaches like fold changes or conventional machine learning to identify differentially expressed genes (DEGs) between sepsis patients and healthy individuals (8–16). These approaches often result in large panels of DEGs, sometimes exceeding the number of samples. For instance, pioneering genomic studies at Cincinnati Children’s Hospital identified over 1,000 DEGs in pediatric sepsis patients (14, 15). Later studies on adult cohorts from Europe, America, and Australia refined the DEGs to fewer than 100 genes, though they still lack consensus on the reported DEGs (8, 13, 16). This inconsistency hinders the development of genetic treatments for sepsis, and the wrong animal models and the inappropriate selections of patients contribute to the failure of 30 years in sepsis researches (17). Thus, it is crucial to identify a concise and more accurate set of DEGs.
The evolution of quantitative medical research, fueled by advancements in computing power, brings artificial intelligence (AI) to the forefront. However, current AI models often function as black boxes, with their computational processes remaining opaque (18, 19). Additionally, AI training is often biased due to high costs limiting it to high-income settings, and the underrepresentation of pediatric data due to ethical challenges (20). A newly developed machine learning model shows promise in addressing above issues (21). This model, which has demonstrated advanced capabilities in cancer DEG recognition and subtype classification (22), aims to integrate new genomic evidence of sepsis into a concise and interpretable biological framework. Utilizing the max-logistic competing risk factors framework, this model can accurately identify a small set of critical DEGs and explain their interactions. It has proven effective in modeling various cancers and COVID-19 (22–25).
This study examined twelve datasets, including 1876 samples (1,572 sepsis and 304 control), covering diverse socioeconomic and ethnic groups, including pediatric patients highly susceptible to sepsis mortality (8–16). Among the first 11 datasets containing heterogeneous populations, including whole blood, plasma, adults, and pediatrics, three panels of five or fewer DEGs, with a common three-gene core, were identified. The first panel, consisting of four genes from adult cohorts’ whole blood gene expression data (n = 1,413), included CKAP4, FCAR, RNF4, and NONO, achieving near-perfect classification accuracy. The second panel, from pediatric cohorts (n = 287), included RNASE2 and OGFOD3 in addition to the core genes, achieving 100% accuracy. The third panel, from adult plasma samples (n = 106), included PLEKHO1 and BMP6 alongside the core genes, also achieving 100% accuracy. The twelfth dataset is a gene expression profiling dataset of peripheral blood mononuclear cells (PBMCs), which differs from the whole blood and plasma cohorts. Additional genes are needed to reach nearly 100% accuracy.
These gene panels demonstrated exceptional sensitivity and specificity, achieving 100% accuracy in 9 out of 11 datasets and over 95% in the remaining two. This new biological model offers a robust tool for identifying sepsis gene variations, representing the highest-performing sepsis biomarkers in the literature. The findings suggest that many previously published genes may be redundant or misleading, emphasizing the need for concise and precise biomarkers to advance precision medicine in sepsis (17, 22–26).
Materials and methods
Our study differs from orthodox clinical studies in the literature, which focused on rigorous experiment design while only running basic analysis on standard software. In most cases, the basic analysis was sufficient to answer the questions in completed studies in the literature. However, we argue that there is information that is overlooked, and we can extract it from public data with more advanced methods. In this section, we present both our new experimental protocol and advanced analytical method. The schematic flow of our study design is presented in Figure 1.

Figure 1. Schematic workflow of new AI-driven discovery of most effective biomarkers for sepsis patients.
Public data acquisition
This study utilized 10 public datasets from the National Center for Biotechnology Information’s (NCBI) Gene Expression Omnibus (GEO) database, using keywords “sepsis,” “septic shock,” and “homo sapiens.” Two datasets (GSE9692, GSE13904) were from pediatric cohorts in the US, while the others were adult cohorts from the US, Australia, France, Spain, and Germany. A validation dataset was self-collected from a Chinese cohort. An overview of these datasets is provided in Table 1.
The first seven datasets consisted of whole blood samples from adult patients. The first dataset, GSE65682, was from a North American adult cohort with 761 severe pneumonia/sepsis samples and 41 healthy controls. Blood was collected from ICUs, and RNA was isolated using the PaxGene Blood RNA kit (Qiagen, Netherlands) and analyzed on the Affymetrix Human Genome U219 Array platform (8). The second dataset, GSE28750 (9), the third dataset, GSE57065 (10), the fourth dataset, GSE95233 (10), the fifth dataset, GSE69528 (11), the sixth dataset, GSE131761 (12), the seventh dataset, GSE154918 (13), were from heterogeneous cohorts with different platforms and technologies. The details of data information are stated in the Supplementary material submitted together with the main file.
Two datasets (the eighth and the ninth) were from pediatric patients’ whole blood samples. The eighth dataset, GSE13904, was from a North American children’s cohort with 209 sepsis samples and 18 controls. RNA was isolated using the PaxGene Blood RNA kit (PreAnalytiX, United States) and analyzed on the Affymetrix Human Genome U133 Plus 2.0 Array platform (14). The ninth dataset, GSE9692, also from a North American children’s cohort, included 30 sepsis samples and 14 controls (15).
The tenth dataset, GSE49757, consisted of plasma samples from a North American adult cohort with 37 sepsis and 19 control samples. RNA was isolated using the RNeasy Mini kit (Qiagen, Netherlands) and analyzed on the Illumina HumanHT-12 V4.0 expression beadchip platform (16).
Among these datasets, seven from adult whole blood samples allowed much more advanced cohort-to-cohort cross-validation (different from traditional cross-validation in the literature); the two pediatric datasets could also be cross-validated. A validation cohort was collected from Renmin Hospital of Wuhan University, Wuhan, China, comprising 32 sepsis samples and 18 healthy controls to verify findings from the only public plasma dataset. RNA was isolated using the HYCEZMBIO Serum/Plasma RNA Kit (HuiYuCheng Biotechnology, China), and gene copy counting was performed using rt-QPCR on the Roche Light Cycler 480 platform.
A total of 1,806 samples (1,518 sepsis and 288 healthy controls) were collected from diverse settings and ethnicities. This comprehensive analysis addressed the challenges of data source heterogeneity and varying study objectives, which are often overlooked in the current literature. The sepsis patient cohort includes two datasets that encompass three distinct sepsis statuses: sepsis, severe sepsis, and septic shock. This enables a deeper exploration of sepsis progression by evaluating whether genes identified in sepsis versus healthy controls also carry information relevant to disease progression, specifically by comparing severe sepsis to sepsis, and septic shock to sepsis, with sepsis serving as the control in both cases.
Validation data acquisition
Since there was only one dataset with plasma samples, we collected another cohort to validate the findings in plasma samples. The self-collected cohort was collected at Renmin Hospital of Wuhan University, which is the designated tertiary 3A (highest level) hospital of Hubei Province and provides healthcare services to patients from diverse geographic and socioeconomic backgrounds. Any patient presenting to the Department of ICU of Wuhan University Renmin Hospital with suspected severe infection or sepsis/septic shock was recognized as a prospective participant. Blood samples of prospective participants would be collected upon administration; usually, multiple tubes would be collected and sent to the Department of Clinical Laboratory for routine diagnostic testing. Patients who were positively diagnosed with sepsis via the SOFA scoring system would be approached for consent to participate in the study (7). The Department of Clinical Laboratory would release the patient samples for research proceedings when written consent was obtained. Control samples were acquired from non-sepsis volunteers.
The enrolled participants included 32 sepsis patients (21 sepsis and 11 septic shock) and 18 healthy controls. The detailed patient characteristics and gene expression values are presented in Table 2. We noted that the ration of women in the self-collected cohort may limit the generalizability of the findings. Although some prospective epidemiology studies have reported differences in sepsis incidence and mortality in different genders, there is no consensus on whether gender is a risk factor.
We used real-time quantitative polymerase chain reaction (rt-qPCR) for the gene expression study. All six genes identified in the adult plasma samples (NONO, CKAP4, RNF4, FCAR, PLEKHO1, and BMP6) were examined. We performed all experimental procedures in strictly sanitized environments, from sample preparation to gene copy counting. The protocol used for the experiments is as follows: First, we processed 3 to 5 mL of whole blood samples into plasma. Next, for RNA isolation, we processed 200 μL of participants’ plasma with the HYCEZMBIO Serum/Plasma RNA Kit (HuiYuCheng Biotechnology, China). We performed the optional centrifuge cycles; otherwise, we followed the manufacturer’s recommended protocol closely during the RNA isolation process. We performed reverse transcription with the HiScript III 1st Strand cDNA Synthesis Kit (Vazyme Biotechnology, China). We used 6 μL of RNA suspended in nuclease-free water for each participant in the reverse transcription. We closely followed the manufacturer’s recommended protocol throughout the reverse transcription procedure. Finally, we performed the rt-qPCR protocol with the Taq Pro Universal SYBR qPCR Master Mix (Vazyme Biotechnology, China) on the Roche Light Cycler 480 platform (Roche, United States). We closely followed the test kits’ recommended protocol. The program we designed for the experiment included 40 cycles of denaturing at 95°C for 10 s and heating at 60°C for 30 s. The referent used for gene copy counting was β-actin; the primers corresponding to each gene and β-actin are presented in Table 3.
Analytical method
We summarize the model in this section, with detailed explanations in the Supplementary material. The AI-based analytical model employed in our study represents a generalized form of logistic linear models. The dependent variable is binary, taking the values: (1) “sepsis” or “healthy,” (2) “severe sepsis” or “sepsis,” and (3) “septic shock” or “sepsis.” The independent variables are risk factors represented by multiple genes.
The key innovation that distinguishes our model from many traditional ones is that the risk factors are not individual genes but linear combinations of several genes. This enhancement allows the model to capture gene interactions, with the signs (+/−) of these combinations providing insight into the regulatory relationships between genes. Additionally, we developed a framework, outlined by seven rules (found in the Supplementary material), to define critical differentially expressed genes (DEGs) that guide our model’s analysis. The model evaluates every possible linear combination of genes, ultimately identifying the combination of critical DEGs with the highest risk association to the “sepsis” outcome.
Given that our study spans 11 cohorts, including patients with sepsis, severe sepsis, septic shock, and healthy controls, several pertinent questions arise: (1) How to select control group? (2) How was the analytic process that identified the transcriptional profiles established? (3) Were the cohorts pooled? (4) Were the cohorts compared? (5) How was the initial model developed?
For question (1), controls in our analysis consist of either healthy individuals or sepsis patients, with the latter serving as controls when comparing severe sepsis and septic shock cases. In published studies, the primary focus is often distinguishing sepsis patients from healthy controls, with high diagnostic accuracy reported for specific genes. However, differentiating between healthy individuals and severe sepsis patients is not particularly challenging; a clinician with moderate expertise can easily recognize the difference. What is truly needed is the ability to identify distinct subsets within the sepsis population that may respond differently to targeted therapy. Although these observations hold generally true, our selection of healthy controls differs significantly from prior research. First, as discussed in the Introduction, thousands of published genes from prior studies often fail to replicate across cohorts. Second, our identified gene set is the smallest, most informative group of biomarkers for sepsis. Third, their use in diagnosing sepsis is only a fraction of their potential utility. Fourth, this minimal gene set completely (100%) captures the genetic variability between sepsis patients and healthy controls. Fifth, it reflects genetic differences between sepsis, severe sepsis, and septic shock populations. Sixth, this set may respond uniquely to specific treatments, whereas larger gene sets may be too broad to offer focused therapeutic guidance.
For question (2), the analytical process is described in detail through the solution of the objective function (s5) outlined in the Supplementary material, followed by the corresponding procedures.
For question (3), the 11 cohorts were not pooled. Since the samples were derived from diverse ethnic groups, age categories, and experimental platforms, pooling would introduce batch effects, which could compromise the integrity of the inferences despite attempts at correction. Our max-logistic classifiers address batch effects by treating the cohorts independently in a cohort-to-cohort cross-validation framework, a method far more advanced than traditional cross-validation approaches in the literature. Relevant details are provided in the Supplementary material and our prior publications (22).
For question (4), the cohorts were not directly compared. However, we applied the most rigorous critical gene rules available in the literature (22).
For question (5), the computational procedure for model development is thoroughly documented in the Supplementary material and our previous publications (22).
Results
Identification of critical DEGs
After analyzing 11 datasets, we identified three panels with eight critical DEGs, all sharing a core set of CKAP4, FCAR, and RNF4. Panel one, which adds NONO, showed nearly perfect classification in seven datasets from sepsis patients’ whole blood samples, achieving 100% accuracy, sensitivity, and specificity in five and over 95% in the other two. Notably, FCAR alone had perfect classification in three datasets. Panel two includes RNASE2 and OGFOD3 and performed flawlessly in two pediatric cohorts. Panel three adds PLEKHO1 and BMP6, showing perfect classification in two sepsis plasma datasets. While literature links NONO, FCAR, BMP6, RNF4, and RNASE2 to sepsis or Systemic Inflammatory Response Syndrome (SIRS), their interactions and the roles of PLEKHO1, OGFOD3, and CKAP4 in sepsis are less documented. Findings from our self-collected dataset reinforced these results.
Identification of classifiers based on DEGs: (1) sepsis versus healthy
The new AI-type models (max-logistic competing risk factors) were trained to identify classifiers that discriminate sepsis samples from healthy controls (Figure 2). In the max-logistic competing risk factor model, each competing factor ( ) is a linear combination of DEGs. The final classifiers that discriminate a sepsis sample from a control sample are presented in Table 4. The risk probability can be calculated by applying the logistic function, as shown in Equation 1. Each represents a gene-sepsis association and reflects how genes interact. Individual samples may have multiple , representing the competing risk factors for that patient. The final model reports the maximum of all (CFmax = max(CF i)) for each sample. The risk probability estimated by the final model’s classifier of both sepsis and control groups is visualized in Figure 3. The model classifies samples with above 50% risk as “sepsis” (or “severe sepsis,” or “septic shock”), while those less than 50% as “healthy” (or “sepsis”), we can evaluate the model’s performance by comparing the model output with the sample’s original classification.

Figure 2. Estimated risk probability based on the final classifiers for sepsis patients and control subjects in different cohorts. These plots describe the risk probability for each subject in a cohort based on their maximum competing risk factor. Sepsis patients and control subjects are represented by red and blue dots, respectively. The orange dash-line represent the p = 0.5 probability threshold for our classifiers to separate “septic” and “healthy” classes. We can observe that the final classifiers has excellent accuracy in most of the cohorts. GSE9692 is labeled with (A); GSE13904 is labeled with (B); GSE28750 is labeled with (C); GSE49757 is labeled with (D); GSE57065 is labeled with (E); GSE65682 is label with (F); GSE69528 is labeled with (G); GSE95233 is labeled with (H); GSE131761 is labeled with (I); GSE154918 is labeled with (J); and the self-collected cohort is labeled with (K).

Figure 3. Diagnostic view of different competing risk factors. In this figure, we present the gene–gene interactions and signatures for different competing risk factors. Scatter plots labeled with the same letter contain patients from the same cohort. Subjects with a higher risk of a sepsis classification are in red, while subjects with a lower risk are in blue. We can see clear separation in all cohorts, indicating the high accuracy of our classifier model. For cohorts that required less than 3 genes in the competing risk factor classifiers, we used the 3-gene core to present the diagnostic view. GSE9692 is labeled with (A). (A-1) to (A-3) are Competing risk factor 1 (CF1), Competing risk factor 2 (CF2), and 3-gene-core (core), respectively. GSE13904 is labeled with (B). (B-1) to (B-3) are CF1, CF2, and core, respectively. GSE49757 is labeled with (D). (D-1) to (D-3) are CF1, CF2, and core, respectively. GSE65682 is labeled with (F). (F-1) to (F-3) are CF1, CF2, and core, respectively. GSE69528 is labeled with (G). (G-1, G-2) are CF1 and core, respectively. GSE154928 is labeled with (J). The self-collected cohort is labeled with (K). (K-1, K-2) are CF1 and core, respectively. GSE28750, GSE 57065, GSE95233, and GSE131761 only required 1 gene in their competing risk classifiers are labeled with (C,E,H,I) respectively.
Using CF2 in the 1st (GSE65682) dataset in Table 4 as an example, we have
and .
In the 2nd (GSE28750), 3rd (GSE57065), 4th (GSE95233), 5th (GSE69528), 6th (GSE131761) dataset, and the 11th (self-collected) dataset, a single classifier had sufficient high power (close to or reaching 100% accuracy, 100% sensitivity, and 100% specificity) to discriminate sepsis samples from healthy controls. We note that for the 2nd, 3rd, and 6th datasets, FCAR alone was enough to act as a classifier to discriminate sepsis samples and healthy controls. Two classifiers were needed to achieve similar high power for the 8th (GSE13904), 10th (GSE49757), and 1st (GSE65682) due to insufficient sensitivity of CF1 in those datasets. We note that for the 1st dataset, CF1 was created to adjust for a unique patient, and the result was that CF1 for the 1st dataset had poor overall sensitivity. We are unsure if this patient had a distinctly different genetic profile from other patients in the same cohort or if there was a recording error. The 8th (GSE9692) dataset showed three classifiers each to reach high classification power with 100% accuracy. All but two datasets (5th GSE69528 and 7th GSE154918) had a maximum classifier (CFmax) with 100% accuracy, 100% sensitivity, and 100% specificity.
Identification of classifiers based on DEGs: (2) severe sepsis versus sepsis
As discussed in the Introduction, numerous sepsis-related genes have been reported in the literature but often lack cohort-to-cohort cross-validation. Our study successfully cross-validated a miniature gene set across 11 cohorts using healthy populations as controls, demonstrating their high informativeness and reliability as sepsis biomarkers.
To test whether this gene set remains informative in the progression from sepsis to severe sepsis, we analyzed plasma data from GSE49757, comprising 20 severe sepsis samples and 15 sepsis samples. We found that a combination of four genes (NONO, CKAP4, PLEKHO1, and BMP6) achieved a differentiation accuracy of 97.14%, with a sensitivity of 100% and a specificity of 93.33%. This indicates that the miniature gene set retains its intrinsic value, regardless of the control used. These results further confirm that this gene set is applicable for studying sepsis progression.
Identification of classifiers based on DEGs: (3) septic shock versus sepsis
We used 19 septic shock samples and 20 sepsis samples from whole blood data in GSE154918 to test the miniature gene set. The combination of four genes (NONO, CKAP4, RNF4, and FCAR) achieved a differentiation accuracy of 92.31%, with a sensitivity of 94.74% and a specificity of 90.00%. Using our new plasma data (11 septic shock samples and 21 sepsis samples), the combinations of five genes (NONO, CKAP4, FCAR, PLEKHO1, BMP6) reached an overall accuracy of 93.75%, with a sensitivity of 81.82% and a specificity of 100.00%. Once again, these results demonstrate that the miniature gene set retains its intrinsic value, regardless of the control used. These findings further confirm the utility of this gene set for studying the progression of sepsis.
Interpretation of gene variations and the clinical syndrome of sepsis reflected by the classifiers
In the formulas for classifiers, we can observe + and – coefficient signs for different genes. For genes in a classifier, + indicates that upregulation of that gene increases the risk of that patient being classified as “sepsis,” while − indicates that downregulation of that gene increases the risk of that patient being classified as “sepsis.” We note that many published works did not discuss the fitted coefficient signs, so their corresponding genes’ actual functions remain unclear.
The whole blood samples of adult patients
We note that for the seven datasets (GSE65682, GSE28750, GSE57065, GSE95233, GSE69528, GSE131761, and GSE154918) that recorded RNA expression collected in whole blood samples, their classifiers shared the same panel of 4 genes (CKAP4, RNF4, FCAR, and NONO) with the same core of three genes: CKAP4, RNF4, and FCAR. We also note that the classifiers of all but one of 1,413 these samples shared the same + or − signs for the core of three genes; the 1 sample that stood out was unique in its gene expression patterns (alternatively might be recording error), and a particular classifier was built to accommodate the difference. We argue that the consistency shown in the classifiers of different cohorts reflects an underlying pattern in the pathology of sepsis on the genomic level. In the case of these samples, a pattern consisting of upregulation of CKAP4 and FCAR along with downregulation of RNF4 decisively discriminates between sepsis patients and healthy samples. Such phenomena are also reflected in Figure 4 (survival probabilities).

Figure 4. Survival analysis for sepsis patients. These Kaplan–Meier plots are produced with survival data from GSE65682. The red curve and blue curve represent patients with high expression and low expression, respectively. We define high as greater than the 70th quantile, low as less than 30th quantile, in the final analysis we included 456 out of 760 patients in the cohort. The patterns are consistent with the fitted model coefficient signs for GSE65682 in Table 4. We see that the higher the NONO (RNF4) expression values are, the longer the survival time will be; the lower the CKAP4 expression values will be, the longer the survival time will be. Interestingly, FCAR does not lead to a significant pattern, which is also consistent with the model fitting as in Table 4 (GSE65682) CF2 does not contain FCAR in its gene–gene interactions. These observations may lead to clues about sepsis genetic treatments and drug developments.
The plasma samples of adult patients
Furthermore, we note that in 2 datasets (GSE49757 and self-collected) that recorded RNA expression of sepsis plasma samples, an opposite/reversed pattern could be observed for the same core of three genes. In these samples, the downregulation of CKAP4 and FCAR and the upregulation of RNF4 increase the risk of a sample being classified as “sepsis.”
The whole blood samples of pediatric patients
Finally, in the two pediatric datasets (GSE9692 and GSE13904), where RNA expression data was recorded from whole blood samples, the same expression pattern was observed with adult counterparts in the core of three genes. Also, a uniform pattern of downregulation of gene OGFOD3 increases the risk of sepsis.
Homogeneities across whole blood, plasma, adult patients, and pediatric patients
Overall, we report that a change in CKAP4, FCAR, and RNF4 expression patterns is a key feature of sepsis on the genomic level. In whole blood expression patterns, a consistent upregulation of CKAP4 and FCAR with downregulation of RNF4 showed a remarkably strong association with sepsis, reaching 100% accuracy, 100% sensitivity, and 100% specificity across different cohorts from a variety of socioeconomic backgrounds and ethnicity groups. The same pattern with the three genes (CKAP4, FCAR, and RNF4) could be extended to pediatric cohorts, while an opposite pattern was observed in adult plasma samples.
Heterogeneities between whole blood and plasma
We note that RNA expressions recorded from whole blood samples and plasma samples are expected to differ. They need to be balanced among cell-free RNAs and cell-led RNAs to remain healthy. We suspect that the opposite/reversed pattern may be attributed to the location where these genes are differentially expressed in a sepsis patient. We note that previous literature reported that CKAP4, RNF4, and FCAR had an association with immunity response. We hypothesize that the variations in gene expression in sepsis patients occur in specific cells in the blood and then reflect a pattern of upregulation of genes CKAP4 and FCAR in whole blood samples along with downregulation of RNF4. We hypothesize that this pattern coincides with an overwhelmed immune system. On the contrary, these cells are removed in plasma, and we observe different gene–gene interactions compared to those in the whole blood, with some coefficient signs changing in opposite directions. In addition, two other genes, PLEKHO1 and BMP6, are identified to interact with the miniature three genes. Such a phenomenon reveals that these three genes are pivotal in the progression of sepsis. They need to be balanced among cell-free RNAs and cell-led RNAs to remain sepsis-free. This finding is new in the literature and can potentially lead to new sepsis therapies.
Heterogeneities between adult and pediatric
Looking at GSE9692 in Table 4, we can immediately see that pediatric sepsis patients have simpler gene–gene interactions in whole blood compared to those of adult sepsis patients. The pediatric patients’ gene–gene interactions are from a pair of genes, e.g., CKAP4 and OGFOD3, RNF4 and OGFOD3, or CKAP4 and RNF4, to achieve 100% accuracy, while the adult patients need three genes. This observation may coincide with pediatric patients being more vulnerable to infection. In GSE13904, another gene, RNASE2, shows its pivotal function in sepsis infection, although it was not critical in adult patients.
From sepsis to severe sepsis
We observe a clear relationship between gene expression and sepsis severity based on the plasma analysis from GSE49757 and the coefficient signs for the genes NONO, CKAP4, PLEKHO1, and BMP6 (as shown in Table 4). Specifically, increased expression of NONO and BMP6 correlates with a greater likelihood of severe sepsis. Conversely, lower expression of CKAP4 and PLEKHO1 also associates with increased sepsis severity, suggesting a complex, possibly non-linear interaction in their regulatory role. Among all sepsis patients, the expression patterns of CKAP4 and PLEKHO1 do not show significant linearity, further supporting the idea that certain genes contribute uniquely to disease progression depending on their interaction networks and expression thresholds. These results underscore the importance of interpreting gene expression profiles contextually, as the same gene may exhibit varied impacts across different stages of sepsis, necessitating a deeper exploration into the roles of NONO and BMP6 in immune response and cellular stress mechanisms.
From sepsis to septic shock
The transition from sepsis to septic shock, as analyzed using GSE154918, highlights critical changes in the expression of the genes NONO, CKAP4, RNF4, and FCAR (refer to Table 4 for coefficients). Here, we observe that elevated expression of NONO, CKAP4, and FCAR strongly correlates with an increased risk of progression to septic shock. In contrast, RNF4 behaves inversely, where decreased expression is linked to heightened severity, suggesting a possible protective or compensatory role when downregulated. This gene signature, showing similarities to the patterns seen from healthy control to sepsis, suggests shared molecular pathways between the early onset and the severe stages of the condition. Furthermore, this alignment emphasizes the potential for certain genes, like NONO and CKAP4, to serve as robust markers across the full spectrum of sepsis severity.
Interestingly, while the plasma sample analysis provides valuable insights, the whole blood analysis offers even greater biological relevance. Whole blood contains a more comprehensive representation of the immune response and inflammatory signaling, making it more informative than plasma alone. This enhanced depth of information may provide additional layers of understanding of the pathophysiological shifts occurring from sepsis to septic shock, enabling more precise biomarker identification and therapeutic targeting.
Peripheral blood mononuclear cells insights
In this section, we extend our analysis to a gene expression profiling dataset of peripheral blood mononuclear cells (PBMCs), i.e., GSE9960 (27). This dataset includes 54 sepsis patients and 16 healthy controls. Among 54 sepsis patients, the distributions are: 9 sepsis from non-infectious causes of systemic inflammatory response syndrome, 11 Gram-positive, 18 Gram-negative sepsis, and 10 mixed infections.
Using genes reported in Table 4, we can achieve ≥80% accuracy, sensitivity, and specificity. When including the core gene RNF4 in Table 4 and four additional genes (HTR2C, AC126474.2, CHCHD4, 244479_at), we can achieve 95.71% accuracy, 96.30% sensitivity, and 93.75% specificity using the following two max-logistic classifiers and their combination.
Here, HTR2C (5-Hydroxytryptamine Receptor 2C) is a Protein Coding gene. CHCHD4 (Coiled-Coil-Helix-Coiled-Coil-Helix Domain Containing 4) is a Protein Coding gene. AC126474.2 is an lncRNA gene. The probe set ID 244479_at has not yet been associated with a gene symbol.
Our analysis shows that the gene expression values obtained from whole blood, plasma, and PBMC samples exhibit significant differences, primarily influenced by sample composition, cell type proportions, and RNA degradation. As a result, selecting the appropriate sampling method is crucial. PBMC is more suitable for immunological studies, plasma is ideal for cell-free RNA research, and whole blood provides a more comprehensive but noisier expression profile.
Discussion
Sepsis continues to be a major global health burden, with millions of cases each year and high mortality rates, particularly in intensive care units (ICUs) (1, 2). Although significant strides have been made in understanding the pathophysiology of sepsis, the long-term survival and quality of life of sepsis survivors remain critical challenges (3, 28, 29). Recent advances in transcriptomic profiling have provided a promising avenue for understanding the genomic underpinnings of sepsis and guiding the development of more targeted therapies (30). However, several inherent limitations in current transcriptomic studies of sepsis—such as small sample sizes, gene–gene interaction complexities, and inadequate differentiation between disease stages—limit their clinical utility. Numerous studies have targeted these limitations. Still, critical genetic biomarkers for sepsis remain unidentified, largely due to flawed animal models and patient selection over the past 30 years (17). Our study seeks to address this gap. In the meantime, subgroups of sepsis can significantly impact analysis results (31). Unlike traditional classification approaches, the max-logistic classifier inherently accounts for subgroups by employing competing classifiers. This method has been mathematically proven to be robust against study population heterogeneity in previous research on lung and colorectal cancers (22, 23).
In response to these challenges, we developed an AI-driven model that bridges classical methods and advanced machine learning to identify key differentially expressed genes (DEGs) involved in sepsis progression. Our model addresses the limitations of existing AI approaches, including the “black box” nature of many machine learning algorithms and the biases inherent in training processes (18, 20, 32). Unlike most AI models, which primarily emphasize inductive reasoning, our approach balances deductive, inductive, and abductive reasoning, allowing for more transparent and biologically interpretable results. This hybrid reasoning approach has been detailed in our previous work (21, 33), and the model has been validated in studies of various cancers and infectious diseases (22–25).
In our study, gene–gene interaction is not defined in the conventional biological sense, such as physical interaction, pathway co-membership, or co-expression. Instead, we introduce a mathematically grounded definition rooted in the max-logistic competing factor model. In this framework, gene–gene interactions are interpreted through the lens of competing combinations of genes, where specific subsets work together within a mathematical structure to compete for predictive power in distinguishing sepsis states.
These interactions are characterized by their coefficients (signs and magnitudes), combinatorial grouping, and their ability to outcompete other gene combinations in classification tasks. The presence of a gene in a dominant competing factor, along with the sign and strength of its coefficient, reflects its functional role in synergy or antagonism with other genes within that factor. This approach resembles quantum models in physics—where the outcome is determined by the configuration and interaction of components within a complex system—not simply by pairwise association.
While this perspective may differ from classical biological interaction models, it provides a reproducible, interpretable, and high-accuracy framework for capturing functional relationships among genes in disease progression, especially across heterogeneous populations. We hope this mathematical view complements and inspires further biological investigation into the mechanisms underlying these interactions.
Key findings and control group considerations
One of the most important aspects of our study is the rigorous selection of control groups. Previous sepsis transcriptomic studies have predominantly compared sepsis patients to healthy controls, often with high diagnostic accuracy for specific gene sets. However, distinguishing healthy individuals from critically ill patients admitted to the ICU is relatively straightforward from a clinical perspective. The real challenge lies in identifying subpopulations within sepsis patients—such as those with severe sepsis or septic shock—that may respond differently to targeted treatments. By including these more clinically relevant subsets in our analysis, we aimed to address this gap in the literature.
Our model identified a set of eight critical DEGs, divided into three panels, which exhibit consistent patterns across different stages of sepsis. These patterns are particularly informative when comparing sepsis to more advanced conditions such as severe sepsis and septic shock. The inclusion of patients with severe sepsis and septic shock allowed us to more finely tune the gene expression profiles associated with worsening disease. For instance, the gene NONO showed varying expression trends depending on disease severity, with upregulation correlating with increased severity in certain populations, while downregulation had a similar effect in others. This underscores this gene’s complex, context-dependent role in sepsis pathology (Table 4).
Analytic transparency and model development
Our AI model operates by systematically identifying linear combinations of genes that maximize the risk prediction of sepsis and its severe forms. Unlike traditional analyses, which often pool cohorts or fail to account for batch effects, we treated each of the 11 cohorts independently to avoid such issues. The diversity of the cohorts, which included over 1800 samples from different ethnic populations, age groups, and experimental platforms, presented a significant challenge. However, our max-logistic classifiers allowed for robust cohort-to-cohort cross-validation without the need for data pooling, thereby reducing the risk of biases due to batch effects (22).
Furthermore, our model development process was guided by a set of seven rules designed to ensure the identification of concise, precise, and generalizable DEGs. These rules exceed previous standards in the literature by emphasizing the importance of gene–gene interactions and the biological relevance of identified DEGs. By defining critical DEGs based on these stringent criteria, we have identified eight genes that demonstrate high sensitivity and specificity across diverse cohorts (34, 35). These findings provide a strong foundation for future applications of our model in precision medicine and risk stratification for sepsis patients.
Biological relevance of identified DEGs
As discussed in the Introduction, the field of sepsis research has produced a vast array of published sepsis-related genes. This abundance of identified genes poses a significant challenge for precision medicine, as it complicates the development of targeted therapies. Our study addresses this issue by significantly reducing the gene set to a minimum single-digit level, which enhances the feasibility of precision therapeutic targeting. Importantly, we emphasize that the genes in our panel do not act in isolation; rather, they function interactively, demonstrating a synergistic relationship. This is a crucial advancement because while other researchers have observed the individual effects of some of these genes, our focus is on how their combined interactions drive sepsis pathophysiology. This gene–gene synergy underlines the novel approach of our research, offering a more comprehensive understanding of sepsis mechanisms and potential interventions. Below, we summarize the findings of the literature on the individual effects of three key genes in our set.
Each gene in our panels plays a critical role in sepsis pathophysiology. For example, FCAR encodes the Fc alpha receptor, which mediates immune responses by binding to immunoglobulin A (IgA) and promoting the release of pro-inflammatory cytokines (36–39). This receptor is essential in the immune defense against bacterial infections, particularly in the early stages of sepsis, where the immune system oscillates between hyperactivation and immunosuppression (40–43). The upregulation of FCAR in certain cohorts strongly distinguished sepsis patients from healthy controls, highlighting its potential as both a diagnostic biomarker and a therapeutic target (44, 45).
Similarly, RNF4, a RING finger E3 ubiquitin ligase, plays a pivotal role in the ubiquitination and proteasomal degradation of polysumoylated proteins. This process is critical in regulating inflammation, metabolism, and cell death—key mechanisms involved in sepsis pathogenesis (46–49). The downregulation of RNF4 in patients with septic shock suggests a potential protective role, where reduced degradation of substrates like PARP1 might mitigate excessive inflammatory responses, a hypothesis that warrants further investigation.
CKAP4, another gene identified in our panels, is a type II transmembrane protein that has been implicated in various inflammatory diseases. Recent studies have also highlighted its role as a receptor for the SARS-CoV-2 spike protein, linking it to both viral pathogenesis and thrombosis (50–53). Given its interaction with NF-κB, a key regulator of inflammation, the upregulation of CKAP4 in sepsis patients could provide a novel link between inflammatory signaling pathways and the progression of sepsis.
Although the individual roles of these three genes have been acknowledged in the progression of sepsis, it is critical to stress that none of them should be examined in isolation. The defining feature of our work is the demonstration of synergistic effects among the genes in our set, distinguishing our findings from previous studies, which have largely concentrated on individual gene fold changes. This singular focus on individual changes can be misleading, as it fails to capture the dynamic interactions that occur at the gene network level. The strength of our work lies in its exploration of these interactions, providing a more accurate and holistic view of sepsis biology. Future research should explore whether the downstream protein levels and pathways regulated by these genes contribute to sepsis severity, opening new avenues for potential therapeutic strategies.
Ethnic variations and limitations
Our study also uncovered significant ethnic variations in gene expression patterns, further emphasizing sepsis pathophysiology’s complexity. For example, we observed that the downregulation of the gene NONO was linked to an increased risk of sepsis in Thai and Chinese cohorts, whereas in Australian and German cohorts, the opposite was true—upregulation of NONO was associated with a heightened sepsis risk. These findings strongly suggest that genetic or environmental factors may shape the way sepsis manifests in different populations, and crucially, these variations are not isolated to individual genes but are evident in the synergistic gene–gene interactions we have highlighted. The population-specific signatures of gene synergy warrant further investigation to deepen our understanding of the molecular mechanisms driving sepsis in diverse populations.
Additionally, our model provided indications of potential subtypes within the same cohort, potentially reflecting different infection sources or underlying biological mechanisms. While this is an intriguing finding, more data is necessary to fully confirm the subtypes and its implications for developing personalized treatment strategies. Identifying such subtypes could pave the way for more tailored approaches to sepsis care, accounting for both genetic background and the nature of the infection, thus enhancing the precision of medical interventions.
Despite the strengths of our model, several limitations must be acknowledged. The classification of sepsis has evolved over the past two decades, resulting in some inconsistencies in cohort categorization. Additionally, due to data availability, our study sourced data from the US, Europe, and China, which may limit the generalizability of our findings to much broader populations. Future studies should aim to include more diverse cohorts and explore the causality of gene expression changes in sepsis through longitudinal genomic analyses.
Conclusion
Our study provides a novel and stringent approach to identifying critical DEGs in sepsis, utilizing an AI-driven model that outperforms existing methods in terms of accuracy, sensitivity, and specificity. While further validation is necessary, particularly in the form of wet lab studies, our findings offer new insights into the molecular mechanisms underlying sepsis and lay the groundwork for the development of precision diagnostic tools and targeted therapies. Sepsis remains one of the most pressing challenges in global health, and our work represents an important step toward overcoming this formidable disease.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary material.
Ethics statement
The studies involving humans were approved by the Clinical Research Institution Review (IRB) Committee and Ethics Review Committee of Wuhan University Renmin Hospital (WDRY2022-K105). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
QS: Data curation, Formal analysis, Investigation, Methodology, Resources, Validation, Visualization, Writing – original draft, Writing – review & editing. JH: Data curation, Investigation, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing. YZ: Data curation, Investigation, Resources, Validation, Visualization, Writing – review & editing. ZLi: Investigation, Resources, Validation, Visualization, Writing – review & editing. ZLv: Data curation, Resources, Writing – review & editing. CZ: Conceptualization, Methodology, Resources, Validation, Visualization, Writing – review & editing. CL: Conceptualization, Methodology, Resources, Supervision, Validation, Visualization, Writing – review & editing. HS: Conceptualization, Data curation, Funding acquisition, Investigation, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing. LZ: Conceptualization, Data curation, Funding acquisition, Investigation, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing. ZZ: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the National Natural Science Foundation of China (72442027, 71991471, 82372232, 81902141), the Natural Science Foundation of Hubei Province (2022CFB127), the Traditional Chinese Medicine project of Hubei Province’s Health Commission (ZY2023M022), the Interdisciplinary Innovative Talents Foundation from Wuhan University Renmin Hospital (JCRCFZ-2022-023).
Acknowledgments
The authors appreciate editors and four reviewers for their insightful comments and the authors who made their data public available.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Gen AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2025.1521827/full#supplementary-material
References
1. Rudd, KE, Johnson, SC, Agesa, KM, Shackelford, KA, Tsoi, D, Kievlan, DR, et al. Global, regional, and national sepsis incidence and mortality, 1990-2017: analysis for the global burden of disease study. Lancet. (2020) 395:200–11. doi: 10.1016/S0140-6736(19)32989-7
2. World Health Organization. Global report on the epidemiology and burden of sepsis: current evidence, identifying gaps and future directions. Geneva: World Health Organization (2020). Available at: https://www.who.int/publications/i/item/9789240010789
3. Cecconi, M, Evans, L, Levy, M, and Rhodes, A. Sepsis and septic shock. Lancet. (2018) 392:75–87. doi: 10.1016/S0140-6736(18)30696-2
4. Kalantar, KL, Neyton, L, Abdelghany, M, Mick, E, Jauregui, A, Caldera, S, et al. Integrated host-microbe plasma metagenomics for sepsis diagnosis in a prospective cohort of critically ill adults. Nat Microbiol. (2022) 7:1805–16. doi: 10.1038/s41564-022-01237-2
5. van der Poll, T, Shankar-Hari, M, and Wiersinga, WJ. The immunology of sepsis. Immunity. (2021) 54:2450–64. doi: 10.1016/j.immuni.2021.10.012
6. Burnham, KL, Davenport, EE, Radhakrishnan, J, Humburg, P, Gordon, AC, Hutton, P, et al. Shared and distinct aspects of the Sepsis transcriptomic response to fecal peritonitis and pneumonia. Am J Respir Crit Care Med. (2017) 196:328–39. doi: 10.1164/rccm.201608-1685OC
7. Singer, M, Deutschman, CS, Seymour, CW, Shankar-Hari, M, Annane, D, Bauer, M, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. (2016) 315:801–10. doi: 10.1001/jama.2016.0287
8. Scicluna, BP, Klein Klouwenberg, PM, van Vught, LA, Wiewel, MA, Ong, DS, Zwinderman, AH, et al. A molecular biomarker to diagnose community-acquired pneumonia on intensive care unit admission. Am J Respir Crit Care Med. (2015) 192:826–35. doi: 10.1164/rccm.201502-0355OC
9. Sutherland, A, Thomas, M, Brandon, RA, Brandon, RB, Lipman, J, Tang, B, et al. Development and validation of a novel molecular biomarker diagnostic test for the early detection of sepsis. Crit Care. (2011) 15:R149. doi: 10.1186/cc10274
10. Tabone, O, Mommert, M, Jourdan, C, Cerrato, E, Legrand, M, Lepape, A, et al. Endogenous retroviruses transcriptional modulation after severe infection, trauma and burn. Front Immunol. (2018) 9:3091. doi: 10.3389/fimmu.2018.03091
11. Pankla, R, Buddhisa, S, Berry, M, Blankenship, DM, Bancroft, GJ, Banchereau, J, et al. Genomic transcriptional profiling identifies a candidate blood biomarker signature for the diagnosis of septicemic melioidosis. Genome Biol. (2009) 10:R127. doi: 10.1186/gb-2009-10-11-r127
12. Martinez-Paz, P, Aragon-Camino, M, Gomez-Sanchez, E, Lorenzo-López, M, Gómez-Pesquera, E, Fadrique-Fuentes, A, et al. Distinguishing septic shock from non-septic shock in postsurgical patients using gene expression. J Infect. (2021) 83:147–55. doi: 10.1016/j.jinf.2021.05.039
13. Herwanto, V, Tang, B, Wang, Y, Shojaei, M, Nalos, M, Shetty, A, et al. Blood transcriptome analysis of patients with uncomplicated bacterial infection and sepsis. BMC Res Notes. (2021) 14:76. doi: 10.1186/s13104-021-05488-w
14. Wong, HR, Cvijanovich, N, Allen, GL, Lin, R, Anas, N, Meyer, K, et al. Genomic expression profiling across the pediatric systemic inflammatory response syndrome, sepsis, and septic shock spectrum. Crit Care Med. (2009) 37:1558–66. doi: 10.1097/CCM.0b013e31819fcc08
15. Cvijanovich, N, Shanley, TP, Lin, R, Allen, GL, Thomas, NJ, Checchia, P, et al. Validating the genomic signature of pediatric septic shock. Physiol Genomics. (2008) 34:127–34. doi: 10.1152/physiolgenomics.00025.2008
16. Khaenam, P, Rinchai, D, Altman, MC, Chiche, L, Buddhisa, S, Kewcharoenwong, C, et al. A transcriptomic reporter assay employing neutrophils to measure immunogenic activity of septic patients’ plasma. J Transl Med. (2014) 12:65. doi: 10.1186/1479-5876-12-65
17. Cavaillon, JM, Singer, M, and Skirecki, T. Sepsis therapies: learning from 30 years of failure of translational research to propose new leads. EMBO Mol Med. (2020) 12:e10128. doi: 10.15252/emmm.201810128
18. Bathaee, Y. The artificial Intelligence black box and the failure of intent and causation. Harvard J Law Technol. (2018) 31:889–38.
19. Savage, N. Breaking into the black box of artificial intelligence. Nature. (2022). doi: 10.1038/d41586-022-00858-1
20. Norstad, M, Outram, S, Brown, JEH, Zamora, AN, Koenig, BA, Risch, N, et al. The difficulties of broad data sharing in genomic medicine: empirical evidence from diverse participants in prenatal and pediatric clinical genomics research. Genet Med. (2022) 24:410–8. doi: 10.1016/j.gim.2021.09.021
21. Cui, QR, and Zhang, ZJ. Max-linear competing factor models. J Bus Econ Stat. (2018) 36:62–74. doi: 10.1080/07350015.2015.1137761
22. Liu, YJ, Xu, YQ, Li, XX, Chen, M, Wang, X, Zhang, N, et al. Towards precision oncology discovery: four less known genes and their unknown interactions as highest-performed biomarkers for colorectal cancer. Npj precis. Oncologia. (2024) 8:1310.1038/s41698-024-00512-1. doi: 10.1038/s41698-024-00512-1
23. Zhang, ZJ. Functional effects of four or fewer critical genes linked to lung cancers and new subtypes detected by a new machine learning classifier. J Clin Trials. (2021) S14:1.
24. Zhang, ZJ. Lift the veil of breast cancers using 4 or fewer critical genes. Cancer Inform. (2022) 21:11769351221076360. doi: 10.1177/11769351221076360
25. Zhang, ZJ. The existence of at least three genomic signature patterns and at least seven subtypes of COVID-19 and the end of the disease. Vaccines-Basel. (2022) 10:761. doi: 10.3390/vaccines10050761
26. Kawasaki, T. Update on pediatric sepsis: a review. J Intensive Care. (2017) 5:47. doi: 10.1186/s40560-017-0240-1
27. Tang, BM, McLean, AS, Dawes, IW, Huang, SJ, and Lin, RCY. Gene-expression profiling of peripheral blood mononuclear cells in sepsis. Crit Care Med. (2009) 37:882–8. doi: 10.1097/CCM.0b013e31819b52fd
28. Mayr, FB, Yende, S, and Angus, DC. Epidemiology of severe sepsis. Virulence. (2014) 5:4–11. doi: 10.4161/viru.27372
29. Gaieski, DF, Edwards, JM, Kallan, MJ, and Carr, BG. Benchmarking the incidence and mortality of severe sepsis in the United States. Crit Care Med. (2013) 41:1167–74. doi: 10.1097/CCM.0b013e31827c09f8
30. Wang, Z, Gerstein, M, and Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. (2009) 10:57–63. doi: 10.1038/nrg2484
31. Yang, J, Zhang, B, Hu, C, Jiang, X, Shui, P, Huang, J, et al. Identification of clinical subphenotypes of sepsis after laparoscopic surgery. Laparosc Endosc Robot Surg. (2024) 7:16–26. doi: 10.1016/j.lers.2024.02.001
32. Dias, R, and Torkamani, A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. (2019) 11:70. doi: 10.1186/s13073-019-0689-8
33. Cui, QR, Xu, YQ, Zhang, ZJ, and Chan, V. Max-linear regression models with regularization. J Econ. (2021) 222:579–600. doi: 10.1016/j.jeconom.2020.07.017
34. Power, M, Fell, G, and Wright, M. Principles for high-quality, high-value testing. Evid Based Med. (2013) 18:5–10. doi: 10.1136/eb-2012-100645
35. Zhao, B, Erwin, A, and Xue, B. How many differentially expressed genes: a perspective from the comparison of genotypic and phenotypic distances. Genomics. (2018) 110:67–73. doi: 10.1016/j.ygeno.2017.08.007
36. Kerr, MA. The structure and function of human IgA. Biochem J. (1990) 271:285–96. doi: 10.1042/bj2710285
37. Monteiro, RC, and Van De Winkel, JG. IgA Fc receptors. Annu Rev Immunol. (2003) 21:177–204. doi: 10.1146/annurev.immunol.21.120601.141011
38. Woof, JM, and Kerr, MA. The function of immunoglobulin a in immunity. J Pathol. (2006) 208:270–82. doi: 10.1002/path.1877
39. Bakema, JE, and van Egmond, M. The human immunoglobulin a fc receptor FcαRI: a multifaceted regulator of mucosal immunity. Mucosal Immunol. (2011) 4:612–24. doi: 10.1038/mi.2011.36
40. Chiamolera, M, Launay, P, Montenegro, V, Rivero, MC, Velasco, IT, and Monteiro, RC. Enhanced expression of fc alpha receptor I on blood phagocytes of patients with gram-negative bacteremia is associated with tyrosine phosphorylation of the FcR-gamma subunit. Shock. (2001) 16:344–8. doi: 10.1097/00024382-200116050-00004
41. Shen, L, Collins, JE, Schoenborn, MA, and Maliszewski, CR. Lipopolysaccharide and cytokine augmentation of human monocyte IgA receptor expression and function. J Immunol. (1994) 152:4080–6. doi: 10.4049/jimmunol.152.8.4080
42. Ben Mkaddem, S, Rossato, E, Heming, N, and Monteiro, RC. Anti-inflammatory role of the IgA fc receptor (CD89): from autoimmunity to therapeutic perspectives. Autoimmun Rev. (2013) 12:666–9. doi: 10.1016/j.autrev.2012.10.011
43. Rossato, E, Ben Mkaddem, S, Kanamaru, Y, Hurtado-Nedelec, M, Hayem, G, Descatoire, V, et al. Reversal of arthritis by human monomeric IgA through the receptor-mediated SH2 domain-containing phosphatase 1 inhibitory pathway. Arthritis Rheumatol. (2015) 67:1766–77. doi: 10.1002/art.39142
44. Wehrli, M, Cortinas-Elizondo, F, Hlushchuk, R, Daudel, F, Villiger, PM, Miescher, S, et al. Human IgA fc receptor FcαRI (CD89) triggers different forms of neutrophil death depending on the inflammatory microenvironment. J Immunol. (2014) 193:5649–59. doi: 10.4049/jimmunol.1400028
45. Bernard, GR, Vincent, JL, Laterre, PF, LaRosa, SP, Dhainaut, JF, Lopez-Rodriguez, A, et al. Efficacy and safety of recombinant human activated protein C for severe sepsis. N Engl J Med. (2001) 344:699–709. doi: 10.1056/nejm200103083441001
46. Kumar, R, and Sabapathy, K. RNF4-a paradigm for SUMOylation-mediated ubiquitination. Proteomics. (2019) 19:e1900185. doi: 10.1002/pmic.201900185
47. Wasyluk, W, and Zwolak, A. PARP inhibitors: an innovative approach to the treatment of inflammation and metabolic disorders in Sepsis. J Inflamm Res. (2021) 14:1827–44. doi: 10.2147/jir.S300679
48. Yu, JT, Hu, XW, Yang, Q, Shan, RR, Zhang, Y, Dong, ZH, et al. Insulin-like growth factor binding protein 7 promotes acute kidney injury by alleviating poly ADP ribose polymerase 1 degradation. Kidney Int. (2022) 102:828–44. doi: 10.1016/j.kint.2022.05.026
49. Krastev, DB, Li, S, Sun, Y, Wicks, AJ, Hoslett, G, Weekes, D, et al. The ubiquitin-dependent ATPase p97 removes cytotoxic trapped PARP1 from chromatin. Nat Cell Biol. (2022) 24:62–73. doi: 10.1038/s41556-021-00807-6
50. Kimura, H, Fumoto, K, Shojima, K, Nojima, S, Osugi, Y, Tomihara, H, et al. CKAP4 is a Dickkopf1 receptor and is involved in tumor progression. J Clin Invest. (2016) 126:2689–705. doi: 10.1172/jci84658
51. Li, K, Yao, L, Wang, J, Song, H, Zhang, YH, Bai, X, et al. SARS-CoV-2 spike protein promotes vWF secretion and thrombosis via endothelial cytoskeleton-associated protein 4 (CKAP4). Signal Transduct Target Ther. (2022) 7:332. doi: 10.1038/s41392-022-01183-9
52. Li, X, Wang, J, Zhu, S, Zheng, J, Xie, Y, Jiang, H, et al. DKK1 activates noncanonical NF-κB signaling via IL-6-induced CKAP4 receptor in multiple myeloma. Blood Adv. (2021) 5:3656–67. doi: 10.1182/bloodadvances.2021004315
Keywords: gene interaction, biomarkers, AI, disease detection, progression
Citation: Su Q, Huang J, Zhang Y, Liu Z, Lv Z, Zhang C, Ling C, Su H, Zhan L and Zhang Z (2025) AI-driven discovery of minimal sepsis biomarkers for disease detection and progression: precision medicine across diverse populations. Front. Med. 12:1521827. doi: 10.3389/fmed.2025.1521827
Edited by:
Nozomi Takahashi, University of British Columbia, CanadaReviewed by:
Zhongheng Zhang, Sir Run Shaw Hospital, ChinaMaria Paparoupa, University Medical Center Hamburg-Eppendorf, Germany
Rujipat Samransamruajkit, Chulalongkorn University, Thailand
Feng Shen, Affiliated Hospital of Guizhou Medical University, China
Copyright © 2025 Su, Huang, Zhang, Liu, Lv, Zhang, Ling, Su, Zhan and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Hanwen Su, aGFud2Vuc3VAd2h1LmVkdS5jbg==; Liying Zhan, emhhbmxpeWluZ0B3aHUuZWR1LmNu; Zhengjun Zhang, emp6QHN0YXQud2lzYy5lZHU=