Abstract
Background:
Tuberculosis (TB) remains a significant global health challenge, necessitating reliable biomarkers for differentiation between latent tuberculosis infection (LTBI) and active tuberculosis (ATB). This study aimed to identify blood-based biomarkers differentiating LTBI from ATB through multicohort analysis of public datasets.
Methods:
We systematically screened 18 datasets from the NIH Gene Expression Omnibus (GEO), ultimately including 11 cohorts comprising 2,758 patients across 8 countries/regions and 13 ethnicities. Cohorts were stratified into training (8 cohorts, n = 1,933) and validation sets (3 cohorts, n = 825) based on functional assignment.
Results:
Through Upset analysis, LASSO (Least Absolute Shrinkage and Selection Operator), SVM-RFE (Support Vector Machine Recursive Feature Elimination), and MCL (Markov Cluster Algorithm) clustering of protein–protein interaction networks, we identified S100A12 and S100A8 as optimal biomarkers. A Naive Bayes (NB) model incorporating these two markers demonstrated robust diagnostic performance: training set AUC: median = 0.8572 (inter-quartile range 0.8002, 0.8708), validation AUC = 0.5719 (0.51645, 0.7078), and subgroup AUC = 0.8635 (0.8212, 0.8946).
Conclusion:
Our multicohort analysis established an NB-based diagnostic model utilizing S100A12/S100A8, which maintains diagnostic accuracy across diverse geographic, ethnic, and clinical variables (including HIV co-infection), highlighting its potential for clinical translation in LTBI/ATB differentiation.
1 Introduction
Tuberculosis (TB) remains a leading global cause of morbidity and mortality, ranking as the top fatal infectious disease before the COVID-19 pandemic, surpassing even HIV/AIDS (Chen et al., 2024; An et al., 2025; Zhuang et al., 2024b). Despite being diagnosable, preventable, and treatable, persistent diagnostic challenges contribute to its high disease burden (Fortún and Navas, 2022). Current diagnostic approaches primarily rely on tuberculin skin tests (TST, Diaskintest, C-Tb, EC-test) and interferon-gamma release assays (IGRAs: T-SPOT.TB, QFT-GIT, QFT-Plus, LIASON QFT-Plus, LIOFeron TB/LTBI) (Gong and Wu, 2021; Li et al., 2024; Li et al., 2023). While these methods effectively distinguish active TB (ATB) from healthy controls (HCs), they lack precision in differentiating latent TB infection (LTBI) from ATB (Peng et al., 2024; Cheng et al., 2023; Wang et al., 2024; Jiang et al., 2023a; Jiang et al., 2023c; Jiang et al., 2023d).
To address this gap, the World Health Organization (WHO) has outlined target product profiles for novel diagnostics requiring: (1) non-sputum sampling (e.g., blood), (2) > 80% sensitivity in HIV co-infected patients, (3) > 66% sensitivity in pediatric culture-positive TB, and (4) operational simplicity [Global Programme on Tuberculosis and Lung Health (GTB), 2014]. This has spurred investigations into blood-based biomarkers using microarray technologies (Lu et al., 2019; Natarajan et al., 2022; Shao et al., 2021), complemented by emerging approaches in epigenetics (Esterhuyse et al., 2015), urinary metabolomics (Deng et al., 2021), Raman spectroscopy (Kaewseekhao et al., 2020), sputum proteomics/microbiomics (HaileMariam et al., 2021), NMR-based metabolomics (Izquierdo-Garcia et al., 2020), and machine learning-driven multi-marker profiling (Wang et al., 2024; Robison et al., 2019).
Nevertheless, critical limitations persist. Few studies have validated biomarkers in cohorts exceeding 2,000 cases, with scant evaluation in HIV co-infected or pediatric populations. Most proposed markers lack clinical trial validation (Jiang et al., 2023e; Jiang et al., 2023b), and while histological data mining shows promise, few studies leverage advanced computational methods (e.g., machine/deep learning) to enhance biomarker reliability.
To overcome these constraints, we conducted the largest GEO-based multicohort analysis to date (n = 2,758 across 8 countries/regions), integrating machine learning with single-cell validation. This study systematically explores LTBI/ATB diagnostic biomarkers through the rigorous reuse of NIH GEO datasets, aiming to advance translational TB research.
2 Methods
2.1 Cohort acquisition and curation
We systematically queried the NIH Gene Expression Omnibus (GEO) using: ((“tuberculosis” [MeSH Terms] OR tuberculosis [All Fields]) OR TB [All Fields]) AND “Homo sapiens” [porgn] AND “GDS” [Filter].
2.1.1 Inclusion criteria
Studies involving whole or peripheral blood samples from patients with ATB (n = 11).
2.1.2 Exclusion criteria
Studies focused on vaccines or cell cultures, two-sample arrays, non-blood samples, datasets excluding S100 genes (e.g., GSE144127), inconsistencies in data format, or unavailable matrices (n = 7).
The final cohorts included 2,758 patients from 8 countries/regions and 13 ethnicities (Table 1). LTBI and ATB classifications were based on the original study protocols, with household contacts categorized as LTBI (non-progressors) versus ATB (progressors). Given the heterogeneity of the 11 included cohorts and differences in sequencing platforms, we did not integrate all expression profiles but instead processed each cohort’s expression data individually. Feature selection and model development were also performed separately for each dataset.
Table 1
| Classification of data sets by purpose | Name of datasets | Availability of GEO2R analysis | Total number of patients | Number of ATB Patients | Number of LTBI Patients | Number of patients enrolled | Organization sources | Number of DEGs obtainable by GEO2R analysis |
|---|---|---|---|---|---|---|---|---|
| Discovery | GSE37250 | Yes | 537 | 195 | 167 | 362 | Blood | 113 |
| GSE39939 | Yes | 157 | 79 | 14 | 93 | Blood | 284 | |
| GSE39940 | Yes | 334 | 111 | 54 | 165 | Blood | 264 | |
| GSE101705 | Yes | 44 | 28 | 16 | 44 | Blood | 1,126 | |
| GSE112104 | Yes | 51 | 29 | 21 | 50 | Blood | 3,389 | |
| GSE19491 | Yes | 498 | 75 | 69 | 144 | Blood | 190 | |
| GSE28623 | Yes | 108 | 46 | 25 | 71 | Blood | 821 | |
| GSE40553 | Yes | 204 | 166 | 38 | 204 | Blood | 151 | |
| Validation | GSE94438 | Yes | 434 | 101 | 327 | 428 | Blood | 53 |
| GSE79362 | Yes | 355 | 110 | 245 | 355 | Blood | 31 | |
| GSE84076 | Yes | 36 | 6 | 16 | 22 | Blood | 26 | |
| Excluded | GSE144127 | Yes | 628 | 301 | 13 | 314 | Blood | 20 |
| GSE83456 | Yes | 202 | / | / | / | Blood | / | |
| GSE62147 | Yes | 52 | / | / | / | Blood | / | |
| GSE41055 | Yes | 27 | / | / | / | Blood | / | |
| GSE34608 | Yes | 24 | / | / | / | Blood | / | |
| GSE84152 | Yes | 470 | / | / | / | Blood | / | |
| GSE107995 | No | 414 | / | / | / | Blood | / |
Basic information about the datasets.
2.2 Cohort stratification
Differential expression analysis (LTBI vs. ATB) identified genes with |logFC| ≥ 1 and adjusted p ≤ 0.05. Training set selection prioritized cohorts with consistent DEG numbers (8 cohorts, n = 1,933), while the validation set comprised outliers (3 cohorts, n = 825).
2.3 Training set analysis pipeline
Stable differential genes (SDGs) were defined as genes recurrently dysregulated in >50% of training cohorts, identified via Upset analysis. Feature selection was refined using two machine learning approaches: Least Absolute Shrinkage and Selection Operator (LASSO) regression and Support Vector Machine Recursive Feature Elimination (SVM-RFE). Protein–protein interaction (PPI) networks for SDGs were constructed using the STRING database, and functional modules were clustered via the Markov Cluster Algorithm (MCL)1. The diagnostic performance of gene clusters was evaluated through receiver operating characteristic (ROC) curves, with nested one-way ANOVA comparing sensitivity, specificity, positive/negative predictive values, and AUC metrics. Six machine learning models (Naïve Bayes, SVM, Elastic Net, LASSO, Logistic Regression, Ridge Regression) were iteratively tested to optimize diagnostic accuracy.
2.4 Validation set assessment
The validated diagnostic model was rigorously evaluated in three independent cohorts (n = 825) to ensure generalizability. ROC curves were generated to assess diagnostic performance metrics, including AUC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The statistical significance of gene expression differences between LTBI and ATB groups was tested using the Mann–Whitney U test with a threshold of p < 0.05. Expression patterns were further validated against clinical metadata to ensure biological relevance.
2.5 Machine learning frameworks
2.5.1 LASSO regression
The Least Absolute Shrinkage and Selection Operator (LASSO) regression was implemented using the glmnet R package. The algorithm applied L1 regularization to minimize the residual sum of squares, iteratively shrinking non-informative coefficients to zero. Ten-fold cross-validation was performed to optimize the penalty parameter (λ), and features retained at the minimum cross-validated error were selected for downstream analysis.
2.5.2 SVM-RFE
Support Vector Machine Recursive Feature Elimination (SVM-RFE) utilized the Caret and kernlab packages. A radial basis function kernel was employed, and recursive feature elimination was conducted through five-fold cross-validation. Features were ranked by their contribution to the classification margin, with the least important features iteratively removed until an optimal subset was identified.
2.6 Network analysis and functional clustering
Protein–protein interaction (PPI) networks were constructed using the STRING database (version 11.5) with a combined interaction score threshold >0.4. The Markov Cluster Algorithm (MCL) was applied to partition the network into functional modules. Inflation parameters were automatically optimized to balance cluster granularity. FRIENDS analysis, implemented via custom scripts, calculated node centrality metrics (degree, betweenness, closeness) to identify hub genes within the network.
2.7 Statistical evaluation metrics
Nested one-way ANOVA was performed using GraphPad Prism 9.5.0 to assess hierarchical variance components across diagnostic metrics. The analysis tested interactions between sensitivity/specificity and PPV/NPV, as well as between cutoff values and AUC. Assumptions of normality (Shapiro–Wilk test) and homoscedasticity (Levene’s test) were verified prior to analysis. ROC curves were generated using the ROCR and pROC packages, with optimal cutoff values determined by maximizing Youden’s index (J = sensitivity + specificity − 1).
2.8 External validation resources
2.8.1 CIBERSORT immune profiling
The CIBERSORT algorithm2 was executed with the LM22 leukocyte gene signature matrix. Bulk RNA-seq data were normalized using quantile normalization, and 1,000 permutations were performed to estimate immune cell proportions. Results were filtered for p < 0.05 to ensure confidence in deconvolution accuracy.
2.8.2 Single-cell validation
The Broad Institute’s Single Cell Portal3 was queried for tuberculosis-related single-cell RNA-seq datasets. Gene expression patterns were visualized across cell types using embedded tools, with specificity confirmed by comparing expression levels in myeloid cells (monocytes, macrophages) versus lymphoid populations.
2.8.3 GenDoma pathway analysis
GenDoma4 was accessed to map candidate biomarkers to disease pathways, regulatory networks (miRNA-gene, lncRNA-gene), and functional annotations. Enrichment analysis utilized Fisher’s exact test with Benjamini-Hochberg correction for multiple comparisons (q < 0.05).
2.9 Computational tools and workflow
Raw microarray data were preprocessed using GEOquery for dataset retrieval and limma for background correction and quantile normalization. Probe-to-gene annotation was performed with hgu133plus2.db for Affymetrix platforms. Network visualizations were generated using Cytoscape (v3.9.1) for PPI networks and ggplot2 for ROC curves. All code and reproducibility workflows are archived in Supplementary material 1.
3 Result
3.1 Dataset screening and stratification
Eleven GEO datasets were analyzed, with eight assigned to the training set (GSE37250, GSE39939, GSE39940, GSE101705, GSE112104, GSE19491, GSE28623, GSE40553) and three to the validation set (GSE94438, GSE79362, GSE84076). Seven datasets were excluded due to non-blood samples or technical limitations (Table 1). Differential expression analysis (adjusted p ≤ 0.05, |logFC| ≥ 1) revealed substantial variability in DEG counts across cohorts, ranging from 26 (GSE84076) to 3,389 (GSE112104). Volcano plots and tabulated results (Figure 1; Table 1) highlight this heterogeneity, with GSE101705 and GSE112104 exhibiting the highest DEG counts (1,126 and 3,389, respectively).
Figure 1
3.2 Identification of stable differential genes (SDGs)
Upset analysis of DEGs across eight training cohorts identified 55 SDGs recurrently intersected in >50% of datasets (Figure 2). These included immune-related genes (e.g., S100A12, S100A8, GBP5), inflammatory mediators (CXCR5, ELANE), and metabolic regulators (CYP1B1, MGST1). Hierarchical clustering of expression profiles (Figure 3) demonstrated consistent upregulation of S100A12 and S100A8 in ATB versus LTBI across training cohorts.
Figure 2
Figure 3
3.3 Machine learning-driven feature refinement and functional module discovery via PPI and MCL clustering
LASSO regression and SVM-RFE reduced the 55 SDGs to 47 high-confidence candidates (Figure 4A). In the results of PPI analysis, the interaction network maps between the proteins corresponding to the 47 SDG are shown in Figure 4B; based on the MCL clustering algorithm (the inflation parameter was set to 3), 31 of the 47 proteins were clustered into 9 classes (Figure 4B). Cluster 1 consisted of 6 genes (ANXA3, GPR84, MCEMP1. MMP9, S100A12, S100A8), Cluster 2 consisted of 6 genes (GBP1, GBP5, IFI27, IFIT3, PLSCR1, RSAD2), Cluster 3 consisted of 4 genes (AIM2, CXCR5, NAIPNLRC4), Cluster 4 consisted of 3 genes (BPI, DEFA4, ELANE), Cluster 5 consisted of 3 genes (C1QA, FCGBPSERPING1), Cluster 6 consisted of 3 genes (FCARFCGR1A, FCGR1B), Cluster 7 consisted of 2 genes (LCN2, VNN1), Cluster 8 consisted of 2 genes (COL17A1.PLOD2) and Cluster 9 consisted of 2 genes (CYP1B1, MGST1).
Figure 4
The Sens/Spec/PPV/NPV of each of the nine clusters were obtained, and cluster 1 was found to have the highest diagnostic efficacy after descending the order of the clusters (Figure 5A). Cluster 1 contains six genes, and three genes, GPR84, S100A12, and S100A8, had higher Sens/Spec/PPV/NPV than three genes, ANXA3, MCEMP1, and MMP9, and therefore three genes, GPR84, S100A12, and S100A8, were included in the subsequent analysis (Figure 5B). The Sens/Spec/PPV/NPV of the six models constructed by three-gene signatures with a single biomarker, respectively, are NB (Average = 0.8490) > SVM (Average = 0.8360) > ENR (Average = 0.8338) > LASSO (Average = 0.8266) > MLR (Average = 0.8255) > Ridge (Average = 0.8251) > None (Average = 0.7458), indicating that the constructed model can significantly improve the prediction efficacy (Figure 5C). To further optimize the gene signature from the perspective of diagnostic efficacy, four combinations of Sens/Spec/PPV/NP for three genes were compared, GPR84 + S100A12 + S100A8 (Average = 0.8541) > S100A12 + S100A8 (Average = 0.8525) > GPR84 + S100A12 (Average = 0.8456) > GPR84 + S100A8 (Average = 0.8438, Figure 5D). For AUC/Cutoff, S100A12 + S100A (Average = 0.7897) > GPR84 + S100A12 (Average = 0.7788) > GPR84 + S100A8 (Average = 0.7801) > GPR84 + S100A12 + S100A8 (Average = 0.7440, Figure 5E). Because the 2 gene signature of S100A12 + S100A8 has been consistently ranked in the top two in terms of diagnostic efficacy, S100A12 + S100A8 is considered the optimal combination. The Sens/Spec/PPV/NPV of the six models constructed based on 2 gene signatures with gene signature were, respectively, LASSO (Average = 0.7769) > NB (Average = 0.7732) > MLR (Average = 0.7699) > Ridge (Average = 0.7696) > ENR (Average = 0.7611) > SVM (Average = 0.7532) > None (Average = 0.7205, Figure 5F). NB is regarded as the best model construction method because it is firmly in the top two in both the 3-gene signature and 2-gene signature model construction.
Figure 5
3.4 Biomarker validation across cohorts
Mann–Whitney tests confirmed significant upregulation of S100A12 and S100A8 in ATB versus LTBI across six training cohorts (Figure 6). Validation cohorts showed variable performance (Figure 6): GSE94438 exhibited significant differential expression (p < 0.05), while GSE79362 and GSE84076 lacked consistency, potentially reflecting cohort-specific confounders (e.g., HIV co-infection).
Figure 6
3.5 Subgroup-specific diagnostic performance
ROC analysis revealed variability across demographic and clinical subgroups (Figure 7; Table 2). The model achieved near-perfect discrimination (AUC = 1.0000) in UK-born individuals (GSE19491) and children in GSE112104. On the contrary, the 2-gene signature performed poorly in GSE79362 (AUC = 0.4610). Geographic, ethnic, and HIV status influenced accuracy: South Africa (GSE19491 = 0.8258, GSE39940 = 0.9041, GSE40553 = 0.5875, GSE37250 = 0.8730), Malawi (GSE37250 = 0.8732, GSE39940 = 0.8747), London (GSE19491 = 0.8042), Asian (GSE19491_South Asian = 0.8571, GSE19491_asian other = 0.8333) and Black (GSE19491 = 0.8044) cohorts showed robust prediction performance, while HIV-negative individuals (GSE37250 = 0.907, GSE39939 = 0.8297, GSE39940 = 0.8635) outperformed HIV co-infected patients (GSE37250 = 0.8490).
Figure 7
Table 2
| No. | GSE name | Tag | Classification | AUC | Cutpoint | Sens | Spec | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|
| #1 | GSE79362 | Not used in subgroup analysis | Validation | 0.4610 | 0.2885 | 0.9090 | 0.1430 | 0.3230 | 0.7780 |
| #2 | GSE84076 | Not used in subgroup analysis | 0.8438 | 0.1881 | 1.0000 | 0.6880 | 0.5450 | 1.0000 | |
| #3 | GSE94438 | Not used in subgroup analysis | 0.5719 | 0.2324 | 0.3960 | 0.7830 | 0.3600 | 0.8080 | |
| 25% Percentile | 0.5165 | 0.2103 | 0.6525 | 0.4155 | 0.3415 | 0.7930 | |||
| Median | 0.5719 | 0.2324 | 0.9090 | 0.6880 | 0.3600 | 0.8080 | |||
| 75% Percentile | 0.7079 | 0.2213 | 0.8263 | 0.7355 | 0.4525 | 0.9040 | |||
| #4 | GSE19491 | BCG | Training | 0.8311 | 0.6761 | 0.7330 | 0.8550 | 0.8460 | 0.7470 |
| #5 | GSE28623 | Gender | 0.8530 | 0.8398 | 0.7390 | 0.9600 | 0.9710 | 0.6670 | |
| #6 | GSE37250 | Geographical location/HIV | 0.8764 | 0.5390 | 0.7950 | 0.8260 | 0.8420 | 0.7750 | |
| #7 | GSE39939 | Geographical location/HIV | 0.8689 | 0.9220 | 0.7220 | 0.9290 | 0.9830 | 0.3710 | |
| #8 | GSE39940 | Geographical location/HIV | 0.8614 | 0.8822 | 0.6310 | 0.9630 | 0.9720 | 0.5590 | |
| #9 | GSE40553 | Geographical location/HIV | 0.5721 | 0.8197 | 0.4100 | 0.7890 | 0.8950 | 0.2340 | |
| #10 | GSE101705 | Not used in subgroup analysis | 0.7076 | 0.6433 | 0.7140 | 0.6880 | 0.8000 | 0.5790 | |
| #11 | GSE112104 | Gender | 0.9254 | 0.2813 | 0.9670 | 0.8100 | 0.8790 | 0.9440 | |
| 25% Percentile | 0.8002 | 0.6172 | 0.6933 | 0.8048 | 0.8450 | 0.5120 | |||
| Median | 0.8572 | 0.7479 | 0.7275 | 0.8405 | 0.8870 | 0.6230 | |||
| 75% Percentile | 0.8708 | 0.8504 | 0.7530 | 0.9368 | 0.9713 | 0.7540 | |||
| #12 | GSE19491_BCG+ | BCG? | Subgroup | 0.8359 | 0.7268 | 0.6980 | 0.8860 | 0.8820 | 0.7050 |
| #13 | GSE19491_born in UK | Born in UK? | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | |
| #14 | GSE19491_not born in UK | Born in UK? | 0.8027 | 0.5296 | 0.7110 | 0.8510 | 0.7620 | 0.8140 | |
| #15 | GSE19491_London | Geographical location? | 0.8042 | 0.4051 | 0.8530 | 0.7110 | 0.7250 | 0.8440 | |
| #16 | GSE37250_Malawi | Geographical location? | 0.8732 | 0.5236 | 0.8630 | 0.7610 | 0.8380 | 0.7940 | |
| #17 | GSE39940_Malawi | Geographical location? | 0.8747 | 0.6992 | 0.6580 | 0.9400 | 0.8930 | 0.7830 | |
| #18 | GSE19491_South africa | Geographical location? | 0.8258 | 0.4040 | 0.8000 | 0.8060 | 0.7270 | 0.8620 | |
| #19 | GSE39940_South africa | Geographical location? | 0.9041 | 0.9818 | 0.7530 | 1.0000 | 1.0000 | 0.1820 | |
| #20 | GSE40553_South africa | Geographical location? | 0.5875 | 0.7765 | 0.3790 | 0.8420 | 0.8930 | 0.2810 | |
| #21 | GSE37250_South africa | Geographical location? | 0.8730 | 0.5401 | 0.7530 | 0.8650 | 0.8430 | 0.7830 | |
| #22 | GSE19491_South asian | Ethnicity? | 0.8571 | 0.8075 | 0.8000 | 0.8570 | 0.9230 | 0.6670 | |
| #23 | GSE19491_white | Ethnicity? | 0.8941 | 0.8784 | 0.7650 | 1.0000 | 1.0000 | 0.5560 | |
| #24 | GSE19491_asian other | Ethnicity? | 0.8333 | 0.3743 | 0.8330 | 0.9000 | 0.8330 | 0.9000 | |
| #25 | GSE19491_black | Ethnicity? | 0.8044 | 0.4797 | 0.7430 | 0.8000 | 0.7430 | 0.8000 | |
| #26 | GSE19491_female | Gender? | 0.7714 | 0.2392 | 0.9290 | 0.5500 | 0.5910 | 0.9170 | |
| #27 | GSE28623_female | Gender? | 0.9397 | 0.8784 | 0.8570 | 1.0000 | 1.0000 | 0.8330 | |
| #28 | GSE112104_female | Gender? | 0.9551 | 0.4425 | 0.9170 | 0.9230 | 0.9170 | 0.9230 | |
| #29 | GSE19491_male | Gender? | 0.8951 | 0.7826 | 0.8300 | 0.8620 | 0.9070 | 0.7580 | |
| #30 | GSE28623_male | Gender? | 0.7760 | 0.7995 | 0.7200 | 0.9000 | 0.9470 | 0.5620 | |
| #31 | GSE112104_male | Gender? | 0.8889 | 0.2940 | 1.0000 | 0.6250 | 0.8570 | 1.0000 | |
| #32 | GSE37250_HIV+ | HIV? | 0.8490 | 0.5855 | 0.7450 | 0.8100 | 0.8200 | 0.7310 | |
| #33 | GSE37250_HIV- | HIV? | 0.9070 | 0.4695 | 0.8870 | 0.8310 | 0.8600 | 0.8620 | |
| #34 | GSE39939_HIV- | HIV? | 0.8297 | 0.8805 | 0.6540 | 0.9290 | 0.9710 | 0.4190 | |
| #35 | GSE39940_HIV- | HIV? | 0.8635 | 0.8314 | 0.6430 | 0.9630 | 0.9570 | 0.6750 | |
| #36 | GSE112104_children | Children? | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | |
| #37 | GSE19491_adult | Children? | 0.8166 | 0.6698 | 0.7300 | 0.8410 | 0.8440 | 0.7260 | |
| #38 | GSE112104_adult | Children? | 0.8873 | 0.3329 | 0.9410 | 0.6670 | 0.8000 | 0.8890 | |
| 25% Percentile | 0.8212 | 0.4560 | 0.7250 | 0.8080 | 0.8265 | 0.6900 | |||
| Median | 0.8635 | 0.6698 | 0.8000 | 0.8620 | 0.8820 | 0.7940 | |||
| 75% Percentile | 0.8946 | 0.8195 | 0.8750 | 0.9345 | 0.9520 | 0.8755 | |||
Evaluation of the diagnostic efficacy of a simple Bayesian model with a two-gene signature.
3.6 Immune cell correlates of biomarkers and single-cell expression validation
CIBERSORT-based immune infiltration analysis was performed on all eight datasets, and S100A12 and S100A8 were screened against 64 immune cells with p < 0.05 in the Mantel test results, and a stable correlation between the three types of cells (CD4+ T cells, neutrophils, and NK cells) and 2 gene signature was observed after taking the intersection (Figure 8). The intersection of CD4+ T cells, neutrophils, and NK cells showed a stable correlation (Figure 8).
Figure 8
To verify in which cells the two genes S100A12 and S100A8 are highly expressed, we further validated the expression of the two genes using a single-cell dataset. First, 10,006 cells from 2 non-human primates at 6 weeks after infection with Mycobacterium tuberculosis (MTB)5 were used to observe the expression of S100A12 and S100A8 genes (Figures 9A–D). S100A12 was expressed at a high level in Mast cells, and S100A8 was expressed at a high level in Club cells (also known as bronchiolar exocrine cells), Fibroblast cells, Macrophage cells, and Neutrophil cells.
Figure 9
Next, 109,584 cells from 4 non-human primates at 10 weeks after infection with MTB6 were used to observe the expression of two genes, S100A12 and S100A8 (Figures 9E–H). S100A12 was expressed at high levels in Macrophage and Neutrophil cells, and S100A8 was expressed at high levels in Fibroblast cells, Macrophage cells, and Neutrophil cells.
Further, we used 18,915 cells from human lung tissue ACE2 + co-infected with MTB and HIV7 was performed to observe the expression of two genes, S100A12 and S100A8 (Figures 9I–L). S100A12 and S100A8 were expressed at high levels in Ciliated Cell cells and Pneumocyte cells.
3.7 Network enrichment and functional annotation
STRING-FRIENDS analysis expanded the S100A12/A8 (2 gene signature) interactome to include S100A9, CDH1, AGER (RAGE receptor), and signaling adaptors (GRB2, PTPN11) (7 gene signature, Figure 10A). Functional enrichment tied these 2 genes to Calprotectin complex (Strength = 3.69), S100A9 complex (Strength = 3.69), Neutrophil aggregation, and Aquaporin 9 (Strength = 3.59), and S100A8 complex (Strength = 3.59, Figure 10B). FRIENDS analysis further revealed robust associations between 7 genes and Neutrophil aggregation, and Aquaporin 9 (Strength = 3.4 in GO Process/3.22 in STRING clusters), Toll-like receptor 4 bindings (Strength = 3.15), MET activates PTPN11 (Strength = 3.05), Calprotectin complex (Strength = 3.32), S100A9 complex (Strength = 3.32), and S100A8 complex (Strength = 3.35, Figure 11).
Figure 10
Figure 11
3.8 Multi-omics contextualization via GenDoma
GenDoma revealed 353 interactions for S100A12/A8, including drug targets (e.g., tetracyclines), transcription factors (NF-κB), and disease pathways (Figures 12A,B). Literature mining highlighted their overexpression in blood dendritic cells (CD1C + B), monocytes (CD14 + CD16+), and lung basal cells (Table 3), with neutrophil depletion studies implicating S100A8/A9 in TB progression control.
Figure 12
Table 3
| Tissue | Cell | Biomarker | Gene | Protein ID | PMID |
|---|---|---|---|---|---|
| Blood | CD1C + _B dendritic cell | S100A12 | S100A12 | P80511 | 28428369 |
| Peripheral blood | CD14 + CD16 + monocyte | S100A12 | S100A12 | P80511 | 29361178 |
| Kidney | Neutrophil | S100A12 | S100A12 | P80511 | 30093597 |
| Fetal kidney | Monocyte | S100A12 | S100A12 | P80511 | 30093597 |
| Blood | CD1C + _B dendritic cell | S100A8 | S100A8 | P05109 | 28428369 |
| Umbilical cord blood | Lymphoid-primed multipotent progenitor cell | S100A8 | S100A8 | P05109 | 29167569 |
| Bone marrow | Monocyte derived dendritic cell | S100A8 | S100A8 | P05109 | 29313948 |
| Esophagus | Secretory progenitor cell | S100A8 | S100A8 | P05109 | 29802404 |
| Kidney | Neutrophil | S100A8 | S100A8 | P05109 | 30093597 |
| Fetal kidney | Monocyte | S100A8 | S100A8 | P05109 | 30093597 |
| Lung | Basal cell | S100A8 | S100A8 | P05109 | 30069046 |
| Disease | Description | Gene | Protein ID | PMID |
|---|---|---|---|---|
| Tuberculosis | Depletion of neutrophils or S100A8/A9 deficiency resulted in improved MTB control during chronic but not acute TB. | S100A8 | P05109 | 32134742 |
Literature enrichment analysis of genes.
4 Discussion
To our knowledge, this study represents the first attempt to distinguish LTBI from ATB using a novel approach based on S100A12 and S100A8. In our study, we undertook an extensive analysis of blood transcriptomic data from 2,758 patients across 11 cohorts to identify stable differential genes that could serve as potential biomarkers for distinguishing LTBI from ATB. We focused on the S100A12 and S100A8 gene pair, which exhibited notable upregulation in ATB patients compared to those with LTBI. Our findings demonstrate the robustness of these gene signatures in diagnostic applications, as machine learning models incorporating these biomarkers achieved a significant AUC of 0.8572, indicating high predictive accuracy. Furthermore, our analysis revealed correlations between these biomarkers and immune cell populations, shedding light on their potential roles in the immune response during TB infection. These insights not only enhance our understanding of TB pathogenesis but also pave the way for future therapeutic developments aimed at improving patient outcomes (Dannenberg et al., 2000; Mitterhauser and Wadsak, 2014; Russell, 2007).
The differential expression analysis conducted across various cohorts has underscored the potential of S100A12 and S100A8 as biomarkers for distinguishing between ATB and LTBI. The identification of 55 SDGs reveals significant variability in gene expression profiles across diverse datasets, with S100A12 and S100A8 consistently exhibiting upregulation in ATB cases relative to LTBI. This notable observation indicates that these genes may serve as reliable biomarkers, enhancing diagnostic accuracy and informing treatment strategies. The variability of gene expression counts across cohorts ranging from 26 to 3,389 highlights the challenges in establishing a universal biomarker profile. However, the consistent upregulation of S100A12 and S100A8 across training cohorts suggests their potential role in the pathophysiology of TB, warranting further exploration into their mechanisms of action and clinical applicability (Li et al., 2023).
The S100 protein family, particularly S100A12 and S100A8, has garnered attention due to their roles in inflammation and immune response (Gonzalez et al., 2020). These proteins are secreted by activated immune cells and are involved in various inflammatory pathways (Donato et al., 2013). S100A8/A9 heterodimers regulate neutrophil adhesion via CD11b upregulation during MTB infection (Scott et al., 2020), while S100A12 amplifies inflammation through AGER receptor signaling (Cole et al., 2001). Studies have demonstrated that S100A12 and S100A8 are potential biomarker for disease severity and prognosis in some diseases, such as Idiopathic Pulmonary Fibrosis (Li et al., 2022), Rheumatoid Arthritis (Roszkowski et al., 2022), Blau syndrome (Wang et al., 2018), Chronic Spontaneous Urticaria (Zhou et al., 2019), active lupus nephritis (Davies et al., 2020), and dilated cardiomyopathy (Yu et al., 2024). While S100A12/S100A8 are widely studied in these diseases, their specificity to TB remains an open question. In this study, we found that the correlation between their expression levels and immune cell populations, particularly CD4+ T cells, neutrophils, and natural killer (NK) cells, provides insights into the immune landscape in ATB versus LTBI. Understanding the dynamics between these biomarkers and immune cell infiltration could reveal critical pathways for therapeutic intervention (Li et al., 2023; Zhuang et al., 2024a). The immune profile of ATB patients, characterized by increased neutrophil activity and altered CD4+ T cell responses, suggests that S100A12 and S100A8 may have immune modulatory roles, influencing the inflammatory response and disease progression. Future research directions should focus on elucidating the mechanistic pathways through which these S100 proteins interact with immune cells, potentially leading to novel therapeutic strategies targeting immune responses in TB (Gonzalez et al., 2020; Donato et al., 2013).
Functional interaction and pathway analysis further illuminate the biological significance of S100A12 and S100A8 in TB. The STRING-FRIENDS analysis indicates their involvement in pathways such as neutrophil aggregation and the calprotectin complex (Yang et al., 2024; Heilmann et al., 2019), which are essential for the host’s response to MTB infection. These findings suggest that S100A12 and S100A8 not only act as biomarkers but may also serve as targets for therapeutic intervention (Huoshen et al., 2025). The identification of additional interactions within these pathways opens avenues for drug development aimed at modulating the inflammatory response and enhancing host defense mechanisms. Considering the role of neutrophil aggregation in tuberculosis pathogenesis, targeting these pathways could potentially improve clinical outcomes for patients suffering from active disease (Heida et al., 2017).
Machine learning models utilizing the S100A12 and S100A8 gene signatures demonstrated significant predictive accuracy, with a median AUC 0.8572 in training datasets and 0.8635 in subgroup analysis, indicating their potential utility in clinical diagnostics for early detection of LTBI. The performance of various machine learning approaches highlights the importance of feature selection and model optimization in enhancing diagnostic efficacy (Li et al., 2023; Du et al., 2024). Notably, the Naïve Bayes model exhibited superior performance, suggesting its applicability in diverse clinical settings, which met WHO target product profile requirements [Global Programme on Tuberculosis and Lung Health (GTB), 2014] by (1) utilizing peripheral blood samples, (2) maintaining high sensitivity in HIV co-infected patients (AUC = 0.8490), and (3) achieving excellent discrimination in high-burden low-and middle-income country (LMIC) settings (South Africa (GSE19491 = 0.8258, GSE39940 = 0.9041, GSE40553 = 0.5875, GSE37250 = 0.8730), Malawi (GSE37250 = 0.8732, GSE39940 = 0.8747), and Asian (GSE19491_South Asian = 0.8571, GSE19491_Asian other = 0.8333)). Furthermore, subgroup analyses revealed demographic influences, with reduced prediction efficacy in males (AUC = 0.7760 ~ 0.8951 vs. Female AUC = 0.7714 ~ 0.9551) and improved performance in children individuals (GSE112104_children AUC = 1.0000 vs. Adult AUC = 0.8166 ~ 0.8873), highlighting the need for population-specific validation. The implications of these findings underscore the need for ongoing research to refine machine learning applications in TB diagnostics, paving the way for more accurate and timely identification of patients at risk for progression from LTBI to ATB (Zhao et al., 2015).
However, the validation of these biomarkers across different cohorts revealed variability in expression levels, emphasizing the complexity of biomarker validation in diverse populations (Li et al., 2023). While significant upregulation of S100A12 and S100A8 was observed in specific cohorts, inconsistent results in others may reflect demographic and clinical factors that influence biomarker expression. This variability underscores the necessity for standardized cohort definitions and careful consideration of the characteristics influencing biomarker validation. Future studies should aim to address these challenges, enhancing the robustness of biomarker discovery and validation efforts in tuberculosis research (Mester et al., 2024).
The limitations of this study primarily stem from the lack of wet lab validation, which hinders the confirmation of the identified biomarkers’ functionality. Additionally, the variability in sample size across datasets may affect the robustness of the findings and their generalizability to broader populations. The inconsistent definitions of LTBI and ATB across cohorts further complicate the analysis, leading to potential biases in classification and interpretation of results (Zhao et al., 2015; Mester et al., 2024; Zhou et al., 2023). Moreover, comorbid conditions (such as diabetes mellitus) on LTBI and the exclusion of specific cohorts may overlook critical demographic and clinical factors that could influence biomarker expression, limiting the applicability of our conclusions (Zhou et al., 2023; Kumar and Babu, 2023). Addressing these limitations through standardized definitions, enhanced sample diversity, and future mechanistic studies will be essential for validating the clinical utility of S100A12 and S100A8 in TB diagnostics.
5 Conclusion
In conclusion, this study successfully highlights the potential of S100A12 and S100A8 as promising biomarkers for differentiating between ATB and LTBI. The findings not only enhance diagnostic accuracy but also provide insights into the underlying immune mechanisms involved in TB infection. Furthermore, the integration of machine learning models demonstrates the feasibility of employing these biomarkers in clinical settings, paving the way for improved therapeutic strategies. Future research should focus on refining biomarker validation through comprehensive cohort analyses and mechanistic studies, ultimately contributing to better patient outcomes in tuberculosis management.
Statements
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding authors.
Ethics statement
Ethical approval was not required for the studies involving humans because ethical approval waivers have been obtained where all data are derived from public databases. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements because ethical approval waivers have been obtained where all data are derived from public databases.
Author contributions
FJ: Formal analysis, Methodology, Software, Writing – original draft. YaL: Formal analysis, Software, Writing – original draft. LL: Formal analysis, Methodology, Writing – original draft. RN: Methodology, Writing – original draft. YA: Methodology, Writing – original draft. YuL: Methodology, Writing – original draft. LZ: Conceptualization, Writing – review & editing. WG: Conceptualization, Funding acquisition, Supervision, Writing – review & editing, Visualization.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This study was funded by the National Key Research and Development Program of China (Grant No. 2024YFC2311201).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Gen AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2025.1584360/full#supplementary-material
- LTBI
Latent tuberculosis infection
- TBI
Tuberculosis infection
- ATB
Active tuberculosis
- LASSO
Least Absolute Shrinkage Selection Operator
- SVM-RFE
Support Vector Machines Recursive Feature Elimination
- MCL
Markov Cluster Algorithm
- PPI
Protein–Protein Interaction
- NB
Naive Bayes
- AUC
Area Under Curve
- IQR
Inter-Quartile Range
- HIV
Human Immunodeficiency Virus
- TB
Tuberculosis
- AIDS
Acquired immunodeficiency syndrome
- MTB
Mycobacterium tuberculosis
- PTB
Pulmonary tuberculosis
- IGRAs
Interferon-gamma release assays
- TST
Tuberculin Skin Testing
- HC
Health control
- WHO
World Health Organization
- NMR
Nuclear Magnetic Resonance
- NIH GEO
National Institutes of Health Gene Expression Omnibus
- DEG
Differential expression gene
- SDG
Stable differential gene
- ROC
Receiver Operating Characteristic
- ANOVA
Analysis of variance
- SVM
Support vector machines
- ENR
Elastic Net Regression
- MLR
Multiple Logistic Regression
- RR
Ridge Regression
- RFE
Recursive Feature Elimination
- FDR
False Discovery Rate
- DAMP
Danger-associated molecular pattern
- TLR4
Toll-like receptor 4
- AGER
Late glycosylation end product receptor
- ROS
Reactive oxygen species
- LMIC
Low-and middle-income country
Glossary
Footnotes
1.^https://cran.r-project.org/web/packages/MCL/MCL.pdf
2.^https://cibersort.stanford.edu/
3.^https://singlecell.broadinstitute.org/
4.^https://ai.citexs.com/homePath
5.^https://singlecell.broadinstitute.org/single_cell/study/SCP1749/cellular-ecology-of-m-tuberculosis-granulomas-4-week-dataset#study-summary
6.^https://singlecell.broadinstitute.org/single_cell/study/SCP257/cellular-ecology-of-m-tuberculosis-granulomas-10-week-dataset#study-summary
7.^https://singlecell.broadinstitute.org/single_cell/study/SCP814/human-lung-hiv-tb-co-infection-ace2-cells?genes=LHFPL2&cluster=umap&spatialGroups=--&annotation=cell_type__ontology_label--group--study&subsample=all&tab=distribution#study-summary
References
1
AnY.NiR.ZhuangL.YangL.YeZ.LiL.et al. (2025). Tuberculosis vaccines and therapeutic drug: challenges and future directions. Mol. Biomed.6:4. doi: 10.1186/s43556-024-00243-6
2
ChenZ.WangT.DuJ.SunL.WangG.NiR.et al. (2024). Decoding the WHO global tuberculosis report 2024: a critical analysis of global and Chinese key data. Zoonoses5:5. doi: 10.15212/zoonoses-2024-0061
3
ChengP.JiangF.WangG.WangJ.XueY.WangL.et al. (2023). Bioinformatics analysis and consistency verification of a novel tuberculosis vaccine candidate HP13138PB. Front. Immunol.14:1102578. doi: 10.3389/fimmu.2023.1102578
4
ColeA. M.KimY. H.TahkS.HongT.WeisP.WaringA. J.et al. (2001). Calcitermin, a novel antimicrobial peptide isolated from human airway secretions. FEBS Lett.504, 5–10. doi: 10.1016/s0014-5793(01)02731-4
5
DannenbergA. M.BishaiW. R.ParrishN.RuizR.JohnsonW.ZookB. C.et al. (2000). Efficacies of BCG and vole bacillus (Mycobacterium microti) vaccines in preventing clinically apparent pulmonary tuberculosis in rabbits: a preliminary report. Vaccine19, 796–800. doi: 10.1016/s0264-410x(00)00300-5
6
DaviesJ. C.MidgleyA.CarlssonE.DonohueS.BruceI. N.BeresfordM. W.et al. (2020). Urine and serum S100A8/A9 and S100A12 associate with active lupus nephritis and may predict response to rituximab treatment. RMD Open6:e001257. doi: 10.1136/rmdopen-2020-001257
7
DengJ.LiuL.YangQ.WeiC.ZhangH.XinH.et al. (2021). Urinary metabolomic analysis to identify potential markers for the diagnosis of tuberculosis and latent tuberculosis. Arch. Biochem. Biophys.704:108876. doi: 10.1016/j.abb.2021.108876
8
DonatoR.CannonB. R.SorciG.RiuzziF.HsuK.WeberD. J.et al. (2013). Functions of S100 proteins. Curr. Mol. Med.13, 24–57. doi: 10.2174/156652413804486214
9
DuJ.SuY.QiaoJ.GaoS.DongE.WangR.et al. (2024). Application of artificial intelligence in diagnosis of pulmonary tuberculosis. Chin. Med. J.137, 559–561. doi: 10.1097/cm9.0000000000003018
10
EsterhuyseM. M.WeinerJ.3rdCaronE.LoxtonA. G.IannacconeM.WagmanC.et al. (2015). Epigenetics and proteomics join transcriptomics in the quest for tuberculosis biomarkers. mBio6, e01187–e01115. doi: 10.1128/mBio.01187-15
11
FortúnJ.NavasE. (2022). Latent tuberculosis infection: approach and therapeutic schemes. Rev. Esp. Quimioter.35, 94–96. doi: 10.37201/req/s03.20.2022
12
Global Programme on Tuberculosis and Lung Health (GTB) (2014). High priority target product profiles for new tuberculosis diagnostics: Report of a consensus meeting. Geneva: World Health Organization.
13
GongW.WuX. (2021). Differential diagnosis of latent tuberculosis infection and active tuberculosis: a key to a successful tuberculosis control strategy. Front. Microbiol.12:745592. doi: 10.3389/fmicb.2021.745592
14
GonzalezL. L.GarrieK.TurnerM. D. (2020). Role of S100 proteins in health and disease. Mol. Cell Res.1867:118677. doi: 10.1016/j.bbamcr.2020.118677
15
HaileMariamM.YuY.SinghH.TekluT.WondaleB.WorkuA.et al. (2021). Protein and microbial biomarkers in sputum discern acute and latent tuberculosis in investigation of pastoral Ethiopian cohort. Front. Cell. Infect. Microbiol.11:595554. doi: 10.3389/fcimb.2021.595554
16
HeidaA.KoboldA. C. M.WagenmakersL.van de BeltK.van RheenenP. F. (2017). Reference values of fecal calgranulin C (S100A12) in school aged children and adolescents. Clin. Chem. Lab. Med.56, 126–131. doi: 10.1515/cclm-2017-0152
17
HeilmannR. M.XenoulisP. G.MüllerK.StavroulakiE. M.SuchodolskiJ. S.SteinerJ. M. (2019). Association of serum calprotectin (S100A8/A9) concentrations and idiopathic hyperlipidemia in miniature schnauzers. J. Vet. Intern. Med.33, 578–587. doi: 10.1111/jvim.15460
18
HuoshenW.ZhuH.XiongJ.ChenX.MouY.HouS.et al. (2025). Identification of potential biomarkers and therapeutic targets for periodontitis. Int. Dent. J.75, 1370–1383. doi: 10.1016/j.identj.2024.10.006
19
Izquierdo-GarciaJ. L.Comella-Del-BarrioP.Campos-OlivasR.Villar-HernándezR.Prat-AymerichC.De Souza-GalvãoM. L.et al. (2020). Discovery and validation of an NMR-based metabolomic profile in urine as TB biomarker. Sci. Rep.10:22317. doi: 10.1038/s41598-020-78999-4
20
JiangF.HanY.LiuY.XueY.ChengP.XiaoL.et al. (2023a). A comprehensive approach to developing a multi-epitope vaccine against Mycobacterium tuberculosis: from in silico design to in vitro immunization evaluation. Front. Immunol.14:1280299. doi: 10.3389/fimmu.2023.1280299
21
JiangF.LiuY.XueY.ChengP.WangJ.LianJ.et al. (2023b). Developing a multiepitope vaccine for the prevention of SARS-CoV-2 and monkeypox virus co-infection: a reverse vaccinology analysis. Int. Immunopharmacol.115:109728. doi: 10.1016/j.intimp.2023.109728
22
JiangF.PengC.ChengP.WangJ.LianJ.GongW. (2023c). PP19128R, a multiepitope vaccine designed to prevent latent tuberculosis infection, induced immune responses in silico and in vitro assays. Vaccines11:11. doi: 10.3390/vaccines11040856
23
JiangF.SunT.ChengP.WangJ.GongW. (2023d). A summary on tuberculosis vaccine development-where to go?J. Pers. Med.13:408. doi: 10.3390/jpm13030408
24
JiangF.WangL.WangJ.ChengP.ShenJ.GongW. (2023e). Design and development of a multi-epitope vaccine for the prevention of latent tuberculosis infection. Med. Adv.1, 361–382. doi: 10.1002/med4.40
25
KaewseekhaoB.NuntawongN.EiamchaiP.RoytrakulS.ReechaipichitkulW.FaksriK. (2020). Diagnosis of active tuberculosis and latent tuberculosis infection based on Raman spectroscopy and surface-enhanced Raman spectroscopy. Tuberculosis121:101916. doi: 10.1016/j.tube.2020.101916
26
KumarN. P.BabuS. (2023). Impact of diabetes mellitus on immunity to latent tuberculosis infection. Front. Clin. Diabetes Healthcare4:1095467. doi: 10.3389/fcdhc.2023.1095467
27
LiY.HeY.ChenS.WangQ.YangY.ShenD.et al. (2022). S100A12 as biomarker of disease severity and prognosis in patients with idiopathic pulmonary fibrosis. Front. Immunol.13:810338. doi: 10.3389/fimmu.2022.810338
28
LiL. S.YangL.ZhuangL.YeZ. Y.ZhaoW. G.GongW. P. (2023). From immunology to artificial intelligence: revolutionizing latent tuberculosis infection diagnosis with machine learning. Mil. Med. Res.10:58. doi: 10.1186/s40779-023-00490-8
29
LiL.ZhuangL.YangL.YeZ.NiR.AnY.et al. (2024). Machine learning model based on SERPING1, C1QB, and C1QC: a novel diagnostic approach for latent tuberculosis infection. iLABMED2, 248–265. doi: 10.1002/ila2.65
30
LuY.WangX.DongH.WangX.YangP.HanL.et al. (2019). Bioinformatics analysis of microRNA expression between patients with and without latent tuberculosis infections. Exp. Ther. Med.17, 3977–3988. doi: 10.3892/etm.2019.7424
31
MesterP.KellerD.KunstC.RäthU.RuschS.SchmidS.et al. (2024). High serum S100A12 as a diagnostic and prognostic biomarker for severity, multidrug-resistant Bacteria superinfection and herpes simplex virus reactivation in COVID-19. Viruses16:1084. doi: 10.3390/v16071084
32
MitterhauserM.WadsakW. (2014). Imaging biomarkers or biomarker imaging?Pharmaceuticals7, 765–778. doi: 10.3390/ph7070765
33
NatarajanS.RanganathanM.HannaL. E.TripathyS. (2022). Transcriptional profiling and deriving a seven-gene signature that discriminates active and latent tuberculosis: An integrative bioinformatics approach. Genes13:616. doi: 10.3390/genes13040616
34
PengC.JiangF.LiuY.XueY.ChengP.WangJ.et al. (2024). Development and evaluation of a promising biomarker for diagnosis of latent and active tuberculosis infection. Infect. Dis. Immunity4, 10–24. doi: 10.1097/ID9.0000000000000104
35
RobisonH. M.EscalanteP.ValeraE.ErskineC. L.AuvilL.SasietaH. C.et al. (2019). Precision immunoprofiling to reveal diagnostic signatures for latent tuberculosis infection and reactivation risk stratification. Integr. Biol.11, 16–25. doi: 10.1093/intbio/zyz001
36
RoszkowskiL.JaszczykB.PlebańczykM.CiechomskaM. (2022). S100A8 and S100A12 proteins as biomarkers of high disease activity in patients with rheumatoid arthritis that can be regulated by epigenetic drugs. Int. J. Mol. Sci.24:710. doi: 10.3390/ijms24010710
37
RussellD. G. (2007). Who puts the tubercle in tuberculosis?Nat. Rev. Microbiol.5, 39–47. doi: 10.1038/nrmicro1538
38
ScottN. R.SwansonR. V.Al-HammadiN.Domingo-GonzalezR.Rangel-MorenoJ.KrielB. A.et al. (2020). S100A8/A9 regulates CD11b expression and neutrophil recruitment during chronic tuberculosis. J. Clin. Invest.130, 3098–3112. doi: 10.1172/jci130546
39
ShaoM.WuF.ZhangJ.DongJ.ZhangH.LiuX.et al. (2021). Screening of potential biomarkers for distinguishing between latent and active tuberculosis in children using bioinformatics analysis. Medicine100:e23207. doi: 10.1097/md.0000000000023207
40
WangJ.JiangF.ChengP.YeZ.LiL.YangL.et al. (2024). Construction of novel multi-epitope-based diagnostic biomarker HP16118P and its application in the differential diagnosis of Mycobacterium tuberculosis latent infection. Mol. Biomed.5:15. doi: 10.1186/s43556-024-00177-z
41
WangL.RoséC. D.FoleyK. P.AntonJ.Bader-MeunierB.BrissaudP.et al. (2018). S100A12 and S100A8/9 proteins are biomarkers of articular disease activity in Blau syndrome. Rheumatology57, 1299–1304. doi: 10.1093/rheumatology/key090
42
YangD.ChenY.YuY.ChenX. (2024). Identification of genes and key pathways associated with the pathophysiology of lung Cancer and atrial fibrillation. Altern. Ther. Health Med.30, 68–75
43
YuY.ShiH.WangY.YuY.ChenR. (2024). A pilot study of S100A4, S100A8/A9, and S100A12 in dilated cardiomyopathy: novel biomarkers for diagnosis or prognosis?ESC Heart Failure11, 503–512. doi: 10.1002/ehf2.14605
44
ZhaoX.PanS.LiuC. (2015). Effect of S100 calcium binding protein A12 on the pathogenesis of preeclampsia. Zhonghua Fu Chan Ke Za Zhi50, 183–187. doi: 10.3760/cma.j.issn.0529-567x.2015.03.004
45
ZhouG.GuoX.CaiS.ZhangY.ZhouY.LongR.et al. (2023). Diabetes mellitus and latent tuberculosis infection: an updated meta-analysis and systematic review. BMC Infect. Dis.23:770. doi: 10.1186/s12879-023-08775-y
46
ZhouQ. Y.LinW.ZhuX. X.XuS. L.YingM. X.ShiL.et al. (2019). Increased plasma levels of S100A8, S100A9, and S100A12 in chronic spontaneous Urticaria. Indian J. Dermatol.64, 441–446. doi: 10.4103/ijd.IJD_375_18
47
ZhuangL.YangL.LiL.YeZ.GongW. (2024a). Mycobacterium tuberculosis: immune response, biomarkers, and therapeutic intervention. MedComm5:e419. doi: 10.1002/mco2.419
48
ZhuangL.ZhaoY.YangL.LiL.YeZ.AliA.et al. (2024b). Harnessing bioinformatics for the development of a promising multi-epitope vaccine against tuberculosis: the ZL9810L vaccine. Decoding Infect. Transmis.2:100026. doi: 10.1016/j.dcit.2024.100026
Summary
Keywords
active tuberculosis, latent tuberculosis infection, diagnostic model, biomarkers, multicohort analysis
Citation
Jiang F, Liu Y, Li L, Ni R, An Y, Li Y, Zhang L and Gong W (2025) Genome-wide expression in human whole blood for diagnosis of latent tuberculosis infection: a multicohort research. Front. Microbiol. 16:1584360. doi: 10.3389/fmicb.2025.1584360
Received
27 February 2025
Accepted
18 April 2025
Published
09 May 2025
Volume
16 - 2025
Edited by
Wei Wang, Jiangsu Institute of Parasitic Diseases (JIPD), China
Reviewed by
Carmen Judith Serrano, Mexican Social Security Institute, Mexico
Le Liu, Southern Medical University, China
Updates
Copyright
© 2025 Jiang, Liu, Li, Ni, An, Li, Zhang and Gong.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Wenping Gong, gwp891015@whu.edu.cnLingxia Zhang, 1707025046@stu.sqxy.edu.cn
†These authors have contributed equally to this work
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.