Immunometabolic Signatures Predict Risk of Progression to Active Tuberculosis and Disease Outcome

There remains a pressing need for biomarkers that can predict who will progress to active tuberculosis (TB) after exposure to Mycobacterium tuberculosis (MTB) bacterium. By analyzing cohorts of household contacts of TB index cases (HHCs) and a stringent non-human primate (NHP) challenge model, we evaluated whether integration of blood transcriptional profiling with serum metabolomic profiling can provide new understanding of disease processes and enable improved prediction of TB progression. Compared to either alone, the combined application of pre-existing transcriptome- and metabolome-based signatures more accurately predicted TB progression in the HHC cohorts and more accurately predicted disease severity in the NHPs. Pathway and data-driven correlation analyses of the integrated transcriptional and metabolomic datasets further identified novel immunometabolomic signatures significantly associated with TB progression in HHCs and NHPs, implicating cortisol, tryptophan, glutathione, and tRNA acylation networks. These results demonstrate the power of multi-omics analysis to provide new insights into complex disease processes.


INTRODUCTION
Tuberculosis (TB) is an infectious disease caused by the bacterial pathogen Mycobacterium tuberculosis (M. tb), which is spread via aerosolized droplets that originate from the expectorations of diseased individuals. In 2017, it was estimated that 10 million people fell ill with TB and 1.6 million died from the disease. Overall, about 10% of individuals latently infected with M.tb will progress to active disease at some point in their lives (1). However, the risk of progression is higher for certain groups, including HIV+ individuals, children (2)(3)(4), and those with metabolic and nutritional conditions such as diabetes (5,6) and Vitamin A deficiency (7). Progression is also more frequent immediately post-contact with a TB patient: sharing a household with an active TB case is associated with elevated risk of exposure to M.tb and subsequent development of active disease (8)(9)(10)(11)(12). Major obstacles to fighting TB are the lack of effective TB diagnostics and the extremely large number of latently infected individuals, estimated at 23% of the world's population (13). Due to the impracticality of effectively treating all latently infected individuals and the accompanying possible side effects of such treatments, an effective method for identifying individuals at high risk of progression to active TB disease is highly desirable. Since M. tb is spread by individuals with active TB, early identification and treatment of high-risk individuals could break the chain of transmission and facilitate control of the TB epidemic.
A deeper understanding of immune dysregulation that leads to active TB disease also has the potential to point the way toward novel interventions to prevent progression. Blood transcriptional profiles offer the advantage of being easily monitored and strongly indicative of immune perturbations driven by TB disease. Previous studies have identified transcriptional signatures in peripheral blood that discriminate active TB from latent TB (14)(15)(16)(17)(18)(19)(20)(21)(22)(23). In addition to transcriptional approaches, the development of sensitive metabolic profiling technologies has allowed investigation of relationships between specific metabolites and immune functions (24). Metabolomic profiling can detect non-transcriptional changes in cellular activity as well as metabolites released into the plasma from local tissue sites. Metabolomics has been used to develop specific signatures of TB disease which implicated inflammatory and hypoxic metabolic pathways (25,26). Recently, integration of transcriptomic and metabolomic measurements in healthy individuals was shown to reveal systematic relationships between serum metabolite and blood transcript levels in various signaling, transport, and metabolic processes (27). By taking a similar approach within the context of TB, immunometabolic processes that are altered during TB progression may be revealed.
This study uses RNAseq and metabolomic profiling of household contacts of TB index cases (HHC) samples that were collected as part of the Bill and Melinda Gates Grand Challenges 6-74 program (GC6-74). These cohorts have previously been used to successfully validate a transcriptional signature of TB risk (28). Furthermore, RNAseq (29), metabolomic profiling (30), and c-miRNA (31) analyses of these cohorts identified and validated novel transcriptional and metabolomic signatures of risk for TB. We hypothesized that the ability to predict and understand the processes underlying TB progression after exposure to TB will be improved by integrating the transcriptional and metabolomic profiles for the GC6-74 HHC cohorts. We demonstrate this improvement through the realization of increased prediction accuracy when applying existing transcriptional and metabolic signatures of TB risk and disease and through the de-novo identification of TB progression-associated functional immunometabolomic pathway elements. Furthermore, using whole blood RNA and plasma metabolomic profiles measured 28 days after challenge, we independently validate the multiomic signatures by applying them to predict the spectrum of TB-associated disease (measured at necropsy) observed in rhesus macaques (RMs) that had been vaccinated with the TB vaccine candidate RhCMV/MTB, named according to its design as rhesus cytomegalovirus vector (RhCMV) encoding M.tb antigen repeats. Partial protection has been observed after vaccination with RhCMV/MTB, allowing evaluation of correlates of TB risk (32).

Multi-Omics Analysis Strategy to Identify and Validate Immunometabolic Signatures for Risk of Progression to Active Tuberculosis
By employing a multi-step analytical strategy ( Figure S1), we tested whether integration of blood transcriptional profiling with serum metabolomic profiling can provide new understanding of disease processes and enable improved prediction of TB progression. As detailed below, this analysis involved combined testing of a pre-existing transcriptional TB risk signature (28) and a metabolic TB disease signature (25) on the GC6-74 HHC cohort (28) and a stringent non-human primate (NHP) TB challenge model (32), followed by direct integrated analysis of GC6-74 RNAseq (29) and metabolomics (30).

Household Contact Study Design, Participant Recruitment, and Sample Processing
The GC6-74 cohort study design has been previously described (28). GC6-74 comprised HIV-negative household contacts of active TB cases that were recruited from four African study sites, in South Africa (Stellenbosch University/SUN), The Gambia (Medical Research Council Unit The Gambia/MRC), Uganda (Makerere University/MAK), and Ethiopia (Armauer Hansen Research Institute/AHRI). Whole blood samples were taken from study participants at enrollment of HHCs (BL: which was within 2 months of the exposure event). Where feasible, further whole blood samples were taken 6 months post-enrollment (M6) and 18 months post enrollment (M18), provided the individual remained free of active TB.
Study participants that developed microbiologically confirmed active TB between up to 2 years after HHC were termed TB "progressors" and participants who remained TB free after HHC were termed "controls." Progressors that developed TB within 3 months of HHC were excluded from further analysis as possible co-incident cases. Controls were matched to progressors by age, sex, study site, and year of enrollment ( Table 1). Mass-spectrometry based metabolomic profiling of collected blood-derived plasma (The Gambia, Ethiopia) and serum (South Africa, Uganda) along with RNA sequencing (RNAseq) of whole blood total RNA was performed for available progressor and control samples. Progressor and control samples for which both RNAseq and metabolomic profiling were  , Table S1). The signature was trained on data from a published dataset of metabolite profiles measured in the serum of South African adults and adolescents with active TB, latent infection, and healthy controls (25), with metabolites not also detected in all three GC6-74 sites being removed before model fitting.
We computed the ACS-CoR and MetabAD signature scores for the GC6-74 samples for which both RNAseq and metabolomics measurements were available ( Table 1). As previously-reported (28), ACS CoR significantly discriminated TB progressors from controls amongst GC6-74 HHCs [ROC AUC = 0.71 (95% CI:0.64-0.78)]. Although derived from active disease datasets, the diagnostic MetabAD also significantly discriminated GC6-74 HHC progressors from controls [ROC AUC = 0.68 (95% CI 0.61-0.74); Figure 1A]. Binary classification of GC6-74 samples by both signatures using optimal discrimination thresholds indicated that, of 366 samples, 177 were correctly classified by both signatures, 80/336 samples were correctly classified by ACS CoR but incorrectly by MetabAD, and 50 were correctly classified by MetabAD but incorrectly classified by ACS CoR (Table S2). Despite these differences, the scores for the ACS-CoR and MetabAD signatures were significantly correlated, albeit weakly (Spearman's ρ = 0.30). As shown in Figure 1B, some TB progressor samples had high MetabAD scores and low ACS CoR scores, and vice-versa. Both signatures produce prediction scores in the range [0, 1], therefore scores from metabolomics and transcriptomic signatures lie on the same theoretical scale. To formally assess the prediction improvements attainable by applying the two signatures together, we computed a combined transcriptomics + metabolomics signature score for each sample by adding the two individual scores together. This sum represents a parameter-free combined signature score that did not require the fitting of an additional model to the data. This combined additive signature showed an AUC of 0.75 (0.69-0.81) (Figure 1A), a significant improvement over either signature alone (ACS-CoR χ 2 p = 0.0006; or MetabAD χ 2 p = 7× 1 0 −8 ) for all samples. The greatest prediction performance improvement was observed on samples within 6 months of TB diagnosis, with the combined model achieving ROC AUC of 0.8 (0.7-0.9) compared to ROC AUC of 0.7 (0.58-0.82) and 0.73 (0.58-0.89) for the individual MetabAD and ACS CoR models, respectively ( Figure S2A). The combined signature also showed improved predictive performance vs. both ACS CoR and MetabAD independently for all three study sites (South Africa, The Gambia, Ethiopia, Figures S2B-D).
Further validation of this result was performed using transcriptional and metabolomic profiles measured 28 days post M.tb challenge in RMs from the RhCMV/TB vaccine study (32). This was performed by evaluating the association between the ACS CoR and MetabAD signature scores and the RM outcome harmonized disease score. As the harmonized disease score is a strictly positive number derived from countbased measures (i.e., necropsy score and number of positive necropsy cultures), Poisson modeling was used to evaluate the association of signature scores with harmonized disease scores. Figure 1C illustrates the prospective association of ACS-CoR, MetabAD and the sum of ACS-CoR + MetabAD with the harmonized disease score. The combination of ACS-CoR with MetabAD shows a significantly stronger association with disease outcome (p = 4.9 × 10 −8 , Kendall's τ = 0.5) than either ACS-CoR (p = 6.2 × 10 −6 , τ = 0.34) or MetabAD (p = 0.0022, τ = 0.31) alone.
Altogether, these results demonstrate that the pre-defined transcriptomic and metabolomic signatures each capture complementary TB-related biological variation that is not present in the other signature. Combining them yields a significantly improved signature of risk for TB that also prospectively captures the spectrum of post-challenge disease severity in RhCMV/MTB-vaccinated RM.

The Transcriptome and Metabolome Provide Complementary Information on TB Progression
Given that the specific case of ACS-CoR and MetabAD signatures demonstrated the benefit of combining transcriptomic and metabolomic measurements for predicting TB progression, we sought to more globally quantify the benefit of combining data from transcriptional and metabolomic platforms. Specifically, we determined whether combining an individual transcript with an individual metabolite more frequently resulted in a significant improvement in prediction performance than combining two transcripts or two metabolites. After performing an initial filtering step to remove transcripts and metabolites lacking any univariate association with TB progression (t-test p > 0.05), pairwise logistic regression models fitting TB progression as a function of all possible transcript-transcript (t-t), metabolitemetabolite (m-m), and transcript-metabolite (t-m) pairs from the remaining 5,892 transcripts and 195 metabolites were constructed. Each pairwise model was tested for significant complementarity between the two features, which was defined as a significant improvement (χ 2 p < 0.05) for the pair Bold values serve to highlight the most important metabolomic + transcriptomics row.
model compared to univariate logistic regression models for either individual element alone. All resulting (t-t), (t-m), and (m-m) pairs that exhibited complementarity between the features are listed in Table S3. While the majority (64%) of (t-m) pairings exhibited complementarity, this was observed for only a minority of (t-t) pairs (18%) and (m-m) pairs (42%) ( Table 2). This result indicates that, in a general sense, transcriptional and metabolomic measurements provide nonredundant information that is relevant for predicting risk of TB progression.

Correlations Between Transcripts and Metabolites Reveal Biological Networks Involved in TB Progression
We sought to explore biological relationships between transcript and metabolite abundances by comprehensively evaluating rank correlations between all detectable transcripts (n = 15,320) and metabolites (n = 830). A total of 11,851 statistically significant correlations were identified (FDR < 0.05). We validated these significant (t-m) correlations by testing them on an independent set of transcriptional and metabolomic profiles measured in healthy elderly German adults from the KORA F4 cohort (27). Despite demographic differences between the GC6-74 and KORA F4 cohorts, (t-m) correlations between the studies were significantly concordant (p = 2.61 × 10 −88 ). Furthermore, of 1,109 significant correlations identified in KORA F4, 181 were also significant in GC6-74 (FDR = 0.05), and correlation coefficients (Spearman's ρ) for the significant KORA F4 transcript-metabolite pairs were highly correlated between the studies (ρ = 0.59, p < 2 × 10 −16 ) (Figures S3, S4). Thus, robust correlations between the abundance of particular transcripts in whole blood and particular metabolites in serum/plasma were consistently observed in two very different human cohorts, suggesting that these transcripts and metabolites are functionally linked. We next identified the subset of (t-m) pairs that were significantly correlated in both GC6-74 and KORA F4 and that were significantly impacted by TB progression (Figure 2A). The resulting data-driven immunometabolic network was dominated by a core of hub genes connected to multiple metabolites. Three hub genes are significantly associated with TB progression: the mitochondrial fatty-acid metabolism genes CPT1A, SLC25A20, and PDK4. Correlated with these genes are fatty acid metabolites and related molecules such as carnitines, some of which are also associated with TB progression. In order to estimate the potential of the fatty-acid metabolism network for predicting TB progression, logistic regression models were fitted as above for each of the (t-m) pairs in this network. The best fit pair model was SLC25A20:eicosenoate (20:1n9 or 11), with an AUC of 0.66 (0.60-0.72) (Figures 2B,D). Another datadriven immunometabolic subnetwork was centered around a correlation between cortisol and immune signaling genes such as FKBP-5, CXCR4, CEBPD, DDIT4, and SOCS1. This small subnetwork exhibits strong potential for predicting TB with an AUC of 0.77 (0.71-0.82) (Figures 2C,D).

Joint Pathway Analysis of Transcripts and Metabolites Reveals Canonical Biochemical Pathways Altered in TB Progression
To complement the data-driven transcriptional/metabolic discovery analysis, we determined whether a canonical pathway knowledge-driven integration of transcriptional and metabolomic profiles would reveal additional immunometabolic TB risk signatures. Metabolites (n = 195) and transcripts (n = 5,892) previously selected for global (t-m) complementarity analysis ( Table 2) were separately tested for over-representation in canonical metabolic pathways defined in the Kyoto Encyclopedia of Genes and Genomes (KEGG), (33), and joint (t-m) enrichment p-values were then calculated using Fisher's method (34). The analysis identified 5 significantly jointly enriched pathways ( Table 3, Table S4) that were driven by a total of 142 TB-progression associated transcripts and 61 TB progression-associated metabolites ( Table S5)

Integration of Pathway-Based Signatures Leads to Improved Prediction of Progression to Active TB in HHCs
We next sought to determine whether the novel knowledgedriven metabolomics (t-m) signatures generated by the joint metabolic pathway enrichment analysis could be combined to yield a more accurate predictor for TB progression. This analysis is driven by our hypothesis that improved knowledge of the biological processes underlying TB progression would result in better prediction-an argument that has been made for omicsbased signatures for vaccine immunogenicity and efficacy [34]. We constructed a Composite Canonical metabolic pathway enrichment-based Signature (CCS) that was composed of the sum of scores from the optimal (t-m) pairs derived from each of the five jointly enriched canonical metabolic pathways (described above; Table 4, Figure 3). The ROC of the CCS signature for predicting TB progression in the GC6-74 HHC cohort is shown in Figure 4A, and surpasses the AUC obtained for the ACS CoR, MetabAD, or the combination of the two. We further evaluated whether the CCS signature could be enhanced by the knowledge-driven (KD) metabolomics (t-m) signatures that were identified by the comprehensive correlation analysis-i.e., the SLC25A20-eicosenoate and SOCS1-cortisol pairs ( Figure 2C). We termed this signature the CCS+KD signature. Combining the KD with the data-driven multi-omics signatures resulted  in further improved discrimination of TB progressors from controls in GC6-74 (CCS+KD ROC AUC = 0.81; Figures 4A,B).
Although direct comparison of the significance of the CCS+KD and the ACS-CoR signature may be biased by the fact that the CCS+KD signature does not represent a blind prediction on the GC6-74 dataset, overfitting of CCS+KD is limited by its small size (7 genes, 7 metabolites), and use only of validated biological pathways to construct the signature. Nevertheless, to test whether the improvements in predicting TB progression obtained with the CCS+KD signature was significant, we performed a bootstrapped resampling comparison of the signatures. This showed that the CCS+KD signature significantly outperformed the previously published ACS-CoR signature on all samples (p = 0.02, Figure 4B). Improved performance was also observed on samples taken within 6 months of TB progression, and on each study site (South Africa, The Gambia, Ethiopia) taken individually (Figures S2, S6). Because a further independent human test set was not available to validate the CCS+KD signature, we compared the signature to predictors based on randomly-selected sets of complementary (t-m) pairs to determine whether the specific combination of gene signatures from distinct metabolic pathways resulted in increased accuracy compared to randomly selected predictive (t-m) pairs. These random pairs were selected so that each individual random pair had similar predictive ability to the pairs making up the composite pathway based predictor. The 7-pair combined pathway model was in the top 0.2% of results (p = 0.002) (Figure 4C), indicating that using canonical pathway knowledge to guide signature design yields improved predictive performance. Finally, we assessed the association between the CCS and CCS+KD signature scores at 28 days post challenge and the RhCMV/TB trial harmonized disease outcome scores (as described above). Strong associtions were observed, with performance being similar for CCS and CCS+KD signatures (CCS: p = 8.8 × 10 −5 , τ = 0.356; CCS+KD: p = 4.0 × 10 −6 , τ = 0.359 Figure 4D). Both CCS and CCS+KD showed a stronger correlation than was observed for either of the single-omic ACS-CoR or MetabAD signatures (Figure 1C), however only CCS+KD showed a lower model p-value for the association. This was despite the non-detection of the SLC9A3 gene in the RM transcriptional data, which required the omission of the alanine/SLC9A3 pair, representing the protein digestion and absorption pathway ( Table 4), from the both CCS and CCS+KD scores. Signature performance was driven by the highly significant associations of the arginine/WARS (p = 4.3 × 10 −7 , τ = 0.43), cortisol/SOCS1 (p = 7.4 × 10 −6 , τ = 0.28) and 5-oxoproline/LAP3 (5.8 × 10 −5 , τ = 0.37) pairs ( Figure S8). This approach demonstrates that the biologically interpretable pathway-driven multi-omic signature, based on simple (t-m) pairs, outperforms the existing singleomic signatures for prospectively capturing the spectrum of TBassociated disease that will be observed in M.tb-challenged RM that experience varying degrees of protection mediated by the vaccine RhCMV/MTB.

DISCUSSION AND CONCLUSIONS
Improved biosignatures to identify the HHCs of TB cases likely to progress to active disease are urgently needed. Such biosignatures can serve to prioritize at-risk individuals for closer monitoring and targeted prophylactic treatment and to identify high-risk individuals suitable for enrollment in TB vaccine and drug trials (35). Combined transcriptional and metabolomic analyses of blood samples from HHC cohorts have the potential to reveal these predictive biomarkers and simultaneously identify immunometabolic inflammatory processes associated with TB progression. This could also enable development of novel host-directed therapies for TB (36,37). In the current study, we performed transcriptional and metabolomic analyses of HHCs from multiple African sites in the GC6-74 cohort and demonstrated that these two platforms provide complementary information for predicting TB disease progression. We further integrated these platforms with validated immunometabolic pathway information to generate accurate and functionally interpretable signatures of TB progression. We also demonstrate that while existing signatures of TB progression prospectively correlate strongly with post-necropsy measures of TB-induced lung pathology and bacterial growth measured in the RhCMV/TB vaccine challenge studies, multiomic signatures show further improved performance. The validation of these multi-omic signatures in both humans and rhesus macaques is noteworthy, as it reveals a universal set of M.tb progression pathways in human and RM, underscoring the utility of the RM model to explore the early response to TB in humans.
In our first analysis, we observed that combining a preexisting transcriptional signature of risk for TB progression [ACS-CoR (28)] with a diagnostic metabolomic signature derived from an independent cohort [MetabAD (25)] yielded a significant improvement in predicting TB progression and showed improved correlation with disease pathology. While our previously-reported ACS CoR demonstrated a sensitivity of 42% at a specificity of 80% for identifying individuals who progress to active disease up to 2 years after the initial HHC exposure, the combined signature (ACS CoR + MetabAD) achieved 56% sensitivity while maintaining 80% specificity on the same samples. This improvement in prediction accuracy allows detection of the majority of TB progressors while maintaining comparatively high specificity. This is important, given that the vast majority of TB exposed individuals do not progress to active disease. As such, the combined ACS CoR + MetabAD signature may serve for therapies in which a rule-in test is appropriate. Importantly, the Stop TB Partnership target product profile for a progression signature (38) is achieved by the ACS CoR (39,40) for identifying individuals who will progress to active TB in the following 12 months. The ability to improve ACS CoR by combining it with MetabAD will expand the potential clinical relevance of this signature.
By taking data-driven (correlation) and knowledge-driven (pathway) approaches to integrate the GC6-74 transcriptomics and metabolomics data in the context of TB progression, we identified novel functionally-interpretable signatures that demonstrated the potential to predict TB with higher accuracy. While previously reported blood-based signatures of TB disease and TB risk (14-23, 25, 26, 28) largely implicate interferon-driven changes in transcriptional activity (41), the data-driven CCS and CCS+KD signatures identified a broader range of biological functions. This increase in pathway diversity may derive, in part, from the potential of plasma and serum metabolomic profiling to quantify metabolites that are released into bloodstream from tissues throughout the body, not solely limited to blood cells (which is the case for whole blood based transcriptomics), and potentially including the lung and TB granuloma themselves. We identify several key immunometabolic pathways associated with TB disease progression in GC6-74 HHCs. Among the pathways implicated by the CCS signature is the fatty-acid metabolism network. This pathway is of critical importance in TB progression as M.tb favors fatty-acids as its cellular nutrient source (42), and M.tb itself has roughly 250 genes dedicated to fatty acid metabolism, a higher proportion than any other micro-organism (43).
The CCS+KD signature also implicates key immune regulatory pathways in TB progression, particularly strong correlations between levels of the metabolite cortisol and the genes SOCS1 and DDIT4, all three of which are upregulated in TB progressors. Cortisol is a glucocorticoid steroid hormone that induces apoptosis through activation of the glucocorticoid receptor, and it has been shown that inhibition of mTOR with rapamycin sensitizes lymphoid cells to glucocorticoid receptor mediated apoptosis (44). SOCS1 (Suppressor of Cytokine Signaling 1) is a negative regulator of signaling in the JAK/STAT pathway, which acts downstream of interferon-gamma. SOCS1, in its role as a repressor of the JAK/STAT pathway, shows extensive cross-talk with mTOR in response to interferon signaling (45). DDIT4 (DNA-damage inducible transcript 4), regulates p53/TP53-mediated apoptosis via the mTORC1 complex in response to DNA damage and hypoxic stress (46). The optimal SOCS1-Cortisol (t-m) pair in this subnetwork is the strongest individual predictor in the combined signature, underscoring the key role of the mTOR and JAK/STAT immune signaling pathways in TB progression. The lysosome pathway was among the five pathways implicated by the knowledgedriven analysis. This may reflect the established ability of M.tb to prevent formation of the phagolysosome complex (47,48) for protection from degradation by lysosomal enzymes and autophagy, thus decreasing antigen presentation (49), and facilitating subsequent M.tb escape to the cytosol (50). The analysis similarly implicated the protein digestion pathway, which was strongly down-regulated during TB progression, particularly SLC9A3 Na+/H+ membrane transporter protein. Importantly, lysosomal activity depends on the maintenance of an acidic milieu in the phagolysosome, and the main role of SLC9A3 is to regulate intracellular pH by transporting H+ out of the phagolysosome. M. tb also plays an active role in counteracting the phagolysosomal maturation and subsequent acidification (51). These were accompanied by the related amino-acyl tRNA pathway, which reflects the significant decrease of amino-acid levels in progressors vs. controls. Free amino-acids are produced by the degradation of proteins in the lysosome. Recently, it has been observed that inhibition of mTOR strongly reduces the efflux of amino-acids from the lysosome (52). Thus, mechanistic linkages exist between four of the pathways implicated here in TB progression (lysosome, protein digestion, amino-acid tRNA, and cortisol signaling).
The glutathione pathway, relating to the production of the antioxidant glutathione and other gamma-glutamyl amino-acids, was also selected as part of the CCS signature. Consistent with the observed upregulation of protein-degradation enzymes in the lysosome, leucine amino peptidase 3 (LAP3) was strongly upregulated in TB progressors, and this enzyme formed part of the optimal predictive (t-m) pair in this pathway. Also notable was the observed down-regulation of glutathione peroxidase 2 (GPX2) in progressors. GPX2 helps protect cells against oxidative stress by catalyzing glutathione-mediated reduction of peroxides. While these oxidative peroxides can damage the cell, release of reactive oxygen species by macrophages plays a critical role in bacterial killing (53). Reduction in GPX2 levels is consistent with allowing a higher degree of bacterial killing accompanied by increased host cell damage. Counteracting this process by supplementation with N-acetylcysteine to increase glutathione levels during TB treatment has recently been shown to be associated with faster sputum negativity and reduced lung cavity size (54). This example indicates the potential of this pathwaydriven analysis to reveal targets for host directed therapies. Our analysis also implicated the Sphingolipid metabolism pathway, an established mediator of the host response to TB (55)(56)(57). Sphingolipids are key building blocks of cell membranes which also play important roles in immune signaling and represent major constituents of the mucus secreted by lung alveolar epithelial cells (57). In particular, sphingosine-1-phosphate is involved in the induction of antibacterial activity in macrophages that participate in the control of M.tb (55). Inhibition of translocation of cytosolic SK1 to the phagosome membrane is associated with survival of M.tb (56). Furthermore, M.tb may manipulate host sphingolipid metabolism to enhance its persistence and replication (57).
The approach described in this work can, in principle, be applied to other, similar diseases. Leprosy, another mycobacterial disease, also often remains asymptomatic in the host for years after the initial infection. Thus, a signature of risk post-exposure could be of clinical value. Gene-expression studies have been performed on several human leprosy disease cohorts, looking at samples taken from patients; including leprosy skin lesions (58), Schwann cells from nerve biopsies (59), and from PBMCs (60). A key limitation of these studies is that samples are from individuals who have already succumbed to active disease. The development of a prospective signature of leprosy risk would require the recruitment and follow-up of a high-risk population in order to search for biomarkers apparent before disease onset. Lessons learned from TB risk cohorts in this study suggest immune processes associated with leprosy may also be evident in blood prior to the onset of symptoms, and transcriptional studies of active disease may be of use to guide the discovery of a potential leprosy risk signature.
The integrated transcriptomic and metabolomic approaches applied here have allowed characterization of biological processes that are coordinated simultaneously inside and outside the cell, which cannot be captured by transcriptional profiling alone. Intriguingly, the set of processes identified here as significantly involved in progression in certain aspects mirrors events occurring in M. tb itself. Galagan et al. (61) have previously revealed a direct link between the hypoxic stress response, fatty acid catabolism, lipid biosynthesis, and protein degradation in the M.tb transcriptional regulatory network.
An unanswered question still exists: what is the precise protective or pathogenic role associated with each of these pathways? Transcriptomic and proteomic investigation of TB-infected adolescents suggests that there exists a sequential modulation of immunological processes leading to TB progression (41), consistent with TB progression from incipient, to subclinical, to active disease. However, it remains unclear whether our power to predict TB progression is derived from observing a normally effective immune response to TB be overcome by high bacterial load, or whether differential regulation of the pathways identified here represents a failure of particular immune mechanisms to respond adequately to M.tb. Recent studies have noted a continuum of granulomas at different developmental stages including solid granulomas, necrotic granulomas and caseous granulomas, and the decisive impact of the range of granuloma microenvironments in a single patient potentially showing a range of outcomes from sterilizing immunity to loss of control (62,63). This suggests that analysis of biological samples taken close to or from the granuloma itself may be required to better understand the precise protective or pathogenic role of diverse host responses to TB. In addition, a greater knowledge of pathways associated with progression can lead to novel host-directed approaches to improve treatment outcomes, and suggest biological mechanisms underlying granuloma-specific loss-of-control. Future analysis may expand this strategy by integrating other complementary high throughput assessments of host inflammatory and metabolic processes, including cellular and serum proteomics, intra-cellular metabolomics, and focused single-cell analysis of macrophages and T-cells (64).

MATERIALS AND METHODS
The study design, sample acquisition, RNAseq and metabolomic profiling techniques applied here have been described in detail elsewhere (25,29,30,40). These methods are summarized below.

Household Contact Study Design and Sample Acquisition
This study includes cohorts from four geographic sites, all with a prospective longitudinal design to identify prospective correlates of risk of TB disease. The household contact study design included participants from four African sites: South Africa, The Gambia, Ethiopia, and Uganda, as part of the Bill and Melinda Gates Grand Challenges 6-74 study. The GC6-74 parent cohort consisted of 4,466 HIV-negative participants, aged 10-60 years, with no clinical signs of active TB disease. Participants were HHCs of a TB index case, who was at least 15 years old, with a confirmed positive sputum smear for acid fast bacilli. HHCs were enrolled within no more than 2 months of the index case being diagnosed with active TB.
HHCs who progressed to active TB disease between 3 and 24 months from recruitment were considered progressors. Active TB in progressors was diagnosed by microbiological confirmation of M.tb in sputum samples in all except 7 individuals, who were diagnosed based on clinical symptoms, chest and other radiographs (CXR) consistent with TB and response to chemotherapy, without microbiological confirmation (29). HHCs diagnosed with active TB disease within 3 months of enrolment were excluded from further analysis. Each progressor was matched to 4 controls who remained healthy during followup. Matching was done by site, age category, sex, and year of recruitment category. Age categories included 4 categories: <18,

RhCMV/TB RM Study Design and Sample Acquisition
RM disease outcome data was obtained from the previously published RhCMV/TB vaccine trial (32). This trial comprised RMs from two independent challenge studies, including those vaccinated with RhCMV/TB candidate vaccines, BCG and unvaccinated RMs. RM counts for each experimental group were 23 from Study 1, and 36 from Study 2. Thirty three RMs were vaccinated with a RhCMV/TB candidate vaccine; 7 were vaccinated with both RhCMV/TB and BCG; 13 were unvaccinated, and 6 vaccinated with BCG only.

RNAseq Profiling
RNAseq was carried out as previously described (28,32). PAXgene blood RNA samples (Beckton Dickinson, New Jersey, USA) from Uganda and Ethiopia and extracted RNA from the Gambia were shipped for processing at the University of Cape Town. RNA was extracted from blood using the PAXgene Blood RNA kit (Qiagen, Germantown, MD, USA), and separated into aliquots for local quality control, RNA-sequencing and qRT-PCR. Quantification of RNA and initial quality control were performed using the NanoDrop 2000 TM spectrophotometer (ThermoFisher Scientific, Waltham, MA, USA) to measure concentrations and 260/280 ratios, followed by sampling on the Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA) to determine RNA Integrity Number (65). TruSeq cDNA library preparation (Illumina, San Diego, CA, USA) from a minimum of 200 ng RNA samples of RIN ≥ 7 was sequenced at the Beijing Genomics Institute (BGI, Shenzhen, China), at 60 million 50 bp paired-end reads using Illumina HiSeq-4000 sequencers. Gsnap (66) software was used to align read pairs to the hg19 human genome. Further analysis was done using the gene count table, normalized with edgeR. Transcriptional profiles are available on NCBI GEO for GC6-74 samples (GSE94438), and RhCMV/TB RMs (GSE102440).

Metabolomic Profiling
Metabolomic profiling was performed by Metabolon, Inc. as described previously (67), using either participant plasma (Ethiopia, The Gambia) or serum (Uganda, South Africa) samples. For RMs, plasma was collected as previously described (32). Plasma samples from Study 1 and Study 2 were analyzed in two separate batches ∼6 months apart. Prior to metabolomics analysis, RM plasma samples were rendered non-infectious by sterile filtering twice using sterile 25 mm Pall acrodisc PF syringe filters with stupor membrane (prod. #4187).
Plasma and serum samples were analyzed in concert with a pool of normalization control plasma extensively characterized by Metabolon. Samples were analyzed using three mass-spectrometry pipelines: ultra high performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS; positive mode), UPLC-MS/MS (negative mode), and gas chromatography-mass spectrometry (GC-MS). The UPLC-MS/MS pipeline used a Waters ACQUITY UPLC (BEH C18-2.1 x 100 mm, 1.7 µm) and a Thermo Scientific Q-Exactive high resolution/accurate mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and Orbitrap mass analyzer. The GC/MS pipeline used a Thermo-Finnigan Trace DSQ GC/MS with electron impact (EI) ionization.
Metabolites were identified by automated comparison of the ion features in the experimental samples to a reference library of >4,000 chemical standard entries that included retention time, molecular weight (m/z), preferred adducts, and in-source fragments as well as associated MS spectra and curated by visual inspection for quality control using software developed at Metabolon (68). Additional mass spectral entries have been created for structurally unnamed biochemicals (>5,000 in the Metabolon library), which have been identified by virtue of their recurrent nature (both chromatographic and mass spectral). These compounds have the potential to be identified by future acquisition of a matching purified standard or by classical structural analysis.
Peaks were quantified using area-under-the-curve. Raw area counts for each metabolite in each sample were normalized to correct for variation resulting from instrument inter-day tuning differences by the median value for each run-day, therefore, setting the medians to 1.0 for each run. Missing values were imputed with the observed minimum after normalization.

Development of Metabolomics Disease Signature (MetabAD)
The metabolomics active disease cohort obtained by Weiner et al. (25) was used to train and parameterize the metabolomics active disease model. These samples are available as part of the R tmod (70) package as the tbmprof dataset. Metabolites detected in GC6-74 plasma and serum samples that also had values available in the active disease dataset were considered for model inclusion (207 metabolites). A random forest model was trained using the R (71) caret (72) package, with performance assessed by leave-oneout cross validation. Metabolite model importance to the model was ranked by contribution to prediction error using the caret varImp function, and the model retrained on the top 100 features only. This was repeated recursively to shrink the model to 50 and 25 metabolites. This 25-metabolite signature was used for all further analyses.

Receiver-Operating Characteristic (ROC) Analysis
The R package pROC (73) (function: roc) was used to calculate ROC curves by applying a set of thresholds to numeric predictions from predictive models to predict the progressor/non-progressor status of the samples, and then calculating the sensitivity and specificity of the predictor at each threshold. 95% confidence intervals (CI) around the AUC were estimated using 2000 bootstrapped replicates as implemented in pROC, and ROC curves with a lower 95% confidence interval above 0.5 were considered statistically significant. ROC curves were plotted using the R ggplot2 (74) package. The optimal classification threshold for binary classification was chosen from the ROC curve by identifying the point on the ROC curve with the smallest Euclidian distance to perfect classification (sensitivity = 1 and specificity = 1).
Significance of combining the ACS CoR and MetabAD signatures was assessed by fitting a logistic regression model to ACS CoR alone, MetabAD alone, and ACS CoR + MetabAD and using the R anova.glm(test = "Chisq") function to test for a significant reduction in residual deviance for the Combined model vs. both ACS CoR and MetabAD alone.

Poisson Modeling of RhCMV Study Data
Transcriptomic, metabolomics, and harmonized disease outcome data data from the RhCMV/TB trial were used to develop regression models of association between harmonized disease score "Outcome" and (t-m) signature scores "Signature_Score." As disease outcome scores were strictly non-negative and derived from lung pathology and TB culture count scores, Poisson regression was chosen as the appropriate modeling approach. Two forms of models were fit to evaluate goodness of fit to the data, and a "Study" term was included in the modeling in order to exclude technical effects related to the separately performed transcriptional or metabolomic profiling of the particular vaccine study (Study 1 or Study 2).  (75)] using the regression coefficient covariance matrix provided by the heteroscedasticity-consistent covariance estimation (R library: sandwich) were performed comparing the two (above) models generated for each signature. Thus, the resulting Pvalue measures the study-invariant P-value of the association of Outcome and Signature_Score. The correlation between study adjusted Signature_Score and Outcome was calculated as Kendall's tau (τ ).
All plots showing the results of modeling the relationship between Signature_Score and Outcome have been adjusted to remove the effect of the Study coefficient.

Logistic Regression Modeling
Logistic regression models of the form progression ∼ (gene|metabolite) were fit for each individual metabolite and gene, using the R glm() function. χ 2 p-values were calculated using the R anova.glm() function. To assess complementarity with ACS COR, logistic regression models of the form (1) progression ∼ ACS CoR and (2) progression ∼ ACS CoR + metabolite were fit. Significant complementation was defined as model (2) showing significantly better fit to the data (p < 0.05) than model (1) calculated using anova.glm(test = "Chisq"). For each gene-metabolite pair, a model of the form (1) progression ∼ gene + metabolite was compared to both (2) progression ∼ gene and (3) progression ∼ metabolite. Complementarity was defined as model (1) being significantly better than both models (2) and (3) as measured by anova.glm().

Transcript-Metabolite Correlations
Transcript-metabolite correlations were calculated for matched samples using Spearman's rho. P-values for these correlations were calculated using Fisher's transformation, and a falsediscovery rate correction applied. Significant overlap of correlated pairs of genes and metabolites significant at FDR < 0.01 were compared between GC6-74 HHC and KORA F4 using the hypergeometric test.

Joint Pathway Analysis
KEGG IDs for each transcript were obtained using the biomaRt (76) and KEGGREST (77) R packages. KEGG pathway annotations for each transcript were obtained using the Bioconductor org.Hs.eg.db (78) package. KEGGREST was used to obtain KEGG pathway IDs for each metabolite. The number of significant genes and significant metabolites from the GC6-74 HHC datasets mapped to each pathway was determined, and compared with the total number of genes and metabolites mapped. The hypergeometric test was used separately to determine pathways enriched in significant (1) genes and (2) metabolites. These two p-values were combined using Fisher's method to determine an overall p-value for the pathway.

Joint Metabolic Pathway Enrichment-Based Logistic Regression Models
Logistic regression models were fit, as above, for each gene/metabolite pair in each significant pathway, with the single most predictive pair from each pathway being selected. Predictions from each selected gene/metabolite pair model were summed to form the final pathway predictor.

Significance Testing of Pathway-Enrichment-Based Model
The pathway based model was created by selecting the most predictive and synergistic (as described above) (t-m)-pair for each significant KEGG pathway. The top synergistic pairs from each correlation subnetwork (if a synergistic pair was identified) were also included. Predictive performance of the individual pairs in the signature was referred to as "pair-ROC." The overall ROC for the signature was calculated by summing the logistic-regression predictions for each individual pair to construct an overall ROC curve ("sum-ROC").
Significance of this model was calculated by repeatedly randomly sampling the same number (N) of synergistic pairs as were included in the model from the global set of synergistic pairs. Here N = 7 = the total number of pairs in the composite pathway predictor. In order to ensure comparable performance of randomly sampled pairsets to those in the pathway model, the mean pair-ROC for each pairset was required to be within ± 10% of the mean pair-ROC for the pathway model (mean composite AUC = 0.709). The combined AUC for the random predictor was then calculated by summing the individual pairs, in the same way as for the pathway predictor (see Figure S7). Significance was calculated as p = 1-(fraction of randomly sampled pair models with lower sum-ROC).
Significance comparison of the combined pathway based model to the ACS CoR model was done using a bootstrapped comparison implemented in the pROC roc.test() function.

DATA AVAILABILITY
All datasets generated for this study are included in the manuscript with the exception of the rhesus macaque metabolics data which is included as Table S6.

ETHICS STATEMENT
All study sites adhered to the Declaration of Helsinki and Good Clinical Practice guidelines. Ethical approvals were obtained from institutional review boards (IRBs) and all study participants provided written informed consent. For all sites, adult participants, or legal guardians of participants aged 10-17 years old, provided written or thumb-printed informed consent to participate after careful explanation of study aims and any potential risks. The relevant IRBs and ethical approvals were: for the South African study site, the Stellenbosch University Institutional Review Board (N05/11/187); for the Ugandan study site, the Uganda National Council for Science and Technology (MV 715) and University Hospitals Case Medical Center (12-95-08); for the Ethiopian site, the Armauer Hansen Research Institute (AHRI)/All Africa Leprosy, TB and Rehabilitation Training Center (ALERT) (P015/10); and for the Gambian study site, the Joint Medical Research Council and Gambian Government (SCC.1141vs2). Thumb-printed indication of informed consent was explicitly included as acceptable in the IRB-approved Informed Consent Form (ICF) for the Gambian site. None of the Ugandan study participants used thumb-printed confirmation of informed consent. For the South African and Ethiopian sites, we obtained letters from the relevant IRBs explicitly approving the use of thumb-printed informed consent.        Table 3. (A-E) Maps for lysosome, aminoacyl-tRNA synthesis, protein digestion and absorption, glutathione metabolism, and sphigolipid metabolism respectively. Significantly up-and down-regulated transcripts are highlighted in red and green, respectively, and significantly up-and down-regulated metabolites are shown in blue and yellow, respectively.      Figure 1A) closest to 100% sensitivity and specificity.

AUTHOR CONTRIBUTIONS
Table S3 | (t-t), (t-m), and (m-m) pairs that showed significant complementarity in discriminating progressor and control samples. Table S4 | (t-m) pairs associated with TB progression found in all combined transcriptional/metabolic KEGG pathways. Table S5 | Progression-associated fold changes of each significant transcript and metabolite mapped to the KEGG pathways listed in Table 3.