Transcriptome-Based Molecular Networks Uncovered Interplay Between Druggable Genes of CD8+ T Cells and Changes in Immune Cell Landscape in Patients With Pulmonary Tuberculosis

Background Tuberculosis (TB) is a major infectious disease, where incomplete information about host genetics and immune responses is hindering the development of transformative therapies. This study characterized the immune cell landscape and blood transcriptomic profile of patients with pulmonary TB (PTB) to identify the potential therapeutic biomarkers. Methods The blood transcriptome profile of patients with PTB and controls were used for fractionating immune cell populations with the CIBERSORT algorithm and then to identify differentially expressed genes (DEGs) with R/Bioconductor packages. Later, systems biology investigations (such as semantic similarity, gene correlation, and graph theory parameters) were implemented to prioritize druggable genes contributing to the immune cell alterations in patients with TB. Finally, real time-PCR (RT-PCR) was used to confirm gene expression levels. Results Patients with PTB had higher levels of four immune subpopulations like CD8+ T cells (P = 1.9 × 10−8), natural killer (NK) cells resting (P = 6.3 × 10−5), monocytes (P = 6.4 × 10−6), and neutrophils (P = 1.6 × 10−7). The functional enrichment of 624 DEGs identified in the blood transcriptome of patients with PTB revealed major dysregulation of T cell-related ontologies and pathways (q ≤ 0.05). Of the 96 DEGs shared between transcriptome and immune cell types, 39 overlapped with TB meta-profiling genetic signatures, and their semantic similarity analysis with the remaining 57 genes, yielded 45 new candidate TB markers. This study identified 9 CD8+ T cell-associated genes (ITK, CD2, CD6, CD247, ZAP70, CD3D, SH2D1A, CD3E, and IL7R) as potential therapeutic targets of PTB by combining computational druggability and co-expression (r2 ≥ |0.7|) approaches. Conclusion The changes in immune cell proportion and the downregulation of T cell-related genes may provide new insights in developing therapeutic compounds against chronic TB.


INTRODUCTION
Tuberculosis (TB) is a chronic infectious lung disease caused by pathogenic Mycobacterium tuberculosis (MTB) belonging to the Mycobacteriaceae family. Despite the widespread use of antibiotics and live attenuated vaccine, TB remains to be the major cause of morbidity and death among all bacterial diseases (1). This is primarily due to the rapid emergence of drug-resistant MTB strains and the incomplete knowledge of complex host-pathogen interactions (2). In the initial stages of infection, MTB invades and replicates in the macrophages after reaching the alveolar air sacs of the lungs (3). Granulomas, hallmark of TB, are formed around the infected macrophages by the organized aggregation of immune cells (like T and B lymphocytes), multinucleated giant cells, dendritic cells, and fibroblasts. Granulomas also suppress the host immune responses, as dendritic cells and macrophages were unable to present antigen to lymphocytes (4). It is noteworthy to mention that mycobacteria can induce distinct host responses from asymptomatic conditions to severe pulmonary illness (5). However, underlying immune cell types and their association with the differentially expressed genes in TB and how they contribute to severe infection are not yet fully explored.
Over the past few decades, microarray-based genome-wide RNA profiling has evolved as a powerful approach to investigate the host transcriptional response (of ∼19,000 genes) in infectious diseases (6). However, differences in the type of clinical samples, array platforms, and statistical approaches used, created a discordance in interpreting massive transcriptomics data. Advances in statistical modeling and bioinformatics approaches have accelerated the identification of disease-centric genes by employing gene networking methods based on graph topological parameters for many infectious diseases (7,8). Moreover, the new bioinformatic methods like estimating relative subsets of RNA transcripts (CIBERSORT), Tumor Immune Estimation Resource (TIMER), and Estimating the Proportions of Immune and Cancer cells (EPIC) are developed to characterize immune cell composition using large-scale gene expression data (9,10). These bioinformatic methods implement functional enrichment scores based on the presence of the query genes over reference gene sets. They perform variety of biological analyses including immune responses based on the defined gene sets. Exploring abnormal immune cell infiltration is critical for developing novel transformative therapies to combat diseases such as cancer, myocarditis, and TB (11,12). Therefore, in order to characterize alterations in immune cell proportion landscape and transcriptomic profile, and to identify new molecular therapeutic targets, this study applied statistical and knowledgebased systemic investigations (such as semantic similarity, gene correlation, and graph theory parameters) to the blood transcription data of patients with TB.

Study Design and Global Expression Data
The genome-wide gene expression dataset (GSE83456) (13) was imported in raw format from the Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/geo). This dataset has expression profiles of 45 pulmonary TB (PTB) and 61 control blood samples generated on the Illumina Human HT-12 V4.0 expression bead chip (Illumina, Inc, USA). The detailed sample information is given in the Supplementary Material Table 1. Figure 1 depicts the overall work design employed in the current research analysis.

Global Data Preprocessing and Screening of Differentially Expressed Genes
R/Bioconductor packages were used to analyze microarray gene expression data. Raw data was fed into the R package limma (14) for the standardization and noise reduction of the probe data, and the raw signal levels for each probe set were standardized. The Quantile method was used to normalize the microarray datasets. The t-statistic was used to detect statistically significant differentially expressed genes (DEGs) between the PTB and control samples. To eliminate false positives, the Benjamini and Hochberg false discovery rate (FDR) with p < 0.05 was used as a cut-off point for gene data. Thereafter, probes were matched to Entrez Gene IDs, and duplicates (with the highest fold change difference) and unmatched transcripts were filtered out. In the final stage, all the DEGs were classified as up-and downregulated genes based on the fold change threshold (FC ≥ |1.5|).

Identification of Immune Cell Composition From Gene Expression Profiles
The fractions of 22 immune cell types in the PTB transcriptome profile were estimated using the CIBERSORT algorithm (15). This program employs linear support vector (SVR) regression to perform feature selection and to deconvolve the cell mixture from the gene expression profile. In this study, gene expression profiles of PTB and control samples were fed into the CIBERSORT algorithm where the algorithm converts the gene expression matrix into the immune cell-matrix and applies the filtering criteria of 1,000 permutations and significant p value set at ≤ 0.05. FIGURE 1 | Study workflow. The gene expression profiles of patients with tuberculosis (TB) and controls were used to deconvolute immune cell fractions. Differentially expressed genes (DEGs) were mapped to functional pathways and then correlated with immune cell and TB meta-analysis gene signatures. The overlapping genes showing semantic similarity were explored by druggability and protein interaction analysis to identify novel candidate therapeutic targets/biomarkers for combating TB infection.

MTB Meta-Profiling Genetic Signatures
We obtained 380 genetic signatures identified from the modular analysis and meta-profiling of 16 publicly available gene expression datasets (16) (Supplementary Material Table 2). We compared the overlapping genes between these 380 TB genetic signatures, DEGs, and genes associated with immune cell populations for downstream analysis.

Identification of Immune Pathways From Gene Expression Profiles
The functional enrichment analysis of the DEGs was performed using g:Profiler (17), a webserver to interpret the function of gene lists (https://biit.cs.ut.ee/gprofiler/gost). This server matches a queried gene list to established functional data sources and uncovers gene ontologies as well as pathway terms that are significantly enriched at q ≤ 0.05. Immune-related pathways were screened from the functional enrichment list. The DEGs which are contributing to immune-associated pathways were mapped to the known signature of TB and immune signature genes in the CIBERSORT to identify unreported genes in TB.

Identification of Semantic Similarity
Using encoded evidence in the Gene Ontology (GO) hierarchy, the functional similarity between unreported genes and known TB signatures is assessed. In this study, we used Wang's similarity metric to compare the biological process (BP) hierarchy. To quantify the semantic similarity between gene pairs, we used the R tool GoSemSim (18).
We employed Resnik's measure of Best-Match Average (BMA) method, which combines the semantic relationship scores of numerous GO terms and produces the average of all maximal similarities in each row and column because a gene can be annotated by many GO terms (19). Following that, gene pairs were selected based on a semantic score of ≤0.5, with a larger score indicating a stronger relationship. The following formula was used to calculate the semantic similarity among gene pairs: where T A designates the contribution of t ∈ T A term to the semantics of A based on the relative positions of t and A in the graph, and S A (t) implies the role of t ∈ T A term to the semantics of A. The difference of immune infiltration between PTB and normal controls (the control group was marked in blue color and the PTB group was marked in red color. P < 0.05 were considered as statistically significant).

Druggability Analysis
The Drug-Gene Interaction Database (DGIdb) (20) was used to assess the druggability of the genes. DGIdb is a central resource for drug-gene interaction data and the potential druggability of each query gene based on different databases. We included approved drugs, antineoplastic drugs, and Immunotherapeutic drug interactions filters and in advance filters, we selected 9 Disease-Agnositic sources databases, 43 gene categories, and 31 interaction types. We used drug target interaction with interactions score ≥0.03 to search the DEGs, which could act as potential drug target genes for MTB.

Correlation Among the Druggable Genes
The correlation between the druggable genes in PTB was investigated using Pearson's correlation method. The correlation (r) between each pair of gene matrices was ranked using Pearson's correlation coefficient (PCC). The formula used for computing the PCC existing between two genes is given below.
wherex andȳ are the average of sample's gene expression signal in PTB of the two genes, respectively. The gene co-expression was confirmed using the Search Tool for the Retrieval of Interacting Genes (STRING) (21), an online protein interaction database, with high confidence interaction score of ≥0.7.

Real Time-PCR (RT-PCR) Validation of Druggable Genes
In order to verify our bioinformatics findings, we validated the expression of 9 druggable genes by the RT-PCR method. In brief, the RNA collected from THP-1 cell lines infected with the MTB strain (H37Rv) was collected after the post-incubation period as previously described (22). In brief, total RNA was reverse transcribed as complementary DNA (cDNA) and then amplified by the RT-PCR method using gene-specific oligonucleotide primers. The relative expression level of potentially druggable genes between the control and test cell lines was estimated by the 2 − CT formula after normalizing their expression levels with the GAPDH internal reference gene. A p < 0.05 under the standard two-tailed t-test was considered a significant value.

Immune Cell Proportion Analysis of PTB Gene Expression Profile
The immune cell proportion landscape of PTB is not yet fully revealed, particularly in low abundant cell subpopulations.
In this study, the CIBERSORT algorithm has identified the

like B cells naive, plasma cells, T follicular helper cells, CD8 + T cells, resting memory CD4 + T cells, T cells, CD4 + memory T cells activated, memory B cells, naive CD4 + T cells, regulatory T cells (Tregs), and Gamma-delta (γδ) T cells.
On the other hand, DEGs associated with 12 innate immune cell type categories were NK cells resting, macrophages M2, monocytes, macrophages M1, macrophages M0, resting dendritic cells, eosinophils, dendritic cells activated, mast cells resting, NK cells activated, mast cells activated, and neutrophils were also found be enriched. The immune cell proportions of adaptive immune cells and innate immune cells are represented in Supplementary Material Figure 1.
The genes associated with four immune cells; NK cells activated, T follicular helper cells, dendritic cells resting, and eosinophils, were not significantly enriched in both groups. The proportion plot of the enriched immune cell types is represented in Figure 2A. We observed higher relative proportion of genes enriched for cell types like CD8 + T cells (P = 1.9 × 10 −8 ), NK cells resting (P = 6.3 × 10 −5 ), monocytes (P = 6.4 × 10 −6 ), and neutrophils (P = 1.6 × 10 −7 ) in the PTB samples compared to the control samples ( Figure 2B, Supplementary Material Figure 1). Among the 4 cell types with a higher relative proportion of enriched genes from DEGs, the genes of CD8 + T cells and NK resting cells were found to be downregulated in PTB when compared to the control samples. On other hand, monocytes and neutrophil-associated genes were highly active in PTB when compared to control samples.

Identification of DEGs From Gene Expression Profile
The standardized gene expression data of "PTB vs. controls" was used to identify the differentially expressed genes. The volcano plot representing the distribution of fold change and the significant p-value is given in Figure 3A. The PTB vs. control group analysis revealed 624 DEGs (FC |1.5|, adj p-value of 0.05), with 393 upregulated and 231 downregulated genes. The top 10 DEGs obtained from PTB vs controls are given in Table 1. The mean distribution of intensity of differentially expressed genes in PTB and control samples is represented in Figure 3B.

Functional Enrichment Analysis of DEGs
The differentially expressed genes enriched using g:Profiler with the statistical significance of q value ≤ 0.05, generated 309 ontologies of Biological Process (BP), 17 ontologies of Molecular Function (MF), 42 ontologies of Cellular Component (CC), and 85 terms in pathways (Supplementary Material Table 3). Overall, the enrichment analysis has shown the overlap with immune-related ontologies and pathways. We pooled immune-related pathways from enrichment terms to check how DEGs affect the immune system pathways. We observed the upregulation of pathways such as interferon signaling (q = 9.16 × 10 −27 ), cytokine signaling in the immune system (q = 2.34 × 10 −21 ), neutrophil degranulation (q = 3.13 × 10 −11 ), viral genome replication (q = 2.47 × 10 −7 ), and response to biotic stimulus, etc. (Supplementary Material Figure S1). On the other hand, pathways like T-cell antigen receptor  Figure S1). Overall, our functional enrichment analysis points to a major downregulation of T cell-related ontologies and pathways.

Mapping DEGs to Immune Cell Proportions in PTB
Here we investigated the genes overlapping between the CIBERSORT signature and DEGs. There are about 96 DEGs ( Figure 3C) contributing to different immune cell types (Supplementary Material Table 4). Interestingly, we found that 31.25% of DEGs were contributing to the immune cell type "CD8 + T cells." We also observed that all those genes contributing to the "CD8 + T cells" were downregulated in PTB samples, as shown in Figure 3C. The findings from the mapping of DEGs to immune cell proportions are consistent with functional enrichment analysis, where T cell-related pathways have shown major dysregulation.

Comparison of DEGs, Immune Cell Signatures With TB Meta-Analysis Signatures
The differentially expressed immune signatures in the sample of patients with TB were compared with the known signatures of TB (Supplementary Material Table 4). Here, we observed 39 (40.6%) differentially expressed immune signatures overlapping with TB meta-analysis gene signatures and 57 novel genes (59.3%) contributing to the immune cell proportion (Figure 3D). Further, semantic similarity (functional association) of these 57 novel genes with 39 overlapping with TB signatures was performed to identify the most predominant genes. The semantic similarity score of ≥0.5 among the gene pairs was considered as a highly significant score implying a stronger association. The semantic similarity of 45/57 genes has shown a stronger functional association with overlapping with TB signatures ( Figure 3E). Again, it is important to pinpoint that 20 out of 45 genes (44%) were contributing to the immune cell type "CD8+ T cells."

Druggability and Co-expression Analysis
Druggability analysis was performed on the 45 genes that had shown higher functional similarity to the known TB signature. We found that 21 druggable genes (Supplementary Material Figure S2) (46%) with an interaction score ≥ 0.03, were enriched against terms like an antibody, binder, inhibitor, antagonist, agonist, modulator, and activator. Of all the druggable genes, ITK and FCGR3B genes were observed to have the highest number of drug interactions (19 drugs) followed by PTGDR (13 drugs), TLR7 (12 drugs), and CD3E (10 drugs). The drug-target interaction network is represented in Figure 4A. Next, we checked the association of druggable targets to the immune cell types. Among the 21 druggable targets, 48% were contributing to the immune cell type "CD8 + T cells" followed by monocytes (9%), naïve B cells (9%), and neutrophils (9%) (Figure 4B). Interestingly, the expression patterns of these 21 druggable genes have shown a clear distinction in PTB when compared to control samples ( Figure 4C).
Interestingly, we noticed a cluster of 12 co-expressed genes ITK, CD2, CD6, ZAP70, CD247, CD3D, SH2D1A, CD27, CD3E, IL2RB, IL7R, and PTGDR. To validate the co-expression among the 12 downregulated genes we queried them in the STRING database with a high confidence score of ≥ 0.7. The STRING database identified strong interaction among the 11 genes except for PTGDR (Figure 5C). Co-expression and protein interaction network from the STRING database has shown the mutual influence of the 11 genes in the expression and functional activities. Nine genes (expect CD27, PTGDR, and IL2RB) were contributing to "CD8 + T cells." All the genes were predicted to be targeted by different drug molecules, most of which are  monoclonal antibodies (Table 3). Hence, the integrated analysis has depicted predominant deregulation of "CD8 + T cells" as the key genetic signatures for active PTB.

RT-PCR Validation of Druggable Genes
The real-time PCR gene expression results showed that the relative expression levels of 9 potential druggable genes (ITK, CD2, CD6, CD247, ZAP70, CD3D, SH2D1A, CD3E, and IL7R) were consistent with the findings of microarray hybridization. All the genes were differentially expressed between treated and untreated cell lines (p ≤ 0.01). These results confirm the dysregulated "CD8 + T cell signaling" plays important role in establishing TB infection (Supplementary Material Figure S3).

DISCUSSION
Host genetic factors are known to play an important role in regulating the initial TB infection and determining the disease progression in the lungs (37). Genome-wide association studies have underlined the relevance of numerous polymorphisms in immune response-related genes in contributing to susceptibility or resistance to TB (38). However, polymorphism studies were unable to provide full insight into the complex molecular crosstalk between thousands of host genes involved in innate and adaptive immune responses. In this context, high throughput transcriptome approaches have shown great promise in dissecting the host-pathogen interactions thereby helping to develop a novel vaccine and therapeutic targets for several infectious diseases (39)(40)(41). Therefore, we explored host immune system response through integrated systems biology approach based on immune cell subtyping and differential gene expression profiles of patients with PTB to normal controls. The cellular and molecular background of TB-induced systemic immunological dysregulation is poorly understood. Therefore, we screened the DEGs in PTB and deciphered their contribution to immune cell proportion alterations. Traditionally, host transcriptomics studies have relied on whole blood to characterize TB gene signatures by aggregating transcriptomic signals from many different cell types but were unable to identify specific immune cell type signatures (42). To overcome these constraints, we used a powerful computational technique called CIBERSORT to define the range of immune cell states in the blood of patients with TB. This method relies on linear support vector regression (SVR), a machine learning approach to deconvolute the gene expression signatures, known as "signature matrix" for determining the relative fraction of immune cell proportions in blood or tissues (15). The CIBERSORT method has been widely used to infer immune cell types from transcriptomics data to predict outcomes of different cancers (9,43,44) and infectious diseases (45)(46)(47). In this study, the CIBERSORT output identified the downregulation of "CD8 + T cells" in patients with PTB.
The GO functional enrichment analysis of gene expression profile revealed the upregulation of interferon signaling, cytokine signaling in the immune system, neutrophil degranulation, and response to biotic stimulus pathways. The MTB infection of primary human macrophages is shown to induce type I IFN signaling and limit the expression of IL-1β, which imparts immunity against the infection (48). Additionally, the downregulation of major pathways associated with T cells function like T-cell antigen receptor signaling, leukocyte differentiation, leukocyte activation, T cell activation, T cell differentiation, T cell receptor signaling, alpha-beta T cell activation, NF-kappa B signaling, and antigen receptor-mediated signaling pathway were noted in PTB samples.
Majority of the DEGs contributing to the immune cell type "CD8 + T cells" were clearly downregulated in the PTB samples indicating their potential roles in defense against TB. These cells are also known as killer or cytotoxic T lymphocytes, as they potentially destroy the infected cells by recruiting cytokines and other immune cells to the site of infection. The low abundance of blood CD8 + T lymphocytes may impair the effective immunity against pathogens, as they lack a sufficient cytotoxic T cells to recognize the MHC class I-restricted epitopes of MTB antigens, in the site of infection (49). A recent RNA transcriptome study used the positron emission tomography (PET) data collected from recovered patients with TB, at 4th and 24th weeks has also reported that genes associated with the overexpression of B cells and down expression of T cells and platelets confirms our findings (50). Thus, the downregulation that contribute to immune cell type is concordant with the pathway enrichment analysis findings of lower expression of T cell-related ontologies and pathways.
The druggability potential of any protein is attributed to its binding specificity with small compounds following Lipinski's rule-of-five for drug likeliness (51). Numerous bioinformatic and empirical methods which consume less time and provide faster prescreening of druggability of candidate proteins than conventional methods have been developed (52,53). There are a variety of computational methods available, which can predict druggability and protein-binding sites by using energy dynamics to geometrical topological estimations, and from flexible to rigid proteins (54,55).
By applying druggability and co-expression features we identified 9 CD8 + T cells associated genes (ITK, CD2, CD6, CD247, ZAP70, CD3D, SH2D1A, CD3E, and IL7R) as potential therapeutic targets of PTB. However, it is pivotal to carefully prioritize the drug molecules based on their mode of action, whether activator or inhibitor based on the gene expression status. For example, over-expressed genes can be targeted by inhibitory molecules, and downregulated genes can be targeted by activator molecules (56). From the above 9 genes, the therapeutic potential of ITK and IL7R has been characterized by experimental methods. ITK is a tyrosine kinase expressed on T-cells, which regulates its T-cell development and function. Human lungs with ITK deficiency impair early protection against MTB in vivo (57). Improving ITK signaling pathways could become an alternative approach for combating MTB infection. One study reveals the role of IL7R on T-cell immunity in human TB (58). The authors reported that patients with TB had lower IL7R concentrations and lower IL7R expression in T cells than healthy controls, indicating that patients with TB have impaired T-cell sensitivity. In addition, due to posttranscriptional processes, patients with TB had reduced amounts of IL7R in T cells. In vitro experiments revealed that MTBspecific T lymphocytes from patients with TB have reduced IL-7-induced STAT5 phosphorylation and IL-7-promoted cytokine production (59). The role of the remaining 7 genes (CD2, CD6, CD247, ZAP70, CD3D, SH2D1A, and CD3E) in T-cell signaling and modulation of host immune responses in mycobacterium infections is also supported (60)(61)(62)(63)(64).
Our results highlight the dysregulation of CD8 + T cells and the associated genes in PTB patients. These findings are exciting not just from the fact that CD8 + T cell-associated genes have the potential to act as potential therapeutic targets but prove that their role is not less important than CD4 + T cells in controlling MTB infection. We acknowledge that our strategy has some technical constraints. CIBERSORT was a convenient computational tool for determining infiltrating immune cell fractions, but it was still less precise than immunohistochemistry or flow cytometry, which could lead to inaccuracies in immune cell fractions. However, to overcome this limitation to some extent, we have linked gene expression profiles of immune signatures followed by functional enrichment, semantic similarity, druggability, and co-expression among the identified key signatures.

CONCLUSION
In this study, by coupling computational deconvolution algorithms and high throughput blood transcriptomics data, we identified the difference in T-cell-related immune cell populations among patients with PTB. The functional enrichment of 624 DEGs (393 over-expressed and 231 underexpressed) identified in the blood transcriptome of PTB patients revealed the major dysregulation of T cell-related ontologies and pathways. By linking DEGs against immune cell populations and TB gene signature, this study identified 9 CD8 + T cells associated genes (ITK, CD2, CD6, CD247, ZAP70, CD3D, SH2D1A, CD3E, and IL7R) as potential therapeutic targets of PTB. The expression levels of these 9 genes in MTB infection in cell lines were assessed by RT-PCR-based expression assay, confirming the experimental validation. However, further in vitro and in vivo studies are needed to establish the role of these genes in PTB infection, progression, and treatment.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: http://www.ncbi.nlm.nih.gov/geo; GSE83456.