Identification and Classification of Differentially Expressed Genes and Network Meta-Analysis Reveals Potential Molecular Signatures Associated With Tuberculosis

Tuberculosis (TB) is one of deadly transmissible disease that causes death worldwide; however, only 10% of people infected with Mycobacterium tuberculosis develop disease, indicating that host genetic factors may play key role in determining susceptibility to TB disease. In this way, the analysis of gene expression profiling of TB infected individuals can give us a snapshot of actively expressed genes and transcripts under various conditions. In the present study, we have analyzed microarray data set and compared the gene expression profiles of patients with different datasets of healthy control, latent infection, and active TB. We observed the transition of genes from normal condition to different stages of the TB and identified and annotated those genes/pathways/processes that have important roles in TB disease during its cyclic interventions in the human body. We identified 488 genes that were differentially expressed at various stages of TB and allocated to pathways and gene set enrichment analysis. These pathways as well as GSEA’s importance were evaluated according to the number of DEGs presents in both. In addition, we studied the gene regulatory networks that may help to further understand the molecular mechanism of immune response against the TB infection and provide us a new angle for future biomarker and therapeutic targets. In this study, we identified 26 leading hubs which are deeply rooted from top to bottom in the gene regulatory network and work as the backbone of the network. These leading hubs contains 31 key regulator genes, of which 14 genes were up-regulated and 17 genes were down-regulated. The proposed approach is based on gene-expression profiling, and network analysis approaches predict some unknown TB-associated genes, which can be considered (or can be tested) as reliable candidates for further (in vivo/in vitro) studies.


INTRODUCTION
Tuberculosis (TB) is a communicable disease generally caused by the bacterium Mycobacterium tuberculosis (MTB). The bacteria typically affects the lungs (pulmonary TB), but other body parts can be also affected (extra pulmonary TB) (Ahmad, 2011). The disease is communicable and spread through the air by expelling out the active MTB while coughing and sneezing (Schnappinger et al., 2003). In 2016, 10.4 million individuals were infected with TB, and 1.7 million died from the disease (including 0.4 million among people with HIV i.e., 40% of HIV deaths were due to TB). TB kills more adults in India than any other infectious disease (In 2016, an estimated 28 lakh cases occurred, and 4.5 lakh people died due to TB). India has the highest burden of both TB and advanced TB (like MDR TB) and second highest of HIV-associated TB. In India, the major challenge to curb the TB is poor primary healthcare system in rural areas due to deregulation of private health care leading to indiscriminate use of first-and second-line TB drugs, poverty, spreading HIV infection, and lack of administrative coordination among government functionary bodies. In our current study, we used meta-analysis of individual raw microarray data (GSE series) deposited in the GEO database which are obtained from various blood cell types (macrophages, monocytes, and CD4+) and cell lines (THP1) of individuals with different datasets (e.g., controls vs. TB disease, control vs. latent TB, latent TB vs. TB disease). We have performed gene-transition study of differentially expressed genes (DEGs) data and text mining between different stages of TB and classified the DEGs in stage-specific manner like normal to latent TB infections, normal to active TB, and latent infection to active TB. It is widely thought that, to know that function of a gene, it must be analyzed in the context of gene interaction network, because gene networks are commonly interpreted as encoding functional information in their connections. So, our study usually focused on the "guilt-by-association (GBA)" presumption which state that physically and functionally linked genes are possibly participating in the same biological pathways having comparable effects on the phenotypes . The concept of network theory is an imperative method to know the topological properties and the complex-system dynamics that correspond to their functional modules. The complex networks may be classified into four types of networks: (a) scale-free network, (b) small-world network, (c) random network, and (d) hierarchical network. For the biologist, hierarchical network has special interest because it incorporates the mien of modules, and distributed hubs (sparsely) regulate the network. So, we constructed the gene regulatory network and then analyzed its topological properties, because it helps to understand the structure of a network which facilitates in understanding the hidden mechanisms. Further, we identified important network modules which contain fundamental key regulators that have a fundamental importance.

MATERIAL AND METHOD Inclusion Criteria for Differentially Expressed Genes
A set of 11 microarray data sets were selected from the NCBI GEO repository database (http://www.ncbi.nlm.nih.gov/geo/). The datasets include GSE54992 (Cai et al., 2014), GSE52819 (Verway et al., 2013), GSE64335 (Bouttier et al., 2016), GSE11199 (Thuong et al., 2008), GSE98750 (Lavalett et al., 2017), GSE78233 (Cheng et al., 2017), GSE57736 (Guerra-Laso et al., 2015), GSE16250 (Reyes et al., 2015), GSE27882 (Liu et al., 2012), GSE57028 (Salamon et al., 2014), and GSE83456 (Blankley et al., 2016). The background data correction and data normalization were done by the robust multiarray average (RMA) in R affy and lumi packages to ensure unbiased and dysregulated gene expression data. The RMA method, which performs quantile normalization, was used to minimize inconsistencies due to normalization of the each Affymetrix GSE series. The RMA method was chosen over others due to its fine differential change detection and stable variance on a log scale and minimizes the false positive results. The specificity and sensitivity of RMA method are good during fold-change calculation to identify DEGs. Similarly, we have used lumi pipeline (Bioconductor package) which is exclusively developed to analyze Illumina data (BeadChip). It checks the data quality and data normalization and stabilizes the data variance. The gene expression data (GSE57736 and GSE27882) which is generated by Agilent platforms were already present as normalized data. In the present study, we used linear model for microarray analysis (LIMMA) package which is a highly recommended method to measure the differential expression of genes. It not only calculates simple t-test but also calculates moderate t-test and f-test by applying the Empirical Bayes approach and reducing the standard errors and gives us steady and reproducible outcomes even with a less quantity of arrays. So, limma package (Smyth, 2004) was used to identify the DEGs among the healthy control, latent TB, and active TB. We set the criteria to select significant differentially expressed genes i.e., the adjusted p-value was <0.05, and the fold change was >1.5. We have used the BRCW [Bioinformatics & Research Computing website (http://jura.wi.mit.edu/bioc/tools/compare.php)] to select the DEGs which is common in at least two datasets of gene expression profile. Doing this, we became more specific in the selection of DEGs and the chances of biased data compilation became negligible. According to the corresponding correlation between the probe and gene from the data, the probe numbers of the expression profile were changed into the corresponding gene symbols using the Synergizer database (Huang et al., 2009).

Gene Classification, Ontology, and Pathway Analysis of Degs
To know the significance of the identified DEGs, we have categorized them by GO-molecular function, GO-biological process, and protein class using the Protein ANalysis THrough Evolutionary Relationships (PANTHER v.13.0) Classification System and analysis tools and DAVID database (https://david.ncifcrf.gov/) to enrich the given set of DEGs to possible GO terms (Mi et al., 2013) (Huang et al., 2009). The PANTHER overrepresentation study (Fisher's exact with FDR multiple test correction) was used to search the data against the PANTHER, and GO databases and p-values were set according to Bonferroni correction.
The GO analysis (gene ontology) is the useful method for annotation of genes and its products and characterization of biological attributes for high-throughput genome or transcriptome data (Ashburner et al., 2000). The differentially expressed genes among "active-TB" and "latent TB" on compared with "normal condition" were over-represented in various GO classes. The gene ontology provides and visualizes us the basic terms subdivided into three important categories, namely, BP (Biological process), MF (molecular functions), and biological pathways among these groups.

Gene Transition Among Different Stages of TB
In order to get the behavior of normal gene expression perturbation, we tried a normal way for finding the associated genes while moving from one stage to another. In all the transitions that we took into consideration have been provided by a list of up-and down-regulated genes. These gene(s) in both the cases (i.e., stage from which is transferred to the targeted stage) have its own meaning. This meaning to a gene(s) regulation gives us an opportunity to say something about the ongoing mechanobiology inside the cell. So, to observe these transitions, we framed our study in such a way (based on the data we've got) which considers every possible transition in view. These transitions are discussed in brief as follows. In this study, we made comparison of the geneexpression profiles among individuals with normal conditions, latent infection, and active TB. Thus, we observed the expression of genes from normal to different stages of the TB and try to arrest those genes which play a key role in susceptibility to TB.
• Normal to Latent Infections: In this section, we have taken those DEGs which are involved in between normal and latent TB conditions. In LTBI, the bacteria remain in inactive form for many years (years to decades) before transforming into TB disease. In this study, we have studied several BPs and important pathways that lead to the latent infection for identification of latently infected individuals. • Normal to Active TB: In this section, we have taken those DEGs which are differentially expressed in TB disease condition as compared to healthy control and recognized those immune process and pathways which are prominent in TB disease. • Latent Infection to Active TB: In this section, we have taken those DEGs which are involved in between latent TB and active TB diseases. The individual with LTBI eventually reactivates and becomes infectious, seriously influencing epidemiological situation. Mechanisms of MTB transition to dormancy and TB reactivation are inadequately understood, and biomarkers of latency remain largely mysterious (Kondratieva et al., 2014).

Topological Characterization of Networks
The classified genes of various stages of TB were used to construct their regulatory networks. The networks were built by STRING database (https://string-db.org/) and then visualized in Cytoscape v3.4 (Shannon et al., 2003). As the network was built, the first and foremost basic analysis are its topological properties. Topological analysis helps to understand the structure of a network which facilitates in understanding the hidden mechanisms. The following networks properties were analyzed to seek the important behaviors of the network: • Degree distribution: The degree (k) of a node (gene) in a biological network is the number of links with other nodes, the probability distribution of these degrees called degree distribution (P[k]).
where n k =no. of nodes with degree k. N= the total number of nodes in the network.

• Neighborhood connectivity:
A node (gene) has number of neighbors and is considered as a connectivity of node. So, neighborhood connectivity (C N [K]) is: where P q k       = the conditional probability.
• Clustering co-efficient: The ratio of number of edges (e i ) between the node's neighbor or maximum numbers of edges that could possibly exist between the nodes. So, the total network cluster coefficient is the average of cluster coefficient of all nodes (i th ) in the network. • Betweenness centrality: In the complex network, a node's betweenness centrality (CB) represents the prominence of information flow through one node to another via shortest path (Brandes, 2001) (Mason and Verwoerd, 2007). From node (i) to node (j), the geodesic paths are shown by "dij(v)" passing via node "v" and "dij. " • Closeness centrality: In the network, how quick info is circulated from one node (i) to another (j) is measured by closeness centrality (C C ). Closeness centrality is given by: • Eigenvector centrality: In the network, eigenvector centrality of a node "i (C E [i])" is proportionate to the total of i's neighbor centralities (Bonacich, 1987).

Community Analysis: Leading Eigenvector Method
In hierarchical network, to distinguish the nature of modular and its properties is important to explain the activities of network at various levels of hierarchy. In our study, the leading eigenvector method (LEV) (Newman, 2006a;Newman, 2006b) was used to detect the communities in R from package "igraph" (Gabor and Nepusz, 2006) (the community detection script is given Supplementary Data 3).
• Modularity: Modularity determines to measure the strength of division of a network into clusters or communities (Newman and Girvan, 2004).
where m = Total no. of edges. A ij = adjacency matrix of size "i × j. " k = degrees. δ = function yields 1 if nodes "i" and "j" are in the same community.

Genes Tracing
To access the regulation of network, we first tried to find out the most influential nodes (genes) within the network. The gene tracing (up to motif level) was done purely on the appearance of the respective genes in various submodules obtained from the clustering. Then, these genes were used to get the picture of changes in the network organization in their absence.

Hub Gene Knockout
To know the changes of organization within the complex network in the absence of most influencing gene (node), we must remove constructed leading hubs (rich clubs) in the networks. We consecutively eliminated all the important hubs (one by one) from each network and measured the topological properties of this reorganized network to observe the regulating abilities of these hubs by calculating the degree of structural change due to their absence.

LCP-DP Approach to Estimate the Network Compactness
The LCP-decomposition-plot (LCP-DP) provides a way to characterize the topological properties of the network in 2D space of common neighbors (CN) index of connecting nodes and local community links (LCL) of each pair of interacting nodes in the network (Cannistraci et al., 2013). The LCP correlation defined by:
(Note: MATLAB script is given in Supplementary Data 2).

Hamiltonian Energy Estimation: Energy Distribution in the Network
The HE is used to organize a network at a certain level by following the formalism of constant Potts model (Traag et al., 2011;Traag, 2013). The energy distribution at the global and modular levels of the network is given by HE. HE of a network can be measured by: where e c = no. of edges. n c = no. nodes. "c" and "γ" = the resolution parameter

Differentially Expressed Genes (DEGs) Classification and Overrepresentation Analysis
A total of 5, 680 differentially expressed genes (DEGs) were identified after the extensive analysis of all the 11 GSE series, of which 2,660 were up-regulated, and 3,020 were down-regulated genes. These differentially expressed genes were clustered according to "GO-MF (molecular function), " "GO-BP (biological process), " and "PANTHER protein class" shown in Figure 1. All these predictive DEGs showed a broad range of protein classes which involved in various processes. The helicases and nucleases were found within the "nucleic acid binding" protein class. The "enzyme modulator" category features kinase, G-protein, phosphatase, and protease modulators. The structural motif and nuclear hormone receptors are part of the "transcription factor" protein class. The "hydrolases" is a sub-category of proteases and phosphatases. The "receptor" protein class includes cytokine receptors, protein-kinase receptors, ligand-gated ion channels, nuclear-hormone receptors, and G-protein-coupled receptors. Besides of these above protein classes, signaling molecules, transferase, oxidoreductase, and November 2019 | Volume 10 | Article 932 Frontiers in Genetics | www.frontiersin.org transporter are also most abundant protein classes. The two most abundant GO-biological process groups-"metabolic process" and "cellular process" which are not astonishing as these processes carry those genes which involved in the most basic life processes. The "cellular process" includes cell cycle, cell-cell signaling, cell component movement, cytokinesis, and proliferation. The "metabolic process" section includes metabolism of lipid, protein, carbohydrate, cellular amino acid, and nucleobase-containing compound metabolism. The "biological regulation" includes metabolism, cell cycle, the regulation of apoptosis, catalytic activity, translation, and homeostasis. Further, these biological process responses to stimulus, localization, and developmental process also represent the significant number of proteins; the complete categories details are given in Supplementary Data 1.
To know the probability of largely occupied protein classes and GO categories among the differentially expressed genes, we used PANTHER's overrepresentation analysis. When we compared with the reference genome, we found that any of the richest classes are overrepresented in the data ( Table 1). The most abundant protein classes "chemokine, " "cytokine, " "hydrolase, " "ribosomal protein, " and "RNA-binding protein" were enriched along with the classes "signaling molecule" and "cell adhesion molecule. " The highly populated GO-biological processes were enriched in "cytokinemediated signaling pathway, " "response to external stimulus, " "immune response, " "locomotion, " "signal transduction, " "cell communication, " "developmental process, " "cellular process, " and "MAPK cascade. " Similarly, the most abundant GO-molecular functions were enriched in "chemokine activity, " "cytokine activity, " "cytokine receptor binding, " "oxidoreductase activity, " and "receptor binding. "

Gene-Transition and GO Enrichment Analysis
All the 5,680 differentially expressed genes (DEGs) were divided into three different groups, normal vs. latent infection, normal vs. active TB, and latent infection vs. active TB. We filtered out a total number of 488 DEGs which are listed in (Table 2), and then, we also identified few genes which are commonly differentially expressed among various stages of TB (including NC vs. LI, NC vs. ATB, and LI vs. ATB) shown in Figure 2. Further, we isolated up-and downregulated genes from each stage and performed GO-enrichment analysis using DAVID tool (v6.7). It is well preferred annotation database which used novel algorithms to extract biological information from large gene lists. The DEGs are enriched with a certain biological process or molecular function that are given in Table 3. Further, the most enriched pathways associated with these differentially expressed genes were identified, given in Table 4  are enriched in "immune response, " "inflammatory response, " "chemokine activity, " "cellular process, " "biological regulation, " "regulation of metabolic process, " "protein binding, " "catalytic activity, " "bindings" etc. On the pathway analysis, most of the DEGs were found to be down-regulated (beneficial to pathogen) and enriched in very important pathways like toll-like receptors, NF-kappaB signaling, cytokine-cytokine receptor interaction, MAPK signaling pathway, TB, and TNF signaling. • Normal to Active TB diseases: A total of 266 differentially expressed genes (DEGs) were identified, of which 149 were up-regulated, and 117 were down-regulated among the various cases. In this condition, the DEGs were enriched in "inflammatory response, " "immune response, " "signal transduction, " "response to stimulus, " "apoptotic process, " "cellular process, " "metabolic process, " "binding, " "catalytic activity, " "receptor activity" etc. For the up-regulated genes, the most abundant pathways were "cytokine-cytokine receptor interaction, " "chemokine signaling, " "toll-like receptor signaling pathway, " "NF-kappa B signaling pathway, " "transcriptional misregulation in cancer, " and "pathways in cancer. " Similarly, for down-regulated genes, the most enriched pathways were "cell cycle, " "chemokine signaling pathway, " "NF-kappa B signaling pathway, " "TNF    signaling pathway, " "PI3K-Akt signaling pathway, " "metabolic pathways, " and "MAPK signaling pathway. " • Latent infection to Active TB Disease:: A total of 127 differentially expressed genes (DEGs) were found to be up-regulated, and 69 were down-regulated. These differentially expressed genes were enriched in "binding, " "catalytic activity, " "receptor activity, " "transporter activity, " "cellular and metabolic process, " "biological regulation, " "immune response, " and "oxidoreductase activity" including others. The most abundant enriched pathway for up-regulated genes belonged to "metabolic pathway, " "NOD-like receptor signaling pathway, " "regulation of actin cytoskeleton, " "biosynthesis of antibiotics, " "metabolism of xenobiotics by cytochrome p450, " "thyroid hormone synthesis, " "regulation of autophagy, " and "peroxisome, " and for down-regulated genes, the enriched pathways were "chemokine signaling pathway, " "TNF signaling pathway, " "RAP1 signaling pathway, " "PI3K_AKT signaling, " "metabolic process, " "apoptotic, " "focal adhesion, " "MAPK signaling pathway, " "microRNAs in cancer, " and "amoebiases. "

Gene Interaction: Hierarchical Scale-Free Network
The classified genes of various stages of TB ( Table 2) were used to construct their regulatory network. We constructed six networks for Up and Down-regulated genes separately. The topological parameters of the networks obey power law distributions. The probability of clustering co-efficient C(k), degree distributions P(k), and neighborhood connectivity C N (k) exhibits fractal nature as shown in Figure 3. The results for all the networks are summarized as follows: Normal to latent Inf N ormal to active TB .   (Ravasz and Barabási, 2003). The power law fits on the data points of the network's topological parameters were done and confirmed by following the standard statistical fitting method given by Clauset et al. (Clauset et al., 2009). The p values for all data sets were calculated (against 2,500 random samplings) and found to be greater than 0.1, and data fitting goodness was less than 0.1. The values of P(k) and C(k) were negative, which implies that the network follows a hierarchical pattern, and C N (k) was positive, which means that the network follows the assortativity that identifies a huge cluster of degree-nodes (rich club), which regulates the network. The centrality parameters: betweenness (C B ), closeness (C C ), and eigenvector (C E ) centralities of the network also showed fractal behavior, and good connectivity of nodes in a network is distinguished by eigenvector or centrality C E (k). It calculates the effectiveness of the spreading (receiving) power of data of nodes from the network. These properties follow the power law behaviors as follows: -Normal to latent Inf N ormal toactive TB .

Identification of Key Regulators and Properties
Since the popularity of leading hubs gets changed according to the gene activities and its regulation, we can't say that all the predictive leading hubs are key regulators for disease progression, but some of these hubs can play a significant role, which we called them as fundamental key regulators (FKR). The structure of modular and its arrangement have been performed through Newman and Girvan's standard community finding algorithm at various levels of the organization. Using this community finding algorithm, we found that our six networks are hierarchically organized at various levels (Supplementary Data 2). The Hamiltonian energy (HE) and corresponding modularity (Q N ) are reduced as one goes from top to down organization of network (Figure 4).
We have calculated the probability to know the regulating ability of our 31 key regulators: The measured probability P x (y l ) of all the KR showed an increase in P x as the level the increases from top to bottom direction. At deeper level of the organization, regulation of FKR increases, and their activities become more prominent. Thus, these FKR becomes backbone of the network organization, stabilization, and active workers at grassroot level. The FKR are shown in Figures 5 and 6.
As the results, we identified a total of 31 FKR genes from various stages of TB. In normal to latent infections, we identified eight genes, of which three genes (FZD2, NDUFS8, NLRC4) were up-regulated, and another five genes (CCL4, IL1B, IL1A, TNF, AREG) were down-regulated. While in normal to active TB, we found 15 genes out of which eight genes (EIF2AK2, SAMD9L, IFI44L, DDX58, IFI44, HERC5, NFE2L3, LRRK2) were up-regulated, and seven genes (CSF3, MAP3K8, IRAK2, TNFAIP3, TRAF1, PLAUR, and CD44) were down-regulated, and similarly, for latent to active TB diseases, three genes (TST, MTHFD1, CARS2) were up-regulated, and five genes (ETV6, NFKB1, MET, DOCK4, PARD6G) were down-regulated. The functional interpretations of these 31 key regulators were annotated by GO-TermFinder (LAGO) (Boyle et al., 2004) using biological process terms at P-value 0.01; the results are interpreted in Figure 7. The gene specific pathways were identified by KEGG pathway database (Kanehisa et al., 2016), and the results are shown in Figure 8.

Local Perturbations of Network
The gene knockout experiment of all the hubs/motif from the parent networks may able to highlight the local perturbations driven by these individual hub or motif, and their effect on global network properties. It has been revealed that the network is tolerant to hub's deletion which implies that the important network elements are still remains after elimination of hubs at level 0 (parent network). One of main causes is that the P-P interactions networks are too dense to be broken into fragments by only removing hubs. However, the elimination of these hubs/motif from the complete network does cause significant variations in the network properties, where P(k) and C(k) change significantly in complete network level, whereas C N (k) changes slightly. Likewise, the variations in the exponents of centrality measurements have also showed significant changes, as shown in Figure 9. Since, it is clear from the differences in the exponents of topological parameters that network perturbation increases when it goes to deeper level (top to down direction). In our case, most of the perturbation increases after the 3rd level; at this level elimination of key regulator from network almost breaks down the submodules existing in the deeper levels; such type of behavior shows that local perturbation of network is highest at deeper levels.
The Local-Community-Paradigm: Evidence of Self-Organization The LCP architecture supports the quick transfer of data across several network modules and through local processing too. We have analyzed all the six networks to check self-organization behavior at different levels using LCP method. For different level, the calculated LCP-corr of all the modules or sub-modules are shown in Figures 10 and 11. The average values of LCPcorr (we ignored modules which having zero LCP-corr) were greater than 0.853 at each level. This shows that the network maintains compactness and self-organization and has efficient data processing. It characterizes robust LCP networks that are dynamic in nature and heterogeneous, which help in network re-organization and evolution.

DISCUSSION
TB disease remains to be one of the most significant infectious diseases and a leading cause of death worldwide. Currently, the main challenge is to develop a delicate and effective method to identify the latent TB infection (LTBI) because 90% of cases of latent infection with MTB do not show any symptoms/signs but have a 10% lifetime possibility of transforming into active TB. As we know, many events have happened in response to TB infection cycle. According to the cycle of infection by Young et al., 2008, the exposure of microorganism up to its development to latent stage, there are stringent amount of processes that have appeared in the host cell. The long fight of MTB to establish its generation by stabilizing its environment for metabolism has evolved with an ability to overcome immunity. In this scenario of transferring from initial infection stage to latent stage has a profound increase in the overall immune response outside the host cell and an increased intensity of basic cellular machine establishment process for the pathogen cell. In case, the elimination eliminates the eliminator even after the induction of T-cell but able to suppress the effect; pathogen cells (Pcells) have two choices, either develop army or attack the adaption of immunity. In order to decide to enter the resting phase (i.e., latent), MTB's survival instincts allow it to develop its metabolic environment by downregulating antigen presentation and the release of cytokines too (like IFNγ). On the other hand, if adaptive immunity has taken its chance, then MTB takes the defensive mode by subverting various normal cell cycle functioning inside the macrophages thus making a halt on the maturation of phagosomes.
To stop the disease epidemic, early diagnostic method or techniques are required. In this way, the gene expression profiling has uncovered the differences in the transcriptome among normal condition, latent infection, and active TB. These results not only revealed significant genetic biomarkers indicative of LTBI, TB-disease conditions but also recognized transcriptionally regulated genes that vary in biological functions. In the current study, we have identified differentially expressed genes (DEGs) from various stages of disease and found that most of the DEGs relative to normal vs. TB disease condition (including latent infection) are enriched in important FIGURE 7 | This heatmap shows the 31 hub genes with their involvement in various biological processes.
FIGURE 8 | The molecular pathways associated with our key regulators were identified by KEGG pathway database. Out of the 31 key regulators, we did not find any pathways information of five genes namely: SAMD9L, IFI44L, IFI44, HERC5, and NF2L3. pathway like-toll like receptor, NOD-like receptor signaling pathway, MAPK signaling pathway, TNF signaling, chemokine signaling pathway, PI3K-Akt signaling pathway, apoptosis etc. In fact, pathogen is identified by the receptors on the surface of immune cells, and toll-like receptors (TLRs) are one of them. Different TLRs including TLR2, 4, 9, and 8 play an important roles in TB infection (Faridgohar and Nikoueinejad, 2017). These receptors (toll-like, NOD-like etc.) are expressed irrespective of whether they participate in immune signaling or immunity. The interaction of MTB with these receptors (like TLRs) initiates an intercellular signaling cascade that culminates in a proinflammatory response (cytokines and chemokines that serve as a signal for infection). As we know that MAPK signaling pathway is important for transcription and non-transcription responses of the immune system, most of the pathogens during infection hijack the immune system by targeting the MAPK signaling pathway (Soares-Silva et al., 2016). The TNF signaling pathway is involved in the regulation of immune cells, and it can regulate immune responses (Kim, 2018). However, pathogen can target the TNF signaling pathway for its survival in the host cell. The PI3K-Akt signaling pathway plays an important role in apoptosis, autophagy, metabolism, cell growth, and differentiation. The expression of FoxP3 by inhibiting the activation of transcription factor Forkhead-O3a (Foxo1-3a) is negatively regulated by this pathway (Zhang et al., 2017). The FoxP3+Treg cell activation which will assist to set up a new target for the involvement of TB immunotherapy molecules as part of the immune-escape mechanism to provide a theoretical basis is inhibited by M. tuberculosis (Scott-Browne et al., 2007). On comparing between latent infection and TB disease, few additional pathways are involved, like regulation of actin cytoskeleton during MTB infection; to maintain the stability of the cytoskeleton, macrophages cells themselves are also trying to regulate cytoskeletal-associated proteins (Wang et al., 2015b). Thyroid hormones (hormones, T4 and T3) are produced by thyroid gland, which is essential for the regulation of metabolic processes throughout the body. The regulation of autophagy is important for host in response to invading mycobacteria; the host defense mechanism identifies pathogen motifs through innate receptors but also releases appropriate cytokines. The autophagic pathways are regulated by these innate signals by regulation of genes of this pathway during infection (Jo, 2013). Recently, a study reports that the enzyme Msm_ACTase (from bacteria) helps in scavenging increased amount of H 2 O 2 due to upregulation of genes involved in peroxisome pathway which gives an insight into a new idea as to how the MTB bacteria surpasses the host defense in MTB infection (Ganguli, 2015). Moreover, macrophages are the main effector cells responsible for killing pathogen (MTB) via different mechanisms, including apoptosis. But important genes in this pathway were down-regulated by MTB that slow down or stop the apoptotic process for its survival in host system (Behler et al., 2015). Most of the microRNAs that were found deregulated in cancer were affected by MTB in a similar way as its infestations, like deletion, mutation, and epigenetic silencing (Garzon et al., 2009). The apoptotic pathways are important to kill the infected macrophage. During MTB infection, the pathogen try to control the timing and mode of host cell death for its persistence and replication (Schaaf et al., 2017) In addition, we have built the networks which emphasize on genes that were regulated by network. The constructed network FIGURE 9 | The changes in the exponents of the three important topological parameters (P[k]), C(k), and C N [k]) due to hub genes knock-out experiment from parent network.
of classified genes from various stages of TB showed hierarchical nature, which indicated that the networks have system level organization including modules/sub-modules which are interconnected. Since the network's nature is hierarchical, its synchronization confirms several important functional regulations of the network, but individual gene activities are not so important. In our networks (including up-and downregulated genes), a total of 31 key regulators were identified by affecting motifs and module regulation, showing their biological importance and serve as the foundation of network activities and their regulations, and could be the most probable target gene for disease control. We performed extensive analysis of all the key regulators (31 genes) to find evidence of their association with TB or which are directly or indirectly involved in host immune response and confirmed by manually searching for evidence in the literatures as presented: • Down-regulated genes: We found that TNF and IL1 (A and B) are key mediators present in severe inflammatory diseases. However, both IL-1 and TNF receptor pathways are important for the control of MTB infection, and it is critical to assess the respective role of IL1A, IL1B, and TNF (Bourigault et al., 2013) (Cavalcanti et al., 2012). The expression of CCL4 is high in the late phase of the active TB disease, but there are low levels of expression during early infection (Rangel-Santiago et al., 2016). It has been currently reported that AREG plays an important role in orchestrating both host resistance and tolerance mechanisms. Although AREG is known as epithelial cell-derived factor, the recent studies showed that AREG can be expressed by multiple populations of activated immune cells in inflammatory conditions (Zaiss et al., 2015). The NFKB1 activates the immune cells by up-regulating the expression of various cytokines in the host immune system (Ghosh et al., 1998). The gene CD44 plays an important role in innate and adaptive immune responses, in the acute inflammatory response to both infectious and sterile stimuli. Also during infection, CD44 may influence host defense by affecting phagocytosis (van der Windt et al., 2010). In our present study, we found that NFKB1 and CD44 are down-regulated which means that MTB is burglarized and seizing the host immune system by slowing down the expression of NFKB1. It is well known that MET (hepatocyte growth factor receptor) regulates several functions of immune cells, including cytokine production, differentiation and maturation, cellular migration and adhesion, and T-cell effector function, and HGF exerts antiinflammatory activities through MET signaling (Sagi and Hieronymus, 2018). A group of researchers has elucidated that PLAUR domain containing Lypd8 that inhibits bacterial invasion of colonic epithelia (Okumura et al., 2016). The gene TRFA1 has an active or passive interaction with many tumor necrosis factor receptor (TNFR), and it has been also shown that TRAF1 plays as an antiapoptotic role in lymphoma cells by activation of NFKB (Wan et al., 2016). Further, the gene TNFAIP3 (A20) is a cytoplasmic zinc-finger protein which works as a negative-feedback regulator of NFKB activation (Vereecke et al., 2011). It has been shown that IRAK2 is an important factor and potential novel biomarkers for human antiviral innate immunity (Wang et al., 2015a). Also, MAP3K8 is a serine-threonine kinase and has a critical function in integrating host immune responses to complex pathogens (Mielke et al., 2009). Similarly, S. Srijata, et al. has suggested that AgNP attenuate the NF-κB pathway as indicated by the downregulation of NF-κB target genes CSF3 and subsequently inhibit MTB-induced proinflammatory responses (Sarkar et al., 2015). The downregulation of these genes has allowed the process of host defense to cease and give time to bacteria for progression. • Up-regulated genes: We found that NLRC4 gene is associated with inflammasome signaling. Its upregulation means inflammasome activation (play an important role in host defense against MTB) not only leads to cytokine secretion but may also cause pyroptosis (Ting et al., 2008) (Bergsbaken et al., 2009). The NDUFS8 genes also play an important role in host immunity by increasing the expression level (Lohman et al., 2017). The gene FZD2 (frizzled class receptor 2) is Wnt pathway component (Wnt signaling), and Wnt signaling pathway plays a key role at different stages of TB development (Villaseñor et al., 2017). It has been reported that LRRK2 is involved in the IFN-γ response and host response to pathogens (Gardet et al., 2010), and LRRK2 also inhibits the immune response transcription factor NFAT1 (Liu et al., 2011). The genes DDX58 interact with IRGM and promote its K63-linked polyubiquitination, indicating that IRGM is positioned at a nexus of various innate immunity (Chauhan et al., 2016) while SAMD9L and EIF2AK2 play key roles in the innate immune responses to multiple stimuli. The gene EIF2AK2 is also involved in the regulation of signal transduction, apoptosis, cell proliferation, and differentiation (Lemos de Matos et al., 2013) (Onomoto et al., 2012) (Reineke and Lloyd, 2015). It is well known that type-I IFN induces potent cellular defense against viral infection mediated by up-regulation of ISGs like IFI44, IFI44L, and HERC5 (Jeon et al., 2010;Power et al., 2015). In our study, the expression of IFI44 and HERC5 were found to be up-regulated that means host system tried to resist bacterial infection. We didn't find any literature evidence that supports the role of four genes (NFE2L3, TST, MTHFD1, and CARS2) and three genes (ETV6, DOCK4, and PARD6G) which are down-regulated and up-regulated respectively in host immune response against the pathogen, but most of these genes are involved in various types of cancers like ETV6 is involved in prostate cancer as well as colorectal cancer susceptibility (Fararjeh and Liu, 2019;Wang et al., 2016); mutation in DOCK4 can cause ovarian, prostate, glioma, and colorectal cancers (Sundaravel et al., 2015), and PARD6G suppresses cell proliferation and is targeted by lossof-function mutations in several cancers (Marques et al., 2016). Besides of these NFE2L3 (transcription factor) which involved in several cellular processes like inflammation, stress response, differentiation, and carcinogenesis (Kannan et al., 2015), MTHFD1 are key players in folate metabolism, which is essential for de novo purine synthesis, and several defects in this pathway have been associated with immunodeficiency (Keller et al., 2013), and CARS2 is associated with a severe progressive myoclonic epilepsy most resembling MERRF syndrome (Hallmann et al., 2014). The TST gene is a mitochondrial matrix enzyme (Pallini et al., 1991). It may play roles in cyanide detoxification, the formation of iron-sulfur proteins, and the modification of sulfur-containing enzymes.

CONCLUSION
In this study, we have compared gene-expression profiling of active TB patient and latent TB patients, as well as healthy controls. We have combined different data sets from multiple microarray experiment, and this combined effect size method for the gene expression analysis gave us more biologically consistent and conservative results. We have identified differentially expressed genes and their associated particular biological processes and pathways among healthy controls, active TB and LTBI. The gene interaction networks employed to obtain functional modules and also uncovered novel key regulators for active TB and LTBI which might play an important role in regulating the expression of multiple TB-related host factors. In conclusion, our study provides new insights into host genes and pathways important for TB infection. Experimental validation of these hypotheses would confirm the credibility of the key regulators and make easier the development of a cost-effective and sensitive molecular diagnostic platform for TB disease.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://www.ncbi.nlm.nih.gov/gds AUTHOR CONTRIBUTIONS RI conceived the study design instructed on data analysis. AA curated data and performed statistical and network analyses. NI, MM and AF implemented key statistical tools. AA, NI, MA, ST, AF, and NT drafted the manuscript. All the authors read, edited and approved of the manuscript.