Protein and Microbial Biomarkers in Sputum Discern Acute and Latent Tuberculosis in Investigation of Pastoral Ethiopian Cohort

Differential diagnosis of tuberculosis (TB) and latent TB infection (LTBI) remains a public health priority in high TB burden countries. Pulmonary TB is diagnosed by sputum smear microscopy, chest X-rays, and PCR tests for distinct Mycobacterium tuberculosis (Mtb) genes. Clinical tests to diagnose LTBI rely on immune cell stimulation in blood plasma with TB-specific antigens followed by measurements of interferon-γ concentrations. The latter is an important cytokine for cellular immune responses against Mtb in infected lung tissues. Sputum smear microscopy and chest X-rays are not sufficiently sensitive while both PCR and interferon-γ release assays are expensive. Alternative biomarkers for the development of diagnostic tests to discern TB disease states are desirable. This study’s objective was to discover sputum diagnostic biomarker candidates from the analysis of samples from 161 human subjects including TB patients, individuals with LTBI, and negative community controls (NCC) from the province South Omo, a pastoral region in Ethiopia. We analyzed 16S rRNA gene-based bacterial taxonomies and proteomic profiles. The sputum microbiota did not reveal statistically significant differences in α-diversity comparing the cohorts. The genus Mycobacterium, representing Mtb, was only identified for the TB group which also featured reduced abundance of the genus Rothia in comparison with the LTBI and NCC groups. Rothia is a respiratory tract commensal and may be sensitive to the inflammatory milieu generated by infection with Mtb. Proteomic data supported innate immune responses against the pathogen in subjects with pulmonary TB. Ferritin, an iron storage protein released by damaged host cells, was markedly increased in abundance in TB sputum compared to the LTBI and NCC groups, along with the α-1-acid glycoproteins ORM1 and ORM2. These proteins are acute phase reactants and inhibit excessive neutrophil activation.
Proteomic data highlight the effector roles of neutrophils in the anti-Mtb response which was not observed for LTBI cases. Less abundant in the sputum of the LTBI group, compared to the NCC group, were two immunomodulatory proteins, mitochondrial TSPO and the extracellular ribonuclease T2. If validated, these proteins are of interest as new biomarkers for diagnosis of LTBI.

INTRODUCTION
Diagnosis of active tuberculosis (TB) and latent TB infection (LTBI) is important to control the spread of Mycobacterium tuberculosis (Mtb), a persistent and increasingly multi-antibiotic drug resistant (MDR) pathogen. TB remains an urgent public health issue in 30 high disease burden countries (World Health Organization, 2015). Eight to 12 million Mtb infection cases per year advance to stages with clinical symptoms; 90% of all cases do not cause symptoms (Raviglione et al., 1995). The primary manifestation is pulmonary TB (PTB). Extrapulmonary TB affecting other organs is less prevalent (Smith, 2003). The severity of TB is influenced by intrinsic (genetic) factors (Bellamy et al., 1998;Bellamy et al., 1999;Soborg et al., 2003;Fernando and Britton, 2006) and extrinsic (environmental) factors, such as nutrition and the immunological status of the host. Helminth infections and, particularly, HIV/AIDS are comorbidities prevalent in African countries that increase the risk of progression to symptomatic TB (Smith, 2003;Potian et al., 2011). The roles of distinct oral and alveolar niche microbiomes as modulators of disease outcome are less well understood (Wu et al., 2013;Adami and Cervantes, 2015). Pertaining to PTB, transmission occurs through inhalation of aerosolized bacteria in droplets.
Certain Mtb strains are more transmissible than others (Dye and Williams, 2010), but the reasons for differences in the severity of patient outcomes appear to be linked to their immune responses. Mtb invades and manipulates the function of alveolar macrophages, resulting in the formation of granulomas that consist of differentiated epithelioid and multinucleated macrophages, dendritic cells, CD4 and CD8 T-cells as well as B-cells at the infection sites (Tufariello et al., 2003). Key cytokines in the immune response against Mtb in the lungs are interferon-γ (IFN-γ) and tumor necrosis factor-α (TNF-α). While IFN-γ is mostly produced by T-cells, TNF-α is released by T-cells and macrophages (Tufariello et al., 2003). Neutrophils are recruited to the infection site and contribute to local inflammation and pathogen containment. In the chronic phase of TB, the balance between Th1 and Th17 responses appears to control bacterial growth and limits the immunopathology. The Th1/Th17 balance was demonstrated in murine TB models while it has not yet been verified for the adaptive immune response to TB in human patients (Torrado and Cooper, 2010). Mycobacterial cells largely reside in the macrophages of granulomas. The latter act as barriers to bacterial spread to other regions of the lungs. Individuals with delayed-type hypersensitivity responses to TB antigens such as ESAT-6 and CFP10, lacking symptoms, are deemed to be latently infected. While not contagious (Tufariello et al., 2003), they are susceptible to disease activation at a later timepoint. An IFN-γ release assay (IGRA) that relies on the stimulation of immune cells in blood plasma with TB antigens followed by the quantitative measurement of secreted IFN-γ is currently the gold standard of LTBI diagnosis (Bastian et al., 2017).
Using cytokine antibody arrays, we surveyed the blood plasma of TB, LTBI, and negative community control (NCC) subjects from the same cohort under study here, and identified RANTES as a potential plasma biomarker of LTBI. The abundance of this chemokine differed with statistical significance (LTBI vs TB), with and without prior TB antigen stimulation (Teklu et al., 2018a).
Sputum is a great source of TB biomarkers due to the low health risk associated with sampling and the proximity to the pulmonary infection site. Widely used, but slow or insufficiently sensitive, diagnostic TB tests are acid-fast bacilli (AFB) smear microscopy and the isolation of Mtb strains from sputum culture (Ryu, 2015). The PCR-based test Xpert MTB/RIF assay (Cepheid, USA) has higher sensitivity for Mtb detection in smear-positive than in smear-negative patients and also detects rifampicin resistance, allowing assessment of the antibiotic treatment options (Ryu, 2015). Sputum is also a source of biomarkers that measure inflammation of the lungs in association with other diseases, such as cystic fibrosis (Sagel et al., 2007). Such biomarkers may not be specific for a single respiratory pathogen, irritant, or intrinsic factor. Quantitative changes of metabolites and proteins have been evaluated in clinical samples for TB diagnosis, the latency state, disease progression, and anti-TB treatment responses. Miranda et al. detected sustained levels of ferritin and C-reactive protein (CRP) in patients who remained Mtb culture-positive during antibiotic treatments in a Brazilian cohort. These biomarkers indicated persistence of lung inflammation as compared to those who proceeded to a culture-negative state (Miranda et al., 2017). Gopal et al. identified calcium-mobilizing calprotectins as mediators of neutrophil-linked inflammation in granuloma-positive PTB patients and suggested that silencing their activities may attenuate patient symptoms (Gopal et al., 2013). Goletti et al. highlighted the need to develop predictive biomarkers as surrogate endpoints in clinical trials for new investigational anti-TB drugs (Goletti et al., 2016). A neutrophil-driven blood transcriptional signature induced by IFN-γ was identified for active TB (Berry et al., 2010), consistent with neutrophil infiltration of the patients' infected lungs. Jiang et al.
demonstrated reduced expression of CD27 in TB antigen-targeting CD4 T cells during persistent infection and commented on the role of this T-cell surface biomarker for diagnosis of chronic TB (Jiang et al., 2010). Of particular interest are biomarkers that discern LTBI from healthy controls since there is a risk of pathogen reactivation. One study revealed proteomic fingerprints in TB antigen-stimulated plasma of patients and described greater than 82% specificity and 89% sensitivity distinguishing TB from LTBI (Sandhu et al., 2012). A proteomic approach was reported to discern LTBI from healthy controls at greater than 85% specificity and sensitivity using nonstimulated plasma samples (Zhang et al., 2014). Protein biomarkers suggesting protective effects of vaccines are of high clinical interest.  to be involved in protective immunity against a virulent Mtb strain using a mouse model of lung infection. Furthermore, a role of IL-17 in driving murine Th1 cell responses upon BCG vaccination was suggested (Gopal et al., 2012;Gopal et al., 2014).
Research on the impact of oral and respiratory tract microbiomes on human health is in an exploratory phase, and this impact is generally difficult to unravel due to the complexity of microbiome-host relationships (Dewhirst et al., 2010). Distinct respiratory tract-colonizing bacterial taxa may correlate with, cause susceptibility to, or mediate protection from lung diseases including TB. In the context of asthma in infants, such a causal relationship was indeed identified (Teo et al., 2015). Studies of the oral and respiratory tract microbiomes for TB patients have been limited to date, as reviewed by Hong et al. (2016) and Naidoo et al. (2019). Here, our objectives were to investigate an interesting multi-ethnic cohort of Ethiopian pastoralists (such groups are generally neglected in basic research studies due to both geographic remoteness and poverty), to identify quantitative differences in the proteomes and microbiomes of sputum using high-dimensional analysis methods for three cohorts (TB, LTBI, NCC), and to gain additional insights into the host-Mtb-microbiome crosstalk. The analyses could result in novel biomarker candidates for TB and LTBI.

Human Subjects and TB Diagnostic Approaches
Human subject recruitment occurred in a multi-ethnic southern Ethiopian region (South Omo province). Covering eight districts, a study named "Systems Biology for Molecular Analysis of Tuberculosis in Ethiopia" resulted in informed consent of ~2,000 individuals aged 16 years or older at local or regional clinics. Parental consent was obtained for children ranging in age from 12 to 15 years. Some individuals had symptoms consistent with TB. Negative community controls (NCC) often were household contacts of those suspected to be infected with TB. Nearly 1,200 sputum and blood samples were collected. Of those, nearly 13% were positive based on clinical signs and symptoms, AFB smear microscopy, and/or isolation of Mycobacterium tuberculosis complex (MTBC) strains on Löwenstein-Jensen medium (Wondale et al., 2018). Nearly 500 participants were tested for LTBI employing IFN-γ assays; 50.5% of them were positive for LTBI (Teklu et al., 2018b). Subjects positive for PTB were offered DOTS treatments at a clinic in proximity to their residences (World Health Organization, 1997). Information on the geographic and socio-economic characteristics, ethnic affiliations, exclusion from participation, and other medical data were reported recently (Wondale et al., 2018;Teklu et al., 2018b), a subset of which are included in Supplementary File S1.  (ref. no. 2014-200) approved the human subject protocol. Subjects not diagnosed with TB donated blood samples to perform IGRAs, allowing separation of the LTBI group from healthy individuals based on QuantiFERON-TB Gold In-Tube test (QFT-GIT) data. We set the positivity threshold for LTBI at the recommended value of ≥0.35 IU/ml (Teklu et al., 2018b).
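The grouping rule described in this section can be illustrated with a minimal Python sketch. The function name and its inputs are hypothetical and serve illustration only. The sketch uses just the ≥0.35 IU/ml threshold stated above, whereas the full interpretation of the assay also takes its control tubes into account, which is omitted here.

```python
# Hypothetical helper (not from the study): assign a subject to a cohort
# from the TB diagnosis and a single IFN-gamma readout.
LTBI_THRESHOLD_IU_PER_ML = 0.35  # positivity threshold stated in the text


def classify_subject(tb_diagnosed: bool, ifn_gamma_iu_per_ml: float) -> str:
    """Return 'PTB', 'LTBI', or 'NCC' for one subject (simplified rule)."""
    if tb_diagnosed:
        return "PTB"
    if ifn_gamma_iu_per_ml >= LTBI_THRESHOLD_IU_PER_ML:
        return "LTBI"
    return "NCC"
```

For example, under these assumptions a subject without a TB diagnosis and a readout of 0.40 IU/ml would be placed in the LTBI group.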

Sputum Sample Collection and Processing
Study participants were asked to cough up sputum. It was collected in pre-labeled cups. Where possible, subjects completed three cycles of sputum expectoration using 3-5% hypertonic sodium chloride. The single-timepoint specimens were combined and processed as described previously (HaileMariam et al., 2018). Briefly, collected sputum aliquots for proteome and microbiome analysis were disinfected with SDS buffer (1% SDS, 10 mM Na-EDTA, 50 mM DTT, 0.03% Tween-20, 50 mM Tris/Tris−HCl, pH 8.0) and centrifuged at 16,000 × g to recover the supernatants. SDS-denatured and heat-treated lysates (85°C) were shipped to the site of "omics" analyses (J. Craig Venter Institute) and stored long-term at −80°C prior to analyte extraction. Protein samples were run in SDS-PAGE gels (4-12%T) to estimate total protein concentrations. Aliquots of 150 µg protein extract were subjected to S-Trap Ultra-Fast sample preparation (HaileMariam et al., 2018), digesting proteins with trypsin at a 50:1 mass ratio. Peptides were desalted prior to LC-MS/MS analysis using the spinnable StageTip protocol (Yu et al., 2014a). rDNA extraction and analysis are described below.

Shotgun Proteomics
An Ultimate 3000-nano liquid chromatography (LC) system coupled to a Q-Exactive mass spectrometer (both units from Thermo Scientific) was used for LC-MS/MS analysis. The workflow and data acquisition methods have been described comprehensively (Yu et al., 2014b). Briefly, peptide digestion products of approximately 10 µg were separated over a 150 min gradient from 2 to 80% acetonitrile (120 min to 35%, 10 min to 80%), with 0.1% formic acid in buffers A and B. The flow rate using an in-house packed column (75 µm × 15 cm, 3.0 µm ReproSil-Pur C18-AQ media) was 200 nl/min. MS survey scans were acquired at a resolution of 70,000 over a mass range of m/z 350-1,800. During each cycle in a data-dependent acquisition mode, the 10 most intense ions were subjected to high-energy collisional dissociation (HCD) applying a normalized collision energy of 27%. MS/MS scans were performed at a resolution of 17,500. Two technical replicate LC-MS/MS runs were performed per sample, and the MS data were combined in the MaxQuant analysis process.

Identification and Quantitation of Proteins
The MS raw data were processed using the Proteome Discoverer platform (version 1.4, Thermo Scientific) and the Sequest HT algorithm. The database contained protein sequences from the Mtb strain ATCC 25618/H37Rv (7,955 entries) and human proteins (20,195 sequence entries; reviewed sequences only; version 2015_06). The search parameters included two missed tryptic cleavages, oxidation (M), protein N-terminal acetylation and deamidation (N, Q) as variable modifications, and carbamidomethylation (C) as a fixed modification. Minimum peptide length was seven amino acids. The MS and MS/MS ion tolerances were set at 10 ppm and 0.02 Da, respectively. The false discovery rate (FDR) was estimated employing the integrated Percolator tool. Only protein hits identified with a 1% FDR threshold were accepted. For protein quantification, the MaxQuant and Andromeda software suite (version 1.4.2.0) was used, accepting most default settings provided in the software tool (Yu et al., 2017a). Label-free quantification (LFQ) generates relative protein abundance data from integrated MS1 peak areas of the high-resolution MS scans (Cox et al., 2014). Only proteins quantified by at least one unique peptide were used for analysis. LFQ values were log2-transformed, and missing values were then imputed. Clustering and correlation analyses were performed using functions embedded in the Perseus (version 1.5.0.15) software. The LC-MS/MS data were deposited in ProteomeXchange via the PRIDE partner repository under the dataset identifier PXD012412.
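The log2 transformation and imputation step can be sketched in Python as follows. The text does not state which imputation settings were used. This sketch assumes a common approach in which missing values are drawn from a normal distribution that is narrowed and shifted toward low abundance (width of 0.3 and down-shift of 1.8 standard deviations are assumed values). The function names are hypothetical and this is not the Perseus implementation.

```python
import math
import random
import statistics


def log2_transform(lfq_values):
    """Convert raw LFQ intensities to log2; zero or None is treated as missing."""
    return [math.log2(v) if v else None for v in lfq_values]


def impute_missing(log_values, width=0.3, down_shift=1.8, seed=0):
    """Replace missing entries with draws from a down-shifted normal distribution.

    Assumed settings: the distribution is centered at mean - down_shift * sd
    of the observed values and has a standard deviation of width * sd.
    Requires at least two observed values.
    """
    observed = [v for v in log_values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [
        v if v is not None else rng.gauss(mean - down_shift * sd, width * sd)
        for v in log_values
    ]
```

Observed values pass through unchanged, so only the gaps in a sample's abundance profile are filled.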

16S rDNA Sequencing
Aliquots of 300 µl of the SDS-denatured sputum lysates were subjected to phenol-chloroform extraction to isolate total DNA. Ethanol-precipitated enriched DNA extracts were subjected to PCR amplification using 515F and 806R forward and reverse primers, respectively, to amplify the V4 region of microbial DNA in the extracts. The PCR amplification method was previously described (Claassen-Weitz et al., 2018). A two-step amplification was performed to reduce the risk of non-specific binding when using adapters/sequencing primers of more than 100 base pairs (bp). For 167 individual specimens, sufficient quantities of the V4 sequence region (254 bp) were amplified as visualized by ethidium bromide staining in agarose gels. Samples for DNA library preparation were obtained by excising bands of approximately 300 bp and normalizing the DNA quantity per sample by quantifying with Quant-iT™ PicoGreen® (Life Technologies, USA). Amplicons were pooled at 100 ng each. A positive control (Escherichia coli DNA extract) was subjected to the same amplification and purification protocol. A standard Illumina sequencing-based library preparation and sequencing protocol (MiSeq Reagent Kit v3, 600 cycles) was used as described (Claassen-Weitz et al., 2018). The library dilution had 15% PhiX as an internal control at a 4 pM concentration.

Processing and Filtering of 16S Sequence Reads
We generated operational taxonomic units (OTUs) de novo from raw Illumina 16S rDNA sequence reads using the UPARSE pipeline (Edgar, 2013). Methods for trimming of the adapter sequences, barcodes and primers, the elimination of sequences of low quality, the de-replication step, and sequence read abundance determinations were analogous to those applied previously (Singh et al., 2019). Chimera filtering of the sequences occurred during the clustering step. We used the Wang classifier in mothur with 100 bootstrap iterations, reporting full taxonomies only for those sequences where 80 or more of the 100 iterations agreed (cutoff = 80). The taxonomies were assigned to OTUs using mothur (Schloss et al., 2009) with version 123 of the SILVA 16S ribosomal RNA database as the reference (Quast et al., 2013). From the tables of OTUs with corresponding taxonomy assignments, we removed likely non-informative OTUs (rare OTUs and taxa strongly affected by MiSeq sequencing errors). Unbiased metadata-independent filtering was applied at each taxonomy level by eliminating features that did not pass the selected criteria (<2,000 reads and OTUs present in less than 10 samples), as described previously (Singh et al., 2019). The 16S rDNA sequencing data were deposited in NCBI BioProject under dataset identifier PRJNA663902.
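The metadata-independent filtering step can be sketched in Python as below. How the two criteria were combined is not spelled out in the text. This sketch assumes that an OTU is retained only if it has at least 2,000 reads in total and is present in at least 10 samples. The function name and the table layout are hypothetical.

```python
def filter_otus(otu_table, min_total_reads=2000, min_samples=10):
    """Keep OTUs that pass both abundance and prevalence thresholds.

    otu_table maps an OTU identifier to a list of per-sample read counts.
    The filter ignores sample metadata, so it cannot favor any cohort.
    """
    kept = {}
    for otu_id, counts in otu_table.items():
        total_reads = sum(counts)
        n_samples_present = sum(1 for c in counts if c > 0)
        if total_reads >= min_total_reads and n_samples_present >= min_samples:
            kept[otu_id] = counts
    return kept
```

Because the thresholds are function parameters, the same sketch can be reused at each taxonomy level with aggregated counts.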

Identification of Phylogenetic Groups for Gut and Oral Microbiota
The phyloseq package version 1.16.2 in R version 3.2.3 was used for the microbiome census data analysis (McMurdie and Holmes, 2013). The plot_richness function was used to create a plot of alpha diversity index estimates for each sample. The differences in microbial richness (α-diversity) were evaluated using the Wilcoxon test. The ordination analysis was performed using non-metric multi-dimensional scaling (NMDS) with the Bray-Curtis dissimilarity matrix (Bray, 1957). The data output was used for the generation of a heatmap using the plot_heatmap function, including a side bar where clinical variables associated with each sample were assigned to look for specific associations (McMurdie and Holmes, 2013).
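The two quantities underlying these analyses can be illustrated with a short Python sketch. The text does not name the alpha diversity index that was plotted, so the Shannon index is used here as an assumed example. The Bray-Curtis function follows the standard definition on raw counts. Neither function reproduces the phyloseq code.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples over the same taxa.

    Returns 0.0 for identical count profiles and 1.0 when no taxon is shared.
    """
    numerator = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    denominator = sum(counts_a) + sum(counts_b)
    return numerator / denominator
```

Computing bray_curtis for every pair of samples yields the dissimilarity matrix that an ordination method such as NMDS takes as input.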

Statistical Analyses of Microbiome and Proteomic Data
To detect differential abundances in microbiota at a genus or species level, the DESeq2 package version 1.12.3 in R was used. Phyloseq data were converted into a DESeq2 object using the phyloseq_to_deseq2 function. DESeq2 (Love et al., 2014) is a method for the differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve both stability and interpretability of the estimates. The DESeq2 test uses a negative binomial model rather than simple proportion-based normalization or rarefaction to control for different sequencing depths, which may increase the power and also lower the false positive detection rate (McMurdie and Holmes, 2014). Default options of DESeq2 were used for multiple testing adjustment applying the Benjamini-Hochberg method (Benjamini and Hochberg, 1995). To detect differential abundances in proteomic datasets, we used the Welch t-test, an unequal variance two-sample location test embedded in the Perseus software tool. The P-value significance threshold was set at ≤0.01. To identify enriched biological pathways from differentially abundant proteins (TB vs LTBI), we employed GO term analysis (http://geneontology.org/). To determine the enriched expression of proteins in tissues or anatomical locations, we reviewed relevant information in the Protein Atlas database (https://proteinatlas.org/).
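The two statistical building blocks named above can be sketched in plain Python. The first function computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for two groups of log2 abundance values. The second applies the Benjamini-Hochberg adjustment to a list of P-values. Function names are hypothetical, the conversion of the t statistic into a P-value is omitted, and this is not the Perseus or DESeq2 implementation.

```python
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for an unequal variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two group means.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

In a proteomic comparison, welch_t would be applied once per protein, and the resulting P-values for all proteins would then be adjusted together.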

Human Subjects
Inhabitants of the South Omo province in southern Ethiopia are highly diverse with respect to ethnicity, culture, and language. Many of its people have a traditional lifestyle as cattle herders or in subsistence agriculture with little access to medical care. The major ethnic groups are Hamar, Daasanech, Bena, Tsemay, Selamago, Maale, Ari, and Nyangatom. The prevalences of TB and LTBI were of interest given the region's diversity of inhabitants, lack of urbanization, and increasing exposure to tourism. The age range for enrolment was 12 to 70 (mean = 38.5). Females accounted for 46% of the enrolled subjects. Individuals diagnosed with HIV/AIDS were excluded from this study. Previously, we reported on the lineage diversity of MTBC strains for TB-positive subjects, the low prevalence of Mtb isolates resistant to first-line antibiotic drugs, and high prevalence of LTBI among individuals without disease symptoms (Teklu et al., 2018a;Wondale et al., 2018;Teklu et al., 2018b). Using AFB smear microscopy, 70 specimens were identified as positive for MTBC. Of these, 53 specimens resulted in positive mycobacterial cultures. Two individuals were included in the TB cohort based only on clinical symptoms consistent with TB. MTBC lineage genotyping analysis resulted in the highest prevalences for the Euro-American (EA) and East-African-Indian (EAI) lineages (67 and 22%, respectively) (Wondale et al., 2020). The prevalence of LTBI was determined for 497 individuals enrolled in this study, frequently household contacts for subjects diagnosed with TB. The IGRA data suggested LTBI to be a frequent occurrence in this population (50.5% of all tested subjects). Based on the availability of simultaneously collected sputum samples sufficient for one or two types of analyses (proteomic and microbial) and the preference to have good matches to TB patients regarding gender and ethnicity, we conducted analyses here using samples from 100 individuals without evidence of TB.
In total, 115 and 161 sputum samples were subjected to proteomic and microbiome surveys, respectively, as depicted in Figure 1. Individual specimen and human subject data categorized into the groups PTB, LTBI, and NCC are provided in Supplementary File S1.

Sputum Proteomics
Collapsing protein identifications (IDs) regardless of disease group, 2,039 and 207 non-redundant human and Mtb proteins, respectively, were obtained from the LC-MS/MS data. The numbers pertain to proteins with at least two unique peptides (Supplementary File S2). We note that mycobacterial proteins are not discussed in this report. LC-MS/MS technical replicates had higher correlation values for protein abundances than datasets comparing different biological samples (R-values of 0.93-0.98 vs 0.57-0.76, as shown in Supplementary File S3). This was indicative of good quantitative accuracy achieved by the proteomic workflow including computational analysis. To assess the protein contributions from upper respiratory tract (saliva) and lower respiratory tract (expectorated sputum), we compared our data with three other studies, two of those for saliva (Wu et al., 2015;Grassl et al., 2016;Cao et al., 2017). The Venn diagram in Figure 2 displays protein ID overlaps among all datasets. As expected, protein ID overlaps of our study and the one by Cao et al. on a sputum proteome (Cao et al., 2017) were the highest (~87%). Sputum proteomes are composed of proteins originating from both upper and lower respiratory tracts. Examples of proteins that have been reported to be enriched in saliva and sputum, and are present in our surveys, are the basic salivary proline-rich protein 3 (PRB3) (Kim et al., 1993) and the pulmonary surfactant-associated protein A1 (SFTPA1) (Hermans and Bernard, 1999), respectively.

Sputum Proteomics Reveals Neutrophil Infiltration and Acute Phase Responses in the Respiratory Tract of PTB Subjects
We selected datasets with at least 150 protein IDs per sample and protein IDs detected in at least nine subjects. Thus, 75 datasets were retained for quantification and clustering analyses, each with 432 distinct proteins (Supplementary File S2). Pearson correlation hierarchical clustering (HCPC) was used to assess if gender-specific omics analyses were warranted. While HCPC did not show significant gender-specific clustering, clusters were observed based on the diagnosis of infection with PTB (Supplementary File S4). Next, gender-integrated data were submitted to Principal Component Analysis (PCA). PTB datasets largely clustered separately, although some overlap with the LTBI and NCC groups was observed (Figure 3). Lack of separation between LTBI and NCC datasets in the PCA suggested that latency of TB does not strongly influence the sputum proteome compared to this disease's absence. Differential sputum proteomic analysis using unequal variance Welch t-tests (Welch, 1947) resulted in 103 proteins with ≥2-fold changes (Benjamini-Hochberg corrected P-values <0.01; PTB vs LTBI data). Twelve of these proteins are listed in Table 1. All 103 proteins are displayed in the Volcano Plot of Figure 4 and listed in the Supplementary File S5. Examples of antimicrobial effectors increased in TB datasets are peptidoglycan recognition protein 1, collagenase MMP8, and myeloperoxidase (MPO). Detailed data on protein quantification, tissue localizations, and functional roles are presented in Supplementary File S6. These data supported strong inflammatory and immune responses against Mtb for the PTB cohort only: 16 and 14 of 47 proteins that were quantitatively increased in the PTB group (vs LTBI) are acute phase reactants or highly expressed in leukocytes, respectively. Protein Atlas tissue expression profiles supported the notion that these leukocyte-associated proteins were derived from neutrophils.
Four proteins revealed consensus data in Protein Atlas (Uhlen et al., 2015) for unique presence in neutrophil granules or membranes (see Supplementary File S6). Other differentially abundant proteins contribute to oxidative stress responses (e.g., superoxide dismutase 2, apolipoprotein D, mitochondrial glutathione reductase and catalase). The GO term biological process enrichment (Figure 5) and protein network analysis in Cytoscape (Figure 6) were consistent with leukocyte-, primarily neutrophil-mediated immune activation and acute phase response pathways. Neutrophils are responsible for inflammation and kill pathogens using various mechanisms while acute phase and oxidative stress responses limit host cell collateral damage. Of the 56 proteins decreased in the PTB datasets in comparison to the LTBI datasets, half are expressed, and often enriched, in esophageal and oral mucosal tissues (Table 1 and Supplementary File S6). Four proteins are specifically secreted by salivary glands. Respiratory mucosal cells express and release proteins dedicated to form a barrier towards the external environment. Protein network analysis and enriched GO term categories yielded evidence that relative quantitative increases of the 56 proteins (LTBI vs PTB) are associated with processes reflecting the normal physiology of the respiratory tract squamous epithelium such as keratinocyte differentiation, peptide cross-linking, stress responses, and epidermal development (Figures 5 and 6).
Given that inflammatory cells and their proteins are released into sputum during anti-Mtb responses, those that represent the normal squamous epithelial secretion and shedding of dead cells are reduced in abundance relative to the LTBI group. Examples of the proteins that represent the aforementioned processes are psoriasin (protein S100-A7), two mucins, two small proline-rich proteins, and additional cytoskeleton-associated cornified envelope proteins such as periplakin, desmoplakin, cornulin, and plakoglobin (Supplementary Files S5, S6). These findings are likely not biologically important since they reflect contamination of the sputum samples with saliva.
FIGURE 2 | Venn diagram with protein identifications derived from two studies of saliva proteomes and two studies of sputum proteomes. The Cao study included analysis of samples associated with asthma. The Wu study included samples associated with squamous epithelial cell carcinoma of the oral cavity. In all studies, at least 800 proteins were identified.
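The selection criteria stated in this section for the PTB vs LTBI comparison (a fold change of at least 2 and a corrected P-value below 0.01) can be expressed as a small Python sketch. The function name and its inputs are hypothetical. It assumes that group means of log2-transformed LFQ values and an already adjusted P-value are available for each protein.

```python
import math


def classify_protein(mean_log2_ptb, mean_log2_ltbi, adjusted_p,
                     min_fold_change=2.0, alpha=0.01):
    """Label one protein according to the fold change and P-value thresholds."""
    # A difference of log2 means equals the log2 of the fold change.
    log2_fold_change = mean_log2_ptb - mean_log2_ltbi
    if adjusted_p >= alpha or abs(log2_fold_change) < math.log2(min_fold_change):
        return "not significant"
    if log2_fold_change > 0:
        return "increased in PTB"
    return "decreased in PTB"
```

On a volcano plot, the two significant labels correspond to the upper right and upper left regions, respectively.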

α-1-Acid Glycoproteins
Complement system activation and negative regulation of endopeptidase activities are elements of the acute phase response. Proteins with these functions were increased in the PTB group, e.g., complement factors C4B and C9, and the serpins A1, A3, B1, B10, F2, and G1 (Figure 4) and enriched as a GO term (Figure 5). In addition to ferritin, two isoforms of α-1-acid glycoprotein (α1-AGP), also named ORM1 and ORM2, had greater than 4.4-fold increases in the PTB group (Figures 4, 7) while no such differences were evident in the comparison of LTBI and NCC groups. α1-AGP is released by alveolar macrophages during pulmonary inflammation (Fournier et al., 1999), in addition to its secretion by hepatocytes into blood plasma. Western Blots for 18 sputum samples revealed a wide range of Mr values for α1-AGP and made proteomic data validation in a defined Mr range difficult (Supplementary File S7). But the α1-AGP bands in PTB samples had overall higher staining intensities than LTBI and NCC samples.

Sputum Proteomic Analyses Reveal a Few Abundance Differences Comparing LTBI and NCC Groups
Twenty-one proteins were differentially abundant comparing IGRA-positive (LTBI) and NCC datasets (Figure 8). Lamin B1 (LMNB1), nucleobindin-1 (NUCB1), ribonuclease T2 (RNASET2), lactate dehydrogenase B chain (LDHB), and translocator protein (TSPO) displayed the highest statistical significances. Small proline-rich protein 3 (SPRR3) was the only protein that differed in abundance in both the TB vs LTBI and the LTBI vs NCC comparisons (Table 2). It is noteworthy that two of the aforementioned proteins are enriched [...] (Figure 8). They are of interest as surrogate biomarkers in sputum to discern the latent TB disease stage from healthy human subjects. Given that sample groupings were derived from IFN-γ release data, inaccuracies associated with those measurements would adversely affect the differential analysis of sputum proteomic data (LTBI vs NCC).
FIGURE 4 | Volcano plot depicting protein abundance differences for the comparison of PTB and LTBI groups. Data are derived from sputum shotgun proteomic analyses. The unequal variance Welch t-test with multiple testing corrections was used in the Perseus software, and 103 proteins (marked with UniProt short names) had differences with a P-value <0.01 and a fold change >2. Red and blue dots denote proteins increased and decreased in the PTB group, respectively.
FIGURE 6 | Protein network analysis and functional enrichment clusters. The network was built from 103 differentially abundant proteins comparing PTB and LTBI sample groups as input data and the String App in Cytoscape software. The score cut-off for interaction confidence was set to 0.4. Color coding is in accordance with the fold changes. Diamond shape depicts proteins associated with a response to stimulus. Protein clusters were annotated based on enrichment, a function embedded in Cytoscape.
FIGURE 7 | Quantitative differences for the α-1-acid glycoproteins ORM1 and ORM2 in box plots comparing datasets for PTB vs LTBI as well as PTB vs NCC. The P-values were highly significant, indicating the important role for the acute phase reactants in modulating the PTB pathology. LFQ values are based on summed MS1 peak integrations for all peptides assigned to the protein of origin. n.s., not statistically significant.
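The selection rule named in the Figure 4 caption (Welch t-test plus a fold change >2) can be sketched as follows. This is a minimal illustration, not the Perseus workflow used in the study: the intensity values are hypothetical, the helper names `welch_t` and `log2_fold_change` are invented for this example, and the P-value step is omitted because it requires a t-distribution function outside the Python standard library (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def log2_fold_change(a, b):
    """Difference of group means for values already on a log2 scale."""
    return mean(a) - mean(b)

# Hypothetical log2 LFQ intensities for one protein (not study data).
ptb = [26.1, 25.8, 26.5, 27.0, 26.3]
ltbi = [24.0, 23.6, 24.3, 23.9, 24.1]

t_stat, df = welch_t(ptb, ltbi)
lfc = log2_fold_change(ptb, ltbi)
passes_fold_change = abs(lfc) > 1.0  # log2(2) = 1, i.e. fold change > 2
```

A protein would be flagged as differentially abundant only if it passes the fold-change filter and its multiple-testing-corrected P-value is also below the chosen cut-off.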

Microbiome Comparisons Reveal Differential Abundances of Rothia and Haemophilus for PTB and LTBI Cohorts
Using the V4 region of 16S rRNA to classify sputum microbial taxa, we determined Streptococcus to be the most abundant genus (Figure 9) and noted the absence of α-diversity differences at the level of genera (Supplementary File S9) among the cohorts. There was no separation of the oral microbial profiles among the three cohorts based on the PCA data (Supplementary File S10). Mycobacterium, as a genus, was detected only for individuals in the PTB group (adjusted P-value of 3 × 10⁻¹¹, PTB vs LTBI). This finding generated confidence in the accuracy of the V4-region 16S rRNA sequence analysis (Supplementary File S11). We identified significant quantitative differences for the taxa Rothia and Haemophilus. The genus Rothia featured a three-fold decrease with an adjusted P-value of 8.9 × 10⁻⁸ (PTB vs LTBI, Figure 10). The genus Haemophilus had a 2.2-fold increase with an adjusted P-value of 0.029 (PTB vs LTBI). Interestingly, Rothia and Haemophilus were also, on average, the most abundant genera of their respective phyla, Actinobacteria and Proteobacteria (Figure 9). We identified 13 additional genera with statistically significant differences (adjusted P-values <0.05), but the fold changes were either lower than 1.5 or the genera had low sequence read assignments (Supplementary File S12).
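To make the α-diversity comparison concrete, the sketch below computes the Shannon index from genus-level read counts. The text does not state which diversity index was used, so the choice of the Shannon index is an assumption, and the read counts are hypothetical rather than study data.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero reads."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical genus-level read counts for two samples (not study data).
sample_a = {"Streptococcus": 500, "Rothia": 200, "Haemophilus": 200, "Neisseria": 100}
sample_b = {"Streptococcus": 500, "Rothia": 67, "Haemophilus": 333, "Neisseria": 100}

h_a = shannon_index(sample_a.values())
h_b = shannon_index(sample_b.values())
```

Per-sample values of this kind would then be compared between cohorts with a statistical test. A summary index can differ little between groups even when individual genera shift in abundance, which is why genus-level differential testing is reported separately.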

Neutrophil Effectors and APR Proteins Are Sputum Biomarkers for PTB
We report quantitative differences for proteins and microbial taxa derived from sputum samples comparing a PTB cohort from South Omo with asymptomatic cohorts, one of which was designated as LTBI based on our data from a WHO-approved IGRA (Bastian et al., 2017). The study followed investigations of lineage analysis of MTBC isolates derived from individuals with TB as well as cohort and immunological characterizations of the LTBI group from this multi-ethnic pastoralist region of Ethiopia (Wondale et al., 2017; Teklu et al., 2018a; Wondale et al., 2018; Teklu et al., 2018b). The PCA revealed good separation of PTB proteome profiles from those of the other groups. LTBI and NCC clusters were not discerned. More than 100 differentially abundant proteins (PTB vs LTBI) allowed insights into immune responses linked to the PTB infections (Figures 4-6). Functional and network analysis of these proteins is highly consistent with the infiltration of neutrophils, and perhaps other leukocytes, in the respiratory tract and the release of antimicrobial effectors into sputum for subjects with PTB. Examples of such effectors are peptidoglycan recognition protein 1, the collagenase MMP8, and MPO. Proteins that are part of the complement system (e.g., C4B, C9) and the APR were also increased in abundance in the PTB cohort. Furthermore, innate immune response pathways were apparently activated to eliminate the pathogen and limit or resolve host cell damage. To counteract the inflammation-generating effects of immune effectors, protease inhibitors were elevated in abundance in the PTB compared to the LTBI group; these include serpin B1, which inhibits neutrophil proteases and modulates innate immunity (Benarafa et al., 2007; Benarafa et al., 2011); serpin B10, which appears to control TNF-α-induced cell death (Schleef and Chuang, 2000); and serpin G1, which inhibits complement activity and other pro-inflammatory signals (Dorresteijn et al., 2010). Oxidative stress response enzymes including catalase, SOD2, and ApoD were also increased in the PTB group, suggesting a role in controlling damage by ROS. ROS levels rise when leukocytes degranulate and activate NADPH oxidase (subunit CYBB was increased in abundance in the PTB group) and MPO. High density lipoproteins harboring ApoD and ApoB (also increased in abundance in the PTB group) are known to suppress TNF-α release from Mtb-infected macrophages (Inoue et al., 2018). Two APR proteins that scavenge heme and iron extracellularly (hemopexin and ferritin, respectively) were elevated in abundance in sputum of subjects with PTB compared to LTBI. Sequestering iron/heme limits the growth of pathogens in infected tissues (Parrow et al., 2013). Mtb is able to enter a persistent state in necrotic granulomas under iron sequestration challenges (Kurthkoti et al., 2017), thus enabling long-term survival in the host. Given that sputum proteome profiles of the LTBI group were much more similar to the NCC group than to the PTB group, the host defense and inflammation-associated pathways were apparently absent, or immeasurably small, during latent infection. The systemic role of neutrophils in the response to PTB was linked to a neutrophil-driven IFN-γ inducible transcriptional signature in whole blood that correlated with disease severity (Berry et al., 2010). The important role of neutrophils as phagocytes critical to the defense against lung-invading Mtb was previously reported (Eum et al., 2010). Whether persistent neutrophil activation is detrimental to the clinical outcomes of chronic TB manifestations is a matter of debate (Lowe et al., 2012). Since TB severity was not assessed in our work, we were unable to correlate the APR and neutrophil biomarkers with the severity of infection.
FIGURE 8 | Volcano plot depicting protein abundance differences comparing the LTBI and NCC groups. A P-value <0.01 and a fold change >2 were applied to identify the differentially abundant proteins (each denoted in green and marked with the UniProt short name).
[...] Mycobacterium, also an Actinobacterium, revealed low abundance so that it is not visualized in the segmented bars for this phylum. Haemophilus was the most abundant genus, followed by Neisseria, in the phylum Proteobacteria.
FIGURE 10 | Quantitative differences for Rothia displayed in box plots comparing PTB datasets with those of the LTBI and NCC groups. As shown, the P-values were significant in both comparisons. ****P-value < 0.0001; **P-value < 0.01; ns, not statistically significant.

Proteins of Epithelial and Salivary Gland Origin Were Lower in Abundance in PTB vs LTBI Subjects
Since label-free LC-MS/MS quantification calculates protein contributions relative to the total proteome, we argue that the decreased abundances of these proteins in sputum of PTB patients are a consequence of infiltration, degranulation, and lysis of immune cells along with APR proteins that are secreted via the microvasculature into the airways. It was reported that viral respiratory infections induce mucin secretion to enable trapping the viruses in mucus (Vareille et al., 2011). We do not see an analogy to Mtb infection. Two mucins secreted into the respiratory tract (MUC7 and MUC5AC) were actually less abundant in PTB compared to LTBI sputum profiles. Psoriasin (S100-A7), a squamous epithelial protein also decreased in PTB datasets, has antimicrobial and neutrophil-degranulating functions in the upper airways (Glaser et al., 2009). Our data do not support a role of S100-A7 in the anti-Mtb response. We surmise that Mtb infection does not trigger host defenses in the mucosa of the upper airways, which is in line with the finding that upper respiratory tract TB is a rare clinical event (Jindal et al., 2016).
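The argument about relative quantification can be illustrated with a small arithmetic sketch. All amounts below are hypothetical and the three protein classes are a deliberate simplification; the point is only that a constant absolute amount of epithelial protein yields a smaller proteome fraction once immune cell and acute phase proteins are added.

```python
def relative_abundance(amounts):
    """Convert absolute amounts to fractions of the total proteome."""
    total = sum(amounts.values())
    return {name: value / total for name, value in amounts.items()}

# Hypothetical absolute amounts in arbitrary units (not study data).
ltbi_like = {"epithelial": 80.0, "neutrophil": 10.0, "acute_phase": 10.0}
# Same epithelial amount, but more neutrophil and acute phase protein.
ptb_like = {"epithelial": 80.0, "neutrophil": 60.0, "acute_phase": 60.0}

frac_ltbi = relative_abundance(ltbi_like)
frac_ptb = relative_abundance(ptb_like)
ratio = frac_ptb["epithelial"] / frac_ltbi["epithelial"]
```

In this toy case the epithelial fraction halves (from 0.8 to 0.4) although its absolute amount is unchanged, which mirrors the interpretation offered above for the lower relative abundance of epithelial and salivary proteins in PTB sputum.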

α-1-Acid Glycoproteins Are Strong Sputum Biomarkers for PTB
APR proteins were invariably higher in abundance in sputum of PTB compared to LTBI subjects. APR protein increases were previously linked to Mtb infections and a biomarker role: serum α-1-antitrypsin in the context of TB diagnosis (Song et al., 2014); serum CRP and ferritin as biomarkers of disease persistence during anti-TB therapy (Miranda et al., 2017). Ferritin concentrations were reported to be increased in acute inflammatory lung injury in order to bind extracellular and intracellular iron to reduce host susceptibility to oxidative damage (Kim and Wessling-Resnick, 2012). Ferritin was also associated with leakage from damaged cells (Kell and Pretorius, 2014). In addition to ferritin, we found two α1-AGP proteins (ORM1 and ORM2) to be elevated in abundance in PTB patients. Earlier, α1-AGP was described as induced in alveolar macrophages upon lung inflammation, with a 4-fold increased secretion in patients afflicted by interstitial lung injuries (Fournier et al., 1999). A murine study identified foamy macrophages located in pneumonic areas of Mtb-infected lungs as an important source of α1-AGP; administration of antibodies neutralizing this glycoprotein led to lower bacillary loads and less tissue damage, suggesting an adverse effect of α1-AGP during disease progression (Martinez Cordero et al., 2008). We surmise that sputum α1-AGPs and ferritin are surrogate biomarkers of PTB and also influence the outcome of Mtb infections. The contribution of α1-AGPs to the immunopathology of PTB needs to be evaluated further in disease models. Their roles as sensitive and specific protein biomarkers in sputum, in the context of early diagnosis or disease progression, require further validation by surveying larger cohorts.

LTBI Biomarker Candidates
Five proteins were differentially abundant comparing LTBI and NCC cohorts with low P-values (<0.01), all of them less abundant in the sputum of the LTBI group. These proteins are candidate biomarkers for the diagnosis of LTBI, perhaps complementary to IGRAs. The specificity and negative and positive predictive values of IGRAs were reported to be high for LTBI diagnosis (Diel et al., 2011). However, the test kits are expensive, and only moderate sensitivity, in the 55-70% range, was noted for diagnosing latent disease in children (Machingaidze et al., 2011). With respect to the proteins identified by our differential proteomic analysis, only the translocator protein TSPO was previously linked to TB pathology and to a biomarker role (monitoring TB progression in situ) (Foss et al., 2013). Radioiodinated DPA-713, a TSPO synthetic ligand, was used to visualize anti-TB host responses in vivo. Strong TSPO and CD68 co-staining was observed for macrophages in granulomas (Foss et al., 2013). TSPO, a protein enriched in mitochondrial outer membranes (Betlazar et al., 2020), is also endogenously expressed in bronchoalveolar epithelial cells (Hardwick et al., 2005), mediates cholesterol translocation, and was physiologically linked to neuroinflammation by influencing MAPK pathways and the NLRP3 inflammasome (Betlazar et al., 2020). Interestingly, Mtb activates the NLRP3 inflammasome via the ESX-1 secretion system (Dorhoi et al., 2012). TSPO was not differentially abundant in our PTB vs LTBI cohort comparison. We hypothesize that low abundance of TSPO in macrophages alters the crosstalk of pathogen and immune cells in infected lungs, but this does not explain why there would be quantitative differences for this protein between LTBI and NCC cohorts. LDHB, another protein decreased in the LTBI vs NCC comparison, may also influence anti-Mtb immune responses. 
The mitochondrial enzyme's product, lactate, was reported to promote the switch of CD4 T cells to an IL-17 T cell subset and reduce CD8 T cell cytolytic capacity (Pucino et al., 2017). A third protein was ribonuclease T2, part of a family of ribonucleases that have immunomodulatory and antimicrobial properties (Lu et al., 2018).
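The test-performance terms used in this section (sensitivity, specificity, positive and negative predictive value) follow from a two-by-two table of test results against a reference standard. The sketch below uses hypothetical counts, not values from the cited IGRA studies or from this cohort, and the function name `diagnostic_metrics` is introduced only for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of infected subjects detected
        "specificity": tn / (tn + fp),  # fraction of uninfected subjects cleared
        "ppv": tp / (tp + fp),          # probability of infection given a positive test
        "npv": tn / (tn + fn),          # probability of no infection given a negative test
    }

# Hypothetical counts: 100 infected and 100 uninfected subjects.
m = diagnostic_metrics(tp=60, fp=5, fn=40, tn=95)
```

Any candidate sputum biomarker for LTBI would need to be evaluated with measures of this kind in an independent cohort before its value relative to IGRAs could be judged.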

Rothia, a Respiratory Tract Microbial Biomarker for PTB?
We conducted one of the two largest microbiome profiling efforts related to TB to date. While there were no differences in overall α- or β-diversity in sputum microbiomes among the cohorts, we identified statistically significant differences in abundance for the genera Rothia and Haemophilus, two oral microbiome community members, in the comparison of the PTB and LTBI datasets. Rothia mucilaginosa was recently reported to produce the siderophore enterobactin in the human oral niche (Uranga et al., 2020). Mtb is also a producer of siderophores (mycobactins), which are released to acquire iron from the host in infected lung tissue (De Voss et al., 2000). Enhanced iron sequestration during an active host inflammatory response in PTB possibly diminishes the fitness and growth of Rothia species in the upper respiratory tract. This genus is a ubiquitous commensal organism of the oral cavity (Singh et al., 2019; Uranga et al., 2020). One may also speculate that the increased abundance of the genus Haemophilus in respiratory secretions of PTB compared to LTBI subjects is due to its better adaptation to inflammatory conditions caused by the infection with Mtb. The pathogen Haemophilus influenzae adapts to the neutrophil-rich milieu in the inflamed airways of asthmatic patients (Essilfie et al., 2012). Our 16S rRNA analysis did not allow further insights at the species level.
Recently, a comprehensive multi-center study assessing sputum metagenomic profiles from TB patients in comparison with healthy controls and patients diagnosed with other lung diseases reported the absence of distinct, common microbial signatures for TB patients across three geographical locations (Italy, Switzerland, and Bangladesh) (Sala et al., 2020). Consistent with our findings, differences in microbial α-diversity comparing the TB and healthy control cohorts were not observed. We note that different regions of the 16S rRNA gene were profiled in our study and in that of Sala et al. (2020): we sequenced the V4 region while they sequenced the V1-V2 region. Other oral microbiome characterization studies comparing TB and control cohorts did not report statistically significant differences for the genera Rothia and Haemophilus (Naidoo et al., 2019). One study analyzing nearly 100 sputum specimens (Wu et al., 2013) associated respiratory microbiota with different treatment outcomes for TB. Subjects with LTBI were not included. The most significant genus-level change was Prevotella, an anaerobic bacterium with higher abundance in healthy controls compared to subjects with TB (Wu et al., 2013). A recent intriguing discovery was the role of Helicobacter colonization of the gut in IFN-γ-dependent reduced susceptibility to active TB progression, observed in a primate study that may translate to Helicobacter colonization in humans (Perry et al., 2010). The production of metabolites by anaerobic respiratory bacteria was discussed as a potential modulatory influence on risk of TB progression (Naidoo et al., 2019). Given the high compositional variability of oral and gut microbiota for different socio-economic and geographic settings, such comparisons remain challenging. Our study was plausibly influenced by the diets of the highly diverse participating ethnic groups and the occurrence of malnutrition. 
Thus, further studies will be needed to determine the functional relevance of Rothia and Haemophilus abundance changes in the context of PTB and inflammation of the human airways.

CONCLUSIONS
We performed comprehensive analyses of sputum microbiomes and proteomes comparing Ethiopian cohorts characterized by high cultural and ethnic diversity in a remote region to learn more about molecular and microbial community differences related to acute and latent infections with Mtb. In the comparison of three groups (TB, LTBI, NCC) we discovered new and verified previously reported protein biomarkers that are useful as targets of sputum diagnostic tests for active TB. Additionally, we identified potential protein biomarkers for latent TB that are interesting targets to complement or even replace IGRAs, the diagnostic tests currently recommended by the WHO for LTBI. We identified a potentially antagonistic relationship between Mtb and Rothia based on a strong, statistically significant decrease in Rothia abundance in the TB cohort compared with the LTBI and NCC cohorts. Our findings need to be validated in larger clinical studies and deserve additional mechanistic investigations.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The 16S rDNA sequencing data were deposited in the NCBI BioProject under dataset identifier PRJNA663902. URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA663902. The proteomic raw (LC-MS/MS) data were deposited in ProteomeXchange via the PRIDE partner repository under the dataset identifier PXD012412.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board at Aklilu Lemma Institute of Pathobiology, Addis Ababa University; National Research Ethics Committee of Ethiopia; J. Craig Venter Institute IRB. Written informed consent to participate in this study was provided by the participant or, for participants under the age of 16, by his/her legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
MH: contributions to all experimental research sections and data analysis. YY: implemented proteomic research and analysis of protein biomarkers and significant manuscript writing contributions. HS: implemented computational part of microbiome analysis and manuscript writing contributions. TTe: contributions to experimental design and biological assays. BW: key contributions to design of human subject study and manuscript review. AW: sample collection and processing. AZ: sample collection and processing. SM: implemented experimental aspects of microbiome analysis using NGS methods. TTs: sample processing. ML: human subject study design. AG: responsible for human subject cohort design and epidemiological aspects and manuscript writing contributions. RP: conceptualized and implemented biomarker study, analyzed and interpreted host-pathogen interaction and immune response data, and wrote all sections of the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the grant 1U01HG007472 (National Institutes of Health).