Comparative Genomics and Proteomic Analysis of Four Non-tuberculous Mycobacterium Species and Mycobacterium tuberculosis Complex: Occurrence of Shared Immunogenic Proteins

The Esx and PE/PPE families of proteins are among the most immunodominant mycobacterial antigens and have thus been the focus of research to develop vaccines and immunological tests for diagnosis of bovine and human tuberculosis, mainly caused by Mycobacterium bovis and Mycobacterium tuberculosis, respectively. In non-tuberculous mycobacteria (NTM), multiple copies of genes encoding homologous proteins have mainly been identified in pathogenic Mycobacterium species phylogenetically related to Mycobacterium tuberculosis and Mycobacterium bovis. Only ancestral copies of these genes have been identified in nonpathogenic NTM species like Mycobacterium smegmatis, Mycobacterium sp. KMS, Mycobacterium sp. MCS, and Mycobacterium sp. JLS. In this study we elucidated the genomes of four nonpathogenic NTM species, viz. Mycobacterium komanii sp. nov., Mycobacterium malmesburii sp. nov., Mycobacterium nonchromogenicum, and Mycobacterium fortuitum ATCC 6841. These genomes were investigated for genes encoding the Esx and PE/PPE (situated in the esx cluster) families of proteins as well as adjacent genes situated in the ESX-1 to ESX-5 regions. To identify proteins actually expressed, comparative proteomic analyses of purified protein derivatives (PPDs) from three of the NTM as well as Mycobacterium kansasii ATCC 12478 and the commercially available purified protein derivatives from Mycobacterium bovis and Mycobacterium avium were performed. The genomic analysis revealed, in each of the four NTM, orthologs of the genes encoding the Esx, PE, and PPE family proteins of M. bovis and M. tuberculosis. The identification of genes of the ESX-1, ESX-3, and ESX-4 regions, including esxA, esxB, ppe68, pe5, and pe35, adds to earlier reports of these genes in nonpathogenic NTM like M. smegmatis, Mycobacterium sp. JLS, and Mycobacterium sp. KMS. This report is also the first to identify the esxN gene situated within the ESX-5 locus in M. nonchromogenicum.
Our proteomics analysis identified a total of 609 proteins in the six PPDs, and 22 of these were identified as shared between the PPD of M. bovis and one or more of the NTM PPDs. Previously characterized M. tuberculosis/M. bovis homologous immunogenic proteins detected in one or more of the nonpathogenic NTM in this study included CFP-10 (detected in M. malmesburii sp. nov. PPD), GroES (detected in all NTM PPDs but M. malmesburii sp. nov.), DnaK (detected in all NTM PPDs), and GroEL (detected in all NTM PPDs). This study confirms reports that the ESX-1, ESX-3, and ESX-4 regions are ancestral regions and thus found in the genomes of most mycobacteria. Identification of NTM homologs of immunogenic proteins warrants further investigation of their ability to cause cross-reactive immune responses with MTBC antigens.


INTRODUCTION
Purified protein derivatives (PPDs), also known as tuberculins, have been used for more than 100 years as antigens in the diagnosis of human and bovine tuberculosis (TB), based on the delayed hypersensitivity (skin) reaction they induce. The first preparation of PPD was introduced by Robert Koch in 1890, where a boiled crude extract of Mycobacterium containing a mixture of proteins, both somatic and antigenic, referred to as "old tuberculin," was prepared as a potential vaccine against tuberculosis in humans. Whilst the "old tuberculin" failed to live up to its initial claim of having curative properties, it formed the foundation for the modern PPD preparations (Burke, 1993). Tuberculin-based tests (the tuberculin skin test and the interferon gamma assay) have since been used as diagnostic tools for bovine tuberculosis. While these tests generally show satisfactory sensitivities, the major drawback of the tuberculin-based assays is their reduced specificity. This is presumably associated with cross-reactive immune responses due to exposure of animals and humans to non-tuberculous mycobacteria (NTM) or environmental mycobacteria and the TB vaccine strain, Bacille Calmette et Guérin (BCG). These cross-reactive immune responses are most likely due to the presence of immunogenic proteins conserved across the Mycobacterium genus as major components of PPD derived from M. bovis (Schiller et al., 2010). PPD prepared from Mycobacterium avium is included as a control antigen representative of the background responses to environmental mycobacterial antigens in the single intradermal comparative cervical tuberculin test (SICCT) as well as the interferon gamma assays used in different countries in bovine tuberculosis (BTB) control strategies.
As an alternative approach to mitigate the frequency of false positive responses to PPD-B induced by environmental mycobacterial antigens, a PPD derived from Mycobacterium fortuitum (PPD-F), used in the modified BOVIGAM assay, has been shown to increase the test specificity in South Africa (Michel et al., 2011).
Interference of NTM and BCG in immune responsiveness against bacteria of the Mycobacterium tuberculosis complex (MTBC) has triggered investigation of "specific" or defined antigens which are uniquely present in pathogenic mycobacteria, and absent in most NTM as well as M. bovis BCG as markers for the diagnosis of bovine and human tuberculosis (Schiller et al., 2010). Comparative genomics studies have led to the identification and characterization of certain genetic regions in the genomes of mycobacteria that are present in Mycobacterium tuberculosis and Mycobacterium bovis but absent in M. bovis BCG and most NTM. The Region of Difference 1 (RD1), a major deleted region in BCG first described by Mahairas et al. (1996) seems to have been the early event in the attenuation of the present vaccine strain, hence proteins within this region are studied extensively in order to try to understand the virulence of pathogenic MTBC. The two most predominant proteins encoded in the RD1 region, namely the 6 kDa early secreted antigenic target (ESAT-6/EsxA) and the 10 kDa culture filtrate protein (CFP-10/EsxB) have been investigated widely for inclusion into candidate vaccines and used as antigens in the diagnosis of both TB and BTB (Pym et al., 2003;Vordermeier et al., 2006;Ganguly et al., 2008). Functional studies have revealed that RD1 encodes a specialized system for secretion in mycobacteria, involving genes situated both inside the RD1 locus and its flanking regions, known as the ESAT-6 secretion system (Brodin et al., 2004). The ESAT-6 and CFP-10 proteins belong to the Esx family that consists of 23 small secreted proteins encoded by genes esxA-esxW and arranged in 11 gene pairs (Bitter et al., 2009). Five of the 11 genomic pairs are contained in five loci: ESX-1 to ESX-5 encoding components of the type VII secretion system (Uplekar et al., 2011). 
Four of the five loci, except ESX-4, also harbor 11 proteins of the Proline Glutamate (PE), or Proline Proline Glutamate (PPE) family, several of which have been implicated in immune evasion (Akhter et al., 2012;Mustafa, 2014). The PE and PPE and Esx proteins whose genes are adjacent to each other in the genome of Mycobacterium tuberculosis interact to form pairs which appear to be essential for their secretion (Renshaw et al., 2005). The secretory proteins have been the focus of research aimed at development of TB vaccines and immunodiagnostic assays, because they are thought to have a potential to induce protective immunity and immune responses of diagnostic value (Ize and Palmer, 2006;Ganguly et al., 2008;Marongui et al., 2013;Mustafa, 2014).
Studies have shown, however, that some genes encoding proteins of the Esx family, as well as PE/PPE, do occur in pathogenic NTM like Mycobacterium kansasii, Mycobacterium marinum, Mycobacterium szulgai, Mycobacterium gordonae, Mycobacterium gastri, Mycobacterium ulcerans, Mycobacterium sp. JDM601, and Mycobacterium riyadhense, as well as in Mycobacterium leprae (Gey van Pittius et al., 2001; Geluk et al., 2002, 2004; Newton-Foot et al., 2016), while genes encoding the PE/PPE family of proteins were also identified in Mycobacterium abscessus and Mycobacterium avium (Sorensen et al., 1995; Vordermeier et al., 2007; van Ingen et al., 2009; Mustafa, 2014). The immunogenicity and cross-reactivity against MTBC antigens of the ESAT-6 and CFP-10 proteins of pathogenic NTM were also demonstrated in M. kansasii (Waters et al., 2010). Therefore, use of these antigens in BTB diagnosis may be hampered by the exposure of humans and animals to pathogenic NTM, leading to false positive BTB immunological test results.
To assess the presence of homologous proteins in NTM potentially able to elicit cross-reactive immune responses against MTBC antigens we elucidated the genomes of four NTM species, viz Mycobacterium nonchromogenicum, Mycobacterium fortuitum ATCC 6841, Mycobacterium malmesburii sp. nov., and Mycobacterium komanii sp. nov. and compared these with the genomes of Mycobacterium tuberculosis H37Rv and Mycobacterium bovis AF2122/97 to identify orthologous genes situated in the ESX-1 to ESX-5. In addition, the proteomes of PPDs derived from the NTM viz M. nonchromogenicum (PPD-N), M. fortuitum ATCC 6841 (PPD-F), M. malmesburii sp. nov (PPD-M), and M. kansasii ATCC 12478 (PPD-K) were defined to identify the proteins actually produced and compared to those of the commercially available PPDs derived from M. avium and M. bovis. This is the first report of the occurrence of selected genes encoding proteins that were previously shown to be immunogenic in M. bovis/M. tuberculosis in the four NTM genomes as well as the expression of proteins they code for.

Bacterial Cultures
M. malmesburii sp. nov. and M. komanii sp. nov. were isolated from cattle nasal swabs during an NTM survey in South Africa (Gcebe et al., 2013). These two species were previously identified and characterized as novel NTM (Gcebe et al., unpublished). The M. nonchromogenicum strain was isolated from soil during an NTM survey (Gcebe et al., 2013). M. fortuitum ATCC 6841 and M. kansasii ATCC 12478 were obtained from the Agricultural Research Council-Onderstepoort Veterinary Institute, South Africa. The identity of all the strains used was confirmed by PCR-sequencing assays targeting the 16S rDNA (Gcebe et al., 2013), hsp65 (Telenti et al., 1993), rpoB (Adékambi et al., 2006), and sodA (Adékambi and Drancourt, 2004). For PPD production, liquid cultures were prepared in Middlebrook 7H9 medium (Becton Dickinson, USA) supplemented with 0.1% OADC and glycerol, incubated under continuous shaking at 200 g at 37 °C (optimum growth temperature) for 4 weeks for the rapid growing mycobacteria (M. fortuitum and M. malmesburii sp. nov.) and 6 weeks for the slow growing NTM (M. kansasii and M. nonchromogenicum) or until turbid growth was observed. Before PPDs were prepared, the liquid cultures were screened for contaminants by spread plating each culture on two nutrient agar plates. The plates were incubated at 25 and 37 °C, respectively, and evaluated after 2 and 5 days for fungal or any bacterial growth not typical of mycobacteria.

Whole Genome Sequencing, Assembly, and Annotation
DNA was extracted from bacterial cultures (M. fortuitum, M. nonchromogenicum, M. komanii sp. nov., and M. malmesburii sp. nov.) in Löwenstein-Jensen medium (Becton Dickinson, USA) using the Qiagen nucleic acid extraction kit (Whitehead Scientific, South Africa). Genomic DNA paired-end libraries were generated using the Nextera DNA sample preparation kit (Illumina) and indexed using the Nextera index kit (Illumina). Sequencing was performed as paired-end reads (2 × 250 bp) employing the Illumina MiSeq system using the MiSeq reagent kit v2 at the Agricultural Research Council, as per manufacturer's instructions. De novo assembly of the sequencing reads was performed using SPAdes, which mainly involves construction of DNA sequences of contigs and the mapping of reads to contigs (Bankevich et al., 2012). Each assembly was evaluated using QUAST (Gurevich et al., 2013). The assembled genome sequences were annotated using the Prokka annotation pipeline (Seemann, 2014). This involved predicting tRNA, rRNA, and mRNA genes in the sequences and assigning putative gene products to the protein-coding genes (CDSs) based on their similarity to sequences in a database of curated Mycobacterium genes. To gauge the similarity of the NTM genomes to those of M. bovis AF2122/97 (NC_002945.3) and M. tuberculosis H37Rv (NC_000962.2), alignment of each of the processed sets of reads to these reference genomes was performed using the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2009). BLAST Ring Image Generator (Alikhan et al., 2011) was used to visualize the pairwise BLAST results of CDS features in M. bovis and each NTM annotation against that of M. tuberculosis and against the respective closest NTM relative.

Investigating the Closest Sequence Relatives of the NTM Strains
Basic Local Alignment Search Tool (BLAST) searches of the largest contigs from each NTM assembly were performed against the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/genbank/) to identify the closest known sequence relative of each NTM species. BLAST searches were also performed for the annotation CDS features for each NTM species against the reference genomes for their respective best hits and visualized using BLAST Ring Image Generator (BRIG) (Alikhan et al., 2011).

Identification of Predicted Immunogenic Proteins in NTM Genomic Annotations
BLAST searches were conducted to quantify the similarity of the annotated amino acid sequences in the NTM assemblies (M. fortuitum, M. malmesburii sp. nov., M. komanii sp. nov., and M. nonchromogenicum) to immunogenic proteins in M. bovis, M. tuberculosis, and M. smegmatis. Esx family proteins and PE and PPE proteins situated within the ESX-1 to ESX-5 clusters were chosen for this investigation (see Table 1 for the complete list of targeted proteins). BLASTP searches against the NCBI GenBank database were also conducted for the respective NTM protein sequences to quantify similarity to other sequences in the database. Multiple amino acid sequence alignments of the individual proteins (Esx, PE, and PPE situated in ESX-1 and ESX-3) from different mycobacteria were performed using ClustalW (Thompson et al., 1994).
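The similarity screening described above can be illustrated with a minimal sketch. It assumes the BLAST results were exported in the common 12-column tabular layout (this is an assumption; the output format used in the study is not stated), and the query names, subject names, and scores in the example rows are made up for illustration.

```python
import csv
import io

# Column order of the 12-column BLAST tabular layout (assumed export format).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return the highest-bitscore hit for each query sequence."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical rows: identifiers and numbers are invented, not study results.
EXAMPLE = (
    "ntm_esxA\tMtb_EsxA\t62.1\t95\t36\t0\t1\t95\t1\t95\t1e-30\t118\n"
    "ntm_esxA\tMsmeg_EsxA\t71.6\t95\t27\t0\t1\t95\t1\t95\t2e-40\t142\n"
    "ntm_esxB\tMtb_EsxB\t55.0\t100\t45\t0\t1\t100\t1\t100\t3e-25\t101\n"
)

hits = best_hits(EXAMPLE)
for query, hit in sorted(hits.items()):
    print(query, hit["sseqid"], hit["pident"])
```

Keeping only the top-scoring subject per query mirrors the "best hit" idea used when naming the closest relative of each sequence; a stricter analysis would also filter on e-value and alignment coverage.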

PCR Verification of the Presence of Genes in ESX-1, ESX-3 as well as esxN in NTM
We set out experiments to confirm the presence of ESX-1 in the four NTM genomes by PCR and sequencing of the two most investigated genes in this region: esxA and esxB. The presence of ESX-3 in the four newly sequenced NTM was verified by amplification of esxH. Since it was previously shown by hybridization experiments that ESX-5 might be absent in Mycobacterium nonchromogenicum, we verified the presence of an esxN ortholog (situated in the ESX-5 locus) in M. nonchromogenicum by PCR and sequencing. The primers used for amplification of esxA were MSMEG_0066-F (5′-AATTTCGCCGGTATCGAGGG-3′), located at position 19-38 bp of the MSMEG_0066 gene (an M. smegmatis mc²155 esxA ortholog), and MSMEG_0066-R (5′-CAGGCAAACATTCCCGTGAC-3′), located at position 278-268 bp of the esxA ortholog of M. smegmatis mc²155 (MSMEG_0066).
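As a quick sanity check on the two esxA primers quoted above, the following sketch computes their length, GC content, reverse complement, and a rough melting temperature. The Wallace rule (2 °C per A/T, 4 °C per G/C) is a rule-of-thumb estimate introduced here for illustration only; it is not stated to be the method used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in °C: 2 per A/T plus 4 per G/C."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


# Primer sequences as given in the text.
PRIMERS = {
    "MSMEG_0066-F": "AATTTCGCCGGTATCGAGGG",
    "MSMEG_0066-R": "CAGGCAAACATTCCCGTGAC",
}

for name, seq in PRIMERS.items():
    print(name, len(seq), gc_percent(seq), wallace_tm(seq), reverse_complement(seq))
```

Both primers are 20 nt long with matching GC content, which is consistent with a pair designed to anneal at a similar temperature.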

Sequencing and Sequence Analysis
Sequencing of the PCR products was done at Inqaba Biotechnologies (Pretoria, South Africa). Sequencing was performed in both directions using the forward and reverse primer sequences that were initially used for amplification. Sequences from both strands were edited manually and pairwise alignments undertaken using the BioEdit Sequence alignment editor (version 7.1.9) and the Molecular Evolutionary Genetics Analysis (MEGA) platform (http://www.megasoftware.net) (version 6) (Tamura et al., 2013). The resulting consensus sequences were analyzed on the NCBI platform for gene sequence identity/similarity using BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
Frontiers in Microbiology | www.frontiersin.org

PPD Production
PPDs were prepared from M. nonchromogenicum (PPD-N), M. malmesburii sp. nov. (PPD-M), M. fortuitum ATCC 6841 (PPD-F), and M. kansasii ATCC 12478 (PPD-K). Three preparations of each PPD [one at a different time interval (first preparation) and two done at the same time (second and third preparations)] were made following a modified protocol by Landi (1963). Briefly, liquid cultures were inactivated by steaming at 121 °C for 20 min and filter-sterilized using Whatman 40 filter paper and a vacuum pump. Each filtrate was then precipitated by adding 40% trichloroacetic acid (TCA) to a final concentration of 4% (v/v) and left for at least 12 h at 4-8 °C. Afterwards the precipitated filtrates were mixed manually by shaking and centrifuged at room temperature for 20 min at 3900 g. The supernatants were discarded and the pellets washed twice by suspending them in 1% TCA and careful mixing, followed by centrifugation at 3900 g for 20 min at room temperature. The supernatants were discarded and the pellets suspended in 10% NaCl, then centrifuged for 20 min at 3900 g. After discarding the supernatant, the pellet was harvested by turning the tube upside down on a piece of sterile filter paper and allowed to dry, weighed, diluted with PPD buffer (0.005% Tween 80 in PBS, pH 7.38), and stored at 4-8 °C until peptide digestion. The purified protein derivatives from M. bovis (PPD-B) and M. avium (PPD-A) were obtained from Prionics (the Netherlands).

In-Solution Digest of PPDs with Trypsin (IAA)
Trypsin digestion of each PPD preparation was done at the Centre for Proteomics and Genomics Research (CPGR, South Africa). All reagents used were analytical grade or equivalent. Twenty micrograms of protein was dissolved in 10 µl of 50 mM triethylammonium bicarbonate (TEAB; Sigma). Protein was reduced by adding 1 µl of 100 mM tris (2-carboxyethyl) phosphine (Sigma) prepared in 50 mM TEAB, and incubated at 60 °C for 1 h. Samples were cooled to room temperature and the protein was alkylated by adding 1 µl of 200 mM iodoacetamide (Sigma) prepared in 50 mM TEAB. Samples were incubated at room temperature in the dark for 30 min. The sample volume was adjusted to 50 µl with 50 mM TEAB and then 5 µl of 1 µg/µl trypsin (Promega), prepared in MilliPore water, was added and digestion was allowed to take place at 37 °C for 18 h, followed by vacuum centrifugation.
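To make the digestion step concrete, the sketch below performs an in-silico tryptic digest using the commonly applied rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function names and the example sequence are hypothetical and serve only to show the kind of peptides that reach the mass spectrometer.

```python
def cleavage_fragments(protein):
    """Split a protein after K or R, except when the next residue is P."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def tryptic_digest(protein, missed_cleavages=0):
    """List peptides, allowing up to `missed_cleavages` skipped cut sites."""
    fragments = cleavage_fragments(protein)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


# Hypothetical sequence chosen to show the proline exception (R followed by P).
EXAMPLE_PROTEIN = "MKAPRPLKGG"
print(tryptic_digest(EXAMPLE_PROTEIN))
print(tryptic_digest(EXAMPLE_PROTEIN, missed_cleavages=1))
```

Search engines apply the same rule when matching spectra to database sequences, usually with a small allowance for missed cleavages.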

LC-MS/MS Analysis
LC-MS/MS analysis was conducted in triplicate for NTM PPDs from the second and the third preparations and once for the first preparation and the commercial PPDs (PPD from M. bovis and from M. avium) using a Q-Exactive quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific, USA) coupled with a Dionex Ultimate 3000 nano-HPLC system at CPGR. The mobile phases consisted of solvent A (0.1% formic acid in water) and solvent B (80% ACN, 10% water, and 0.1% formic acid). The peptides (an estimated 500 ng for each sample) were resuspended in sample loading buffer (95% water, 5% acetonitrile, 0.05% TFA) and loaded on a C18 trap column (100 µm × 20 mm × 5 µm). Chromatographic separation was performed with a C18 column (75 µm × 150 mm × 3 µm). The gradient was delivered at 300 nl/min and consisted of a linear increase of solvent B from 6 to 60% over 156 min. The mass spectrometer was operated in positive ion mode with a capillary temperature of 250 °C. The applied electrospray voltage was 1.95 kV. Details of data acquisition are listed in Table 2.
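The linear gradient described above can be written as a simple function of time. This is a sketch under the stated parameters (6 to 60% solvent B over 156 min at 300 nl/min); the function and constant names are illustrative and any column wash or re-equilibration steps are not modeled.

```python
FLOW_NL_PER_MIN = 300.0   # flow rate stated in the text
GRADIENT_MIN = 156.0      # gradient duration stated in the text


def percent_b(t_min, start=6.0, end=60.0, duration=GRADIENT_MIN):
    """Percentage of solvent B at time t_min for a linear gradient."""
    if t_min <= 0:
        return start
    if t_min >= duration:
        return end
    return start + (end - start) * t_min / duration


def delivered_volume_ul(duration_min=GRADIENT_MIN, flow=FLOW_NL_PER_MIN):
    """Total mobile phase delivered over the gradient, in microlitres."""
    return flow * duration_min / 1000.0


print(percent_b(0), percent_b(78), percent_b(156))
print(delivered_volume_ul())
```

At the midpoint of the run (78 min) the mobile phase contains 33% solvent B, and roughly 46.8 µl of solvent passes through the column during the gradient.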

Database Searching
All MS/MS samples were analyzed using Mascot (Matrix Science, London, UK; version 2.4.1) and X! Tandem [The GPM, thegpm.org; version CYCLONE (2010.12.01.1)]. Mascot was set up to search the Mycobacterium database (derived from UniProtKB, 843597 entries) assuming the digestion enzyme trypsin. X! Tandem was set up to search a subset of the Mycobacterium database, also assuming trypsin. Mascot and X! Tandem were searched with a fragment ion mass tolerance of 0.020 Da and a parent ion tolerance of 10.0 ppm. Carbamidomethylation of cysteine was specified in Mascot and X! Tandem as a fixed modification. Gln->pyro-Glu of the N-terminus, deamidation of asparagine and glutamine, and oxidation of methionine were specified in Mascot and X! Tandem as variable modifications.
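The two tolerances quoted above are expressed in different units: the parent ion tolerance is relative (ppm) while the fragment tolerance is absolute (Da). The sketch below shows how each would be applied; the function names and the m/z values in the example are illustrative assumptions, not data from the study.

```python
PARENT_TOLERANCE_PPM = 10.0   # parent ion tolerance stated in the text
FRAGMENT_TOLERANCE_DA = 0.020  # fragment ion tolerance stated in the text


def ppm_to_da(mz, ppm):
    """Convert a relative tolerance in ppm to an absolute window at a given m/z."""
    return mz * ppm / 1e6


def within_parent_tolerance(observed, theoretical, ppm=PARENT_TOLERANCE_PPM):
    """True if the observed value lies within +/- ppm of the theoretical value."""
    return abs(observed - theoretical) <= ppm_to_da(theoretical, ppm)


def within_fragment_tolerance(observed, theoretical, tol=FRAGMENT_TOLERANCE_DA):
    """True if the observed fragment lies within +/- tol Da of the theoretical value."""
    return abs(observed - theoretical) <= tol


# Illustrative values only.
print(ppm_to_da(800.0, PARENT_TOLERANCE_PPM))
print(within_parent_tolerance(800.005, 800.0))
print(within_parent_tolerance(800.010, 800.0))
```

At m/z 800, a 10 ppm window corresponds to about 0.008, so a deviation of 0.005 is accepted while 0.010 is rejected.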

Criteria for Protein Identification
Scaffold (version Scaffold_4.3.4, Proteome Software Inc., Portland, OR) was used to validate MS/MS based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 9.0% probability to achieve an FDR less than 0.1% by the Scaffold Local FDR algorithm (Keller et al., 2002). Protein identifications were accepted if they could be established at >100.0% probability to achieve an FDR less than 1.0% and contained at least four identified peptides (Keller et al., 2002). Protein probabilities were assigned by the Protein Prophet algorithm (Nesvizhskii et al., 2003). Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony.
FIGURE 1 | Alignment of M. bovis and NTM predicted CDS regions to those of M. tuberculosis visualized using BRIG. Immunogenic proteins of interest are highlighted in red (also see Table 1).

De novo Assembly of the NTM DNA Sequence Reads
The results of the assembly of the DNA sequence reads are summarized in Table 3. The number of contigs, the length of the largest contig, the N50 (the length for which the set of contigs of that length or longer contains at least half the assembly bases), and the L50 (the number of contigs of that length or longer) are presented for each assembly. The overwhelming majority of the sequenced bases fall within the few largest contigs, but each assembly also produced a large number of very short contigs. The assembly for M. komanii sp. nov. produced 63 contigs, with its largest contig 292,570 bases in size. While the assembly for M. nonchromogenicum produced the largest contig of all (425,774 bases), it also produced the highest number of contigs (642). The M. fortuitum and M. malmesburii sp. nov. assemblies also produced high total numbers of contigs (179 and 255, respectively); their largest contigs were 195,004 and 135,683 bases, respectively.
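The N50 and L50 statistics reported for each assembly can be computed from a list of contig lengths as in the following minimal sketch. The contig lengths in the example are invented for illustration; the study itself obtained these values from QUAST.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the contig at which the running total of lengths,
    taken from longest to shortest, first reaches half the assembly size.
    L50 is the number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contigs supplied")


# Hypothetical contig lengths (total 300, half 150).
example = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(example)
print(n50, l50)
```

A high N50 together with a low L50 indicates that most of the assembly sits in a few long contigs, which matches the observation that the bulk of the sequenced bases fall within the largest contigs despite the many short ones.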

Annotation of the Assembled Genome
The summary of the annotation is provided in [...]. Visualization of the alignment of the M. bovis and the predicted CDS regions of the four NTM species, i.e., M. nonchromogenicum, M. malmesburii sp. nov., M. komanii sp. nov., and M. fortuitum, to those of M. tuberculosis H37Rv using BLAST Ring Image Generator (BRIG) is given in Figure 1, while alignment of each NTM predicted CDS to their closest relatives is [...] (Gey van Pittius et al., 2001, 2006; Newton-Foot et al., 2016). In order to confirm the orthology of the NTM esx and pe/ppe genes to those of M. bovis/M. tuberculosis, as well as the potential of their protein products to be secreted, we also investigated the newly sequenced NTM genomes for the adjacent genes situated in the ESX loci, as well as the genomic organization of these loci. The organization of these genes in the ESX loci is illustrated in Figure 6. The esxA and esxB genes, as well as esxG and esxH, were found next to each other in the genomes of the newly sequenced NTM. The pe35 and ppe68 genes were also found next to each other in the genomes of the four newly sequenced NTM. One member of the pe5/ppe4 pair (ppe4) was not detected in any of the newly sequenced NTM. Comparing the ESX loci of M. tuberculosis and M. bovis to those of the newly sequenced NTM, we found that all the genes of the ESX-1 locus and their organization (as known to occur in M. tuberculosis and M. bovis), except espJ (whose function is unknown), were [...]

FIGURE 4 | Alignment of M. malmesburii sp. nov. predicted CDS regions to those of M. rhodesiae NBB3 (CP003169.1) using BRIG. Immunogenic proteins of interest identified in M. rhodesiae are highlighted in red (dnaK: locus tag MycrhN_1341, product chaperone protein DnaK; mpt70: locus tag MycrhN_3596, product secreted/surface protein with fasciclin-like repeats; canA_1: locus tag MycrhN_1479, product sulfate permease-like transporter, MFS superfamily; canA_2: locus tag MycrhN_2217, product carbonic anhydrase; canA_3: locus tag MycrhN_2307, product isoleucine patch superfamily enzyme, carbonic anhydrase/acetyltransferase; canA_4: locus tag MycrhN_3599, product isoleucine patch superfamily enzyme, carbonic anhydrase/acetyltransferase; canA_5: locus tag MycrhN_3776, product carbonic anhydrase). Carbonic anhydrase genes have been numbered according to genomic position.
To quantify amino acid sequence homology of the EsxA and EsxB of Mycobacterium fortuitum in this study to other species in the NCBI database we conducted BLASTP searches. Interestingly, the BLAST searches using the M. fortuitum EsxA and EsxB amino acid sequences revealed the existence of their orthologs in additional NTM species, most notably M. vulneris (98% similarity to M. fortuitum at amino acid level for both proteins), M. mageritense (82 and 74% similarity to M. fortuitum respectively, at amino acid level), and M. farcinogenes (96 and 92% similarity to M. fortuitum respectively, at amino acid level).

Verification of the Presence of the esx Genes in the Newly Sequenced NTM
The presence of esxA and esxB was confirmed in M. fortuitum by PCR and sequencing using primers designed from M. smegmatis sequences, followed by BLAST searches. We demonstrated the presence of both these gene fragments in Mycobacterium fortuitum ATCC 6841, whose nucleotide sequences were 89 and 85% identical to the M. smegmatis esxA and esxB orthologs, respectively. [...] nov. (84% identical to the M. smegmatis ortholog). The M. nonchromogenicum ortholog did not amplify when these primers were used.
Since DNA-DNA hybridization analysis by Gey van Pittius et al. (2006) previously suggested that ESX-5 may be absent in M. nonchromogenicum, or that the species was evolutionarily so far removed from the slow growers that gene homology was insufficient to allow hybridization, we set out to verify the presence of esxN in this species by PCR and sequencing using primers designed from the esxN/rv1793c gene of M. tuberculosis. The highest BLAST hit of the sequenced gene fragment revealed that this sequence was 91% similar to the Mycobacterium sp. JDM601 esxN gene. [...] tuberculosis is illustrated by the amino acid alignment in Figure 8, and immunogenic epitopes as demonstrated by Vordermeier et al. (2003, 2007) are underlined. Comparing the CFP-10 amino [...]

[...] *Represents identical sequences observed in all species, and > indicates presence of the same aa residue in at least one of the NTM species. Highly immunogenic epitopes of M. bovis as described by Vordermeier et al. (2000, 2003, 2007) are underlined.

TB10.4 (EsxH)
Alignment of the EsxH amino acid sequences of the NTM, M. bovis, and M. tuberculosis species is illustrated in Figure 9 and immunogenic epitopes described by Skjøt et al. (2002) are underlined. For the epitope at position 11-28, amino acid sequence similarities of 61% were observed between each EsxH orthologs of M.

PPE68 (Rv3873)
Alignment of the corresponding amino acid sequences of ppe68 orthologs of the NTM, M. bovis, and M. tuberculosis is illustrated in Figure 10. The immunodominant M. tuberculosis epitope "VLTATNFFGINTIPIALTEMDYFIR" described by Mustafa (2014) was also found in M. fortuitum, M. komanii, and M. nonchromogenicum showing 18/25 (72%) of the amino acid residues to be identical between the NTM and M. tuberculosis. M. smegmatis showed 19/25 (76%) amino acid sequence identity of this region to that of M. tuberculosis.
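The identity figures quoted above (for example 18/25 = 72%) follow from a position-by-position comparison of aligned, equal-length peptides. The sketch below reproduces that arithmetic. The M. tuberculosis epitope is taken from the text, whereas the variant sequence is hypothetical, constructed with seven substitutions purely to illustrate the calculation; it is not an actual NTM sequence.

```python
def identity(seq_a, seq_b):
    """Return (matches, length, percent identity) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches, len(seq_a), 100.0 * matches / len(seq_a)


# Epitope from the text (M. tuberculosis PPE68).
MTB_EPITOPE = "VLTATNFFGINTIPIALTEMDYFIR"
# Hypothetical variant with seven substitutions (illustration only).
HYPOTHETICAL_VARIANT = "ILTSTDFFGVNTLPIALSEMEYFIR"

matches, length, percent = identity(MTB_EPITOPE, HYPOTHETICAL_VARIANT)
print(f"{matches}/{length} ({percent:.0f}%)")
```

This simple count ignores gaps and conservative substitutions, so it corresponds to identity rather than to the broader similarity scores reported by BLAST.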

Identification of Proteins Present in the PPD Preparations
Using mass spectrometry, 608 different proteins were identified in all the PPD preparations combined (consensus from the different analyses) (Supplementary data). One hundred and thirty-two were identified in PPD-B. Among the NTM PPDs, PPD-A showed the highest number of proteins (n = 225), followed by PPD-F (n = 193), PPD-N (n = 189), and PPD-M (n = 136), while 39 proteins were identified in PPD-K. The molecular mass of the majority of the proteins was in the range of 10-50 kDa. Twenty-two proteins were identified as shared between the NTM PPD preparations and PPD-B (Table 7). Identities and functions of these proteins are also listed in Supplementary Table 8. These proteins include previously characterized immunodominant MTBC proteins: the 10 kDa culture filtrate protein/CFP-10 (shared between PPD-B and PPD-M); the 10 kDa chaperonin GroES (shared between PPD-B and the NTM PPDs except PPD-M); the 60 kDa chaperonin GroEL (shared by all the NTM PPDs and PPD-B); the chaperone protein DnaK (shared by all the NTM PPDs and PPD-B); the secreted antigen Ag85C (shared between PPD-A and PPD-B); the 6 kDa early secretory antigenic target/ESAT-6 (shared between PPD-K and PPD-B); and EsxN (shared between PPD-K and PPD-B). The Venn diagrams in Figure 11 illustrate the number of proteins shared between PPD-B, PPD-A, and the different NTM PPDs, as well as the proteins unique to each PPD. There was a higher degree of protein overlap between PPD-A and PPD-B (16/22 shared proteins) than between the other NTM PPDs and PPD-B. Eight proteins were shared between PPD-B and each of PPD-M and PPD-F, while 11 proteins were shared between PPD-B and PPD-K and 10 between PPD-B and PPD-N. One hundred and ten proteins were identified as unique to PPD-B. Four hundred and seventy-seven proteins were identified as unique to NTM. Some of the proteins detected only in NTM PPDs have been previously characterized as immunogenic in M. bovis/M. tuberculosis and were also in the NTM database entries.
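The shared and unique counts reported above amount to set intersections and differences over per-PPD protein lists. A minimal Python sketch of that bookkeeping follows, assuming each PPD is represented as a set of protein names. The toy dictionary holds only the seven proteins named in the text, with the sharing pattern stated there; it is not the full data set, and the function names `shared_with_reference` and `unique_to_reference` are hypothetical.

```python
# Toy subset: only the seven proteins named in the text, with the sharing
# pattern stated there. This is NOT the full 22-protein shared set.
PPD_PROTEINS = {
    "PPD-B": {"CFP-10", "GroES", "GroEL", "DnaK", "Ag85C", "ESAT-6", "EsxN"},
    "PPD-A": {"GroES", "GroEL", "DnaK", "Ag85C"},
    "PPD-F": {"GroES", "GroEL", "DnaK"},
    "PPD-N": {"GroES", "GroEL", "DnaK"},
    "PPD-M": {"CFP-10", "GroEL", "DnaK"},
    "PPD-K": {"GroES", "GroEL", "DnaK", "ESAT-6", "EsxN"},
}


def shared_with_reference(ppds, reference="PPD-B"):
    """For each non-reference PPD, return the sorted proteins it shares with the reference."""
    ref = ppds[reference]
    return {
        name: sorted(proteins & ref)
        for name, proteins in ppds.items()
        if name != reference
    }


def unique_to_reference(ppds, reference="PPD-B"):
    """Return the sorted proteins found in the reference PPD and in no other PPD."""
    others = set().union(*(p for n, p in ppds.items() if n != reference))
    return sorted(ppds[reference] - others)


if __name__ == "__main__":
    for name, proteins in shared_with_reference(PPD_PROTEINS).items():
        print(f"{name}: {len(proteins)} shared with PPD-B -> {', '.join(proteins)}")
    print("Unique to PPD-B in this toy subset:", unique_to_reference(PPD_PROTEINS))
```

With complete identification lists in place of the toy subset, the same two functions would give the pairwise overlaps and the PPD-B-only proteins of the kind summarized in a Venn diagram.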

DISCUSSION
In this study we first compared whole genome sequences of four NTM species to those of M. tuberculosis and M. bovis, mainly focusing on the presence in NTM of genes homologous to those encoding immunogenic proteins of the Esx and the PE/PPE families in M. bovis/M. tuberculosis. As a second approach, comparative proteomic analysis of PPDs derived from four NTM species and the commercially available PPDs of M. bovis and M. avium was conducted to identify the actual presence of homologous proteins. Three of the investigated NTM species, i.e., M. nonchromogenicum, M. malmesburii sp. nov., and M. komanii sp. nov., had previously been described as being among the most abundant NTM in cattle, buffalo, and the environments of these animals in South Africa. M. fortuitum ATCC 6841 and its PPD were chosen because M. fortuitum PPD is currently used in the modified BOVIGAM™ assay (Michel et al., 2011; Gcebe et al., 2013). The M. malmesburii sp. nov. and M. komanii sp. nov. strains used in this study were both isolated from cattle nose mucous membranes and M. nonchromogenicum from soil in South Africa (Gcebe et al., 2013). M. malmesburii sp. nov. and M. komanii sp. nov. are novel rapidly growing mycobacteria (RGM) (colonies taking <7 days to appear on solid medium). M. nonchromogenicum, a slow growing mycobacterial (SGM) species (colonies take >7 days to appear on solid growth medium), and M. fortuitum (an RGM) have also been isolated from cattle tissue in Great Britain, France, and Northern Ireland (Pollock and Anderson, 1997; Hughes et al., 2005; Vordermeier et al., 2007; Biet and Boschiroli, 2014). These two species were also reported to be among the most frequently isolated NTM in cattle in other parts of Africa (Diguimbaye-Djaibé et al., 2006; Berg et al., 2009). Exposure of cattle to these NTM species may constitute a possible cause of cross-reactive immune responses to PPD and may lead to misdiagnosis of BTB.
This further highlighted the need to investigate these NTM in much more detail, in terms of both their genetic make-up and the proteomes of their PPDs, in order to assess the presence of immunogenic proteins in NTM potentially causing cross-reactivity with MTBC antigens. Comparative genomics revealed that overall there was very little similarity between the NTM genomes and those of M. bovis and M. tuberculosis. Despite these differences, orthologs of genes encoding the most investigated antigens of the Esx family of MTBC, including EsxA/ESAT-6, EsxB/CFP-10, EsxH/TB10.4, EsxG/TB9.8, and EsxN, and the PE/PPE family proteins (PE35, PE5, and PPE68) were identified in the four newly sequenced NTM (Skjøt et al., 2000; Vordermeier et al., 2007; Zvi et al., 2008; Hoang et al., 2013; Amoudy et al., 2014; Mustafa, 2014). Only the ESX-1, ESX-3, and ESX-4 loci were detected in the newly sequenced RGM, while genes of ESX-2 as well as esxN of ESX-5 were also identified in M. nonchromogenicum by whole genome analysis. The presence of esxA and esxB (situated within the ESX-1 locus) was confirmed in M. fortuitum by PCR and sequencing, while esxN (situated in the ESX-5 locus) was confirmed in M. nonchromogenicum, also by PCR and sequencing. The esxA and esxB genes could not be amplified in M. nonchromogenicum, M. malmesburii sp. nov., and M. komanii sp. nov. using either M. tuberculosis/M. bovis primers (result not shown) or M. smegmatis primers, probably due to large nucleotide sequence differences between these species. The esxH gene (situated in the ESX-3 locus) was also confirmed by PCR and sequencing, using M. smegmatis derived primers, in three of the NTM, the exception being M. nonchromogenicum. This was possibly because there was too little homology between the M. nonchromogenicum esxH sequence and that of M. smegmatis for these primers to anneal. The esxA, esxB, and esxH genes were, however, identified in all four NTM genomes by whole genome sequence analysis.
Studies have shown that the products of esx gene pairs such as esxA and esxB, as well as esxG and esxH, interact to form 1:1 heterodimers, which may be essential for their secretion (Uplekar et al., 2011). Likewise, the pe and ppe genes are thought to form pairs (Gey van Pittius et al., 2006; Uplekar et al., 2011). In this study the esxA and esxB, esxG and esxH, and pe35 and ppe68 genes were found to occur next to each other in the genomes of the four newly sequenced NTM. This confirmed that the esxA, esxB, esxG, esxH, pe35, and ppe68 genes in these loci were true orthologs of the M. tuberculosis and M. bovis esx, pe, and ppe genes. A closer inspection of the genes adjacent to the esx, pe, and ppe genes situated within the ESX loci revealed the occurrence of orthologs of genes encoding the ATP-dependent chaperones of the AAA family, membrane-bound ATPases, transmembrane proteins, chaperonin, and mycosin subtilisin-like proteins. These genes form part of the type VII secretion system; therefore, if expressed in these NTM, the Esx, PE, and PPE proteins may be secreted.
In NTM, the ESAT-6, CFP-10, TB10.4, PE35, PE5, and PPE68 antigens of M. bovis and M. tuberculosis have mainly been shown to cross-react with homologs from other pathogenic Mycobacterium species that are phylogenetically closely related to MTBC, including M. kansasii, M. marinum, and M. ulcerans, as well as M. leprae (Geluk et al., 2002, 2004; Skjøt et al., 2002; Vordermeier et al., 2007; Amoudy et al., 2014; Mustafa, 2014). Identification of the genes of the Esx family and the PE/PPE family proteins in nonpathogenic NTM in this study adds to earlier reports of their occurrence in nonpathogenic M. smegmatis, Mycobacterium sp. MCS, Mycobacterium sp. JLS, Mycobacterium phlei, Mycobacterium thermoresistibile, Mycobacterium gilvum PYR-GCK, Mycobacterium vaccae, Mycobacterium vanbaalenii, and Mycobacterium sp. KMS (Gey van Pittius et al., 2006; Newton-Foot et al., 2016). Expression of the proteins of the Esx family, the PE/PPE family, and the other antigens in nonpathogenic NTM has not been investigated. For that reason we carried out a comparative investigation, by mass spectrometric analysis, of the presence of these proteins in PPDs of three of the nonpathogenic NTM, M. fortuitum (PPD-F), M. malmesburii sp. nov. (PPD-M), and M. nonchromogenicum (PPD-N), and in addition M. kansasii (PPD-K), compared to M. bovis (PPD-B) and M. avium (PPD-A). Mycobacterium kansasii derived PPD was included in this study because its antigens have been investigated for their cross-reactivity with M. bovis (Vordermeier et al., 2007). This bacterium has been isolated from cattle in Great Britain and the United States of America (Vordermeier et al., 2007; Thacker et al., 2013). Even though proteins are largely degraded during PPD preparation, we identified as many as 609 proteins in the combined six PPDs and we found 22 of these proteins to be shared between PPD-B and one or more of the NTM PPDs, including PPD-A.
In the different NTM PPDs we identified homologs of some of the previously characterized immunogenic MTBC proteins, including CFP-10 (present in PPD-M), GroES (present in all PPDs except PPD-M), GroEL (present in all PPDs), EsxN (PPD-K), ESAT-6 (PPD-K), and DnaK (all PPDs except PPD-M). The occurrence of these immunogenic antigens in nonpathogenic RGM PPDs could potentially lead to cross-reactive immune responses that may interfere with the diagnosis of bovine tuberculosis. In fact, these proteins may be responsible for the cross-reactivity seen between the M. fortuitum PPD and M. bovis PPD in South Africa (Michel et al., 2011).
In an attempt to predict cross-reactivity of the NTM homologs of ESAT-6, CFP-10, EsxH, and PPE68 more precisely, we compared amino acid sequences in these proteins with those of defined immunogenic epitopes in the proteins of M. bovis and M. tuberculosis (Skjøt et al., 2000; Vordermeier et al., 2003, 2007; Mustafa, 2014). This was done despite the fact that some of the genes encoding these proteins were identified at DNA level but were not represented as proteins in the respective NTM PPD preparations. These proteins, with the exception of M. malmesburii sp. nov. CFP-10, which was found in PPD-M, could have been degraded during PPD preparation, could have been differentially sensitive to the procedures of PPD production and trypsin treatment before mass spectrometry, or may not have been expressed abundantly enough in NTM to allow detection by LC-MS/MS (Borsuk et al., 2009). In all cases, as illustrated in the amino acid alignments in Figures 6-10, homologies of >50% to the previously characterized immunogenic epitopes in M. bovis were seen in the different NTM homologs. For instance, six epitopes of M. bovis ESAT-6 recognized by bovine T-cells in the context of BoLA-DR and BoLA-DQ, as described by Vordermeier et al. (2003, 2007), had amino acid sequence homologies of as high as 81.28% and as low as 52.9% to those of the M. fortuitum and M. nonchromogenicum orthologs. Likewise, amino acid sequence homologies of >50% between the four M. bovis CFP-10 immunogenic epitopes and those of the NTM species were observed. The same was seen with the PPE68 immunogenic epitope (VLTATNFFGINTIPIALTEMDYFIR) described by Mustafa (2014), where 72% of the amino acid residues were identical between the NTM and M. tuberculosis. Still, with sequence homology of <100% between the NTM CFP-10, ESAT-6, PPE68, and EsxH epitopes and the M. bovis antigens, we could not unambiguously predict T-cell cross-reactivity.
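The percentage figures quoted above are per-position identities over an aligned epitope. As a minimal sketch of that calculation, the Python snippet below counts identical residues between two pre-aligned, equal-length sequences. The function name `percent_identity` is hypothetical, and the homolog string is a made-up sequence constructed with seven substitutions purely so that the arithmetic reproduces 18/25 = 72%; it is not taken from any NTM genome or from Figures 6-10.

```python
def percent_identity(epitope: str, homolog: str) -> float:
    """Percentage of aligned positions carrying the same residue.

    Both strings must already be aligned to equal length; a gap character
    ("-") at a position is counted as a mismatch.
    """
    if len(epitope) != len(homolog):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not epitope:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(epitope, homolog) if a == b and a != "-")
    return 100.0 * matches / len(epitope)


# PPE68 epitope as given in the text (25 residues).
PPE68_EPITOPE = "VLTATNFFGINTIPIALTEMDYFIR"

# HYPOTHETICAL homolog: invented for illustration, 7 substitutions out of 25.
HYPOTHETICAL_HOMOLOG = "ILTSTNFFAINSIPIGLTELDYFVR"

if __name__ == "__main__":
    identity = percent_identity(PPE68_EPITOPE, HYPOTHETICAL_HOMOLOG)
    print(f"Identity over {len(PPE68_EPITOPE)} residues: {identity:.1f}%")
```

This simple count treats every position equally and says nothing about which residues anchor MHC binding or T-cell receptor contact, which is one reason identity alone cannot settle cross-reactivity.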
Several studies have reported contrasting results with regard to the relation between sequence identity and cross-reactivity of antigens. For instance, antigen cross-recognition has been observed with M. leprae ESAT-6 and CFP-10 in M. tuberculosis patients despite very low sequence identity (36 and 40%, respectively, at amino acid level) to the M. tuberculosis homologs (Geluk et al., 2002, 2004). Likewise, Hewinson et al. (2006) demonstrated that sequence identity of >50% within 16-20-mer epitopic regions of unrelated, otherwise non-homologous mycobacterial antigens indicated cross-reactivity in cattle. In contrast, other peptides that displayed similar degrees of sequence identity were not cross-reactive.
The NTM homologs in this study could potentially induce cross reactivity against MTBC antigens and should be tested in animal experiments.
In conclusion, in this study we identified, in selected nonpathogenic NTM genomes and proteomes, genes and proteins that are present in and expressed by members of the Mycobacterium tuberculosis complex and are known as markers for BTB diagnosis. The identification of the genes encoding ESAT-6, CFP-10, PE35, PPE68, and PE5 in nonpathogenic NTM in this study adds to earlier findings of these genes in nonpathogenic NTM like M. smegmatis and Mycobacterium sp. KMS. Expression of one of the most studied M. bovis immunogenic proteins, CFP-10, in M. malmesburii sp. nov., and that of other previously characterized MTBC immunogenic proteins such as GroES, DnaK, and GroEL, was shown by mass spectrometric analysis of the PPDs. We also identified immunogenic epitopes of ESAT-6, CFP-10, EsxH, and PPE68 through genomic analysis in some or all of the four newly sequenced nonpathogenic NTM, which also suggests the potential of these proteins to elicit cross-reactive immune responses against MTBC antigens. The NTM homologs of the immunogenic proteins need to be further investigated for their cross-reactivity with the M. bovis antigens and consequently their interference in BTB assays.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.