Genome-Wide Sequence Variation among Mycobacterium avium Subspecies paratuberculosis Isolates: A Better Understanding of Johne’s Disease Transmission Dynamics

Mycobacterium avium subspecies paratuberculosis (M. ap), the causative agent of Johne’s disease, infects many farmed ruminants and wild-life animals, and has recently been isolated from humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole-genome sequences of several M. ap and M. avium subspecies avium (M. avium) isolates to gain insights into genomic diversity associated with variable hosts and environments. Using next-generation sequencing technology, we found that all six M. ap isolates showed a high percentage of similarity (98%) to the reference genome sequence of M. ap K-10 isolated from cattle. However, two M. avium isolates (DT 78 and Env 77) showed significant sequence diversity (only 87 and 40% similarity, respectively) compared to the reference strain M. avium 104, a reflection of the wide environmental niches of this group of mycobacteria. Within the M. ap isolates, genomic rearrangements (insertions/deletions) were not detected, and only unique single nucleotide polymorphisms (SNPs) were observed. While the majority of the SNPs (~100) in M. ap genomes were non-synonymous, a total of ~6,000 SNPs were detected among M. avium genomes, most of them synonymous, suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNP-based phylo-genomics had enough discriminatory power to differentiate between isolates from different hosts, yet suggested a bovine source of infection for the other animals examined in this study. Interestingly, the human isolate (M. ap 4B) was closely related to an M. ap isolate from a dairy facility, suggesting a common source of infection. Overall, the identified phylo-genomes further supported the idea of a common ancestor to both M. ap and M. avium isolates.
Genome-wide analysis described here could provide a strong foundation for a population genetic structure that could be useful for the analysis of mycobacterial evolution and for the tracking of Johne’s disease transmission among animals.


INTRODUCTION
Infection with Mycobacterium avium subspecies paratuberculosis (M. ap) causes Johne's disease, or paratuberculosis, in a large number of ruminants and wild-life animals (Collins et al., 1994b). The combination of low milk yield and mortalities caused by Johne's disease significantly impacts the economics of the dairy industry in the US and worldwide (Barrett et al., 2006; Raizman et al., 2009). Both infected and clinically ill animals can shed M. ap in their feces, a common source of infection, especially to young calves through the fecal-oral route (Collins et al., 1994a). Infected animals usually have a prolonged subclinical phase which eventually leads to severe gastroenteritis and death. The disease affects mainly ruminants, with some evidence suggesting an association between M. ap and Crohn's disease in humans (Naser et al., 2002, 2004; Over et al., 2011). As expected, M. ap isolates from variable hosts were the subject of several genetic analyses to decipher a potential role for genetic variations in M. ap virulence and pathogenesis.
For example, approaches based on PCR amplification of specific targets (e.g., IS1311, IS900; Whittington et al., 2000) and short sequence repeats (SSR; Sevilla et al., 2008) revealed isolate variation on a genetic level. Recently, DNA microarrays were applied to examine variations on a whole-genome level (Paustian et al., 2005; Wu et al., 2006), which provided a comprehensive analysis of large scale differences among examined isolates. However, events of genomic rearrangements (insertions/deletions, Indels) were not easily identified. In this report, we resorted to a high throughput sequencing strategy to address our hypothesis linking genomic diversity to mycobacterial adaptation to the variable hosts and environments where they replicate and persist.
Several studies attempted to examine the genomic polymorphisms among M. avium complex (MAC) strains to better identify mycobacterial species associated with infections. In one study, dnaJ sequence revealed a limited genomic diversity among human and veterinary strains (Morita et al., 2004). However, comparative genomic hybridization revealed more diversity among M. avium, M. ap, and M. avium subsp. silvaticum (Semret et al., 2004;Paustian et al., 2005). Despite a 95% similarity at the nucleotide level between M. avium and M. ap, long oligonucleotide microarrays were able to assess genomic diversity among the genomes of MAC members (Semret et al., 2004;Wu et al., 2006). Fortunately, the complete genome sequence of M. ap K-10 was reported in 2005 and has been revised and re-sequenced recently (Li et al., 2005;Wu et al., 2009;Wynne et al., 2010) which allowed more detailed comparative analysis of several M. ap strains. For the M. tuberculosis complex, the presence of historical data and documented isolates collection helped in better understanding of the origin and evolution of members of this group (Behr and Small, 1999;Smith et al., 2009). Unfortunately, such records are not available for members of MAC.
With the advancements in next-generation sequencing technology (Bentley et al., 2008), we took advantage of Illumina-based technology to decipher the genome contents of M. ap isolates from various animals and their environments. This technology allows us to compare genomic sequences at an unprecedented level, the nucleotide level, with high speed and accuracy. With the help of an array of bioinformatics tools, we were able to analyze the genomes of eight mycobacterial isolates including ATCC 19698, an M. ap isolate widely used in virulence and pathogenesis studies (Tanaka et al., 1994; Shin et al., 2006; Van et al., 2010). Most M. ap isolates had a high level of sequence similarity to the reference M. ap K-10 strain on a genome-wide scale, even when human isolates were analyzed (Wynne et al., 2011). However, the genomes of two M. avium isolates had a lower level of sequence identity to the M. avium 104 genome, the reference genome for M. avium subsp. hominissuis (MAH). Overall, genomic rearrangements represented by large scale inversions and deletions were found between M. ap and M. avium genomes. However, single nucleotide polymorphisms (SNPs) were the most common variations observed among M. ap isolates from different animals, despite the bovine origin of all of these isolates. The observed genomic polymorphism among MAC isolates provided us with a better understanding of the evolutionary forces acting on these closely related organisms with different characteristic phenotypes.

BACTERIAL STRAINS
Six strains of M. avium subspecies paratuberculosis (M. ap) from different hosts and environments and two M. avium subspecies avium (M. avium) were selected (Table 1). Environmental isolates were obtained from the soil and utensils of dairy farms and provided by the National Veterinary Services Laboratories at Ames, IA. The identity of each strain was confirmed and genotyped as M. ap versus M. avium by PCR analysis of three genes (16S rRNA, IS1311, hsp65) and by growth phenotype in the presence/absence of mycobactin J, as outlined before. Each strain was cultured at 37˚C in Middlebrook 7H9 broth media supplemented with 10% ADC (2% glucose, 5% bovine serum albumin fraction V, and 0.85% NaCl) and 0.05% Tween 80. For M. ap isolates, 2 μg/ml of Mycobactin J was added.

GENOMIC DNA EXTRACTION
Five to 10 ml of bacterial cultures at mid-log phase were used for DNA extraction and isolation. Briefly, M. ap cultures were centrifuged at 10,000 rpm for 3 min and pellets were then resuspended in sterile Tris-EDTA buffer. Bacterial cells were killed at 80˚C for 20 min before adding lysozyme (10 μl of 100 mg/ml) for an overnight incubation at 37˚C. After cell lysis, 12 μl of 20 mg/ml proteinase K/pellet was added and incubated at 65˚C for 2-3 h. For DNA isolation, 100 μl of 5 M NaCl/pellet was added and incubated at 65˚C for 10 min, followed by adding 80 μl of CTAB/NaCl and incubating for another 10 min at 65˚C. Next, an equal volume of phenol-chloroform-isoamyl alcohol (25:24:1) was added to each tube, and the tubes were centrifuged at 10,000 rpm for 5 min at room temperature. The aqueous upper layers were collected and transferred to a fresh tube for washing with an equal volume of chloroform-isoamyl alcohol (24:1) and then with isopropanol, followed by incubation at −20˚C for at least 1 h. Genomic DNA was precipitated by centrifugation at 10,000 rpm for 15 min followed by washing in 75% ethanol. After washing, DNA pellets were dried in a Speed-Vac and resuspended in nuclease-free sterile water.

WHOLE-GENOME SEQUENCING
Purified genomic DNA (1-5 μg) samples isolated from each target strain were sent to the Genomic Resource Center (GRC) at the University of Maryland for Illumina GAIIx whole-genome sequencing with multiplexing, using the sample preparation oligonucleotide kit from Illumina. The integrity and concentration of the DNA were checked again by the GRC, followed by fragmentation of the DNA using nebulization. DNA ends were repaired, and A-tails and adaptors were added using Illumina protocols. DNA fragments of the desired size (around 200 bp) were selected and then amplified with adaptor-specific primers that contained four-nucleotide barcode tags. The amplified DNA libraries were analyzed on an Agilent Bioanalyzer to determine the size and concentration of the DNA fragments. DNA libraries were loaded onto an eight-channel flow cell. DNA fragments were denatured and hybridized to the oligonucleotides in the flow cells, followed by an amplification step to form clusters. The flow cells were then transferred to the Genome Analyzer II for sequencing. For paired-end sequence reads, the amplicons were flipped on the flow cells so the other end could be read, as described before (Quail et al., 2008). At the end of each run, four images were collected and used for base-calling.

SEQUENCE ASSEMBLY AND ALIGNMENTS
Raw sequences obtained from the Illumina GAIIx were analyzed using the CLC Genomic Workbench software (version 4.0.3, CLC Bio, Cambridge, MA, USA) to perform both de novo and comparative reference assembly. For the M. ap genomes, all sequences were assembled in reference to the revised M. ap K-10 sequences (Wynne et al., 2010). The M. avium DT 78 genome was assembled using the genome of M. avium subsp. hominissuis 104 (NCBI accession NC 008595) as a reference. The de novo assembly was used for the genome sequence of Env 77 strain because of the lack of significant similarity to other genomes. Additionally, the MAUVE algorithm was used to align paired or multiple genomes for comparative purposes, as outlined before (Perna et al., 1998;Darling et al., 2010). The gapped consensus sequence of each strain was imported to MAUVE for sequence alignment at default seed weight setting.

SINGLE NUCLEOTIDE POLYMORPHISMS ANALYSIS
For SNP detection, we used algorithms implemented in CLC Bio Workstation. Criteria for identifying SNPs included a coverage range of 10-55 reads and presence in at least 50% of the reads before consideration for further analysis. A randomly selected set of SNPs was further analyzed using Sanger sequencing to confirm the next-generation sequencing data. The primers were designed to cover 10 possible SNPs. The BigDye Terminator (Applied Biosystems, Foster City, CA, USA) version 3.1 cycle sequencing kit was used for sequencing. The sequencing PCR included an initial denaturation cycle at 95˚C for 5 min followed by 35 cycles of 95˚C for 20 s, 45˚C for 30 s, and 60˚C for 2 min, with a final extension at 72˚C for 7 min. All samples were sent to the Biotechnology Center at the University of Wisconsin-Madison for sequencing on an ABI 3730XL machine (Applied Biosystems). For the genome-wide phylogeny (phylo-genome analysis), the predicted SNPs from the sequenced genomes (M. ap isolates) and the corresponding nucleotides in DT 78, M. ap K-10, and M. avium 104 were tabulated to create a concatenated sequence for each strain. The genome of the M. avium Env 77 isolate was excluded from this analysis because of its low similarity to other genomes. The concatenated sequence of each strain was aligned using CLUSTALW, and phylogenetic trees were generated with MEGA version 5 using one of the following methods: maximum parsimony (MP), maximum likelihood (ML), maximum likelihood with a molecular clock assumption (MLK), and the Neighbor-joining algorithm, with bootstrapping of 1,000 replicates applied to all methods (Tamura et al., 2011).
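The SNP criteria and the tabulation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CLC Bio implementation: the `SnpCall` record, the function names, and the rule that a strain without a retained call at a site carries the reference base are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical record for one candidate SNP in one strain."""
    position: int   # 1-based coordinate on the reference genome
    ref_base: str   # base in the reference genome
    alt_base: str   # variant base seen in the reads
    coverage: int   # number of reads covering the position
    alt_reads: int  # number of reads supporting the variant base


# Thresholds taken from the text: coverage of 10-55 reads, variant in >= 50% of reads.
MIN_COVERAGE, MAX_COVERAGE, MIN_FREQUENCY = 10, 55, 0.50


def passes_filter(call: SnpCall) -> bool:
    """Return True when a call meets the coverage and frequency criteria."""
    if not MIN_COVERAGE <= call.coverage <= MAX_COVERAGE:
        return False
    return call.alt_reads / call.coverage >= MIN_FREQUENCY


def concatenate_snps(calls_by_strain: dict) -> dict:
    """Build one concatenated SNP string per strain.

    Sites are the union of all retained SNP positions, in genome order.
    Assumption: a strain with no retained call at a site has the reference base.
    """
    kept = {
        strain: {c.position: c for c in calls if passes_filter(c)}
        for strain, calls in calls_by_strain.items()
    }
    ref_at = {}
    for calls in kept.values():
        for pos, call in calls.items():
            ref_at[pos] = call.ref_base
    positions = sorted(ref_at)
    return {
        strain: "".join(
            calls[p].alt_base if p in calls else ref_at[p] for p in positions
        )
        for strain, calls in kept.items()
    }
```

The resulting strings have equal length by construction, so they can be written out as FASTA records and passed to an alignment or tree-building tool. Treating an uncalled site as reference is a simplification; a site with too little coverage would more safely be recorded as missing data.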

CHARACTERIZATION OF MYCOBACTERIAL ISOLATES
Several mycobacterial species belonging to the M. avium complex are present in animal surroundings, each with a different capacity to cause illness (e.g., M. ap, M. avium) and potential to spread to humans (Alvarez-Uria, 2010). Before initiating our genome analysis of members of the M. avium complex, we searched our collection of mycobacterial isolates originating from diverse hosts and diverse tissues, as well as from environmental samples of dairy herds that might help in spreading the infection. Our selection scheme identified eight isolates that were subjected to further genotyping protocols to confirm their identity. Based on acid-fast staining and amplification of the 16S rRNA gene using mycobacteria-specific primers (Talaat et al., 1997), all eight isolates were shown to belong to the genus Mycobacterium. Moreover, typing based on the hsp65 gene (Smole et al., 2002) confirmed the identity of two mycobacterial isolates, DT 78 and Env 77, as M. avium subspecies avium (M. avium), while the rest of the isolates were all M. ap. Identification of sheep or cattle types of M. ap was based on IS1311 amplification followed by HinfI digestion (data not shown). All six M. ap isolates were of bovine origin (M. ap type II). A compiled list of all mycobacterial isolates used in this study and their origin is shown in Table 1.

WHOLE-GENOME SEQUENCING OF MYCOBACTERIAL ISOLATES
The Illumina sequencer generated an average read length of 50 nucleotides with an average coverage of 42-68× of each sequenced genome after reference assembly. The number of reads, mapped reads, and the length of the consensus sequence are all listed in Table 2. The revised version of the M. ap K-10 sequence (Wynne et al., 2010) and M. avium subspecies hominissuis (M. avium 104) were used as references for comparative genome assembly of the target isolates. As expected, all examined M. ap genomes showed a high sequence identity (up to 99%) to the M. ap K-10 genome. Lack of sequence coverage in some parts of the genome could explain some of the differences from the reference genome. Despite the presence of small deleted regions among M. ap genomes, only 2 gaps >1 kb were seen among M. ap genomes, including the one isolated from a human (M. ap 4B isolate), suggesting a high level of similarity to the M. ap K-10 strain isolated from cattle. On the other hand, the M. avium DT 78 strain had only 87% sequence identity to the M. avium 104 genome while it had a higher similarity (93%) to the M. ap K-10 genome, despite its established genotype as an M. avium isolate. In the DT 78 genome, more gaps were present whether M. avium 104 or M. ap K-10 was used for reference alignment (Figure 1). The average gap size in this genome is ∼4 kb.
Among the sequenced genomes, the genome of M. avium Env 77 provided a significant challenge because of its low level of similarity to the M. avium 104 genome during the reference assembly phase. Accordingly, we employed an algorithm for de novo assembly that generated 772 contigs. These contigs were used as queries in a MegaBLAST search against the Mycobacteria genome database (blast.ncbi.nlm.nih.gov). The coverage of each contig is at least 20× and the average coverage of all contigs is around 30× for this strain. In fact, the Env 77 genome was sequenced twice with a similar result for each sequencing run (data not shown). Interestingly, BLAST analysis showed that only a third of the Env 77 genome had sequence similarity to the genomes of either M. ap K-10 or M. avium 104, and to a lesser degree to other sequenced mycobacterial genomes, suggesting a mosaic genome structure (Figure 2). Detailed BLAST analysis showed that the Env 77 draft genome shared common conserved genes, mainly with four Mycobacterium species, including genes for ribosomal proteins, DNA polymerase, proteinase Clp, cell division protein Fts, and some transcription or translation regulatory factors. As indicated in Figure 2, the genome of M. avium Env 77 has higher similarity to M. avium 104 and M. ap K-10 than to other mycobacterial species. Overall, the sequenced genomes from all strains, except Env 77, mapped to the reference genomes with a significantly high level of similarity. All sequenced genomes were deposited in the GenBank database for download and further analysis. The accession numbers for the deposited sequences are listed in Table A1 in Appendix.
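As a rough check on the coverage figures quoted above, mean depth can be estimated from the number of mapped reads, the read length, and the genome length. The sketch below is a back-of-the-envelope calculation only. The function names are hypothetical, the genome length of about 4.83 Mb for M. ap K-10 is an approximate value assumed for the example (it is not stated in this section), and the read count is invented for illustration rather than taken from Table 2.

```python
import math


def mean_coverage(mapped_reads: int, read_length: int, genome_length: int) -> float:
    """Estimate mean depth as total mapped bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return mapped_reads * read_length / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest number of reads of a given length that reaches a target mean depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_length / read_length)


if __name__ == "__main__":
    GENOME_LENGTH = 4_830_000  # assumed, approximate M. ap K-10 genome size in bp
    READ_LENGTH = 50           # average read length reported in the text
    HYPOTHETICAL_READS = 5_000_000  # illustrative value only
    print(mean_coverage(HYPOTHETICAL_READS, READ_LENGTH, GENOME_LENGTH))
    print(reads_needed(42, READ_LENGTH, GENOME_LENGTH))
```

This estimate assumes every mapped read contributes its full length and that depth is uniform, so it will overstate coverage in regions such as high-GC stretches where few reads map.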

GENOMIC REARRANGEMENTS AMONG M. AP ISOLATES
A major goal of our investigation was to delineate events of insertions and deletions among mycobacterial genomes to better understand their evolutionary relationships. To identify large scale events of insertions/deletions (Indels), we compared the genomes of the six M. ap isolates to the M. ap K-10 reference genome (Figure 3). Among the potential Indels that could exist among these genomes, we identified only gaps that are <1 kb. A common gap of ∼300 bp, located at reference position 3,767,550-3,767,870 within the MAPK 3350 gene encoding a hypothetical protein, was seen in all six strains. In this region, low or zero read coverage was observed in all six strains, suggesting a problematic region for the Illumina sequencer. The sequence in this gap region has a high GC content (82%) but no repetitive elements. Based on the MAUVE comparison, the consensus sequences of these six strains closely matched the M. ap K-10 genome and no inversions were observed (Figure 3). On the other hand, when MAUVE was used to compare the genomes of the M. ap isolates to the M. avium 104 or M. avium DT 78 genomes, about seven large regions of Indels were identified, confirming earlier findings by our group when DNA microarrays were used. For example, one 11 kb Indel was found in all six M. ap strains at position 2,318,400-2,333,740 (MAPK 2038-MAPK 2050) but was absent from M. avium. This 11 kb region encodes mostly hypothetical proteins in the M. ap K-10 genome with two exceptions, MAPK 2040 and MAPK 2050. MAPK 2040 is a predicted hydrolase, and an earlier analysis (Santema et al., 2009) also showed that this gene is absent from M. avium 104 but present in another M. avium strain (Table 3). In addition, a total of six genomic inversions spanning ∼2.4 Mb were identified among all M. ap strains when compared to the M. avium 104 genome, similar to our earlier analysis of only the M. ap K-10 and M. avium 104 genomes.
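Gaps of the kind described above, where read coverage drops to zero or near zero, can be located from a per-base depth profile, and the GC content of the corresponding reference sequence can then be checked. The following Python sketch shows one simple way to do both. It is an illustrative assumption, not the procedure used with MAUVE or CLC in this study, and the function names and the `max_depth` parameter are hypothetical.

```python
def find_gaps(coverage, min_length=1, max_depth=0):
    """Return (start, end) intervals, 1-based and inclusive, where depth <= max_depth.

    coverage   : sequence of per-base read depths along the reference
    min_length : shortest run of low-coverage bases reported as a gap
    max_depth  : depth at or below which a base counts as uncovered
    """
    gaps = []
    start = None
    for i, depth in enumerate(coverage, start=1):
        if depth <= max_depth:
            if start is None:
                start = i          # a low-coverage run begins here
        elif start is not None:
            if i - start >= min_length:
                gaps.append((start, i - 1))
            start = None           # the run ended at the previous base
    # Close a run that extends to the end of the reference.
    if start is not None and len(coverage) - start + 1 >= min_length:
        gaps.append((start, len(coverage)))
    return gaps


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T) in a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)
```

Raising `max_depth` above zero lets the same function flag low-coverage stretches as well as fully uncovered ones, which matches the "low or zero read coverage" wording of the text.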

SNPS AMONG M. AP ISOLATES
To better analyze genomic diversity among M. ap isolates, we also examined genomic variations on the nucleotide level. For SNP analysis, we set stringent criteria for SNP detection (see Materials and Methods). The total number of SNPs among the six M. ap genomes ranged from 56 to 131 (Figure 4), among which 17 were found in >1 genome (Table 4). The number of non-synonymous SNPs (nSNPs) was slightly higher than that of synonymous SNPs (sSNPs), suggesting a positive selective pressure on the identified genes. In addition, most genes harbored one SNP, with the exception of 23 genes that contained two or three SNPs (Table A2 in Appendix). Interestingly, GlnE and MAPK 4304 contained three SNPs each, all of them nSNPs, suggesting a high selective pressure on these two genes. The majority of genes that contained >1 SNP are larger than 1 kb in size, with an average SNP density of 1 SNP per 1.44 kb. The remaining 232 genes that harbored only one SNP represented a similar SNP density of one SNP per 1.44 kb, in line with that identified in another mycobacterium (Qi et al., 2009). For M. ap JTC 1281 and M. ap 4B, the percentages of nSNPs were 52.68 and 51.76%, respectively, while in the rest of the M. ap strains >60% of SNPs were nSNPs. Interestingly, genes encoding the Cytochrome P450 proteins harbored a high number of alleles in three of the six examined genomes (Table 5), similar to the same family of genes in M. tuberculosis (Cole, 1999). Intergenic SNPs were identified and accounted for <10% of total SNPs. Generally, a modest number of SNPs were detected among genomes of M. ap isolates, unlike M. avium isolates. The M. avium DT 78 genome had a significantly high number of SNPs (6,278 SNPs) when compared to the standard M. avium 104 genome, suggesting an earlier separation of this strain during its evolutionary pathway. In addition, >75% of the identified SNPs were synonymous, an indication of a higher stabilizing selective pressure on M. avium genes than on those of M. ap. For M. avium Env 77, SNP detection was not performed because the whole sequence aligned poorly with either M. ap K-10 or M. avium 104. Finally, 10 SNPs were randomly chosen for further confirmation using the Sanger sequencing method. The 10 SNPs were chosen based on the ATCC 19698 genome. The same 10 SNPs were also found in JTC 1281, while only 5 common SNPs were found in JTC 1285. All amplicons were sequenced from both forward and reverse strands (Table A3 in Appendix). Three SNPs were not detected in JTC 1285 based on the Sanger results, which is most likely caused by Illumina sequencer error. Overall, Illumina sequencing was very beneficial in resolving single nucleotide polymorphisms in all examined genomes.
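The distinction between synonymous and non-synonymous SNPs used throughout this section comes down to whether a base change alters the encoded amino acid. A minimal Python sketch of that classification is given below. It assumes the standard codon assignments (which bacterial translation table 11 shares for all non-start codons), takes the affected codon and the offset of the SNP within it as inputs, and uses hypothetical function names; it is not the annotation pipeline used in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon: str, offset: int, alt_base: str) -> str:
    """Classify a coding SNP as 'synonymous' or 'non-synonymous'.

    codon    : reference codon on the coding strand, e.g. 'GAA'
    offset   : 0-based position of the SNP inside the codon (0, 1, or 2)
    alt_base : variant base on the coding strand
    """
    codon, alt_base = codon.upper(), alt_base.upper()
    if len(codon) != 3 or offset not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected a 3-base codon, offset 0-2, and a base in TCAG")
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "non-synonymous"


def percent_nonsynonymous(labels) -> float:
    """Percentage of SNP labels that are 'non-synonymous'."""
    labels = list(labels)
    if not labels:
        return 0.0
    return 100.0 * sum(label == "non-synonymous" for label in labels) / len(labels)
```

For example, a change from GAA to GAG keeps glutamate and is synonymous, whereas GAA to GTA replaces glutamate with valine and is non-synonymous. Applying `percent_nonsynonymous` to the labels for one genome yields the kind of per-strain nSNP percentage reported in the text.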

PHYLO-GENOMIC RELATIONSHIP AMONG M. AP ISOLATES
Single nucleotide polymorphisms of six M. ap strains were concatenated and used for phylogenetic analysis on a genome-wide (phylo-genome) level. The two reference strains, M. ap K-10 and M. avium 104, were included in the analysis. A total of 301 SNPs present among the six M. ap strains as well as in the M. avium 104 and M. avium DT 78 genomes were included in this analysis using the Neighbor-joining method (Tamura et al., 2011). The un-rooted tree showed a strong discriminatory power of SNPs for all examined isolates based on their origin (Figure 5A). Such discriminatory power was not possible when single-gene genotypes were tried (see above). Nonetheless, when the tree was rooted to the M. avium 104 genome, two distinct major branches within the M. ap genomes were easily discerned (Figure 5B). In one branch within M. ap genomes (Figure 5B), an isolate from red deer (M. ap DT 3) was closely related to the standard cattle strains (M. ap K-10 and ATCC 19698). On the other hand, isolates from goat and oryx (M. ap JTC 1281 and JTC 1285, respectively) were more closely related to the recently isolated cattle type strain (M. ap K-10) than to the other laboratory strain (ATCC 19698), suggesting a cattle source of infection. In the other branch of the tree, the M. ap 4B and M. ap Env 210 isolates, from a human and a dairy farm, respectively, were closely related to each other. It is noteworthy that the association of the M. avium DT 78 genome with the M. avium 104 strain in the phylo-genomic analysis confirmed our earlier identification of this isolate as belonging to the M. avium group, despite its overall higher similarity to M. ap K-10. Finally, when we tried three additional methods for tree construction (MP, ML, MLK) on independent lists of sSNPs and nSNPs, a congruent topology was obtained for all trees with high bootstrap support, similar to the one shown in Figure 5B.
The Log Likelihood Ratio test for the MLK consensus tree against the ML tree indicated that the molecular clock assumption was not valid (p < 0.007). Overall, the identified tree topology suggests M. avium 104 as a common ancestor from which M. ap likely emerged and diversified into two lineages: one lineage that clustered Env 210 with M. ap 4B (Human), and a second that clustered all type II strains of M. ap. In both lineages, infected cows are the most likely reservoir for spreading the type II M. ap strains.
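The likelihood ratio test mentioned above compares the log-likelihood of the tree estimated without a clock against the one estimated under a clock. A common formulation, assumed here, takes twice the difference in log-likelihood as the test statistic and compares it to a chi-square distribution with n − 2 degrees of freedom for n sequences. The Python sketch below implements that calculation with the standard library only. The function names are hypothetical, and any log-likelihood values passed in are the user's own; none are reported in this section.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df.

    Uses the recurrence Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) or Q(2) = exp(-x/2).
    """
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    if x == 0:
        return 1.0
    half = x / 2.0
    if df % 2:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def clock_lrt(lnl_no_clock: float, lnl_clock: float, n_taxa: int):
    """Likelihood ratio test of the molecular clock.

    Returns (statistic, degrees of freedom, p-value). A small p-value rejects the clock.
    Assumes df = n_taxa - 2, the difference in free branch-length parameters.
    """
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed")
    statistic = 2.0 * (lnl_no_clock - lnl_clock)
    df = n_taxa - 2
    # A negative statistic can arise from optimization noise; treat it as zero.
    return statistic, df, chi2_sf(max(statistic, 0.0), df)
```

With this formulation, a p-value below the chosen threshold, as in the result reported above, means the clock-constrained tree fits significantly worse, so rates of change are unlikely to be equal across lineages.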

DISCUSSION
Understanding the genome-wide variations among pathogenic Mycobacteria will improve our understanding of the pathogenesis and evolution of these important pathogens. Recently, next-generation sequencing technologies provided us the opportunity to examine whole-genome variations much faster than traditional sequencing or DNA-microarray technologies. In this study, M. ap isolates were chosen from diverse hosts, sources, and locations to better assess the impact of these variations on pathogen genome composition. As expected, the six M. ap genomes sequenced in this study shared ∼99% sequence similarity to the M. ap K-10 reference genome, with a modest number of SNPs (∼100), suggesting a stabilizing selective pressure. On the contrary, isolates of M. avium origin showed more diversity. For example, the M. avium DT 78 genome had a significant number of gaps and a large number of SNPs (∼6,000) compared to M. avium 104, despite its significant similarity to M. ap K-10 (>90%) on a whole-genome level. This isolate is likely to represent an intermediate strain between M. avium 104 and M. ap K-10. Generally, M. avium replicate faster than M. ap and survive in more diverse environments; those factors are likely to contribute to adaptive polymorphism. Previous analyses showed that M. avium has more diversity than M. ap strains (Turenne et al., 2006; Wu et al., 2006). Additionally, the BLAST search of the M. avium Env 77 genome indicated its complex and mosaic structure, another indication of diversity among M. avium isolates. Although the standard genotyping protocols used here (based on hsp65 and IS1311) clearly typed the DT 78 and Env 77 isolates as M. avium, our genome sequencing approach questions the validity of genotyping Mycobacteria based on a single gene or a few genes and advocates for a whole-genome based approach.
Because of the close relatedness of M. ap genomes, SNPs from each strain provided valuable information on the divergence and evolutionary processes that control members of MAC. A wide range of studies used SNPs for studying drug resistance mutations in organisms (Xu et al., 2008), analysis of genomic evolution (Filliol et al., 2006), and association of M. ap infection with Crohn's disease (Wynne et al., 2011). In this study, SNPs were used for genome-wide typing of isolates to understand the dynamics of Johne's disease transmission. Examining the modest number of SNPs detected in M. ap identified a higher percentage of nSNPs in all six M. ap isolates, suggesting a close relatedness among strains (Gutacker et al., 2002; Holden et al., 2004; Rocha et al., 2006). This close relationship among M. ap isolates could indicate a "spillover" infection from cattle to other animals (in this study, red deer). However, the observed higher percentage of nSNPs could indicate adaptive evolution of M. ap to different hosts under positive selective pressure. A significant number of nSNPs were located in genes that encode hypothetical proteins, while others were in genes that encode proteins with enzymatic functions, some of them involved in metabolism and energy pathways, such as Pks proteins and NuoL protein. Interestingly, a SNP in the glnE gene was identified in all six M. ap genomes, and additional SNPs in this gene were observed in ATCC 19698 and M. ap 4B separately. In M. tuberculosis, GlnE is an adenylyl transferase that modulates glutamine synthetase activity and is essential for bacterial growth under alternative nitrogen sources (Carroll et al., 2008). SNPs within this gene could be an indication of a common evolutionary ancestor with environmental isolates.
Similarly, SNPs were found in cytochrome P450 enzymes (Table 5), which catalyze mixed oxidation of hydrophobic compounds and are associated with free-living saprophytes (Arnold, 2007), another indication of a common environmental ancestor for M. ap. Finally, genes encoding cytochrome P450 were shown to play a role in the persistence of M. tuberculosis in tissues (McLean et al., 2010). SNPs found in the M. ap counterparts could potentially contribute to M. ap-host interactions. An interesting outcome of the phylo-genomic analysis presented here is the further support it provides for the hypothesis of a common origin of both subspecies of the M. avium complex, namely, M. ap and M. avium subsp. avium. This hypothesis was supported before based on large genomic regions of insertions/deletions (Alexander et al., 2009). This study provides further support for this hypothesis using SNPs on a genome-wide level. The whole-genome approach we employed here allowed us to explore the diversity among MAC isolates from different hosts and variable locations. It also provided more clues regarding the dynamics of mycobacterial transmission among animals. Sequencing the genomes of more isolates will enrich our understanding of the genome content and evolution of both environmental and pathogenic strains of mycobacteria and will eventually provide a comprehensive population genetic structure.

Figure 5 (B). A rooted Neighbor-joining tree using the M. ah 104 genome as outgroup. The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary history of the taxa analyzed. Branches with less than 50% bootstrap support were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test is shown next to the branches.

Frontiers in Microbiology | Cellular and Infection Microbiology
Such a knowledge base could elucidate the relationship between strains and hosts or particular environmental cues. In addition, sequencing more diverse isolates will help to evaluate the dynamics of disease transmission among animals or from animals to humans. Developing algorithms that can utilize the information gained from next-generation sequencers will improve phylo-genomic analysis and is greatly needed to advance our understanding of microbe-host interactions.