AUTHOR=Balamurugan Muthukumar, Banerjee Ruma, Kasibhatla Sunitha Manjari, Achalere Archana, Joshi Rajendra TITLE=Understanding the Genetic Diversity of Mycobacterium africanum Using Phylogenetics and Population Genomics Approaches JOURNAL=Frontiers in Genetics VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.800083 DOI=10.3389/fgene.2022.800083 ISSN=1664-8021 ABSTRACT=Mycobacterium africanum (Maf), a member of the Mycobacterium tuberculosis complex (MTBC), causes tuberculosis in West Africa. It comprises two lineages, L5 and L6, which cause tuberculosis in the eastern and western regions of West Africa, respectively. Regions of difference (RDs) are commonly used to delineate lineages within the MTBC; however, with growing data availability, single nucleotide polymorphisms (SNPs) may provide finer resolution. To understand the genetic diversity of Maf, this study used population genomics and phylogenetic approaches. Three sub-lineages were found in L6 (L6.1, L6.2 and L6.3), whereas L5 split into L5.1 and L5.2 based on the presence or absence of RD711. Using model-based and de novo approaches, L5.1 and L5.2 were further divided into two (L5.1.1 and L5.1.2) and four (L5.2.1, L5.2.2, L5.2.3 and L5.2.4) sub-clusters, respectively. Thirty-one isolates that initially remained unidentified by the RD-based tool could be assigned to definite lineages/sub-lineages based on their clustering in the phylogenetic tree, de novo methods, and high-confidence posterior membership scores obtained from population stratification using genome-wide SNPs. Synonymous SNPs specific to L5 (137) and L6 (128) sub-clusters were identified as biomarkers and used for further validation. Cluster-specific missense variants identified in this study map to genes reported in existing growth-attenuation studies, which may further explain the impaired fitness of these lineages and their adaptation to specific ecological niches. 
Missense mutations in genes of the central carbohydrate metabolism pathway, namely His6Tyr (Rv0946c), Glu255Ala (Rv1131), Ala309Gly (Rv2454c), Val425Ala and Ser112Ala (Rv1127c) and Gly198Ala (Rv3293) in L5, and Ile137Val (Rv0363c), Thr421Ala (Rv0896), Arg442His (Rv1248c), Thr218Ile (Rv1122) and Ser381Leu (Rv1449c) in L6, may explain the differences in growth attenuation between L5 and L6. Genes harboring multiple (sub)lineage-specific “core-cluster” SNPs hint at an association of these SNPs with selective advantage or (host) adaptation. For example, Rv0066c (icd2) carries three missense variants, Lys117Asn, Val447Met and Ala455Val, present in all isolates of L6, L6.1 and L5, respectively. Such cluster-specific SNPs could therefore serve as additional markers alongside RD regions to improve the characterization of Maf isolates, and may provide further insight into genotype-phenotype correlations and clues to the endemicity of Maf in Africa.
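The notion of (sub)lineage-specific "core-cluster" SNPs can be illustrated with a minimal Python sketch. It assumes that a SNP qualifies for a cluster when every isolate in that cluster carries it and no isolate outside the cluster does, and that nested labels such as L6.1 count as members of L6. The function names `members` and `cluster_specific_snps`, and the toy isolates and SNP IDs, are hypothetical; this is not the pipeline used in the study.

```python
from typing import Dict, Iterable, Set


def members(labels: Dict[str, str], cluster: str) -> Set[str]:
    """Isolates whose label is `cluster` or nested under it (hypothetical
    convention: "L6.1" belongs to "L6", but "L60" does not)."""
    return {iso for iso, lab in labels.items()
            if lab == cluster or lab.startswith(cluster + ".")}


def cluster_specific_snps(genotypes: Dict[str, Set[str]],
                          in_group: Iterable[str]) -> Set[str]:
    """SNPs carried by every isolate in `in_group` and by no isolate outside it
    (assumed definition of a cluster-specific "core-cluster" SNP)."""
    in_ids = set(in_group)
    unknown = in_ids - genotypes.keys()
    if unknown:
        raise KeyError(f"isolates without genotypes: {sorted(unknown)}")
    if not in_ids:
        return set()
    out_ids = genotypes.keys() - in_ids
    # Core: SNPs shared by all isolates of the cluster.
    core = set.intersection(*(genotypes[i] for i in in_ids))
    # Specific: discard any core SNP that also occurs outside the cluster.
    seen_outside = set().union(*(genotypes[i] for i in out_ids))
    return core - seen_outside


# Hypothetical toy data: isolate -> (sub)lineage label, isolate -> SNP IDs.
TOY_LABELS = {"iso1": "L5.1", "iso2": "L5.2",
              "iso3": "L6.1", "iso4": "L6.1", "iso5": "L6.2"}
TOY_GENOTYPES = {
    "iso1": {"s1", "s2"},
    "iso2": {"s1", "s3"},
    "iso3": {"s4", "s5", "s6"},
    "iso4": {"s4", "s5"},
    "iso5": {"s4", "s7"},
}

if __name__ == "__main__":
    for cluster in ("L6", "L6.1", "L5"):
        snps = cluster_specific_snps(TOY_GENOTYPES, members(TOY_LABELS, cluster))
        print(cluster, sorted(snps))
    # Prints: L6 ['s4'], then L6.1 ['s5'], then L5 ['s1']
```

The toy output mirrors the icd2 pattern described above: one SNP marks the whole lineage (L6), another only a sub-lineage (L6.1), and a third the other lineage (L5). Using set intersection and union keeps the rule explicit; real genome-wide data would need handling of missing calls and heterozygous or low-coverage sites, which this sketch ignores.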