Original Research Article. Provisionally accepted; the full text will be published soon.

Front. Genet. | doi: 10.3389/fgene.2019.00394

Population levels assessment of the distribution of disease associated variants with emphasis on Armenians - a machine learning approach

  • 1Institute of Biomedicine and Pharmacy, Russian-Armenian (Slavonic) University, Armenia
  • 2Bioinformatics Group, Institute of Molecular Biology (NAS RA), Armenia
  • 3Laboratory of Ethnogenomics, Institute of Molecular Biology (NAS RA), Armenia
  • 4Interdisciplinary Centre for Bioinformatics, Leipzig University, Germany

Background: Over the last decades, genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) associated with different complex diseases. However, associations reported in one population often conflict with, or fail to replicate in, other populations. One reason could be that most GWAS employ a case-control design in one or a limited number of populations, while little attention has been paid to the global distribution of disease-associated alleles across different populations. Moreover, the majority of GWAS have been performed on selected European, African and Chinese populations, and a considerable number of populations remain understudied.
Aims: We investigated the global distribution of the disease-associated SNPs discovered so far across worldwide populations of different ancestry and geographical regions, with a special focus on the understudied population of Armenians.
Data and Methods: We used genotyping data from the Human Genome Diversity Project and from the Armenian population and combined them with disease-associated SNP data taken from public repositories, leading to a final dataset of 44,234 markers. Their frequency distribution across 1,039 individuals from 53 populations was analyzed using Self-Organizing Map (SOM) machine learning. Our SOM portrayal approach reduces data dimensionality, clusters SNPs with similar frequency profiles, and provides two-dimensional data images that enable visual evaluation of disease-associated SNP landscapes among human populations.
Results: We find that populations from Africa, Oceania and America show specific patterns of minor allele frequencies of disease-associated SNPs, while populations from Europe, the Middle East, Central South Asia and Armenia mostly share similar patterns. Importantly, different sets of SNPs associate with common polygenic diseases, such as cancer, diabetes and neurodegeneration, in populations from different geographic regions. Armenians are characterized by a set of SNPs that is distinct from those of other populations from neighboring geographical regions.
Conclusions: Genetic associations of diseases vary considerably across populations, which necessitates health-related genotyping efforts, especially for populations that are so far understudied. SOM portrayal represents a promising novel method in population genetic research, with a special strength in visualization-based comparison of SNP data.
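
As a rough illustration of the idea summarized under Data and Methods, the sketch below computes per-population allele frequencies from a genotype matrix and trains a small self-organizing map on the resulting SNP frequency profiles. It is a minimal NumPy sketch under stated assumptions, not the authors' pipeline: the function names, the 0/1/2 genotype coding, the grid size, and the learning-rate and neighborhood schedules are all hypothetical choices made for illustration, and the data are synthetic.

```python
import numpy as np


def allele_frequencies(genotypes, populations):
    """Per-population allele frequencies.

    genotypes: (n_individuals, n_snps) array of allele counts coded 0/1/2
               (assumed coding, for illustration only).
    populations: length n_individuals array of population labels.
    Returns (labels, freq) where freq has shape (n_snps, n_populations).
    """
    labels = np.unique(populations)
    freq = np.stack(
        [genotypes[populations == p].mean(axis=0) / 2.0 for p in labels],
        axis=1,
    )
    return labels, freq


def train_som(profiles, grid=(6, 6), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a basic online SOM; each row of `profiles` is one SNP."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Fixed 2-D coordinates of the map units.
    coords = np.array(
        [(r, c) for r in range(rows) for c in range(cols)], dtype=float
    )
    weights = rng.random((rows * cols, profiles.shape[1]))
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5       # shrinking neighborhood
        x = profiles[rng.integers(len(profiles))]  # one random SNP profile
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian neighborhood
        weights += lr * h[:, None] * (x - weights)
    return weights


def assign_units(profiles, weights):
    """Index of the best-matching map unit for every SNP profile."""
    d = ((profiles[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


# Synthetic toy data: 40 individuals, 200 SNPs, 4 made-up populations.
rng = np.random.default_rng(1)
toy_genotypes = rng.integers(0, 3, size=(40, 200))
toy_populations = np.repeat(np.array(["P1", "P2", "P3", "P4"]), 10)

pop_labels, snp_profiles = allele_frequencies(toy_genotypes, toy_populations)
som_weights = train_som(snp_profiles, grid=(6, 6))
snp_units = assign_units(snp_profiles, som_weights)

# One two-dimensional "portrait" per population: the map weights for that
# population's column, reshaped to the grid.
portraits = som_weights.T.reshape(len(pop_labels), 6, 6)
```

In this sketch, SNPs with similar frequency profiles tend to land in the same or neighboring map units, and each population's portrait is a small image that can be compared visually with the others. Real data would also need quality filtering and a consistent definition of the counted allele across populations, which the sketch leaves out.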

Keywords: Complex Diseases, Genetic risk alleles, small populations, genome wide association study, machine learning, Self-organizing maps, population-level disease variant distribution, Single nucleotide polymorphisms

Received: 14 Dec 2018; Accepted: 11 Apr 2019.

Edited by:

Zané Lombard, University of the Witwatersrand, South Africa

Reviewed by:

Ruzong Fan, Georgetown University Medical Center, United States
Teri Manolio, National Human Genome Research Institute (NHGRI), United States  

Copyright: © 2019 Nikoghosyan, Hakobyan, Hovhannisyan, Löffler-Wirth, Binder and Arakelyan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Arsen Arakelyan, Russian-Armenian (Slavonic) University, Institute of Biomedicine and Pharmacy, Yerevan, 0051, Armenia,