
Original Research Article. Provisionally accepted; the full text will be published soon.

Front. Genet. | doi: 10.3389/fgene.2019.01091

Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean

  • 1 University of Missouri, United States
  • 2 Computer Science, University of Missouri, United States

Genomic selection uses single-nucleotide polymorphisms (SNPs) to predict quantitative phenotypes for trait improvement in breeding populations, and it has been widely used to increase breeding efficiency in plants and animals. Existing statistical methods rely on a prior distributional assumption about the effects of imputed genotypes, which may not fit experimental datasets. Deep learning, an emerging machine-learning approach, could serve as a powerful tool to predict quantitative phenotypes without imputation and to discover potentially associated genotype markers efficiently. We propose a deep-learning framework that uses convolutional neural networks to predict quantitative traits from SNPs and uses saliency maps to investigate how genotypes contribute to the trait. Missing SNP values are treated as an additional genotype in the input to the deep-learning model. We tested the framework on simulated data and on experimental soybean datasets. The results show that the deep-learning model can bypass the imputation of missing values and predicts quantitative phenotypes more accurately than well-known statistical methods. It can also effectively and efficiently identify significant SNP markers and SNP combinations associated with the trait in a genome-wide association study.
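The abstract names two mechanisms: missing SNP calls are fed to the network as one more genotype instead of being imputed, and saliency maps attribute a prediction to individual SNPs. The sketch below illustrates both under assumptions that are not taken from the paper. It assumes a 0/1/2 genotype coding with -1 for a missing call, a four-channel one-hot encoding, and a single-filter toy CNN with made-up weights. It also assumes a gradient saliency that is reduced to one score per SNP by taking the maximum absolute gradient across channels, a common image-saliency convention. The authors' actual architecture, encoding, and reduction may differ. Plain Python is used so that the hand-written backpropagation stays visible.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Hypothetical genotype coding (an assumption, not from the paper):
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate,
# -1 = missing call. The missing call gets its own one-hot channel, so it
# is treated as a fourth "genotype" instead of being imputed.
MISSING = -1
CATEGORIES = (0, 1, 2, MISSING)


def one_hot_snps(genotypes: Sequence[int]) -> Matrix:
    """Encode one sample's SNP calls as an (n_snps x 4) one-hot matrix."""
    encoded = []
    for g in genotypes:
        if g not in CATEGORIES:
            raise ValueError(f"unknown genotype code: {g}")
        encoded.append([1.0 if g == code else 0.0 for code in CATEGORIES])
    return encoded


def forward(x: Matrix, kernels, bias, weights) -> Tuple[float, Matrix]:
    """Toy CNN: valid 1D convolution along the SNP axis, ReLU, then a dense
    layer to one scalar phenotype. kernels[f][j][c] is filter f, offset j,
    channel c; weights[i][f] is the dense weight for position i, filter f.
    Returns the prediction and the pre-activation map z[i][f]."""
    width, n_ch = len(kernels[0]), len(x[0])
    z = [[bias[f] + sum(x[i + j][c] * kernels[f][j][c]
                        for j in range(width) for c in range(n_ch))
          for f in range(len(kernels))]
         for i in range(len(x) - width + 1)]
    y = sum(max(zf, 0.0) * weights[i][f]
            for i, row in enumerate(z) for f, zf in enumerate(row))
    return y, z


def input_gradient(x: Matrix, kernels, bias, weights) -> Matrix:
    """d(prediction)/d(x[p][c]), computed by manual backpropagation."""
    _, z = forward(x, kernels, bias, weights)
    width, n_ch = len(kernels[0]), len(x[0])
    grad = [[0.0] * n_ch for _ in x]
    for i, row in enumerate(z):
        for f, zf in enumerate(row):
            dz = weights[i][f] if zf > 0 else 0.0  # ReLU derivative
            for j in range(width):
                for c in range(n_ch):
                    grad[i + j][c] += dz * kernels[f][j][c]
    return grad


def saliency(x: Matrix, kernels, bias, weights) -> List[float]:
    """One score per SNP: the max absolute gradient across its channels."""
    return [max(abs(g) for g in per_snp)
            for per_snp in input_gradient(x, kernels, bias, weights)]


# Illustrative, made-up parameters: one filter of width 2 over 5 SNPs.
x = one_hot_snps([0, 2, MISSING, 1, 2])
kernels = [[[0.5, -0.2, 0.1, 0.0], [0.3, 0.4, -0.6, 0.2]]]
bias = [0.2]
weights = [[1.0], [-0.5], [0.8], [0.3]]  # 4 output positions x 1 filter
y, _ = forward(x, kernels, bias, weights)
print(f"predicted phenotype: {y:.3f}")
print("saliency per SNP:", [round(s, 3) for s in saliency(x, kernels, bias, weights)])
```

With these toy weights, the last SNP gets a saliency of zero. The only convolution window that covers it is inactive after the ReLU, so a gradient saliency credits only SNPs that fall inside windows the network actually uses for that sample.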

Keywords: genome-wide association study (GWAS), convolutional neural network (CNN), phenotype, soybean, saliency map, biomarker

Received: 22 Jul 2019; Accepted: 09 Oct 2019.

Copyright: © 2019 Liu, Wang, He, Wang, Joshi and Xu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Prof. Dong Xu, University of Missouri, Computer Science, Columbia, 65211, MO, United States, xudong@missouri.edu