Methods ARTICLE. Provisionally accepted. The full text will be published soon.

Front. Genet. | doi: 10.3389/fgene.2019.01046

Complement genome annotation lift over using a weighted sequence alignment strategy

Baoxing Song¹,²,³, Qing Sang², Huimin Pei¹, Xiangchao Gan²* and Fen Wang¹*
  • ¹Qiannan Normal College For Nationalities, China
  • ²Max Planck Institute for Plant Breeding Research, Germany
  • ³Cornell University, United States

With the broad application of high-throughput sequencing, more whole-genome resequencing data and de novo assemblies of natural populations are becoming available. For a given species, generally only the reference genome is well established and annotated. Computational tools based on sequence alignment have been developed to investigate the gene models of individuals belonging to the same or closely related species. During this process, inconsistent alignment often obscures genome annotation lift over and leads to improper functional impact prediction for a genomic variant, especially in plant species. Here, we propose the zebraic striped dynamic programming (ZSDP) algorithm, which assigns different weights to genetic features to refine genome annotation lift over. Tests of the ZSDP algorithm on both plant and animal genomic data show that it complements the standard sequence alignment approach for highly diverse individuals. Using the lifted-over genome annotation as anchors, we implemented a base-pair-resolution genome-wide sequence alignment and variant calling pipeline for de novo assemblies in the GEAN software. GEAN can be used to compare haplotype diversity, refine the functional annotation of genetic variants, annotate de novo assembled genome sequences, detect homologous syntenic blocks, improve the quantification of gene expression levels from RNA-seq data, and unify genomic variants for population genetic analysis. We expect that GEAN will become a standard tool for the coming age of de novo assembly population genetics.
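
To make the idea of feature-weighted alignment concrete, the following is a minimal sketch and not the ZSDP algorithm or any part of GEAN. It is a plain Needleman-Wunsch global alignment in which every reference position carries a weight, so that matches, mismatches, and deletions at annotated features (for example a splice site) count more than elsewhere. The function name `weighted_global_align`, the scoring parameters, and the weight values are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def weighted_global_align(
    ref: str,
    query: str,
    weights: List[float],
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -2.0,
) -> Tuple[float, str, str]:
    """Global alignment with per-reference-position weights (illustrative only).

    weights[i] scales the match/mismatch score and the deletion penalty at
    reference position i. Insertions relative to the reference use the
    unweighted gap penalty (a simplifying assumption of this sketch).
    """
    if len(weights) != len(ref):
        raise ValueError("weights must have one entry per reference base")

    n, m = len(ref), len(query)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]

    # First column: reference bases deleted; first row: query bases inserted.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap * weights[i - 1]
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap

    for i in range(1, n + 1):
        w = weights[i - 1]
        for j in range(1, m + 1):
            sub = (match if ref[i - 1] == query[j - 1] else mismatch) * w
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # aligned pair
                score[i - 1][j] + gap * w,  # deletion of a reference base
                score[i][j - 1] + gap,      # insertion in the query
            )

    # Traceback, preferring aligned pairs, then deletions, then insertions.
    aligned_ref: List[str] = []
    aligned_query: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = weights[i - 1]
            sub = (match if ref[i - 1] == query[j - 1] else mismatch) * w
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_ref.append(ref[i - 1])
                aligned_query.append(query[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap * weights[i - 1]:
            aligned_ref.append(ref[i - 1])
            aligned_query.append("-")
            i -= 1
            continue
        aligned_ref.append("-")
        aligned_query.append(query[j - 1])
        j -= 1

    return score[n][m], "".join(reversed(aligned_ref)), "".join(reversed(aligned_query))


# Toy example: the reference carries a hypothetical "AG" feature at
# positions 1-2, and the query has lost one of the two adjacent G bases.
ref, query = "CAGGTA", "CAGTA"
uniform = [1.0] * len(ref)
feature_weighted = [1.0, 5.0, 5.0, 1.0, 1.0, 1.0]

print(weighted_global_align(ref, query, uniform))
print(weighted_global_align(ref, query, feature_weighted))
```

In this toy case the deleted G can be placed at either of two adjacent positions with the same unweighted score. With uniform weights the traceback happens to put the gap inside the "AG" feature, whereas the weighted run moves the gap one base to the right and keeps the feature aligned without gaps. This is the kind of ambiguity that can make a lifted-over annotation look disrupted when it is not.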

Keywords: gene expression level quantification, weighted sequence alignment, genome annotation, genetic variants uniformization, genome-wide multiple-sequence alignment

Received: 28 Jun 2019; Accepted: 30 Sep 2019.

Copyright: © 2019 Song, Sang, Pei, Gan and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Dr. Xiangchao Gan, Max Planck Institute for Plant Breeding Research, Cologne, 50829, North Rhine-Westphalia, Germany, gan@mpipz.mpg.de
Dr. Fen Wang, Qiannan Normal College For Nationalities, Duyun, China, fenmin521@sgmtu.edu.cn