
Original Research Article. Provisionally accepted; the full text will be published soon.

Front. Genet. | doi: 10.3389/fgene.2019.00234

A Novel Approach to Clustering Genome Sequences Using Inter-Nucleotide Covariance

Rui Dong1, Lily He1, Rong L. He2 and Stephen S. Yau1,3*
  • 1Department of Mathematical Sciences, Tsinghua University, China
  • 2Department of Biological Sciences, Chicago State University, United States
  • 3Tsinghua University, China

Classification of DNA sequences is an important problem in bioinformatics, yet most existing methods for phylogenetic analysis, including Multiple Sequence Alignment (MSA), are time-consuming and computationally expensive. Alignment-free methods are now popular, but the manual intervention they require usually decreases accuracy. In addition, most methods neglect the interactions among nucleotides. Here we propose a new Accumulated Natural Vector (ANV) method, which represents each DNA sequence by a point in R^18. By calculating the Accumulated Indicator Functions of the nucleotides, we obtain an Accumulated Natural Vector for each sequence. This vector not only captures the distribution of each nucleotide but also provides the covariance among nucleotides. Global comparison of DNA sequences or genomes can therefore be done easily in R^18. Tests of ANV on datasets of different sizes and types demonstrate the accuracy and time efficiency of the proposed method.
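As a rough illustration of the idea in the abstract, the following Python sketch maps a DNA sequence to a point in R^18. The abstract does not state the formulas, so the layout used here is an assumption: 4 nucleotide counts, 4 means and 4 variances of the accumulated indicator functions, and 6 pairwise covariances (4 + 4 + 4 + 6 = 18), all with a plain 1/n normalization. The names `accumulated_indicator`, `anv_sketch`, and `anv_distance` are hypothetical, and the normalization and distance used in the published method may differ.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

NUCLEOTIDES = "ACGT"


def accumulated_indicator(seq, k):
    """Running count of nucleotide k: entry i is the number of k's in seq[:i + 1]."""
    total = 0
    out = []
    for ch in seq:
        if ch == k:
            total += 1
        out.append(total)
    return out


def anv_sketch(seq):
    """Return an 18-dimensional vector for seq (assumed layout, see lead-in).

    Characters other than A, C, G, T (for example N) advance the position
    but are not counted for any nucleotide.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    acc = {k: accumulated_indicator(seq, k) for k in NUCLEOTIDES}

    # 4 counts: the final value of each accumulated indicator function.
    counts = [acc[k][-1] for k in NUCLEOTIDES]

    # 4 means and 4 variances of the accumulated indicator functions
    # (assumption: population statistics with 1/n normalization).
    means = {k: sum(acc[k]) / n for k in NUCLEOTIDES}
    variances = [
        sum((x - means[k]) ** 2 for x in acc[k]) / n for k in NUCLEOTIDES
    ]

    # 6 covariances, one per unordered nucleotide pair: AC, AG, AT, CG, CT, GT.
    covariances = [
        sum((x - means[a]) * (y - means[b]) for x, y in zip(acc[a], acc[b])) / n
        for a, b in combinations(NUCLEOTIDES, 2)
    ]

    return counts + [means[k] for k in NUCLEOTIDES] + variances + covariances


def anv_distance(seq_a, seq_b):
    """Euclidean distance between two sketch vectors (assumed metric)."""
    return dist(anv_sketch(seq_a), anv_sketch(seq_b))
```

Under these assumptions, two sequences are compared with a call such as `anv_distance("ACGTACGT", "ACGTTGCA")`, and the resulting pairwise distances could feed any standard clustering or tree-building routine. Because each sequence is reduced to a fixed-length vector in one pass per nucleotide, no alignment step is needed.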

Keywords: Accumulated Natural Vector, Phylogenetic analysis, alignment-free, Genomes, inter-nucleotide covariance

Received: 07 Sep 2018; Accepted: 04 Mar 2019.

Edited by:

Alfredo Pulvirenti, Università degli Studi di Catania, Italy

Reviewed by:

Cheong Xin Chan, University of Queensland, Australia
Stefano Piotto, University of Salerno, Italy  

Copyright: © 2019 Dong, He, He and Yau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Prof. Stephen S. Yau, Tsinghua University, Beijing, China, yau@uic.edu