Technology and Code ARTICLE | Provisionally accepted. The full text will be published soon.

Front. Genet. | doi: 10.3389/fgene.2019.00842

iRO-PsekGCC: identify DNA replication origins based on Pseudo k-tuple GC Composition

 Bin Liu1*, Shengyu Chen2, Ke Yan3 and Fan Weng3
  • 1Beijing Institute of Technology, China
  • 2School of Informatics, Computing & Engineering, Indiana University Bloomington, United States
  • 3School of Computer Science and Technology, Harbin Institute of Technology, China

Summary: Identifying replication origins plays a key role in understanding the mechanism of DNA replication and is of great significance in DNA sequence analysis. Several computational approaches have therefore been introduced. Among these predictors, iRO-3wPseKNC is the first discriminative method able to correctly identify entire replication origins. To further improve predictive performance, in this study we propose the Pseudo k-tuple GC Composition (PsekGCC) approach, which captures the “GC asymmetry bias” of yeast species by considering both the GC skew and the sequence-order effects of k-tuple GC Composition (k-GCC). Based on PsekGCC, we propose a new predictor, iRO-PsekGCC, to identify DNA replication origins. A rigorous jackknife test on benchmark datasets for two yeast species (Saccharomyces cerevisiae and Pichia pastoris) indicated that iRO-PsekGCC outperformed iRO-3wPseKNC. We anticipate that iRO-PsekGCC will be a useful tool for DNA replication origin identification.
Availability and implementation: A web server for the iRO-PsekGCC predictor is available at http://bliulab.net/iRO-PsekGCC/.
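
The two signals named in the Summary, GC skew and k-tuple GC composition, can be illustrated with a minimal Python sketch. This is not the PsekGCC feature definition, which is not given in this abstract. The GC skew below uses the standard (G - C) / (G + C) formula. The function names, the window and step parameters, and the per-k-tuple GC fraction are assumptions introduced only for illustration.

```python
def gc_skew(seq: str) -> float:
    """Standard GC skew, (G - C) / (G + C); returns 0.0 if there is no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int, step: int) -> list[float]:
    """GC skew of each full-length window; window and step are illustrative parameters."""
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def ktuple_gc_fraction(seq: str, k: int) -> list[float]:
    """Hypothetical stand-in for k-tuple GC composition:
    the fraction of G or C bases in every overlapping k-tuple."""
    seq = seq.upper()
    return [
        (seq[i:i + k].count("G") + seq[i:i + k].count("C")) / k
        for i in range(len(seq) - k + 1)
    ]


if __name__ == "__main__":
    example = "GGCCGGATAT"  # toy sequence, not taken from the benchmark datasets
    print(windowed_gc_skew(example, window=2, step=2))
    print(ktuple_gc_fraction(example, k=2))
```

A positive skew marks a window richer in G than C and a negative skew the reverse. A predictor of this kind would presumably combine such position-dependent values into a fixed-length feature vector before classification, but that step depends on details published in the full text.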

Keywords: Replication origin identification, Pseudo k-tuple GC Composition, random forest, Web-server, DNA sequence analysis

Received: 05 Jul 2019; Accepted: 13 Aug 2019.

Copyright: © 2019 Liu, Chen, Yan and Weng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mx. Bin Liu, Beijing Institute of Technology, Beijing, China, bliu@bliulab.net