Technology and Code ARTICLE, provisionally accepted

Front. Genet. | doi: 10.3389/fgene.2019.00637

PredPRBA: Prediction of protein-RNA binding affinity using gradient boosted regression trees

Lei Deng1, Wenyi Yang1 and Hui Liu2*
  • 1Central South University, China
  • 2Changzhou University, China

Protein-RNA interactions play essential roles in many biological processes. Quantifying the binding affinity of protein-RNA complexes helps in understanding protein-RNA recognition mechanisms and in identifying strong binding partners. Because experimentally measured protein-RNA binding affinity data remain limited, there is a pressing demand for accurate and reliable computational approaches. In this paper, we propose a computational approach, PredPRBA, which can effectively predict protein-RNA binding affinity using gradient boosted regression trees. We build a dataset of protein-RNA binding affinity that includes 103 protein-RNA complex structures manually collected from the related literature. We then generate 37 kinds of sequence and structural features and explore the relationship between the features and protein-RNA binding affinity. We find that the binding affinity mainly depends on the structure of the RNA molecules. According to the type of RNA bound to the protein in each complex, we split the 103 protein-RNA complexes into six categories. For each category, we build a gradient boosted regression tree (GBRT) model based on the generated features. We comprehensively evaluate the proposed method on the binding affinity dataset using leave-one-out cross-validation. PredPRBA achieves correlations ranging from 0.723 to 0.897 across the six categories, which is significantly better than other machine learning methods and the pioneering protein-RNA binding affinity predictor SPOT-Seq-RNA. The PredPRBA webserver is freely available at http://PredPRBA.denglab.org.
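
The evaluation protocol described in the abstract (fit a gradient boosted regression model, score it by leave-one-out cross-validation, report the correlation between predicted and measured affinity) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes depth-1 regression trees (stumps), squared-error loss, Pearson correlation, and synthetic data standing in for the 37 features and measured affinities. All function names, parameters, and values below are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate split halfway between neighbouring values
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = mean(left), mean(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return None if best is None else best[1:]

def fit_gbrt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no valid split left
            break
        j, t, lv, rv = stump
        stumps.append(stump)
        preds = [p + learning_rate * (lv if row[j] <= t else rv)
                 for row, p in zip(X, preds)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, lr = model
    return base + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)

def leave_one_out(X, y, **params):
    """Predict each sample with a model trained on all the other samples."""
    preds = []
    for i in range(len(y)):
        model = fit_gbrt(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **params)
        preds.append(predict(model, X[i]))
    return preds

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

# Synthetic stand-in data (hypothetical): 20 "complexes", 3 features each.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(20)]
y = [3 * row[0] - row[1] + rng.gauss(0, 0.1) for row in X]

preds = leave_one_out(X, y, n_trees=50, learning_rate=0.1)
r = pearson(y, preds)
print(f"Leave-one-out Pearson r = {r:.3f}")
```

To mirror the per-category design described above, one would run `leave_one_out` separately on the complexes of each RNA category and report one correlation per category. In practice, a library implementation with deeper trees would likely replace the hand-written stumps.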

Keywords: protein-RNA interactions, binding affinity, gradient boosted regression tree, sequence and structural features, computational approaches

Received: 03 Apr 2019; Accepted: 18 Jun 2019.

Edited by:

Gajendra P. Raghava, Indraprastha Institute of Information Technology Delhi, India

Reviewed by:

Zhi-Ping Liu, Shandong University, China
Leyi Wei, Tianjin University, China  

Copyright: © 2019 Deng, Yang and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Hui Liu, Changzhou University, Changzhou, China, hliu@cczu.edu.cn