Original Research ARTICLE
Inferring Latent Disease-lncRNA Associations by Faster Matrix Completion on a Heterogeneous Network
- 1Other, China
- 2Hunan University, China
- 3Geneis (Beijing) Co. Ltd, China
Recent studies have shown that long non-coding RNAs (lncRNAs) play a crucial role in a variety of fundamental biological processes related to complex human diseases. Predicting latent disease-lncRNA associations can help to explain the pathogenesis of complex human diseases at the lncRNA level, and it also contributes to the detection of disease biomarkers and to the diagnosis, treatment, prognosis, and prevention of disease. Nevertheless, accurately identifying latent disease-lncRNA associations remains a challenging and urgent task. Discovering latent links through biological experiments is time-consuming and costly, which necessitates the development of computational prediction models. In this study, the prediction task is recast as a matrix completion problem, as in recommender systems, in which the unknown items of a rating matrix are filled in. A novel method named faster randomized matrix completion for latent disease-lncRNA association prediction (FRMCLDA) is proposed, based on an improved randomized partial SVD on a heterogeneous bilayer network. First, correlated data sources and experimentally validated information on diseases and lncRNAs are integrated to construct a heterogeneous bilayer network. Next, the integrated network is formalized as a comprehensive adjacency matrix comprising the lncRNA similarity matrix, the disease similarity matrix, and the disease-lncRNA association matrix, in which the uncertain disease-lncRNA associations are treated as blank items. Then, a matrix approximating the original adjacency matrix is computed, and its entries supply predicted scores for the blank items. Constructing this approximate matrix can be equivalently formulated as a nuclear norm minimization problem. Finally, a faster singular value thresholding algorithm, which uses a randomized partial SVD combined with a new sub-space reuse technique, is applied to complete the adjacency matrix.
The results of leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV) experiments on three different benchmark databases confirm the effectiveness and adaptability of FRMCLDA in inferring latent disease-lncRNA associations, and in inferring lncRNAs correlated with novel diseases for which no prior interaction information is available. In addition, case studies show that FRMCLDA can effectively predict latent lncRNAs correlated with three widespread malignancies: prostate cancer, colon cancer, and gastric cancer.
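The pipeline summarized above (block adjacency matrix, blank items, nuclear norm minimization by singular value thresholding) can be illustrated with a minimal sketch. It is not the FRMCLDA implementation: it uses a full SVD from NumPy in place of the randomized partial SVD with sub-space reuse, and the function names `build_adjacency` and `svt_complete`, the block ordering, the toy matrices, and the values of `tau` and `delta` are all assumptions chosen for illustration.

```python
import numpy as np


def build_adjacency(lnc_sim, dis_sim, assoc):
    """Stack the three matrices into one symmetric adjacency matrix.

    assoc has shape (n_diseases, n_lncRNAs); 1 marks a known association
    and 0 marks a blank (unknown) item. The block order is an assumption.
    """
    adjacency = np.block([[lnc_sim, assoc.T], [assoc, dis_sim]]).astype(float)
    # Similarity entries count as observed; in the association blocks only
    # the known associations (ones) are observed.
    known = (assoc == 1).astype(float)
    mask = np.block([[np.ones_like(lnc_sim), known.T],
                     [known, np.ones_like(dis_sim)]]).astype(float)
    return adjacency, mask


def svt_complete(M, mask, tau, delta, n_iter=200, tol=1e-4):
    """Plain singular value thresholding with a full SVD (illustrative only)."""
    Y = np.zeros_like(M, dtype=float)
    X = np.zeros_like(M, dtype=float)
    for _ in range(n_iter):
        # Soft-threshold the singular values of Y by tau.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        s = np.maximum(s - tau, 0.0)
        X = (U * s) @ Vt
        # Compare the estimate with M on the observed entries only.
        residual = mask * (M - X)
        if np.linalg.norm(residual) <= tol * np.linalg.norm(mask * M):
            break
        Y += delta * residual
    return X


# Toy data (hypothetical): 3 lncRNAs, 2 diseases.
lnc_sim = np.array([[1.0, 0.6, 0.1],
                    [0.6, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])
dis_sim = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
assoc = np.array([[1, 0, 0],
                  [1, 1, 0]])

adjacency, mask = build_adjacency(lnc_sim, dis_sim, assoc)
completed = svt_complete(adjacency, mask, tau=1.0, delta=1.0)

# The lower-left block holds the predicted disease-lncRNA scores.
n_lnc = lnc_sim.shape[0]
scores = completed[n_lnc:, :n_lnc]
print(np.round(scores, 3))
```

In this sketch the full SVD dominates the cost of each iteration, which is the step the abstract says FRMCLDA accelerates with a randomized partial SVD.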
Keywords: heterogeneous bilayer network, association prediction, matrix completion, faster SVT, randomized partial SVD, similarity measurements
Received: 21 May 2019;
Accepted: 19 Jul 2019.
Edited by: Peilin Jia, University of Texas Health Science Center, United States
Reviewed by: Xin Zhou, Stanford University, United States
Lu Zhang, Hong Kong Baptist University, Hong Kong
Copyright: © 2019 Li, Wang, Xu, Mao, Tian and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Prof. Shulin Wang, Hunan University, Changsha, 410082, Hunan Province, China, firstname.lastname@example.org
Dr. Jialiang Yang, Geneis (Beijing) Co. Ltd, Beijing, 100006, China, email@example.com