AUTHOR=Duan Tao, Kuang Zhufang, Wang Jiaqi, Ma Zhihao
TITLE=GBDTLRL2D Predicts LncRNA–Disease Associations Using MetaGraph2Vec and K-Means Based on Heterogeneous Network
JOURNAL=Frontiers in Cell and Developmental Biology
VOLUME=Volume 9 - 2021
YEAR=2021
URL=https://www.frontiersin.org/journals/cell-and-developmental-biology/articles/10.3389/fcell.2021.753027
DOI=10.3389/fcell.2021.753027
ISSN=2296-634X
ABSTRACT=In recent years, long non-coding RNAs (lncRNAs) have been shown to be involved in many disease processes. Predicting lncRNA-disease associations helps clarify the mechanisms of disease occurrence and can suggest new methods of disease prevention and treatment. Current methods for predicting potential lncRNA-disease associations seldom consider heterogeneous networks with complex node paths, and they suffer from an imbalance between positive and negative samples. To solve these problems, this paper proposes GBDTLRL2D, a method based on the Gradient Boosting Decision Tree (GBDT) and Logistic Regression (LR) for predicting lncRNA-disease associations. MetaGraph2Vec is used for feature learning, and negative sample sets are selected using K-means clustering. The innovation of GBDTLRL2D is twofold: a clustering algorithm selects a representative negative sample set, and MetaGraph2Vec better retains the semantic and structural features of the heterogeneous network. In 10-fold cross-validation, GBDTLRL2D achieves average AUCs of 0.98, 0.98, and 0.96 on the three datasets, respectively.
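
The abstract says that negative sample sets are selected with K-means clustering but does not spell out the procedure. The sketch below shows one plausible reading: cluster the feature vectors of unlabeled lncRNA-disease pairs, then draw negatives evenly across the clusters so that the selected set is representative. The function names, the farthest-point initialization (chosen here only to keep the example deterministic), the round-robin nearest-to-centroid rule, and the toy 2-D feature vectors are all assumptions for illustration, not the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic initialization (assumption): start from the first point,
    then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def select_negatives(unlabeled, n_negatives, k=3):
    """Hypothetical selection rule: take points nearest each centroid,
    cycling over clusters so every cluster contributes."""
    centroids, labels = kmeans(unlabeled, k)
    clusters = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: squared_distance(unlabeled[i], centroids[c]))
        clusters.append(members)
    chosen, rank = [], 0
    while len(chosen) < n_negatives and any(rank < len(m) for m in clusters):
        for members in clusters:
            if rank < len(members) and len(chosen) < n_negatives:
                chosen.append(members[rank])
        rank += 1
    return chosen


# Toy feature vectors for unlabeled pairs: three well-separated groups of three.
unlabeled = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.2), (5.2, 4.9),
    (10.0, 0.0), (10.1, 0.2), (9.9, 0.1),
]
# Match the number of negatives to a hypothetical count of 3 known positives.
chosen = select_negatives(unlabeled, n_negatives=3, k=3)
print(chosen)
```

With three negatives requested and three clusters, each cluster contributes exactly one index, so the selected negatives span all regions of the feature space instead of concentrating in one. In a full pipeline the vectors would come from the MetaGraph2Vec embeddings, and the balanced positive and negative sets would then be passed to the GBDT and LR stages.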