Original Research Article. Provisionally accepted; the full text will be published soon.

Front. Genet. | doi: 10.3389/fgene.2019.01077

Gene Expression Value Prediction Based on XGBoost Algorithm

Wei Li¹, Yanbin Yin², Xiongwen Quan¹ and Han Zhang¹*
  • ¹Nankai University, China
  • ²University of Nebraska-Lincoln, United States

Gene expression profiling is widely used to characterize cell status, which in turn reflects the health of the body and supports the diagnosis of genetic diseases. Although the cost of genome-wide expression profiling has been falling in recent years, collecting expression profiles for thousands of genes remains very expensive. Because gene expression levels in humans are usually highly correlated, the expression values of the remaining target genes can be predicted from the measured values of 943 landmark genes. We therefore designed an algorithm for predicting gene expression values based on XGBoost, which integrates multiple tree models and offers stronger interpretability. We tested the algorithm on the GEO dataset and compared the results with those of existing models. Experiments showed that the XGBoost model achieved a significantly lower overall error than the existing D-GEX algorithm, linear regression, and KNN methods. In conclusion, the XGBoost algorithm outperforms the existing models and makes a significant contribution to the toolbox for gene expression value prediction.
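The core idea, predicting each target gene from the landmark genes with an ensemble of trees and scoring the result by absolute error, can be illustrated with a small sketch. This is not the authors' implementation and not XGBoost itself: it is a plain gradient-boosting loop over one-split regression trees (stumps) under squared-error loss, written with the Python standard library only. The data are synthetic (5 hypothetical "landmark" features and 2 hypothetical "target" genes, not the GEO dataset), and all function names, the number of boosting rounds, and the learning rate are assumptions chosen for illustration.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    mean = total / n
    # Fallback if no valid split exists: every sample goes "left" and gets the mean.
    best = (-1.0, 0, float("inf"), mean, mean)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            right_sum = total - left_sum
            # Maximizing this quantity is equivalent to minimizing the split's squared error.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if gain > best[0]:
                best = (gain, j, lo, left_sum / k, right_sum / (n - k))
    return best[1:]

def fit_boosted(X, y, n_rounds=100, learning_rate=0.2):
    """Gradient boosting for squared error: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        pred = [p + learning_rate * (lv if x[j] <= thr else rv)
                for x, p in zip(X, pred)]
    return base, learning_rate, stumps

def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(lv if x[j] <= thr else rv for j, thr, lv, rv in stumps)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (an assumption): targets depend on the landmark features.
rng = random.Random(0)

def make_samples(n):
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
    Y = [[2 * x[0] - x[1] + rng.gauss(0, 0.1),
          abs(x[2]) + 0.5 * x[4] + rng.gauss(0, 0.1)] for x in X]
    return X, Y

X_train, Y_train = make_samples(200)
X_test, Y_test = make_samples(100)

mae_model, mae_baseline = [], []
for t in range(2):  # one independent model per target gene
    y_train = [row[t] for row in Y_train]
    y_test = [row[t] for row in Y_test]
    model = fit_boosted(X_train, y_train)
    mae_model.append(mean_absolute_error(y_test, [predict(model, x) for x in X_test]))
    train_mean = sum(y_train) / len(y_train)
    mae_baseline.append(mean_absolute_error(y_test, [train_mean] * len(y_test)))

print("MAE per target (boosted stumps):", mae_model)
print("MAE per target (mean baseline): ", mae_baseline)
```

Fitting one model per target gene keeps the sketch simple. A real XGBoost model would additionally use deeper trees, regularization, and second-order gradient information, none of which is reproduced here.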

Keywords: gene expression value, landmark gene, target gene, regression method, XGBoost, absolute error

Received: 15 Jul 2019; Accepted: 09 Oct 2019.

Copyright: © 2019 Li, Yin, Quan and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Prof. Han Zhang, Nankai University, Tianjin, China, zhanghan@nankai.edu.cn