AUTHOR=Yu Xinghao, Wang Ting, Huang Shuiping, Zeng Ping TITLE=How Can Gene-Expression Information Improve Prognostic Prediction in TCGA Cancers: An Empirical Comparison Study on Regularization and Mixed Cox Models JOURNAL=Frontiers in Genetics VOLUME=Volume 11 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.00920 DOI=10.3389/fgene.2020.00920 ISSN=1664-8021 ABSTRACT=Background: Previous cancer prognostic prediction models often consider only the most important transcriptomic expressions, and their predictive power is limited. It is unknown whether the prediction power can be further improved when the transcriptomic information is incorporated. Methods: To integrate transcriptomic information, four models were compared across 32 types of cancer in The Cancer Genome Atlas: the general Cox model with only clinical covariates, the Cox model with lasso penalty (coxlasso), the Cox model with elastic net penalty (coxenet) and the mixed-effects Cox model (coxlmm). Furthermore, we partitioned the survival variance into the relative contributions of clinical and transcriptomic components within the framework of coxlmm. Finally, the influence of different numbers of genes was evaluated in the context of coxlmm. Results: Compared with the clinical-covariates-only Cox model, the average prediction gain was 2.4% for coxlasso, 4.2% for coxenet and 7.2% for coxlmm across 16 low-censored cancers; a significant improvement in prediction power was observed for SARC, SKCM, LGG, PAAD and HNSC. Similar findings were observed for all 32 cancers, with average prediction gains of 2.7%, 3.8% and 5.8% for coxlasso, coxenet and coxlmm, respectively. Coxlmm always had comparable or better prediction performance relative to coxlasso and coxenet, with an average of 2.8% prediction improvement across the 16 low-censored cancers. In addition, the predictive accuracy of coxlmm generally increased with the number of genes included. 
The survival variance partition analysis demonstrated that the transcriptomic contribution was higher for some cancers (e.g., LGG, CESC, PAAD, SKCM and SARC) but lower for others (e.g., BRCA, COAD, KIRC and STAD). Conclusions: This study demonstrates that the integration of transcriptomic information can substantially improve prognostic prediction accuracy, but that the prediction performance is cancer-specific and varies across cancer types. It further reveals that gene expression makes distinct contributions to survival variation across cancers.
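
The abstract reports "prediction gain" relative to the clinical-covariates-only Cox model but does not name the accuracy metric or say whether the gain is relative or absolute. As a minimal sketch, assuming the metric is Harrell's concordance index (C-index) and the gain is a relative percentage change, such a comparison could look as follows. The function names `concordance_index` and `relative_gain` and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs that the risk scores order correctly.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): pairs with tied times are skipped.
        # A pair is comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier failure had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def relative_gain(c_model, c_baseline):
    """Percentage gain of a model's C-index over a baseline (assumed definition)."""
    return 100.0 * (c_model - c_baseline) / c_baseline


# Toy data, invented for illustration only.
times = [5, 3, 8, 2]
events = [1, 1, 0, 1]
risk = [0.4, 0.9, 0.1, 0.7]
c_index = concordance_index(times, events, risk)
```

With these functions, the C-index of each transcriptome-aware model (coxlasso, coxenet, coxlmm) on held-out data would be passed to `relative_gain` together with the C-index of the clinical-only Cox model.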