AUTHOR=Niu Xiaohui, Yang Kun, Zhang Ge, Yang Zhiquan, Hu Xuehai

TITLE=A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions

JOURNAL=Frontiers in Genetics

VOLUME=Volume 10 - 2019

YEAR=2020

URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2019.01305

DOI=10.3389/fgene.2019.01305

ISSN=1664-8021

ABSTRACT=Deciphering the code of cis-regulatory elements (CREs) is one of the core issues of today’s biology. Enhancers are distal CREs that play significant roles in gene transcriptional regulation. Although identifying enhancer locations across the whole genome (discriminative enhancer predictions, DEP) is necessary, it is more important to predict in which specific tissues or cell types they will be activated and functional (tissue-specific enhancer predictions, TSEP). Although existing deep learning models have achieved great success in DEP, they cannot be directly employed for TSEP, because a specific tissue or cell type has only a limited number of enhancer samples available for training. Here we first adopted a reported deep learning architecture and then developed a novel training strategy for TSEP, named the ‘pretraining-retraining strategy’ (PRS), which decomposes the whole training process into two successive stages: a pretraining stage trains on the whole enhancer dataset to perform DEP, and a retraining stage then trains on tissue-specific enhancer samples, starting from the trained pretraining model, to make TSEP. As a result, PRS is found to be valid for DEP, with an AUC of 0.922 and a GM (geometric mean) of 0.696 when tested on a large-scale FANTOM5 enhancer dataset via 5-fold cross-validation. Interestingly, a new finding is that, starting from the trained pretraining model, only twenty additional epochs are needed to complete the retraining process when testing on 23 specific tissues or cell lines. For TSEP tasks, PRS achieved a mean GM of 0.806, which is significantly higher than the 0.528 of gkm-SVM, an existing mainstream method for CRE predictions. Notably, PRS is further shown to be superior to two other state-of-the-art methods, DEEP and BiRen. In summary, PRS draws on useful ideas from the domain of transfer learning and is a reliable method for tissue-specific enhancer predictions.
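The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy logistic-regression classifier on generic numeric feature vectors in place of their deep learning architecture, plain SGD, and that retraining updates all parameters starting from the pretrained weights (the paper may instead freeze or reinitialize some layers). It also assumes GM means the geometric mean of sensitivity and specificity. The function names (train, predict, g_mean, pretrain_retrain), the 50 pretraining epochs, the learning rate and the synthetic data are all hypothetical; only the twenty retraining epochs come from the abstract.

```python
import math
import random

# Hypothetical toy sketch: logistic regression stands in for the paper's deep
# network; features are generic float vectors, not encoded DNA sequences.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Predicted probability that sample x is a (tissue-specific) enhancer."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(w, b, data, epochs, lr):
    """SGD on the logistic loss, starting from the given parameters (w, b)."""
    w = list(w)  # copy so the caller's parameters are not modified
    for _ in range(epochs):
        for x, y in data:
            g = score(w, b, x) - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, xs, threshold=0.5):
    """Hard 0/1 labels from predicted probabilities."""
    return [1 if score(w, b, x) >= threshold else 0 for x in xs]


def g_mean(y_true, y_pred):
    """GM, assumed here to be sqrt(sensitivity * specificity)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sn * sp)


def pretrain_retrain(all_data, tissue_data, dim,
                     pre_epochs=50, re_epochs=20, lr=0.1):
    """Stage 1: pretrain on pooled enhancer data (DEP).
    Stage 2: retrain on one tissue's samples from the pretrained weights (TSEP)."""
    pre = train([0.0] * dim, 0.0, all_data, pre_epochs, lr)
    re = train(pre[0], pre[1], tissue_data, re_epochs, lr)
    return pre, re


if __name__ == "__main__":
    rng = random.Random(0)

    # Synthetic stand-ins: many pooled samples, few tissue-specific ones
    # whose decision boundary is shifted (a toy "domain shift").
    def make(n, shift):
        out = []
        for _ in range(n):
            x = [rng.uniform(-2, 2), rng.uniform(-2, 2)]
            out.append((x, 1 if x[0] + x[1] > shift else 0))
        return out

    pooled, tissue = make(2000, 0.0), make(60, 1.0)
    (w0, b0), (w1, b1) = pretrain_retrain(pooled, tissue, dim=2)
    test = make(500, 1.0)
    xs, ys = [x for x, _ in test], [y for _, y in test]
    print("GM, pretrained only:", round(g_mean(ys, predict(w0, b0, xs)), 3))
    print("GM, after retraining:", round(g_mean(ys, predict(w1, b1, xs)), 3))
```

The design point the sketch isolates is that stage 2 starts from the stage 1 weights rather than from zeros, so the small tissue-specific set only has to adjust an already useful model. The printed GM values depend entirely on the synthetic data and are not meant to reproduce the reported 0.806 or 0.528 figures.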