Front. Genet. | doi: 10.3389/fgene.2019.00774

A hybrid ensemble approach for identifying robust differentially methylated loci in pan-cancers

Qi Tian¹, Jianxiao Zou¹, Fang Yuan¹, Zhongli Yu¹, Jianxiong Tang¹, Ying Song¹ and Shicai Fan¹,²*
  • ¹University of Electronic Science and Technology of China, China
  • ²Center for Information in BioMedicine, University of Electronic Science and Technology of China, China

DNA methylation is a widely investigated epigenetic mark that plays a vital role in tumorigenesis. Advances in high-throughput assays, such as the Infinium 450K platform, provide genome-scale DNA methylation landscapes at single-CpG-locus resolution, and the identification of differentially methylated loci (DML) has become an insightful approach to deepening our understanding of cancers. However, the extreme imbalance between the number of samples and the number of loci (approximately 1:1000) makes it rather difficult to explore differential methylation between tumor and normal samples. In this paper, we propose a Hybrid approach based on ensemble feature selection for identifying Differentially Methylated Loci (HyDML), which incorporates instance perturbation and multiple function models. Experiments on data from The Cancer Genome Atlas (TCGA) showed that HyDML not only achieved effective DML identification, but also outperformed the single feature selection approach in terms of both classification performance and the robustness of feature selection. Intensive analysis of the DML indicated that different types of cancers share common patterns, and the stable DML shared across pan-cancers have great potential as biomarkers, which may strengthen the confidence of domain experts in carrying out biological validation.
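The abstract describes HyDML only at a high level: instance perturbation (resampling the samples) combined with multiple function models (several different feature selectors), with robustness judged by how consistently loci are selected. The Python sketch below illustrates that general hybrid ensemble idea under stated assumptions and is not the authors' implementation. The two scorers (absolute mean difference and absolute Welch t-statistic), the stratified bootstrap, rank-sum aggregation, and the mean pairwise Jaccard index used as a stability measure are all illustrative choices that do not come from the paper, and the function names are hypothetical.

```python
import random
import statistics
from itertools import combinations

# Illustrative scorers standing in for the "multiple function models";
# HyDML's actual selectors are not specified in this abstract.
def mean_diff_score(tumor, normal):
    """Absolute difference of group mean beta values."""
    return abs(statistics.fmean(tumor) - statistics.fmean(normal))

def welch_t_score(tumor, normal):
    """Absolute Welch t-statistic; each group needs >= 2 samples."""
    diff = statistics.fmean(tumor) - statistics.fmean(normal)
    se = (statistics.variance(tumor) / len(tumor)
          + statistics.variance(normal) / len(normal)) ** 0.5
    if se == 0:
        return float("inf") if diff else 0.0
    return abs(diff) / se

def rank_loci(X, y, scorer):
    """Return locus indices ordered from most to least differential."""
    scores = []
    for j in range(len(X[0])):
        tumor = [row[j] for row, lab in zip(X, y) if lab == 1]
        normal = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append(scorer(tumor, normal))
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def stratified_bootstrap(X, y, rng):
    """Instance perturbation: resample rows with replacement within each class."""
    idx = []
    for label in (0, 1):
        members = [i for i, lab in enumerate(y) if lab == label]
        idx += [rng.choice(members) for _ in members]
    return [X[i] for i in idx], [y[i] for i in idx]

def hybrid_ensemble_select(X, y, k, n_boot=20,
                           scorers=(mean_diff_score, welch_t_score), seed=0):
    """Rank loci for every (bootstrap, scorer) pair, then aggregate by rank sum."""
    rng = random.Random(seed)
    rank_sum = [0] * len(X[0])
    subsets = []  # top-k set from each base selector, kept for the stability check
    for _ in range(n_boot):
        Xb, yb = stratified_bootstrap(X, y, rng)
        for scorer in scorers:
            order = rank_loci(Xb, yb, scorer)
            for rank, j in enumerate(order):
                rank_sum[j] += rank
            subsets.append(set(order[:k]))
    final = sorted(range(len(rank_sum)), key=lambda j: rank_sum[j])[:k]
    return final, subsets

def jaccard_stability(subsets):
    """Mean pairwise Jaccard index of selected subsets (1.0 = identical every time)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Toy demo on synthetic beta values (not TCGA data): 10 tumor and 10 normal
# samples, 50 loci, with only loci 0-2 hypermethylated in tumors.
data_rng = random.Random(1)
y = [1] * 10 + [0] * 10
X = [[data_rng.uniform(0.7, 0.9) if (lab == 1 and j < 3)
      else data_rng.uniform(0.1, 0.3) for j in range(50)] for lab in y]
selected, subsets = hybrid_ensemble_select(X, y, k=3)
stability = jaccard_stability(subsets)
print(sorted(selected), round(stability, 3))
```

Combining data perturbation (bootstraps) with function perturbation (different scorers) before aggregating is what makes this kind of ensemble "hybrid". On a real 450K matrix, with hundreds of thousands of loci per sample, the per-locus Python loops here would need to be vectorized.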

Keywords: DNA methylation, differentially methylated loci, ensemble feature selection, robustness, pan-cancers

