AUTHOR=He Qi-en, Zhu Jun-xuan, Wang Li-yan, Ding En-ci, Song Kai
TITLE=DNA methylation loci identification for pan-cancer early-stage diagnosis and prognosis using a new distributed parallel partial least squares method
JOURNAL=Frontiers in Genetics
VOLUME=Volume 13 - 2022
YEAR=2022
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.940214
DOI=10.3389/fgene.2022.940214
ISSN=1664-8021
ABSTRACT=Aberrant methylation is one of the early detectable events in many tumors, which makes it very promising for pan-cancer early-stage diagnosis and prognosis. To efficiently analyze big pan-cancer methylation data and to overcome the co-methylation phenomenon, a MapReduce-based distributed and parallel partial least squares approach was proposed. The large-scale, high-dimensional methylation data were first decomposed into distributed blocks according to their genomic locations. A distributed and parallel data processing strategy was proposed based on the MapReduce framework, and latent variables were then extracted for each distributed block. A set of pan-cancer signatures was further identified from their gene expression profiles through a differential co-expression network followed by statistical tests. Fifteen TCGA and three GEO datasets were used as the training and testing data, respectively, to verify the method. As a result, 22,000 potential methylation loci were selected as highly related to early-stage pan-cancer diagnosis. Of these, 67 methylation loci were further identified as pan-cancer signatures when their gene expression was also considered. Survival analysis and pathway enrichment analysis of these loci show not only that they may serve as potential drug targets but also that the proposed method may serve as a uniform framework for signature identification with big data.
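
The block-wise strategy described in the abstract (split loci by genome location, extract latent variables per block, then combine) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes blocks are formed per chromosome, extracts only the first PLS latent variable for a single response, and runs the map step sequentially rather than on a cluster. The locus names, the toy beta values, and the helper functions `split_into_blocks`, `map_block`, and `reduce_blocks` are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def split_into_blocks(loci):
    """Group locus names by chromosome (assumed blocking rule)."""
    blocks = defaultdict(list)
    for name, (chrom, _pos) in loci.items():
        blocks[chrom].append(name)
    return dict(blocks)


def map_block(block_id, names, beta, y_centered):
    """Map step: first PLS1 latent variable of one block.

    With centered X (samples x loci in the block) and centered y:
    weights w = X^T y / ||X^T y||, scores t = X w.
    """
    cols = [center(beta[name]) for name in names]
    raw = [sum(x * y for x, y in zip(col, y_centered)) for col in cols]
    norm = sqrt(sum(r * r for r in raw))
    weights = [r / norm for r in raw] if norm > 0 else [0.0] * len(raw)
    n = len(y_centered)
    scores = [sum(w * col[i] for w, col in zip(weights, cols)) for i in range(n)]
    return block_id, dict(zip(names, weights)), scores


def reduce_blocks(mapped):
    """Reduce step: merge per-block weights and scores into two dicts."""
    weights, scores = {}, {}
    for block_id, w, t in mapped:
        weights[block_id] = w
        scores[block_id] = t
    return weights, scores


# Toy data (made up for illustration): 4 samples, 3 loci on two chromosomes.
loci = {"cg_a": ("chr1", 100), "cg_b": ("chr1", 250), "cg_c": ("chr2", 40)}
beta = {
    "cg_a": [0.1, 0.2, 0.8, 0.9],
    "cg_b": [0.5, 0.5, 0.5, 0.5],
    "cg_c": [0.9, 0.7, 0.3, 0.1],
}
labels = [0, 0, 1, 1]  # assumed coding: 0 = normal, 1 = early-stage tumor

y_c = center(labels)
blocks = split_into_blocks(loci)
# Each map_block call is independent, so the calls could run in parallel.
mapped = [map_block(b, names, beta, y_c) for b, names in blocks.items()]
weights, scores = reduce_blocks(mapped)
```

Because each block is processed without reference to any other, the map calls could be distributed across workers, which is the property a MapReduce design relies on. Loci with large absolute weights within a block are the ones most associated with the label in this toy setting; the constant locus `cg_b` receives a weight of zero.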