AUTHOR=Sarkar Jnanendra Prasad , Saha Indrajit , Lancucki Adrian , Ghosh Nimisha , Wlasnowolski Michal , Bokota Grzegorz , Dey Ashmita , Lipinski Piotr , Plewczynski Dariusz TITLE=Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale JOURNAL=Frontiers in Genetics VOLUME=Volume 11 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.00982 DOI=10.3389/fgene.2020.00982 ISSN=1664-8021 ABSTRACT=Genome-wide analysis of miRNA molecules reveals important information for understanding the biology of cancer. Typically, miRNAs are used as features in statistical learning methods to train models that predict cancer. This motivated us to propose a method that integrates clustering and classification techniques for diverse cancer types with survival analysis in order to identify candidate miRNAs that can play a crucial role in predicting different types of tumors. Our method has two parts. In the first part, a feature selection method named Stochastic Covariance Evolutionary Strategy with Forward Selection (SCES-FS) is developed by integrating Stochastic Neighbor Embedding (SNE), Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) and classifiers, with the primary objective of selecting biomarkers. SNE is used to reorder the features by performing an implicit clustering in which highly correlated features become neighbors. A subset of features is then selected heuristically to perform multi-class classification for diverse cancer types. In the second part, the most important features acquired in the first part are used to perform survival analysis with Cox regression, primarily to establish the effectiveness of the selected features. Next Generation Sequencing data of miRNA expression from The Cancer Genome Atlas are analysed for 1707 samples of ten diverse cancer types, along with 333 normal samples. 
The SCES-FS method is compared with well-known feature selection methods, and multi-class classification with the 17 selected miRNAs performs better, with an accuracy of 96\%. Moreover, the biological significance of the selected miRNAs is demonstrated through network analysis, expression analysis using hierarchical clustering in the form of a heatmap, KEGG pathway analysis, GO enrichment and Protein-Protein Interaction analysis. Overall, the results indicate that the 17 selected miRNAs are associated, through their targets, with many key cancer regulators, e.g., MYC, VEGFA, AKT1, CDKN1A, RHOA and PTEN. Therefore, the selected miRNAs can be considered putative biomarkers for ten diverse cancer types.