AUTHOR=Mallik Saurav , Sarkar Anasua , Nath Sagnik , Maulik Ujjwal , Das Supantha , Pati Soumen Kumar , Ghosh Soumadip , Zhao Zhongming TITLE=3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection JOURNAL=Frontiers in Genetics VOLUME=Volume 14 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1095330 DOI=10.3389/fgene.2023.1095330 ISSN=1664-8021 ABSTRACT=Handling biomedical big data is challenging, and integrating multi-modal data and then mining significant features from it (gene signature detection) is particularly difficult. Here we introduce a novel approach, 3PNMF-MKL (Three-Factor Penalized Non-Negative Matrix Factorization-based Multiple Kernel Learning with Soft Margin Hinge Loss), that performs multi-modal data integration and gene signature detection together. First, we apply the supervised method Limma to each individual molecular profile and extract the statistically significant features. Next, we fuse the reduced feature sets by 3-Factor Penalized Non-Negative Matrix Factorization. Finally, we use a Multiple Kernel Learning model with Soft Margin Hinge Loss to calculate accuracy values for different combinations of class labels. The class labels that yield the highest area under the curve (AUC) are combined and then used for module detection. To construct gene modules, we calculate the topological overlap matrix similarity score, followed by average linkage clustering and dynamic tree cut. The module with the highest correlation is taken as the gene signature. To evaluate performance, we applied 3PNMF-MKL to an acute myeloid leukemia dataset from The Cancer Genome Atlas (TCGA) containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC of 0.827.
We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed state-of-the-art methods in terms of AUC. In general, our algorithm can be extended and applied to any multi-modal dataset for data integration and gene module detection.
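
The module-construction step described in the abstract (topological overlap matrix similarity followed by average linkage clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes an unsigned adjacency built from absolute Pearson correlation raised to a soft-threshold power (power=6 is an assumed value, not taken from the abstract), and it cuts the dendrogram into a fixed number of clusters with SciPy's fcluster as a simple stand-in for dynamic tree cut, which SciPy does not provide. The function names tom_similarity and gene_modules are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def tom_similarity(adjacency):
    """Topological overlap matrix for a symmetric adjacency with entries in [0, 1]."""
    a = np.array(adjacency, dtype=float)
    np.fill_diagonal(a, 0.0)                   # ignore self-connections
    shared = a @ a                             # connection strength shared through other genes
    k = a.sum(axis=1)                          # connectivity of each gene
    denom = np.minimum.outer(k, k) + 1.0 - a   # normalizer of the overlap score
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom


def gene_modules(expr, n_modules=2, power=6):
    """Assign each gene (column of a samples x genes matrix) to a module.

    Assumptions: unsigned correlation adjacency with an assumed soft-threshold
    power, and a fixed-count dendrogram cut in place of dynamic tree cut.
    """
    corr = np.corrcoef(expr, rowvar=False)
    adjacency = np.abs(corr) ** power
    dissim = 1.0 - tom_similarity(adjacency)
    dissim = np.clip((dissim + dissim.T) / 2.0, 0.0, None)  # remove float asymmetry
    np.fill_diagonal(dissim, 0.0)
    tree = linkage(squareform(dissim, checks=False), method="average")
    return fcluster(tree, t=n_modules, criterion="maxclust")
```

Calling gene_modules on a samples-by-genes expression matrix returns one integer module label per gene. Selecting the module with the highest correlation as the signature, as the abstract describes, would be a separate step that this sketch does not cover.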