AUTHOR=Zhang Yan, Xiang Ju, Tang Liang, Li Jianming, Lu Qingqing, Tian Geng, He Bin-Sheng, Yang Jialiang TITLE=Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity JOURNAL=Frontiers in Genetics VOLUME=Volume 12 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.596794 DOI=10.3389/fgene.2021.596794 ISSN=1664-8021 ABSTRACT=Complex diseases such as breast cancer are often caused by mutations in multiple functional genes. Identifying disease-related genes is a critical and challenging task for unveiling the biological mechanisms behind these diseases. In this study, we developed a novel computational framework to analyze the network properties of known breast cancer-associated genes and, building on these properties, a random-walk-with-restart algorithm to predict novel disease genes. Specifically, we first curated a set of breast cancer-associated genes from the Genome-Wide Association Studies (GWAS) catalogue and the Online Mendelian Inheritance in Man (OMIM) database, and then studied the distribution of these genes on an integrated Protein-Protein Interaction (PPI) network. We found that the breast cancer-associated genes are significantly closer to each other than expected at random, which confirms the modularity property of disease genes in a PPI network revealed by previous studies. We then retrieved PPI subnetworks spanned by the top breast cancer-associated KEGG pathways and found that the distribution of these genes on the subnetworks is non-random, suggesting that these pathways are activated non-uniformly. Taking advantage of the non-random distribution of breast cancer-associated genes, we developed a novel random-walk-with-restart (RWR) algorithm to predict novel cancer genes based on subnetworks spanned by KEGG pathways. 
Compared with a disease gene prediction method that does not use KEGG pathway information, this method has better prediction accuracy in inferring breast cancer-associated genes, and the top predicted genes are better enriched in known breast cancer-associated gene ontologies. Finally, we performed a literature search on top predicted novel genes such as HRAS and found that most of them are supported at least by wet-lab experiments on cell lines. In summary, we proposed a robust computational framework to prioritize novel breast cancer-associated genes, which could be used for further in vitro and in vivo experimental validation.
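
For readers unfamiliar with random walk with restart, the generic technique can be sketched as below. This is a minimal illustration of standard RWR on a toy network, not the pathway-based variant developed in the paper. The five-node adjacency matrix, the gene labels G0 to G4, the single seed gene, and the restart probability of 0.5 are all assumptions chosen for illustration. The iteration is p_next = (1 - r) * W * p + r * p0, where W is the column-normalized adjacency matrix and p0 is uniform over the seed genes.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic RWR: return steady-state visiting probabilities for each node."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # guard against isolated nodes
    W = adj / col_sums                 # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # restart distribution over seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p

# Hypothetical toy PPI network: a chain G0 - G1 - G2 - G3 - G4.
genes = ["G0", "G1", "G2", "G3", "G4"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

seeds = [0]  # assume G0 is the only known disease gene
scores = random_walk_with_restart(adj, seeds)

# Rank candidate (non-seed) genes by descending RWR score.
ranked = [genes[i] for i in np.argsort(-scores) if i not in seeds]
print(ranked)
```

Candidates nearer to the seed in the network receive higher scores, which is the property that makes RWR useful for prioritizing genes around a known disease module. Restricting the walk to pathway-spanned subnetworks, as the abstract describes, would change which adjacency matrix is passed in; the details of that step are not given here.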