AUTHOR=Zhang Shichen, Zhang Lanlan, Wang Lu, Wang Hongqiu, Wu Jiaxin, Cai Haoyang, Mo Chunheng, Yang Jian
TITLE=Machine learning identified MDK score has prognostic value for idiopathic pulmonary fibrosis based on integrated bulk and single cell expression data
JOURNAL=Frontiers in Genetics
VOLUME=Volume 14 - 2023
YEAR=2023
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1246983
DOI=10.3389/fgene.2023.1246983
ISSN=1664-8021
ABSTRACT=Idiopathic pulmonary fibrosis (IPF) is a progressive and fatal lung disease that poses a significant challenge to medical professionals due to its increasing incidence and prevalence, coupled with the limited understanding of its underlying molecular mechanisms. In this study, we integrated five bulk-tissue expression datasets with single-cell datasets that underwent pseudotime trajectory analysis, switch gene selection, and cell communication analysis. Using the prognostic information in the GSE47460 dataset, we identified 22 differentially expressed switch genes that correlated with clinical indicators and designated them as important genes. Among these genes, we found that the midkine (MDK) gene has the potential to serve as a marker of IPF, because its cell communication-related genes are differentially expressed in epithelial cells. We then used MDK and its cell communication-related genes to calculate an MDK score. We further constructed machine learning models from MDK and its related genes to predict IPF using the bulk gene expression datasets. The MDK score correlated with clinical indexes, and the machine learning model achieved AUCs of 0.94 and 0.86 in the IPF classification task on lung tissue samples and peripheral blood mononuclear cell samples, respectively. Our findings offer valuable insights into the pathogenesis of IPF, providing new therapeutic directions and target genes for further investigation.