AUTHOR=Liu Xiaojun, Li Lianxing, Peng Lihong, Wang Bo, Lang Jidong, Lu Qingqing, Zhang Xizhe, Sun Yi, Tian Geng, Zhang Huajun, Zhou Liqian TITLE=Predicting Cancer Tissue-of-Origin by a Machine Learning Method Using DNA Somatic Mutation Data JOURNAL=Frontiers in Genetics VOLUME=Volume 11 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.00674 DOI=10.3389/fgene.2020.00674 ISSN=1664-8021 ABSTRACT=Patients with carcinoma of unknown primary (CUP) account for 3-5% of all cancer cases. A large number of metastatic cancers require further diagnosis to determine their tissue-of-origin. However, diagnosing CUP and identifying its primary site is challenging. Previous studies have suggested that molecular profiling of tissue-specific genes could be useful in inferring the primary tissue of a tumor. The purpose of this study is to evaluate how well somatic mutations detected in a tumor identify the cancer tissue-of-origin. We downloaded somatic mutation datasets from the ICGC project. The random forest algorithm was used to extract features, and a classifier was built using logistic regression. Specifically, somatic mutations in 300 genes were extracted; these genes are significantly enriched in functions such as cell-to-cell adhesion. The prediction accuracy of tissue-of-origin inference for 3374 cancer samples across 13 cancer types reaches 81% in 10-fold cross-validation. Our method could be useful in identifying the cancer tissue-of-origin and in the diagnosis and treatment of cancers.
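
The workflow summarized in the abstract (random forest feature extraction, a logistic regression classifier, 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' code. It assumes scikit-learn, a binary sample-by-gene mutation matrix as input, and synthetic toy data in place of the ICGC datasets. The function names, the hyperparameters, and the choice to refit the feature selector inside each fold are all assumptions; the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def build_pipeline(n_genes=300, seed=0):
    """Rank genes by random forest importance, then fit logistic regression."""
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=seed),
        threshold=-np.inf,     # disable the importance cutoff ...
        max_features=n_genes,  # ... so exactly the top n_genes are kept
    )
    classifier = LogisticRegression(max_iter=1000)
    return make_pipeline(selector, classifier)


def cross_validated_accuracy(X, y, n_genes=300, n_splits=10, seed=0):
    """Per-fold accuracy; the selector is refit on each training fold."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    model = build_pipeline(n_genes, seed)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")


def make_toy_data(n_samples=260, n_genes=500, n_types=13, seed=0):
    """Synthetic stand-in for real data: 1 = gene mutated in that sample."""
    rng = np.random.default_rng(seed)
    y = np.arange(n_samples) % n_types  # balanced cancer-type labels
    X = (rng.random((n_samples, n_genes)) < 0.02).astype(int)  # background
    for t in range(n_types):
        rows = y == t
        markers = slice(5 * t, 5 * t + 5)  # 5 hypothetical marker genes
        X[rows, markers] = (rng.random((rows.sum(), 5)) < 0.6).astype(int)
    return X, y
```

Calling `cross_validated_accuracy(*make_toy_data(), n_genes=50)` returns one accuracy value per fold. Placing the selector inside the pipeline keeps held-out samples from influencing which genes are chosen. Whether the original study nested feature selection in this way is not stated in the abstract.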