AUTHOR=Zhou Dan, Chen Youli, Wang Zehao, Zhu Siran, Zhang Lei, Song Jun, Bai Tao, Hou Xiaohua TITLE=Integrating clinical and cross-cohort metagenomic features: a stable and non-invasive colorectal cancer and adenoma diagnostic model JOURNAL=Frontiers in Molecular Biosciences VOLUME=Volume 10 - 2023 YEAR=2024 URL=https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2023.1298679 DOI=10.3389/fmolb.2023.1298679 ISSN=2296-889X ABSTRACT=Dysbiosis is associated with colorectal cancer (CRC) and colorectal adenomas (CRA). However, the robustness of diagnostic models based on microbial signatures across multiple cohorts remains unsatisfactory. In this study, we used machine learning models to screen for metagenomic features of CRC and CRA in selected datasets from CuratedMetagenomicData. Single datasets with the most information on clinical and demographic characteristics were used to identify important metagenomic and clinical features. Model validation was carried out in multiple cohorts. We integrated metagenomic features that performed consistently across different cohorts with clinical features to construct CRC and CRA risk prediction models. In total, 20 metagenomic features each were selected to predict CRC and CRA. The performance of the selected cross-cohort metagenomic features was stable across multi-regional and multi-ethnic populations (CRC, AUC: 0.817-0.867; CRA, AUC: 0.766-0.833). After combination with clinical features, the AUC of our integrated CRC diagnostic model reached 0.939 (95% CI: 0.932-0.947), and that of the integrated CRA model reached 0.925 (95% CI: 0.917-0.935). In conclusion, the integrated model performed significantly better than single microbiome or clinical feature models in all cohorts. Integrating cross-cohort common discriminative microbial features with clinical features could help construct stable diagnostic models for early non-invasive screening for CRC and CRA.