AUTHOR=Jiang Xiwen, Yan Jinghai, Huang Hao, Ai Lu, Yu Xuegao, Zhong Pengqiang, Chen Yili, Liang Zhikun, Qiu Wancen, Huang Huiying, Yan Wenyan, Liang Yan, Chen Peisong, Wang Ruizhi
TITLE=Development of novel parameters for pathogen identification in clinical metagenomic next-generation sequencing
JOURNAL=Frontiers in Genetics
VOLUME=Volume 14 - 2023
YEAR=2023
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1266990
DOI=10.3389/fgene.2023.1266990
ISSN=1664-8021
ABSTRACT=Introduction: Metagenomic Next-Generation Sequencing (mNGS) has emerged as a powerful tool for rapid pathogen identification in clinical practice. However, the parameters used to interpret mNGS data, such as read count, genus rank, and coverage, lack explicit performance evaluation. In this study, we assessed these indicators, together with novel parameters we developed, for their performance in bacterial detection. Methods: We developed several relevant parameters, including 10M normalized reads, double-discard reads, Genus Rank Ratio, King Genus Rank Ratio, Genus Rank Ratio * Genus Rank, and King Genus Rank Ratio * Genus Rank. These parameters, together with frequently used ones, including raw reads, reads per million mapped reads (RPM), transcripts per kilobase per million mapped reads (TPM), Genus Rank, and coverage, were analyzed for their diagnostic efficiency in bronchoalveolar lavage fluid (BALF), a common source for detecting eight bacterial pathogens: Acinetobacter baumannii, Klebsiella pneumoniae, Streptococcus pneumoniae, Staphylococcus aureus, Haemophilus influenzae, Stenotrophomonas maltophilia, Pseudomonas aeruginosa, and Aspergillus fumigatus. Results: The results demonstrated that these indicators exhibited good diagnostic efficacy for the eight pathogens.
The AUC values of the indicators were almost all greater than 0.9, and the corresponding sensitivity and specificity values were almost all greater than 0.8, except for coverage. The negative predictive values of all indicators were greater than 0.9. The results showed that double-discarded reads, Genus Rank Ratio * Genus Rank, and King Genus Rank Ratio * Genus Rank exhibited better diagnostic efficiency than raw reads, RPM, TPM, and Genus Rank. These parameters can serve as a reference for interpreting mNGS data of BALF. Moreover, precision filters integrating our novel parameters were built through machine learning to detect the eight bacterial pathogens in BALF samples. Summary: In this study, we developed a set of novel parameters for pathogen identification in clinical mNGS based on reads and ranking. These parameters were found to be more effective in diagnosing pathogens than traditional approaches. The findings provide valuable insights for improving the interpretation of mNGS reports in clinical settings, specifically in BALF analysis.
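
NOTE: The abstract reports sensitivity, specificity, and negative predictive value for read-based indicators such as RPM. The following is a minimal illustrative sketch of how such figures can be computed for a single indicator at a single cutoff. It is not the authors' pipeline: the function names, the example values, and the cutoff are hypothetical, and the novel parameters (for example Genus Rank Ratio) are not implemented because the abstract does not define them.

```python
from math import nan


def rpm(reads: int, total_mapped_reads: int) -> float:
    """Reads per million mapped reads (standard definition)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads * 1_000_000 / total_mapped_reads


def diagnostic_metrics(values, labels, threshold):
    """Sensitivity, specificity and NPV for one indicator at one cutoff.

    values    -- indicator value per sample (e.g. RPM); hypothetical input
    labels    -- True if the reference method found the pathogen
    threshold -- a sample is called positive when value >= threshold
    """
    tp = fp = tn = fn = 0
    for value, is_positive in zip(values, labels):
        called = value >= threshold
        if called and is_positive:
            tp += 1
        elif called and not is_positive:
            fp += 1
        elif not called and is_positive:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "npv": tn / (tn + fn) if (tn + fn) else nan,
    }


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example_values = [100, 5, 40, 1, 20]
    example_labels = [True, False, False, False, True]
    print(rpm(50, 2_000_000))
    print(diagnostic_metrics(example_values, example_labels, threshold=30))
```

Sweeping the cutoff over all observed values and plotting sensitivity against 1 - specificity gives the ROC curve from which an AUC is derived.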