AUTHOR=Liu Guang, Li Tong, Zhu Xiaoyan, Zhang Xuanping, Wang Jiayin

TITLE=An independent evaluation in a CRC patient cohort of microbiome 16S rRNA sequence analysis methods: OTU clustering, DADA2, and Deblur

JOURNAL=Frontiers in Microbiology

VOLUME=Volume 14 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2023.1178744

DOI=10.3389/fmicb.2023.1178744

ISSN=1664-302X

ABSTRACT=The 16S rRNA gene is a universal marker gene of microbes and is often used as a target to profile microbial communities with next-generation sequencing (NGS) technology. Traditionally, sequences are clustered into operational taxonomic units (OTUs) at a 97% similarity threshold based on the 16S rRNA taxonomic standard, an approach that bypasses the reduction of sequencing errors and may therefore produce false taxonomic units. Several denoising algorithms, such as DADA2 and Deblur, have been published to solve this problem; they correct sequencing errors at single-nucleotide resolution by generating amplicon sequence variants (ASVs). As high-resolution ASVs become more popular than OTUs, and usually only one analysis method is chosen in a given study, there is a need for a thorough comparison of OTU-clustering and denoising pipelines. In this study, three of the most widely used 16S rRNA analysis methods (two denoising algorithms, DADA2 and Deblur, and de novo OTU clustering) were compared thoroughly using 16S rRNA amplicon sequencing data generated from 358 clinical stool samples from a colorectal cancer (CRC) cohort. Our findings indicate that all approaches lead to similar taxonomic profiles (P>0.05 in PERMANOVA and P<0.001 in the Mantel test), though the number of ASVs/OTUs and the alpha-diversity indices varied considerably. Disease-related analysis showed that all methods lead to similar conclusions, though there are considerable differences in the disease-related markers identified. Fusobacterium, Streptococcus, Peptostreptococcus, Parvimonas, Gemella, and Haemophilus were enriched in the CRC group with all three methods, while Roseburia, Faecalibacterium, Butyricicoccus, and Blautia were enriched in the healthy group with all three methods.
In addition, disease diagnostic models built with machine learning algorithms on the data from the different methods achieved good diagnostic efficiency (AUC: 0.87 to 0.89). DADA2 obtained the highest AUC (0.8944 and 0.8907 in the training set and testing set, respectively); however, the differences in performance among the methods were not significant (P>0.05). In conclusion, this study demonstrates that DADA2, Deblur, and de novo OTU clustering show similar power in taxa assignment and lead to similar conclusions in the CRC cohort.