AUTHOR=Lee Conard , Amini Fatemeh , Hu Guiping , Halverson Larry J. TITLE=Machine Learning Prediction of Nitrification From Ammonia- and Nitrite-Oxidizer Community Structure JOURNAL=Frontiers in Microbiology VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2022.899565 DOI=10.3389/fmicb.2022.899565 ISSN=1664-302X ABSTRACT=Accurately modeling nitrification and understanding the role specific ammonia- or nitrite-oxidizing taxa play in it are of great interest and importance to microbial ecologists. In this study, we applied machine learning to 16S rRNA sequence and nitrification potential data from an experiment examining interactions between cropping system and rhizosphere on microbial community assembly and nitrogen cycling processes. Given the high dimensionality of microbiome datasets, we included only nitrifiers, since only a few taxa are capable of ammonia and nitrite oxidation. We compared the performance of linear and non-linear algorithms with and without qPCR measures of bacterial and archaeal ammonia monooxygenase subunit A (amoA) gene abundance. Our feature selection process facilitated identification of the taxa most predictive of nitrification and comparison across habitats. We found that Nitrosomonas and Nitrospirae were more frequently identified as important predictors of nitrification in conventional systems, whereas Thaumarchaeota were more important predictors in diversified systems. Our results suggest model performance was not substantively improved by incorporating additional time-consuming and expensive qPCR data on amoA gene abundance. We also identified several clades of nitrifiers important to nitrification in different cropping systems, though we were not able to detect system- or rhizosphere-specific patterns in OTU-level biomarkers for nitrification.
Lastly, our results highlight the inherent risk of combining data from disparate habitats with the goal of increasing sample size to avoid overfitting models. This work represents a step towards developing machine learning approaches for microbiome research that identify nitrifier ecotypes with defining roles in different habitats.