Technology and Code ARTICLE
Embracing ambiguity in the taxonomic classification of microbiome sequencing data
- 1Department of Computer Science, University of Maryland, College Park, United States
- 2University of Maryland Institute for Advanced Computer Studies, United States
- 3Center for Health-related Informatics and Bioimaging, University of Maryland, United States
The advent of high throughput sequencing has enabled in-depth characterization of human and environmental microbiomes. Determining the taxonomic origin of microbial sequences is one of the first, and frequently only, analysis performed on microbiome samples. Substantial research has focused on the development of methods for taxonomic annotation, often making trade-offs in computational efficiency and classification accuracy. A side-effect of these efforts has been a reexamination of the bacterial taxonomy itself. Taxonomies developed prior to the genomic revolution captured complex relationships between organisms that went beyond uniform taxonomic levels such as species, genus, family, etc. Driven in part by the need to simplify computational workflows, the bacterial taxonomies used most commonly today have been regularized to fit within a standard seven taxonomic levels. Consequently, modern analyses of microbial communities are relatively coarse-grained. Few methods make classifications below the genus level, impacting our ability to capture biologically relevant signals. Here, we present ATLAS, a novel strategy for taxonomic annotation that uses significant outliers within database search results to group sequences in the database into partitions. These partitions capture the extent of taxonomic ambiguity within classification of a sample. The ATLAS pipeline can be found on GitHub [https://github.com/shahnidhi/outlier_in_BLAST_hits]. We demonstrate that ATLAS provides similar annotations to phylogenetic placement methods, but with higher computational efficiency. When applied to human microbiome data, ATLAS is able to identify previously characterized taxonomic groupings, such as those in the class Clostridia and the genus Bacillus. Furthermore, the majority of partitions identified by ATLAS are at the sub-genus level, replacing higher-level annotations with specific groups of species. These more precise partitions improve our detection power in determining differential abundance in microbiome association studies.
Keywords: microbiome, Taxonomy, Classification, 16S rRNA marker gene, high-throughput sequencing
Received: 23 Apr 2019;
Accepted: 24 Sep 2019.
Copyright: © 2019 Shah, Meisel and Pop. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Dr. Mihai Pop, University of Maryland, College Park, Department of Computer Science, College Park, 20742, Maryland, United States, firstname.lastname@example.org