Original Research ARTICLE
Taxonomically informed scoring enhances confidence in natural products annotation
- ¹Université de Genève, Switzerland
- ²Tokushima Bunri University, Japan
- ³University of Illinois at Chicago, United States
- ⁴Shahid Beheshti University, Iran
- ⁵Naresuan University, Thailand
The extensive characterization of metabolomes provides a better understanding of organisms and their interactions, and also enables the discovery of bioactive compounds that may lead to novel drugs with applications in human health. Such appealing perspectives come with significant challenges inherent to the complexity of the studied systems. Mass spectrometry (MS) offers unrivalled sensitivity for the metabolite profiling of complex biological matrices encountered in natural products (NP) research. The massive and complex sets of spectral data generated by such platforms require computational approaches for their interpretation. Computational metabolite annotation automatically links spectral data to candidate structures via a score, which is usually established between the acquired data and experimental or theoretical spectral databases (DB). This process yields several candidate structures for each MS feature. However, at this stage, obtaining a high level of annotation confidence remains a challenge, notably because of the extensive chemodiversity of specialized metabolomes. The design of a metascore is a way to capture complementary experimental attributes and improve the annotation process. Here, we show that integrating the taxonomic position of analyzed samples and candidate structures enhances confidence in metabolite annotation. A script is proposed to automatically input such information at various granularity levels (species, genus, and family) and to complement the score obtained between experimental spectral data and the output of available computational metabolite annotation tools (ISDB-DNP, MS-Finder, Sirius). In all cases, taking the taxonomic distance into account allowed an efficient re-ranking of the candidate structures, leading to a systematic enhancement of the recall and precision rates of the tools (1.5- to 7-fold increase in the F1 score).
Our results clearly demonstrate the importance of considering taxonomic information when annotating specialized metabolites. This requires access to structural data systematically documented with biological origin, both for new and previously reported NPs. In this respect, the establishment of an open structural DB of specialized metabolites and their associated metadata, particularly biological sources, is timely and critical for the NP research community.
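The re-ranking principle summarized in this abstract can be illustrated with a minimal Python sketch. It is not the script proposed in the article: the bonus weights, the `Candidate` structure, the additive combination of scores, and the candidate names are all hypothetical choices made for illustration. The sketch assumes that each candidate structure carries a normalized spectral score and a documented biological source, and that a bonus is added for the most specific taxon level (species, genus, or family) shared with the analyzed sample.

```python
from dataclasses import dataclass

# Hypothetical bonus per taxonomic level (illustrative values only,
# not the weights used in the article).
TAXON_BONUS = {"species": 3.0, "genus": 2.0, "family": 1.0}


@dataclass
class Candidate:
    name: str
    spectral_score: float  # assumed normalized to [0, 1]
    family: str
    genus: str
    species: str


def taxonomic_bonus(candidate, sample):
    """Return the bonus for the most specific taxon level shared with the sample."""
    for level in ("species", "genus", "family"):
        if getattr(candidate, level) == sample[level]:
            return TAXON_BONUS[level]
    return 0.0


def rerank(candidates, sample):
    """Sort candidates by spectral score plus taxonomic bonus, best first."""
    return sorted(
        candidates,
        key=lambda c: c.spectral_score + taxonomic_bonus(c, sample),
        reverse=True,
    )


# Taxonomy of the analyzed sample (example values).
sample = {"family": "Gentianaceae", "genus": "Gentiana", "species": "Gentiana lutea"}

candidates = [
    # Best spectral match, but reported in an unrelated family.
    Candidate("candidate_A", 0.9, "Fabaceae", "Glycine", "Glycine max"),
    # Weaker spectral match, reported in the same family.
    Candidate("candidate_B", 0.7, "Gentianaceae", "Swertia", "Swertia chirayita"),
    # Weakest spectral match, reported in the same species.
    Candidate("candidate_C", 0.6, "Gentianaceae", "Gentiana", "Gentiana lutea"),
]

ranked = rerank(candidates, sample)
for c in ranked:
    print(c.name, c.spectral_score + taxonomic_bonus(c, sample))
```

With these illustrative weights, the candidate reported in the same species as the sample moves to the top of the list even though its spectral score is the lowest. In practice the relative weight of the spectral and taxonomic terms would need to be tuned against benchmark data.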
Keywords: Metabolite annotation, Chemotaxonomy, taxonomically informed scoring system, Natural Products, Metabolomics, Taxonomic distance, computational metabolomics, Specialized Metabolism
Received: 14 Jul 2019;
Accepted: 24 Sep 2019.
Copyright: © 2019 Rutz, Dounoue-Kubo, Ollivier, Bisson, Bagheri, Saesong, Ebrahimi, Ingkaninan, Jean-Luc and Allard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Dr. Pierre-Marie Allard, Université de Genève, Geneva, 1211, Geneva, Switzerland, email@example.com