Technology and Code ARTICLE
SeqDeχ: a sequence deconvolution tool for genome separation of endosymbionts from mixed sequencing samples
- 1Department of Earth and Environmental Sciences, University of Pavia, Italy
- 2Department of Biosciences, Faculty of Science and Technology, University of Milan, Italy
- 3Pediatric Research Center Romeo and Enrica Invernizzi Hospital, Italy
- 4Department of Biology and Biotechnology Lazzaro Spallanzani, University of Pavia, Italy
- 5Department of Biology, University of Pisa, Italy
In recent years, the advent of NGS technologies has made genome sequencing much cheaper than in the past. Their high degree of parallelization and the possibility of sequencing more than one organism at once have opened the door to processing whole symbiotic consortia. However, this approach requires dedicated bioinformatic tools able to analyze such data. In this work we describe SeqDex, a tool that starts from a preliminary assembly obtained by sequencing a mixture of DNA from different organisms, and identifies the contigs that come from one organism of interest. SeqDex is a fully automated, machine learning-based tool that exploits partial taxonomic affiliations and compositional analysis to predict the taxonomic affiliation of each contig in an assembly.
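The abstract does not detail SeqDex's internals. To illustrate the general idea, contigs with a known taxonomic assignment can serve as training data, and their compositional signatures can then be used to assign the remaining contigs. The minimal Python sketch below follows that idea under stated assumptions. Tetranucleotide frequencies are assumed as the compositional feature. A simple nearest-centroid classifier stands in for SeqDex's actual machine learning model. The function names and the `host`/`symbiont` labels are hypothetical.

```python
from itertools import product
from math import dist

# Assumption: tetranucleotide (k = 4) frequencies as the compositional signature.
K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_profile(seq):
    """Return the relative frequency of every A/C/G/T 4-mer in seq."""
    counts = [0.0] * len(KMERS)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:  # skip k-mers containing N or other ambiguity codes
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def train_centroids(labelled):
    """labelled: {contig_id: (sequence, label)} for contigs with a known assignment."""
    sums, sizes = {}, {}
    for seq, label in labelled.values():
        acc = sums.setdefault(label, [0.0] * len(KMERS))
        for i, v in enumerate(kmer_profile(seq)):
            acc[i] += v
        sizes[label] = sizes.get(label, 0) + 1
    # Mean composition profile for each label.
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def classify(unlabelled, centroids):
    """Assign each unlabelled contig to the label with the closest centroid (Euclidean)."""
    assignments = {}
    for cid, seq in unlabelled.items():
        profile = kmer_profile(seq)
        assignments[cid] = min(centroids, key=lambda label: dist(profile, centroids[label]))
    return assignments
```

In use, the labelled set might consist of contigs that already have a confident taxonomic hit (the "partial taxonomic affiliations" mentioned above). `classify` would then propagate labels to the remaining contigs. A real tool would likely use richer features and a stronger classifier than this nearest-centroid stand-in.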
Few methods in the literature are able to deconvolve host-symbiont datasets, and most of them rely heavily on user curation and are therefore time-consuming. The problem closely resembles metagenomic studies, in which mixed samples are sequenced and the bioinformatic challenge is to separate contigs according to their source organism; in symbiotic systems, however, additional information can be exploited to improve the output. To assess the ability of SeqDex to deconvolve host-symbiont datasets, we compared it with state-of-the-art methods for metagenomic binning and for host-symbiont deconvolution on three case studies. The results show that SeqDex performs well; together with its ease of use and potential for customization, this makes it a useful tool for the rapid identification of endosymbiont sequences.
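The abstract does not state how the comparison between tools was scored. One common way to score a binary deconvolution against a known reference is sketched here purely as an assumption. It computes precision and recall of the contigs assigned to the organism of interest. These are weighted by contig length, so long contigs count more than short ones. The function name and the `symbiont` label are hypothetical.

```python
def score_deconvolution(predicted, truth, lengths, target="symbiont"):
    """Length-weighted precision, recall and F1 for contigs assigned to `target`.

    predicted, truth: {contig_id: label}; lengths: {contig_id: length in bp}.
    """
    tp = fp = fn = 0
    for cid, length in lengths.items():
        is_pred = predicted.get(cid) == target
        is_true = truth.get(cid) == target
        if is_pred and is_true:
            tp += length  # target sequence correctly recovered
        elif is_pred:
            fp += length  # non-target sequence wrongly assigned to the target
        elif is_true:
            fn += length  # target sequence that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Weighting by length is a design choice. Counting contigs instead would let many short, misassigned fragments dominate the score. Length-weighting reflects how much of the endosymbiont genome is actually recovered.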
Keywords: Binning, Symbiosis, deconvolution, machine learning, NGS
Received: 11 Jun 2019;
Accepted: 15 Aug 2019.
Copyright: © 2019 Chiodi, Comandatore, Sassera, Petroni, Bandi and Brilli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Dr. Alice Chiodi, Department of Earth and Environmental Sciences, University of Pavia, Pavia, 27100, Lombardy, Italy, firstname.lastname@example.org
PhD. Matteo Brilli, Department of Biosciences, Faculty of Science and Technology, University of Milan, Milano, 20122, Lombardy, Italy, email@example.com