AUTHOR=Fiedler Lisa, Middendorf Martin, Bernt Matthias TITLE=Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs JOURNAL=Frontiers in Genetics VOLUME=Volume 14 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1250907 DOI=10.3389/fgene.2023.1250907 ISSN=1664-8021 ABSTRACT=A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefit from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the last few years. Accurate annotation of mitochondrial DNA is a prerequisite for any mitogenomic comparative analysis. To keep up with the growth of available mitochondrial sequence data, highly efficient automatic computational methods are hence needed. Automatic annotation methods are typically based on databases that contain knowledge on already annotated (and often pre-curated) mitogenomes of different species. However, the existing approaches have several shortcomings: (i) they do not scale well with the size of the database, (ii) they do not allow for a fast (and easy) update of the database, and/or (iii) they can be applied only to a relatively small taxonomic subset of all species. Here, we present a novel approach that does not have any of the shortcomings (i), (ii), and (iii). The reference database of mitogenomes is represented as a richly annotated de Bruijn graph. To generate gene predictions for a new user-supplied mitogenome, the method employs a clustering routine that uses the mapping information of the provided sequence to this graph. The method is implemented in a software package called DeGeCI. For a large set of mitogenomes for which expert-curated annotations are available, DeGeCI generates gene predictions of high conformity.
In a comparative evaluation with MITOS2, a state-of-the-art annotation tool for mitochondrial genomes, DeGeCI shows better database scalability while still matching MITOS2 in terms of result quality and providing a fully automated means to update the underlying database. Moreover, unlike MITOS2, DeGeCI can be run in parallel on several processors to make use of modern multi-processor systems.
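
The core idea of the abstract, an annotated de Bruijn graph onto which a new sequence is mapped, can be illustrated with a toy sketch. This is not DeGeCI's implementation: the function names `build_annotated_graph` and `predict_genes`, the interval format, and the simple "merge consecutive hits" step standing in for the clustering routine are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_annotated_graph(references, k):
    """Build a toy annotated de Bruijn graph (hypothetical helper).

    references: list of (sequence, genes), where genes is a list of
    half-open intervals (start, end, gene_name) on that sequence.
    Returns (edges, labels):
      edges  maps each (k-1)-mer to the set of its successor (k-1)-mers,
      labels maps each k-mer to the gene names that fully contain it.
    """
    edges = defaultdict(set)
    labels = defaultdict(set)
    for seq, genes in references:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Each k-mer is an edge between its prefix and suffix (k-1)-mers.
            edges[kmer[:-1]].add(kmer[1:])
            for start, end, gene in genes:
                if start <= i and i + k <= end:
                    labels[kmer].add(gene)
    return edges, labels


def predict_genes(query, labels, k):
    """Map query k-mers to the graph annotations and merge consecutive
    hits for the same gene into half-open intervals (start, end, gene).

    Assumption: this run-merging is a simplified stand-in for a real
    clustering routine and only handles exact k-mer matches.
    """
    open_runs = {}  # gene -> [start, end) of the run currently being extended
    predictions = []
    for i in range(len(query) - k + 1):
        hit = labels.get(query[i:i + k], ())
        # Close runs of genes that the current k-mer no longer supports.
        for gene in list(open_runs):
            if gene not in hit:
                start, end = open_runs.pop(gene)
                predictions.append((start, end, gene))
        # Extend existing runs or open new ones.
        for gene in hit:
            if gene in open_runs:
                open_runs[gene][1] = i + k
            else:
                open_runs[gene] = [i, i + k]
    for gene, (start, end) in open_runs.items():
        predictions.append((start, end, gene))
    return sorted(predictions)
```

With a single reference `"GGACGTTCAGATTGCCATAA"` annotated with a hypothetical gene `cox1` at `[2, 10)` and `nad1` at `[12, 18)`, and `k = 4`, calling `predict_genes("CCACGTTCAGCCTGCCATGG", labels, 4)` recovers both intervals on the query. Because the reference knowledge lives in the graph, adding a newly annotated mitogenome in this sketch only means inserting its k-mers and labels, which hints at why a graph representation can make database updates cheap.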