Your new experience awaits. Try the new design now and help us make it even better

ORIGINAL RESEARCH article

Front. Bioinform.

Sec. Evolutionary Bioinformatics

This article is part of the Research TopicAI in Evolutionary ScienceView all articles

Assessment of Phylogenetic Informativeness in Mitochondrial and Nuclear Genes for Mammalian Systematics Using Sparse Learning

Provisionally accepted
  • Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil

The final, formatted version of the article will be published soon.

Despite the growing availability of nuclear genomic data, mitochondrial genes remain the most widely used molecular markers in mammalian systematics. However, a quantitative assessment of the phylogenetic information content of mitochondrial loci compared to nuclear loci has never been carried out. Here, we apply a sparse learning approach based on Lasso regression to evaluate the contribution of alignment sites to phylogenetic likelihoods, providing the first estimates of phylogenetically effective lengths for markers commonly used in mammalian systematics. Analyzing more than 30,000 complete mammalian mitochondrial genomes and nuclear panels composed of either 100 randomly selected complete coding sequences or of partial gene segments from conventional markers, we examined phylogenetic informativeness at two taxonomic levels: within-species and among-species. On average, ~32% of mitochondrial sites and ~38% of nuclear sites were classified as phylogenetically informative. We found that the number of phylogenetically informative sites were positively correlated with total gene length. Therefore, longer mitochondrial genes, particularly ND5, COX1, and CYTB, harbored the largest numbers of informative sites. Although nuclear coding sequences contained, on average, more informative sites, mitochondrial genes also yielded consistent resolution of among-species relationships. Overall, our results provide the first large-scale, quantitative comparison of phylogenetic information content across mammalian mitochondrial and nuclear genes, offering a principled framework for marker selection in future systematics studies that can be broadly applied to any lineage.

Keywords: Phylogenetic informativeness, Phylogenetic signal, Mitochondrial Genome, species delimitation, molecular taxonomy, Sparse Learning, LASSO regression

Received: 12 Sep 2025; Accepted: 17 Dec 2025.

Copyright: © 2025 Schrago and Mello. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Carlos G Schrago
Beatriz Mello

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.