<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel xmlns:content="http://purl.org/rss/1.0/modules/content/">
        <title>Frontiers in Bioinformatics | Evolutionary Bioinformatics section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/bioinformatics/sections/evolutionary-bioinformatics</link>
        <description>RSS Feed for Evolutionary Bioinformatics section in the Frontiers in Bioinformatics journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator,version:1</generator>
        <pubDate>2026-05-11T17:48:23.293+00:00</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1839097</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1839097</link>
        <title><![CDATA[Correction: Protein embeddings reveal a continuous molecular landscape of host adaptation in waterfowl parvoviruses]]></title>
        <pubdate>2026-04-16T00:00:00Z</pubdate>
        <category>Correction</category>
        <author>Nihui Shao</author><author>Yunfei Guo</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1821711</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1821711</link>
        <title><![CDATA[Editorial: Evolution of short genomic regions: discoveries, methods, and challenges]]></title>
        <pubdate>2026-03-18T00:00:00Z</pubdate>
        <category>Editorial</category>
        <author>Helen Piontkivska</author><author>Fabia Ursula Battistuzzi</author><author>Tzu-Chiao Chao</author><author>Nicole Hansmeier</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2026.1735360</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2026.1735360</link>
        <title><![CDATA[Evolutionary dynamics of early and late mutational signatures in metastatic cancer]]></title>
        <pubdate>2026-03-05T00:00:00Z</pubdate>
        <category>Brief Research Report</category>
        <author>Anastasia Yankovskiy</author><author>Sudhir Kumar</author><author>Sayaka Miura</author>
        <description><![CDATA[Cancer genomes accumulate somatic mutations over time, influenced by both intrinsic and extrinsic mutational processes. In metastatic cancer, disseminated tumor cells may acquire additional mutations at metastatic sites, shaped by extrinsic factors distinct from those at the primary tumor. As a result, cancer genomes at metastatic sites may bear mutational signatures originating from both primary and metastatic environments. However, the patterns and relative contributions of mutational signatures specific to metastatic sites remain poorly understood. To investigate this, we analyzed mutational signatures from seven metastatic cancer patients. We observed distinct mutational patterns between early and late mutation profiles within individual patients, where the early and late categories were based on their relative timing during tumor evolution. Early mutations were often dominated by a single mutational signature that accounted for more than half of the total signature burden. These dominant signatures tended to be shared among tumors of the same cancer type, suggesting that early mutations in metastatic cancers may be shaped by a single, highly active mutational process at the primary tumor site. In contrast, late mutations were often more poorly decomposed into distinct mutational signatures, reflecting more complex and diverse compositions. Overall, early mutations tended to preserve clearer signals of their origin.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1738737</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1738737</link>
        <title><![CDATA[Protein embeddings reveal a continuous molecular landscape of host adaptation in waterfowl parvoviruses]]></title>
        <pubdate>2026-01-27T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Nihui Shao</author><author>Yunfei Guo</author>
        <description><![CDATA[Viral adaptation across closely related hosts often proceeds through subtle molecular changes that escape detection by classical phylogenetic analyses. In waterfowl parvoviruses, we integrate AI-based protein language modeling, structural biophysics, and infection assays to reveal a continuous trajectory of host adaptation linking Goose parvovirus (GPV) and Muscovy duck parvovirus (MDPV). Protein embeddings of VP1 sequences reveal a smooth manifold bridging GPV and MDPV, which softens an apparent phylogenetic dichotomy into a graded molecular topology. Structural modeling identifies a flexible surface loop (residues 300–420) as a biophysical pivot. Along the embedding trajectory, this loop undergoes gradual conformational expansion and electrostatic neutralization, quantitatively linking embedding coordinates to capsid surface remodeling. Experimentally, a GPV-type isolate recovered from naturally diseased ducks replicated efficiently in duck embryos, duck embryo fibroblasts, and live ducklings, producing characteristic lesions. These results show that waterfowl parvoviruses evolve along a continuous molecular–electrostatic landscape in which cumulative structural adjustments enable cross-host infectivity. Our framework connects AI-derived molecular representations to biophysical mechanisms and biological function, supporting a model of viral host adaptation as a predominantly continuous process and providing a foundation for predicting cross-host potential in emerging viral systems.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1704212</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1704212</link>
        <title><![CDATA[Assessment of phylogenetic informativeness in mitochondrial and nuclear genes for mammalian systematics using sparse learning]]></title>
        <pubdate>2026-01-08T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Carlos G. Schrago</author><author>Beatriz Mello</author>
        <description><![CDATA[Despite the growing availability of nuclear genomic data, mitochondrial genes remain the most widely used molecular markers in mammalian systematics. However, a quantitative assessment of the phylogenetic information content of mitochondrial loci compared to nuclear loci has never been carried out. Here, we apply a sparse learning approach based on Lasso regression to evaluate the contribution of alignment sites to phylogenetic likelihoods, providing the first estimates of phylogenetically effective lengths for markers commonly used in mammalian systematics. Analyzing more than 30,000 complete mammalian mitochondrial genomes and nuclear panels composed of either 100 randomly selected complete coding sequences or of partial gene segments from conventional markers, we examined phylogenetic informativeness at two taxonomic levels: within-species and among-species. On average, ∼32% of mitochondrial sites and ∼38% of nuclear sites were classified as phylogenetically informative. We found that the number of phylogenetically informative sites was positively correlated with total gene length. Therefore, longer mitochondrial genes, particularly ND5, COX1, and CYTB, harbored the largest numbers of informative sites. Although nuclear coding sequences contained, on average, more informative sites, mitochondrial genes also yielded consistent resolution of among-species relationships. Overall, our results provide the first large-scale, quantitative comparison of phylogenetic information content across mammalian mitochondrial and nuclear genes, offering a principled framework for marker selection in future systematics studies that can be broadly applied to any lineage.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1673480</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1673480</link>
        <title><![CDATA[Higher frequency of prokaryotic low complexity regions in core and orthologous genes]]></title>
        <pubdate>2025-11-27T00:00:00Z</pubdate>
        <category>Brief Research Report</category>
        <author>Vineet Saravanan</author><author>Alexander Kravetz</author><author>Fabia Ursula Battistuzzi</author>
        <description><![CDATA[Prokaryotic genome evolution is shaped by mutation, gene duplication, and horizontal gene transfer, yet the interaction of these mechanisms, particularly in relation to low complexity regions (LCRs), remains poorly understood. LCRs are known to be mutation-prone and have been proposed to promote genetic innovation. However, the interaction between LCR-mediated and paralogy-mediated genetic innovation is still unclear. To clarify the interplay between these two evolutionary forces, we analyzed the distribution of LCRs in protein-coding genes from three closely related enterobacteria (Escherichia coli, Salmonella enterica, and Klebsiella pneumoniae) at both species and population levels. Using pangenomic and orthology-based approaches, we categorized genes by duplication history and conservation status and assessed LCR frequencies across these groups. We found that LCRs were consistently enriched in core and orthologous genes rather than in accessory or paralogous ones. This pattern was stable across evolutionary timescales and particularly pronounced in genes involved in cell cycle control and defense. These results suggest that, contrary to prior assumptions, LCRs may serve conserved functional roles rather than acting primarily as agents of evolutionary plasticity even at population-level timescales.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1710926</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1710926</link>
        <title><![CDATA[Completing a molecular timetree of Afrotheria]]></title>
        <pubdate>2025-11-19T00:00:00Z</pubdate>
        <category>Brief Research Report</category>
        <author>Jack M. Craig</author><author>Whitney L. Fisher</author><author>Allan S. Thomas</author><author>S. Blair Hedges</author><author>Sudhir Kumar</author>
        <description><![CDATA[Afrotheria, the superorder that includes aardvarks, elephants, elephant shrews, hyraxes, manatees, and tenrecs, is home to some of the most charismatic and well-studied animals on Earth. Here, we assemble a nearly taxonomically complete molecular timetree of Afrotheria using an integrative approach that combines a literature search for published timetrees, de novo dating of untimed molecular phylogenies, and inference of timetrees from new alignments. The resulting timetree sheds light on the role of the Cretaceous-Paleogene (K-Pg) event ∼66 million years ago in the diversification of Afrotherian orders. The earliest divergence in the timetree of Afrotherian mammals predates the K-Pg event by 12 million years, followed by five interordinal divergences that occurred gradually over a 16-million-year period encompassing the K-Pg event.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1563786</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1563786</link>
        <title><![CDATA[Internal fossil constraints have more effect on the age estimates of crown Palaeognathae than different phylogenomic data type]]></title>
        <pubdate>2025-08-07T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Alexandre Pedro Selvatti</author><author>Naoko Takezaki</author>
        <description><![CDATA[Palaeognathae is an ancient bird lineage that includes the volant tinamous and six flightless lineages: ostrich, rhea, cassowary, emu, kiwi (extant) and moa, elephant bird (extinct). Over the past decade, a consensus has emerged on the relationships within the group. In this consensus, the ostrich branch splits first, followed by rheas, a clade containing tinamou and moa, and a clade with the emu and cassowary sister to the kiwi and elephant bird. However, the timing of the origin of these major clades remains uncertain. In phylogenomic studies, the origin of the crown Palaeognathae is typically dated to the K–Pg boundary (∼66 Ma), though one study suggested a younger Early Eocene age (∼51 Ma). This discrepancy might result from the number and position of fossil priors (calibration strategies) or from differences in the genomic regions sampled (data types). We investigated the impact of calibration strategies and data types on the timing of the Palaeognathae root using genomic sequences from nuclear (noncoding [CNEE and UCE] and coding [first and second codon positions]) and mitogenomic datasets. The nuclear dataset included 14 Palaeognathae species (13 extant and the extinct moa), while the mitogenomic dataset included 31 species, covering all extant and extinct lineages. The datasets were analyzed with and without internal calibrations. The age estimates were more influenced by calibration strategy than data type, although some nuclear data (CNEE) produced substantially younger ages except for the Casuariiformes node, whilst another dataset (PRM) from a previous study estimated younger ages for Casuariiformes compared to the other datasets. Nevertheless, our results consistently placed the origin of crown Palaeognathae around the K–Pg boundary (62–68 Ma), even when using the original dataset that produced the Eocene age. 
These findings demonstrate that multiple internal calibrations yield consistent results across different sequence types and taxon schemes, providing robust estimates of the crown Palaeognathae age. This improved timing enhances our understanding of the early evolutionary history of this clade, particularly regarding the placement of enigmatic Paleocene fossils, such as Lithornithidae and Diogenornis, which in this timeframe can be assigned to internal branches within the crown Palaeognathae.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1571568</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1571568</link>
        <title><![CDATA[Integrating phylogenies with chronology to assemble the tree of life]]></title>
        <pubdate>2025-04-30T00:00:00Z</pubdate>
        <category>Methods</category>
        <author>Jose Barba-Montoya</author><author>Jack M. Craig</author><author>Sudhir Kumar</author>
        <description><![CDATA[Reconstructing the global Tree of Life necessitates computational approaches to integrate numerous molecular phylogenies with limited species overlap into a comprehensive supertree. Our survey of published literature shows that individual phylogenies are frequently restricted to specific taxonomic groups due to investigators’ expertise and molecular evolutionary considerations, resulting in any given species being present in only a minuscule fraction of phylogenies. We present a novel approach, called the chronological supertree algorithm (Chrono-STA), that can build a supertree of species from such data by using node ages in published molecular phylogenies scaled to time. Chrono-STA builds a supertree by integrating chronological data from molecular timetrees. It fundamentally differs from existing approaches that generate consensus phylogenies from gene trees with missing taxa, as Chrono-STA does not impute nodal distances, use a guide tree as a backbone, or reduce phylogenies to quartets. Analyses of simulated and empirical datasets show that Chrono-STA can combine taxonomically restricted timetrees with extremely limited species overlap. For such data, approaches that impute missing distances or assemble phylogenetic quartets did not perform well. We conclude that integrating phylogenies via the temporal dimension enhances the accuracy of reconstructed supertrees that are also scaled to time.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1532981</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1532981</link>
        <title><![CDATA[Transcripts derived from AmnSINE1 repetitive sequences are depleted in the cortex of autism spectrum disorder patients]]></title>
        <pubdate>2025-04-09T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Nicolina Sciaraffa</author><author>Daniele Santoni</author><author>Andrea Li Greci</author><author>Swonild Ilenia Genovese</author><author>Claudia Coronnello</author><author>Walter Arancio</author>
        <description><![CDATA[Aims: Autism spectrum disorder (ASD) is a brain developmental disability with a not-fully clarified etiogenesis. Current ASD research largely focuses on coding regions of the genome, but to date much less is known about the contribution of non-coding elements to ASD risk. The non-coding genome is largely made of DNA repetitive sequences (RS). Although RS were once considered little more than “junk DNA”, today RS have a recognized role in almost every aspect of human biology, especially in the developing human brain. Our aim was to test whether RS transcription may play a role in ASD. Methods: Global RS transcription was first investigated in postmortem dorsolateral prefrontal cortex of 13 ASD patients and 39 matched controls. Results were validated in independent datasets. Results: AmnSINE1 was the only RS significantly downregulated in ASD specimens. The role of AmnSINE1 in ASD has been investigated at multiple levels, showing that the 1,416 genes containing AmnSINE1 are associated with nervous system development and autism susceptibility. This has been confirmed in a different experimental setting, such as in organoid models of the human cerebral cortex, harboring different ASD causative mutations. AmnSINE1 related genes are transcriptionally co-regulated and are involved not only in brain formation but can specifically be involved in ASD development. Looking for a possible direct role of AmnSINE1 non-coding transcripts in ASD, we report that AmnSINE1 transcripts may alter the miRNA regulatory landscape for genes involved in neurogenesis. Conclusion: Our findings provide preliminary evidence supporting a role for AmnSINE1 in ASD development.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2025.1491735</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2025.1491735</link>
        <title><![CDATA[Divergent evolution of low-complexity regions in the vertebrate CPEB protein family]]></title>
        <pubdate>2025-03-20T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Serena Vaglietti</author><author>Stefania Boggio Bozzo</author><author>Mirella Ghirardi</author><author>Ferdinando Fiumara</author>
        <description><![CDATA[The cytoplasmic polyadenylation element-binding proteins (CPEBs) are a family of translational regulators involved in multiple biological processes, including memory-related synaptic plasticity. In vertebrates, four paralogous genes (CPEB1-4) encode proteins with phylogenetically conserved C-terminal RNA-binding domains and variable N-terminal regions (NTRs). The CPEB NTRs are characterized by low-complexity regions (LCRs), including homopolymeric amino acid repeats (AARs), and have been identified as mediators of liquid-liquid phase separation (LLPS) and prion-like aggregation. After their appearance following gene duplication, the four paralogous CPEB proteins functionally diverged in terms of activation mechanisms and modes of mRNA binding. The paralog-specific NTRs may have contributed substantially to such functional diversification but their evolutionary history remains largely unexplored. Here, we traced the evolution of vertebrate CPEBs and their LCRs/AARs focusing on primary sequence composition, complexity, repetitiveness, and their possible functional impact on LLPS propensity and prion-likeness. We initially defined these composition- and function-related quantitative parameters for the four human CPEB paralogs and then systematically analyzed their evolutionary variation across more than 500 species belonging to nine major clades of different stem age, from Chondrichthyes to Euarchontoglires, along the vertebrate lineage. We found that the four CPEB proteins display highly divergent, paralog-specific evolutionary trends in composition- and function-related parameters, primarily driven by variation in their LCRs/AARs and largely related to clade stem ages. 
These findings shed new light on the molecular and functional evolution of LCRs in the CPEB protein family, in both quantitative and qualitative terms, highlighting the emergence of CPEB2 as a proline-rich prion-like protein in younger vertebrate clades, including Primates.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1495417</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1495417</link>
        <title><![CDATA[Completing a molecular timetree of primates]]></title>
        <pubdate>2024-12-16T00:00:00Z</pubdate>
        <category>Brief Research Report</category>
        <author>Jack M. Craig</author><author>S. Blair Hedges</author><author>Sudhir Kumar</author>
        <description><![CDATA[Primates, consisting of apes, monkeys, tarsiers, and lemurs, are among the most charismatic and well-studied animals on Earth, yet there is no taxonomically complete molecular timetree for the group. Combining the latest large-scale genomic primate phylogeny of 205 recognized species with the 400-species literature consensus tree available from TimeTree.org yields a phylogeny of just 405 primates, with 50 species still missing despite having molecular sequence data in the NCBI GenBank. In this study, we assemble a timetree of 455 primates, incorporating every species for which molecular data are available. We use a synthetic approach consisting of a literature review for published timetrees, de novo dating of untimed trees, and assembly of timetrees from novel alignments. The resulting near-complete molecular timetree of primates allows testing of two long-standing alternate hypotheses for the origins of primate biodiversity: whether species richness arises at a constant rate, in which case older clades have more species, or whether some clades exhibit faster rates of speciation than others, in which case, these fast clades would be more species-rich. Consistent with other large-scale macroevolutionary analyses, we found that the speciation rate is similar across the primate tree of life, albeit with some variation in smaller clades.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1433995</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1433995</link>
        <title><![CDATA[Time-calibrated phylogeny of neotropical freshwater fishes]]></title>
        <pubdate>2024-12-06T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Victor A. Tagliacollo</author><author>Milton Tan</author><author>Roberto E. Reis</author><author>Ronielson Gaia</author><author>Virgilio Carrijo</author><author>Marcelo Ranuzi</author><author>Jack M. Craig</author><author>James S. Albert</author>
        <description><![CDATA[Neotropical Freshwater Fish (NFF) fauna exhibits the greatest phenotypic disparity and species richness among all continental aquatic vertebrate faunas, with more than 6,345 species distributed across the mostly tropical regions of Central and South America. The last two decades have seen a proliferation of molecular phylogenies, often at the species level, covering almost all 875 valid NFF genera. This study presents the most comprehensive genome-wide, time-calibrated phylogenetic hypothesis of NFF species to date, based on DNA sequences generated over decades through the collaborative efforts of the multinational ichthyological research community. Our purpose is to build and curate an extensive molecular dataset allowing researchers to evaluate macroevolutionary hypotheses in the NFF while facilitating continuous refinement and expansion. Using thousands of DNA sequences from dozens of studies, we compiled a supermatrix of 51 markers for 5,984 taxa, representing 3,167 NFF species. Based on this dataset, we built the most species-rich time-calibrated phylogeny of the NFF taxa to date, summarizing the collective efforts of the ichthyological research community since the midpoint of the last century. We provide a summary review of this remarkable evolutionary history and hope this dataset provides a framework for forthcoming studies of the NFF fauna, documenting compelling, emergent patterns in the world’s most diverse continental vertebrate fauna.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1441373</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1441373</link>
        <title><![CDATA[A time-calibrated phylogeny of the diversification of Holoadeninae frogs]]></title>
        <pubdate>2024-10-02T00:00:00Z</pubdate>
        <category>Brief Research Report</category>
        <author>Júlio C. M. Chaves</author><author>Fábio Hepp</author><author>Carlos G. Schrago</author><author>Beatriz Mello</author>
        <description><![CDATA[The phylogeny of the major lineages of Amphibia has received significant attention in recent years, although evolutionary relationships within families remain largely neglected. One such overlooked group is the subfamily Holoadeninae, comprising 73 species across nine genera and characterized by a disjunct geographical distribution. The lack of a fossil record for this subfamily hampers the formulation of a comprehensive evolutionary hypothesis for their diversification. Aiming to fill this gap, we inferred the phylogenetic relationships and divergence times for Holoadeninae using molecular data and calibration information derived from the fossil record of Neobatrachia. Our inferred phylogeny confirmed most genus-level associations, and molecular dating analysis placed the origin of Holoadeninae in the Eocene, with subsequent splits also occurring during this period. The climatic and geological events that occurred during the Oligocene-Miocene transition were crucial to the dynamic biogeographical history of the subfamily. However, the wide highest posterior density intervals in our divergence time estimates are primarily attributed to the absence of Holoadeninae fossil information and, secondarily, to the limited number of sampled nucleotide sites.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1400003</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1400003</link>
        <title><![CDATA[AUTO-TUNE: selecting the distance threshold for inferring HIV transmission clusters]]></title>
        <pubdate>2024-07-10T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Steven Weaver</author><author>Vanessa M. Dávila Conn</author><author>Daniel Ji</author><author>Hannah Verdonk</author><author>Santiago Ávila-Ríos</author><author>Andrew J. Leigh Brown</author><author>Joel O. Wertheim</author><author>Sergei L. Kosakovsky Pond</author>
        <description><![CDATA[Molecular surveillance of viral pathogens and inference of transmission networks from genomic data play an increasingly important role in public health efforts, especially for HIV-1. For many methods, the genetic distance threshold used to connect sequences in the transmission network is a key parameter informing the properties of inferred networks. Using a distance threshold that is too high can result in a network with many spurious links, making it difficult to interpret. Conversely, a distance threshold that is too low can result in a network with too few links, which may not capture key insights into clusters of public health concern. Published research using the HIV-TRACE software package frequently uses the default threshold of 0.015 substitutions/site for HIV pol gene sequences, but in many cases, investigators heuristically select other threshold parameters to better capture the underlying dynamics of the epidemic they are studying. Here, we present a general heuristic scoring approach for tuning a distance threshold adaptively, which seeks to prevent the formation of giant clusters. We prioritize the ratio of the sizes of the largest and the second largest cluster, maximizing the number of clusters present in the network. We apply our scoring heuristic to outbreaks with different characteristics, such as regional or temporal variability, and demonstrate the utility of using the scoring mechanism’s suggested distance threshold to identify clusters exhibiting risk factors that would have otherwise been more difficult to identify. For example, while we found that a 0.015 substitutions/site distance threshold is typical for US-like epidemics, recent outbreaks like the CRF07_BC subtype among men who have sex with men (MSM) in China have been found to have a lower optimal threshold of 0.005 to better capture the transition from injected drug use (IDU) to MSM as the primary risk factor. 
Alternatively, in communities surrounding Lake Victoria in Uganda, where there has been sustained heterosexual transmission for many years, we found that a larger distance threshold is necessary to capture a more risk factor-diverse population with sparse sampling over a longer period of time. Such identification may allow for more informed intervention action by respective public health officials.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1381540</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1381540</link>
        <title><![CDATA[The evolution of mammalian Rem2: unraveling the impact of purifying selection and coevolution on protein function, and implications for human disorders]]></title>
        <pubdate>2024-06-24T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Alexander G. Lucaci</author><author>William E. Brew</author><author>Jason Lamanna</author><author>Avery Selberg</author><author>Vincenzo Carnevale</author><author>Anna R. Moore</author><author>Sergei L. Kosakovsky Pond</author>
        <description><![CDATA[Rad And Gem-Like GTP-Binding Protein 2 (Rem2), a member of the RGK family of Ras-like GTPases, is implicated in Huntington’s disease and Long QT Syndrome and is highly expressed in the brain and endocrine cells. We examine the evolutionary history of Rem2 identified in various mammalian species, focusing on the role of purifying selection and coevolution in shaping its sequence and protein structural constraints. Our analysis of Rem2 sequences across 175 mammalian species found evidence for strong purifying selection in 70% of non-invariant codon sites, which is characteristic of essential proteins that play critical roles in biological processes and is consistent with Rem2’s role in the regulation of neuronal development and function. We inferred epistatic effects in 50 pairs of codon sites in Rem2, some of which are predicted to have deleterious effects on human health. Additionally, we reconstructed the ancestral evolutionary history of mammalian Rem2 using protein structure prediction of extinct and extant sequences, which revealed how substitutions that change the gene sequence of Rem2 can impact protein structure in variable regions while maintaining core functional mechanisms. By understanding the selective pressures and the protein and gene interactions that have shaped the sequence and structure of the Rem2 protein, we gain a stronger understanding of its biological and functional constraints.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1351620</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1351620</link>
        <title><![CDATA[ursaPGx: a new R package to annotate pharmacogenetic star alleles using phased whole-genome sequencing data]]></title>
        <pubdate>2024-03-12T00:00:00Z</pubdate>
        <category>Technology and Code</category>
        <author>Gennaro Calendo</author><author>Dara Kusic</author><author>Jozef Madzo</author><author>Neda Gharani</author><author>Laura Scheinfeldt</author>
        <description><![CDATA[Long-read sequencing technologies offer new opportunities to generate high-confidence phased whole-genome sequencing data for robust pharmacogenetic annotation. Here, we describe a new user-friendly R package, ursaPGx, designed to accept multi-sample VCF input files of phased whole-genome sequencing data and to output star allele annotations for pharmacogenes annotated in PharmVar.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2024.1305969</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2024.1305969</link>
        <title><![CDATA[MetaWin 3: open-source software for meta-analysis]]></title>
        <pubdate>2024-02-08T00:00:00Z</pubdate>
        <category>Technology and Code</category>
        <author>Michael S. Rosenberg</author>
        <description><![CDATA[The rise of research synthesis and systematic reviews over the last 25 years has been aided by a series of software packages providing simple, accessible graphical user interfaces (GUIs) that are intuitive for novice analysts and users. Development of many of these packages has been abandoned over time due to a variety of factors, leaving a gap in the software infrastructure available for meta-analysis. To fulfill the continued demand for a GUI-based meta-analytic system, we have now released MetaWin 3 as free, open-source, multi-platform software. MetaWin 3 is written in Python and was developed from scratch rather than built on earlier versions. The codebase is available on GitHub, with pre-compiled executables for both Windows and macOS available from the MetaWin website. MetaWin includes standardized effect size calculations and exploratory and publication bias analyses, and it allows for both simple and complex explanatory models of variation within a meta-analytic framework, including meta-regression, using traditional least-squares/moments estimation.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2023.1284744</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2023.1284744</link>
        <title><![CDATA[Completing a molecular timetree of apes and monkeys]]></title>
        <pubdate>2023-12-15T00:00:00Z</pubdate>
        <category>Brief Research Report</category>
        <author>Jack M. Craig</author><author>Grace L. Bamba</author><author>Jose Barba-Montoya</author><author>S. Blair Hedges</author><author>Sudhir Kumar</author>
        <description><![CDATA[The primate infraorder Simiiformes, comprising Old and New World monkeys and apes, includes the best-studied species on Earth. Their most comprehensive molecular timetree, assembled from thousands of published studies, is found in the TimeTree database and contains 268 simiiform species. It is, however, missing 38 out of 306 named species in the NCBI taxonomy for which at least one molecular sequence exists in the NCBI GenBank. We developed a three-pronged approach to expanding the timetree of Simiiformes to contain 306 species. First, molecular divergence times were searched and found for 21 missing species in timetrees published across 15 studies. Second, untimed molecular phylogenies were searched and scaled to time using relaxed clocks to add four more species. Third, we reconstructed ten new timetrees from genetic data in GenBank, allowing us to incorporate 13 more species. Finally, we assembled the most comprehensive molecular timetree of Simiiformes containing all 306 species for which any molecular data exist. We compared the species divergence times with those previously imputed using statistical approaches in the absence of molecular data. The latter data-less imputed times were not significantly correlated with those derived from the molecular data. Also, using phylogenies containing imputed times produced different trends of evolutionary distinctiveness and speciation rates over time than those produced using the molecular timetree. These results demonstrate that more complete clade-specific timetrees can be produced by analyzing existing information, which we hope will encourage future efforts to fill in the missing taxa in the global timetree of life.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fbinf.2023.1233281</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fbinf.2023.1233281</link>
        <title><![CDATA[The origin of eukaryotes and rise in complexity were synchronous with the rise in oxygen]]></title>
        <pubdate>2023-09-01T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Jack M. Craig</author><author>Sudhir Kumar</author><author>S. Blair Hedges</author>
        <description><![CDATA[The origin of eukaryotes was among the most important events in the history of life, spawning a new evolutionary lineage that led to all complex multicellular organisms. However, the timing of this event, crucial for understanding its environmental context, has been difficult to establish. The fossil and biomarker records are sparse, and molecular clocks have thus far not reached a consensus, with dates spanning 2.1–0.91 billion years ago (Ga) for critical nodes. Notably, molecular time estimates for the last common ancestor of eukaryotes are typically hundreds of millions of years younger than the Great Oxidation Event (GOE, 2.43–2.22 Ga), leading researchers to question the presumptive link between eukaryotes and oxygen. We obtained a new time estimate for the origin of eukaryotes using genetic data of both archaeal and bacterial origin, the latter rarely used in past studies. We also avoided potential calibration biases that may have affected earlier studies. We obtained a conservative interval of 2.2–1.5 Ga, with an even narrower core interval of 2.0–1.8 Ga, for the origin of eukaryotes, a period closely aligned with the rise in oxygen. We further reconstructed the history of biological complexity across the tree of life using three universal measures: cell types, genes, and genome size. We found that the rise in complexity was temporally consistent with and followed a pattern similar to the rise in oxygen. This suggests a causal relationship stemming from the increased energy needs of complex life fulfilled by oxygen.]]></description>
      </item>
      </channel>
    </rss>