ORIGINAL RESEARCH article

Front. Bioinform.
Sec. Evolutionary Bioinformatics
Volume 4 - 2024 | doi: 10.3389/fbinf.2024.1400003
This article is part of the Research Topic Evolution of Short Genomic Regions: Discoveries, Methods, and Challenges

AUTO-TUNE: SELECTING THE DISTANCE THRESHOLD FOR INFERRING HIV TRANSMISSION CLUSTERS

Provisionally accepted
Steven Weaver 1, Vanessa M. Davila Conn 2, Daniel Ji 3, Hannah Verdonk 1, Santiago Ávila-Ríos 2, Joel Wertheim 4, Andrew J. Leigh Brown 5, Sergei L. Kosakovsky Pond 1*
  • 1 Center for Viral Evolution, Temple University, Philadelphia, United States
  • 2 Center for Research in Infectious Diseases, National Institute of Respiratory Diseases, Mexico City, Mexico
  • 3 Department of Computer Science and Engineering, Jacobs School of Engineering, University of California, San Diego, La Jolla, California, United States
  • 4 School of Medicine, University of California San Diego, La Jolla, California, United States
  • 5 School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom

The final, formatted version of the article will be published soon.

    Molecular surveillance of viral pathogens and inference of transmission networks from genomic data play an increasingly important role in public health efforts, especially for HIV-1. For many methods, the genetic distance threshold used to connect sequences in the transmission network is a key parameter that shapes the properties of the inferred networks. A distance threshold that is too high can produce a network with many spurious links, making it difficult to interpret. Conversely, a distance threshold that is too low can produce a network with too few links, which may fail to capture key insights into clusters of public health concern. Published research using the HIV-TRACE software package frequently uses the default threshold of 0.015 substitutions/site for HIV pol gene sequences, but in many cases investigators heuristically select other thresholds to better capture the underlying dynamics of the epidemic they are studying. Here, we present a general heuristic scoring approach for tuning a distance threshold adaptively, which seeks to prevent the formation of giant clusters. We prioritize the ratio of the sizes of the largest and second-largest clusters, while maximizing the number of clusters present in the network. We apply our scoring heuristic to outbreaks with different characteristics, such as regional or temporal variability, and demonstrate that the distance threshold suggested by the scoring mechanism helps identify clusters exhibiting risk factors that would otherwise have been more difficult to identify. For example, while we found that a distance threshold of 0.015 substitutions/site is typical for US-like epidemics, recent outbreaks such as that of the CRF07_BC subtype among men who have sex with men (MSM) in China have been found to have a lower optimal threshold of 0.005 substitutions/site, which better captures the transition from injection drug use (IDU) to MSM as the primary risk factor. In contrast, in communities surrounding Lake Victoria in Uganda, where heterosexual transmission has been sustained for many years, we found that a larger distance threshold is necessary to capture a population with more diverse risk factors that was sparsely sampled over a longer period of time. Identifying such clusters may allow public health officials to take more informed intervention action.
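
    The two quantities named in the abstract (the number of clusters and the ratio of the largest to the second-largest cluster size) can be illustrated with a small threshold scan. The sketch below is only an illustration under stated assumptions: it does not use HIV-TRACE, the function names, the toy distances, and the selection rule in pick_threshold (most clusters, ties broken by the smaller ratio) are all hypothetical, and the abstract does not give the actual scoring formula.

```python
import math
from collections import Counter


def cluster_sizes(n, edges, threshold):
    """Sizes (descending) of clusters with at least two sequences, where two
    sequences are linked if their pairwise distance is <= threshold."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, dist in edges:
        if dist <= threshold:
            parent[find(i)] = find(j)

    counts = Counter(find(i) for i in range(n))
    return sorted((s for s in counts.values() if s >= 2), reverse=True)


def scan(n, edges, thresholds):
    """Per threshold: number of clusters and largest/second-largest size ratio."""
    rows = []
    for t in thresholds:
        sizes = cluster_sizes(n, edges, t)
        # With fewer than two clusters the ratio is undefined; use infinity.
        ratio = sizes[0] / sizes[1] if len(sizes) > 1 else math.inf
        rows.append({"threshold": t, "clusters": len(sizes), "ratio": ratio})
    return rows


def pick_threshold(rows):
    """Hypothetical rule (an assumption, not the published score):
    most clusters first, then the smallest largest/second-largest ratio."""
    return max(rows, key=lambda r: (r["clusters"], -r["ratio"]))


# Toy pairwise distances in substitutions/site: (sequence i, sequence j, distance).
EDGES = [(0, 1, 0.004), (2, 3, 0.006), (4, 5, 0.012), (1, 2, 0.014), (3, 4, 0.020)]
ROWS = scan(6, EDGES, [0.005, 0.010, 0.015, 0.020])

if __name__ == "__main__":
    for row in ROWS:
        print(row)
    print("selected:", pick_threshold(ROWS)["threshold"])
```

    In this toy data the highest threshold merges every sequence into a single giant cluster, which is the behavior the heuristic is meant to avoid, while the lowest threshold leaves most sequences unlinked.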

    Keywords: Molecular Epidemiology, HIV, network, transmission cluster, surveillance

    Received: 12 Mar 2024; Accepted: 17 May 2024.

    Copyright: © 2024 Weaver, Davila Conn, Ji, Verdonk, Ávila-Ríos, Wertheim, Leigh Brown and Kosakovsky Pond. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Sergei L. Kosakovsky Pond, Center for Viral Evolution, Temple University, Philadelphia, United States
