Priority Intervention Targets Identified Using an In-Depth Sampling HIV Molecular Network in a Non-Subtype B Epidemics Area

Molecular network analysis based on the genetic similarity of HIV-1 is increasingly used to guide targeted interventions. Nevertheless, there is a lack of experience regarding molecular network inferences and targeted interventions in combination with epidemiological information in areas with diverse epidemic strains of HIV-1.We collected 2,173 pol sequences covering 84% of the total newly diagnosed HIV-1 infections in Shenyang city, Northeast China, between 2016 and 2018. Molecular networks were constructed using the optimized genetic distance threshold for main subtypes obtained using sensitivity analysis of plausible threshold ranges. The transmission rates (TR) of each large cluster were assessed using Bayesian analyses. Molecular clusters with the characteristics of ≥5 newly diagnosed cases in 2018, high TR, injection drug users (IDUs), and transmitted drug resistance (TDR) were defined as priority clusters. Several HIV-1 subtypes were identified, with a predominance of CRF01_AE (71.0%, 1,542/2,173), followed by CRF07_BC (18.1%, 393/2,173), subtype B (4.5%, 97/2,173), other subtypes (2.6%, 56/2,173), and unique recombinant forms (3.9%, 85/2,173). The overall optimal genetic distance thresholds for CRF01_AE and CRF07_BC were both 0.007 subs/site. For subtype B, it was 0.013 subs/site. 861 (42.4%) sequences of the top three subtypes formed 239 clusters (size: 2-77 sequences), including eight large clusters (size ≥10 sequences). All the eight large clusters had higher TR (median TR = 52.4/100 person-years) than that of the general HIV infections in Shenyang (10.9/100 person-years). A total of ten clusters including 231 individuals were determined as priority clusters for targeted intervention, including eight large clusters (five clusters with≥5 newly diagnosed cases in 2018, one cluster with IDUs, and two clusters with TDR (K103N, Q58E/V179D), one cluster with≥5 newly diagnosed cases in 2018, and one IDUs cluster. In conclusion, a comprehensive analysis combining in-depth sampling HIV-1 molecular networks construction using subtype-specific optimal genetic distance thresholds, and baseline epidemiological information can help to identify the targets of priority intervention in an area epidemic for non-subtype B.

Molecular network analysis based on the genetic similarity of HIV-1 is increasingly used to guide targeted interventions. Nevertheless, there is a lack of experience regarding molecular network inferences and targeted interventions in combination with epidemiological information in areas with diverse epidemic strains of HIV-1.We collected 2,173 pol sequences covering 84% of the total newly diagnosed HIV-1 infections in Shenyang city, Northeast China, between 2016 and 2018. Molecular networks were constructed using the optimized genetic distance threshold for main subtypes obtained using sensitivity analysis of plausible threshold ranges. The transmission rates (TR) of each large cluster were assessed using Bayesian analyses. Molecular clusters with the characteristics of ≥5 newly diagnosed cases in 2018, high TR, injection drug users (IDUs), and transmitted drug resistance (TDR) were defined as priority clusters. Several HIV-1 subtypes were identified, with a predominance of CRF01_AE (71.0%, 1,542/ 2,173), followed by CRF07_BC (18.1%,393/2,173), subtype B (4.5%, 97/2,173), other subtypes (2.6%, 56/2,173), and unique recombinant forms (3.9%, 85/2,173). The overall optimal genetic distance thresholds for CRF01_AE and CRF07_BC were both 0.007 subs/site. For subtype B, it was 0.013 subs/site. 861 (42.4%) sequences of the top three subtypes formed 239 clusters (size: 2-77 sequences), including eight large clusters (size ≥10 sequences). All the eight large clusters had higher TR (median TR = 52.4/100 person-years) than that of the general HIV infections in Shenyang (10.9/100 personyears). A total of ten clusters including 231 individuals were determined as priority clusters for targeted intervention, including eight large clusters (five clusters with≥5 newly diagnosed cases in 2018, one cluster with IDUs, and two clusters with TDR (K103N, Q58E/V179D), one cluster with≥5 newly diagnosed cases in 2018, and one IDUs cluster. In conclusion, a comprehensive analysis combining in-depth sampling HIV-1 molecular INTRODUCTION Molecular epidemiology analysis with viral sequences among HIV-1-infected patients has long been used to explore the origin of HIV and to track the course of HIV spread (Aldous et al., 2012;Wertheim et al., 2014;Wertheim et al., 2017).In recent years, many studies inferred HIV-1 transmission clusters among populations using phylogenetic (Ragonnet-Cronin et al., 2013) or genetic distance-based methods (Kosakovsky Pond et al., 2018). Some studies even used these methods to inform near real-time interventions for HIV-1 infections (Pasquale et al., 2018). On-line tools for molecular cluster inference are available, including the HIV Transmission Cluster Engine (HIV-TRACE) (Kosakovsky Pond et al., 2018), a simplified genetic distance (GD)-based program developed by Kosakovsky Pond, that rapidly processes tens of thousands of data to infer potential HIV transmission clusters among a population. The partial HIV-1 pol gene is most commonly used for molecular network analysis because of the rich data from HIV drug resistance testing worldwide (Hassan et al., 2017). Most transmission network studies have been conducted in the USA (Little et al., 2014;Wertheim et al., 2017;Prevention and Prevention, 2017) and Europe (Chaillon et al., 2017;Stecher et al., 2018;Chaillon et al., 2019), where subtype B acts as the predominant strain and has shown a rather stable transmission rate in recent years (Hightower et al., 2013).
By the end of 2018, approximately 1.25 million people were living with HIV in China (Lyu and Chen, 2019). China is plagued with complex HIV-1 strains. According to the latest National HIV Molecular Epidemiological Survey, 18 known HIV subtypes were detected, predominant of four subtypes, including CRF07_BC(41.9%), CRF01_AE(33.2%), CRF08_BC(10.9%), and subtype B (4.0%). Among men who have sex with men (MSM), CRF07_BC and CRF01_AE were all over 40%, while subtype B and new recombinants were all 5.6% (Zhong, 2019). These findings suggest that the HIV epidemic situation in China is quite different from those of the USA and Europe. Recently, several studies in China have inferred the HIV molecular networks of CRF01_AE using publicly available sequences around China (Wang et al., 2019), local MSM population (Chen et al., 2018;Yan et al., 2020), and patients suffering from virologic failure (Yuan et al., 2019). Nevertheless, these studies usually use a phylogenetic approach or follow the genetic distance threshold of subtype B and the sampling depth remains low. More importantly, there is a lack of molecular networks combined with epidemiological information to guide the accurate targeting of the intervention population.
Shenyang city is reported to have the second-largest MSM population in China (after Beijing) with an estimated 140,000 MSM (DH and J, 2006). In the last 10 years, MSM in Shenyang was the main severely affected population, accounting for more than 80% of new HIV infections and CRF01_AE and CRF07_BC were the most prevalent strains Han et al., 2013;Zhao et al., 2015). Moreover, there were close links between MSM in Shenyang and other regions in China according to previous reports (Han et al., 2013;Han et al., 2015). In the present study, molecular networks of major HIV pandemic strains were constructed retrospectively among over 90% of newly diagnosed patients between 2016-2018 in Shenyang, where diverse HIV subtypes and circulating recombination forms (CRFs) have been reported among the various high-risk populations (Zhao et al., 2011;Zhao et al., 2015). A transmission network study of the whole HIV-infected population combined with epidemiological data in Shenyang might offer experiences regarding transmission network-based targeted prevention in areas experiencing complex HIV-1 epidemics.

Study Population
A retrospective molecular epidemiology study was conducted in Shenyang city. Between 2016 and 2018, a total of 2,577 HIV patients living in Shenyang were newly diagnosed, from which 2,354 (91.3%, 2,354/2,577) cryopreserved HIV-positive plasma samples at the time of diagnosis were available for drug resistance detection and HIV-1 LAg-Avidity EIA. All participants were HIV treatment-naïve at the time of sample collection. The corresponding demographic data were collected simultaneously. The study was approved by the Institutional Review Board of China Medical University. Recent HIV infections (RHI) were determined using records of seroconversion within 6 months  or based on the results of a limiting antigen avidity enzyme immunoassay (HIV-1 LAg-Avidity EIA).
published method (Stecher et al., 2018). Only the first sequence was selected if more than one sequence was available for a participant. The subtype of each sequence was determined using the Maximum-Likelihood phylogeny analysis using Mega 7.0.14 (Kumar et al., 2016). All reference sequences were downloaded from the Los Alamos database (LANL). Maximum-Likelihood phylogenetic trees (GTR nucleotide substitution model) were constructed using the SH-aLRT and the ultrafast bootstrap in IQ-TREE (Nguyen et al., 2015) with 1,000 replicates. Trees were visualized using Figtree (version 1.4.2) (http://beast.bio.ed.ac.uk). The clusters with SH-aLRT >90 (Anisimova et al., 2011) and ultrafast bootstrap values >90 were considered as a lineage (Feng et al., 2013). To identify the potential recombinants, candidate sequences were analyzed using the Recombination Identification Program (RIP) v3.0 (https://www.hiv.lanl.gov/content/sequence/RIP/RIP.html).

Molecular Network Analyses
The molecular networks of three main subtypes, CRF01_AE, CRF07_BC, and B were inferred based on the nucleotide GD by HIV-TRACE (www.hivtrace.org) (Kosakovsky Pond et al., 2018). All pairwise distances were computed and a putative linkage between two individuals was considered whenever distance measures between two sequences (the Tamura-Nei 93 substitution model) fell below the GD threshold (Little et al., 2014). Clusters were defined as connected components of the network comprising ≥2 nodes.
To infer the high-resolution networks of the three main HIV-1 subtypes in Shenyang, we applied a sensitivity analysis across the range of epidemiologically plausible GD thresholds, from more conservative (0.001 subs/site) to more liberal thresholds (0.020 subs/site). High resolution was defined as the minimum GD threshold that could identify the maximum number (not size) of clusters in each subtype (Wertheim et al., 2017). We used HIV-TRACE (Kosakovsky Pond et al., 2018) to infer the molecular network using the above range of plausible GD thresholds.

Estimated Transmission Rate
We applied a transmission rate (TR) formula to analyze the dynamics of the large clusters (n ≥10 sequences) and to estimate HIV transmission efficiency within each large cluster over time. Bayesian molecular clock phylogenetic inference was conducted to estimate the ages of internal nodes within each cluster using BEAST v.2.6.2 (Bouckaert et al., 2019). A series of parameters (TN93 substitution model, a strict molecular clock, and Bayesian skyline prior to 10 million generations) were set as previously described (Oster et al., 2018). The TR was defined from phylogenetic inferences as to the number of transmission events in the cluster, divided by the total HIV-infected persontime (Oster et al., 2018). (no : people in cluster − 1) ∑ (all node ages) + longest node age = XX transm=100 person years

Statistical Analyses
Chi-square tests were used to compare the demographic characteristics and clinical data between the different subtypes. Multivariate logistic regression analysis was performed to identify factors associated with the population within the priority clusters. Categorical variables included demographics, HIV risk factors, and HIV-1 subtype. P-values <0.05 were considered statistically significant. The calculations of all statistical tests were performed using SPSS version 25.0. Graph of the parameters used to infer molecular networks were generated using GraphPad Prism 7.04 (GraphPad Software, La Jolla, CA, USA).

Inferring Molecular Networks of Three Major Subtypes
Three main subtypes (CRF01_AE, CRF07_BC, and B) accounted for 93.5% (2,032/2,173) of newly diagnosed patients during 2016-2018 in Shenyang city. Considering the different epidemic situations of the three main subtypes in Shenyang, and the highest resolution of molecular networks which could provide the best information to target HIV prevention interventions, we performed a sensitivity analysis across a range of GD thresholds to infer the HIV molecular networks for each of the three main HIV subtypes. For an example of CRF01_AE strains, the most conservative GD of 0.001 subs/site inferred 85 clusters (size range: 2-8), including 199 sequences (cluster rate: 12.9%). By increasing the GD threshold, the number of clusters, cluster size, and the links between clustering persons increased. The maximum cluster number (179, size range: 2-77) was obtained for a GD threshold of 0.007 subs/site. Above this threshold, clusters began to coalesce and the inferred network lost resolution. In other words, the number of inferred clusters began to decrease in conjunction with a sharp increase of clustering individuals (nodes increasing from 617 to 1382) and genetic links (the number of edges increasing from 749 to 16949) ( Figure 1). Similar to CRF01_AE, 0.007 subs/site was also determined as the optimal GD threshold for CRF07_BC strains, with which the maximum number of clusters of 46 were identified (size: 2-48). However, the optimal GD for subtype B (0.013 subs/site) was much higher than for CRF01_AE and CRF07_BC strains, with which 14 clusters (size: 2-14) were inferred ( Figure 1). Overall, 861 sequences (42.4%) formed 239 clusters (size: 2-77 sequences), including eight large clusters with more than ten members (four of CRF01_AE clusters, three of CRF07_BC clusters, and one of subtype B cluster). The two largest clusters were the CRF01_AE cluster and CRF07_BC cluster, with member number of 77 and 48, respectively. Patients with subtype B infection had a higher cluster rate (51.5%, 50/97) than did patients with CRF07_BC (49.4%, 194/393) (p = 0.700) or CRF01_AE (40.0%, 617/1,542) infection (p = 0.026).
The goal of inferring molecular networks is to capture more recent infection events. Therefore, we further explored the performance of the optimal threshold on the identification of RHI in clusters, compared to the GD thresholds (0.005/0.015 subs/site) of B subtype recommended by the guidelines (National Center for HIV/AIDS, JUNE 2018). We found that a slightly lower percentage of RHI were included in clusters with the optimal threshold (0.007 subs/site) than the conservative threshold (0.005 subs/site) for both CRF01_AE and CRF07_BC. However, for subtype B strains, over 10% of RHI was missed with the optimal threshold (0.013 subs/site) than with the strict threshold (0.005 subs/site), but were closed to the performance of the relaxed threshold (0.015 subs/site) ( Table 2).

Estimation of TR for Eight Large Clusters (Size≥10)
Estimation of TR can provide insight into the epidemic dynamics of transmission clusters (Oster et al., 2018). Therefore, we calculated the TR of the clusters with sequence number ≥10. The median TR of eight large clusters (01AE1 to 01AE4, 07BC18 to 07BC20, and B23) was 52.4/100 person-years (IQR 49.0-55.1 years), which was 4.8 times higher than that of the general TR level of Shenyang [10.9/100 person-years (Zhang et al., 2018)] and ten times higher than national estimates in 2018 [4.9/100 person-years (NCAIDS et al., 2018)]. Among the eight large clusters, a TDR cluster (07BC20) had the highest TR of 59.6 per 100 person-years (Table 3).

Priority Intervention Clusters Identified by Molecular Networks Construction Combined With Sociodemographic Information
To identify targets for intervention, we used some risk factors related to active transmission clusters (at least five new diagnosed cases in the cluster in the most recent 12-month period (National Center for HIV/AIDS, JUNE 2018) or high TR (Oster et al.,   Prevention and center, 2019)]. According to the results above, all eight larger clusters having high TR should be included as a priority for targeted intervention, which contained five clusters with ≥5 newly diagnosed cases in 2018 (01AE1,01AE2, 07BC18, 07BC20, B23), one cluster (01AE2) with IDUs, and two clusters with TDR (07BC19: Q58E/V179D, 100% and 07BC20: K103N, 66.7%, Figure 2). one cluster (01AE12) with ≥5 newly diagnosed cases in 2018 and one IDUs cluster (01AE6) were also identified as priority intervention clusters. In summary, a total of ten clusters (cluster 01AE1-4, 01AE6, 01AE12, 07BC18-20, and B23) including 231 individuals were identified as priority intervention targets. The characteristics of the clusters (size >5) including priority intervention clusters are shown in Table 3.

DISCUSSION
We determined the priority clusters for targeted interventions in an area with multiple HIV-1 strain epidemics using HIV molecular network inference with deep sampling sequences and optimized GD thresholds. It has been noted that the sampling depth is a key factor for the high resolution of transmission network inference; low sampling coverage might lead to missing the potential linkages among infections (Little et al., 2014), and even missing some important small molecular clusters with transmission potential. For example, in the present study, some of the eight cases carrying K103N mutations in cluster 07BC20 might have been missed. Therefore, at least 60% coverage of persons with diagnosed HIV infection is recommended in the guidelines (National Center for HIV/AIDS, JUNE 2018). In this study, pol sequences were obtained from 84% (2,173/2,577) of newly diagnosed HIV-infected cases during the study period, much higher than most molecular network studies in the world by far. The deep sampling viral sequences and the demographic information of all participants provided a solid foundation for further molecular network analysis. For countries with limited sources such as China, HIV drug resistance testing is not routine for all newly diagnosed HIV infections and is usually recommended only when antiviral therapy (ART) fails . From the view of both the medical care of patients and public health, HIV drug resistance testing should be recommended for all newly diagnosed HIV infections prior to initiation of ART. This is because it helps not only to understand the drug resistance among HIV-infected persons to guide the rational use of antiviral drugs, but it is also of great significance for the construction of local HIV molecular networks to understand local HIV transmission dynamics and to guide prevention and interventions of HIV infection.
The optimized pairwise GD threshold for the HIV molecular network is believed to be very important for better defining transmission clusters (National Center for HIV/AIDS, JUNE 2018). It has been reported that the evolutionary rate of HIV-1 differs across subtypes (Patino-Galindo and Gonzalez-Candelas,  2017), as well as in different epidemiological settings of the same genes (Salemi et al., 2001). For example, the evolutionary rate of HIV-1 subtype A was found to be 8.4 times lower in fast spread among IDUs than in slow spread among MSM (Maljkovic Berry et al., 2007). Therefore, a single GD threshold would likely be unfit for diverse HIV-1 subtypes, especially when these networks were used to target prevention efforts. Three main subtypes (CRF01_AE, CRF07_BC, and B, 93.5%) in Shenyang were the focus of this study and the epidemic situation, and their demographic characteristics were quite different ( Table 1). We evaluated a GD threshold that would give the maximal number of clusters, i.e., highest resolution of the network with the sequences available. We found that the optimal GD threshold for CRF01_AE and CRF07_BC was distinctly different from that of subtype B and was more conservative than that of subtype B. We also found that similar to the conservative GD threshold (0.005 subs/site), the optimal GD threshold (0.007subs/site) in CRF01_AE and CRF07_BC could be used to identify the higher rate of RHI within defined clusters than the liberal GD threshold (0.015 subs/site) ( Table 2). The reason may be that they spread quickly and have been the top two prevalent subtypes, accounting for >80% of HIV-1 infected MSM in China (Han et al., 2013). The low GD threshold was close to the criteria for the recent and rapid growth of clusters (0.005 subs/site) provided by Oster et al. (Oster et al., 2018). It was not difficult to understand that the optimal GD threshold for the main subtypes in this study were similar with our previous study with the data collected before 2016 because of the similar HIV epidemics . Above all, a sensitivity analysis across plausible threshold ranges should be recommended for non-B subtypes before inferring an HIV molecular network. As an alternative strategy, a strict GD threshold (0.005 subs/site) could also be used for inferring HIV molecular network of CRF01_AE and CRF07_BC directly in Shenyang. Once inferred, molecular networks can identify characteristics of fast-spreading HIV epidemics (Aldous et al., 2012), which can be used to efficiently target prevention efforts. To make the targeted intervention targets more precise, we defined the priority intervention clusters in combination with some risk factors related to active transmission clusters and poor outcomes clusters (National Center for HIV/AIDS, JUNE 2018; Prevention and center, 2019), instead of roughly comparing the demographic information of individuals in and out of clusters. We also utilized transmission rates to estimate the large clusters (size ≥10) to FIGURE 2 | Phylogenetic analysis of subtype CRF07_BC. The phylogenetic tree was constructed using the maximum-likelihood method based on the pol region (HXB2: 2,253 to 3,300nt). HIV-1 subtype CRF01_AE was chosen as an out-group in the CRF07_BC rooted tree. Only the sequences within the two lineages of CRF07_BC (88.8%, 349/393) were included in this ML tree. The molecular cluster (pink) including the strains with K103N (green) located in the main lineage of CRF07_BC (red).
provide targets for intervention which has been previously used to study HIV phylodynamics (Holtgrave et al., 2012). This could be an effective way to prevent new infections by providing more effective interventions in various ways in addition to ART (such as enhanced HIV testing or pre-exposure prophylaxis within identified risk populations or rapid ART or enhanced ART adherence to HIV infections identified within priority clusters (Wang et al., 2015)) to the individuals in the priority intervention clusters. In this study, all eight large clusters were included in the priority clusters because of multiple risk factors. For example, clusters 01AE1, 01AE2, 07BC18, and 07BC20 had both more newly diagnosed HIV-1 infected cases in 2018 (>5 cases) and high TR. Moreover, clusters 01AE2 and 01AE6 contained 80% and 66.7% of IDUs, respectively. Cluster 07BC19 was a TDR cluster with Q58E/V179D. Q58E could confer to low-level resistance to TPV (tipranavir) which is not recommended in China. V179D is a polymorphic accessory NNRTI-selected mutation and may not affect NNRTI sensitivity alone (Rhee et al., 2003). This cluster was identified as priority cluster because of these drug-resistant mutations and the higher transmission rate. Cluster 07BC20 was a TDR cluster with K103N. The latest study showed NNRTIassociated mutations including K103N/S had high percentages in ART-naïve and ART-treated individuals in China because of the currently available first-line ART regimens containing EFV or NVP (Zuo et al., 2020). Phylogenetic analysis showed that strains with K103N were concentrated in the main lineage of CRF07_BC among MSM circulated in Liaoning (Figure 2) (Han et al., 2013), and the median age of all individuals in this cluster was only 25 years (range 22-36); six individuals with K103N were diagnosed at RHI stage (75%, 6/8) and this TDR cluster also had the highest TR. These were favorable factors for HIV transmission. This could be the first report of concentrated transmission of HIV drug-resistant strains in China.
These results suggest that the infections in the priority intervention clusters with multiple risk factors might be the important high-risk individuals for HIV transmission in Shenyang and should deserve more attention. Some medium (5<size<10) or small clusters (size ≤5) might also have the potential for further expansion. Due to the limited number of HIV infections in this cluster in a relatively short time, the evaluation criteria of the priority intervention clusters cannot be reached in a timely fashion. According to the proportion of RHI in the cluster (>50%), we can estimate their potential for further expansion and carry out real-time monitoring in the future.

LIMITATION
There are still some limitations in this study. First, the delay between HIV diagnosis and sequence acquisition could limit network-based intervention. Second, in this study, only baseline information of all infected individuals was analyzed. Lack of more follow-up information including clinical data and treatment information could lead to missing some ART failure individuals. Third, the inferred transmission network from viral sequences does not definitively determine epidemiologic linkage or imply direct transmission. At last, not all the HIV infected individuals could be found and included in this molecular network analysis, therefore, real-world HIV transmission may be underestimated by our molecular network analysis, including the transmission rate of molecular cluster.

CONCLUSION
Overall, we identified priority targets for intervention efforts in Shenyang using a comprehensive analysis combining in-depth sampling HIV-1 molecular networks construction and baseline epidemiological information and provided some useful experience regarding using molecular epidemiological methods for complex local HIV epidemics with multiple circulating HIV-1 subtypes.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Institutional Review Board of China Medical University. The patients/participants provided their written informed consent to participate in this study.