Local Epidemics Gone Viral: Evolution and Diffusion of the Italian HIV-1 Recombinant Form CRF60_BC

The molecular epidemiology of HIV-1 in Italy is becoming increasingly complex, mainly due to the spread of non-B subtypes and the emergence of new recombinant forms. We previously characterized the outbreak of the first Italian circulating recombinant form (CRF60_BC), occurring among young MSM living in Apulia between the years 2009 and 2011. Here we show a 5-year follow-up surveillance to trace the evolution of CRF60_BC and to investigate its further spread in Italy. We collected additional sequences and clinical data from patients harboring CRF60_BC, enrolled at the Infectious Diseases Clinic of the University of Bari. In addition to the 24 previously identified sequences, we retrieved 27 CRF60_BC sequences from patients residing in Apulia, whose epidemiological and clinical features did not differ from those of the initial outbreak, i.e., the Italian origin, young age at HIV diagnosis (median: 24 years; range: 18–37), MSM risk factor (23/25, 92%) and recent infection (from 2008 to 2017). Sequence analysis revealed a growing overall nucleotide diversity, with few nucleotide changes that were fixed over time. Twenty-seven additional sequences were detected across Italy, spanning multiple distant regions. Using a BLAST search, we also identified a CRF60_BC sequence isolated in United Kingdom in 2013. Three patients harbored a unique second generation recombinant form in which CRF60_BC was one of the parental strains. Our data show that CRF60_BC gained epidemic importance, spreading among young MSM in multiple Italian regions and increasing its population size in few years, as the number of sequences identified so far has triplicated since our first report. The observed further divergence of CRF60_BC is likely due to evolutionary bottlenecks and host adaptation during transmission chains. Of note, we detected three second-generation recombinants, further supporting a widespread circulation of CRF60_BC and the increasing complexity of the HIV-1 epidemic in Italy.


INTRODUCTION
The human immunodeficiency virus (HIV-1) is a well-known moving target. HIV-1 high variability and rapid adaptation impose an additional challenge to its diagnosis, treatment, and prevention (Hemelaar, 2013). However, HIV-1 evolutionary potential can be harnessed to understand its complex dynamics, both at cellular and population level. Thanks to viral diversity, molecular epidemiology allows to track HIV-1 global and local epidemics and provide insights on how to control their further diffusion (Hemelaar, 2012).
The pandemic group M spread worldwide and differentiated over time into nine distinct subtypes (A-D, F-H, J, and K) and 8 sub-subtypes (A1-A6, F1, and F2) (Peeters et al., 2014). In addition to HIV-1 error prone reverse transcriptase and high replicative potential, recombination plays a key role in its variability (Hu and Hughes, 2012). Due to recombination, when two or more subtypes co-circulate among high risk groups in the same geographical area, unique recombinant forms (URFs) emerge and, if they are detected in at least three epidemiologically unrelated patients, they gain the role of circulating recombinant forms (CRFs) (Robertson et al., 1995). The number of identified CRFs has grown dramatically; currently, 98 CRFs have been characterized and their virological and epidemiological data are available at the Los Alamos HIV database 1 . The first CRFs have been found mainly in central Africa and South America, where multiple subtypes co-circulate at high prevalence . In the past few years, some of these recombinant progenies have caused outbreaks (Monno et al., 2012;Gonzalez-Domenech et al., 2018) and in some cases have shown an increased fitness (Rawson et al., 2018). For example, the Cuban CRF19_cpx, has been associated with a higher viral load, a more rapid progression to AIDS (Kouri et al., 2015) and antiretroviral resistance mutations (Kouri et al., 2014). HIV-1 M chimeric forms, URFs and CRFs altogether, are estimated to represent at least 20% of infections globally, with CRF01_AE and CRF02_AG being the most prevalent, accounting for about 13% of all infections (Osmanov et al., 2002). Despite their origin, some CRFs gained epidemiological relevance in previously B subtype-restricted countries with lower HIV-1 prevalence. Within the last years, multiple new CRFs have been identified in European countries such as Russia (Liitsola et al., 1998;Lukashov et al., 1999;Baryshev et al., 2014), Portugal, Spain (Delgado et al., 2002;Fernandez-Garcia et al., 2016), Italy (Simonetti et al., 2014), United Kingdom (Foster et al., 2014), Luxembourg (Struck et al., 2015), and France (Leoz et al., 2013;Recordon-Pinson et al., 2018).
The molecular epidemiology of HIV-1 in Italy, previously a B-restricted area, has become complex, as migratory waves and travels favored the penetration of HIV-1 forms from countries where non-B subtypes prevail (Lai et al., , 2014a(Lai et al., ,b, 2016. A comprehensive study reported that non-B variants increased dramatically since the beginning of the epidemic with a prevalence of 15.8 and 29.7% of all HIV-1 diagnoses from 2006 to 2016 (Rossetti et al., 2018). After subtype B, the three 1 https://www.hiv.lanl.gov/content/sequence/HIV/CRFs/CRFs.html most abundant pure subtypes in Italy were F, A and C, and CRFs represented almost 50% of non-B forms (Lai et al., 2010). Monno et al. (2012) reported a sudden HIV-1 outbreak among young men-who-have-sex-with-men (MSM) residing in Apulia, a region of southern Italy. This outbreak was caused by an intersubtype recombinant, which was identified as the first Italian CRF, named CRF60_BC, by a subsequent study (Simonetti et al., 2014). This recombinant form is mostly composed of subtype C with the exception of three fragments from a parental B subtype which covered a part of the 5 LTR (HXB2 positions 0-462), a small region of RT pol gene (HXB2 positions 2,804-3,037) and the last part of env gene. Nef gene and most of the 3 LTR (HXB2 positions 8,662-9,548) were also subtype B. At the time of its characterization, the outbreak involved 22 individuals residing in Apulia and additional few patients from Emilia-Romagna, Tuscany and Sicily, all diagnosed between 2009 and 2011.
Here, we present a follow up study in which we investigated the further circulation and evolution of CRF60_BC. We combined epidemiological data with phylogenetic analyses on sequences retrieved in Apulia and those available from a nationwide database. We observed an increasing population size and evolution of CRF60_BC, whose spread resulted also in novel recombination events. Overall, the present study portrays an increasingly complex and dynamic landscape of the HIV-1 Italian epidemic.

Patients
We collected sequences and clinical data from patients, harboring CRF60_BC, enrolled at the Infectious Diseases Clinic of the University of Bari. We found additional sequences from the national ARCA cohort. Putative CRF60_BC sequences were searched among those assigned to subtype C or CRF31_BC in pol gene, according to a preliminary subtyping based on BLAST. As the CRF60_BC contains a small portion of B subtype in this region, the BLAST search lead to a subtype misclassification. Sequences were retrieved from the genotypic drug resistance assay performed by the local Services at diagnosis or prior to the start of therapy or at treatment failure. Only the first available resistance genotype was considered for downstream analyses. Genotypic resistance from plasma HIV-RNA was determined by bulk Sanger sequencing using commercially available or homebrew methods, depending on the contributing laboratory.
ARCA is an observational HIV cohort approved by the Regional Ethical Committee of Tuscany (Comitato Etico Area Vasta Toscana Sudest). The study was conducted in accordance with the 1964 Declaration of Helsinki and the ethical standards of the Italian Ministry of Health. The national ARCA database, currently contained demographic and viroimmunological data of 40,654 patients enrolled at 90 Clinical Centers, and at list one sequence before the start of antiretroviral therapy for all subjects. All patients belonging to the ARCA database provided informed consent to have their anonymized data stored on a central server and used for research studies. For all patients, epidemiological data (gender, risk category, country of origin, date of diagnosis, and age) were collected by physicians from medical records and then included in the databases together with virological, immunological, and treatment information through standard procedures cleared by the Southeast Tuscany Ethics Committee. When available, analyses were conducted on multiple HIV-1 regions: protease (PRO), reverse transcriptase (RT), integrase (INT), gp120 and gp41. The estimated time from infection was calculated by computing the fraction of ambiguous nucleotides in the PRO-RT sequences for all patients, as previously reported (Kouyos et al., 2011).

Alignment and Phylogenetic Analysis
BioEdit version 7.2.6.1 2 was used to manually align query sequences with pure subtypes and CRF reference sequences. The alignment of a comprehensive list of reference sequences of different subtypes (A1, A2, B, C, D, F1, F2, G, H, J, K) was retrieved from the Los Alamos HIV database 3 . In particular, we included 47, 37, 36, and 37 reference sequences for PRO/RT, INT, gp120, and gp41 fragments, respectively. The evolutionary model that best fitted the data was selected using the information criterion implemented in JmodelTest v2.1.7 (Posada, 2008). We then conducted phylogenetic analyses for subtyping on more than 10,000 sequences from the national ARCA cohort with maximum likelihood approach implemented in MEGA v7.0 program to retrieve additional CRF60_BC circulating in Italy.
The monophyletic nature of the sequences was evaluated by means of the MrBayes program (Huelsenbeck and Ronquist, 2001) using a general-time reversible (GTR) or Hasegawa-Kishino-Yano (HKY) model of nucleotide substitution, a proportion of invariant sites, and gamma distributed rates among sites for PRO, RT, gp120 and gp41 regions, respectively. For INT region HKY plus a proportion of invariant sites was used. The final alignment encompassed 1,259, 1,029, 514, and 858 nucleotides for PRO/RT, gp120, gp41, and INT regions, respectively.
A Markov Chain Monte Carlo (MCMC) search was made for 2 × 10 6 generations using tree sampling every 100 th generation and a highly conservative burn-in fraction of 50% to obtain a better signal of convergence. The analysis was run until reaching the average standard deviation < 0.01. Statistical support for specific clades was obtained by calculating the posterior probability of each monophyletic clade, and a posterior consensus tree was generated after a 50% burn-in.
Pairwise distances were measured with MEGA v7.0 (Kumar et al., 2016), as the p-distance of each taxa from the CRF60_BC sequence harbored by the patient with the oldest time of diagnosis (patient BAV275, year 2005), with variance estimation performed using 1,000 bootstrap replicates. Spearman test of correlation between genetic distance and time of diagnosis was conducted with GraphPad Prism v7.0. Nucleotide differences across CFR60_BC sequences from Apulia were plotted with Highlighter (Keele et al., 2008) 4 , with sequences sorted according to tree topology; ambiguous IUPAC codes were excluded from the graph.
Similarity and bootscanning plots implemented in Simplot software v3.5.2 5 were used to identify the subtypes involved in the recombination and their breakpoints to reconstruct recombination pattern. Maximum Likelihood phylogenetic trees were constructed for individual segments identified within each breakpoint with bootstrapping on 1,000 replicates using MEGA v7.0 program. Recombination was also characterized by split decomposition method and neighbor-net network generated with SplitsTree program 6 . Final trees were visualized and annotated with FigTree v1.4.0 7 .
HIV BLAST tool 8 was used to retrieve sequences showing high similarity and the same BC mosaic pattern within the Los Alamos HIV Sequence Database. The prediction of HIV-1 co-receptor usage was performed using Geno2Pheno 9 .

Phylodynamic Analysis
Dated trees of the CRF60_BC portions (INT, gp120, and gp41) not containing recombination breakpoints were estimated by BEAST software v1.8.4 (Drummond and Rambaut, 2007) using the previously selected models. For this purpose, only sequences with known date of collection were used. Chains were run for 10-30 million generations with sampling every 1,000-3,000 generations. The results were visualized in Tracer v1.5 and uncertainty in the estimates was indicated by 95% highest probability density (HPD) intervals. Convergence was assessed based on the ESS (effective-sample size) value and only parameter estimates with ESS > 300 were accepted. The maximum clade credibility (MCC) tree was then selected from the posterior tree distribution using TreeAnnotator v1.8.4 available in the BEAST software package. Final trees were visualized and edited with FigTree v1.4.0.

Evolution of the CRF60_BC Outbreak in Apulia
We first investigated whether the CRF60_BC had spread further locally, where the original outbreak occurred in 2009-2011. In addition to the 24 sequences previously described (Monno et al., 2012), we retrieved from local clinical centers, 27 new CRF60_BC sequences from patients residing in Apulia. The epidemiological and clinical features of the patients carrying this variant did not differ from those of the initial outbreak (Table 1). Similarly to the patients previously described, these individuals were Italian (100%) with young age at HIV-1 diagnosis (median: 26 years; range: 18-53) and all but one were MSM (25/26, 96%). The majority of these patients had a recent diagnosis in 2012-2017 and a CD4+ T-cell count at diagnosis above Male 91.7 (22) 100 (27)  350 cells/µl in 23 out 27 cases (median: 506 cells/µl; range: 46-1,187). Three subjects (BAV718, BAV731, BAV851) had levels of plasma HIV-RNA above 10 6 copies/ml, suggesting that they were diagnosed before reaching a viral set point. These data support that most of the patients carrying the CRF60_BC variant were infected recently, during the last few years after the initial outbreak. However, we identified three subjects, BAV275, BAV334, BAV420, whose HIV-1 infection occurred before the initial outbreak, as their diagnosis dated back to 2005, 2006, respectively (Simonetti et al., 2014. These patients were recorded in the ARCA database after the characterization of CRF60_BC. Last negative test was available for 10 subjects and confirmed a seroconversion time within 12 months in six cases. The analysis of the fraction of ambiguous nucleotides provided strong evidence of a recent infection (<1 year) for 77.8% (n = 21) of subjects. The Bayesian tree of patients from Apulia is shown in Figure 1A. Twelve out 27 (44.4%) recently identified strains showed high similarity with the majority of previous described sequences (15/24, 62.5%). We identified six significant clusters supported by a posterior probability higher than 0.80. Two of them corresponded to known partners or epidemiological links (pp = 1, BA889-BA921, BAV563-BAV564). The largest cluster (pp = 1) involved one of the sequences identified in 2011 and eight newly detected strains (year of diagnosis ranging between 2012 and 2017). Another large cluster (pp = 0.83) included four previously described CRF60_BC sequences and one sequence identified in 2012. The remaining two clusters (pp = 1 and pp = 0.98, respectively) involved two strains each; the first included two new sequences (BA840 and BA753) and the second a previously described strain (BA667) with a new one (BA765). The emergence of new cases of HIV-1 infection due to CRF60_BC could be explained both by an increase of diagnoses after the initial outbreak (more effective sampling) and actual onward dissemination. To further understand whether CRF60_BC has been involved in ongoing transmission chains, we looked at sequence diversity and accumulation of mutations over time (Figures 1B,C). We observed the emergence of non-synonymous mutations strongly associated with CD8+ T-cell immune pressure against epitopes in gag (HXB2 position 498) and pol (HXB2 positions 69, 72, 128, and 314) (Addo et al., 2003;Allen et al., 2005). Some mutations were fixed over time, like in the case of the large transmission cluster encompassing sequences from 2011 to 2017, in which we observed reversion to subtype B consensus of a known conserved pol epitope (HXB2 pol 314) associated with HLA-I responses (Addo et al., 2003;Apps et al., 2013). In addition, when we estimated pairwise distances of all sequences (n = 47) from the oldest isolate (2005), we detected a growing overall diversity that strongly correlated with time of HIV-1 diagnosis (r = 0.82, p < 0.0001), further supporting ongoing evolution of CRF60_BC over time.

Diffusion of CRF60_BC Outside Apulia
To investigate the further diffusion of CRF60_BC, we conducted subtyping analyses on the ARCA database retrieving 25 Frontiers in Microbiology | www.frontiersin.org sequences belonging to CRF60_BC (data not shown). Figure 2 shows the Bayesian tree containing 78 sequences: the original and the newly identified sequences from Apulia (total, n = 51), sequences from different Italian regions obtained from the ARCA database (n = 25) and two sequences detected using BLAST search (GU969560 and MF109718).
Sequences assigned to CRF60_BC formed a highly supported cluster (pp = 1). Twenty-five sequences were detected across the Italian territory (Rome, Florence, Modena, Arezzo, Fermo, Pavia, Genoa, Perugia, Trento, and Turin). Some sub clusters were observed, containing 2 to 9 strains. Sequences from Apulia segregated alone, except for four sequences. Two strains (BA840 and BA753) grouped significantly (pp = 1) with a sequence from Modena (MOV-MO-1890) while the remaining two (BA889 and BA921) formed a sub cluster with a strain from Perugia (SI-19344) and a sequence detected from a patient in United Kingdom (MF109718). This latter sequence, retrieved by BLAST, had no demographic information apart from the sampling date (November 2013), preventing us to rule out whether the patient acquired the infection in his/her country of origin or abroad. All sequences from Rome (n = 7), with the exception of RM9-03323, formed a significant (pp = 0.82) sub cluster. Three sequences from Tuscany (one from Siena: SI-12520 and two from Florence: FIV-7490 and FIV-7502) significantly grouped (pp = 0.95) with a sequence from central Italy (ANV-1610-229). A second strain was found by BLAST, isolated in Sicily from an African man with unknown risk factors.
The epidemiological characteristics of subjects carrying CRF60_BC residing outside Apulia (Table 1) did not differ substantially from those residing in Apulia. All but one patients were males (22/23, 96%), with a median age of 25 years (range: 21-52), diagnosed from 2009 to 2014 and, for subjects with known risk factors, predominantly MSM (13/14, 92.8%). The last negative HIV-1 test, available only for three patients, suggested a recent seroconversion (less than 12 months) in two cases. However, the analysis of the fraction of ambiguous nucleotides indicated that the majority of patients (59.3%, n = 16) had been infected for more than 1 year.

Emergence of Second Generation Recombinant Forms
Three patients harbored URFs in which CRF60_BC was one of the parental strains, as indicated by the phylogenetic trees shown in Supplementary Figures S1, S2, S3A. Figure 3 shows the bootscan analysis of these sequences and the CRF60_BC bootscan of pol region. Protease and RT regions of two of these sequences clustered significantly with CRF60_BC confirming the recombinant BC pattern, but different recombination patterns were detected in INT for strain TNV-0015 ( Figure 3A) and in gp120 for BA712, in which CRF60_BC is subtype C. The bootscan analysis identified a CRF60/G mosaic pattern with two breakpoints at positions 4,848 and 4,907 (as mapped to HXB2 reference) with a 60 base-pair fragment of subtype G in the middle of the integrase region. Recombination analysis on BA712 identified a CRF60/B recombinant with one breakpoint at position 6,710 ( Figure 3B). This recombination structure in gp120 resulted in a switch in co-receptor viral tropism from CCR5 to CXCR4. The sequence SRV8810, showed a B/CFR60 mosaicism in the PRO-RT region ( Figure 3C) with a portion belonging to B subtype at the 3 end of protease (position 2,489). Supplementary Figures S1, S2, S3B,C show the maximum likelihood trees for individual segments identified within each breakpoint confirming the recombination pattern for all recombinants. Despite the presence of TNV-0015 (Supplementary Figure S1C) and SRV8810 (Supplementary Figure S3B) in subtype G and B clades respectively, these clusters are not bootstrap supported due to the limited length of the analyzed fragments.

Phylodynamic Analysis
Evolutionary population dynamics were estimated based on the data sets of non-recombinant regions (gp120 and gp41) of CRF60_BC sequences from Apulia with known data. The INT portion was not evaluated due to the limited availability of sequences. After testing different coalescent priors (constant population size, exponential growth, logistic growth, and Bayesian skyline plot), using both strict and relaxed molecular clock models, the Bayes factor (BF) analysis showed that the relaxed lognormal molecular clock fitted the data better than the strict clock model (2lnBF = 114.5) and the Bayesian skyline plot fitted the data better than the other models. The estimated mean rate of evolution was 7 × 10 −3 substitutions per site per year (s/s/y) (95%HPD: 5.1 × 10 −3 -9.2 × 10 −3 ) and 6.2 × 10 −3 s/s/y (95%HPD: 3.4 × 10 −3 -9.3 × 10 −3 ) for the gp120 and gp41, respectively. The time of the most recent common ancestor (tMRCA) corresponding to the root of the trees was estimated to be 11.4 years before the latest sampling data for both regions (95%HPD: 11-12.85 years), corresponding to mid 2004, very close to the year of diagnosis of the oldest sequence in our data set (BAV275, 2005).
The Bayesian Skyline plot (Supplementary Figure S4) shows the changes in population size at different times from the root of the tree to the time of the most recent samples (year 2016 and 2017). The estimated effective number of infections exponentially grew from the middle of 2006 to 2011, when the initial outbreak was identified; after this period, the curve reached a plateau and remained relatively constant until the latest calendar years.

DISCUSSION
Although the first CRFs have been found in Africa and mainly Central Africa, inter-subtype recombination events have also occurred in other continents, leading to the spread of new CRFs. Several CRFs have been described in Europe, as different non-B subtypes circulate together with the founder B clade and are detected at increasing prevalence almost in all countries (Tebit and Arts, 2011). Several European CRFs show a mosaic structure due to multiple recombination events with different parental subtypes; however, the B clade is the most frequent parental subtype. Indeed, HIV-1 superinfection, which underlies the recombination events, has been frequently reported in MSM due to the numerous risk contacts (Redd et al., 2013). As the B variant had a founder effect within intravenous drug users in some European countries including Italy (Lukashov et al., 1996), some of these CRFs (CRF03_AB and CRF14_BG) mostly segregate in this specific high risk population (Hemelaar, 2013). Some recombinant forms (i.e., CRF05_DF and CRF47_BF) have generated limited epidemics both in frequency and regional dispersion (Fernandez-Garcia et al., 2010), while others (CRF03_AB and CRF14_BG) have had a relevant impact on the HIV-1 epidemic, both in the countries of first detection and neighboring countries (Liitsola et al., 1998;Delgado et al., 2002).
We previously described a new CRF epidemic in a single Italian region (Apulia) mainly involving young MSM seeking sexual partners through on-line dating websites, who acquired HIV-1 infection in 2009-2011 and were diagnosed soon after seroconversion (Monno et al., 2012;Simonetti et al., 2014). The subtype C regions included in CRF60_BC are highly similar to the South American variant, suggesting that CRF60_BC originated in South America where subtypes B and C largely co-circulate and other BC inter-subtype recombinants have been documented (Hemelaar, 2013). Alternatively, CRF60_BC might have originated in Italy where, subtype C, mostly harbored by recent immigrants and Italian individuals who traveled to endemic regions, accounts for 13% of non-B variants (Lai et al., 2010). Additionally, our group recently conducted a phylogeographic analysis revealing that the occurrence of subtype C in Italy is related to the South American variant, likely due to transmission chains among MSM (Lai et al., 2014a). To follow-up the epidemic potential of the originally described CRF60_BC, we traced its spread through a large Italian database collecting HIV-1 sequences prospectively.
CRF60_BC was originally limited to the Apulia region but then reached distant regions of Italy. Increased sequence diversity after the initial outbreak suggests evolutionary bottlenecks along the transmission chains and adaptation to the host immune pressure. This observation is supported by the fact that some mutations, predominantly non-synonymous mutations, were fixed overtime and that the growing diversity among strains significantly correlates with the time of HIV-1 diagnosis. In this study, most CRF60_BC variants circulate among young MSM who recently acquired HIV-1 infection during the 2005-2017 period. These characteristics did not differ from those reported for patients described in the initial outbreak, indicating that CRF60_BC circulation is actually closely linked to MSM. Due to the higher prevalence in this population and HIV-1 increasing genetic diversity, the MSM population might become a new recombination hotspot, as observed in previous studies in China reporting several CRFs among MSM population (Han et al., 2013;Wu et al., 2013;Zhang et al., 2014). Noteworthy, the number of sequences identified so far is three fold higher compared to our first report.
In addition, we detected three second-generation recombinants which further support a widespread circulation of CRF60_BC, all of which showed recombination in different portions of the HIV-1 genome. In one case, the recombination pattern involved subtype G, an uncommon subtype that is present only at 6% of non-B subtype strains (Lai et al., 2010). In another case, recombination in gp120 entailed the coreceptor switch to CXCR4, despite the recent acquisition of HIV-1 infection in 2012 confirmed by CD4+ T-cell count and viral load measure at diagnosis. By identifying new recombinant forms with a BC pattern in MSM, we demonstrated that these individuals are likely to represent a preferential group that may give rise to HIV-1 recombinants.
The finding of a patient carrying CRF60_BC diagnosed in 2005 indicates that the circulation of this strain probably began in the early 2000s. The phylodinamic analysis set the origin of CRF60_BC in the mid 2004 (range: 2003-2005) suggesting that this subject could be one of the first cases of CRF60_BC. The exponential growth of infections sustained by CRF60_BC was observed in 2006-2011 period when most patients were identified, suggesting a rapid spread among young MSM with recent infection through Italy.
This study has some limitations. First, information related to studied patients was not always available or complete. Second, the data set in the current study may not be representative of the entire country and we may have missed a number of recently infected individuals present in additional local networks. Regarding the unique recombinants, additional studies involving single genome amplification and full-length genome sequencing, are required to extend the characterization of the recombination patterns and to exclude the role of superinfection. Despite our analysis might represent an underestimate, our data strongly suggest that CRF60_BC gained epidemic importance, spreading in multiple Italian regions and increasing its population size in very few years (2007)(2008)(2009)(2010)(2011)(2012)(2013), among young MSM. This is valuable information for public health agencies developing strategies to prevent the HIV-1 epidemic to spread into a more complex epidemiological landscape. In addition, our results further highlight the need for better prevention campaigns in young MSM, who represent a population with a poor control of HIV-1 transmission (Volz et al., 2018). Finally, future studies should continue to focus on transmission clusters and pay special attention to recombinant forms, as these variants not only reflect the changing landscape of HIV-1 diversity, but can also unveil the onset of new epidemic bursts.

ETHICS STATEMENT
ARCA is an observational HIV cohort approved by the Regional Ethical Committee of Tuscany (Comitato Etico Area Vasta Toscana Sudest). The study was conducted in accordance with the 1964 Declaration of Helsinki and the ethical standards of the Italian Ministry of Health. Patients included on ARCA database signed an informed consent and agreed to have their anonymized data stored on a central server and used for research studies.

AUTHOR CONTRIBUTIONS
AL and FS conceived and designed the study. GB, GA, and LM collected and analyzed the epidemiological and viral data of patients from Apulia. SG, GS, CM, MZ, SM, and PB provided epidemiological and viral data of patients from ARCA database. AL, FS, AB, and CB wrote the first draft of the manuscript. All authors contributed to manuscript revision, read and approved the submitted version.