Development and validation of a scoring system for the prediction of HIV drug resistance in Hubei province, China

Objective The present study aimed to build and validate a new nomogram-based scoring system for the prediction of HIV drug resistance (HIVDR). Design and methods A total of 618 patients with HIV/AIDS were included. The predictive model was created using a retrospective training set (N = 427) and internally validated with the remaining cases (N = 191). Multivariable logistic regression analysis was carried out to fit a model using candidate variables selected by least absolute shrinkage and selection operator (LASSO) regression. The predictive model was first presented as a nomogram, then transformed into a simple and convenient scoring system and tested in the internal validation set. Results The developed scoring system consisted of age (2 points), duration of ART (5 points), treatment adherence (4 points), CD4 T cells (1 point) and HIV viral load (1 point). With a cutoff value of 7.5 points, the AUC, sensitivity, specificity, PLR and NLR values were 0.812, 82.13%, 64.55%, 2.32 and 0.28, respectively, in the training set. The novel scoring system exhibited a favorable diagnostic performance in both the training and validation sets. Conclusion The novel scoring system can be used for individualized prediction of HIVDR in patients with HIV/AIDS. It has satisfactory accuracy and good calibration, which is beneficial for clinical practice.


Introduction
Antiretroviral therapy (ART) has decreased global mortality and morbidity, while also improving the life expectancy of people living with HIV (PLWH) (GBD 2017 HIV collaborators, 2019; Koay et al., 2021). Initially, ART was recommended only for immunologically suppressed patients with CD4 counts <200 cells/mm³, but it has been applied to all PLWH regardless of CD4 count since 2016 (World Health Organization, 2015; World Health Organization, 2016; Koay et al., 2021). However, with the widespread use of ART, the issue of HIV drug resistance is receiving more and more attention. HIV, as an RNA virus, is characterized by a high error rate during reverse transcription of its RNA genome, which produces viral strains carrying drug-resistant mutations. This is defined as acquired HIV drug resistance (ADR) (Wei et al., 1995; McCluskey et al., 2019; Giacomelli et al., 2020). In addition, ADR variants can be transmitted to untreated individuals, resulting in transmitted HIV drug resistance (TDR) (Zhukova et al., 2017; Blassel et al., 2021; World Health Organization, 2021b).
HIV drug resistance (HIVDR) is one of the major obstacles to combating the HIV epidemic, because it increases the number of PLWH whose infection remains transmissible. In 2021, the Joint United Nations Programme on HIV/AIDS (UNAIDS) released the "Fast-Track strategy" initiative with the following goals. The global consensus is that, by 2030, 95% of all PLWH should know their diagnosis, 95% of those diagnosed should receive treatment, and 95% of those on treatment should achieve sustained virologic suppression (Joint United Nations Programme on HIV/AIDS, 2015; Collier et al., 2019). To reach these goals, we need not only HIV drugs with durable efficacy, tolerability and safety (Mbhele et al., 2021), but also a rapid and convenient screening strategy for people at high risk of drug resistance mutations (DRMs).
At present, there are many methods for drug resistance gene detection, including Illumina NGS, Sanger population sequencing, AS-PCR, 454 pyrosequencing, and other new approaches (Mbunkah et al., 2020; Metzner, 2022; Pyne et al., 2022). However, assays assessing HIV genotypic and phenotypic drug resistance are complex, with problems such as high testing cost and a lack of uniform quality-index guidance (Harrigan and Cote, 2000; Hirsch et al., 2008). Moreover, despite their relatively high diagnostic accuracy, the above methods cannot be applied in most hospitals, especially primary hospitals, due to high cost and strict requirements for medical equipment.
In recent years, mathematical models based on various markers have been increasingly used in medicine as analytical methodologies have developed. Unfortunately, previous studies have focused only on risk factors for drug resistance, using a single indicator with poor predictive power. Several studies have demonstrated that HIV-1 is prone to drug resistance mutations (DRMs) due to prolonged ART exposure and poor adherence, leading to viral rebound and treatment failure (Larder and Kemp, 1989; Rhee et al., 2020; Blassel et al., 2021; Koay et al., 2021). In addition, the extent of DRM may also depend on individual characteristics, the type of regimen, baseline CD4+ T cell count and viral load (Karade et al., 2018). As shown by these studies, the occurrence of HIVDR varies from patient to patient, which illustrates the importance of developing and using risk prediction models for HIV drug resistance. Therefore, our main objective was to generate a multivariable logistic regression prediction model based on a combination of clinical factors to predict HIVDR. Then, a scoring system was built from the primary prediction model's modified nomogram for easy clinical application. Additionally, in the retrospective analysis, we internally validated the diagnostic value of the scoring model.

Study design and participants
The study was a retrospective HIV/AIDS cohort collected from the AIDS Prevention and Control Information System (AIDS-PCIS). We included patients with HIV/AIDS registered in four regions of Hubei province (Wuhan, Huangshi, Jingmen and Xianning) from June 2017 to June 2022. In total, 618 PLWH were enrolled and divided into two groups. Of these, 70% (N=427) were randomly allocated to the training set and the remaining 30% (N=191) to the validation set. We performed genotypic drug resistance testing in all participants. The sequences were submitted to the Stanford HIV Drug Resistance Database (http://hivdb.stanford.edu) to be evaluated for antiretroviral resistance using the list of major HIV-1 drug resistance mutations standardized by the Stanford HIV Database (http://hivdb.stanford.edu/assets/ media/resistance-mutation-handout feb2019.b0204a57.pdf). We defined patients who were potentially resistant or differentially resistant to any of the antiretroviral drugs as the drug-resistant (DR) group, and patients who were drug-sensitive as the drug-sensitive (DS) group.
Inclusion criteria were: (1) confirmed as a PLWH; and (2) ≥18 years old. Exclusion criteria were: (1) pregnancy or lactation; and (2) refusal of drug resistance testing. Variables including age, sex, body mass index (BMI), transmission route, duration of ART, adherence, CD4 T-cell count, HIV viral load and laboratory parameters were collected before the drug resistance testing. Treatment adherence was assessed by the proportion of days covered (PDC), i.e., the number of days covered by medication divided by the total number of days during follow-up. Poor adherence was defined as a PDC of less than 80%. HIV viral load was logarithmically transformed before being included. The ethics committee of Wuhan Jinyintan Hospital approved the study protocol, and all participant data were obtained anonymously.
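The adherence definition above can be illustrated with a short Python sketch. It is not the authors' code: it assumes the standard reading of the proportion of days covered (distinct days with medication on hand divided by days of follow-up), it assumes a base-10 logarithm for viral load (the text does not state the base), and the function names and the dispensing records are hypothetical.

```python
import math
from datetime import date


def proportion_of_days_covered(fills, start, end):
    """Fraction of follow-up days (start..end, inclusive) covered by medication.

    fills: list of (dispense_date, days_supply) tuples (hypothetical record format).
    Overlapping supplies are counted once per calendar day.
    """
    total_days = (end - start).days + 1
    covered = set()
    for dispensed, days_supply in fills:
        for offset in range(days_supply):
            day = dispensed.toordinal() + offset
            if start.toordinal() <= day <= end.toordinal():
                covered.add(day)
    return len(covered) / total_days


def is_poor_adherence(pdc, threshold=0.80):
    """Poor adherence as defined in the text: PDC below 80%."""
    return pdc < threshold


def log10_viral_load(copies_per_ml):
    """Log-transform viral load; base 10 is an assumption."""
    return math.log10(copies_per_ml)


# Hypothetical patient: two 30-day fills during a 90-day follow-up window.
fills = [(date(2022, 1, 1), 30), (date(2022, 2, 15), 30)]
pdc = proportion_of_days_covered(fills, date(2022, 1, 1), date(2022, 3, 31))
print(round(pdc, 3), is_poor_adherence(pdc))  # 0.667 True
```

Counting distinct calendar days, rather than summing days of supply, avoids crediting overlapping refills twice.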

Statistical analysis
Although we did not formally calculate the sample size, we evaluated data sufficiency using the events-per-variable method (Moons et al., 2015). We used the multiple imputation by chained equations (mice) package to impute missing values of baseline parameters (Zhang, 2016). Categorical and continuous data were presented as number (percentage, %) and median (interquartile range, IQR), respectively. Group comparisons between drug-resistant (DR) and drug-sensitive (DS) participants were performed by the Mann-Whitney U test, Chi-square test or Fisher's exact test, as appropriate. The most effective predictors were selected by the Least Absolute Shrinkage and Selection Operator (LASSO) technique. Then, the selected variables were assessed by multivariable logistic regression analysis (MLRA) to construct a predictive model. With the independent variables obtained in the MLRA, a scoring system was created based on a nomogram using the rms package in R. The scores of each variable and individual were obtained by the following steps: ① Variable scoring: the highest scoring variable in the MLRA was designated as 10 points (HIV viral load *7 in this study). Scores for the other variables were calculated in proportion to the model coefficients and finally rounded to integers. ② Patient scoring: the score of each patient was the sum of the scores of all variables in the scoring system. The predictive power of the scoring system was evaluated by computing the area under the curve (AUC). The Hosmer-Lemeshow test was used to assess the model's calibration, and decision curve analysis (DCA) was employed to evaluate its clinical utility. Another independent dataset was employed for further validation. All analyses were conducted with SPSS version 26.0 (IBM Inc., Chicago, IL, USA) and R Project version 4.2.0 (http://cran.r-project.org). Two-sided p<0.05 was considered statistically significant.
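The variable-scoring step can be read as a proportional rescaling of the regression coefficients. The Python sketch below illustrates that reading only; it is not the authors' code (the study used the rms package in R). The predictor names and coefficient values are hypothetical, and both the use of coefficient magnitudes and the half-up rounding rule are assumptions.

```python
import math


def coefficients_to_points(coefs, max_points=10):
    """Rescale logistic regression coefficients to integer points.

    The predictor with the largest absolute coefficient receives max_points;
    the others are scaled proportionally and rounded half up.
    """
    largest = max(abs(beta) for beta in coefs.values())
    return {
        name: math.floor(abs(beta) / largest * max_points + 0.5)
        for name, beta in coefs.items()
    }


# Hypothetical coefficients, for illustration only (not taken from the study).
example_coefs = {
    "predictor_a": 1.50,
    "predictor_b": 1.20,
    "predictor_c": 0.60,
    "predictor_d": 0.30,
}
points = coefficients_to_points(example_coefs)
print(points)
# {'predictor_a': 10, 'predictor_b': 8, 'predictor_c': 4, 'predictor_d': 2}
```

Integer points trade a small amount of precision for a score that can be added up at the bedside without software.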

Study population
In this study, of the 618 patients included, most were male (83.2%, 514/618), and the median patient age was 39.5 years (IQR: 24.0-55.0 years). HIV drug resistance testing was performed in 303 patients before starting ART, while the remaining 315 patients were tested after more than 6 months of ART. The major infection route was sexual transmission: heterosexual transmission (50.2%) and MSM (48.1%). Antiretroviral resistance was observed in 47.4% (293/618) of PLWH undergoing ART (Table 1). As presented in Figure 1, NNRTIs showed the highest frequency of at least one primary mutation, i.e., in 45.1% (279/618) of patients, while for NRTIs and PIs, these frequencies were 24.5% (152/618) and 1.6% (10/618), respectively.

Construction of nomogram and scoring system
In the training set, 33.5% (207/618) were HIV-DR patients, and most of the variables included in this study were significantly different between the DR and DS groups (Supplementary Table). To develop a highly accurate predictive model, LASSO regression was employed to select the most potent parameters (Figure 2). Finally, five independent risk factors, i.e., age, duration of ART, treatment adherence, CD4 T cell count and HIV viral load, were selected to establish the predictive model (Figure 3). Then, a nomogram (Figure 4A) based on the multivariable logistic model was generated, which showed good calibration (Figure 4B). Decision curve analysis (DCA) was used to evaluate the clinical value of the diagnostic nomogram. As shown in Figure 4C, HIV patients would benefit more from utilizing this diagnostic nomogram than from the treat-all or treat-none strategies at a threshold probability of 0.3. Furthermore, we transformed the nomogram into a scoring system with integer points to make this predictive model more accessible for doctors in clinical practice: age (2 points), duration of ART (5 points), adherence (4 points), CD4 T cells (1 point) and HIV viral load (1 point) (Table 2).
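Applying the scoring system to one patient amounts to adding up the points of the predictors that fall in their at-risk category. The Python sketch below is a hypothetical illustration and not the authors' implementation: the point values and the 7.5-point cutoff come from the study, but the category definitions are given in Table 2 and are not reproduced here, so each predictor is simply assumed to be a yes/no flag that earns its full points when present.

```python
# Point values reported for the scoring system.
POINTS = {
    "age": 2,
    "duration_of_art": 5,
    "adherence": 4,
    "cd4_t_cells": 1,
    "hiv_viral_load": 1,
}
CUTOFF = 7.5  # cutoff reported for the training set


def total_score(risk_flags):
    """Sum the points of all predictors flagged as at-risk.

    risk_flags: dict mapping predictor name to True/False. The dichotomy is
    an assumption made for this sketch; see Table 2 for the real categories.
    """
    return sum(POINTS[name] for name, present in risk_flags.items() if present)


def predicted_high_risk(score, cutoff=CUTOFF):
    """A score above the cutoff suggests a higher likelihood of drug resistance."""
    return score > cutoff


# Hypothetical patient flagged for ART duration and adherence only.
patient = {
    "age": False,
    "duration_of_art": True,
    "adherence": True,
    "cd4_t_cells": False,
    "hiv_viral_load": False,
}
score = total_score(patient)
print(score, predicted_high_risk(score))  # 9 True
```

Because the maximum total is 13 points and the cutoff is 7.5, a patient can only cross the threshold if duration of ART or adherence contributes points, since the other three predictors together add up to 4.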

Predictive effectiveness of the scoring system in the training set and validation set
With a cutoff value of 7.5 points, PLWH in the training set whose total score exceeded 7.5 were more likely to be diagnosed with DR, whereas those with a total score below 7.5 were less likely. The corresponding sensitivity, specificity, PLR and NLR values for 7.5 points as the ideal cutoff were 82.13%, 64.55%, 2.32, and 0.28, respectively (Table 3). In addition, the AUCs of this scoring system were 0.812 (95% CI = 0.772-0.867) and 0.808 (95% CI = 0.747-0.868) in the training set and validation set, respectively (Figures 5A, B). The scoring system also showed satisfactory calibration in both datasets (Figures 5C, D).
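The reported likelihood ratios follow directly from the sensitivity and specificity, via the standard definitions PLR = sensitivity / (1 - specificity) and NLR = (1 - sensitivity) / specificity. The short Python check below uses the training-set values quoted above; the function name is a hypothetical helper.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive, negative) likelihood ratios from proportions in [0, 1]."""
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    return plr, nlr


# Training-set sensitivity (82.13%) and specificity (64.55%) at the 7.5-point cutoff.
plr, nlr = likelihood_ratios(0.8213, 0.6455)
print(round(plr, 2), round(nlr, 2))  # 2.32 0.28
```

Both values agree with those reported in the text.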
pathogens has caused therapeutic failure under low adherence and long duration of treatment (Zhang et al., 2020). In our study, the rate of drug resistance is 33.5%, with NNRTIs accounting for 45.2%, which is in concordance with other research (Beyrer and Pozniak, 2017). As transmitted or acquired DRMs constitute significant risk factors for ART effectiveness and AIDS therapy (Günthard et al., 2019;Li et al., 2021), HIV genotypic resistance testing has been advised for ART initiation, failure, and modification (Department of Health and Human Services, 2020). The development of DRM is a serious danger to the ongoing management of HIV replication as well as the possibly associated rise in viral strain transmission, which could increase the incidence of TDR (Tanaka et al., 2019;Santos-Pereira et al., 2021).
A first-line regimen for adults and adolescents was proposed in 2016 by the World Health Organization, containing two nucleoside reverse-transcriptase inhibitors (NRTIs) with either a non-nucleoside reverse-transcriptase inhibitor (NNRTI) or an integrase inhibitor (INSTI) (World Health Organization, 2016). In 2021, WHO recommended dolutegravir (DTG) in combination with an NRTI backbone as the preferred first-line regimen for PLWH initiating ART (World Health Organization, 2021a). Many studies (Rocheleau et al., 2018; Crowell et al., 2021; Hackett et al., 2021; World Health Organization, 2021b) have shown that both NNRTIs and NRTIs induce high levels of HIV drug resistance among individuals with treatment failure (VL>1000 copies/ml), which collectively corroborates our results. In the present cohort, the DRM rates of NRTIs, NNRTIs, PIs and INSTIs were 24.8%, 45.2%, 0.2%, and 0%, respectively (Figure 5). These data also indicated that INSTIs, represented by DTG, are associated with a lower prevalence of drug resistance than NNRTIs. Furthermore, McClung et al. (2022) reported a prevalence of transmitted drug-resistance mutations (TDRM) of 18.9% among individuals developing drug resistance within 3 months of treatment in the United States from 2014 to 2018. Based on the same criterion, we determined that the proportion of DRM developed prior to treatment was 29.7%, and the main drugs causing resistance were NNRTIs and NRTIs.
Figure 5. Discrimination and calibration of the scoring system for the prediction of HIVDR in the training and validation sets. ROC curves of the scoring system in the training set (A) and validation set (B). Calibration curves of the scoring system in the training set (C) and validation set (D).
Furthermore, most of the methods used to predict HIVDR are not sufficiently generalized for routine clinical practice due to inconvenience. Therefore, it is crucial to design a feasible and simple method to predict drug resistance. Based on several major parameters, we constructed a predictive model by selecting the most important indicators with LASSO regression. A nomogram for predicting HIVDR in PLWH, incorporating 5 variables (age, duration of ART, adherence, CD4 T cells and HIV viral load), was built. In the training and validation sets, this nomogram had good calibration, diagnostic performance, and clinical utility. We transformed the nomogram into a scoring system for clinical application; this scoring system also demonstrated excellent diagnostic performance in the training and validation sets. It is important to note that our scoring method was based on a variety of clinical and laboratory indicators that are readily available in most hospitals, even community hospitals, at an acceptable overall cost.
In addition, we analyzed the clinical significance of these predictors. Regarding "duration of ART", it is widely accepted that the longer the ART administration, the higher the risk of DRM. Nii-Trebi et al. (2013) found that the duration of ART affects the development of DRM, and that prolonged ART increases the rates of virological failure (VF) and drug-resistant mutations. As shown in the current study, with a duration of treatment below 6 months, the incidence of DRM was 29.7% (90/303). Once the duration surpassed 6 months, this incidence increased to 64.4% (203/315). Poor adherence (e.g., irregular and unregulated use of medication) could have a significant impact on DRM. Evidence suggests that PLWH face great challenges associated with poor ART adherence and HIV-1 drug resistance (Benson et al., 2020; Myer et al., 2020). Moreover, CD4 T cell count, a crucial immunological indicator, partly reflects immunological function and immune reconstitution; some studies also reported a negative association between elevated CD4 T cell count and HIV-MDR (Xiao et al., 2017; Lombardi et al., 2021). However, another study reported the opposite (Schultze et al., 2019), which might suggest that CD4 count is affected by other factors, including incomplete immune reconstitution and acute inflammatory response.
In addition, HIV viral load also contributed to the mutation of drug resistance genes. Omooja et al. (2019) found that ADR prevalence could be as high as 73.2% among VF cases. Other studies also confirmed HIVDR in more than half of patients with VF (Tchouwa et al., 2018; Yan et al., 2022).
Interestingly, there were strong correlations (P<0.01) between drug resistance and some factors such as delayed treatment, Hb and BMI (Table 1 and Supplementary Table 1), although these factors were not included in the prediction model by multivariable regression analysis. Their effects on DRM need further investigation and validation. Regarding treatment delay, we speculated that the earlier the ART initiation, the higher the CD4 T-cell count, the lower the HIV viral load and the better the immune reconstitution, which may lead to a stable internal environment under antiviral therapy.
Our study has several advantages. Firstly, individual indicators have either low sensitivity or low specificity, whereas the combined indicators have better overall predictive ability. Secondly, the indicators selected for the model are simple, convenient, rapid and inexpensive, so the model can be promoted and used in primary hospitals. Thirdly, a scoring model that facilitates clinical application was established to help clinicians identify groups at high risk of HIV drug resistance and refer them for genotypic testing, the gold standard for detecting drug resistance. This approach could reduce treatment failure and avoid wasting medical resources.
However, the patient selection in our study was biased and not randomized. To gain high-level evidence for the potential clinical applicability of the scoring system in the future, multicenter validation of the scoring system with a sizable research population is urgently required.
In conclusion, the novel scoring system is based on five easily accessible clinical parameters and shows excellent diagnostic performance and favorable calibration in determining the susceptibility to DRM in PLWH. We recommend the widespread application of this novel scoring model in HIV-designated hospitals to identify patients at increased risk of DRM quickly and cost-effectively.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.