Investigating the effects of cytokine biomarkers on HIV incidence: a case study for individuals randomized to pre-exposure prophylaxis vs. control

Introduction Understanding and identifying the immunological markers and clinical information linked with HIV acquisition is crucial for effectively implementing Pre-Exposure Prophylaxis (PrEP) to prevent HIV acquisition. Prior analysis on HIV incidence outcomes have predominantly employed proportional hazards (PH) models, adjusting solely for baseline covariates. Therefore, models that integrate cytokine biomarkers, particularly as time-varying covariates, are sorely needed. Methods We built a simple model using the Cox PH to investigate the impact of specific cytokine profiles in predicting the overall HIV incidence. Further, Kaplan-Meier curves were used to compare HIV incidence rates between the treatment and placebo groups while assessing the overall treatment effectiveness. Utilizing stepwise regression, we developed a series of Cox PH models to analyze 48 longitudinally measured cytokine profiles. We considered three kinds of effects in the cytokine profile measurements: average, difference, and time-dependent covariate. These effects were combined with baseline covariates to explore their influence on predictors of HIV incidence. Results Comparing the predictive performance of the Cox PH models developed using the AIC metric, model 4 (Cox PH model with time-dependent cytokine) outperformed the others. The results indicated that the cytokines, interleukin (IL-2, IL-3, IL-5, IL-10, IL-16, IL-12P70, and IL-17 alpha), stem cell factor (SCF), beta nerve growth factor (B-NGF), tumor necrosis factor alpha (TNF-A), interferon (IFN) alpha-2, serum stem cell growth factor (SCG)-beta, platelet-derived growth factor (PDGF)-BB, granulocyte macrophage colony-stimulating factor (GM-CSF), tumor necrosis factor-related apoptosis-inducing ligand (TRAIL), and cutaneous T-cell-attracting chemokine (CTACK) were significantly associated with HIV incidence. Baseline predictors significantly associated with HIV incidence when considering cytokine effects included: age of oldest sex partner, age at enrollment, salary, years with a stable partner, sex partner having any other sex partner, husband's income, other income source, age at debut, years lived in Durban, and sex in the last 30 days. Discussion Overall, the inclusion of cytokine effects enhanced the predictive performance of the models, and the PrEP group exhibited reduced HIV incidences compared to the placebo group.


Introduction
HIV continues to be a serious worldwide health concern, with South Africa having the world's highest HIV epidemic, with an estimated 8.45 million people living with HIV (1).The primary mode of transmission in this endemic setting is heterosexual intercourse, and women 18-40 years account for more than 60% of new infections where the young women bear the greatest burden (2).It is well known that sex workers are at a greater risk of HIV acquisition (3).Significant effort has been made in South Africa over the last decade to search for new technologies that prevent sexually transmitted HIV infections in women such as the pre-exposure prophylaxis (PrEP) products.Initiatives have been undertaken to scale up education and access to these products for example the tenofovir gel an antiretroviral microbicide that can be applied to the vagina or rectum with intentions of reducing the acquisition of HIV (4).
The initial stages of HIV infection are characterized by inflammation and profound immune dysregulation in the gut mucosa (5,6) and genital inflammation at this stage also correlates with an increased plasma viral load (7).Taken together, inflammation is a key mediator of HIV pathogenesis.The levels of inflammatory cytokines and chemokines, which signal the presence of infection and recruit activated immune cells to the mucosa, are frequently used as biomarkers of inflammation in the female reproductive tract (FRT) (8).As such, we hypothesize it might be expected that elevated mucosal cytokines would be correlated with increased rates of HIV acquisition.The increased levels of pro-inflammatory cytokine is associated with increased rates of HIV acquisition (9) and cytokine profile is a strong predictor of subsequent HIV acquisition.Understanding the interplay between cytokine biomarkers and HIV incidence by identifying specific cytokine profiles associated with increased or decreased HIV susceptibility is crucial for optimizing PrEP strategies.
In clinical research, it is a common phenomenon for covariate data to be collected longitudinally and for the covariates to change over time during the follow up period.For example, patients in a clinical trial to asses the safety and effectiveness of tenofovir gel, a vaginal microbicide in sexually active women at risk for HIV, cytokine profiles were measured repeatedly up to infection or until censorship (4).In many instances, while examining the relationship between time to HIV infection and covariate(s), investigators will only consider the baseline covariates, leaving out covariates that change over time hence failing to consider the relation of the survival outcome as a function of the change of the time dependent covariates (15).It appears natural and suitable to use time-varying covariate information in an appropriate statistical model.The Cox PH model can be used to link survival times with either fixed covariates whose values remain constant during the followup period or predictor variables that fluctuate over time (16).The mentioned covariates can be dealt with as a time dependent covariates into the Cox PH model or incorporated as a derived longitudinal variables as further elaborated in the Section 2.2.
A previous analysis was conducted by Abdool Karim et al. (4) and Mansoor et al. (17) to investigate the effectiveness, safety and adherence in the CAPRISA 004 tenofovir gel microbicide trial.They used Proportional Hazards (PH) regression model to calculate the hazard ratios while adjusting for potentially important baseline covariates (age, site, anal sex history, contraceptive method, HSV-2, antibody status and condom use).They reported a hazard ratio of 0.63 (CI: 0.42,0.94,p = 0.025).In their analysis they did not include cytokine profile neither did they report on significant baseline covariates associated with HIV incidence.Masson et al. (18) used the same dataset to investigate whether genital inflammation influenced HIV acquisition in women.Their study selected 12 cytokines for their analysis.They employed conditional logistic regression and unsupervised hierarchical clustering in their statistical analysis.
Naranbhai et al. (19) investigated the role of immune activation in HIV acquisition in the CAPRISA 004 trial.They selected 13 cytokines and used logistic regression and principal component analysis (PCA) in their statistical analysis.On the other hand, Ngcobo et al. (20), in their study examining whether preinfection plasma cytokine concentrations predicted the rate of HIV disease progression in the same study cohort, considered all 48 cytokines.They used linear regression to assess the impact of each cytokine on viral load (VL) and the CD4:CD8 ratio in both bivariate and multivariable models, adjusting for age, contraceptive use, HSV-2 status at baseline, study site, and study arm at randomization.Ignacio et al. (21) used the Sabes dataset and LASSO machine learning algorithms to study how dynamic immune markers predict HIV acquisition and strengthen associations with sociobehavioral factors related to HIV exposure.They selected 10 cytokines for their analysis.Other studies (22,23) that have utilized CAPRISA 004 data set to investigate HIV progression, did not include time varying cytokine profile as a covariate in their analysis.To the best of our knowledge, cytokine profile as a time-varying covariate or as derived covariate has not been used with baseline covariate in previous studies to identify significant predictors of HIV incidence.
This study therefore, seeks to investigate the effect of timevarying cytokine biomarkers in determining significant predictors of HIV incidence among individuals randomized to PrEP vs. control exposure.We achieved that by building a series of Cox PH models that include different forms of the covariates that change over time and further asses the overall effectiveness of the tenofovir treatment by comparing the two groups using Kaplan-Meier estimator and survival curves.The variations in individual immune responses, particularly in cytokine profiles, may influence the efficacy of PrEP therefore this research aims to contribute to

. Dataset
The data was accessed from Center for the Aids Programme of Research in South Africa (CAPRISA 004) (4), a two arm double blinded randomized trial, placebo and tenofovir group conducted on HIV negative and sexually active women aged 18-40 years in South Africa for a period of 30 months; 18 months Accrual period and 12 months follow up.It was conducted between May 2007 and March 2010, and the dataset consist of survival and longitudinal data.The variables considered in this study were baseline characteristics and longitudinally measured cytokine profiles as described in the Supplementary Table S1.

. . Cytokine measurement
Plasma samples and cervicovaginal lavage specimens from cases and control were collected and stored for assessment.There were a total of 48 cytokines from 812 (tenofovir group = 405, placebo group = 407) women with 96 HIV infections (tenofovir group = 37, placebo = 59).The measurements were taken at irregular follow up times as shown in Figure 1 where majority of patients had their cytokine measurements recorded three times during the course of study.The average interval between the first and second cytokine measurements was 12 months, while the interval between subsequent cytokine measurements was 6 months.

. . Data pre-processing
The data underwent pre-processing to prepare it for subsequent analysis.The pre-processing steps involved eliminating variables with excessive missing values (Figure 2) i.e with more than 50% missingness and very small frequency percentages for the levels of some categorical variables.Additionally, in our efforts to enhance the robustness of our statistical analysis, we appropriately combined certain levels of categorical variables and renamed the strings.This step is crucial because a categorical variable with too many levels can compromise the model's performance due to small frequencies in some of the levels (24).Moreover, variables with only one level fail to positively impact the model due to very low variation, while levels that rarely occur have minimal chance of significantly affecting the model fit (25).These adjustments ensure that our analysis accurately captures the relationships within the data.Furthermore, Figure 2 demonstrates the completeness of our dataset, with almost 84.93% of variables containing no missing information, 10.46% missing income value data, and the remaining variables displaying other missing patterns.
The data preparation and the statistical analysis was done using the R version (R-4.3.2).The R code file for this analysis is available in the Supplementary Table S2.As a result of the pre-processing step, 24 baseline characteristics and 48 cytokines covariates were used as initial variables at the start of the analysis.The categorical variables were summarized using frequency and percentages depicted in the Supplementary Table S2.The patient baseline characteristics in relation to HIV status and treatment group is summarized in Table 1, where the number of years with stable partner (p = 0.034) and the patients receiving income from husband (p = 0.026) were significantly associated with HIV status and treatment group.The statistical analysis was conducted on complete cases only in two stages; the first is survival analysis on baseline covariates without the cytokine covariates effects then survival analysis when including the cytokine covariate effects.Cytokine variable profiles are time-varying covariates since they change over time through the follow-up period.Therefore, the cytokine information was included in three ways; firstly we averaged all measurements throughout the follow-up time to better capture their average effects, secondly we took the difference between the last and first measurement to model the effect of change and lastly we treated the cytokine as a time-dependent covariate.

. Statistical methods
Four separate Cox PH models were fitted in an increasing complexity based on how the cytokine effects are included.Model 1 (Equation 8): Cox regression model with baseline variables only, model 2 (Equation 9): Cox regression model with baseline variables plus cytokine effects using the mean value of the cytokine measurements as covariate, model 3 (Equation 10): Cox regression model with baseline variables plus cytokine effects using the difference between the last observed cytokine value and the first value as covariate in the model and model 4 (Equation 11): Cox regression model with baseline variables plus time dependent cytokine effects.

. . Kaplan-Meier survival curves
The Kaplan-Meier estimator is a non-parametric statistic that is used to estimate the survival function based on lifetime data (26).The estimate is frequently used in medical research to examine recovery rates, likelihood of deaths and whether or not a treatment was effective.Furthermore, it is used to compare two groups of subjects, the control group and treatment group (27).The Kaplan-Meier survival curve is a graphical representation of the survival function defined as the probability of surviving in a given length of time while considering time in many small intervals (28).
To estimate the survival function S(t) (the probability that life is longer than t), we consider survival time t i = t 1 , t 2 , ..., t n including censored observations (ordered by increasing observation) of a group of n subjects.The proportion of individuals, S(t), who survive after any follow up time t i is estimated by (Equation 1) where t i is the largest survival time less than or equal to t, n i is the number of individuals uninfected just before time t i (the i th ordered survival time) and d i denotes the number who got HIV infection at time t i (29).S(t 0 ) = 1 before the first infection of

FIGURE
Total count for the frequency of cytokine profile measurements.

FIGURE
Missing data aggregation plot.Proportion of missing values for all variables in the dataset, sorted by decreasing order (left).Combinations of missing values (right): yellow squares in a matrix entry denote the presence of missing values for the variable associated to the column in the samples corresponding to the row; the bars on the right show the cardinality of each set of points.The x-axis displays the variable names (not all variables are displayed due to limited space).
HIV.The survival S(t) at time t i given the number of infections d i and the number of uninfected patients n i just before t i is given by (Equation 2), Maximum likelihood estimation of the discrete hazard function h i , (the probability of an individual experiencing an event at time t i ), yields the Kaplan-Meier estimator as shown (Equation 3), estimators such as Greenwoods's formula (30) that gives (Equation 4), The log-Rank test: Is used to compare the hazards between two groups or more by testing the null hypothesis that the probability of an event at any time point between the two or more populations do not differ.Thus, log-rank test compares the survival function of the two groups (27).The null hypothesis will be rejected if the p-value is <0.05.

. . Stepwise Cox proportional hazard model (Cox PH)
Stepwise Cox proportional hazards regression is a method of selecting a subset of relevant variables for a Cox regression model from a larger set (31).Cox PH is the most widely used statistical method for analyzing the time-to-event data (16).The Cox PH model assesses the impact of multiple factors on survival simultaneously.Essentially, it enables one to investigate how specified predictors influence the rate of a specific event happening such as infection or death at a given point in time (32).This rate is commonly known as hazard rate.
In order to evaluate the association of the baseline and cytokine effects covariate and survival time, consider sample size n from sample k = 1, 2, ..., n and C k = (C k1 , C k2 , ..., C kp ) is a vector of p covariates (baseline plus cytokine effect covariates) of the different models.The k th patient survival data can be represented by (T k , θ k , C k ), where T k and θ k are the survival time and censor status, respectively.Mathematically, the general Cox PH (33) in Equation 5 is represented as where β is the parameter vector of the regression coefficients and C k is the covariate (baseline and cytokine effects) vector.h 0 (t) is an unspecified baseline hazard function that corresponds to the value of the hazard if all C k are equal to zero.The hazard ratio for two patients (Equation 6), k and i is and is independent of time t.Cox PH model parameters are estimated by the maximum partial likelihood method given below (Equation 7); where E is the indices of the HIV infection and R r represent vector of indices for individuals at risk at time t r .The stepwise Cox proportional hazards regression method adds or removes predictor variables from the model based on some criteria, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) (34).The AIC and BIC are measures of how well the model fits the data, and they penalize models that have too many parameters.The lower the AIC or BIC, the better the model (35).The stepwise Cox PH regression method can be performed using different methods, such as forward selection, backward elimination, bidirectional selection, or score selection (36).Forward selection starts with an empty model and adds one variable at a time until it reaches a stopping criterion, such as a minimum AIC or BIC value.Backward elimination starts with a full model and removes one variable at a time until it reaches a stopping criterion.Bidirectional selection starts with an empty model and adds one variable at a time in both directions until it reaches a stopping criterion.Score selection starts with an empty model and adds one variable at a time based on its score in terms of AIC or BIC.
We used the packages StepReg (37) to implement stepwise regression, Survival and survminer to implement Cox PH model in R. The specific Cox PH models for model 1, model 2, model 3, and model 4, as described above, are formulated as follows: . .Model The model that includes the baseline covariates only.We call this the naive Cox PH model.The assumption is that regression parameters remain constant over time (38).Consequently, the hazard ratio for any two individuals remains constant over time.The Model is given by, with h 0 (t) as the baseline hazard function, β is the vector of regression coefficients for baseline covariates X k . .

. Model
The model that includes the baseline covariates plus cytokine effects using the mean value of the cytokine measurements as the covariate.Here the derived cytokine is the average of all the cytokine measurements for an individual patient recorded at different follow up time.It models the average effect of the timevarying cytokine covariate (15).The Cox model is; where h 0 (t) is the baseline hazard function, β is the regression coefficient vector for time invariant covariates X k , Ḡkj = 1 m kj m kj r=1 G krj for r = 1, 2, ..., m kj , represents the average value of the cytokine level measured longitudinally for the k th subject with m kj observations for cytokine j (j = 1, 2, ..., v).The scalar δ j ∈ R is the parameter that links the average to the hazard.

. . Model
The model that includes the baseline covariates plus cytokine effects using the difference between the last observed cytokine value and the first value as the covariate in the model.It models the effect of change in the cytokine covariates (39).(10) where h 0 (t) is the baseline hazard function, β is the regression coefficient vector for the time invariant covariates X k , D kj = G km kj − G k1j represents the difference between the last and first cytokine levels observed longitudinally for the k th subject, with m kj measurements for cytokine j (j = 1, 2, ..., v).The scalar δ j ∈ R is the parameter that links the change to the hazard.The model answers the question whether a big or small change in cytokine has an effect on HIV acquisition. .

. Model
The model that includes the baseline covariates plus time dependent cytokine effects.When a covariate changes over time throughout the follow-up period, this is referred to as timevarying/time dependent covariate (40).For this it is critical to structure the data in a counting process format.We code the time-dependent covariate using time intervals (41).The hazard is assumed to be proportional to the instantaneous probability of an event at a specific time conditional on the variables at that time (42).The interpretation of the results of this approach is more complicated than a naïve baseline approach as the covariate information changes over time.Here we consider sample size n subjects, consisting of T k , θ k , G k (t), 0 ≤ t ≤ T k , k = 1, 2, ..., n , T k is the time-to event for the kth subject, θ k is the event indicator and G k (t) is the time varying covariate.The Cox PH model becomes, where h 0 (t) is the baseline hazard function, β is the regression coefficient vector for time invariant covariates, G kj (t) = G k1j (t), G k2j (t), ..., G km kj (t) is a set of covariates for the number of longitudinal measures m kj for the k th subject of cytokine j (j = 1, 2, ..., v).The scalar δ j ∈ R represent the parameter that links the time dependent covariates to the hazard.

Kaplan-Meier survival curves analysis
Figure 3 shows the overall survival curve over time and the overall survival comparison between the two treatment group.It is clear that patients from Tenofovir treatment arm have a better chance of surviving (less probability of HIV acquisition) more than the patients from placebo group.The placebo curve has a steeper slope indicating a higher HIV infection rate, therefore a worse survival prognosis.The curve have plateaus from 24th month indicating no change in survival.The curves comparing the two treatment cross in the first few months and consistently separate afterwards.The log-rank test performed gives χ 2 = 5.7, df = 1 and p-value = 0.02.Since p-value is <0.05 we reject the null hypothesis to conclude that there is sufficient evidence indicating that the two treatment groups are significantly different in terms of survival.

. Stepwise Cox proportional hazard analysis
The results of the survival problem based on the effects of cytokine biomarkers (mean, difference, and time dependent effect) were obtained.As a first step, we employed the stepwise regression using the stepwiseCox function.Within the function we specified the following arguments; model selection procedure to be bidirectional, model selection metric as the AIC, significance level of entry and exit value in the model as 0.15 and model approximation method as Efron.Bidirectional selection procedure is the appropriate since it adds variables in both directions.Moreover, backward selection produces same results as bidirectional while forward selection produces results with more covariates and larger AIC.The model selection criterion AIC was used to determine the order in which effects enter and leave at each step of the specified model selection procedure (bidirection).The value 0.15 is a commonly used p value threshold which is a statistical significance level that a predictor variable must meet to be included or to stay in the model.Several approximation methods have been proposed to handle tied events in cox regression such as Breslow, Efron, and Exact (methods of obtaining Cox partial likelihood estimate of the baseline hazard function).However, the Efron method performs better in terms of time, fit statistics, and differences in parameters estimates (43).We then tested the Cox PH assumption of the selected covariates using Schoenfeld residuals test (44) by applying the cox.zph function.The analysis of the results for model 1-4 are shown in Tables 2-5.
The analysis result of model 1 in Table 2 indicates that age at enrolment was the only significant predictor of HIV hazard.Tenofovir treatment group reduced the hazard of HIV infection as compared to the Placebo treatment group (HR: 0.629, 95% CI: 0.405,0.977).The adjusted hazard ratio for a 1 year increase in age at enrolment is 0.949 (95% CI: 0.902, 0.998).This implies that HIV incidence decreases with increasing age.
The overall performance of the models (model 1-4) shown in Table 6 indicate that model 4 had the lowest AIC, while model 1 the highest AIC.The overall survival of the models over time are depicted in Figure 4.The plot in Figure 5 show how the effects of the covariates in model 4 (with the lowest model fit scores as shown in Table 6) change over time.The intercept of the model 4 in Figure 5 had a smooth increasing slope over time.The time dependent cytokine covariates; G-CSF, GM-CSF, IL-15, MIP-1B, SCGF-B, and baseline covariates; age at debut, sex in the last 30 days, age of oldest sex partner and site had a decreasing slope over time.Increasing slope over time is observed in the time-dependent covariate SCF and baseline covariates; abnormal discharge, other income source, salary and years lived in Durban.Table 7 indicate which cytokines overlap between model 2-4, or which are no longer significant in the subsequent models.Figure 6 illustrates the direction of change for the significant cytokines identified in the analysis of models 2-4.

Discussion
The global HIV pandemic remains a significant public health challenge, necessitating the continuous exploration of innovative preventive strategies (45).Pre-exposure prophylaxis (PrEP) particularly Antiretroviral Microbicide has emerged as a promising intervention for individuals at high risk of HIV acquisition (4).However, variations in individual immune responses, particularly in cytokine profiles, may influence the efficacy of PrEP.It is known that dynamic changes in immune states are linked with HIV acquisition, and biomarkers, demographic and behavioral data add complementary details to HIV risk (21).Recent research has highlighted the potential of cytokines as biomarkers in the Pre-exposure prophylaxis.Cytokines have been suggested as potential predictors of HIV acquisition.
This study investigated the effect of individual cytokine biomarkers that changes over time in determining HIV incidence among individuals randomized to PrEP vs. control exposure by building a series of Cox Proportional Hazard models.The Cox PH is essentially a regression model commonly used statistical method in medical research and in other applications for investigating the association between the survival time and one or more predictor variables (16).The simple form of Cox model is when it models time fixed covariates.One of the strengths of the extended Cox model is its ability to incorporate covariates that change over time.This functionality is practical because, at each event time, the Cox model compares the current covariate values of the subject experiencing the event with the current values of all other subjects who were at risk at that time (41).We incorporate stepwise regression in the Cox PH model to eliminate noisy variables and remain with the best model fit (31).
The cytokine biomarkers in our data set changes over time i.e they were longitudinally measured, indicating the presence of a time-varying covariates.When such covariates exist, an analyst should consider taking them into account in survival modeling in order to improve estimation (15).The presence of time-dependent covariates in a model offers exciting opportunities for exploring associations and potentially causal mechanism (46).However, the use of these variables is technically difficult in the choice of covariate form, might have great potential for bias and violates the assumption that the hazard ratio for any two individual remains constant over time.We therefore, improve the model fit by using derived cytokine variables from the longitudinal measurements.As a starting point in modeling, we started with Model 1 (Equation 8), a traditional time-invariant (baseline covariates) Cox PH model.In this model the initial variables were 24 which were further reduced to seven variables that contributed to the best model fit and it estimated age at enrollment as the only significant predictors of HIV risk.
The first improved model (model 2-Equation 9) we used baseline covariates plus the individual level average of the cytokine measurements to better describe the average effect of the timevarying cytokine covariate.Through stepwise regression the covariates were reduced from 72 to 30 in the final best fit model.When comparing model 1 with model 2 we are able to identify four other different baseline covariate (Age of oldest sex partner, salary, years with stable partner and sex partner having other sex partner) and twelve individual average cytokines (IL-3, IL-5, IL-10, IL-16, IL-17A, IL-12P70, CTACK, SCF, B-NGF, SCGF-B, TNF-A, IFN-A2) that are significantly associated with HIV risk.Therefore, the predictive performance of model 2 was better than model 1 with lower AIC (919.3) in comparison to model 1 AIC (1,064.5).This clearly showed that not accounting for cytokine effect in model 1 confounded the effect of other significant baseline characteristics.
Model 3 (Equation 10) is the second improved Cox PH model which consisted of baseline covariates plus individual cytokine difference between the first and the last observed measurement.The final best model fit in model 3 had 19 covariates from an initial total of 72.Notably the model had a better predictive performance compared to model 1 as it had three additional baseline covariates (sex partner having other partners, husband's income, age of the oldest sex partner) and seven individual difference cytokines (IL-2, IL-5, IL-16, CTACK, PDGF-BB, BNG-F and TRAIL) that were significant predictors of HIV infection.Furthermore, when compared to model 2, there were three similar baseline covariates (age of the oldest sex partner, treatment group and sex partner having other partner) that were significant predictors in both models.However, there were fewer cytokine covariates than in model 2, with IL-5, IL-16, CTACK, and BNG-F all being significant cytokine covariates in both models.When compairing the AIC of the models, model 3 had a lower AIC than model 1 but slightly higher than AIC of model 2. Model 3 predicted the individual level changes of the cytokines and its association with HIV risk therefore accounting for time.The major drawback of the model was some individuals had single measurements hence no change effect observed.Additionally, the model ignores the intermediate changes between the first and the last observed cytokine measurement which implies loss of information within individual cytokine measurements.
The last improved Cox PH model fit was model 4 (Equation 11) that used baseline covariates plus time-dependent cytokine covariates.The final best model fit consisted of 14 variables out of 72.When compared to model 1, there were five additional baseline covariates (age of oldest sex partner, age at debut, other income source, sex in the last 30 days and years lived in Durban) and four time-dependent cytokines (SCF, IL-5, SCGF-B and GM-CSF).Moreover, Age of oldest sex partner and IL-5 were significant predictors estimated by all the improved models while SCF and SCGF-B were both predictors by model 2 and 4. Likewise CTACK, IL-5, IL-16 and B-NGF were significant predictors estimated in both model 3 and 4. Table 7 indicate which cytokines overlap between models 2-4, or which are no longer significant in the subsequent models.
Overall, model 2 produced the greatest number of significant cytokine predictor variables, giving a wider perspective to a researcher which cytokine biomarkers are associated with HIV Hazard.However, there is loss of time information in this model for the derived cytokine variables.Model 4 on the other hand had the lowest AIC compared to the other models making it the best model.This emphasizes that timedependent covariates is a powerful tool for exploring predictive relationships.Nevertheless, their use and interpretation is much more complicated in practice than the fixed (baseline) covariates.

FIGURE
Overall trend of the covariate e ects (only significant baseline and cytokine covariates) for model over time.Abreviations of the baseline characteristics: agedebu, age at debut; p v _SEX_ DAYS, number of times had sex in days; p v _AGE_OLDEST_SEX_PART, age of oldest sex partner; p v _ABNORMAL_DISCHARGEYes, abnormal discharge variable for yes category; p v _OTHER_INCOME_SOURCEYes, other income sources for category yes; p v _SALARYYes, salary variable for category yes; p v _LENGTH_IN_DBNVUL, number of years lived in Durban Vulindlela area; SiteVulindlela, site variable for category Vulindlela.
Furthermore the potential for erroneous inference and modeling is increased (46).
Our findings reveal that incorporating cytokine biomarkers into the PH regression model not only enhances the model's predictive performance but also provides more insightful information about significant predictors linked to HIV incidence.These results are consistent with a recent study by Ignacio et al. (21), which found that changes in cytokine levels over time are highly predictive of HIV acquisition and that cytokines influence the effects of sociobehavioral risk factors on HIV acquisition.Although Ignacio et al. (21) used a different model (LASSO machine learning algorithms), a different dataset (Sabes study), and selected fewer biomarkers (10 cytokines), their study also highlighted the importance of immune activation markers in HIV beyond traditional demographic and behavioral factors, aligning with our objective.Our analysis identified and reported several baseline predictors such as the age of the oldest sex partner, participant's age at enrollment, earning a salary or not, years with a stable partner, income source, whether the sex partner has other partners, and frequency of sex in the last 30 days as significantly associated with HIV incidence.These findings align with those of other research studies (47)(48)(49)(50)(51)(52)(53)(54)(55).
In the previous analysis by Masson et al. (18) to investigate whether genital inflammation influenced HIV acquisition in women, they used 12 cytokines out of 48 available cytokine measurements.This selection was disadvantageous as it excluded other potentially relevant cytokine covariates.They utilized conditional logistic regression which has limitations because the risk sets and time-dependent covariates are predefined, unlike in Cox regression, where these factors are calculated at the time of each case failure (56).Moreover, Cox models that was employed in our study, offers more statistical power than logistic regression models because they account for the time until the event occurs (57).Naranbhai et al. using the same dataset, employed logistic regression and PCA to investigate the role of immune activation in HIV acquisition.The PCA's assumption of linearity limits its effectiveness in interpreting the components, as they are linear combinations of the original variables (58).Ngcobo et al. (20)

GM-CSF
The table indicate which cytokines overlap between the different models, or which are no longer significant in the subsequent models.Blank: cytokine absent in the model.: cytokine was significant in the model.
their study of examining whether pre-infection plasma cytokine concentrations predicted the rate of HIV disease progression in the same study cohort, used linear regression to assess the impact of each cytokine on viral load (VL) and the CD4:CD8 ratio in both bivariate and multivariable models.The major drawback of linear regression is its lack of consideration for time continuity (56).Notably, none of the previous studies exploring predictors of HIV progression (20,22,23) using CAPRISA 004 trial considered cytokine biomarkers as time-varying covariates.This study underscores the importance of incorporating longitudinal risk factor information in predicting HIV incidence.
Our study results successfully confirmed the cytokines Interleukin (IL-2, IL-3, IL-5, IL-10, IL-16, IL-12P70, and IL-17 alpha), Stem cell factor (SCF), Beta Nerve growth factor (B-NGF), Tumor necrosis factor alpha (TNF-A), interferon (IFN) alpha-2, serum stem cell growth factor (SCG)-beta, plateletderived growth factor (PDGF)-BB, Granulocyte macrophage colony stimulating factor (GM-CSF), tumor necrosis factor-related apoptosis-inducing ligand (TRAIL) and cutaneous T-cell-attracting chemokine (CTACK) are associated directly to HIV infection and identified new cytokine biomarkers to enrich the field's literature further.Figure 6 shows the direction of change for the cytokines mentioned.Therefore, better understanding of the role of cytokines before, during, and after HIV infection could enable for the development of new therapeutic approaches based on the use of either recombinant cytokines or particular antagonists, with the goal of limiting both HIV spread and clinical manifestations of this infection (59).
Different cytokines play significant roles in HIV prevention and management with PrEP.Interleukins (ILs) such as IL-2 enhance T-cell proliferation and activation, aiding the immune response against HIV, and its levels can help assess immune activation efficacy in PrEP users.IL-3 and IL-5 regulate hematopoiesis and immune responses, with elevated levels indicating an ongoing immune response relevant for those exposed to HIV (60).IL-10, an anti-inflammatory cytokine, prevents excessive inflammation, with high levels suggesting reduced inflammation in PrEP patients.IL-12P70 and IL-17 alpha aid in differentiating and activating Thelper cells, promoting cell-mediated immunity, and protecting mucosal barriers, respectively (60).Monitoring these cytokines helps understand immune modulation in PrEP users.IL-16 attracts T-cells and other immune cells to infection or inflammation sites, a marker for immune activation in PrEP.
SCF and SCG-beta are essential for hematopoietic stem cell proliferation and differentiation, indicating bone marrow activity and the ability to replenish immune cells in PrEP users (61).B-NGF supports neuron survival and maintenance and has immunomodulatory effects.In PrEP users, B-NGF might influence neuroimmune interactions, affecting the nervous system's response to HIV exposure.TNF-A and TRAIL are significant in immune regulation and inflammation (10).TNF-A, a pro-inflammatory cytokine, indicates inflammation and immune activation, which is crucial for those at risk of HIV.TRAIL induces apoptosis in cancer and infected cells, helping eliminate HIV-infected cells in PrEP users.IFN Alpha-2 has antiviral properties, inhibiting HIV replication and modulating the immune response, providing additional protection in PrEP users.PDGF-BB aids in wound healing and tissue repair, helping maintain mucosal integrity and prevent HIV entry through mucosal surfaces in PrEP users.GM-CSF stimulates granulocyte and macrophage production, providing insights into immune readiness (62).CTACK directs T-cells to the skin, indicating immune surveillance at mucosal and skin surfaces to prevent initial HIV infection.
In clinical practice, these cytokines are useful biomarkers to monitor individuals' immune status and response using PrEP.Regularly measuring cytokines like IL-2, IL-10, TNF-A, IFN Alpha-2, IL-12P70, IL-17 alpha, and TRAIL can help assess immune activation, regulation, and the body's response to HIV exposure (60).This monitoring allows clinicians to evaluate the balance between immune activation and regulation, ensuring optimal immune response without excessive inflammation (11).Personalized PrEP strategies

Conclusion
In this article we investigated the effect of individual cytokine biomarker, a time varying covariate, where we provided ways of handling the covariate in the stepwise Cox PH modeling by using a derived variable from the longitudinal measurements (mean and difference) and as a time dependent covariate (model 2-4).The presence of a cytokine effect in a model improved the predictive performance of the model hence the improved models were more informative about predictors that are associated with HIV hazard.Moreover, the tenofovir treatment exposure significantly lowered the hazard of HIV compared to the Placebo treatment group.Furthermore, Kaplan-Meier estimator indicated that the patients who received tenofovir antiretroviral microbicide treatment had a significantly lower risk of HIV infection compared to the placebo group hence an effective treatment in reducing the risk of HIV in women between the age of 18-40 years.
Further investigation of the cytokine biomarker could involve utilizing the standard deviation of longitudinal measurements or lagged observations.Additionally, with internal time-varying covariates, one might explore employing joint modeling of longitudinal and survival data.The aim is to apply a model to a continually changing covariate that is measured longitudinally, potentially with error.This longitudinal model is linked to survival times by modeling the joint distribution of longitudinal and survival data.complete a data request form.Requests to access these datasets should be directed to https://www.caprisa.org/Pages/CAPRISAStudies.

FIGURE
FIGUREKaplan-Meier survival curves: left panel showing overall survival curve for all participants and right panel compares overall survival curves by treatment groups, placebo vs. tenofovir participants.

FIGURE
FIGURE Comparative overall survival curves for model (upper left panel), model (upper right panel), model (lower left panel) and model (lower right panel).

FIGURE
FIGUREDirection of change for the cytokines of interest confirmed by our study results.
TABLE Summary description of patient's baseline characteristics stratified by HIV status and treatment group.
TABLE Multivariable Cox PH results for predictors of HIV survival among women aged -years (model ).

TABLE Multivariable Cox
PH results for predictors of HIV survival among women aged -years (model ).

TABLE Multivariable Cox
PH results for predictors of HIV survival among women aged -years (model ).
TABLE Comparative tests to evaluate Cox PH model performances.
, in