Progression of HIV Disease Among Patients on ART in Ethiopia: Application of Longitudinal Count Models

Although the world has been fighting HIV in unity and patients are receiving antiretroviral therapy (ART), HIV disease continues to be a serious health issue in some parts of the world. A large number of AIDS-related deaths and co-morbidities are registered every year in resource-limited countries like Ethiopia. Most studies that have assessed the progression of the disease have used models that require a continuous response. The main objective of this study was to use appropriate statistical models to analyze routinely collected HIV data and identify risk factors associated with the progression of the CD4+ cell count of patients on ART in Debre Markos Referral Hospital, Ethiopia. In this longitudinal retrospective study, routine data of 445 HIV patients registered for ART in the hospital were used. As overdispersion was detected in the data, Poisson-Gamma, Poisson-Normal, and Poisson-Gamma-Normal models were applied to account for overdispersion and correlation in the data. The Poisson-Gamma-Normal model with a random intercept was selected as the best model to fit the data. The findings of the study revealed time on treatment, sex of patients, baseline WHO stage, and baseline CD4+ cell count as significant factors for the progression of the CD4+ cell count.


INTRODUCTION
HIV disease continues to be a serious health issue for resource-limited countries like Ethiopia. According to the UNAIDS (2016) fact sheet, there were about 2.1 million new cases of HIV globally in 2015 (1). About 36.7 million people were living with HIV around the world, and, as of June 2016, 18.2 million people living with HIV were receiving antiretroviral therapy (ART). An estimated 1.1 million people died from AIDS-related illnesses in 2015, and 35 million people have died from AIDS-related illnesses since the start of the epidemic. CD4+ cells are the primary targets of HIV. The relentless destruction of CD4+ cells by HIV, either directly or indirectly, results in the loss of HIV-specific immune responses and, finally, of non-specific immune responses in the AIDS stage. The estimation of peripheral CD4+ cell counts has been used as a tool for monitoring disease progression and the effectiveness of ART (2). Changes in the CD4+ cell count are important indicators of the response to ART. Initial CD4+ cell count, age, gender, smoking, unemployment, WHO stage, hospital, opportunistic infections, body mass index, changing doctors during outpatient follow-up, use of alcohol and drugs, and duration of treatment (in months) are some of the significant determinants of CD4+ cell count progression in patients on ART (3)(4)(5).
Most studies conducted in the area fitted statistical models that require a (multivariate) Normal distribution by treating CD4+ cell counts as a continuous variable. When this assumption is violated, even after transformation, considering Poisson-related models is a natural choice. One of the common problems in analyzing count data like the CD4+ cell count is overdispersion. A Negative Binomial model can be considered to overcome this problem. Trindade et al. applied Poisson and Negative Binomial models using the multilevel (ML) approach and generalized estimating equations (GEE) to model CD4+ cell counts of 587 HIV seropositive patients, and they stated that the best marginal model to fit the data was the Negative Binomial (NB) with an exchangeable correlation structure (6). Tekle et al. also employed different count data analysis methods, starting from the ordinary Poisson regression model, to study CD4+ cell counts of 222 HIV-positive patients, and they found that the Poisson-Normal-Gamma model was the best fit for their data (7). In this study, we applied various count data models to study the progression of the CD4+ cell count of HIV patients and identified risk factors for the progression of patients' CD4+ cell count in Debre Markos Referral Hospital, Ethiopia.

MATERIALS AND METHODS
In practice, it is common to have response variables of a count type, like the number of CD4+ cells in a cubic millimeter of blood. Some data analysts treat the CD4+ cell count as a continuous measure and apply the linear mixed effects model. But that practice ignores two facts: the data are really discrete, and the distributions of count variables are usually skewed. For these reasons, the use of models that assume (multivariate) normality might not be efficient (8). Even if the data are transformed and these models are applied, the interpretation might not be straightforward. In scenarios like this, it is better to apply statistical models that account for the nature of the data.
Our data include 445 HIV-positive patients who started ART between December 2005 and July 2014 in Debre Markos Referral Hospital, Ethiopia. The minimum number of measurements per patient was two and the maximum was seven. Patients with fewer than two measurements or aged <15 years were excluded from the study. For our data, the assumption of multivariate normality failed, which suggested that a linear mixed model was not appropriate (Table 1). The Poisson regression model with normal random effects and models that account simultaneously for correlation between repeated measures and overdispersion were thus considered, in line with Booth et al. (9) and Molenberghs et al. (10,11).

Dependent Variable
The dependent variable of this study was the CD4+ cell count per cubic millimeter of blood of HIV-infected patients on ART.

Independent Variables
The independent variables considered in this study were selected based on related literature (5,7). These include the sex of patients, age of patients (age at the initiation of the treatment), baseline CD4+ cell count (the CD4+ cell count of the patients at the start of the treatment), WHO clinical stage at baseline (stage I, stage II, stage III, and stage IV), marital status at baseline, baseline weight, level of education at baseline, functional status at baseline, TB status at baseline, and time in months. Functional status was defined using the WHO categories Ambulatory and Working. Patients who were able to perform activities of daily living but not able to work or play were classified as ambulatory, and those who were able to perform usual work in or out of the house, harvest, go to school or, for children, normal activities or playing were classified as working.

Poisson Model
Let $Y_i$ be the $i$th CD4+ cell count, assumed to be Poisson distributed with mean $\lambda_i$. The density function of $Y_i$ can then be written as

$$f(y_i) = \frac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$

The Poisson distribution belongs to the exponential family, with natural parameter $\theta_i$ equal to $\ln \lambda_i$, scale parameter $\varphi = 1$, and variance function $v(\lambda_i) = \lambda_i$ (12). The logarithm is the natural link function, leading to the classical Poisson regression model

$$\ln(\lambda_i) = X_i^T \beta.$$
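As a quick numerical check of the equal mean and variance property implied by the variance function above, the following minimal sketch evaluates the Poisson mass function on the log scale and sums it to recover both moments. The rate of 145 is only an illustrative value, and the function names and the truncation point are assumptions made for this example rather than part of the study's code.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson variable with mean lam, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def poisson_moments(lam, upper=1000):
    """Mean and variance obtained by summing the mass function over 0..upper."""
    probs = [poisson_pmf(y, lam) for y in range(upper + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Illustrative rate only (hypothetical choice for this sketch).
mean, var = poisson_moments(145.0)
print(round(mean, 4), round(var, 4))  # both moments coincide for a Poisson variable
```

When the sample variance of the observed counts is far larger than the sample mean, this equality fails, which is what motivates the overdispersion models that follow.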

Poisson-Gamma Model
The standard Poisson distribution requires the mean and variance to be equal. When this assumption fails, the Poisson-Gamma model can be used to fit the data. Assume that $Y_i \mid \theta_i \sim \text{Poi}(\theta_i \lambda_i)$, where the $\theta_i$ are an independent and identically distributed (iid) sample of unit-mean Gamma random variables with shape parameter $\alpha$ (9). Conditional on $\theta_i$, the CD4+ cell count of the $i$th patient follows a Poisson distribution with mean $\theta_i \lambda_i$. The counts are then marginally independent Poisson-Gamma random variables [$Y_i \sim \text{NB}(\alpha, \lambda_i)$] with mean $\lambda_i$ and variance $\lambda_i + \lambda_i^2/\alpha$. Hence, the parameter $\alpha$ quantifies the amount of overdispersion, with $\alpha = \infty$ corresponding to no overdispersion with respect to the Poisson distribution. The mass function of the Poisson-Gamma random variables is given by

$$P(Y_i = y_i) = \frac{\Gamma(y_i + \alpha)}{\Gamma(\alpha)\, y_i!} \left(\frac{\alpha}{\alpha + \lambda_i}\right)^{\alpha} \left(\frac{\lambda_i}{\alpha + \lambda_i}\right)^{y_i}, \qquad y_i = 0, 1, 2, \ldots$$

The Poisson-Gamma model (also known as the Negative Binomial model) is given by $\log(\lambda_i) = X_i^T \beta$.
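The mean and variance stated for the Poisson-Gamma (Negative Binomial) distribution can be checked numerically. The sketch below uses the standard Negative Binomial mass function in the mean and shape parameterization, and compares the summed variance with λ + λ²/α. The values λ = 145 and α = 2, the function names, and the truncation point are illustrative assumptions, not estimates from the study.

```python
import math

def neg_binomial_pmf(y, lam, alpha):
    """Poisson-Gamma (NB) mass function with mean lam and Gamma shape alpha."""
    log_p = (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
             + alpha * math.log(alpha / (alpha + lam))
             + y * math.log(lam / (alpha + lam)))
    return math.exp(log_p)

def nb_moments(lam, alpha, upper=5000):
    """Mean and variance obtained by summing the mass function over 0..upper."""
    probs = [neg_binomial_pmf(y, lam, alpha) for y in range(upper + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

lam, alpha = 145.0, 2.0                        # hypothetical values for illustration
nb_mean, nb_var = nb_moments(lam, alpha)
print(nb_mean, nb_var, lam + lam ** 2 / alpha)  # summed variance vs. lam + lam^2/alpha

# A very large alpha brings the variance back towards the Poisson value lam.
_, near_poisson_var = nb_moments(lam, 1e6)
print(near_poisson_var)
```

The second call illustrates the limiting statement in the text: as α grows, the extra term λ²/α vanishes and the distribution approaches the Poisson.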

Poisson-Normal Model
For $\mu_{ij} = E(Y_{ij} \mid b_i)$ and a known link function $\eta(\cdot)$, the generalized linear mixed model can be expressed as

$$\eta(\mu_{ij}) = X_{ij}^T \beta + Z_{ij}^T b_i,$$

where $Y_{ij}$ is the CD4+ cell count of the $i$th patient at the $j$th visit (measurement), $\beta$ is a $p$-dimensional vector of unknown fixed regression coefficients, and $b_i$ is a $q$-dimensional vector of unknown random regression coefficients for the $i$th individual. The $b_i$ are often assumed to be drawn independently from $N(0, D)$, where $D$ is the variance-covariance matrix of the random effects. $X_{ij}$ and $Z_{ij}$ are $p$-dimensional and $q$-dimensional vectors of known covariate values, respectively (10). The generalized mixed Poisson model with normal random effects (Poisson-Normal model) becomes

$$Y_{ij} \mid b_i \sim \text{Poi}(\lambda_{ij}), \qquad \ln(\lambda_{ij}) = X_{ij}^T \beta + Z_{ij}^T b_i.$$

This model is referred to as the Poisson-Normal model because it assumes a Poisson distribution for the counts and a normal distribution for the random effects $b_i$ (10, 11).
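For the random-intercept special case ($Z_{ij} = 1$, $b_i \sim N(0, d)$), the marginal moments implied by the Poisson-Normal model follow from the log-normal moments of $e^{b_i}$. The derivation below is a standard worked step added for illustration; the scalar variance $d$ and the shorthand $\mu_{ij}^{m}$ for the marginal mean are notation assumed here rather than taken from the paper.

```latex
\begin{aligned}
\mu_{ij}^{m} = E(Y_{ij})
  &= E\{E(Y_{ij}\mid b_i)\}
   = \exp(X_{ij}^T\beta)\,E\!\left(e^{b_i}\right)
   = \exp\!\left(X_{ij}^T\beta + \tfrac{d}{2}\right),\\[4pt]
\operatorname{Var}(Y_{ij})
  &= E\{\operatorname{Var}(Y_{ij}\mid b_i)\} + \operatorname{Var}\{E(Y_{ij}\mid b_i)\}\\
  &= \mu_{ij}^{m} + \exp(2X_{ij}^T\beta)\left(e^{2d} - e^{d}\right)
   = \mu_{ij}^{m} + \left(\mu_{ij}^{m}\right)^{2}\left(e^{d} - 1\right).
\end{aligned}
```

The marginal variance exceeds the marginal mean whenever $d > 0$, so the random intercept induces some extra-Poisson variation in addition to the within-patient correlation.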

Poisson-Gamma-Normal Model
According to Molenberghs et al. (10,11), a model combining the ideas from the Poisson-Normal and overdispersion models for repeated Poisson data with overdispersion can be specified as follows:

$$Y_{ij} \mid \theta_{ij}, b_i \sim \text{Poi}(\theta_{ij} \lambda_{ij}), \qquad \lambda_{ij} = \exp(X_{ij}^T \beta + Z_{ij}^T b_i),$$

where the $\theta_{ij}$ capture overdispersion and are an independent and identically distributed (iid) sample of unit-mean Gamma random variables with shape parameter $\alpha$ and scale parameter $1/\alpha$, that is, $\theta_{ij} \sim \text{Gamma}(\alpha, 1/\alpha)$, and where $b_i \sim N(0, D)$. This model is called the Poisson-Gamma-Normal (combined) model because it includes both Normal ($b_i$) and Gamma ($\theta_{ij}$) random effects to account for correlation and overdispersion, respectively.
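To make the two layers of random effects concrete, the sketch below simulates data from a random-intercept version of the combined model and compares the observed baseline mean with the marginal mean exp(β₀ + d/2). All parameter values, the visit schedule, and the function names are hypothetical choices for illustration. The Poisson draw is hand-rolled by counting unit-rate exponential arrivals because Python's standard library has no Poisson sampler.

```python
import math
import random

def draw_poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals that occur before `mean`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def simulate_combined(n_patients, visit_months, beta0, beta_time, d, alpha, seed=1):
    """Simulate (patient, month, count) rows from a random-intercept combined model."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_patients):
        b_i = rng.gauss(0.0, math.sqrt(d))                 # Normal random intercept
        for t in visit_months:
            theta = rng.gammavariate(alpha, 1.0 / alpha)   # unit-mean Gamma effect
            lam = math.exp(beta0 + beta_time * t + b_i)
            rows.append((i, t, draw_poisson(theta * lam, rng)))
    return rows

def marginal_mean(t, beta0, beta_time, d):
    """Marginal mean of the count at month t for a N(0, d) random intercept."""
    return math.exp(beta0 + beta_time * t + d / 2.0)

# Hypothetical parameter values, chosen only to keep the example fast.
rows = simulate_combined(2000, [0, 6, 12, 18], beta0=1.5, beta_time=0.0243,
                         d=0.25, alpha=2.0)
baseline = [y for _, t, y in rows if t == 0]
observed = sum(baseline) / len(baseline)
expected = marginal_mean(0, 1.5, 0.0243, 0.25)
print(observed, expected)
```

Because the Gamma effects have unit mean, they leave the marginal mean unchanged and only inflate the variance, while the shared intercept ties together the counts of the same patient.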

Methods of Parameter Estimation
In this study, we used the glmer and glmer.nb functions in R, which rely on the lme4 and MASS packages. A Laplace approximation was used to obtain parameter estimates. The R code used to fit the models is available in the Supplementary Material.
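For readers unfamiliar with the Laplace approximation, the following sketch shows the idea for one patient's likelihood contribution in a random-intercept Poisson model: the integral over the random intercept is replaced by a Gaussian approximation around the mode of the integrand. This is a conceptual illustration in Python, not the algorithm as implemented in lme4, and the counts, linear predictors, and variance used are hypothetical.

```python
import math

def log_integrand(b, y, eta, d):
    """Log of the Poisson likelihood times the N(0, d) density at intercept b."""
    loglik = sum(yj * (ej + b) - math.exp(ej + b) - math.lgamma(yj + 1)
                 for yj, ej in zip(y, eta))
    return loglik - 0.5 * b * b / d - 0.5 * math.log(2.0 * math.pi * d)

def laplace_loglik(y, eta, d, n_iter=50):
    """Laplace approximation to the log of the integral over b."""
    total_y, b = sum(y), 0.0
    for _ in range(n_iter):                       # Newton steps towards the mode
        mu_sum = sum(math.exp(ej + b) for ej in eta)
        grad = total_y - mu_sum - b / d
        hess = -mu_sum - 1.0 / d
        b -= grad / hess
    # Minus the second derivative of the log integrand at the mode.
    curvature = sum(math.exp(ej + b) for ej in eta) + 1.0 / d
    return log_integrand(b, y, eta, d) + 0.5 * math.log(2.0 * math.pi / curvature)

def quadrature_loglik(y, eta, d, lo=-6.0, hi=6.0, n=20001):
    """Brute-force trapezoid integration over b, used only as a reference value."""
    step = (hi - lo) / (n - 1)
    values = [math.exp(log_integrand(lo + k * step, y, eta, d)) for k in range(n)]
    return math.log(step * (sum(values) - 0.5 * (values[0] + values[-1])))

# Hypothetical counts and fixed-effect linear predictors for four visits.
y_obs, eta_fixed, d_var = [3, 6, 4, 9], [1.2, 1.3, 1.5, 1.6], 0.25
approx = laplace_loglik(y_obs, eta_fixed, d_var)
reference = quadrature_loglik(y_obs, eta_fixed, d_var)
print(approx, reference)
```

The two values agree closely in this example because the integrand is nearly Gaussian in the random intercept; the approximation generally becomes less accurate when counts are very small or there are few observations per patient.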

Model Comparison
To select the important variables, the main effects, the main effect by time interactions, and plausible main effect by main effect interactions were first incorporated into the initial candidate models; the non-significant interaction effects were then removed, and the models were refitted, and so on. The best-fitting model was selected using the AIC, the BIC, and the −2 log-likelihood (Table 7). The model with the smallest values of these criteria was selected as the final model.
Frontiers in Public Health | www.frontiersin.org
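The comparison rule described above can be sketched with the usual definitions AIC = −2 log L + 2k and BIC = −2 log L + k ln(n). The model names and numbers below are hypothetical placeholders, not values from the study, and the choice of n as the number of observations is an assumption (for mixed models, software may count observations differently).

```python
import math

def aic(minus_two_loglik, n_params):
    """Akaike information criterion from -2 log-likelihood and parameter count."""
    return minus_two_loglik + 2 * n_params

def bic(minus_two_loglik, n_params, n_obs):
    """Bayesian information criterion; n_obs is the assumed sample size."""
    return minus_two_loglik + n_params * math.log(n_obs)

def rank_models(fits, n_obs):
    """Return (name, AIC, BIC) tuples sorted from smallest to largest AIC."""
    table = [(name, aic(m2ll, k), bic(m2ll, k, n_obs))
             for name, (m2ll, k) in fits.items()]
    return sorted(table, key=lambda row: row[1])

# Hypothetical fits: name -> (-2 log-likelihood, number of parameters).
hypothetical_fits = {
    "model_A": (1250.0, 8),
    "model_B": (1180.0, 9),
    "model_C": (1178.5, 12),
}
ranking = rank_models(hypothetical_fits, n_obs=400)
for name, a, b in ranking:
    print(name, round(a, 1), round(b, 1))
```

In this made-up example, model_C has the smallest −2 log-likelihood but loses to model_B once its extra parameters are penalized, which is the reason for looking at AIC and BIC rather than the likelihood alone.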

Descriptive Analysis
In this section, the CD4+ cell count data obtained from 445 HIV patients on ART in Debre Markos Referral Hospital are summarized. The majority of the HIV patients [347 (78.0%)] started antiretroviral treatment with CD4+ cell counts <200 cells/mm³. At the start of the treatment, the median CD4+ cell count of the patients was 145 cells/mm³ of blood, with an IQR of 107.00 cells/mm³. The minimum and maximum baseline CD4+ cell counts were 3 and 971 cells/mm³ of blood, respectively. The summary of CD4+ cell counts at different time points is given in Table 2. As can be seen in Table 2, the median CD4+ cell count increased over time. The IQR of CD4+ cell counts increased at some points and then started to decrease after the 24th month. The number of patients decreased at some points and increased at others, which implies the presence of intermittent missingness in the data. That is, some patients fell out of care and later re-engaged, or their CD4+ cell counts were not measured exactly every 6 months.
Data on the demographic and clinical characteristics of the patients were collected at the start of antiretroviral treatment. Among the 445 patients, 280 (62.9%) were female. The male patients had a mean baseline CD4+ cell count of 134.84, while the female patients had a mean baseline CD4+ cell count of 168.91. On average, female patients therefore started ART at a relatively higher CD4+ cell count. The average CD4+ cell count of females was higher than that of males at all time points, and the difference increased over time.
Figure 1 depicts the individual profile plot of the CD4+ cell counts of the HIV-infected patients included in the study. The plot provides some information on the between-patient variability in CD4+ cell count and illustrates the change in patients' CD4+ cell counts over time. Some individuals have an erratic CD4+ cell count, and others have a CD4+ cell count that slowly increases over time. As one can see from the graph, there is a considerably large difference in the intercepts of the individual trajectories. Similarly, some trajectories are steeper while others are almost horizontal, indicating possible variability in the slopes. Because of this variability in the intercepts and slopes of the trajectories, a mixed model is a reasonable choice for these data. The overall mean profile plot of the CD4+ cell count shows a roughly linear increasing pattern over time (Figure 2), suggesting that a linear time effect is reasonable. The mean CD4+ cell count increases at a high rate from baseline to the 6th month, increases more slowly from the 6th to the 24th month, and decreases at month 30.
Table 5. Based on this model, time, WHO stage, and initial CD4+ cell count were found to be significant factors of patients' CD4+ cell count progression. The improvement of both the Poisson-Gamma and Poisson-Normal models over the Poisson model in fitting the data indicates the presence of both correlation and overdispersion in the data. The Poisson-Gamma-Normal (Negative Binomial log-linear mixed) model proposed by Booth et al. (9) and Molenberghs et al. (10,11) was fitted to handle correlated and overdispersed count data, and the random intercept Poisson-Gamma-Normal model gave a much better fit, with lower AIC (27,379.9), BIC (27,488.4), and −2 log-likelihood (27,342) values than the Poisson-Normal models (Table 6).
Therefore, the final model to fit our data was the random intercept Poisson-Gamma-Normal model. We also tried the Poisson-Gamma-Normal model with (random) linear slopes for time, but we found that the Poisson-Gamma-Normal model with a random intercept only was better based on the information criteria (AIC and BIC).

Model Results
Based on the results obtained from the Poisson-Gamma-Normal model, time in months, sex, and baseline CD4+ cell count were found to be significant factors of the CD4+ cell count of a patient (Table 8). For a given patient, keeping the random intercept and other covariates constant, one more month on ART increased the CD4+ cell count by a multiplicative factor of $e^{0.0243} = 1.0246$.
A female patient had a CD4+ cell count 1.1215 times that of a male patient, adjusting for the other covariates and the random intercept. A unit increase in baseline CD4+ cell count increased the CD4+ cell count of a patient by a factor of 1.0034, holding the other covariates and the random intercept constant.
In the final model, the dispersion parameter (1/α) was estimated as 7.7009; the Gamma (overdispersion) random effects are assumed to follow a Gamma distribution with unit mean and shape parameter α (0.130).
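The multiplicative factors quoted above come from exponentiating coefficients on the log scale. The short sketch below reproduces the reported monthly factor from the coefficient 0.0243 and shows how the same coefficient compounds over several months; the helper name and the six-month horizon are illustrative assumptions, and the back-transformed log coefficients are derived here from the reported factors rather than read from Table 8.

```python
import math

def rate_ratio(beta, units=1.0):
    """Multiplicative change in the expected count for `units` of the covariate."""
    return math.exp(beta * units)

monthly = rate_ratio(0.0243)            # one additional month on ART
six_months = rate_ratio(0.0243, 6)      # six additional months (illustrative horizon)

# Log-scale coefficients implied by the reported factors for sex and baseline CD4+.
beta_female = math.log(1.1215)
beta_baseline = math.log(1.0034)

print(round(monthly, 4))                # 1.0246, matching the reported factor
print(six_months, beta_female, beta_baseline)
```

Because the model is log-linear, effects multiply rather than add: the six-month factor is the monthly factor raised to the sixth power, not six times the monthly increase.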

Discussion
The effects of demographic and clinical factors on the progression of the CD4+ cell counts of HIV patients on ART in Debre Markos Referral Hospital were assessed using Poisson longitudinal models, since the response variable of interest, the CD4+ cell count, is a count variable. The summary statistics revealed that the IQR is high at all time points, which might indicate high variation among the patients' CD4+ cell counts at baseline as well as at different time points after the initiation of ART. This variation might have been caused by the year in which the patients started ART, as the WHO CD4+ cell count cut-off points for initiating ART have differed over time. Although most of the patients included in our study started with low CD4+ cell counts (<200 cells/mm³), there were patients who had much higher baseline CD4+ cell counts (up to 971 cells/mm³). Despite the continuous effort to initiate treatment early, some patients still presented with low CD4+ cell counts, which might be due to patients' unwillingness to get tested (13,14) or to difficulties in providing treatment to all patients in lower-income countries, including Ethiopia. Hence, we believe that our results could be generalizable. The final model also indicated that the initial CD4+ cell count (CD4+ cell count at the start of the treatment) significantly affects CD4+ cell count progression. Therefore, based on our findings, we recommend that patients start treatment early, in line with the WHO's "treat all" recommendation.
The sign of the parameter estimate of WHO stage III is positive, which implies that a patient with WHO stage III has a higher CD4+ cell count than a patient with WHO stage II. This might be because the number of patients with WHO stage III is much higher than (and not comparable to) the number of patients with WHO stage II. The relationship between CD4+ cell count and WHO stage III might also be explained by the baseline CD4+ cell count. Duration of treatment also has a positive effect on the CD4+ cell count progression of HIV patients. This means that patients with a longer time on ART have a better recovery of CD4+ cell count than patients with a shorter duration on treatment.

CONCLUSION
An analysis of CD4+ cell count data using conventional models like linear mixed models might be inadequate, as such data are often highly skewed and may not satisfy the (multivariate) normality assumption, as demonstrated in our data.
In this study, CD4+ cell count data of 445 HIV patients on ART in Debre Markos Referral Hospital were analyzed using different longitudinal count models, and the Poisson-Gamma-Normal model was selected as the final model to fit the data based on different selection criteria. The Poisson-Gamma-Normal model handles overdispersion and correlation simultaneously.
The duration of ART (time in months), sex of patients, and baseline CD4+ cell count were all identified as potential risk factors of CD4+ cell count progression. Having a good CD4+ cell count at baseline had a positive impact on CD4+ cell count evolution over time.
Although good CD4+ cell count progress in response to ART was observed, most of the patients (78.0%) had low CD4+ cell counts (<200 cells/mm³) when enrolled for ART, which might have contributed to low CD4+ cell count recovery in some patients.

LIMITATIONS AND RECOMMENDATION
In our study, we only considered patients from one hospital. The likelihood inference of the models considered in this study is valid under MCAR (missing completely at random). In the current study, we did not carry out a sensitivity analysis, and we only considered linear slope models, although a different linear slope for different time periods seems reasonable. Hence, we recommend that researchers consider sensitivity analysis and data obtained from different hospitals. The age and weight of patients might have a non-linear relationship with the CD4+ cell count, and we recommend that smoothing techniques like splines be explored in further studies. The assumption of multivariate normality made by most statistical models used in longitudinal data analysis should be checked before analysis. Methods like the ones used in this study could be considered if the assumption is violated.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
Before data collection, a letter of support written by the Statistics Department of Addis Ababa University was submitted to Debre Markos Hospital and permission to collect anonymized data was obtained. The data was extracted by trained data clerks in the ART Clinic and none of the researchers had access to original cards of patients. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
BAn conceived the idea, performed the data cleaning and analysis, interpreted the ensuing results, and drafted the manuscript. BAy supervised the study, contributed to the conception, and revised the manuscript. Both authors read and approved the final draft.