# THE USE OF ROUTINE HEALTH DATA IN LOW- AND MIDDLE-INCOME COUNTRIES

EDITED BY: Jim Todd and Michael Johnson Mahande

PUBLISHED IN: Frontiers in Public Health and Frontiers in Psychology

### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-080-3 DOI 10.3389/978-2-88966-080-3



Topic Editors: Jim Todd, University of London, United Kingdom; Michael Johnson Mahande, Kilimanjaro Christian Medical University College, Tanzania

Citation: Todd, J., Mahande, M. J., eds. (2020). The Use of Routine Health Data in Low- and Middle-Income Countries. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-080-3

# Table of Contents

*Editorial: The Use of Routine Health Data in Low- and Middle-Income Countries*

Jim Todd and Michael Johnson Mahande

*Increasing Proportion of HIV-Infected Pregnant Zambian Women Attending Antenatal Care Are Already on Antiretroviral Therapy (2010–2015)*

Sehlulekile Gumede-Moyo, Jim Todd, Ab Schaap, Paul Mee and Suzanne Filteau

*Analysis of Hierarchical Routine Data With Covariate Missingness: Effects of Audit & Feedback on Clinicians' Prescribed Pediatric Pneumonia Care in Kenyan Hospitals*

Susan Gachau, Nelson Owuor, Edmund Njeru Njagi, Philip Ayieko and Mike English

*Modeling Long-Term Graft Survival With Time-Varying Covariate Effects: An Application to a Single Kidney Transplant Centre in Johannesburg, South Africa*

Okechinyere J. Achilonu, June Fabian and Eustasius Musenge

*HIV Disease Progression Among Antiretroviral Therapy Patients in Zimbabwe: A Multistate Markov Model*

Zvifadzo Matsena Zingoni, Tobias F. Chirwa, Jim Todd and Eustasius Musenge

*Effectiveness of Lifelong ART (Option B+) in the Prevention of Mother-to-Child Transmission of HIV Programme in Zambia: Observations Based on Routinely Collected Health Data*

Brian Muyunda, Patrick Musonda, Paul Mee, Jim Todd and Charles Michelo

*Characterizing a Leak in the HIV Care Cascade: Assessing Linkage Between HIV Testing and Care in Tanzania*

Richelle Harklerode, Jim Todd, Mariken de Wit, James Beard, Mark Urassa, Richard Machemba, Bernard Maduhu, James Hargreaves, Geoffrey Somi and Brian Rice

*Performance of and Factors Associated With Tuberculosis Screening and Diagnosis Among People Living With HIV: Analysis of 2012–2016 Routine HIV Data in Tanzania*

Werner Maokola, Bernard Ngowi, Lovetti Lawson, Michael Mahande, Jim Todd and Sia E. Msuya

*Discrete Survival Time Constructions for Studying Marital Formation and Dissolution in Rural South Africa*

Jesca M. Batidzirai, Samuel O. M. Manda, Henry G. Mwambi and Frank Tanser

Tendai Munthali, Charles Michelo, Paul Mee and Jim Todd

*Misreporting of Patient Outcomes in the South African National HIV Treatment Database: Consequences for Programme Planning, Monitoring, and Evaluation*

David Etoori, Alison Wringe, Chodziwadziwa Whiteson Kabudula, Jenny Renju, Brian Rice, F. Xavier Gomez-Olive and Georges Reniers

# Editorial: The Use of Routine Health Data in Low- and Middle-Income Countries

Jim Todd<sup>1</sup> \* and Michael Johnson Mahande<sup>2</sup>

*<sup>1</sup> London School of Hygiene and Tropical Medicine, University of London, London, United Kingdom, <sup>2</sup> Kilimanjaro Christian Medical University College, Kilimanjaro, Tanzania*

Keywords: data, public health, LMIC = low- and middle-income countries, Africa, health information, routine data

**Editorial on the Research Topic**

### **The Use of Routine Health Data in Low- and Middle-Income Countries**

In most high-income countries, routinely collected health data are regularly used to inform policy and to provide real-time updates on health concerns. Techniques and applications have been developed for a wide range of data, both aggregated data and individual patient data. However, in low- and middle-income countries (LMIC), routinely collected data have not been used as much, partly because the data are not so easily available, and partly because of the dearth of data professionals to analyse them (1). There have been several initiatives to increase the number of data professionals in sub-Saharan Africa (SSA) and to improve their skills. This topic aimed to explore the extent to which routine health data are being used across LMIC and how this work is being led by African researchers.

One area where the analysis of routine health data has produced beneficial results is electronic health records (EHR). In SSA there has been substantial investment in EHR for HIV services, and we anticipated papers that used anonymized records of people living with HIV (PLHIV) to examine the impact of HIV treatment on survival and quality of life (2). Two papers in this collection identified ways to model HIV disease progression using routinely collected data from HIV clinics. From Ethiopia, a Poisson-Gamma-Normal model was used to account for overdispersion of the data and correlation within subjects, while from Zimbabwe, multistate Markov models were used to identify factors that were associated with disease progression (Andualem and Ayele; Matsena Zingoni et al.). Two other papers, both from Tanzania, used the HIV EHR to look at TB infections, with one paper reporting that 96% of clinic visits by PLHIV included screening for TB, and the other giving the incidence of TB among PLHIV as 2 per 1000 person-years (Mollel et al.). This experience demonstrates that routinely collected, openly accessible health data from many countries promote a wealth of analyses and papers, which feed into the wider picture of the impact of improved HIV services.

A different use for routine data came from the development of tools to get more accurate estimates of disease. In many countries of Africa, there is a dependence on surveys to provide measures of prevalence and incidence of disease (3). However, smaller-scale estimates are needed to inform health services for communities, and routine data from health facilities can be combined with survey data to obtain such hybrid estimates (4). A paper from South Africa used spatial interpolation to develop tools that draw on routinely collected health data to provide more accurate estimates of HIV prevalence at national and sub-national levels (Wabiri et al.). This is an inspiration to others working in the field to generate new methods to anneal data in this way.

Edited and reviewed by: *Nilufar Baghaei, Massey University, New Zealand*

> \*Correspondence: *Jim Todd jim.todd@lshtm.ac.uk*

### Specialty section:

*This article was submitted to Digital Public Health, a section of the journal Frontiers in Public Health*

Received: *07 July 2020* Accepted: *10 July 2020* Published: *20 August 2020*

### Citation:

*Todd J and Mahande MJ (2020) Editorial: The Use of Routine Health Data in Low- and Middle-Income Countries. Front. Public Health 8:413. doi: 10.3389/fpubh.2020.00413*

Another use of routine data is to improve health services. This can be done by using the data to show areas where more effort is needed, and by comparing different services using the data they generate. Several papers in this topic showed how health services can be improved through judicious use of the data they generate every month. This ranged from ways to improve the linkage between HIV testing and HIV care (Harklerode et al.) to highlighting the misreporting of patient outcomes if those lost to follow-up are not properly accounted for (Etoori et al.). Routine health data have their problems in assessing the impact of interventions in hospitals, not least because of missing data that have not been properly recorded. Gachau et al. showed that missing data methods need to account for the hierarchical nature of the data in order to correctly assess the impact of an audit and feedback intervention.

Health services for infants are of particular concern as there is often a need to link health records of children with those of their mother. In Tanzania, PATH are starting with immunization records to build the requirement for an electronic immunization registry (EIR), showing this needs to operate on different software platforms (Seymour et al.). In Zambia an increasing number of HIV-infected pregnant women are already on antiretroviral therapy (ART) when they first attend antenatal care (Gumede-Moyo et al.). SmartCare data in Zambia were used to assess the impact of the Option B+ program on subsequent HIV transmission and the impact of ART on the survival of children born with HIV infection (Muyunda et al.; Munthali et al.).

The importance of advanced statistical methods was evident in all the papers. One paper used survival techniques to study marital formation and dissolution in South Africa (Batidzirai et al.). This showed that with advanced statistical methods you can get a more accurate picture of the real risks for life events such as marital formation and dissolution. Another paper showed that time-varying covariates are needed to properly understand the survival of patients given a kidney transplant (Achilonu et al.).

All papers used some interesting advanced statistical methods. These ranged from Bayesian methods to model disease progression, through clustering of effects in hierarchical data and time-varying covariates in survival analyses, to adjusting for missing data. The application of these methods in LMIC, and in sub-Saharan Africa in particular, is crucial to the future analyses of routinely collected health data. These papers represent the first steps of African statisticians to come to grips with the complexity of analysis, and the exploration of Big Data related to the health of the African continent.

## AUTHOR CONTRIBUTIONS

JT conceived the topic and provided editorial inputs to papers. MM supported the topic and provided editorial inputs to a number of papers. Together we identified the need to highlight analysis of routinely collected health data in many African countries.

## FUNDING

The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS) Alliance for Accelerating Excellence in Science in Africa (AESA) and was supported by the New Partnership for Africa's Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust (Grant No. 107754/Z/15/Z) and the UK government.

## ACKNOWLEDGMENTS

We acknowledge the work and efforts to collect routine health data in many countries, and the people who make this possible, especially those who collect the primary data. We appreciate the recognition of societies and funding for analysts in LMIC, and the efforts to improve the analysis of data. We specifically thank the Sub-Saharan Africa Consortium for Advanced Biostatistical training for supporting students and post-graduates to explore and analyze routinely collected health data.

## REFERENCES

4. Jeffery C, Pagano M, Hemingway J, Valadez JJ. Hybrid prevalence estimation: method to improve intervention coverage estimations. Proc Natl Acad Sci USA. (2018) 115:13063–8. doi: 10.1073/pnas.1810287115

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Todd and Mahande. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Increasing Proportion of HIV-Infected Pregnant Zambian Women Attending Antenatal Care Are Already on Antiretroviral Therapy (2010–2015)

### Sehlulekile Gumede-Moyo<sup>1,2</sup> \*, Jim Todd<sup>1</sup>, Ab Schaap<sup>1,3</sup>, Paul Mee<sup>1</sup> and Suzanne Filteau<sup>1</sup>

*<sup>1</sup> Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom, <sup>2</sup> School of Public Health, University of Zambia, Lusaka, Zambia, <sup>3</sup> ZAMBART, School of Public Health, University of Zambia, Lusaka, Zambia*

### Edited by:

*Tobias Freeman Chirwa, University of the Witwatersrand, South Africa*

### Reviewed by:

*Laszlo Balkanyi, Retired, Solna, Sweden Priyamvada Paudyal, University of Sussex, United Kingdom*

> \*Correspondence: *Sehlulekile Gumede-Moyo seh.sokhela@gmail.com*

### Specialty section:

*This article was submitted to Digital Health, a section of the journal Frontiers in Public Health*

Received: *12 October 2018* Accepted: *28 May 2019* Published: *13 June 2019*

### Citation:

*Gumede-Moyo S, Todd J, Schaap A, Mee P and Filteau S (2019) Increasing Proportion of HIV-Infected Pregnant Zambian Women Attending Antenatal Care Are Already on Antiretroviral Therapy (2010–2015). Front. Public Health 7:155. doi: 10.3389/fpubh.2019.00155*

Introduction: Accurate estimates of coverage of prevention of mother-to-child (PMTCT) services among HIV-infected pregnant women are vital for monitoring progress toward HIV elimination targets. The achievement of high coverage and uptake of services along the PMTCT cascade is crucial for national and international mother-to-child transmission (MTCT) elimination goals. In eastern and southern Africa, the MTCT rate fell from 18% of infants born to mothers living with HIV in 2010 to 6% in 2015. This paper describes the degree to which World Health Organization (WHO) guidelines for PMTCT services were implemented in Zambia between 2010 and 2015.

Method: The study used routinely collected data from all pregnant women attending antenatal clinics (ANC) in SmartCare health facilities from January 2010 to December 2015. Categorical variables were summarized using proportions while continuous variables were summarized using medians and interquartile ranges.

Results: There were 104,155 pregnant women who attended ANC services in SmartCare facilities during the study period. Of these, 9% tested HIV-positive during ANC visits whilst 43% had missing HIV test result records. Almost half (47%) of pregnant women who tested HIV-positive in their ANC visit were recorded in 2010. Among HIV-positive women, there was an increase in those already on ART at first ANC visit from 9% in 2011 to 74% in 2015. The overall mean time lag between starting ANC care and ART initiation was 7 months over the 6-year period, but there were notable variations between provinces and years.

Conclusion: The implementation of the WHO post 2010 PMTCT guidelines has resulted in an increase in the proportion of HIV-infected pregnant women attending ANC who are already on ART. However, the variability in HIV infection rates, missing data, and time to initiation of ART suggests there are some underlying health service or database issues which require attention.

Keywords: ANC, PMTCT, ART, pregnant, HIV-infected, coverage

## INTRODUCTION

The Joint United Nations Programme on HIV/AIDS (UNAIDS) super-fast-track framework for ending the AIDS epidemic by 2030 set a target of reaching and sustaining 95% of pregnant women living with HIV on lifelong HIV treatment by 2018 globally (1). The achievement of high coverage and uptake of services along the prevention of mother-to-child (PMTCT) cascade is crucial for national and international mother-to-child transmission (MTCT) elimination goals (2). Twenty-two countries in sub-Saharan Africa with a high burden of MTCT, including Zambia, were identified as priority countries for intensified support to achieve the UNAIDS HIV elimination goal (2, 3). A commonly used surrogate marker for programme effectiveness is programme coverage. For PMTCT this would be the proportion of HIV-infected women and exposed infants in a population that access the different components of the PMTCT programmatic cascade (4). Estimates of coverage with PMTCT services among all HIV-infected pregnant women are vital to monitor progress relative to targets, and to secure donor funding for PMTCT programmes (5).

As a result of increased coverage and improved regimens, rates of HIV transmission from mothers to infants during pregnancy and breastfeeding have decreased around the world (6). The largest decline was in eastern and southern Africa, where the rate fell from 18% of infants born to mothers living with HIV in 2010 to 6% in 2015 (7). In 2017, 210,000 new infections were averted due to PMTCT (8). Some countries in SSA, such as South Africa, are approaching the very low MTCT rates achieved in high-income countries, but several others, such as Zambia, Angola, DRC, Nigeria, Lesotho, and Kenya, lag far behind at the moment (9). In Zambia coverage of pregnant women living with HIV accessing antiretroviral medicines was 92% [78–>95] in 2017, a decrease from 95% in 2015 (8).

This paper describes trends in the coverage of PMTCT services from 2010 to 2015 using the SmartCare database of routine clinical information collected in Zambia. This is the first study to have evaluated the effectiveness of implementing post 2010 PMTCT interventions nationwide using SmartCare routine data.

## METHODS

## Study Design

This was a retrospective cohort study using routinely collected data. The study population was all pregnant women attending antenatal care (ANC) from January 2010 to December 2015 in health facilities using the SmartCare database.

In Zambia over 90% of pregnant women attend ANC services at least once during their pregnancy, but only 47% deliver at health facilities (10). Thus, it is difficult to ensure that eligible pregnant women receive the complete treatment to prevent transmission of HIV to their babies. Although more than 75% of the ANC facilities currently provide PMTCT services, the majority of these facilities are along the country's main rail line and in urban centers, resulting in geographical inequity (10).

## Data Sources

The study retrospectively analyzed the Ministry of Health electronic SmartCare database, using routinely collected data from all pregnant women attending ANC from January 2010 to December 2015. SmartCare is a Zambian Ministry of Health-led project funded by the United States Centers for Disease Control and Prevention (CDC) (11). The SmartCare database was developed to improve continuity of care and provide timely data on maternal and child health, HIV/AIDS, tuberculosis, and malaria interventions for public health purposes. Since 2005, the SmartCare database has been deployed in over 800 health facilities, which represents 40% of all facilities in Zambia, including the biggest and busiest health facilities. These results come from 886 health facilities from all provinces in Zambia. The Southern province had the largest number of facilities (254/886) represented in the dataset, followed by the Copperbelt (187/886) and Eastern (166/886) provinces. Muchinga and Northern provinces had the fewest facilities, 10 and 26 respectively, in the analyzed dataset.

## Data Extraction

The data were extracted into Excel, without names, but with the unique identity (ID) number, and then transferred to Stata 13 for cleaning and analysis. All women enrolled in a facility using SmartCare have an electronic health record about their ANC visits which includes information collected in each visit. Records are updated at every point of clinical service. SmartCare is organized into comprehensive modules and sub-modules. The information from various modules is linked through the unique ID number. For this study, the ANC data were linked to the HIV Client Summary module and the ARV Eligibility Interaction Module to identify HIV-positive women. Data from the Obstetric History Module were then used to segregate PMTCT clients from general ART clients. The earliest date of HIV testing and the ANC visit date were used to determine whether women had known their HIV status before the ANC visit. The final data were stratified by province using the geography file from the Central Statistical Office (CSO), which has a list of all the districts and provinces.
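The comparison of the earliest HIV test date with the ANC visit date can be illustrated with a minimal Python sketch. The analysis itself was carried out in Stata; the record layout, client IDs, and category labels below are hypothetical and are not taken from the SmartCare schema.

```python
from datetime import date

# Hypothetical, de-identified records keyed by a unique ID.
# Field layout and IDs are illustrative assumptions only.
anc_visits = {
    "A001": date(2012, 3, 10),
    "A002": date(2014, 7, 2),
    "A003": date(2015, 1, 20),
}
hiv_tests = {
    "A001": [date(2011, 6, 1), date(2012, 3, 10)],
    "A002": [date(2014, 7, 2)],
    # A003 has no HIV test record
}


def status_before_anc(client_id):
    """Compare the earliest HIV test date with the ANC visit date."""
    tests = hiv_tests.get(client_id)
    if not tests:
        return "no test record"
    earliest = min(tests)
    if earliest < anc_visits[client_id]:
        return "known before ANC"
    return "tested at or after ANC"


result = {cid: status_before_anc(cid) for cid in anc_visits}
```

In this sketch a test dated on the same day as the ANC visit is treated as a test done at ANC, which is one possible reading of the rule described above.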

The first step in data cleaning was to remove duplicate data for repeat visits in the same pregnancy (based on parity and gravid status). This was done by keeping the first visit date of each pregnancy and then populating any empty fields with information captured at later visits in the same pregnancy. Records for all mothers younger than 15 years or older than 49 years were dropped from the sample, restricting the target group to women aged 15 to 49 years (the reproductive age group). The data flow chart is illustrated in **Figure 1**. Age groups were categorized as 15–24, 25–34, and 35–49 years. Marital status was grouped into single, married, divorced, widowed, and missing. The education status groups were no education, primary education, junior secondary, secondary, and tertiary education.
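The cleaning steps described above (keep the first visit per pregnancy, fill empty fields from later visits, restrict to ages 15 to 49, and assign age groups) can be sketched as follows. This is an illustrative Python sketch under assumed field names (`id`, `gravida`, `parity`, `visit`, `age`, `marital`), not the Stata code used in the study.

```python
from datetime import date

# Hypothetical visit rows; the field names are assumptions for illustration.
visits = [
    {"id": "A001", "gravida": 2, "parity": 1, "visit": date(2012, 5, 1),
     "age": 23, "marital": "married"},
    {"id": "A001", "gravida": 2, "parity": 1, "visit": date(2012, 3, 10),
     "age": 23, "marital": None},
    {"id": "A002", "gravida": 1, "parity": 0, "visit": date(2013, 1, 15),
     "age": 14, "marital": "single"},
    {"id": "A003", "gravida": 5, "parity": 4, "visit": date(2014, 8, 9),
     "age": 36, "marital": "widowed"},
]


def clean(rows):
    """Keep one record per pregnancy and restrict to ages 15-49."""
    pregnancies = {}
    for row in sorted(rows, key=lambda r: r["visit"]):
        key = (row["id"], row["gravida"], row["parity"])
        if key not in pregnancies:
            # First (earliest) visit of this pregnancy is the one kept.
            pregnancies[key] = dict(row)
        else:
            # Fill fields left empty at the first visit from later visits.
            kept = pregnancies[key]
            for field, value in row.items():
                if kept[field] is None and value is not None:
                    kept[field] = value
    return [p for p in pregnancies.values() if 15 <= p["age"] <= 49]


def age_group(age):
    """Assign the three age bands used in the analysis."""
    if age <= 24:
        return "15-24"
    if age <= 34:
        return "25-34"
    return "35-49"


cleaned = clean(visits)
groups = [age_group(p["age"]) for p in cleaned]
```

Here the two rows for client A001 collapse into one record dated at the first visit, with the marital status taken from the later visit, and the record for the 14-year-old is dropped.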

## Statistical Analysis

The data were used to estimate the proportion of HIV-positive pregnant women attending ANC by province and year. The study population was divided into three strata: pregnant women with a new HIV test result documented in ANC clinic, pregnant women with known status but not on ART, and pregnant women who were already on ART. Among the total number of pregnant women presenting to ANC clinic in each calendar year, the percentages in each group were calculated.
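As an illustration of this calculation, the following Python sketch computes the percentage of women in each stratum per calendar year. The records are invented for the example, and the denominator is simply the number of example records in each year.

```python
from collections import Counter, defaultdict

# Hypothetical (visit year, stratum) pairs; not study data.
records = [
    (2011, "new HIV test result"),
    (2011, "new HIV test result"),
    (2011, "known status, not on ART"),
    (2011, "already on ART"),
    (2015, "already on ART"),
    (2015, "already on ART"),
    (2015, "already on ART"),
    (2015, "new HIV test result"),
]


def stratum_percentages(rows):
    """Return {year: {stratum: percentage of that year's records}}."""
    counts = defaultdict(Counter)
    for year, stratum in rows:
        counts[year][stratum] += 1
    return {
        year: {s: 100 * n / sum(c.values()) for s, n in c.items()}
        for year, c in counts.items()
    }


percentages = stratum_percentages(records)
```

With these example records, half of the 2011 women fall in the new test result stratum and three quarters of the 2015 women are already on ART.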

The STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines were used to conduct and report on the findings of this study (12).

## Ethics

Ethical approval was granted by the Zambia Biomedical Research Ethics Committee (Ref 101-04-16) and the LSHTM Research Ethics Committee (Ref 12086). Permission to use SmartCare data was granted by the Zambia Ministry of Health. The Ethics Committees that approved the study waived the need for written informed consent to be obtained as this was a secondary analysis of previously collected data and the authors had access only to de-identified information.

## RESULTS

## Demographics

There were a total of 327,368 visits to antenatal clinics by 172,517 pregnant women in the SmartCare database (**Figure 1**). However, 64,620 were before 2010, and 2,524 were after 2015. A further 1,052 were under 15 years of age, and 166 were over 49 years of age, leaving 104,155 pregnant women in the final study sample (**Table 1**), with 33% recorded in 2010. Most women were from Copperbelt (27%), Southern (21%), and Eastern (21%) provinces, whilst the fewest were from Luapula and North-western provinces. The majority (51%) were between 15 and 24 years and 82% were married. A high proportion had attained primary level (34%) and secondary level (39%) education; however, educational attainment data were missing for 20% of pregnant women.
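The exclusion counts reported above can be checked with simple arithmetic. The short Python check below treats each exclusion as a count of women subtracted from the 172,517 women in the database, which is the reading under which the figures reconcile with the final sample.

```python
# Counts as reported in the text; the dictionary labels are descriptive only.
total_women = 172_517
exclusions = {
    "before 2010": 64_620,
    "after 2015": 2_524,
    "under 15 years": 1_052,
    "over 49 years": 166,
}

final_sample = total_women - sum(exclusions.values())
print(final_sample)  # 104155
```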

## HIV Test Results

Overall during the study period 9% of pregnant women tested HIV-positive (**Table 2**) at ANC visits whilst 43% had missing HIV test result records. In addition, 34% of the missing HIV test results were in 2014, whereas only 2% were in 2011. Moreover, over 60% of HIV test results were missing in Lusaka and Muchinga provinces.

The overall percentage of HIV-positive pregnant women, who tested for the first time at the ANC decreased from 13% in 2010 to 5% in 2013 and then to 0.15% in 2015. The percentage with missing HIV test results increased from 11% in 2010 to 65% in 2013 and then to 98.8% in 2015 (**Table 2**).

## ART Initiation

Almost half (47%) of the pregnant women who tested HIV-positive in their ANC visit were recorded in 2010 (**Figure 2**). More women knew their HIV-positive status in 2015 (30%) than in 2011 (9%). There was a large increase in the proportion of HIV-positive women who were already on ART from 9% of the HIV-positive women seen in 2011 to 74% of the HIV-positive women seen in 2015.

The overall mean time difference between HIV-positive diagnosis at the first ANC visit and ART initiation was 7 months. If a woman was diagnosed at 14 weeks the analysis suggests that most women were not started on ART until after delivery (7–9 months). However, there are notable variations between visit years (**Figure 3**). There were also large differences between provinces; for example, in 2010 pregnant women in Luapula province took an average of 37 months from diagnosis to treatment whereas in the Copperbelt it took <1 month.
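A mean time lag of this kind can be computed from pairs of diagnosis and ART initiation dates. The Python sketch below is illustrative only: the dates are invented, and converting days to months with an average month length of 365.25/12 days is an assumption, since the conversion used in the study is not stated.

```python
from datetime import date
from statistics import mean

# Assumed conversion factor from days to months.
AVG_DAYS_PER_MONTH = 365.25 / 12

# Hypothetical (diagnosis date, ART initiation date) pairs; not study data.
pairs = [
    (date(2012, 1, 10), date(2012, 2, 9)),  # 30 days apart
    (date(2012, 3, 1), date(2013, 3, 1)),   # 365 days apart
]


def months_between(start, end):
    """Elapsed time between two dates, expressed in average months."""
    return (end - start).days / AVG_DAYS_PER_MONTH


lags = [months_between(diagnosed, started) for diagnosed, started in pairs]
mean_lag = mean(lags)
```

The second pair shows how a single long delay pulls the mean upward, which is one reason provincial averages such as those reported above can differ so widely.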

### TABLE 1 | Demographic characteristics of pregnant women attending ANC in Zambia (2010–2015).


## DISCUSSION

Zambia initiated the PMTCT programme in 1999 to address the burden of vertical transmission of HIV and to integrate PMTCT in all maternal, newborn, and child health services throughout the country (13). The results of this study show that, although there is a high rate of engagement with PMTCT services, the variability in HIV infection rates, missing data, and time to initiation of ART suggests there are some underlying health service or database issues which require attention.

Our results indicate a progressive increase in the proportion of women who were already on ART before registering for ANC in their visit year (from 3% in 2010 to 74% in 2015). This is likely to be attributable to the introduction of Option B+ for those women with repeat pregnancies and adoption in 2013 of WHO Test and Treat guidelines that recommend anyone who tests positive for HIV should be started on treatment, regardless of their CD4 count. However, the UNAIDS Prevention Gap Report indicated a decline in pregnant women living with HIV who received effective ART from 96% in 2013 to 87% in 2015 (6). In a systematic review of literature on the effectiveness of implementing post 2010 PMTCT guidelines, we concluded that many HIV-infected women who are engaged in care during pregnancy are lost to follow-up during the postpartum period (14).

The increased volume of patients initiating ART due to test-and-treat and Option B+ could have threatened programme performance and negatively affected the HIV continuum of care for all HIV-infected patients (15). Mathematical modeling using the Lifelong ART tool indicated that the probability of HIV-infected pregnant women initiating ART would increase by 80%. It was also suggested that while the shift would generate higher PMTCT costs, it would be cost-saving in the long term as it spares future treatment costs by preventing infections in infants and partners (16).

Other studies from Africa have shown that the uptake of PMTCT services could be influenced by health system or structural issues such as staffing level, availability and cost of ART, capacity of health personnel to prescribe appropriate regimens, shortage of supplies in facilities, failure to follow up mothers' or infants' status, and giving wrong information or suboptimal quality of counseling leading to loss or dropout from the PMTCT cascade (17–22). In Zambia lack of human resources remains a serious impediment to addressing HIV, so that even when physical resources are available, there are often not the healthcare personnel to administer them (23). However, knowledge around PMTCT is high: the Zambia Demographic Health Survey (ZDHS) 2013–14 reported that 82% of women and 66% of men were aware of the risk of MTCT and that it can be reduced by taking special drugs during pregnancy (24).

### TABLE 2 | HIV test results of pregnant women attending ANC in Zambia (2010–2015).

FIGURE 2 | HIV-positive not started on ART refers to women who first tested HIV-positive in this pregnancy but were not started on ART because they were pregnant when Option A was in place. Started ART refers to women who first tested positive during ANC for this pregnancy and were started on ART. Already on ART refers to pregnant women who were already on ART before seeking ANC services for this pregnancy. Known HIV-positive status refers to women with known HIV-positive status before ANC for this pregnancy, so they were not tested again, but were not started on ART because they were on Option A.

In our study 65% (983/1501) of the women who were initiated on ART after testing HIV-positive during their ANC were documented after the adoption of Option B+ (2013–15). However, increasing ART initiation coverage does not always translate to programme effectiveness: for example, in a surveillance exercise conducted in Lusaka in 2003, 32% of HIV-infected women reported that they did not actually ingest the NVP tablet given to them in ANC (25). In addition, the risk of being lost to follow-up was higher in 'B+ pregnant' women than in women on ART for their own health in Mozambique (26). In Malawi, where Option B+ was first piloted, default and incomplete adherence were more common with Option B+ than with Option A (27). Hence more efforts must be directed to postnatal programs that ensure retention in care so that women who are initiated on ART do not disengage.

The SmartCare database offers real-time data which can enable Zambian health policy makers to act on urgent PMTCT interventions and improve health care quality and outcomes of mothers and their infants. However, current data are not available for analyses, as there are delays in uploading data to the central database, cleaning and verifying data, and making the data safe for extraction; this analysis was based on data that were more than 4 years old. SmartCare is a facility-based approach which is unable to account for individuals who do not access ANC services; hence it is possible that we overestimated PMTCT effectiveness (4). The ANC SmartCare database sample was likely to be biased toward the generally better outcomes of those who receive ANC services. However, with over 90% of pregnant women attending ANC services at least once during their pregnancy (24), the results could be a good indicator of program performance.

These results come from more than 800 health facilities from all provinces in Zambia, and are hence representative of the population of pregnant women in Zambia, although some provinces, such as Lusaka, may be under-represented due to data quality. The level of missing data on HIV test results indicates that these data must be viewed with caution, and it prevents us from drawing meaningful conclusions for the later years (2014–15), where almost 99% of HIV test results are missing. This was despite the efforts of PEPFAR and the Ministry of Community Development, Mother and Child Health to strengthen the Health Management Information Systems (HMIS) and linkages with the national electronic health record system (28). The main efforts were supposed to be directed toward supporting partners to utilize the capability of SmartCare to electronically populate the HMIS. In contrast, our study found that the SmartCare database has not been mined and data quality has been deteriorating due to the lack of utilization of the data and its findings. We are conducting qualitative research to investigate the problems with using the SmartCare system and how to address them.

Our study shows that there are major problems in the completeness of both the collection and the reporting of data that track PMTCT service delivery. The data quality challenges were similar to those in other studies from the region using routinely collected data (5, 15, 29, 30). Despite tremendous progress and many country-driven successes achieved during the Global Plan, operational challenges in data use, monitoring, and evaluation for PMTCT persist. Collecting longitudinal data on mother–baby pairs throughout the PMTCT cascade is challenging but necessary to optimize maternal and infant outcomes. However, in the Global Plan priority countries (which include Zambia), health records are not properly completed, hence the need to scale up electronic data systems (31) such as SmartCare.

## LIMITATIONS

SmartCare data collection is implemented in parallel with the mainline Ministry of Health HMIS, which also collects HIV test results. This has caused high levels of missing HIV test results in SmartCare, as clinicians prefer to enter the information in the HMIS forms rather than the SmartCare forms, which are longer (5–6 pages per interaction). It also means that the data take time to be processed, and it may take 2 years or more before they are available for analysis.

Due to data security and confidentiality restrictions imposed by the data custodians, we were not able to get the exact dates of birth and the national identity numbers for individuals in the SmartCare database. As a result we could not match records of the same individuals in cases where they were registered twice through a change of name or facility. A commonly used surrogate marker for programme effectiveness is programme coverage, i.e., the proportion of HIV-infected/exposed mother/infant pairs in a population that receive a PMTCT intervention (4). In our study the infant-mother pairs could not be linked, as infants are registered as separate individuals with unique numbers.

The decrease in the number of records from over 30,000 in 2010 to 10,000 in 2015 is likely to have introduced bias and hence affects the external validity of our study. The findings could have been triangulated against HMIS data, as this could provide an opportunity to identify omissions and errors in the dataset.

The data were mainly collected for administrative purposes without research intentions; for example, breastfeeding is part of the PMTCT cascade but the database contains no infant feeding information.

## CONCLUSION

The implementation of the WHO post-2010 PMTCT guidelines has resulted in an increase in the proportion of HIV-infected pregnant women attending ANC who are already on ART. The SmartCare database could enable Zambian health policy makers to act on urgent PMTCT interventions and improve health care quality and outcomes of mothers and their infants. However, there is a need first to improve procedures for data collection and entry. The missing data observations indicated the need for further qualitative research to determine why it was such a problem.

## AUTHOR CONTRIBUTIONS

SG-M analyzed the data and wrote the initial draft of the manuscript with guidance from SF and JT. AS, PM, and JT provided advice on cohort datasets and statistical analyses. All authors contributed to subsequent drafts of the manuscript and approved the final version.

## FUNDING

The study was supported by the SEARCH (Sustainable Evaluation through Analysis of Routinely Collected HIV data) Project funded by the Bill & Melinda Gates Foundation grant number OPP1084472.

## ACKNOWLEDGMENTS

We also thank the Ministry of Health, Department of Policy and Planning for granting us permission to access the SmartCare database.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Gumede-Moyo, Todd, Schaap, Mee and Filteau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Analysis of Hierarchical Routine Data With Covariate Missingness: Effects of Audit & Feedback on Clinicians' Prescribed Pediatric Pneumonia Care in Kenyan Hospitals

Susan Gachau<sup>1,2</sup>\*, Nelson Owuor<sup>2</sup>, Edmund Njeru Njagi<sup>3</sup>, Philip Ayieko<sup>4</sup> and Mike English<sup>1,5</sup>

*<sup>1</sup> Health Services Unit, Kenya Medical Research Institute-Wellcome Trust Research Programme, Nairobi, Kenya, <sup>2</sup> School of Mathematics, University of Nairobi, Nairobi, Kenya, <sup>3</sup> Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom, <sup>4</sup> Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom, <sup>5</sup> Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom*

### Edited by:

*Jim Todd, London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom*

### Reviewed by:

*Charles Opondo, University of Oxford, United Kingdom Andrew Max Abaasa, Medical Research Council, Uganda*

> \*Correspondence: *Susan Gachau sgachau@kemri-wellcome.org*

### Specialty section:

*This article was submitted to Digital Health, a section of the journal Frontiers in Public Health*

Received: *02 February 2019* Accepted: *02 July 2019* Published: *16 July 2019*

### Citation:

*Gachau S, Owuor N, Njagi EN, Ayieko P and English M (2019) Analysis of Hierarchical Routine Data With Covariate Missingness: Effects of Audit & Feedback on Clinicians' Prescribed Pediatric Pneumonia Care in Kenyan Hospitals. Front. Public Health 7:198. doi: 10.3389/fpubh.2019.00198*

Background: Routine clinical data are widely used in many countries to monitor quality of care. A limitation of routine data is missing information which occurs due to lack of documentation of care processes by health care providers, poor record keeping, or limited health care technology at facility level. Our objective was to address missing covariates while properly accounting for hierarchical structure in routine pediatric pneumonia care.

Methods: We analyzed routine data collected during a cluster randomized trial investigating the effect of audit and feedback (A&F) over time on inpatient pneumonia care among children admitted in 12 Kenyan hospitals between March and November 2016. Six hospitals in the intervention arm received enhanced A&F on classification and treatment of pneumonia cases in addition to a standard A&F report on general inpatient pediatric care. The remaining six in the control arm received standard A&F alone. We derived and analyzed a composite outcome known as the Pediatric Admission Quality of Care (PAQC) score. In our analysis, we adjusted for patient, clinician, and hospital level factors. Missing data occurred in patient and clinician level variables. We did multiple imputation of missing covariates within the joint model imputation framework. We fitted proportional odds random effects models and generalized estimating equation (GEE) models to the data before and after multilevel multiple imputation.

Results: Overall, 2,299 children aged 2 to 59 months were admitted with childhood pneumonia in 12 hospitals during the trial period. A total of 2,127 (92%) of the children (level 1) were admitted by 378 clinicians across the 12 hospitals. Enhanced A&F led to improved inpatient pediatric pneumonia care over time compared to standard A&F. Female clinicians and hospitals with low admission workload were associated with higher uptake of the new pneumonia guidelines during the trial period. In both random effects and marginal models, parameter estimates were biased and inefficient under complete case analysis.


Conclusions: Enhanced A&F improved the uptake of WHO recommended pediatric pneumonia guidelines over time compared to standard audit and feedback. When imputing missing data, it is important to account for the hierarchical structure to ensure compatibility with analysis models of interest to alleviate bias.

Keywords: missing data, multiple imputation, PAQC score, routine data, audit and feedback, pediatrics

## INTRODUCTION

Routine data are widely used in many countries to monitor quality of care and to inform intervention programmes for better patients' health outcomes (1).

Routine data can also be used to highlight areas of concern in clinical performance, thus prompting actions and strategies to improve practice at individual or institutional levels (2). Prior studies show that quality of care varies across place and time in spite of standard clinical guidelines (3). These variations can be attributed to multiple factors including changes in clinical guidelines, degree of task complexity, patient characteristics, and clinician characteristics, in addition to organizational and contextual factors at hospital level (3–5). Between 2013 and 2014, the Kenya Medical Research Institute-Wellcome Trust Research Programme, in collaboration with the Ministry of Health, the Kenya Pediatric Association and 14 county-level hospitals, initiated a partnership known as the Clinical Information Network (CIN). The main aim of CIN is to collect and use routine pediatric data to promote adoption of and adherence to recommended clinical practices through audit and feedback (A&F) cycles (3, 5–7). While such data from multiple sites enhance generalization of results to a wider population, they lead to complex hierarchical data structures, for instance, patients clustered within clinicians, who are then clustered within hospitals.

Besides complex structures, routine data are subject to missing information at any level of hierarchy. Missing information may occur due to lack of documentation of care processes by health care providers, poor record keeping, or limited health care technology at facility level (1, 8, 9). When data are missing, appropriate missing data methods are recommended at the analysis stage to avoid biased results (10), which could misinform clinical policies and ultimately lead to poor patient care and outcomes (11).

In the recent past, there has been an increase in literature on quality of care among children admitted with common childhood illnesses in low and middle income countries (3, 12–15). However, the majority of the studies account for variation at patient and hospital levels, ignoring variation due to clinicians' characteristics in spite of their critical role in delivery of routine care (16). Besides, missing data are a common problem across these studies. The majority of the studies report using complete case analysis (13, 17, 18) or multiple imputation (15, 19, 20). A major limitation of complete case analysis is biased and inefficient parameter estimates due to information loss. In studies where multiple imputation is used to handle missing data, the nature and details of the imputation model are rarely reported, creating uncertainty about conclusions and barriers to replicate analyses. Furthermore, when missing data occur in a multilevel data context, incompatibility between the imputation model and the analysis models potentially leads to biased estimates, underestimated cluster level variances, and overestimated individual level variances (10, 21–23). For example, incompatibilities occur when the imputation model assumes data are single level (i.e., ignoring multilevel structure) while the analysis model of interest is multilevel.

In this study, we aim to address missing covariates while properly accounting for hierarchical structure in an inpatient routine data set, that is, patients nested within clinicians who are then nested within hospitals. Specifically, we analyze data from a cluster randomized trial investigating the effect of enhanced audit and feedback on clinicians' prescribed pediatric pneumonia care in Kenyan hospitals. To achieve this objective, we construct and analyze the pneumonia Pediatric Admission Quality of Care (PAQC) score adapted to new WHO recommendations on assessment and treatment of inpatient pediatric pneumonia cases. The PAQC score is a newly developed ordered composite measure used to benchmark quality of care among children admitted with common childhood illnesses in low and middle income settings.

The remainder of this paper is structured as follows: In the Methods section we present a description of pneumonia trial data followed by statistical analysis methods for cluster correlated and missing data methods, respectively. Thereafter, we present results before and after multiple imputation and conclude with a discussion.

## METHODS

## Study Design

In this study we analyzed data from a cluster randomized trial conducted by the KEMRI-Wellcome Trust Research Programme between March 2016 and November 2016. Details of the trial and the study population are described in full elsewhere (5, 24). In summary, the trial was embedded within the larger CIN study (ongoing) (6, 7, 25). The primary goal of the trial was to investigate whether enhanced audit and feedback improved quality of inpatient pediatric pneumonia care (i.e., assessment, diagnosis, and treatment of childhood pneumonia) in Kenyan hospitals following new pneumonia guidelines recommended by the World Health Organization (WHO) in 2013 (26). Six hospitals were randomized to receive a standard audit and feedback report on general inpatient pediatric care (control arm). The remaining six hospitals received a standard audit and feedback report in addition to an enhanced audit and feedback report targeting assessment, classification and treatment of pneumonia cases (intervention arm) (5, 24). Trained data clerks abstracted routine data from the medical records into the Research Electronic Data Capture (REDCap) tool after patients' discharge from general pediatric wards. The data abstraction process was guided by a standard operating procedure manual (5). Patients' data spanned history of illness, physical examination, diagnosis, laboratory investigations, treatments, and discharge plans (5, 24). Details of the admitting clinician, including sex and professional qualification, were also recorded in a separate database linked to the patients' database by a unique clinician code.

Data quality assurance (DQA) exercises were conducted by CIN research assistants in each hospital every 3 months to check consistencies with data clerk's entries. The Kenya Ministry of Health and Kenya Medical Research Institute's Scientific and Ethical Review Unit approved data collection without individual patient's consent (5).

## Outcome: Pneumonia Pediatric Admission Quality of Care Score

Our outcome of interest was the pneumonia PAQC score adapted to the 2013 WHO pediatric pneumonia treatment guidelines. As mentioned earlier, the PAQC score is a summary measure spanning three quality of care domains, namely assessment, clinical diagnosis, and treatment of common childhood illnesses including pneumonia, malaria, diarrhea, and dehydration. Details on PAQC score construction and validation are described in full elsewhere (12, 27). With regard to pneumonia PAQC, there are three binary subcomponents in the assessment domain. The first subcomponent represents assessment and documentation of the two primary signs and symptoms required for pneumonia identification (i.e., presence of cough or difficulty in breathing). The value 1 in the binary indicator denotes documentation of both cough and difficulty in breathing as either present or absent, while 0 denotes lack of documentation of at least one primary sign or symptom in a patient's medical record.

The second binary indicator represents assessment and documentation of secondary signs and symptoms required for pneumonia severity classification (i.e., chest indrawing, respiratory rate, grunting, central cyanosis, oxygen saturation, ability to drink, or altered level of alertness). The value 1 in the binary indicator denotes documentation of all secondary signs and symptoms, while 0 denotes lack of documentation of at least one secondary sign or symptom. The third binary indicator of the assessment domain corresponds to 1 when primary and secondary pneumonia signs and symptoms (all primary and secondary signs and symptoms combined) are documented and 0 otherwise (26).

The second PAQC score domain entails integration of information on presenting signs and symptoms by the admitting clinician to correctly diagnose and classify pneumonia severity (i.e., severe pneumonia or pneumonia). For example, pneumonia was the correct diagnosis for a child who, in addition to cough and/or difficult breathing (primary signs), presented with lower chest indrawing or respiratory rate >50 for patients aged 2–11 months (or respiratory rate >40 for patients aged 12–59 months) in the absence of all other secondary signs and symptoms. In this study, a binary indicator was created with value 1 representing correct pneumonia severity classification (i.e., pneumonia severity documented in the medical record by the admitting clinician was in line with the severity implied by presenting signs and symptoms) and 0 representing misclassified pneumonia severity.

The third PAQC score domain consists of two binary indicators. The first binary variable indicates whether oral amoxicillin was prescribed for pneumonia cases (denoted by 1) or not (denoted by 0). The second binary variable indicates whether oral amoxicillin was prescribed according to guideline recommended doses (26). In order to determine correctness of the dose, we created a new variable "dose per kilo body weight" using actual dose given at point of care, patient's weight, and frequency of administration. Among pediatric pneumonia cases, the recommended oral amoxicillin dose should range between 32 and 48 international units per kilogram (IU/Kg) every 12 h. The new variable was then transformed into a binary variable with 1 representing correct dose (that is, dose per kilo body weight and frequencies of administration are in line with guidelines recommendations) and 0 representing incorrect dose (incorrect in either dose per kilo body weight or frequency of administration) or missing dose. Subsequently, we summed all the six binary components across domains to obtain PAQC score; an ordinal outcome on a 7-point scale. We constructed pneumonia PAQC score at patient level. A minimum score of zero corresponded to inappropriate pneumonia care and maximum score of six represented complete adherence to new pneumonia guidelines across domains of care. To assess performance in terms of adherence to pediatric pneumonia guidelines during the trial period, we calculated and plotted the LOESS smoothing curves and the corresponding 95% confidence bands for the mean monthly PAQC score for each intervention arm.
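The scoring rule described above can be illustrated with a minimal sketch. It assumes that the five underlying indicators are already coded as 0/1 and derives the third assessment indicator from the first two; the function and argument names are hypothetical and are not taken from the trial's analysis code. The dose bounds are passed as parameters in the units used in the text.

```python
def dose_is_correct(dose, weight_kg, interval_hours, low=32.0, high=48.0):
    """Return 1 when dose per kg lies in [low, high] and is given every 12 h.

    A missing dose, weight, or interval is scored 0, mirroring the rule in
    the text that a missing dose counts as incorrect.
    """
    if dose is None or weight_kg is None or interval_hours is None:
        return 0
    if weight_kg <= 0:
        return 0
    dose_per_kg = dose / weight_kg
    return int(low <= dose_per_kg <= high and interval_hours == 12)


def paqc_score(primary_documented, secondary_documented,
               correct_classification, amoxicillin_prescribed, correct_dose):
    """Sum six binary indicators into a pneumonia PAQC score between 0 and 6."""
    inputs = [primary_documented, secondary_documented,
              correct_classification, amoxicillin_prescribed, correct_dose]
    if any(value not in (0, 1) for value in inputs):
        raise ValueError("all indicators must be coded 0 or 1")
    # Third assessment indicator: all primary AND secondary signs documented.
    all_signs_documented = int(primary_documented and secondary_documented)
    return sum(inputs) + all_signs_documented


# Hypothetical child weighing 10 kg given 400 units every 12 hours.
dose_flag = dose_is_correct(dose=400, weight_kg=10, interval_hours=12)
print(paqc_score(1, 1, 1, 1, dose_flag))
```

Because the third assessment indicator is derived from the first two, a record with incomplete secondary signs loses two points rather than one, which follows from the definition of the assessment domain above.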

## Covariates

The covariates of interest were intervention arm, follow-up time in months with their interaction, hospital malaria prevalence status, and hospital admission workload. At clinician level, gender and cadre were considered (here cadre refers to the clinician's level of training, that is, clinical officers with diploma-level training and medical officers with bachelor's degree level training). At patient level, we considered sex, number of comorbid illnesses, and age at admission. Prior to analysis, we converted age for all the patients into months before categorizing them into two age groups, that is, patients aged 2–11 months and patients aged 12–59 months. With regard to comorbidities, we determined the total number of clinical diagnoses documented in each patient's medical records. The diagnoses of interest included malaria, malnutrition, HIV, asthma, tuberculosis (TB), rickets, anemia, diarrhea, and dehydration. For each patient, we created separate binary variables for the diagnoses above, with value 1 denoting the presence of a disease and 0 denoting its absence. We then summed the binary indicators and categorized patients into four groups, that is, those with 0, 1, 2, or 3 or more comorbidities.
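As an illustration of the patient-level derivations just described, the following sketch groups age in months and counts comorbid diagnoses. The diagnosis labels, category labels, and function names are assumptions made for the example; diagnoses outside the listed set are ignored.

```python
COMORBIDITIES = (
    "malaria", "malnutrition", "hiv", "asthma", "tuberculosis",
    "rickets", "anemia", "diarrhea", "dehydration",
)


def age_group(age_months):
    """Assign a patient to the 2-11 or 12-59 month group."""
    if 2 <= age_months < 12:
        return "2-11 months"
    if 12 <= age_months < 60:
        return "12-59 months"
    raise ValueError("age outside the 2-59 month inclusion range")


def comorbidity_category(diagnoses):
    """Count documented diagnoses of interest and collapse to 0, 1, 2, or 3+."""
    documented = {d.strip().lower() for d in diagnoses}
    # One binary indicator per diagnosis of interest, then summed.
    indicators = [int(name in documented) for name in COMORBIDITIES]
    count = sum(indicators)
    return "3+" if count >= 3 else str(count)


print(age_group(7), comorbidity_category(["Malaria", "Anemia"]))
```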

## Missing Data Concepts

In the analysis of partially observed data, assumptions were made about the missingness mechanism generating the data (10). Suppose Y (representing both response and independent variables) is an N × p matrix denoting a hypothetical data set containing p variables (j = 1, ..., p) for the ith study subject (i = 1, 2, 3, ..., N). For each study subject, Y<sub>i</sub> can be partitioned into observed and missing components denoted by Y<sub>i</sub><sup>obs</sup> and Y<sub>i</sub><sup>miss</sup>, respectively. Further, let a missingness indicator R<sub>i</sub> take the value 1 if Y<sub>i</sub> is observed and 0 if Y<sub>i</sub> is missing. Then, according to Rubin (28), data are said to be missing completely at random (MCAR) when the probability of missing values in a variable is independent of the variable itself and of any other observed variable in the data set, that is, P(R<sub>i</sub> | Y<sub>i</sub><sup>miss</sup>, Y<sub>i</sub><sup>obs</sup>) = P(R<sub>i</sub>). When the probability of missing values in a variable does not depend on the variable of interest but is conditionally dependent on other observed variables in the data set, then data are said to be missing at random (MAR), denoted by P(R<sub>i</sub> | Y<sub>i</sub><sup>miss</sup>, Y<sub>i</sub><sup>obs</sup>) = P(R<sub>i</sub> | Y<sub>i</sub><sup>obs</sup>). When the MAR assumption does not hold, data are said to be missing not at random (MNAR). The MNAR mechanism occurs when the missingness depends on the actual value of the missing observation (10).

## Investigating the Missing Data Mechanism

Before analyzing partially observed data, it was important to investigate plausible missing data mechanisms (10, 29). In this study we generated binary missingness indicators (R<sub>i</sub>) for partially observed variables in the pneumonia trial data set. The binary missingness indicators were analyzed separately using the logistic regression model below

$$\text{logit}[P(R_i)] = X_i \beta \tag{1}$$

where X<sub>i</sub> is a vector of fully observed variables for the ith subject. The vector β denotes fixed regression parameters to be estimated. When the probability of missingness is independent of fully observed variables (P-values for the regression coefficients > 0.05), a variable is said to be MCAR. On the other hand, when the probability of missingness is dependent on fully observed variables (P-values for the regression coefficients < 0.05), then the MAR assumption holds and restricting analysis to complete observations yields biased and inefficient estimates (10, 29, 30). Similarly, when the probability of missingness is dependent on fully observed covariates but independent of the response variable, then the covariate dependent MAR assumption holds and restricting analysis to complete observations yields unbiased but inefficient estimates due to information loss (10, 29, 30). We also used graphical methods to investigate missing data patterns underlying the pneumonia trial data (Figure A1 in **Supplementary Material**).
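For a single binary, fully observed covariate, the logistic regression of a missingness indicator reduces to a 2 × 2 table, and the slope estimate equals the log odds ratio. The sketch below uses that special case to show the kind of check described above. It is a simplified illustration under stated assumptions, not the analysis performed in the study: the indicator is coded 1 when the value is missing (the reverse of R<sub>i</sub> in the text, which only flips the sign of the slope), missing values are stored as `None`, and the field names are hypothetical.

```python
import math


def missingness_association(records, covariate, target):
    """Wald test for whether missingness in `target` depends on a 0/1 covariate.

    Returns the log odds ratio (the logistic regression slope in this
    special case), its standard error, and a two-sided p-value.
    """
    counts = {(x, m): 0 for x in (0, 1) for m in (0, 1)}
    for record in records:
        is_missing = int(record[target] is None)
        counts[(record[covariate], is_missing)] += 1
    a, b = counts[(1, 1)], counts[(1, 0)]  # covariate = 1: missing, observed
    c, d = counts[(0, 1)], counts[(0, 0)]  # covariate = 0: missing, observed
    if min(a, b, c, d) == 0:
        raise ValueError("empty cell: the log odds ratio is not defined")
    log_odds_ratio = math.log((a * d) / (b * c))
    std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_odds_ratio / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return log_odds_ratio, std_error, p_value


# Hypothetical data: sex is missing more often in one arm than the other.
records = (
    [{"arm": 1, "sex": None}] * 30 + [{"arm": 1, "sex": "F"}] * 70
    + [{"arm": 0, "sex": None}] * 10 + [{"arm": 0, "sex": "M"}] * 90
)
print(missingness_association(records, "arm", "sex"))
```

With several covariates, or continuous ones, a fitted logistic regression would be needed instead of this closed form.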

## Multiple Imputation

Multiple imputation (MI) involves substituting each missing value with a set of plausible values given the observed data and an imputation model (10, 31). MI is commonly used assuming a MAR mechanism but can also be used when data are MNAR. Multiply imputed data sets are then analyzed using standard methods and the results pooled into a single inference using Rubin's rules (32). Multiple imputation is preferred over other missing data methods such as listwise or pairwise deletion because uncertainty about the missing values is taken into account (10, 23, 30, 31, 33). Additionally, MI separates the imputation phase from the analysis phase, therefore allowing inclusion of auxiliary variables in the imputation model that are predictive of missing variables and the missingness mechanism (10, 23, 27, 33–35).
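The pooling step can be sketched for a single scalar parameter. Given one estimate and one squared standard error per imputed data set, Rubin's rules average the estimates and combine the within-imputation and between-imputation variances. The function name and the example numbers are illustrative only.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed data sets."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average sampling variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between  # total variance of the pooled estimate
    return {
        "estimate": pooled,
        "within": within,
        "between": between,
        "total": total,
        "std_error": math.sqrt(total),
    }


# Hypothetical log odds ratios and squared standard errors from 3 imputations.
print(pool_rubin([0.42, 0.47, 0.39], [0.010, 0.012, 0.011]))
```

The between-imputation term is what carries the uncertainty about the missing values into the pooled standard error; a single imputation would omit it.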

In this study, we imputed missing level 1 and level 2 variables within the joint modeling (JM) framework, where replacement values are drawn from a multivariate normal distribution in a single step. Multilevel MI was implemented in the newly developed jomo and mitml packages in R (version 3.4.3), which allow imputation of categorical variables with more than two levels in the second and higher levels of the multilevel structure (36). For the ith patient nested within the jth clinician in hospital l, we defined a two-level JM imputation model corresponding to

$$Y_{i,j,l}^{(1)} = X_{i,j,l}^{(1)} \beta^{(1)} + b_{j,l}^{(1)} + e_{i,j,l}^{(1)} \tag{2}$$

$$Y_{j,l}^{(2)} = X_{j,l}^{(2)} \beta^{(2)} + b_{j,l}^{(2)}$$

$$e_{i,j,l}^{(1)} \sim N(0, \sigma_e^2), \text{ and } (b_{j,l}^{(1)}, b_{j,l}^{(2)}) \sim N(0, \Sigma_b)$$

where Y<sub>i,j,l</sub><sup>(1)</sup> and Y<sub>j,l</sub><sup>(2)</sup> are vectors of partially observed level 1 variables (patient's sex) and level 2 variables (clinician's sex and cadre), respectively. Predictor variables (X<sub>i,j,l</sub><sup>(1)</sup>) of missing patient's sex included fully observed follow-up time interacted with feedback arm, hospital admission workload and hospital malaria prevalence status, patient's PAQC score, patient's age and number of comorbid illnesses. Level 2 predictors (X<sub>j,l</sub><sup>(2)</sup>) for missing clinician's sex and cadre included follow-up time interacted with feedback arm, hospital admission workload, and hospital malaria prevalence status. Column vectors β<sup>(1)</sup> and β<sup>(2)</sup> denote level 1 and level 2 fixed effects, respectively. A clinician random intercept (b<sub>j,l</sub>) was included to account for clustering at clinician level and to ensure compatibility with the substantive models of interest. A burn-in of 1,000 updates and 1,000 iterations between each of the 30 imputations were considered. We used trace plots to assess convergence (37). Final estimates were pooled according to Rubin's rules.

## Statistical Analysis

We considered two model families to analyze pneumonia trial data, that is, generalized estimating equations (GEE) and random effects models. The random effects and GEE models differ in terms of estimation and interpretation of parameter estimates (30). We considered both models in order to assess the stability of inferences and conclusions within and across the two methods before and after multiple imputation.

### Generalized Estimating Equations (GEE) Model

Generalized estimating equations (GEE), proposed by Liang and Zeger (38), are a quasi-likelihood method for modeling correlated responses within the marginal (population averaged) family of models (29, 30). In a GEE model a working correlation structure is adopted. However, the parameter estimates in a GEE model are consistent even when the association structure is misspecified (29, 39). A GEE model is given by

$$h^{-1}\left\{E(Y_i \mid X_i)\right\} = X_i\beta \tag{3}$$

where the link function h<sup>−1</sup>(·) is a known function, X<sub>i</sub> is a design matrix for the fixed effects and β is the vector of unknown regression parameters. The vector of regression parameters is interpreted in terms of the average response over the population rather than prediction of the effect of changing covariates on a given study subject (29).

When the responses are ordered and the proportional odds assumptions of parallel logits hold, the cumulative logits (proportional odds) model is considered (40). For instance, considering ordered pneumonia PAQC score (outcome) for the ith patient nested within jth clinician in hospital l, the proportional odds GEE model of interest corresponds to

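$$\begin{aligned} \text{logit}[P(Y_{\text{PAQC score}:\,i,j,l} \le k)] &= \alpha_k + \beta_1 X_{\text{age group}:\,i,j,l} + \beta_2 X_{\text{patient's sex}:\,i,j,l} \\ &\quad + \beta_3 X_{\text{comorbidity}:\,i,j,l} + \beta_4 X_{\text{clinician's cadre}:\,j,l} + \beta_5 X_{\text{clinician's sex}:\,j,l} \\ &\quad + \beta_6 X_{\text{admission workload}:\,l} + \beta_7 X_{\text{malaria prevalence}:\,l} \\ &\quad + \beta_8 X_{\text{time in months}:\,l} \ast X_{\text{trial arm}:\,l} \end{aligned} \tag{4}$$

where α<sub>k</sub>, k = 1, 2, 3, 4, 5, 6, are PAQC score intercepts and the β's are regression coefficients common across all k−1 cumulative logits.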
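To make the cumulative logit formulation concrete, the sketch below converts a set of intercepts and a linear predictor into probabilities for each ordered score category. It follows the sign convention written in the model above, logit P(Y ≤ k) = α<sub>k</sub> + Xβ; some software subtracts the linear predictor instead. The intercept values are invented for illustration.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(intercepts, linear_predictor):
    """Probabilities of each ordered category under a proportional odds model.

    `intercepts` are the increasing cut-point intercepts alpha_k; with six
    intercepts the result has seven entries, one per score on a 7-point scale.
    """
    if any(nxt <= cur for cur, nxt in zip(intercepts, intercepts[1:])):
        raise ValueError("intercepts must be strictly increasing")
    cumulative = [expit(a + linear_predictor) for a in intercepts] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)  # P(Y = k) by differencing
        previous = value
    return probabilities


alphas = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # hypothetical intercepts
print(category_probabilities(alphas, linear_predictor=0.5))
```

The same coefficient shifts every cumulative logit by the same amount, which is what the proportional odds assumption of parallel logits means in practice.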

### Random Effects Model

In contrast to population-averaged models, random effects models are useful when drawing inferences with respect to the subject-specific parameters. Given the covariates and random effects, the responses are assumed to be conditionally independent in this model (29, 30). A random effects model is denoted by

$$h^{-1}\left\{E(Y_i \mid X_i)\right\} = X_i\beta + Z_i b_i, \qquad b_i \sim N(\mathbf{0}, \Sigma) \tag{5}$$

where h<sup>−1</sup>(·) is a known link function, X<sub>i</sub> and Z<sub>i</sub> are design matrices for the fixed effects and random effects, while β and b<sub>i</sub> are vectors of fixed and random parameters, respectively. The vector b<sub>i</sub> is assumed to be sampled from a multivariate normal distribution with mean vector **0** and covariance matrix Σ. The vector of regression parameters (β) has a subject specific interpretation in terms of the transformed mean response for an individual. Considering pneumonia trial data with ordinal PAQC score as above, the proportional odds random intercepts model of interest corresponds to

$$\begin{aligned} &\text{logit}\left[P\left(Y_{\text{PAQC score}:i,j,l} \le k\right)\right] \\ &= \alpha_k + \beta_1 X_{\text{age group}:i,j,l} + \beta_2 X_{\text{patient's sex}:i,j,l} \\ &+ \beta_3 X_{\text{comorbidity}:i,j,l} + \beta_4 X_{\text{clinician's cadre}:j,l} + \beta_5 X_{\text{clinician's sex}:j,l} \\ &+ \beta_6 X_{\text{admission workload}:l} + \beta_7 X_{\text{malaria prevalence}:l} \\ &+ \beta_8 X_{\text{time in months}:l} \ast X_{\text{trial arm}:l} + b_{j,l} \end{aligned}$$

where α<sub>k</sub>, k = 1, 2, ..., 6, are PAQC score specific intercepts, the β's are estimated regression coefficients (common across all six cumulative logits) and b<sub>j,l</sub> are the clinicians' random intercepts. Hospital level random effects were not considered in these analyses owing to the small number of clusters.

### Statistical Tests for Multiple Parameters

We used Wald tests and likelihood-ratio tests to determine covariates with a statistically significant effect on pneumonia PAQC score. Likelihood-ratio tests were used to test for statistical significance of covariates in the random effects models (10, 41, 42). On the other hand, Wald tests suggested by Rubin (10, 41) were used for the GEE model. The full (saturated) models contained all the covariates while the reduced (null) models dropped one covariate at a time. The tests were conducted on complete case records and after multiple imputation. Details on multi-parameter hypothesis tests after MI using Wald tests and likelihood-ratio tests are available in Carpenter and Kenward (10, p. 53–54) and Van Buuren (42, p. 157–158). All analyses were conducted in R version 3.4.3. A 5% level of significance was considered under complete case analysis and after MI of missing covariates.
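For intuition about testing after MI, the scalar version of Rubin's rules can be sketched as follows. This is a simplified, single-parameter illustration in Python rather than the multi-parameter procedures cited above or the R code used in the study, and the estimates and variances are hypothetical.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)      # pooled point estimate
    w_bar = statistics.mean(variances)      # within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = w_bar + (1.0 + 1.0 / m) * b     # total variance of the pooled estimate
    return q_bar, total

# Hypothetical log odds ratios and squared SEs from m = 5 imputed data sets.
est = [0.10, 0.12, 0.14, 0.16, 0.18]
var = [0.0025] * 5
q_bar, total_var = pool_rubin(est, var)
wald_z = q_bar / math.sqrt(total_var)       # Wald statistic for H0: parameter = 0
```

The total variance exceeds the average within-imputation variance by a term that grows with disagreement between imputed data sets, which is how uncertainty due to missingness enters the test.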

## RESULTS

## Descriptive Summaries

In total, 2,299 children aged 2–59 months were admitted to general pediatric wards with childhood pneumonia in 12 CIN hospitals during the trial period. We linked the patient and clinician databases using a unique clinician code present in both databases, with a success rate of 92.5% (2,127/2,299) after exclusion of 172/2,299 case records lacking the admitting clinician's information. This resulted in three levels of clustering, i.e., 2,127 patients admitted by 378 clinicians in 12 hospitals. Of the 2,127 pneumonia cases, 953 (44.8%) were admitted in the six hospitals assigned to the enhanced A&F (intervention) arm. The number of pneumonia cases varied across hospitals with a range of 42–356 patients (**Table 1**).

Five out of 12 hospitals were drawn from high malaria endemic regions (three control and two intervention hospitals) while the remaining seven hospitals (four control and three intervention hospitals) were drawn from low malaria regions in Kenya (25). Furthermore, four of the 12 hospitals were high admission workload hospitals, that is, more than 1,000 pediatric admissions per annum (three control and one intervention hospital), while eight were low admission workload hospitals, i.e., <1,000 pediatric admissions per annum (three control and five intervention hospitals), irrespective of admission diagnosis. On average, there were 32 clinicians per hospital with a standard deviation of nine clinicians. The number of patients per clinician ranged between 3 and 46. The majority of the admitting clinicians were clinical officer interns at 48.7% (185/378), followed by medical officer interns at 26.2% (99/378). Clinical officers and medical officers accounted for 1.6% (6/378) each. Approximately 21.9% (83/378) and 21.7% (82/378) of clinicians had missing gender and cadre, respectively (**Table 1**). In subsequent analyses we grouped clinicians into two cadres from the initial four: clinical officers (CO), combining clinical officers and clinical officer interns, and medical officers (MO), combining medical officers and medical officer interns. Approximately 42% (903/2,127) of patients were aged between 2 and 11 months and 45% (950/2,127) were female. Patient's sex was missing in 0.7% (17/2,127) of case records (**Table 1**).

TABLE 1 | Descriptive characteristics of hospitals, clinicians and patients in pneumonia trial data.

‡*CO-Clinical Officer, MO-Medical Officer, H1–H12 denote hospitals participating in the trial.*

Examining pneumonia PAQC score over time graphically, hospitals in the standard A&F arm (red curve) exhibited a higher mean PAQC score at baseline with no significant fluctuations over time (**Figure 1**). On the other hand, hospitals assigned to the enhanced A&F arm (blue curve) had a lower mean PAQC score at baseline, which rapidly improved toward higher scores in the first 6 months of follow-up. Although the enhanced A&F arm's trend line surpassed that of the standard A&F arm after 6 months of follow-up, the 95% confidence bands of the two intervention arms overlapped substantially (**Figure 1**).

An assessment of missing data patterns suggested a multivariate missing data pattern (Figure A1 in **Supplementary Material**). The missing data pattern further revealed similarities between missing clinician's cadre and missing clinician's sex. That is, nearly all clinicians with missing sex had missing cadre as well. Further investigations into missing data patterns showed that missing clinicians' cadre and sex only occurred in six out of 12 hospitals (**Figure 2**).

Logistic regression results on plausible mechanisms underlying the pneumonia trial data indicated that the probability of missing patient's sex was dependent on neither the outcome (PAQC score) nor the fully observed covariates (interaction between intervention arm and follow-up time in months, hospital admission workload, malaria prevalence, patient's age group, and the number of presenting comorbid illnesses). That is, the P-values were >0.05, suggesting a MCAR mechanism (Table A1 in **Supplementary Material**). On the other hand, the probabilities of missing clinician's cadre and gender were dependent on both the outcome and the fully observed covariates, suggesting evidence against MCAR (Table A1 in **Supplementary Material**). Therefore, we imputed missing data assuming a MAR mechanism. MI diagnostic tests indicated satisfactory convergence (Figure A2 in **Supplementary Material**).
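The logic of such a missingness check can be sketched for the simplest case, a logistic regression of a missingness indicator on one binary covariate, which reduces to an odds ratio from a 2 × 2 table. The Python fragment below is an illustrative simplification, not the multivariable models reported in Table A1, and the counts are hypothetical.

```python
import math

def missingness_odds_ratio(miss_a, obs_a, miss_b, obs_b):
    """Odds ratio of a value being missing in group A vs. group B, with a Wald test.

    Equivalent to a logistic regression of the missingness indicator
    on a single binary covariate (group A = 1, group B = 0).
    """
    log_or = math.log((miss_a * obs_b) / (obs_a * miss_b))
    se = math.sqrt(1 / miss_a + 1 / obs_a + 1 / miss_b + 1 / obs_b)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(log_or), z, p_value

# Hypothetical counts, not taken from the trial data.
or_, z, p = missingness_odds_ratio(miss_a=30, obs_a=70, miss_b=10, obs_b=90)
```

A small p-value here would indicate that missingness depends on the observed covariate, which is evidence against MCAR but cannot by itself distinguish MAR from MNAR.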

## Random Effects and GEE Model Results

The test of the proportional odds assumption was not statistically significant at the 5% level (P = 0.17). Therefore, we assumed parallel logits and fitted proportional odds models to complete case records and imputed datasets. In **Table 2**, we present the likelihood ratio test and Wald test results for the proportional odds random effects and GEE models, respectively. After MI of missing covariates, we observed consistent results between the random effects model and the GEE model in terms of statistical significance of covariates of interest (**Table 2**). Specifically, we found a statistically significant interaction effect between intervention arm and follow-up time. Similarly, admission workload at hospital level was significant at the 5% level. At the patient level, age and the number of comorbidities were statistically significant, while at the clinician level, sex showed a significant effect on pneumonia PAQC score (**Table 2**).

In **Table 3**, we present proportional odds ratios and the corresponding 95% confidence interval obtained after fitting the random intercepts model and GEE models before and after multilevel multiple imputation. Standard errors before and after MI are presented in Table A2 (**Supplementary Material**). For the GEE model, we reported robust (empirically corrected) standard errors which were in agreement with model based (naive) standard errors (Table A2 in **Supplementary Material**). Under complete case analysis, only 1,619/2,127 (76.1%) case records were considered.

This loss of information led to larger standard errors compared with those obtained after MI of missing covariates in both the random effects and GEE model families. Furthermore, the proportional odds ratios were consistently smaller under complete case analyses compared to those obtained after MI (**Table 3**). These results were an indication of bias and inefficiency of parameters estimated under complete case analysis. The six PAQC score intercepts presented in **Table 3** denote thresholds (cut points) differentiating adjacent levels of the response variable. For example, intercept 1 in **Table 3** denotes the odds of PAQC score = 1 vs. PAQC score ≥ 2 for a female patient aged 2–11 months with no comorbidities admitted by a male medical officer in a high workload hospital located in a high malaria prevalence region. The individual fixed effect parameters are the proportional odds ratios of individual variables on PAQC score holding all other variables in the model constant.

TABLE 2 | Likelihood ratio test and Wald test statistics for random effects model and GEE model under complete case analysis and after multilevel multiple imputation of missing covariates.

*LRT, Likelihood ratio test; A&F, Audit and feedback; MI, Multiple imputation; GEE, Generalized estimating equations.*

From the study results, enhanced audit and feedback led to improved uptake of the new pediatric pneumonia guideline over time. For instance, considering a patient admitted in an intervention hospital (enhanced audit and feedback arm), the odds of PAQC score = 1 vs. PAQC score ≥ 2 were 1.16 (95% CI: 1.02–1.308) times the odds for a patient admitted in a control hospital, for a unit increase in follow-up time and holding other variables at reference levels. Likewise, for a patient admitted in an intervention hospital, the odds of PAQC score = 1 vs. PAQC score ≥ 2 were 1.29 (95% CI: 1.17–1.482) times the odds for a patient admitted in a control hospital, for a unit increase in follow-up month (GEE model after MI). These interpretations hold for all other response (PAQC score) levels.
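Odds ratios and Wald-type confidence intervals such as those above are obtained by exponentiating a coefficient on the log-odds scale and the limits of its interval. A minimal Python sketch follows, assuming a normal approximation; the coefficient and standard error are hypothetical and are not the values underlying Table 3.

```python
import math

def odds_ratio_ci(log_or, se, z_crit=1.959964):
    """Odds ratio with a Wald-type 95% confidence interval from a log-odds coefficient."""
    lower = math.exp(log_or - z_crit * se)
    upper = math.exp(log_or + z_crit * se)
    return math.exp(log_or), lower, upper

# Hypothetical coefficient and standard error on the log-odds scale.
or_, lo, hi = odds_ratio_ci(log_or=0.15, se=0.05)
significant = not (lo <= 1.0 <= hi)  # an interval excluding 1 corresponds to P < 0.05
```

The interval is symmetric on the log scale and therefore asymmetric around the odds ratio itself.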

The study results also exhibited shifts in statistical significance before and after multiple imputation for selected variables. Specifically, adjusting for other variables, complete case analysis led to a non-significant difference between low and high admission workload hospitals on levels of PAQC score in both the random effects model and the GEE model, where the 95% confidence intervals contained the value 1. After MI, however, the odds of a higher pneumonia PAQC score in low workload hospitals were 1.12 (95% CI: 1.08–1.372) and 1.40 (95% CI: 1.103–2.063) times higher than for high workload hospitals for the random intercepts and GEE models, respectively (**Table 3**).

With regard to the random effects model, the variance component between clinicians and the corresponding standard error were inflated under complete case analysis. A possible explanation for this result is that clinicians with missing cadre and sex were discarded under complete case analysis, resulting in fewer clinicians (clusters) and hence inflated variability between clinicians. On the other hand, all clinicians were retained after MI, hence the lower variability between clinicians.

## DISCUSSION

This study sought to investigate the effect of enhanced A&F on routine pediatric pneumonia care in 12 Kenyan hospitals during a cluster randomized trial. In the analysis we adjusted for patient, clinician, and hospital level factors while accounting for covariate missingness across the three levels of hierarchy. The number of pneumonia admissions varied widely across hospitals during the trial period. The outcome of interest (pneumonia PAQC score) is a composite measure representing multiple aspects of pediatric pneumonia care on a 7-point ordinal scale. The advantage of using composite outcomes over individual performance measures is increased statistical efficiency (43–47). Although we reported and analyzed a fully observed outcome, we note that variations in pneumonia PAQC score on the 7-point ordinal scale were attributable to missing data in some of the subcomponents in addition to inappropriate pneumonia care across domains of care (12). Specifically, missing components and those corresponding to inappropriate care were scored zero. Among covariates, clinician variables exhibited the highest proportions of missingness. Approximately 22% of all admitting clinicians had missing sex and cadre (21.9 and 21.7%, respectively). These observations were consistent with previous results of a cluster randomized trial evaluating the effectiveness of a multifaceted intervention to improve admission pediatric care in eight Kenyan hospitals (10, 48). In the said study, 14 and 20% of the clinicians had missing sex and years of experience, respectively.

TABLE 3 | Odds ratios (95% confidence intervals) estimated under complete case analysis and after multilevel multiple imputation of missing covariates.

*SE, Standard Error; CI, Confidence interval; MO, Medical Officer; A&F, Audit and feedback.*

In contrast, patient level variables were fully observed except patient's sex, which had <1% missingness. The sharp contrast in missingness between clinician and patient level variables could be due to the fact that continued CIN audit and feedback reports focus on the documentation of patient level variables rather than documentation of clinicians' characteristics. Through preliminary investigations, we established that missing clinicians' characteristics occurred in six out of 12 hospitals participating in the trial. The patterns of missingness in the two clinician level variables were highly correlated. That is, clinicians who did not document their sex were also likely not to document their cadre and vice versa.

To alleviate bias and inefficiency, we used multiple imputation within the joint modeling (JM) imputation framework assuming a MAR mechanism (10, 30, 31). Although the JM imputation framework does not address the full range of complexities that are typical of multilevel data (22, 23), it was preferred for its flexibility, coupled with recent statistical software developments in handling categorical variables with more than two levels at the second and higher levels of hierarchy (36).

Consistent with our expectations, results demonstrated that multilevel imputation led to more precise parameter estimates compared to complete case analyses in both random effects and GEE models. Adjusting for patient, clinician and hospital level factors, enhanced A&F improved uptake of and adherence to recommended pediatric pneumonia guidelines over time among children aged 2–59 months admitted in six CIN hospitals during the trial period compared to standard A&F on general inpatient pediatric care. The significant difference in the uptake of the pneumonia guidelines between the intervention arms could be due to the difference in baseline performance observed in the Loess curves. That is, control hospitals exhibited high baseline performance (on average), leaving less room for improvement, whereas the low baseline performance in the enhanced A&F arm left more room for improvement over time. These results were consistent with those of the primary analysis (24).

A key difference between our study and the primary analysis is that whereas we analyzed a composite outcome spanning three quality of care domains, Ayieko et al. (24) considered the proportions of patients with correct pneumonia classification and treatment, respectively. Furthermore, our study accounted for clinicians' characteristics in addition to the patient and hospital level characteristics accounted for in the primary analysis. From the results, the quality of pneumonia care differed between male and female clinicians. It was also evident that junior clinicians (medical officer interns and clinical officer interns) were responsible for much of the care during the trial period. However, the quality of care provided did not differ between the cadres. The high number of interns is an indication that hospitals in the trial were teaching and referral hospitals.

## Strengths and Implications of the Study

In this study, we investigated plausible missing data mechanisms underlying the pneumonia trial data. Though often ignored, this step is important in assessing and understanding the implications of missingness in a given data set under analysis, that is, whether it leads to inefficient estimates only or to estimates that are both biased and inefficient. In addition to the missing data mechanism, we evaluated missing data patterns underlying the trial data set. This was useful in revealing trends and gaps in the quality of routine care. Insight into such information is useful when designing cost-effective follow-up or new intervention programmes for optimal and efficient utilization of already stretched resources (49). For instance, based on our study results, a follow-up intervention programme aimed at improving documentation and reporting of clinician characteristics should be directed to the specific hospitals with low documentation of clinician level data, while resources in hospitals with good documentation practices should be directed elsewhere.

To address missing data, we employed recent statistical software tools to impute missing variables in routine pediatric data. Our choice of imputation tools and method took into consideration the hierarchical structure of the data and the types of variables in the data set. This ensured compatibility between the imputation and analysis models of interest, thus minimizing bias in parameter estimates (10, 23). Further, our choice of proportional odds models to analyze the ordinal outcome was ascertained through a formal test, further enhancing the validity of our study results. In instances when the proportional odds assumption is violated, a multinomial logistic regression model is recommended (40). In contrast to previous studies reporting quality of inpatient pediatric routine care in CIN hospitals (3, 13, 15), our study accounted for clinicians, who are essential for the delivery of health interventions (16). Ignoring variation at clinician level may lead to biased estimates and to overestimation or underestimation of variation at other levels of clustering (50).

## LIMITATIONS

A limitation of this study is that we relied on data collected after patient discharge. Therefore, we are unable to ascertain if patients received pneumonia care as documented by health workers (24). We imputed missing data assuming a MAR mechanism. Therefore, sensitivity analyses will be undertaken to explore the robustness of the inferences to the MAR assumption.

## CONCLUSION

Adjusting for hospital, admitting clinician, and patient level factors, enhanced audit and feedback improved uptake of WHO recommended pediatric pneumonia guidelines compared to standard audit and feedback. Additionally, female clinicians and hospitals with low admission workload were associated with higher uptake of the new pediatric pneumonia guidelines during the trial period. In both the random effects and marginal models, parameter estimates were biased and inefficient under complete case analysis. Therefore, multiple imputation is recommended. When analyzing partially observed data with more than one level of clustering, it is paramount to account for the hierarchical structure in the imputation model to ensure compatibility with the analysis models of interest and hence alleviate bias.

## ETHICS STATEMENT

The Kenya Ministry of Health and Kenya Medical Research Institute's Scientific and Ethical Review Unit approved the use of de-identified patient data obtained through retrospective review of medical records without individual patient consent.

## AUTHOR CONTRIBUTIONS

SG conducted the analyses. Feedback on the analytic approach was provided by EN, NO, PA, and ME. SG drafted the initial manuscript with feedback on subsequent drafts provided by all authors who then approved the final manuscript.

## FUNDING

This work was supported through the DELTAS Africa Initiative Grant No. 107754/Z/15/Z-DELTAS Africa SSACAB. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS)'s Alliance for Accelerating Excellence in Science in Africa (AESA) and supported by the New Partnership for Africa's Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust (Grant No. 107754/Z/15/Z) and the UK government. The views expressed in this publication are those of the author(s) and not necessarily those of AAS, NEPAD Agency, Wellcome Trust or the UK government.

Funds from the Wellcome Trust (Grant No. 097170) awarded to ME as a senior Fellowship together with additional funds from a Wellcome Trust core grant awarded to the KEMRI-Wellcome Trust Research Programme (Grant No. 092654) supported CIN data collection.

## ACKNOWLEDGMENTS

We would like to thank the Ministry of Health who gave permission for this work to be developed and have supported the implementation of the CIN together with the county health executives and all hospital management teams. We are grateful to the Kenya Pediatric Association for promoting the aims of the CIN and the support they provide through their officers and membership. We also thank the hospital teams involved in service delivery for the sick child. This work is published with the permission of the Director of KEMRI. The CIN team who contributed to the design of the data collection tools, conduct of the work, collection of data, and data quality assurance that form the basis of this report and who saw and approved the report's findings include: Grace Irimu, Samuel Akech, Ambrose Agweyu, Michuki Maina, Jacquie Oliwa, David Gathara, Paul Mwaniki, Morris Ogero, James Wafula, Thomas Julius, George Mbevi, Mercy Chepkirui, Abraham Lagat, Lucas Malla (KEMRI-Wellcome Trust Research Programme); Samuel N'garng'ar (Vihiga County Hospital), Ivan Muroki (Kakamega County Hospital), David Kimutai and Loice Mutai (Mbagathi County Hospital), Caren Emadau and Cecilia Mutiso (Mama Lucy Kibaki Hospital), Charles Nzioki (Machakos Level 5 Hospital), Francis Kanyingi and Agnes Mithamo (Nyeri County Hospital), Margaret Kuria (Kisumu East County Hospital), Samuel Otido (Embu County Hospital), Grace Wachira and Alice Kariuki (Karatina County Hospital), Peris Njiiri (Kerugoya County Hospital), Rachel Inginia and Melab Musabi (Kitale County Hospital), Hilda Odeny (Busia County Hospital), Grace Ochieng and Lydia Thuranira (Kiambu County Hospital); Priscilla Oweso (Vihiga County Hospital), Ernest Namayi (Mbale Rural Health and Demonstration Centre), Benard Wambani and Samuel Soita (Kakamega Provincial General Hospital), Joseph Nganga (Mbagathi District Hospital), Margaret Waweru and John Karanja (Kiambu County Hospital), Susan Owano (Mama Lucy Kibaki Hospital), Esther Muthiani (Machakos Level 5 Hospital), Alfred Wanjau (Nyeri Level 5 hospital), Larry Mwallo (Kisumu East District Hospital), Lydia Wanjiru (Embu Provincial General Hospital), Consolata Kinyua (Karatina District Hospital), Mary Nguri (Kerugoya District Hospital), and Dorothy Munjalu (Kitale District Hospital).

## REFERENCES

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2019.00198/full#supplementary-material

address the main problems. Inter J Health Policy Manag. (2017) 6:587. doi: 10.15171/ijhpm.2017.17


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with several of the authors EN and PA, within the last two years.

Copyright © 2019 Gachau, Owuor, Njagi, Ayieko and English. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Modeling Long-Term Graft Survival With Time-Varying Covariate Effects: An Application to a Single Kidney Transplant Centre in Johannesburg, South Africa

### Okechinyere J. Achilonu<sup>1</sup> , June Fabian<sup>2</sup> and Eustasius Musenge<sup>1</sup> \*

*<sup>1</sup> Division of Biostatistics and Epidemiology, School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa, <sup>2</sup> Wits Donald Gordon Medical Centre, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa*

Objectives: Patients' characteristics that could influence graft survival may also exhibit non-constant effects over time; therefore, violating the important assumption of the Cox proportional hazard (PH) model. We describe the effects of covariates on the hazard of graft failure in the presence of long follow-ups.

## Edited by:

*Jim Todd, London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom*

### Reviewed by:

*Birhanu Ayele, Stellenbosch University, South Africa Innocent B. Mboya, University of KwaZulu-Natal, South Africa*

> \*Correspondence: *Eustasius Musenge Eustasius.Musenge@wits.ac.za*

### Specialty section:

*This article was submitted to Digital Health, a section of the journal Frontiers in Public Health*

> Received: *19 April 2019* Accepted: *08 July 2019* Published: *25 July 2019*

### Citation:

*Achilonu OJ, Fabian J and Musenge E (2019) Modeling Long-Term Graft Survival With Time-Varying Covariate Effects: An Application to a Single Kidney Transplant Centre in Johannesburg, South Africa. Front. Public Health 7:201. doi: 10.3389/fpubh.2019.00201*

Study Design and Settings: We studied 915 adult patients who received a kidney transplant between 1984 and 2000, using the Cox PH model, a variation of the Aalen additive hazard model and accelerated failure time (AFT) models. Selection of important predictors was based on the purposeful method of variable selection.

Results: Out of 915 patients under study, 43% had graft failure by the end of the study. The graft survival rate is 81, 66, and 50% at 1, 5, and 10 years, respectively. Our models indicate that donor type, recipient age, donor-recipient gender match, delayed graft function, diabetes and recipient ethnicity are significant predictors of graft survival. However, only the recipient age and donor-recipient gender match exhibit constant effects in the models.

Conclusion: Conclusions drawn about predictors of graft survival from the Cox PH model without adequate assessment of the model fit could over-estimate significant effects. The additive hazard and AFT models offer more flexibility in understanding covariates with non-constant effects on graft survival. Our results suggest that the period of follow-up in this study is too long to support the proportionality assumption. Modeling graft survival at different time points may limit the possibility of important covariates showing time-variant effects in the Cox PH model.

Keywords: graft survival, time varying covariate effect, Cox PH model, purposeful selection, additive hazard models

## 1. INTRODUCTION

The incidence and prevalence of end-stage kidney disease (ESKD) have significantly increased in developing countries, such as South Africa (1). Patients with ESKD have an increased risk of premature death on chronic dialysis therapy and, for long-term survival, kidney transplantation is the treatment of choice (2). A successful kidney transplant increases the life expectancy and quality of life of a patient with ESKD. Despite advances in the use of immunosuppressants, recipient and donor factors still compromise the efficacy of a kidney transplant outcome, especially for long-term survival (3, 4). This has brought increased interest in identifying these factors using statistical methods, such as survival analysis. In kidney transplant studies, time-to-graft failure or patient death is usually the event of interest.

Beyond the Kaplan-Meier (KM) estimator, most kidney transplant studies employ the Cox proportional hazard (PH) model to analyse whether individual patient or donor characteristics influence the probability of graft survival (GS) or graft failure (GF). The proportional hazard assumption under the Cox PH model states that the factors under study act multiplicatively on the baseline hazard function and either increase or decrease the baseline function at a constant rate (5). This fundamental assumption may not be tenable in kidney transplant studies because a covariate such as recipient age may have a strong effect immediately after kidney transplant that gradually fades with time. In this situation, a single hazard ratio (HR) does not reflect the same magnitude of effect throughout the survival time, and the variable is said to have a time-varying effect on survival. Assessing the PH assumption should be a fundamental step in the use of the Cox PH model because violation of this assumption could lead to misleading interpretation of the resulting parameters (6). If the PH assumption is violated for any covariate, a more flexible model that does not impose constant proportional effects could offer more insight into the relationship between graft survival and the risk factors.

One of these models is Aalen's additive hazard model (7), in which the covariates act additively on the hazard and their effects are allowed to vary freely over time. The additive hazard model that is closest in form to the Cox model, however, is the Lin and Ying (8) model, which assumes that the covariates act additively upon an unknown baseline hazard and that their effects are constant. In practice, the effects of the covariates may be constant or time-varying, and McKeague and Sasieni (9) proposed a version of the additive model that accommodates both constant and time-varying covariate effects. Although several authors have advocated and used additive hazard models for survival time data, these models are rarely used in survival data analysis, especially in kidney transplant research, owing to a lack of familiarity with them (10, 11). Similarly, parametric accelerated failure time (AFT) models offer an alternative for covariates with non-constant effects on the hazard. The effects of covariates in an AFT model are constant and act multiplicatively on the survival time (12), so that a predictor either accelerates or decelerates the time to graft failure. The formulation of these models allows the estimation of a time ratio (TR), and the regression coefficients are estimated by the method of full maximum likelihood. Parametric survival models were considered by Hashemian et al. (13) in analyzing survival after kidney transplant, who noted that parametric survival models provide a more suitable description of the survival data compared with the Cox PH model.

This study is motivated by previous studies on the statistical analysis of kidney transplants done in South Africa (14–16). These studies focused on the comparison of patient survival and GS or the identification of factors that influence survival using the KM estimator and the standard Cox PH model. As an extension to these previous studies, this study aims to use a more rational and methodical approach to (i) identify factors that influence long-term GS using a purposeful model building strategy, (ii) affirm the importance of assessing the PH assumption in the Cox PH model, and finally (iii) show the need to consider additive hazard and AFT models as a complement to the Cox model when the PH assumption is not tenable.

## 2. PATIENTS AND METHOD

We studied patients ≥18 years who underwent their first kidney transplant at Charlotte Maxeke Johannesburg Academic Hospital between 1984 and 2000. This is a retrospective cohort study involving 915 adult patients. Patients were followed up after transplant, and information detailing patient, donor and transplant characteristics was recorded. GS was defined as the period from transplant to GF, loss to follow-up or end of the study. That is, patients were right-censored if the graft did not fail by the end of the study or the patients were lost to follow-up (graft failure: 1, censored or alive: 0). Deaths with functioning grafts were not captured in this study. GF rates were computed as the ratio of the number of failed grafts to patient-years (PY) of follow-up and expressed as failure rates per 1,000 PY. Predictor variables or covariates for inclusion in this study were identified from the literature using factors shown to significantly influence graft survival (16–18).
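The GF rate defined above is simple arithmetic, sketched here in Python. The follow-up times and event indicators are hypothetical and serve only to show the calculation.

```python
def failure_rate_per_1000_py(n_failures, follow_up_years):
    """Graft failures per 1,000 patient-years of follow-up."""
    patient_years = sum(follow_up_years)
    return 1000.0 * n_failures / patient_years

# Hypothetical follow-up times (years) and indicators (1 = graft failure, 0 = censored).
follow_up = [2.0, 5.5, 10.0, 0.5, 7.0]
events = [1, 0, 1, 1, 0]
rate = failure_rate_per_1000_py(sum(events), follow_up)
```

Here three failures over 25 patient-years give a rate of 120 per 1,000 PY; censored patients contribute follow-up time to the denominator but no events to the numerator.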

The covariates considered in this study are not time-dependent because they were measured only at the beginning of the study. We assumed that the missingness of each variable was unrelated to the values of that variable or of other variables in the study, and we verified this assumption of missing completely at random (MCAR) numerically using Little's test (19). The MissForest imputation method (20) was used to replace the missing data with reasonable values. MissForest is a non-parametric imputation method, based on random forests, that can simultaneously impute different types of variables. There is no need to specify a tuning parameter or the distribution of the data in the algorithm. For each variable with missing observations, the algorithm fits a random forest model using the rest of the variables in the dataset and then predicts the missing values for that variable. This is repeated for all the variables with missing values in the dataset. The imputation procedure runs iteratively, and performance between iterations is assessed until a stopping criterion is reached. For the continuous variable with missing values (donor age), we assessed the performance of the imputation algorithm using the normalized root mean square error, and for the categorical variables with missing values, we used the proportion of falsely classified entries (21). The data were summarized and relevant information available for the patients was extracted. The analysis steps are described in **Figure 1**.
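The two imputation error measures named above can be sketched in Python as follows. This is an illustration under stated assumptions: the normalization by the sample variance of the true values is one common definition of the normalized root mean square error, and the toy values are hypothetical. In the study itself the true values are unknown, so the errors were estimated rather than computed against known truth.

```python
import math
import statistics


def nrmse(true_values, imputed_values):
    """Normalized root mean square error for a continuous variable.

    Assumption: normalization by the sample variance of the true values.
    """
    squared_errors = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(statistics.fmean(squared_errors) / statistics.variance(true_values))


def pfc(true_labels, imputed_labels):
    """Proportion of falsely classified entries for a categorical variable."""
    wrong = sum(t != i for t, i in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)


# Hypothetical values that were masked and then imputed
true_age = [30, 40, 50, 60]
imputed_age = [32, 38, 50, 64]
true_type = ["live", "cadaveric", "cadaveric", "live", "cadaveric"]
imputed_type = ["live", "cadaveric", "live", "live", "cadaveric"]

age_error = nrmse(true_age, imputed_age)
type_error = pfc(true_type, imputed_type)
print(round(age_error, 3), type_error)
```

Both measures are 0 for a perfect imputation, so values near 0 indicate good performance.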

## 3. SURVIVAL ANALYSIS METHODS

Let T<sub>i</sub> be a random variable that represents the GF time for patient i with characteristics X<sub>i</sub>, a p-dimensional covariate vector. Suppose C<sub>i</sub> denotes the right-censoring time, whose distribution is independent of T<sub>i</sub>, such that min(T<sub>i</sub>, C<sub>i</sub>) is observed. Typically, a survival dataset D<sub>n</sub> consists of n i.i.d. observations (T<sub>i</sub>, δ<sub>i</sub>, X<sub>i</sub>), i = 1, . . . , n, where δ<sub>i</sub> = I[T<sub>i</sub> ≤ C<sub>i</sub>] is the censoring indicator (δ<sub>i</sub> = 1 if T<sub>i</sub> ≤ C<sub>i</sub> and δ<sub>i</sub> = 0 if T<sub>i</sub> > C<sub>i</sub>).
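As a minimal sketch of this notation, the Python code below builds the observed pair (min(T, C), δ) from hypothetical latent failure and censoring times and then computes the Kaplan-Meier estimate of the survival function. The function names and toy data are assumptions made for illustration. In real data only the observed time and indicator are available, not both latent times.

```python
def observe(failure_time, censor_time):
    """Return (observed time, delta), where delta = 1 if the failure is observed."""
    if failure_time <= censor_time:
        return failure_time, 1
    return censor_time, 0


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Collect everyone whose observed time equals t (failures and censorings)
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures > 0:
            surv *= 1.0 - failures / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical latent failure and censoring times (years)
latent_failure = [2, 5, 3, 8]
censoring = [4, 4, 6, 7]
observed = [observe(t, c) for t, c in zip(latent_failure, censoring)]
times = [t for t, _ in observed]
events = [d for _, d in observed]
curve = kaplan_meier(times, events)
print(observed)
print(curve)
```

Patients censored at a given time are still counted as at risk for failures occurring at that same time, which is the usual convention.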

## 3.1. Cox Proportional Hazard Model

The Cox PH model was used to analyse the effect of the study predictors on GS. The purposeful method of variable selection employed in this study was based on the Cox PH model (5). First, the effects of all the study covariates on graft survival were assessed univariately with the Cox PH model (22). If T<sup>i</sup> follows the Cox PH model, then the hazard function for T<sup>i</sup> at time t > 0 conditional on X<sup>i</sup> is given by

$$h(t|X\_i) = h\_0(t) \exp(X\_i^\prime \beta),\tag{1}$$

where h<sub>0</sub>(t) is an arbitrary, unspecified non-negative function of time known as the baseline hazard. It corresponds to the hazard when all predictor variables are equal to zero. β denotes the vector of regression coefficients, which is estimated using the partial likelihood method. The term exp(X′β) depends on the covariates, but not on time. Variables significant at the 25% level in the univariable analysis were included in the multivariable Cox PH model, as applied by Hosmer et al. (5) and Bursac et al. (23). Variables were excluded from the model sequentially if they were neither significant predictors of graft survival nor confounders. The procedure for the purposeful method of variable selection is detailed in Hosmer and Lemeshow (5). Under the Cox PH model, a continuous covariate is assumed to have a log-linear functional form. Sometimes, however, the effect of a covariate is not linearly associated with the log-hazard. Assuming a linear effect when a non-linear effect is applicable results in misspecification, which affects the estimated coefficients and standard errors. The functional form of the continuous covariates was assessed using a plot of the martingale residuals from a null model and a plot of the cumulative sums of martingale residuals (24–26).
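To make the partial likelihood method concrete, the following Python sketch evaluates the Cox log partial likelihood for a given β. It assumes no tied event times, and the data are hypothetical. In practice β is obtained by maximizing this function numerically, which statistical software does internally.

```python
import math


def cox_log_partial_likelihood(beta, times, events, covariates):
    """Cox log partial likelihood, assuming no tied event times."""
    # Linear predictor X_i' beta for each subject
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            # Risk set: subjects still under observation at time t_i
            risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            loglik += eta[i] - math.log(risk)
    return loglik


# Hypothetical data: observed times, event indicators, one binary covariate
times = [2, 3, 4, 7]
events = [1, 1, 0, 0]
covariates = [[1], [0], [1], [0]]

ll_null = cox_log_partial_likelihood([0.0], times, events, covariates)
ll_half = cox_log_partial_likelihood([0.5], times, events, covariates)
print(ll_null, ll_half)
```

The baseline hazard h<sub>0</sub>(t) cancels from each term, which is why it can remain unspecified.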

One restrictive assumption of the Cox PH model is the PH assumption. The hazards of two individuals with covariates X<sub>1</sub> and X<sub>2</sub> are said to be proportional when the hazard ratio is constant over time. That is,

$$
\mathrm{HR} = \frac{h(t, X_1)}{h(t, X_2)} = \frac{h_0(t)\exp(\beta X_1)}{h_0(t)\exp(\beta X_2)} = \frac{\exp(\beta X_1)}{\exp(\beta X_2)} = \exp\{\beta(X_1 - X_2)\}.
$$

This implies that the ratio of the two hazards does not depend on time. When the hazard ratio (HR) of a variable is not constant over time, the covariate is said to have a non-proportional or time-varying effect on survival, which suggests that the effect of the covariate changes over time. Tests and graphical methods based on the scaled Schoenfeld residuals (**r**<sub>i</sub><sup>w</sup> = n<sub>e</sub> var(β̂) **r**<sub>i</sub>) and a technique based on the cumulative sums of martingale residuals (U(β̂, t) = Σ<sub>i=1</sub><sup>n</sup> X<sub>i</sub> M̂<sub>i</sub>(t)) (25, 27) were used to verify the validity of proportionality for each selected covariate in the final model. The scaled Schoenfeld residuals were plotted against time for each covariate. If proportionality holds, these residuals are expected to scatter randomly around a line with zero slope. In addition, the observed processes were compared with 50 processes simulated under the null hypothesis of no model misspecification. Non-proportional hazards for a covariate are indicated if the observed process is atypical of the simulated processes. In that case, a clear lack of fit can be concluded for the Cox PH model, because time-varying covariate effects violate the PH assumption.

## 3.2. Additive Hazard Model

To circumvent the PH assumption and characterize the nature of the time-varying covariate effects through cumulative regression function plots, we employed the Aalen additive hazard model, given by

$$h(t|X\_i) = h\_0(t) + X\_i^{'} \boldsymbol{\gamma}(t). \tag{2}$$

Similar to model 1, h<sub>0</sub>(t) and γ(t) represent the baseline hazard function and the vector of time-varying regression coefficients, which may change in magnitude and even sign over time. The flexibility of the Aalen additive hazard model is tempered by the difficulty of estimating the coefficient functions directly. Hence, the cumulative regression coefficients are estimated instead, based on least squares estimation of the integrated coefficients β<sub>i</sub>(t) = ∫<sub>0</sub><sup>t</sup> γ<sub>i</sub>(u) du, i = 1, . . . , p. These cumulative effects are graphed against time to investigate whether the covariates in model 1 have time-varying or constant effects. The further β<sub>i</sub>(t) is from 0, the greater the impact the covariate has had on the hazard of graft failure over the period of follow-up. A positive slope indicates that the hazard increases with the covariate, and a negative slope that it decreases. For a covariate with a significant effect, the confidence bands are likely not to include the zero line. Both the Kolmogorov-Smirnov and Cramér-von Mises tests (28) were used to test whether the covariate effects are time-invariant. The cumulative martingale residuals were used to assess the fit of the covariates in the Aalen additive hazard model. To further assess the nature of these covariate effects, we fitted a variation of model 2, given by

$$h(t|X\_i) = h\_0(t) + X\_a^{'} \boldsymbol{\gamma}\_a(t) + X\_b^{'} \boldsymbol{\gamma}\_b. \tag{3}$$

In this version of the additive model, γ<sub>a</sub>(t) and γ<sub>b</sub> represent the vectors of regression coefficients for the covariates with time-varying and constant effects, respectively. A successive test was done to compare the results of this model with those of the previous models.
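As a point of reference for the least squares estimation mentioned above, the standard Aalen estimator of the cumulative regression functions in model 2 can be written as follows. The symbols B(t), X(t<sub>k</sub>) and ΔN(t<sub>k</sub>) are introduced here for illustration and are not part of the original notation.

$$
\hat{B}(t) = \sum_{k:\, t_k \le t} \left\{ X(t_k)^{\prime} X(t_k) \right\}^{-1} X(t_k)^{\prime}\, \Delta N(t_k)
$$

Here t<sub>k</sub> are the ordered graft failure times. X(t<sub>k</sub>) is the n × (p + 1) matrix whose ith row is (1, X<sub>i</sub>′) if patient i is still at risk at t<sub>k</sub> and a row of zeros otherwise. ΔN(t<sub>k</sub>) is the n-vector with a 1 for the patient whose graft fails at t<sub>k</sub> and 0 elsewhere. The first component of B̂(t) estimates the cumulative baseline hazard and the remaining components estimate the cumulative coefficients. The sum runs only over failure times at which X(t<sub>k</sub>)′X(t<sub>k</sub>) is invertible.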

## 3.3. Accelerated Failure Time Models

All the significant variables from the Cox PH model were also used to fit AFT models. We used the shape of the hazard function to select the appropriate AFT models, as reported by Khanal et al. (29). The baseline hazard function profile (**Figure S1**) displays a monotone decreasing hazard, which is closest to the log-logistic (when k ≤ 1), log-normal (when σ > 1) and Weibull (when γ < 1) distributions (12). The survival functions of the selected distributions are

$$
S(t) = 1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right), \quad S(t) = \left\{1 + e^{\theta} t^{k}\right\}^{-1} \quad \text{and} \quad S(t) = \exp(-\lambda t^{\gamma})
$$

for the log-normal, log-logistic and Weibull models, respectively, where Φ is the standard normal cumulative distribution function. The distributions are characterized by the location or scale (µ, θ, λ) and shape (σ, k, γ) parameters. In the AFT model, the effect of the covariates is constant and acts multiplicatively on survival times. The log-linear relationship between the variables and the log of survival time is given by

$$
\log T = \mu + \alpha^{'}X + \sigma \epsilon,\tag{4}
$$

where µ is the model intercept and α is a vector of regression coefficients quantifying the impact of each explanatory variable on the survival time. A negative value of α indicates that survival time increases as the value of the explanatory variable decreases, and vice versa. The term exp(α′X) is usually referred to as the acceleration factor. σ is the scale parameter and ε is the error term, which is assumed to have a specific distribution, such as a logistic or normal distribution. The deviation of log T from linearity is modeled by the error term. The distribution of T is determined by the probability distribution of ε, and the survival function for T can be obtained from the survival function of ε. The Akaike information criterion, defined by AIC = −2l + 2k (30), where l is the log-likelihood of the model and k is the total number of parameters in the model, was used to compare the fit of the AFT models. The results of the best-performing AFT model were compared with those of the Cox PH and additive hazard models. To draw valid inferences from the best-performing model, deviance residuals were used to assess its adequacy (5). The deviance residual is expressed as

$$r\_{Di} = \text{sign}(r\_{Mi}) \left[ -2 \left\{ r\_{Mi} + \delta\_i \log(\delta\_i - r\_{Mi}) \right\} \right]^{1/2},\tag{5}$$

where the quantity r<sub>Mi</sub> is the martingale residual. The sign function sign(·) takes the value −1 or +1 if its argument is negative or positive, respectively. The deviance residuals are normalized transformations of the martingale residuals and have a mean of zero. If the model is valid, the r<sub>Di</sub> are more symmetrically distributed around zero than the r<sub>Mi</sub>. The R code used to perform the analysis is included in the **Supplementary Material**.
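Equation 5 can be sketched in Python as follows. This is an illustrative implementation, not the authors' R code. It assumes the usual definition of the martingale residual as the event indicator minus the estimated cumulative hazard at the observed time, and the input values are hypothetical.

```python
import math


def martingale_residual(delta, cumulative_hazard):
    """Assumed definition: event indicator minus estimated cumulative hazard."""
    return delta - cumulative_hazard


def deviance_residual(delta, r_m):
    """Deviance residual of equation 5 for one observation."""
    # The term delta * log(delta - r_m) is taken as 0 for censored observations (delta = 0)
    log_term = math.log(delta - r_m) if delta == 1 else 0.0
    inner = -2.0 * (r_m + log_term)
    sign = (r_m > 0) - (r_m < 0)  # -1, 0 or +1; the residual is 0 when r_m is 0
    return sign * math.sqrt(max(inner, 0.0))


# Hypothetical cases: a failure with low cumulative hazard and a censored observation
r_failed = martingale_residual(1, 0.2)
r_censored = martingale_residual(0, 0.5)
d_failed = deviance_residual(1, r_failed)
d_censored = deviance_residual(0, r_censored)
print(round(d_failed, 3), round(d_censored, 3))
```

A graft that fails when the model predicts little accumulated hazard gets a large positive residual, whereas censored observations always get non-positive residuals.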

## 4. RESULTS

## 4.1. Descriptive Statistics

The descriptive information available for the 915 patients in this study is summarized in **Table 1**. The majority of the patients (85%) received a kidney from cadaveric donors, and white patients accounted for 56% of the patients in the study. The unadjusted graft failure rates for the study variable categories are also listed in **Table 1**. Most transplants took place before 1992, the midpoint of the transplant period in this study (**Figure 2A**). We observed graft failure in 43% of cases by the end of the study; hence, the censoring rate is about 57%. Graft survival at 1, 5, 10 and 15 years was 81% (95% CI: 78–84%), 66% (95% CI: 63–70%), 50% (95% CI: 47–55%), and 37% (95% CI: 32–42%), respectively. The median follow-up was ∼10 years; about 17% of grafts had failed after the 1st year of follow-up, and this period has the highest hazard rate of GF (**Figures 2B–D**). Of the cases, 18% have missing observations in their records, and there are no missing values in the time variables. Little's MCAR test indicates that these observations are missing completely at random (p-value = 0.206). The MissForest imputation method was used to address the missing data in this study, and its reliability was assessed. The out-of-bag errors estimated by missForest for the continuous and categorical variables are 0.02 and 0.14, respectively. This shows good performance of missForest in imputing the missing data, because the values are closer to 0 than to 1.

## 4.2. Result From the Cox Proportional Hazard Model

The first step in the model-building procedure was to explore the relationship between each covariate and graft survival time univariately. At the 25% level of significance, evidence of association with GS is suggested for some variables (**Table 1**). These variables were deemed candidates for inclusion in the multivariable model.

The multivariable model containing all the covariates significant in the univariable analysis was fitted (**Table S1**). To simplify the model, the p-values of the covariates based on the partial likelihood test were examined. "Donor age" has the largest p-value (p = 0.654), which is not statistically significant. Omitting this covariate and refitting the model gives a likelihood ratio (LR) test statistic of 0.202 (**Table 2**). This is not significant (p = 0.653) at the 5% level, indicating that including donor age does not improve the model. Furthermore, the change in coefficients (Δβ̂) for each covariate remaining in the model was compared with the original model; the result (**Table 2**) shows that donor age is neither a significant predictor of graft survival nor a confounder. Next, "Histological acute rejection" and "Urological ESKD" were removed from the model in turn. The LR tests, with p-values of 0.349 and 0.227, respectively, show that the model without these covariates is not statistically different from the model with them (**Table 2**). However, the removal of "Urological ESKD" changed the coefficients of "Renal disease ESKD" and "Hypertension" by more than 15%. "Urological ESKD" would have been retained in the model if "Renal disease ESKD" and "Hypertension" were significant predictors of graft survival at the 10% level of significance. Therefore, we considered "Urological ESKD" an unimportant confounder and excluded the three variables from the model.

There is no significant change in the value of −2LL(β̂) on deleting "inherited ESKD" and "Hypertension" from Model 2 sequentially (p-values = 0.169 and 0.145). The deletion did not confound the relationship between any covariate remaining in the model and graft survival. The covariates in the multivariable model at this stage are shown in Model 3 (**Table S1**).

In the next stage, "Donor-recipient gender match," "Donor-recipient blood group match," and "Clinical acute rejection," which were not significant in the univariable analysis, were sequentially added to the multivariable model (Model 2). Only "Donor-recipient gender match" shows a significant relationship with graft survival, with an LR test statistic of 4.908 (p = 0.027). Hence, we reconsidered this variable at this stage of model building (Model 4; **Table 2**). The summary of Model 4 is shown in **Table S1**. We compared the variables selected in the final model with those chosen by automated methods of variable selection, such as stepwise and best-subset selection (**Table S3**). We observed that the automated methods tend to select more variables, some of which are not significantly related to GS at the 5–10% significance levels.
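The partial likelihood ratio tests used in this section can be illustrated with a short Python sketch. It assumes that each test has one degree of freedom, which is consistent with the statistic and p-value pairs reported above (0.202 with p = 0.653, and 4.908 with p = 0.027). The function names are illustrative.

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic from two nested model log-likelihoods."""
    return 2.0 * (loglik_full - loglik_reduced)


def lr_pvalue_1df(statistic):
    """Upper-tail chi-square probability with 1 degree of freedom.

    For Z standard normal, P(Z^2 > x) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("LR statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# LR statistics reported in the text
p_donor_age = lr_pvalue_1df(0.202)
p_gender_match = lr_pvalue_1df(4.908)
print(round(p_donor_age, 3), round(p_gender_match, 3))
```

For tests with more than one degree of freedom, a general chi-square tail function from a statistics library would be needed instead.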

### 4.2.1. Assessment of Linearity Assumption

The next step was to assess the functional form of "Recipient age," the only continuous variable in the final model (Model 4, **Table S1**). **Figure 3** shows a plot of the martingale residuals from a null model and the cumulative martingale residuals. The smoothing spline fit shows evidence of linearity for this variable. The observed process for this variable is also typical of the 20 simulated realizations from the null distribution, with a corresponding p-value of 0.100. This indicates that a linear term is adequate for "Recipient age" in the model.

## 4.2.2. Assessment of PH Assumption and Overall Goodness-of-Fit

There was no significant two-way interaction between the covariates in the model at the 5% level of significance. We assessed the PH assumption of the Cox model to confirm whether the effects of the covariates interpreted above only shift the baseline hazard up or down and do not change over the lifetime of a graft. **Figure S2** shows evidence of time-varying effects for some covariates in the model, given that the curves seem not to drift apart steadily, as should be expected in the case of constant effects. **Table 3** shows the p-values of tests based on the scaled Schoenfeld and cumulative residuals for non-proportional hazard assessment. The results of the two tests support evidence of deviation from the proportionality assumption, as shown in **Table 3**. The results are graphically illustrated for each covariate in the Cox PH model (**Figures S3, S4**). These figures suggest non-constant effects over time for the aforementioned variables. However, when these covariates were interacted with time in the extended Cox PH model, only recipient ethnicity showed a non-constant effect (**Table S2**). The non-constant effect of these covariates indicates a lack of fit of the Cox PH model, which could lead to misleading parameter interpretation.

TABLE 2 | Partial likelihood ratio test indicating the effect of deleting covariates that are not significant in the multivariable analysis and their highest impact in coefficient change for other covariates.

*<sup>a</sup>Highest change observed in covariates coefficients after deleting each covariate.*

TABLE 3 | Non-proportionality test in the Cox PH model, *p*-values for scaled Schoenfeld residuals and cumulative residuals (\*) tests.


*Bold variables represent violation of the PH assumption.*

## 4.3. Result From Additive Hazard Models

The covariates in Model 4 (**Table S1**) were used to fit the Aalen additive hazard model. The result is comparable to that of the Cox PH model in identifying the risk factors of graft survival (**Table 4**). However, the Kolmogorov-Smirnov test shows some evidence of time-varying effects for donor type and recipient ethnicity in this table (p-values < 0.05); this is supported by the Cramér-von Mises test (result not included). The plot of the cumulative regression coefficients for the Aalen model is shown in **Figure 4**. There is a linear increase in the hazard of graft failure with an increase in recipient age, and its confidence interval does not include the zero line, indicating that age has a significant effect on the hazard of graft failure over the years of follow-up. The 95% confidence intervals for the other plots include the zero line at some time points, indicating covariates with early (e.g., delayed graft function) and late (e.g., donor type) significant effects on graft survival. Only the plot for donor type shows a negative effect on the hazard of graft failure; the effect flattens at some points before decreasing steeply and linearly, which, by the test, is an indication of a time-varying effect. The cumulative plot for recipient ethnicity shows a curvilinear pattern: it displays a steep increase at the beginning of the follow-up and a roughly zero slope after the first 10 years. The plot suggests a time-varying effect for recipient ethnicity and also that this covariate may not have a late significant effect on graft survival.

The cumulative martingale residuals, together with 50 simulated processes (**Figure 5**) under the Aalen model, show that the observed behavior of the covariates is typical of the model (p-values > 0.05), indicating a good fit of the Aalen model. The result of the semi-parametric version of the Aalen model is shown in **Table 5**; all the covariates, as previously reported, show significant effects on graft survival. For the covariates with constant effects as suggested by the Aalen model, the estimates are shown in **Table 5**.

## 4.4. Result From Parametric AFT Models

The AIC values of the models are 2,444, 2,492, and 2,471 for the Weibull, log-normal, and log-logistic models, respectively. As a rule, the model that best conforms to the observed data has the smallest AIC. On this basis, the Weibull model is the best-performing model. The deviance residuals from the Weibull model lie mostly within the range of ±3, except for three observations that are slightly outside this bound (**Figure 6**). The result of the Weibull model is similar to that of the additive hazard models in detecting the significant predictors of graft survival and the direction of their effects (positive or negative), although the interpretations are not the same. For instance, in **Table 5**, the semi-parametric additive hazard model shows that female patients who received a kidney from female donors had an increase in hazard of 0.0327 compared with male patients who received a kidney from male donors.

Conversely, the Weibull model shows that female recipients of a kidney from female donors had 44% lower graft survival time than male patients who received a kidney from male donors. The Weibull model shows that all the predictors except donor type shorten graft survival time. Every unit increase in recipient age, relative to the average age of the recipients, is associated with a 5% decrease in graft survival time; that is, the older the patient, the higher the hazard of graft failure. This is similar to what is observed in the additive hazard model. The results also show that graft survival time is prolonged (more than twofold) among patients who received a kidney from a living donor compared with those who received a cadaveric kidney.
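The model comparison and the time ratio interpretation above can be sketched in Python. The AIC values are those reported in this section. The coefficient passed to the time ratio function is back-calculated from the reported 44% reduction purely for illustration and is not a coefficient reported by the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = -2l + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def time_ratio(alpha):
    """Time ratio for a one-unit increase in a covariate, under log T = mu + alpha'X + sigma*eps."""
    return math.exp(alpha)


def percent_change_in_survival_time(alpha):
    """Percentage change in survival time implied by the time ratio."""
    return 100.0 * (time_ratio(alpha) - 1.0)


# AIC values reported for the three AFT models
reported_aic = {"Weibull": 2444, "log-normal": 2492, "log-logistic": 2471}
best_model = min(reported_aic, key=reported_aic.get)

# Illustrative coefficient chosen so that the time ratio is 0.56 (a 44% reduction)
alpha_example = math.log(0.56)
change = percent_change_in_survival_time(alpha_example)
print(best_model, round(change, 1))
```

A time ratio below 1 means shorter graft survival time and a ratio above 1 means longer survival time, so a ratio above 2 corresponds to the "more than twofold" prolongation described for living donor kidneys.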

## 5. DISCUSSION

In this study, 915 adult patients who underwent a kidney transplant at Charlotte Maxeke Johannesburg Academic Hospital, South Africa, were analyzed. This study attempts to employ statistically justifiable strategies in selecting the best combination of predictors that influence long-term GS after kidney transplantation. The method of imputation used in this study has been shown (in studies using different biological and medical datasets) to outperform imputation methods such as multivariate imputation by chained equations, nearest neighbor and mean imputation (20, 31). The Cox PH model is the most attractive survival model when a set of covariates is of interest in modeling time to graft failure. Fitting a large number of variables in this model could add noise to the estimated quantities, result in collinearity among variables and increase the cost of modeling unnecessary predictors. The purposeful variable selection method based on the Cox PH model becomes more complex when there are too many predictors in the data. However, this procedure of model building involves a combination of science, statistical method, experience and common sense (32). The purposeful method has been applied in previous studies (5, 23, 32). These studies showed that, when prediction and identification of risk factors are of interest, the purposeful variable selection method identifies significant variables and confounding factors and yields a richer model than other selection methods.

Evaluating the PH assumption for all predictors should be a fundamental part of the modeling process when using the Cox PH model. Including variable(s) that do not satisfy the PH assumption leads to an inferior fit of the Cox model; that is, the power of tests is reduced for variables with both constant and non-constant HRs in the model. Our results provide evidence of time-varying effects for covariates in the Cox PH model. This shows that it is necessary to assess this assumption, given that the effects of clinical variables are rarely constant over time.



*Based on the Aalen's additive hazard model, Donor type, and Recipient ethnicity have time-varying effects on graft survival, and their effects are not estimated under McKeague and Sasieni hazard model.*

The Cox PH, additive hazard and AFT models are used in survival analysis to study the association between risk factors and the event of interest in failure time data. The appropriateness of an individual model may not be known in advance for a specific application. The models may capture the risk process equally well or sometimes give different results (10, 29, 33). For many applications in public health, the additive hazard model may be a plausible choice, since its results are expressed as differences in hazard rather than as a hazard ratio. The same applies to the TR, which has a more straightforward interpretation than the HR. These models may be compared with regard to the p-values of the covariates in the model, since the size of a p-value indicates the strength of the evidence against the null hypothesis (10).

We identified that factors, such as "donor age" and "acute rejections" previously shown to be important risk factors of GS (34–36) are neither significant nor confounders in this study. The difference between the findings of this present study and these previous studies could be linked to differences in sample size (number of graft failures observed), year of transplant, duration of follow-up and method of data analysis. Nevertheless, it is noteworthy that the significant predictors of GS observed in this study are in agreement with previous studies (11, 16–18, 34, 35, 37, 38).

Prognostic assessment with the Cox PH model is generally based on patient and donor characteristics at the time of evaluation. These characteristics tend to change over a long period of study. We have shown in this study that when long-term follow-up is of interest, survival prediction may be discordant with the Cox PH model. We have statistically shown that the Cox PH model did not capture all the significant aspects of the data and did not provide an adequate fit in this study. We were able to investigate the time-varying covariate effects with the Aalen additive model and to fully estimate the effects of the covariates with the AFT model. The need to explore beyond the Cox PH model is revealed in the Aalen plot, which can help a nephrologist to understand the pattern by which the covariates influence graft survival after transplantation. Censored quantile regression models could be considered as alternatives when the PH assumption is not valid in the Cox PH model.

This study has several important strengths. We used a rational approach in analyzing kidney transplant data generated from a South African transplant cohort study. The results of this historical data analysis could help to understand the long-term performance and progress of kidney transplant outcomes in this region, and how the risk factors influence the survival of the graft after kidney transplant. The analysis combines recipients of cadaveric and living donor kidney transplants and focuses on graft failure, because maximizing graft survival is of paramount importance to the recipient of a new kidney and to the transplant unit. We found in this study that predictors of graft survival may exhibit time-varying effects.

On the other hand, this study also has some methodological limitations. We found that multicollinearity is a problem when using the purposeful method of variable selection, especially when the covariates are highly related. Specifically, we noticed that dropping any of the causes of ESKD influences the coefficients of the others. Deciding which variable to add or retain is sometimes challenging. However, because the procedure is governed by a specific rule at each step, each decision to drop or retain a variable was critically assessed to avoid multicollinearity in the final model. In addition, the 57% censoring observed is another limitation of this study.

## 6. CONCLUSION

Additive hazard and AFT models have yet to receive the attention they deserve in modeling GS after a kidney transplant. When covariate effects involve certain patterns of heterogeneity in kidney transplant studies, additive hazard and AFT models could offer great flexibility in modeling GS time. The models used in this study describe different features of the relationship between the risk factors and graft failure. Hence, it appears necessary to use these models complementarily to gain a more comprehensive understanding of GS after a kidney transplant.

## DATA AVAILABILITY

The datasets generated for this study are available on request to the corresponding author.

## REFERENCES


## ETHICS STATEMENT

Ethical clearance was granted to JF by the Wits Human Research Ethics Committee (medical clearance certificate number: M121186).

## AUTHOR CONTRIBUTIONS

OA performed the data cleaning and analysis, interpreted the ensuing results, wrote, and edited the manuscript. JF provided the data used for this study, supervised the data cleaning steps, and edited the manuscript. EM supervised the analysis and the manuscript production, and edited the manuscript.

## FUNDING

OA was supported by the DELTAS Africa Initiative-SSACAB with funding from GlaxoSmithKline (GSK), Grant No. 107754/Z/15/Z-DELTAS Africa Sub-Saharan Africa Consortium for Advanced Biostatistics (SSACAB) programme.

## ACKNOWLEDGMENTS

The authors would like to thank Dr. Petra Gaylard for her assistance in cleaning the study data. The authors would also like to acknowledge the staff and the patients from the Renal Transplant Unit at Charlotte Maxeke Johannesburg Academic Hospital and all the donors who make kidney transplant possible for patients with kidney failure.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2019.00201/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Achilonu, Fabian and Musenge. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Electronic Immunization Registries in Tanzania and Zambia: Shaping a Minimum Viable Product for Scaled Solutions

Dawn Seymour <sup>1</sup> \*, Laurie Werner <sup>2</sup> , Francis Dien Mwansa<sup>3</sup> , Ngwegwe Bulula<sup>4</sup> , Henry Mwanyika<sup>5</sup> , Mandy Dube<sup>1</sup> , Brian Taliesin<sup>6</sup> and Dykki Settle<sup>7</sup>

*<sup>1</sup> BID Initiative, PATH, Lusaka, Zambia, <sup>2</sup> BID Initiative, PATH, Seattle, WA, United States, <sup>3</sup> Ministry of Health, National Expanded Programme on Immunization, Lusaka, Zambia, <sup>4</sup> Ministry of Health, Community Development, Gender, Elderly and Children, Dar es Salaam, Tanzania, <sup>5</sup> Regional Digital Health Director–Africa, PATH, Dar es Salaam, Tanzania, <sup>6</sup> Data for Action, PATH, Seattle, WA, United States, <sup>7</sup> Center of Digital and Data Excellence, PATH, Seattle, WA, United States*

### Edited by:

*Jim Todd, London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom*

### Reviewed by:

*Ties Boerma, University of Manitoba, Canada Elphas Luchemo Okango, Strathmore University, Kenya*

\*Correspondence: *Dawn Seymour dseymour@digitalimpactalliance.org*

Specialty section: *This article was submitted to Digital Health, a section of the journal Frontiers in Public Health*

Received: *17 April 2019* Accepted: *22 July 2019* Published: *07 August 2019*

### Citation:

*Seymour D, Werner L, Mwansa FD, Bulula N, Mwanyika H, Dube M, Taliesin B and Settle D (2019) Electronic Immunization Registries in Tanzania and Zambia: Shaping a Minimum Viable Product for Scaled Solutions. Front. Public Health 7:218. doi: 10.3389/fpubh.2019.00218*

As part of the work the Better Immunization Data (BID) Initiative undertook starting in 2013 to improve countries' collection, quality, and use of immunization data, PATH partnered with countries to identify the critical requirements for an electronic immunization registry (EIR). An EIR became the core intervention to address the data challenges that countries faced but also presented complexities during the development process to ensure that it met the core needs of the users. The work began with collecting common system requirements from 10 sub-Saharan African countries; these requirements represented the countries' vision of an ideal system to track individual child vaccination schedules and elements of supply chain. Through iterative development processes in both Tanzania and Zambia, the common requirements were modified and adapted to better fit the country contexts and users' needs, as well as to be developed with the technology available at the time. This process happened across four different software platforms. This paper outlines the process undertaken and analyzes similarities and differences across the iterations of the EIR in both countries, culminating in the development of a registry in Zambia that includes the most critical aspects required for initially deploying the registry and embodies what could be considered the minimum viable product for an EIR.

Keywords: immunization, register, registry, digital, patient data, electronic immunization registry, requirements

## INTRODUCTION

Led by PATH and funded by the Bill & Melinda Gates Foundation, the Better Immunization Data (BID) Initiative is grounded in the belief that better data plus better decisions lead to better health outcomes. The Initiative was designed in partnership with countries to create an environment where reliable, easily accessed, and actionable data can be used to improve health service delivery.

The BID Initiative has partnered with two countries, Tanzania and Zambia, over 4 years to develop, test, and roll out a package of data quality and use interventions. The governments of Tanzania and Zambia identified the most critical data-related challenges with immunization service delivery in each country, many of which were shared challenges:


Both countries formed user advisory groups made up of health workers from all levels of the health system (including community health workers, facility staff, district and provincial managers, and national-level staff) to develop a suite of interventions to address these data-related challenges. An iterative, evolutionary approach to developing the solutions was taken, building on existing systems when possible. Although interventions were designed and tested in the regions identified for initial implementation, close collaboration with government employees and agencies at the national-level focused on creating solutions that would be sustainable, that could scale beyond the initial regions, and that could be used in multiple countries with little additional development effort. Among the solutions in the suite of interventions, the most intricate to develop was the electronic immunization registry (EIR), which gives health workers access to immunization data that can be used for decision-making to improve the effectiveness and efficiency of delivering immunization services.

Electronic immunization registries (EIRs) have been used in high-income countries for many years, and over the past 10 years or so the movement to introduce them in low- and middle-income countries has grown. EIRs have been shown to improve the quality (1–3), availability, and accessibility of routine immunization data and reporting (4, 5); to provide critical information that helps strengthen program performance, such as identifying defaulters; to increase coverage rates and timeliness of vaccination (6); and to help improve stock management. As mobile technology has improved, the ability to extend such systems to low-resource settings has become a reality (7). This growing body of experience and knowledge led to the decision to incorporate an EIR into the suite of interventions to address the challenges being faced by the two countries.

Although it was not the original intent, the requirements documented by Tanzania and Zambia for the EIR have proven to be on the cutting edge of technology for electronic immunization registries. Initially, it was assumed that existing systems would be readily found that would meet the requirements for an immunization registry. However, this was not the case and the Initiative invested in four different software platforms in the work with Tanzania and Zambia to finally arrive at the two solutions now used in both countries. The software-development work done across the four different platforms has allowed a comparison of the shared and unique requirements between countries, as well as the minimum set of requirements needed to have a usable EIR.

Throughout this process, several best practices were identified, and lessons learned were documented in planning for technology development, working with software developers and country ministries of health, and implementing new technologies successfully. These experiences can serve as a valuable resource for other countries that want to introduce an electronic immunization (or other health service) registry.

## METHODS

In October 2013, the BID Initiative brought together representatives from the ICT and immunization departments of 10 sub-Saharan African countries (including Tanzania and Zambia) to develop a common set of requirements addressing their shared immunization challenges. To inform this process, the Collaborative Requirements Development Methodology (CRDM) was applied (8). The CRDM, created in 2009 by the Public Health Informatics Institute in conjunction with PATH, is used to collect and document business-process workflows and define requirements for the information systems that support those workflows. The initial set of requirements for a national EIR were compiled in the Product Vision for the BID Initiative in 2013 (9).

These documented requirements showed that countries had a forward-looking vision, especially when compared to the functionality of open source systems available at the time. The requirements demonstrated a desire for solutions that could be easily used and adapted in places where there are challenges with infrastructure, Internet connectivity, and electricity, for example. Solutions should function on mobile devices, such as tablets and phones, and meet best practices in data security and privacy. This vision represented a shift from desktop and laptop devices, which constrained nurses to desks and data entry as a separate process, to mobile devices that enabled nurses to collect data while they were performing other immunization processes and use the data on the devices to make decisions. Multiple nurses could also access and use a single shared device, and ultimately the system would be interoperable with other existing systems in the country (especially the national-level health information management system).

These initial requirements were grouped into functional and system requirements. Functional requirements describe what the system should do, and system requirements describe how the system should perform. Examples of functional requirements include registering a new child, searching for a child already registered in the system, and printing reports. System requirements include functionality such as audit logs, data backups, and user-password recovery.

Tanzania used the common requirements collected from the 10 countries in the Product Vision for the BID Initiative to develop country-specific requirements for its EIR in 2014, with the involvement of immunization, ICT, and monitoring and evaluation staff from the Ministry of Health. The Tanzania EIR requirements were then shared in a request for proposal for the software development. Tanzania initially selected the Generic Immunization Information System (GIIS) platform for its EIR, which became known as the Tanzania Immunization Information System (TIIS).

Challenges with the system led to a revised set of requirements, based on lessons learned and user feedback in Arusha region where the system was tested and piloted, and to a subsequent search for a different platform in 2015. Challenges encountered included synchronizing data between the two devices used in the same facility as well as with the central database, design decisions that increased the cost and reduced the ease of maintaining the source code, and the projected cost of extending and replicating the system to other countries. Once improvements were made, only the Arusha region of Tanzania continued to use TIIS, for several more months, before the transition to a new system got underway.

In 2016, a new platform, Open Immunize (OpenIZ), was selected that would address the modified list of requirements that emerged from lessons learned with TIIS. This second system is called the Tanzania Immunization Registry (TImR) (10). TImR was used in three additional regions of Tanzania during the grant period and later replaced TIIS in Arusha region.

Zambia completed the CRDM process to develop and document country-specific requirements for its EIR (with similar involvement of stakeholders from across the ministry), and the first registry began development in 2015 on the District Health Information Software (DHIS2) platform using the Patient Tracker application. Challenges adapting the software to meet some of the cutting-edge requirements, such as functionality on Android devices, the ability to access and update records in offline mode, and support for multiple users per device, led to development on this platform being stopped in 2016 (11). As in Tanzania, Zambia refined the requirements for its EIR based on lessons learned to a minimal set that could be adapted and deployed quickly. In 2017, Zambia selected the Open Smart Registry Platform (OpenSRP) for the second version of its EIR, which was called the Zambia Electronic Immunization Registry (ZEIR) (12) (see **Figure 1** for full timeline).

## RESULTS

The initial Product Vision requirements were refined to a modified set during the CRDM process for the first versions of the EIR in each country and further refined in the second round of development. In this iterative manner, systems were developed focusing on practical operations that would meet key workflows in the immunization process.

Tanzania has deployed the second version of its EIR, the TImR, in four regions (Arusha, Tanga, Kilimanjaro, and Dodoma) covering 1,273 facilities, and deployment to remaining regions is slated for 2019–2020. Zambia has deployed the second version of its EIR, ZEIR, in 291 facilities in Southern Province and is actively seeking funds to support maturing its use in the province, as well as scaling to other provinces.

The 5 years of experience developing EIRs in Tanzania and Zambia under the BID Initiative contributed to important lessons in system development, which were documented and disseminated (13, 14). These lessons include how to work effectively with partners, how to create a culture of data use, the importance of change management to support the transition to and adoption of a new system, and how to train health care workers efficiently in using the new tools. Specific to the EIRs, countries were forward-thinking in envisioning the technical functionality necessary to make the registry a successful tool in improving data quality and use. Through the development process, this advanced set of requirements was narrowed to those most critical for the successful initial deployment of an EIR.

In Tanzania and Zambia, the requirements were modified to reflect the collective input of immunization stakeholders across the entire health system. The modified requirements across the EIRs fell into five thematic areas of functionality in immunization service delivery (see **Table 1**).

## Registration and Search Requirements

This theme includes uniquely identifying each child to ensure that the right child receives the right dose of a vaccine at the right time. Unique identification also enables a country to identify a true denominator of the target population that needs routine vaccines and forecast and plan for the appropriate level of stock. Other examples of registration and search functionality include registering a child in a maternity ward or immunization clinic with minimal information (e.g., no given name yet); entering the mother's or caregiver's information, including telephone number and residence; searching with partial information; and searching for a child using a bar code.

## Vaccine Administration Requirements

One key user request was the ability to display the immunization schedule of each child. Vaccine administration includes requirements for generating initial vaccine schedules based on the child's date of birth, as well as identifying children who are due and overdue for vaccines and allowing an authorized user to set the immunization schedule at the national level.
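The schedule-generation and due/overdue requirements described above can be illustrated with a minimal Python sketch. The dose names, week offsets, and the 28-day grace period are hypothetical placeholders, not the schedules configured in TImR or ZEIR; a real registry would load the nationally defined schedule.

```python
from datetime import date, timedelta

# Hypothetical schedule: (dose name, age in weeks when the dose becomes due).
SCHEDULE_WEEKS = [("BCG", 0), ("OPV1", 6), ("OPV2", 10), ("OPV3", 14)]
GRACE_DAYS = 28  # assumed window after the due date before a dose counts as overdue


def build_schedule(date_of_birth):
    """Generate the initial schedule from the child's date of birth."""
    return [(dose, date_of_birth + timedelta(weeks=weeks))
            for dose, weeks in SCHEDULE_WEEKS]


def dose_status(schedule, given, today):
    """Classify each dose as 'given', 'upcoming', 'due', or 'overdue'."""
    status = {}
    for dose, due_date in schedule:
        if dose in given:
            status[dose] = "given"
        elif today < due_date:
            status[dose] = "upcoming"
        elif today <= due_date + timedelta(days=GRACE_DAYS):
            status[dose] = "due"
        else:
            status[dose] = "overdue"
    return status
```

Keeping the schedule as data rather than logic is one way to let an authorized user change it at the national level without modifying code.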

## Client Management Requirements

Client management covers the ability of the registry to identify and consolidate duplicate records and to warn if a child with the same given name, last name, date of birth, and gender already exists in the system. This functionality complements the unique identification of each child, so that nurses can validate that the right child is receiving the right vaccine dose. It also improves forecasting by providing an accurate number of children in the catchment area and birth cohort. This functionality enables the user to update client information, such as entering the child's name or the mother's mobile number.
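As an illustration of the duplicate warning described above, the following sketch groups records that share the same given name, last name, date of birth, and gender. The record field names and the case and whitespace normalization are assumptions made for the example; the source does not describe how either EIR implements matching.

```python
from collections import defaultdict


def normalize(text):
    """Lower-case and collapse whitespace so trivial typing differences still match."""
    return " ".join(text.lower().split())


def duplicate_key(record):
    # The four fields named in the requirement; the field names are hypothetical.
    return (normalize(record["given_name"]), normalize(record["last_name"]),
            record["date_of_birth"], record["gender"])


def find_duplicates(records):
    """Return lists of record ids that share the same matching key."""
    groups = defaultdict(list)
    for record in records:
        groups[duplicate_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Exact matching of this kind only flags candidates; consolidating the records would still need review by a health worker.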

## Stock Management Requirements

The EIRs enable health facilities to update and view their vaccine stock data. Stock management requirements include adjusting the stock balance based on a number of reasons including expiration dates, breakage, or doses reported in the EIR, alerting when stock levels are low or have expired, and aggregating vaccine-consumption tracking in terms of doses per vaccine type per time period at the service delivery point.
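A minimal sketch of these stock requirements, assuming a simple per-vaccine ledger, is shown below. The class name, the reason labels, and the reorder level are hypothetical and are not taken from either EIR.

```python
from dataclasses import dataclass, field


@dataclass
class VaccineStock:
    vaccine: str
    balance: int          # doses currently on hand
    reorder_level: int    # assumed threshold for a low-stock alert
    ledger: list = field(default_factory=list)

    def adjust(self, doses, reason):
        """Apply a signed adjustment (negative for doses leaving stock) with its reason."""
        if self.balance + doses < 0:
            raise ValueError("adjustment would make the balance negative")
        self.balance += doses
        self.ledger.append((reason, doses))

    def is_low(self):
        return self.balance <= self.reorder_level

    def consumed(self):
        """Doses recorded as administered, i.e., consumption at the service delivery point."""
        return -sum(doses for reason, doses in self.ledger if reason == "administered")
```

Recording the reason with every adjustment is what makes it possible to separate consumption from wastage such as expiry or breakage.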

## Reports

Reports are needed for monitoring the performance of service delivery from the facility to the national level. Since many reports can be generated from individual records, data can be analyzed in multiple ways. Reports need to show vaccination coverage as the percentage of children living in a certain area who were born in a certain timeframe and were vaccinated with a certain dose. Reports should also categorize defaulter information by location and community health worker and report cases of adverse events. Ideally, reports are both automated and simplified, and they are submitted electronically to the district on a monthly basis.

TABLE 1 | Common functionality groupings of requirements across EIRs.
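The coverage definition given in the Reports section (children in a given area, born in a given timeframe, vaccinated with a given dose) can be computed directly from individual records. The sketch below assumes a hypothetical record layout with `area`, `date_of_birth`, and `doses` fields.

```python
from datetime import date


def coverage_percent(children, area, born_from, born_to, dose):
    """Percentage of the birth cohort in an area that received the given dose."""
    cohort = [c for c in children
              if c["area"] == area and born_from <= c["date_of_birth"] <= born_to]
    if not cohort:
        return None  # no denominator, so coverage is undefined
    vaccinated = sum(1 for c in cohort if dose in c["doses"])
    return 100.0 * vaccinated / len(cohort)
```

Because the denominator comes from registered children rather than census projections, the result is only as complete as registration in the area.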

## Minimum Viable Product

As the two countries' requirements were mapped across the four platforms, similarities and differences emerged between the initial Product Vision and the two rounds of system development. These similarities and differences highlighted requirements that could define the functionality of a minimum viable product (MVP) for an EIR that could be used to improve immunization service delivery, deployed to scale in a country, and adapted for use in other countries. By MVP we mean a product with just enough features to satisfy early users, provide the minimal functionality, and generate feedback for future releases of the product. The differences in the number of requirements between those initially defined in the Product Vision and those used for the different versions of the EIRs developed in Tanzania and Zambia are noted in **Table 2**. In addition, **Supplementary Table A-1** lists the requirements specified in the RFP processes across the various platforms and in the final products of TImR and ZEIR.

In the first round, TIIS used or validated 57 (16.6%) of the Product Vision requirements as needed for the Tanzania context. The Patient Tracker used or validated 55 (16.1%) of the Product Vision requirements as needed for the Zambia context. Across all three (Product Vision, TIIS, and Patient Tracker), 54 requirements remained the same. Some of these included uniquely identifying every person, entering the vaccine vial monitor status, displaying availability of vaccine stock, and producing a report that identified all children due for a vaccination within the next month.

Some of the requirements for the second round of EIR development carried over from the first systems, but several were modified based on lessons learned from testing and deploying the initial EIRs in health facilities. In Tanzania, the TImR used or validated 35 (10.2%) of the requirements from the initial Product Vision but overlapped more with the TIIS: 55 requirements were the same between the two versions. In Zambia, the requirements for the ZEIR were compiled with CRDM, as well as from lessons learned from testing the Patient Tracker in health facilities. The ZEIR used or validated 29 (8.5%) of the requirements from the Product Vision but overlapped more with Patient Tracker: 73 requirements were the same between the two systems (see **Box 1** for current state of Patient Tracker).

TABLE 2 | Total number of functional and system requirements across the Product Vision and four EIR platforms.
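The comparisons reported here amount to counting shared requirements between two sets. A minimal sketch, assuming each requirement carries a stable identifier (the identifiers in the usage note are invented for illustration), might look like this:

```python
def overlap_summary(vision, platform):
    """Count requirements shared with the Product Vision and express them as a percentage of it."""
    shared = vision & platform
    percent = round(100.0 * len(shared) / len(vision), 1)
    return len(shared), percent
```

For example, with a hypothetical vision set of ten identifiers `R1` to `R10` and a platform set `{"R1", "R2", "R3", "X1"}`, the function returns `(3, 30.0)`; the platform-only requirement `X1` does not count toward the overlap.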

These comparisons showed that the Product Vision for the BID Initiative represented a larger vision for an EIR with more advanced requirements than what was ultimately deployed in both countries. Modifying the requirements throughout the development process narrowed the requirements to the most critical aspects needed for initial deployment of the EIRs. Areas where requirements were reduced or simplified were in stock management, facility management, complex system management, and complex reporting that was not country specific. These "dream" requirements can be applied to future versions of the EIR, especially as the system is used more frequently and scaled.

## DISCUSSION

An EIR is a critical component of interventions to address data collection, storage, and use challenges in immunization service delivery, but such a system can be complex to design and deploy successfully. A comprehensive understanding of system requirements early in the design process is critical for ensuring that the EIR works well and is embraced by its intended users. These requirements provide the building blocks and define the capabilities of the system.

Both Tanzania and Zambia plan to scale their EIRs nationally and to integrate them with other health information systems in use in those countries, in particular the national-level health information management system which is DHIS2 in both countries. The BID Initiative also seeks opportunities to expand these solutions and to continue to learn from their development, including ways to integrate with solutions beyond immunization and routine data collection, such as supply-chain management. In addition, the Initiative's experiences refining the EIR system requirements can lead to more efficient registry development and deployment in countries that would like to put similar systems into use as well as expand to modules that cover health domains like maternal health.

### BOX 1 | Patient Tracker evolution.

The experience with DHIS2 Tracker took place during 2016. In the 2 years since, The University of Oslo has invested significant time and energy to redesign the application based upon feedback from the larger community and a variety of projects such as the work conducted in Zambia with the BID Initiative. The new Android application (called DHIS2 Android Capture App) is fully integrated with the DHIS2 platform, and replaces not only the Tracker app, but also Event and Data Capture. Many key areas have been strengthened and new ones have been added, including improved performance, increased security features, and improved user experience. A new SDK was also developed, allowing for the creation of custom apps on top of DHIS2. The updated version of DHIS2 for Android was released in September 2018, and the SDK will be released by May 2019. You can find more information at www.dhis2.org/Android.

The similarities and differences in requirements, as well as the groupings of common functionalities, were analyzed across the EIRs in use to understand what an MVP could look like for an EIR. This EIR should include the minimal functionality necessary to be used successfully in the health system, provide health workers with the data needed for decision-making, produce key reports for monitoring, and be scalable nationwide, as well as to other countries.

An MVP should also include international standards in the following areas:


The process undergone in Zambia to arrive at the concise minimum set of requirements for the development of the ZEIR on the OpenSRP platform outlines what could be considered an MVP. The experience in Zambia leads to the conclusion that the 85 requirements defined for ZEIR can meet that definition of an MVP for an EIR (see **Supplementary Table A-2** for the full set of requirements). The International Training and Education Centre for Health in Kenya has since adopted and adapted this system, and many other countries and implementing partners have demonstrated and shared it.

## DATA AVAILABILITY

All datasets generated for this study are included in the manuscript and/or the **Supplementary Files**.

## AUTHOR CONTRIBUTIONS

DSey, LW, HM, MD, and DSet contributed to the design. DSey conducted the analysis and interpretation. DSey and LW wrote the first draft. HM and MD wrote additional sections. All authors contributed to manuscript revision and approved the final version.

## FUNDING

The BID Initiative is funded by the Bill and Melinda Gates Foundation, Investment ID: OPP1042273.

## REFERENCES


## ACKNOWLEDGMENTS

PATH worked with a broad group of partners as part of the BID Initiative and specifically with the four platforms discussed in this manuscript. We would first like to acknowledge the Ministries of Health of Tanzania and Zambia, and in particular their immunization departments for their close partnership and leadership in this work. We would also like to acknowledge the many software developers and other partners who collaborated on the work on these various platforms, including Mohawk College, Ona, University of Oslo, Intrahealth, ecGroup Inc., AIRIS Solutions, and BlueCode.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2019.00218/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Seymour, Werner, Mwansa, Bulula, Mwanyika, Dube, Taliesin and Settle. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Factors Associated With HIV Infection in Zimbabwe Over a Decade From 2005 to 2015: An Interval-Censoring Survival Analysis Approach

### Rutendo Birri Makota\* and Eustasius Musenge

*Division of Epidemiology and Biostatistics, Faculty of Health Sciences, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa*

### Edited by:

*Michael Johnson Mahande, Kilimanjaro Christian Medical University College, Tanzania*

### Reviewed by:

*Jenny Renju, University of London, United Kingdom Henry Mwambi, University of KwaZulu-Natal, South Africa*

\*Correspondence: *Rutendo Birri Makota rutendobbirri@gmail.com*

### Specialty section:

*This article was submitted to Digital Public Health, a section of the journal Frontiers in Public Health*

Received: *22 January 2019* Accepted: *27 August 2019* Published: *18 September 2019*

### Citation:

*Birri Makota R and Musenge E (2019) Factors Associated With HIV Infection in Zimbabwe Over a Decade From 2005 to 2015: An Interval-Censoring Survival Analysis Approach. Front. Public Health 7:262. doi: 10.3389/fpubh.2019.00262*

Objectives: The main objective of this study was to compare results from two approaches for estimating the effect of different factors on the risk of HIV infection and determine the best fitting model.

Study design: We performed secondary data analysis on cross-sectional data which was collected from the Zimbabwe Demographic Health Survey (ZDHS) from 2005 to 2015.

Methods: Survey and cluster adjusted logistic regression was used to determine variables for use in survival analysis with HIV status as the outcome variable. Covariates found significant in the logistic regression were used in survival analysis to determine the factors associated with HIV infection over the 10 years. The data for the survival analysis were modeled assuming age at survey imputation (Model 1) and interval-censoring (Model 2).

Results: Model goodness of fit test based on the Cox-Snell residuals against the cumulative hazard indicated that Model 1 was the best model. On the contrary, the Akaike Information Criterion (AIC) indicated that Model 2 was the best model. Factors associated with a high risk of HIV infection were being female, number of sexual partners, and having had an STI in the past year prior to the survey.

Conclusion: The difference between the results from the Cox-Snell residuals graphical method and the model estimates and AIC value may be due to the lack of adequate methods to test the goodness-of-fit of interval-censored data. We concluded that Model 2 with interval-censoring gave better estimates due to its consistency with the published results from literature. Even though we consider the interval-censoring model as the superior model with regard to our specific data, the method had its own set of limitations.

Keywords: interval-censoring, HIV, survival, prevalence, Zimbabwe

## INTRODUCTION

The 90–90–90 targets were launched by the Joint United Nations Programme on HIV/AIDS (UNAIDS) and partners with the aim to diagnose 90% of all HIV positive persons, provide antiretroviral therapy (ART) for 90% of those diagnosed, and achieve viral suppression for 90% of those treated by 2020 (1).

In Zimbabwe, a population-based survey carried out in 2016 reported that 74.2% of people living with HIV (PLHIV) aged 15–64 years knew their HIV status. Among the PLHIV who knew their status, 86.8% self-reported current use of ART, and 86.5% of those who self-reported ART use were virally suppressed (2). In order for these 90–90–90 targets to be met, prevalence and incidence rate estimates are crucial for understanding the current status of the HIV epidemic and determining whether the trends are improving in a manner that can facilitate achieving the 2020 target.

The gold standard for estimating HIV incidence is to test uninfected individuals periodically for new infections; however, although feasible, this method is costly and time-consuming. In addition, even if an HIV-negative cohort is followed over time, the exact date of infection is rarely observed (3). In this scenario, an interval can be determined between the latest negative and the earliest positive test dates. Taking into consideration the issue of cost and time, cohort analysis for estimating the HIV incidence rate in a general epidemic will not produce estimates that are representative of the whole population. For these reasons, sentinel surveillance systems have been set up to monitor the spread of the pandemic (4). In addition, population-based surveys in which HIV tests are performed are carried out every 5 years in Zimbabwe. The advantage of a population-based survey is that its data are more representative of the population than those of a cohort. On the other hand, the same data do not provide the exact date of infection but rather provide what is called "current status data."

Current status data occur when an individual is observed at one single point, and the only information obtained is whether the event of interest has occurred (5). An example of current status data is information collected during a demographic health survey, in which an individual is tested for HIV. An individual found to be HIV positive is recorded as left censored at the time the test was done; an individual found to be HIV negative is recorded as right censored. Current status data are sometimes referred to as case I interval-censored data, with case II interval-censored data referred to as the general case (6). Interval censoring takes into account the range, that is, an interval inside of which one can say the outcome of interest has occurred (7). Given that we want to determine the factors associated with the hazard of infection using data from these surveys, survival analysis can be implemented.
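To make the structure of current status data concrete, the standard likelihood for such data can be written as follows. The notation is introduced here for illustration and is not taken from the paper: $C_i$ is the age at which individual $i$ is tested, $\delta_i = 1$ if the test is positive (left censored) and $0$ otherwise (right censored), and $F$ is the distribution function of age at infection, with testing age assumed independent of infection age.

```latex
L(F) = \prod_{i=1}^{n} F(C_i)^{\delta_i} \, \bigl[ 1 - F(C_i) \bigr]^{1 - \delta_i}
```

Each individual contributes only the probability of being infected, or not yet infected, by the testing age, which is why the exact infection time never enters the likelihood.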

In the setting of standard survival analysis, modeling the hazard rate of HIV infection can be achieved by imputing the time-to-onset of disease as the time at diagnostic visit (3). Modeling current status data using the mentioned two approaches may overestimate the hazard rate. However, analyzing this type of data using interval censoring will be a better approach. Although there are documented advantages of interval censoring compared to the standard cause-specific survival model, the superiority of interval censoring remains unclear in estimating the effect of different exposures on the risk of HIV infection. With this argument in mind, the main objective of this study was to compare results from these different approaches for estimating the effect of different factors on the risk of HIV infection and determine the best fitting model.

## MATERIALS AND METHODS

## Study Design and Area

This study used data from Zimbabwe, a landlocked country bordered by Mozambique on the East, South Africa on the South, Botswana on the West, and Zambia on the North and Northwest. Zimbabwe is sub-divided into 10 provinces: Matabeleland South, Matabeleland North, Mashonaland East, Mashonaland Central, Mashonaland West, Midlands, Masvingo, Manicaland, Harare, and Bulawayo. Each province is subdivided into districts, and each district is made up of wards. The designs for the three 5-yearly surveys were cross-sectional.

## Zimbabwe Demographic Health Survey Data and Sampling

Demographic and Health Surveys (DHS) are nationally representative household surveys that have been implemented in approximately 70 countries since 1984 (8–10). They provide data for a wide range of monitoring and impact evaluation in the areas of population, health and nutrition. Data used for the analysis were obtained from the 2005–06, 2010–11, and 2015 ZDHS and were retrieved from the DHS programme website (https://dhsprogram.com) (11). Representative probability samples of 10,800, 10,828, and 11,196 households were selected for the 2005–06, 2010–11, and 2015 ZDHS, respectively. A two-stage cluster sampling technique was used to select the households. The first stage selected 400, 406, and 400 enumeration areas (EAs) for the 2005–06, 2010–11, and 2015 ZDHS, respectively. At the second stage, using a complete listing of households in the selected EAs, a fixed number of households were randomly chosen. This allowed EA-specific weights to be assigned in the design (8–10).

TABLE 1 | HIV prevalence in Zimbabwe and changes in HIV prevalence (weighted) from the Zimbabwe DHS surveys 2005–06, 2010–11, and 2015.

\**p-values below 0.05 are considered statistically significant.*

TABLE 2 | Akaike information criterion and Bayesian information criterion values.

## Measurement of the Outcome (HIV Status) and Explanatory Variables

With consent from the respondent or parent/guardian (for minors), blood samples were collected in all households for HIV testing in the laboratory for females aged 0–49 and males aged 0–54. Blood spots were collected on filter paper from a finger prick and transported to a laboratory for testing. An initial ELISA test was performed, and all positive and 5–10% of the negative samples were then retested with a second ELISA. If there were discordant results on the two ELISA tests, a new ELISA or a Western Blot was performed. The data used for this study were obtained from the DHS Data Archives (11) and only included individuals aged 15–49 years. The following explanatory variables were extracted from the dataset: sex, marital status, education level, religion, currently employed, place of residence, STI treatment in the past 12 months, and number of sexual partners (8–10).

## Ethical Considerations

The ZDHS HIV testing protocol for all three surveys was reviewed and approved by the ethical review boards of the Medical Research Council of Zimbabwe (MRCZ) in Harare, Zimbabwe; the ORC Macro Institutional Review Board in Calverton, Maryland, USA; and the Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia, USA (8–10). This work was granted ethical clearance by the University of the Witwatersrand's Human Research Ethics Committee (Medical) (No. M151154). The dataset used in this study was obtained through an application made to the Measure DHS program, which was approved on the 16th of May 2017. The DHS Program is authorized to distribute, at no cost, unrestricted survey data files for legitimate academic research. Registration was required for access to data.

FIGURE 3 | Goodness-of-fit tests for the Zimbabwe DHS 2005–06 (left column), 2010–11 (middle column), and 2015 (right column). Model 1 (top row) and Model 2 (bottom row).

## Statistical Methodology

### Application to Zimbabwe Demographic Health Survey

The socio-demographic datasets for men and women were appended to provide a single analysis dataset for all three surveys. The appended dataset was then merged with the HIV prevalence dataset using the unique combination of the individual line number, household line number, and cluster (EA) number. Individuals without an HIV test result, those who had never been sexually active, and those without an age at first intercourse were excluded from the analysis. In Model 1 (**Figure 1**), the age at HIV infection for individuals who were HIV positive when the survey was conducted was defined as the age at the survey date, and for individuals who were HIV negative it was right-censored at the date of the survey. In Model 2, which accounts for interval-censoring, the age at HIV infection for HIV-positive individuals was interval-censored between the age at first sexual intercourse and the age at the date of the survey, while for individuals who were HIV negative it was right-censored at the age at survey. All the models assumed a parametric Weibull distribution for the baseline hazard λ0(t), which allowed estimation of β, the vector of regression coefficients. For the model specification of all the models, refer to the **Supplementary Data**.
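The censoring rules described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analysis was run in STATA and with the R package icenReg). The function names, the shape/scale parameterization S(t) = exp(-(t/scale)^shape), and the omission of covariates are all assumptions made for illustration.

```python
import math


def censoring_interval(model, hiv_positive, age_first_sex, age_at_survey):
    """Return (lower, upper) bounds on the age at HIV infection.

    upper == math.inf marks a right-censored observation.
    Hypothetical helper: the names and structure are illustrative only.
    """
    if not hiv_positive:
        # HIV negative: no infection by the survey date (both models)
        return (age_at_survey, math.inf)
    if model == 1:
        # Model 1: age at survey is imputed as the exact age at infection
        return (age_at_survey, age_at_survey)
    if model == 2:
        # Model 2: infection occurred between sexual debut and the survey
        return (age_first_sex, age_at_survey)
    raise ValueError("model must be 1 or 2")


def weibull_log_survival(t, shape, scale):
    """log S(t) for a Weibull baseline, S(t) = exp(-(t / scale) ** shape)."""
    return -((t / scale) ** shape)


def log_likelihood_contribution(lower, upper, shape, scale):
    """Log-likelihood of one observation under a covariate-free Weibull model."""
    if math.isinf(upper):
        # Right-censored: the event happens after `lower`
        return weibull_log_survival(lower, shape, scale)
    if lower == upper:
        # Exact event time: log density = log hazard + log survival
        log_hazard = math.log(shape / scale) + (shape - 1) * math.log(lower / scale)
        return log_hazard + weibull_log_survival(lower, shape, scale)
    # Interval-censored: probability mass between the two bounds
    s_lower = math.exp(weibull_log_survival(lower, shape, scale))
    s_upper = math.exp(weibull_log_survival(upper, shape, scale))
    return math.log(s_lower - s_upper)
```

Summing these contributions over all respondents gives the log-likelihood that a fitting routine would maximize. A regression version would additionally scale the hazard by exp(xβ) for each individual.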

## Statistical Analysis

In this study, the outcome or response variable was HIV status, a binary variable. The study investigated the socio-cultural, socio-economic, behavioral, and demographic factors associated with HIV. Trends in HIV prevalence were assessed using the non-parametric trend test in STATA. A stepwise logistic regression approach was adopted in STATA SE version 15.1 statistical software (12) using the command **svy: swaic** (13) on the selected explanatory variables highlighted earlier. Factors that were significantly associated with HIV in the stepwise survey logistic regression were then considered for the parametric survival analysis. The most suitable baseline hazard function was investigated using the package **icenReg** (14) in R software. The data were modeled assuming age at survey imputation and interval-censoring (**Figure 1**). Model goodness-of-fit (GOF) was assessed using the Akaike Information Criterion (AIC) and two graphical methods, including the Cox-Snell residuals. All analyses were performed in STATA SE version 15.1 and R statistical software.
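For readers unfamiliar with the information criteria used for model comparison, the following Python sketch shows the standard definitions of AIC and BIC. The log-likelihood values in the usage note are hypothetical placeholders, not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def best_model(fits):
    """Return the name of the model with the smallest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))
```

For example, with made-up inputs `{"Model 1": (-5200.0, 10), "Model 2": (-4900.0, 10)}`, `best_model` selects "Model 2", because a higher log-likelihood with the same number of parameters gives a lower AIC.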


## RESULTS

The 2005–06 ZDHS database had 10,800 households, from which 42,698 records were retrieved. Of the 42,698 records, 16,082 were for individuals aged 15–49 years, with 7,175 (44.6%) males and 8,907 (55.4%) females. The 2010–11 ZDHS database had 10,828 households, from which 41,946 records were retrieved. Of the 41,946 records, 16,651 were for individuals aged 15–49 years, with 7,480 (44.9%) males and 9,171 (55.1%) females. The 2015 ZDHS database had 11,196 households, from which 43,706 records were retrieved. Of the 43,706 records, 18,351 were for individuals aged 15–49 years, with 8,396 (45.8%) males and 9,955 (54.2%) females. The above information is represented in **Figure 2**.

Nationally, the non-parametric trend test (p < 0.001) showed a significant decline in HIV prevalence from 22.4 to 19.6 to 17.7% for the 2005–06, 2010–11, and 2015 ZDHS, respectively (see **Table 1**). A similar declining trend was observed by gender, marital status, place of residence, education level, current employment status, STI in the year preceding the survey, and number of sexual partners.

The mean survival time for age at HIV infection for Model 1 was 41.1 years for females and 42.8 years for males in the 2005/06 ZDHS; 41.9 years for females and 43.4 years for males in the 2010/11 ZDHS; and 42 years for females and 44 years for males in the 2015 ZDHS. The mean survival time for age at HIV infection for Model 2 was 24.9 years for females and 28.3 years for males in the 2005/06 ZDHS; 25.7 years for females and 29.4 years for males in the 2010/11 ZDHS; and 26.5 years for females and 30.5 years for males in the 2015 ZDHS. Thus, Model 2 produced shorter estimated survival times before HIV infection than Model 1.

The parametric Weibull distribution was used to investigate the factors associated with HIV infection. The Weibull parametric function was suitable to model the baseline hazard distribution, as shown in **Figure 4**. The model goodness-of-fit test indicated that Model 2 was the best model based on the Akaike Information Criterion (AIC) presented in **Table 2**. A graphical goodness-of-fit test was also performed, in which the semi-parametric model was compared to the parametric model. According to the results in **Figure 3**, Model 2 fits the data better than Model 1. Based on **Figure 3**, Model 1 overestimates the survival rates between 15 and 35 years and underestimates the survival rates between 35 and 49 years. However, **Figure 4** with the Cox-Snell residuals shows that Model 1 is better than Model 2, as the estimated cumulative hazards are close to the reference line formed by the Cox-Snell residuals. Furthermore, dot charts depicting the importance of variables in the models were plotted (**Figure 5**). Place of residence was the least important variable for all the models, while marital status and sex were the most important variables for Model 1 and Model 2, respectively (**Figure 5**).

The risk of HIV infection was lower in males than females, as shown in **Table 3** for all the models. With reference to Model 2, the risk of HIV infection decreased slightly across the surveys, from 2005–06 (HR = 0.26, 95% CI: 0.23, 0.31) to 2010–11 (HR = 0.25, 95% CI: 0.21, 0.29) to 2015 (HR = 0.22, 95% CI: 0.19, 0.26). Individuals who were married or cohabiting had a lower risk of HIV infection compared to those who were single. These results were consistent across the models. However,


TABLE 3 | Estimated effects of covariates at baseline on the risk of HIV infection based on different survival models, Zimbabwe Demographic Health Survey (ZDHS) 2005/06, 2010/11, and 2015.


*HR, hazard ratio; CI, confidence interval.*

*Model 1 was the Weibull parametric survival model with survey age imputation as the time and Model 2 was a Weibull interval-censoring model.*

the risk of HIV infection for all the survey years was lower in individuals who were separated/divorced/widowed according to Model 1, whereas Model 2 indicated that the risk was higher for that group compared to those who were single (**Table 3**). According to Model 1, the risk of HIV infection was almost the same for those who had one or more than two sexual partners as for those who did not have any sexual partners in the 12 months prior to the survey. However, in Model 2, the risk was more than four times as high for those who had more than two sexual partners as for those who did not have any sexual partners in the 12 months prior to the survey. Of interest, the risk of HIV infection for those who had more than two sexual partners increased over time in Model 2, while Model 1 showed a decreasing trend (**Table 3**).

## DISCUSSION

Researchers are frequently interested in using standard survival models to determine failure times; however, interval censoring has become increasingly common. The purpose of this study was to identify risk factors for HIV infection using three different models and determine the best fitting model. To identify the best fitting model, we utilized the Akaike Information Criterion (AIC), where the model with the lowest AIC value was the best fitting model. We also used graphical methods to ascertain the best fitting model; however, graphical goodness-of-fit tests for interval-censored data are rare, and the available methods are still lacking in implementation (15–18). We checked the goodness-of-fit of all the models by overlaying the semi-parametric estimate of the survival function with the fitted parametric model and by plotting the Cox-Snell residuals against the cumulative hazard.

The model with interval-censored data resulted in better estimates of the risk of HIV infection compared to the standard survival model, i.e., Model 1, based on the AIC value and **Figure 3**. The superiority of Model 2 was due to its ability to closely reproduce some of the results in the literature from previous studies that also used population-based HIV surveys or specific cohorts in Zimbabwe. For example, a study in Zimbabwe conducted between 1999 and 2001 reported an increasing trend of HIV incidence among educated individuals, which was rather unexpected; however, this might have been due to a higher socio-economic status, a factor reported to be associated with HIV infection in the African region (19, 20). However, a systematic review that explored time trends in the association between educational attainment and risk of HIV infection in sub-Saharan Africa reported a shift in the HIV epidemic from the educated to the uneducated (21). A study that used the 2005/06 Zimbabwe Demographic Health Survey (ZDHS) to determine the relationship between HIV status and the demographic and socio-economic characteristics of adults in Zimbabwe, by constructing the risk profile of the average adult, concluded that there was a significant negative association between HIV infection and education (22). The authors further clarified that an extra year of schooling to an average of 8 years (i.e., secondary education and above) was associated with a 0.5 percentage point decrease in the probability of HIV infection for Zimbabwe (22). Another study, which used the 2010/11 ZDHS, also reported a lower risk of HIV infection in individuals with secondary level education and above (23).
Based on these findings from the literature and the trend observed, the model with interval-censoring was consistent with the reported results; however, the other two models reported a higher risk of HIV infection for individuals with secondary level education, which was contrary to the findings from the literature. According to an in-depth analysis of the 2005/06 ZDHS on the risk factors associated with HIV infection, individuals who had not had any sexually transmitted infection in the 12 months prior to the survey had a significantly lower risk of HIV infection (0.437 times) compared with their counterparts who had a sexually transmitted infection during the same period (24). This result was close to the results obtained for the hazard of HIV infection in the model with interval-censoring. Another example was an analysis of the 2005/06 ZDHS, which reported that the likelihood of being HIV infected increased with the number of sexual partners and decreased with the level of faithfulness to a spousal partner. In the same study, it was reported that the odds of being HIV infected were 3 to 4 times greater among those who had two or more sexual partners (25). These results were again consistent with results from the model with interval-censoring rather than the other two models for the 2005/06 ZDHS. On a general note, a 2013 study noted that having a large number of life partners increased HIV infection in a cohort from Manicaland (26).

Comparisons of the three models used in this study reveal a consistent match between the factors associated with HIV infection estimated from ZDHS data and the results obtained from previous studies using population-based HIV surveys or specific cohorts in Zimbabwe. For example, in all three models, and based on all three surveys, i.e., ZDHS 2005/06, 2010/11, and 2015, females were more at risk of HIV infection than males. Similarly, studies that determined the factors associated with HIV infection using the 2005/06 and 2010/11 ZDHS reported the same findings (22–24). However, although the surveys were the same, these studies did not use the same analytical methods to reach the same conclusions. Our results suggest that marriage was associated with a lower risk of infection based on all the models. This is further supported by a study that analyzed the 2005/06 and 2010/11 ZDHS data (23). The study reported that marital union was positively associated with the decline of HIV infection for both men and women (23). Another study, which determined the baseline predictors of HIV-1 acquisition among women, reported that being unmarried was the strongest risk factor for HIV-1 acquisition (27). Results from these studies are again consistent with the results of all the models in our study. Even though the models reported results similar to what had already been reported in the literature, the precision of the model with interval-censoring in explaining some of the covariates is what stood out the most. However, the Cox-Snell residuals clearly showed that Model 1 was the best fitting model. The difference between the Cox-Snell residuals graphical method and the model estimates may be due to the lack of adequate methods to test the goodness-of-fit of interval-censored data, as noted by other authors (15–18).

The main strength of this study was the quality of the data obtained from the surveys. These data were derived from population-based surveys, which provide more reliable and robust data. Another strength of this study was that we did not restrict our analysis to one method; instead, we had the opportunity to determine the best model to fit the hazard of infection by comparing two different scenarios. For instance, if the median survival time for HIV infection was 5 years given the type of data we had, and the intervals were about 3 to 6 months wide, then we would have no reason to complicate the analysis by considering interval censoring. On the other hand, if the intervals were about 1 year or longer, then accounting for uncertainty in the analysis was necessary, which we did when we implemented the interval-censoring approach. Another reason for concluding that interval-censoring gave better estimates was its consistency with the published results from the literature. Even though we consider the interval-censoring model the superior model with regard to our specific data, the method had its own set of limitations. These limitations included the wide range of intervals used, which could have underestimated or overestimated the effect of other factors on the risk of HIV infection. Inclusion of competing risk factors in the model would have greatly improved the modeling approach. Further studies can be done on imputation models, which impute an estimated time of HIV infection based on the data.

## AUTHOR CONTRIBUTIONS

RB and EM conceived of the presented idea and verified the analytical methods and models. RB developed the theory and performed the data analysis. EM supervised the findings of this work. All authors discussed the results and contributed to the final manuscript.

## FUNDING

This work was supported through the Developing Excellence in Leadership, Training and Science in Africa (DELTAS Africa) Initiative. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS)'s Alliance for Accelerating Excellence in Science in Africa (AESA) and supported by the New Partnership for Africa's Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust [grant 107754/Z/15/Z-DELTAS Africa Sub-Saharan Africa Consortium for Advanced Biostatistics (SSACAB) programme] and the United Kingdom (UK) government. The views expressed in this publication are those of the author(s) and not necessarily those of AAS, NEPAD Agency, Wellcome Trust, or the UK government.

## REFERENCES


## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2019.00262/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Birri Makota and Musenge. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Incidence Rates for Tuberculosis Among HIV Infected Patients in Northern Tanzania

Edson W. Mollel<sup>1,2</sup>\*, Werner Maokola<sup>1,3</sup>, Jim Todd<sup>1,4</sup>, Sia E. Msuya<sup>1,5,6</sup> and Michael J. Mahande<sup>1</sup>

*<sup>1</sup> Department of Epidemiology and Biostatistics, Institute of Public Health, Kilimanjaro Christian Medical University College, Moshi, Tanzania, <sup>2</sup> Northern Zone Blood Transfusion Center, Moshi, Tanzania, <sup>3</sup> National AIDS Control Program, Dar es Salaam, Tanzania, <sup>4</sup> Department of Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom, <sup>5</sup> Department of Community Health, Institute of Public Health, Kilimanjaro Christian Medical University College, Moshi, Tanzania, <sup>6</sup> Department of Community Medicine, KCMC Hospital, Moshi, Tanzania*

Background: HIV and tuberculosis (TB) are leading infectious diseases, with a high risk of co-infection. The risk of TB in people living with HIV (PLHIV) is high soon after sero-conversion and increases as the CD4 counts are depleted.

### Edited by:

*Vitali Sintchenko, University of Sydney, Australia*

## Reviewed by:

*Pedro Xavier-Elsas, Federal University of Rio de Janeiro, Brazil Carl-Magnus Svensson, Leibniz Institute for Natural Product Research and Infection Biology, Germany*

> \*Correspondence: *Edson W. Mollel e.mollel@kcri.ac.tz*

### Specialty section:

*This article was submitted to Infectious Diseases - Surveillance, Prevention and Treatment, a section of the journal Frontiers in Public Health*

> Received: *19 July 2019* Accepted: *07 October 2019* Published: *24 October 2019*

### Citation:

*Mollel EW, Maokola W, Todd J, Msuya SE and Mahande MJ (2019) Incidence Rates for Tuberculosis Among HIV Infected Patients in Northern Tanzania. Front. Public Health 7:306. doi: 10.3389/fpubh.2019.00306*

Methodology: We used routinely collected data from Care and Treatment Clinics (CTCs) in three regions in northern Tanzania. All PLHIV attending CTCs between January 2012 and December 2017 were included in the analysis. TB incidence was defined as the number of cases started on anti-TB medications divided by the person-years of follow-up. Poisson regression models with frailty were used to determine incidence rate ratios (IRR) and 95% confidence intervals (95% CI) for predictors of TB incidence among HIV positive patients.

Results: Among 78,748 PLHIV, 405 patients developed TB over 195,296 person-years of follow-up, giving an overall TB incidence rate of 2.08 per 1,000 person-years. There was an increased risk of TB incidence, 3.35 per 1,000 person-years, in hospitals compared to lower level health facilities. Compared to CD4 counts of <350 cells/µl, a high CD4 count was associated with lower TB incidence: 81% lower for a CD4 count of 350–500 cells/µl (IRR 0.19, 95% CI 0.04–0.80) and 85% lower for those with a CD4 count above 500 cells/µl (IRR 0.15, 95% CI 0.04–0.64). Independently, those taking ART had a 66% lower TB incidence (IRR 0.34, 95% CI 0.15–0.79) compared to those not taking ART. Poor nutritional status and CTC enrollment between 2008 and 2012 were associated with higher TB incidence, with IRR 9.27 (95% CI 2.15–39.95) and IRR 2.97 (95% CI 1.05–8.43), respectively.

Discussion: There has been a decline in TB incidence since 2012, with the exception of 2017, when TB incidence was higher, probably due to better diagnosis of TB following a national initiative. Among HIV positive patients attending CTCs, poor nutritional status, low CD4 counts, and not taking ART were associated with higher TB incidence, highlighting the need to get PLHIV on treatment early and the need for close monitoring of CD4 counts. Routinely collected data from available health services can be used to provide evidence of the epidemiological risk of TB.

Keywords: tuberculosis, HIV, Tanzania, incidence rates, sub-Saharan Africa

### Mollel et al. Tuberculosis Incidence in Northern Tanzania

## INTRODUCTION

Tuberculosis (TB) is a disease caused by *Mycobacterium tuberculosis*, which can be latent in humans for a long time without clinical symptoms. Active TB can present as Pulmonary Tuberculosis (PTB) or Extra-Pulmonary Tuberculosis (EPTB), with cardinal features of fever, productive cough, hemoptysis, and weight loss, though the presentation among HIV infected individuals is often atypical. Several factors have been associated with an increased risk of TB incidence, such as poverty, malnutrition, and overcrowding, but the risk of active TB is 16–27 times higher in people living with HIV (PLHIV) compared to those who are HIV negative (1). This is due to the impaired and lowered innate and passive immunity against TB among PLHIV (2), increasing the risk of getting a new TB infection (3) and of progression from latent TB to active TB (3). New TB infections, rather than reactivation, account for 88% of new TB cases among PLHIV (4).

The risk of TB for PLHIV is high soon after sero-conversion (5), and continues to increase with depletion of the CD4 count (6). The CD4 count is a diagnostic and/or prognostic marker that measures the number of CD4-expressing T-cells (also known as T helper cells). However, the risk of TB among PLHIV decreases after starting anti-retroviral therapy (ART) (53).

TB incidence has been falling since 2013, by 2% globally and by 4% in Africa. However, in 2017, 10 million (range 9–11.1 million) new cases of TB were reported worldwide, of which 25% occurred in Africa and 87% in 30 high TB burden countries (Tanzania included) (7). Of the cases, 90% were >15 years of age, 64% were males and 9% were HIV positive (7). The reported cases include only 51% of the estimated 920,000 new TB cases among PLHIV. Of the 1.5 million people enrolled at Care and Treatment Clinics (CTCs) in 2017, 8% were diagnosed with TB in the same year. Africa accounted for 72% of all HIV associated TB cases in 2017 (7). The End TB Strategy has set a reduction target of 80% in TB incidence (new cases per 100,000 population per year), compared to the level in 2015 (7). Tanzania is one of the High TB Burden Countries, and one of the High TB/HIV Burden Countries. In Tanzania, it is estimated that of 154,000 (range 73,000–266,000) new cases of TB in 2017, 31% (48,000 [31,000–69,000]) were also HIV positive (7). But with only 93% of TB patients in Tanzania having test results for HIV, of which 36% were co-infected with HIV, the true burden of TB among HIV positive people could be underestimated. TB has been the leading cause of death among HIV positive individuals (7), so a close monitoring of its occurrence in this subgroup of people is extremely important.

Several factors have been associated with TB incidence among HIV positive individuals including limited functional status, very low CD4 count (<50 cells/µl) (53), anemia, inappropriate vaccinations, cigarette smoking, households with a family size of 3 to 4 people, a lower social class, non-adherence to drugs and severe immunosuppression (8).

Several interventions have been implemented to try to reduce the incidence of TB in Tanzania's general population and among HIV positive individuals. As some patients may present with subclinical TB (9), WHO recommended active TB screening (intensified case finding) for all PLHIV, and infection control (10). In 2010, a gradual implementation of GeneXpert MTB/RIF for the early diagnosis of TB among all TB suspects started, and was scaled up in 2013 (11). This test was initially only for HIV positive patients or for those with recurring TB. In 2011, the country introduced Isoniazid Preventive Therapy (IPT) among PLHIV (12), which appears to be effective at reducing TB incidence (13). More recently, WHO has recommended a "test and treat" policy, which requires all individuals diagnosed as HIV positive to be put on ARVs immediately (14). Data from CTCs in Northern Tanzania provide an opportunity to track the incidence of TB among HIV patients through this spectrum of different intervention programs.

It is important to study TB incidence rates among PLHIV in Tanzania and compare them with the estimated global TB incidence rate calculated by WHO (7). This can guide clinicians and policy makers on interventions and practices to improve health outcomes, and help to develop preventive measures to reduce the magnitude of the problem. This study determined TB incidence and its predictors among HIV positive patients since enrolment at Care and Treatment Clinics (CTCs) over a follow-up period of 6 years (January 2012 to December 2017) in the northern part of Tanzania, thereby providing a broad picture of the effects of the several interventions that have been implemented over the years.

## METHODOLOGY

## Study Design and Settings

This was a retrospective cohort study that used data routinely collected from patients attending CTCs from 1st January 2012 to 31st December 2017 in the Arusha, Kilimanjaro, and Tanga regions. Both public and private CTCs were included, categorized as hospitals (at district level and above), health centers, and dispensaries. At every visit, all HIV patients attending a CTC have a regular check-up and a screening for opportunistic infections. The screening for TB follows WHO recommendations through the assessment of symptoms and signs. All those who screen positive for symptoms (a productive cough, persistent low grade fever, night sweats, or weight loss) have to undergo further testing. This can be done using GeneXpert MTB/RIF, or sputum microscopy at centers that have no GeneXpert MTB/RIF. GeneXpert MTB/RIF is a molecular diagnostic tool used for the diagnosis of M. tuberculosis (MTB) and the detection of resistance of these strains to rifampicin (RIF). Sputum from the patient is mixed with GeneXpert MTB/RIF buffer solution, shaken, and incubated for 5–10 min before being pipetted into a GeneXpert MTB/RIF cartridge for computer-assisted diagnosis. Those diagnosed with TB are given anti-TB medication.

## Study Population and Data Definitions

All patients (above 15 years of age) who were HIV positive and attended one of 489 CTCs in the regions of Arusha, Kilimanjaro, and Tanga during the period of 1st January 2012 to 31st December 2017 were included in this study. Those who already had a TB diagnosis and/or were on TB treatment at the time of their first visit to the CTC were excluded, whilst those treated for TB before the start of the study period were not excluded. The start time was taken to be 1st January 2012, or the date of first enrollment at the CTC if enrollment was after 1st January 2012. End time was defined as whichever came first among the following: the date they were last seen at a CTC, the date of death, the date of the first TB incidence, or 31st December 2017. A TB diagnosis was defined as being started on anti-TB medications after being screened for TB during a visit to a CTC, regardless of the method used to confirm the TB diagnosis. The following predictor variables were collected: age, sex, marital status, geographical location, baseline weight, baseline HIV WHO clinical stage, use of ARV, use of IPT, functional status, ARV adherence status, CTC enrollment year, type of ARV regimen, and baseline CD4 counts.
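The start and end rules above translate into a simple follow-up time calculation. The Python sketch below is illustrative only: the function and argument names are hypothetical, and dividing days by 365.25 to obtain years is an assumption, since the conversion used in the study is not stated.

```python
from datetime import date

STUDY_START = date(2012, 1, 1)
STUDY_END = date(2017, 12, 31)


def follow_up_years(enrolled, last_seen, died=None, first_tb=None):
    """Person-years contributed by one patient under the stated rules.

    Start: the later of study start and CTC enrollment.
    End: the earliest of last visit, death, first TB episode, and study end.
    """
    start = max(enrolled, STUDY_START)
    end_candidates = [d for d in (last_seen, died, first_tb, STUDY_END) if d is not None]
    end = min(end_candidates)
    # Guard against records whose end date precedes the start date
    days = max((end - start).days, 0)
    return days / 365.25  # assumed day-to-year conversion
```

For instance, a patient enrolled in 2010 whose first TB episode is dated 1st January 2013 contributes follow-up only from 1st January 2012 to that TB date, regardless of later clinic visits.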

## Data Analysis

Data were de-identified and analyzed using a statistical software package, STATA 15. After data cleaning, categorical data were summarized as frequencies and percentages. Continuous variables were summarized using their median and interquartile range (IQR) or by using their mean and standard deviation.

Incidence rates and 95% confidence intervals (95% CI) for each level of each independent variable were determined as the number of newly diagnosed TB cases divided by the person-years at risk. Health facilities were used as a cluster variable, and Analysis of Variance (ANOVA) was used to compare aggregate TB incidence rates by health facility level, health facility type (dispensaries, health centers, and hospitals), health facility ownership (private and public), and region (Arusha, Kilimanjaro, and Tanga). A Poisson regression model with frailty, to adjust for clustering at health facilities, was used to obtain incidence rate ratios (IRR) for TB, with 95% CI, for sociodemographic and clinical characteristics of the patients. Crude incidence rate ratios were then adjusted for other independent factors for TB.
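As a worked illustration of the incidence rate definition, the Python sketch below computes a rate per 1,000 person-years with an approximate 95% CI. The interval uses a log-scale normal approximation (standard error of the log rate equal to 1/sqrt(cases)), which is an assumption: the text does not state which interval method STATA was asked to use.

```python
import math


def incidence_rate_per_1000(cases, person_years, z=1.96):
    """Return (rate, lower, upper) per 1,000 person-years.

    Assumes Poisson-distributed counts and a normal approximation
    on the log scale for the confidence interval.
    """
    rate = cases / person_years * 1000
    factor = math.exp(z / math.sqrt(cases))  # error factor on the rate scale
    return rate, rate / factor, rate * factor


# Overall figures reported in this article: 405 cases over 195,296 person-years
rate, lower, upper = incidence_rate_per_1000(405, 195296)
print(f"{rate:.2f} ({lower:.2f}-{upper:.2f}) per 1,000 person-years")
```

With the overall counts reported in this article, this approximation gives values very close to the published estimate and interval; small differences in the last decimal place can arise from rounding of the person-years.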

## Ethical Clearance

Ethical clearance was obtained from Kilimanjaro Christian Medical University College Research and Ethical Committee (Ethical clearance certificate number 2286). Permission from the Ministry of Health—Tanzania and NACP (National AIDS Control Program) authority to conduct the study was obtained. All patients' privacy and confidentiality were strictly observed throughout the study.

## RESULTS

The study included 78,748 HIV-positive patients who were followed up for 195,296 person-years, with 405 patients recorded as having had a new episode of TB during the follow up, giving a TB incidence of 2.08 (95% CI 1.88–2.29) per 1,000 person-years (**Table 1**). By age, the highest TB incidence rate, 2.45 per 1,000 person-years (95% CI 2.01–2.99), was in patients aged 35–44 years. The TB incidence in males (Incidence = 3.70 per 1,000 person-years, 95% CI 3.21–4.27) was higher than in females (Incidence = 1.50 per 1,000 person-years, 95% CI 1.31–1.72). Those who were divorced and those from Arusha region had higher TB incidences than others, at 2.58 (95% CI 1.95–3.41) and 2.30 (95% CI 1.74–3.12) per 1,000 person-years, respectively (**Table 1**).

TABLE 1 | Incidence rates for TB by socio-demographic characteristics at enrolment into HIV services in 78,748 patients in three regions of Tanzania.

HIV patients with markers of lower immunity or advanced disease (HIV stage 3 and 4, CD4 < 350 cells/µl, lower weight, and poorer nutritional status) had higher TB incidence than others (**Table 2**). Patients with a severely poor nutritional status had a TB incidence rate of 47.74 per 1,000 person-years (95% CI 26.44–86.21), while those with a moderately poor nutritional status had a TB incidence rate of 9.53 per 1,000 person-years (95% CI 7.12–12.77), and those with an adequate nutritional status had a TB incidence rate of 1.73 per 1,000 person-years (95% CI 1.54–1.95) (**Table 2**). Higher TB incidence rates were found in those who were bedridden (Incidence = 32.20 per 1,000 person-years, 95% CI 24.89–41.65) and those who were ambulatory (Incidence = 31.06 per 1,000 person-years, 95% CI 18.99–54.14), compared to those who were working (Incidence = 1.73 per 1,000 person-years, 95% CI 1.55–1.93) (**Table 1**).

TABLE 2 | Incidence rates for TB by clinical characteristics at enrolment into HIV services in 78,748 patients in three regions of Tanzania.

Analysis of Variance (ANOVA) was used to compare TB incidence rates across the following cluster variables: facility type (dispensaries, health centers, and hospitals), facility ownership (private and public), and region (Arusha, Kilimanjaro, and Tanga). There was a significantly increased risk of TB in hospitals (3.35 per 1,000 person-years) compared to health centers (1.28 per 1,000 person-years) and dispensaries (1.36 per 1,000 person-years), p-value 0.0306. There were no statistically significant differences in TB incidence for the cluster variables of facility ownership and region (**Table 3**).

TABLE 3 | Comparison of cluster-level TB rates per 1,000 person-years.

After performing a multilevel analysis and controlling for health facilities as clusters, several factors were significantly related to an increased TB incidence among HIV-positive patients. One was the year of enrollment at a CTC: those enrolled between 2008 and 2012 had an IRR of 1.51 (95% CI 1.12–2.03), while those enrolled between 2013 and 2017 had an IRR of 4.05 (95% CI 3.04–5.39). Moderately and severely poor nutritional status were also significantly associated with TB incidence among HIV patients, with IRRs of 6.94 (95% CI 4.92–9.80) and 28.05 (95% CI 15.09–52.16), respectively. The use of second line ARVs had an IRR of 1.75 (95% CI 1.04–2.97). The study found that several factors were protective against developing new TB among HIV patients, including being female and having a CD4 count between 350 and 500 cells/µl, both of which were protective by 58%, with IRRs of 0.42 (95% CI 0.34–0.50) and 0.42 (95% CI 0.21–0.87), respectively. Using ARVs was protective by 57%, IRR 0.43 (95% CI 0.33–0.55). A CD4 count above 500 cells/µl was protective by 84%, IRR 0.16 (95% CI 0.07–0.42), and having a working functional status was even more protective, by 95%, IRR 0.05 (95% CI 0.04–0.07) (**Table 4**).
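The "protective by X%" figures quoted alongside the IRRs follow from the relation: percentage reduction = (1 − IRR) × 100. A minimal sketch of that arithmetic, using the IRRs reported above (the function name is hypothetical and chosen only for illustration):

```python
def percent_reduction(irr):
    """Percentage reduction in incidence implied by an incidence rate ratio."""
    if irr <= 0:
        raise ValueError("IRR must be positive")
    return round((1 - irr) * 100)


# IRRs as reported in the text above
reported_irrs = {
    "female sex": 0.42,
    "on ARVs": 0.43,
    "CD4 count above 500 cells/ul": 0.16,
    "working functional status": 0.05,
}
for factor, irr in reported_irrs.items():
    print(f"{factor}: IRR {irr} -> {percent_reduction(irr)}% lower incidence")
```

An IRR above 1 gives a negative value here, which corresponds to an increase rather than a reduction in incidence.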

After adjusting for both health facility type clusters and other important factors, only two factors remained significantly and positively associated with TB incidence: severe malnutrition, IRR 9.27 (95% CI 2.15–39.95), and being enrolled at a CTC between 2008 and 2012, IRR 2.97 (95% CI 1.05–8.43), compared to being enrolled between 2003 and 2007. The following factors remained significantly protective against TB incidence among HIV patients in the multilevel analysis: CD4 counts above 350 cells/µl, with IRRs of 0.19 (95% CI 0.04–0.80) and 0.15 (95% CI 0.04–0.64) for CD4 counts between 350 and 500 cells/µl and above 500 cells/µl, respectively; use of ART, which was protective by 66%, IRR 0.34 (95% CI 0.15–0.79); and a working functional status, which was protective by 85%, IRR 0.15 (95% CI 0.05–0.47) (**Table 4**). In this multilevel model, controlling for health facility types as cluster variables, the intercept variability across facilities was 0.78 (95% CI 0.23–2.73), with a standard error (SE) of 0.5.


### TABLE 4 | Poisson regression with multilevel analysis of the determinants of TB incidence in Northern Tanzania.

## DISCUSSION

In this population of PLHIV attending CTCs in Northern Tanzania from 2012 to 2017, the incidence rate of TB was 2.08 per 1,000 person-years. This is higher than the TB incidence rate of 1.29 per 1,000 in the Tanzanian general population for 2016, as reported by the Tanzania National TB and Leprosy Program (NTLP), but lower than the WHO TB incidence estimate of 2.7 per 1,000 population for 2017 (7). It is within the estimated range of 1.5–4 new cases per 1,000 in the general population of most of the 30 high TB burden countries. Our estimated incidence rate is lower than that of other studies done in Tanzania, which showed an incidence of TB among HIV patients in the range of 8–17 per 1,000 person-years (15), although those estimates were for 2008 to 2010, for patients who were not on ART, and before the introduction of IPT. Another study done in Dar es Salaam, a major city in Tanzania, found the incidence rate to be 27 per 1,000 person-years (13). The city is highly crowded, has a high TB diagnostic capacity and, according to the NTLP, has the highest TB case notification rate in the country. In Nigeria, TB incidence was 5.7 per 1,000 person-years among HIV patients on ARVs for the period 2004 to 2012 (16). Higher incidences were found in South Africa (17) and Ethiopia (18), with TB incidences of 44 and 86 per 1,000 person-years, respectively. According to WHO, incidence rates need to fall by 4–5% per year up to 2020 in order to reach the End TB Strategy milestone.

A study in Nigeria (16) and another in South Africa (17) found males to have higher TB incidence than females. Age is also associated with TB incidence, with patients aged 25–34 years having the highest incidence and those aged 15–24 years having the lowest incidence trend (**Figure 1**). This agrees with other studies performed in Ethiopia (18) and Nigeria (16). Age and sex differences in TB incidence could be due to cultural and economic factors, whereby men and those in the most economically productive age group are less likely to have time to attend clinics and to receive appropriate care, including Isoniazid Preventive Therapy, and hence are more likely to be diagnosed with TB (**Figure 2**). Other studies have shown that males have a higher TB prevalence than females (especially in low- and middle-income countries) because men are disadvantaged in seeking and/or accessing TB care in many settings (19). The same pattern has also been observed in Europe (20).

The three regions of Arusha, Kilimanjaro, and Tanga had essentially similar TB incidence rates, and there were no significant differences between public and private facilities. All health facilities in this dataset had at least one TB case, and PLHIV seen at hospitals for care and treatment had higher TB occurrence than those attending lower level facilities. This may be because many lower level facilities have poor or inadequate availability of diagnostic equipment and less skilled health care workers, which reduces their capacity for diagnosing TB, as co-infected patients tend to present with atypical TB manifestations that are difficult to diagnose. However, it could also be due to referral mechanisms, whereby most PLHIV with more advanced HIV, and probably with active TB, tend to be referred to hospitals for advanced care and treatment. In addition, most of these lower level facilities do not hospitalize their patients. This observation was more pronounced in the later years of the study, after the increased roll out of GeneXpert MTB/RIF machines to many higher level facilities (**Figure 3**).

Severe malnutrition was found to be associated with an increased risk of TB among HIV patients in our study. Similar findings have been observed in other settings, especially among children (21), where malnutrition accounts for 26% of incident TB (22). It is also positively associated with TB progression (23), poor treatment outcomes (21) and delayed recovery (24). This is because malnutrition impairs cell-mediated and humoral immunity (21). Though most of the studies have focused on children, malnutrition is common among adult TB patients (25), as well as MDR-TB patients (26) and patients with other infections (27). Strategies to manage malnutrition should incorporate routine TB and HIV screening (28), and modifications to TB treatment (29), though some studies have found that these additional strategies are not yet effective (22) and that there is no significant association between nutritional status and TB severity (30). Although this remains controversial, these strategies, if implemented effectively, might prevent TB incidence, increase the probability of being cured (21, 31), and decrease the risk of TB mortality, especially among children (32).

Incident TB increased almost three times for those enrolled in CTCs between 2008 and 2017 compared to those enrolled between 2003 and 2007. The reason could be that most of those enrolled between 2003 and 2007 had already been on ARVs for some time when follow-up started in 2012. In general, TB incidence among HIV patients has been declining since 2012, with the exception of 2017. This decline is congruent with recent global and African data on the TB epidemiological burden, which show a consistent decline in TB over the last decade, including in the most affected areas of Sub-Saharan Africa (7). Success can be attributed to an improved health system; increased use of Isoniazid preventive therapy; increased HIV prevention and awareness (33); and early TB diagnosis (11) and treatment; all of which reduce the risk of TB transmission, whilst also strengthening the collaboration between HIV and TB control activities when combined with other diseases. The higher incidence of 3.54 per 1,000 person-years in 2017 (**Table 1**) could be attributed to the increased roll out of GeneXpert MTB/RIF and the probability that most hospitals had these molecular tests installed.

Our findings have shown that a CD4 count above 350 cells/µl, or being on ARVs, was protective against the development of incident TB. HIV increases TB risk through increasing the risk of acquiring TB (3), or through reactivation of latent TB (2). Immunity against other infections, including TB, is compromised by HIV infection (34). The risk doubles after sero-conversion (5) and remains high within the first 3 years after enrollment at CTCs (18). TB risk is five times higher in the African region (1), especially for those with a history of previous TB disease (15) and with a decreasing CD4 count (6), similar to our findings. As found in other studies (35), the risk of incident TB persists even among HIV patients who are on IPT. The risk decreases with the use of ARVs (33), as our study has found, but can still remain 5 times higher than in HIV-uninfected persons (36) despite an increase in CD4 count. ART benefits are more pronounced in preventing TB in patients with a lower CD4 count (37), because ARVs in general restore immunity and so protect against developing TB. This is controversial, however, as other studies have found that ART use does not reduce TB incidence (38). Even so, this ART protection seems to be lost when one is on second line ARVs (39), as was also found in this study. Others have suggested controlling for hemoglobin levels, as low hemoglobin can be used as a predictor of TB among patients on ARVs (40).

Working functional status was significantly protective against TB incidence, as found in other studies (53). Other studies found the risk factors for TB among HIV patients to be low hemoglobin (40), increased HIV viral load (41), genetics (42), WHO HIV stage 3, not taking Isoniazid (43), smoking, diabetes, alcohol use, crowded living, poverty (44), variation in adherence to ART (45), as well as social inequality in access to sanitation and health expenditure per capita. The following subpopulations have also been shown to have an increased risk of TB incidence: prisoners (46), migrants (47), health care workers (48), miners (4), and contacts of index TB cases (49). Other protective factors against TB incidence reported elsewhere include the use of IPT (13, 50), cash transfers (51), as well as early TB detection and treatment initiation (52).

This study's strengths include the use of routinely collected data from a large number of patients and the inclusion of all health facilities providing electronic data for HIV care and treatment across the three regions. It has also assessed and adjusted for the differences between health facilities that might affect TB incidence. The study was limited in that only HIV-positive patients who had a definitive diagnosis of TB, or who started TB medications, during follow-up were counted as incident TB cases. The analysis has included all new cases of TB, including those with no microbiological TB confirmation. However, there could be an underestimation of new TB cases among HIV patients if there were no diagnostic tools available for presumptive TB cases (especially in lower level health facilities), or if there was a delay or gap between the diagnosis of TB and starting anti-TB medications. We recommend future studies on the mechanisms of nutritional effects on TB incidence and progression; the association between second line ARVs and TB incidence; and why the risk of TB remains higher even after improving the CD4 count of HIV patients on ART. We also recommend further analysis of the TB diagnosis cascade to see how effective the system is for TB diagnosis.

## CONCLUSION

Despite the prolonged decline of the TB burden in most African countries over the last decade, TB is still a public health problem, especially among HIV patients. Poor nutritional status and being enrolled at the CTCs after 2007 were significantly associated with TB incidence among HIV patients attending CTCs since 2003 in Tanzania. There is therefore a need for effective collaborative TB control strategies encompassing other diseases, including HIV, and for continuing improvement of the health system, including the CTCs. Having a working functional status, a high CD4 count (above 350 cells/µl), and using ART were protective against TB incidence among HIV-positive patients, though the use of second line ARVs was found to be a risk factor for developing TB. All patients should therefore be put on ARVs as soon as possible, and all patients with a CD4 count <350 cells/µl should be closely monitored.

## DATA AVAILABILITY STATEMENT

The study's data can be accessed from EM after permission and approval from the NACP and the Government of Tanzania.

## AUTHOR CONTRIBUTIONS

EM, JT, MM, and SM designed the study and wrote the manuscript. EM, JT, and WM retrieved the data. EM and JT analyzed the data. All authors approved the final version of the manuscript.

## FUNDING

EM received funding through the SEARCH Project for his PhD studies under which this analysis was conducted. The SEARCH project was funded by the Bill and Melinda Gates Foundation (OPP1084472).

## ACKNOWLEDGMENTS

Thanks to all staff of Kilimanjaro Christian Medical University College and the National AIDS Control Program for their inputs and support in completing this work. EM is a PhD student, and this work is part of the PhD project that was funded by the SEARCH Project (Sustainable Evaluation through Analysis of Routinely Collected HIV data), a collaboration between the London School of Hygiene and Tropical Medicine and the Ministry of Health in Tanzania, funded by the Bill and Melinda Gates Foundation under the grant entitled "Using routinely collected public facility data for program improvement in Tanzania, Malawi and Zambia" (OPP1084472).

## REFERENCES


India. The Foundation for AIDS Research, as a part of the International Epidemiologic Databases to Evaluate AIDS. Boston, MA (2016).


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Mollel, Maokola, Todd, Msuya and Mahande. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Arts and Tools for Using Routine Health Data to Establish HIV High Burden Areas: The Pilot Case of KwaZulu-Natal South Africa

Njeri Wabiri<sup>1</sup>\*, Inbarani Naidoo<sup>1</sup>, Esther Mungai<sup>2</sup>, Candice Samuel<sup>3</sup> and Tryphinah Ngwenya<sup>2</sup>

*<sup>1</sup> Social Aspects of Public Health Research, Human Sciences Research Council, Pretoria, South Africa, <sup>2</sup> Kwa-Zulu Natal (KZN) Provincial Treasury Global Fund Supported Programme, Pietermaritzburg, South Africa, <sup>3</sup> KZN Provincial Department of Health-GIS Directorate, Pietermaritzburg, South Africa*

### Edited by:

*Jim Todd, University of London, United Kingdom*

### Reviewed by:

*David Gathara, KEMRI Wellcome Trust Research Programme, Kenya Joseph Ouma, University of the Witwatersrand, South Africa*

> \*Correspondence: *Njeri Wabiri nwabiri@hsrc.ac.za*

### Specialty section:

*This article was submitted to Digital Public Health, a section of the journal Frontiers in Public Health*

Received: *15 July 2019* Accepted: *25 October 2019* Published: *12 November 2019*

### Citation:

*Wabiri N, Naidoo I, Mungai E, Samuel C and Ngwenya T (2019) The Arts and Tools for Using Routine Health Data to Establish HIV High Burden Areas: The Pilot Case of KwaZulu-Natal South Africa. Front. Public Health 7:335. doi: 10.3389/fpubh.2019.00335*

Background: To optimally allocate limited health resources in responding to the HIV epidemic, South Africa has undertaken to generate local epidemiological profiles identifying high disease burden areas. Central to achieving this is the need for readily available quality health data linked to both large and small geographic areas. South Africa has relied on national population-based surveys, including the Household HIV Survey and the National Antenatal Sentinel HIV and Syphilis Prevalence Survey (ANC), for such data to inform policy decisions. However, these surveys are conducted approximately every 2 and 3 years, creating a gap in the data and evidence required for policy. At subnational levels, timely decisions are required with frequent course corrections in the interim. Routinely collected HIV testing data at public health facilities have the potential to provide this much needed information, as a proxy measure of HIV prevalence in the population, when survey data are not available. The South African District Health Information System (DHIS) contains aggregated routine health data from public health facilities, which are used in this article.

Methods: Using spatial interpolation methods, we combine three "types" of data: (1) 2015 gridded high-resolution population data; (2) age-structure data as defined in the South Africa mid-year population estimates, 2015; and (3) georeferenced health facility HIV-testing data from DHIS for individuals (15–49 years old) who tested in health care facilities in the district in 2015. We use these to delineate high HIV disease burden areas using a density surface of HIV positivity and/or the number of people living with HIV (PLHIV). For validation, we extracted interpolated values at the facility locations and compared them with the observed values by calculating the residuals. Lower residuals mean the Inverse Distance Weighted (IDW) interpolator provided reliable predictions at unknown locations. Results were adjusted to published provincial HIV estimates and aggregated to municipalities. A map of uncertainty measures at municipality level is provided. Data on major cities and road networks were included only for orientation and better visualization of the high burden areas.


Results: The results show the HIV burden at local municipality level, with high disease burden in municipalities in eThekwini, iLembe and uMgungundlovu, and around major cities and national routes.

Conclusion: The methods provide accurate estimates of the local HIV burden at the municipality level. Areas with high population density have high numbers of PLHIV. The analysis puts into the hands of decision makers a tool that they can use to generate evidence for HIV programming. The method allows decision makers to routinely update and use facility level data in understanding the local epidemic.

Keywords: routine health facility data, Africa, HIV, "Hotspots", Big Data, spatial interpolation

## INTRODUCTION

The HIV epidemic in South Africa is complex, with diverse factors driving the epidemic regardless of spatial boundaries. The epidemic is also heterogeneously distributed across geographic areas (1). A better understanding of subnational variations in HIV epidemiology is urgently needed and is key to programme planning. It is therefore important to conduct HIV epidemic appraisals across different geographic areas to help better characterize the drivers of the epidemic and ensure that HIV intervention programmes match the local epidemic context, with resources allocated to interventions that have the greatest impact locally. Central to achieving this is the need for readily available quality health data linked to both large and small geographic areas. Over the years, South Africa has relied on national population-based surveys such as the Household HIV Survey (2) and the National Antenatal Sentinel HIV and Syphilis Prevalence Survey (ANC) (3), amongst others, to provide data for informing policy decisions. However, these surveys are conducted approximately every 2 and 3 years, creating a gap in data availability and evidence required for decisions at provincial, district, and municipality levels. At these administrative levels, timely decisions are required with frequent course corrections in the interim. Actions emanating from policy directives require information from various sources to be streamlined, sometimes at a rapid pace, to be used effectively (4). Hence, there is a need to use whatever health information is at hand in the best possible way to inform such decisions. The South African District Health Information System (DHIS) contains aggregated routine health data from public health facilities and can be used to close this gap. The DHIS was developed to collect aggregated routine data from all public health facilities and is intended to support decentralized decision making and health service management (5).
It is used in several other low- and middle-income countries (LMIC) (6). The DHIS data can be integrated with high resolution population data to, for example, generate estimates of HIV disease burden by estimating the number of PLHIV at any geographic level. The availability of such estimates for small geographic areas is a powerful tool for decision makers who need to prioritize the allocation of limited health resources (4), and the estimates can also be used as case studies for ongoing epidemic monitoring. The study is a collaboration between the Human Sciences Research Council spatial analysts, the KwaZulu-Natal (KZN) Provincial Treasury Global Fund Supported Programme, and the KZN Department of Health, including decision makers at district and municipality levels.

The purpose of this study is to describe the methodology used to produce estimates of HIV disease burden, from a 100 m resolution up to municipality and district level, using routine facility HIV testing data to support local decision making. We outline a step-by-step approach that decision makers can follow to produce estimates for guidance in decision making.

## MATERIALS AND METHODS

## Data

The study combines three types of data.

## Age-Structure Data

We obtained the age-structure data as defined in the South Africa Census 2011 and the mid-year population estimates, 2015, from Statistics South Africa (Stats SA) (7).

## DHIS HIV Data for 15–49 Year Old Clients

The DHIS HIV data describe quality-checked totals for confirmed HIV tests at public health facilities. The data used in the study cover a total of 887 public health facilities, including mobile unit services, each with recorded geographic coordinates (longitude/latitude), from 51 local municipalities including the metros, obtained from the KZN Provincial Department of Health DHIS for the reporting period 2015/16. Twenty-three facilities did not have positivity rate data and were excluded. The data for the 2015/16 reporting period were the most complete for undertaking the spatial analysis, because at the time the DHIS team was updating the DHIS data collection forms for the following reporting years. The mobile facilities (183), though in some cases located in close vicinity to the main hospitals or clinics, had their own unique 1st test cases and hence were treated as unique data points in the analysis. In terms of age, the DHIS data record three age categories (0–14, 15–49, and 50 years and above), and the data are not disaggregated by sex, which is one of the key limitations of the data. For this study, the details of the included data indicator, level of aggregation, data sources, and facility inclusion criteria are provided as **Supplementary Material**. The HIV positivity indicator represents the proportion of clients aged 15–49 years on whom an HIV test was done who tested positive for the first time at public health facilities, as aggregated annually in the DHIS. The following calculation was applied to generate HIV positivity at facility level:

$$\text{HIV positivity at facility (15-49)} = \frac{\text{HIV 1st test HIV positive (15-49)}}{\text{Total HIV 1st test (15-49)}} \times 100$$

The data does not include self-reported positives which is noted as a limitation in using HIV positivity rate as a proxy for prevalence.
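As an illustration of the facility-level calculation above, the following sketch computes the positivity percentage from first-test counts. The function name and the facility counts are hypothetical, and returning `None` for facilities with no recorded first tests is an assumption made here to mirror the exclusion of facilities without positivity data.

```python
def hiv_positivity(first_test_positive, first_tests_total):
    """HIV positivity (%) among clients aged 15-49 at one facility.

    Returns None when no first tests were recorded (assumed handling,
    mirroring the exclusion of facilities without positivity data).
    """
    if first_tests_total <= 0:
        return None
    if not 0 <= first_test_positive <= first_tests_total:
        raise ValueError("positives must lie between 0 and the total tests")
    return first_test_positive / first_tests_total * 100


# Hypothetical facility counts, for illustration only
facilities = {
    "Clinic A": (45, 300),
    "Mobile unit B": (12, 60),
    "Clinic C": (0, 0),
}
for name, (positive, total) in facilities.items():
    print(name, hiv_positivity(positive, total))
```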

## Gridded-High Resolution Population Data

For the population data we used the WorldPop gridded high resolution (100 m grid cell) population data from the WorldPop Data Repository, https://www.worldpop.org (8). The gridded population data is based on well-tested models incorporating population density, land cover, and urban/rural disaggregation. In addition, the data are validated and calibrated using national census data. The WorldPop data has also been used widely in modeling disease burdens across the world, including generation of the HIVE-Map model supported by UNAIDS (9). There is generally good agreement in results derived from different methods, including the HIVE-Map, disaggregation of Spectrum projections, and small-area estimation. This gives us confidence in using the WorldPop data in this study to generate a raster surface of the population aged 15–49 years at 100 m resolution.

## METHODS

All analyses were performed using the ArcGIS 10.0 Ver 2.18.24 (10) software, which is also available in the KZN Department of Health data source platform, making it easier for decision makers to use the tool with updated routine data. The analysis can also be done with the open source geographical information system QGIS (11). To establish the high HIV burden areas, the study provides a step-by-step approach that can be easily replicated by local decision makers with updated data.

## Step 1

Combine the 2015 South Africa high resolution (100 m) gridded population data and the 2015 South Africa mid-year population age-structure estimates to generate a raster surface of the number of people aged 15–49 years in KZN at 100 m resolution (population15–49).
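Step 1 can be sketched as a cell-by-cell multiplication of the gridded population by the share of the population aged 15–49 years. This is a simplified illustration, not the authors' ArcGIS workflow: it assumes a single province-wide share applied uniformly to every cell, represents the raster as a plain list of rows, and uses made-up values for both the grid and the share.

```python
def population_15_49(pop_grid, share_15_49):
    """Scale each grid cell's total population by the share aged 15-49.

    Assumption: one share is applied uniformly to all cells.
    """
    if not 0 <= share_15_49 <= 1:
        raise ValueError("share_15_49 must be a proportion between 0 and 1")
    return [[cell * share_15_49 for cell in row] for row in pop_grid]


# Hypothetical 2 x 3 population raster (people per 100 m cell)
pop_grid = [
    [10.0, 0.0, 4.0],
    [6.0, 20.0, 8.0],
]
# Hypothetical share of the population aged 15-49
print(population_15_49(pop_grid, share_15_49=0.5))
```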

## Step 2

Use the health facility HIV positivity rates among 15–49 year olds to conduct a hot spot analysis as a first step, to identify statistically significant hotspots (i.e., locations of health facilities with a significantly high number of HIV positive cases) and, vice versa, cold spots (12). Then apply the inverse distance weighted [IDW, (12)] interpolation method to the health facility positivity rate data for 15–49 year olds to generate an unadjusted HIV positivity raster surface at 100 m resolution (positivity15to49\_unadjusted). IDW is a deterministic, exact interpolator, which means that in theory it reproduces the observed value at each sample point. The IDW approach generates a continuous smooth HIV positivity surface from point-based data by calculating the value at an unmeasured point as a distance-weighted average of the data points. IDW uses only the values of the known sample points to estimate unknown points of interest. For the study we used 100 data points and the squared distance, which limits the amount of computation needed to produce an estimate. The 100 data points around each unknown location were selected using the adaptive approach, which is more relevant in health because it more closely matches the spatial distribution of the population and thus reduces the smoothing of information. Also, an adaptive bandwidth with an equal number of points makes it possible to achieve a smoothing effect that adapts to the highly irregular spatial distribution of the facility locations, selecting the facility locations according to the observed population distribution. The surface generated is more accurate for densely populated areas (as more observations are available) and strongly smoothed in sparsely surveyed areas.
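The interpolation in Step 2 can be illustrated with a minimal IDW sketch that uses the k nearest facilities and inverse squared-distance weights. It is not the ArcGIS implementation: the function name is hypothetical, coordinates are assumed to be planar (projected) so that Euclidean distance is meaningful, and the sample facilities are invented for illustration.

```python
import math


def idw(x, y, samples, k=100, power=2):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (sx, sy, value) tuples in planar coordinates (assumed).
    k: number of nearest sample points to use (100 in the study).
    power: distance exponent (2, i.e. squared distance, in the study).
    """
    if not samples:
        raise ValueError("at least one sample point is required")
    nearest = sorted(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))[:k]
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in nearest:
        distance = math.hypot(sx - x, sy - y)
        if distance == 0:
            return value  # exact interpolator: reproduce the observed value
        weight = 1 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical facilities: (x, y, HIV positivity %)
facilities = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 14.0)]
print(idw(1.0, 0.0, facilities))

# Validation in the spirit of the text: residuals at the facility locations
residuals = [value - idw(sx, sy, facilities) for sx, sy, value in facilities]
print(residuals)
```

Because this sketch returns the observed value whenever the query point coincides with a facility, the residuals at facility locations are zero here; in a gridded implementation they depend on the cell size and on how coincident points are handled.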

## Step 3

Use ArcGIS Spatial Analyst map algebra to combine the gridded surface map for the 15–49 years population group (population15-49) and the HIV positivity surface map (positivity15to49\_unadjusted) to get the unweighted surface map of PLHIV (plhiv15-49\_unadjusted), which is then proportionately rescaled to South Africa's published provincial and national HIV estimates to generate the adjusted surface of PLHIV aged 15–49 years at 100 m resolution. For local decision making, results are then aggregated to the required administrative units (i.e., districts, municipalities) by adding the pixels of the surfaces to find the total number of PLHIV per unit. The total population per unit can be calculated similarly from the population surface map. Further, dividing PLHIV per unit by total population per unit provides the adjusted HIV positivity per unit for the 15–49 years old population group. We also estimated an error surface per administrative unit to show the quality of the estimate. To assess the uncertainty of the estimates at administrative unit level, we adopted the Larmarange and Bendaud approach (13): for each administrative unit, we compared the number of observations (obs) in the unit with the number of points (N) used in the spatial estimation of the positivity surface, and defined estimates as "uncertain" if 0 < obs < N/2 (estimates are mostly based on observations from neighboring units), "moderately good" if N/2 < obs < N (estimates are partially based on observations from the same unit), and "good" if obs is at least N (estimates are based on observations from the same unit).
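The map algebra, rescaling, aggregation, and uncertainty rules in Step 3 can be sketched as follows. This is an illustration under stated assumptions rather than the ArcGIS procedure: rasters are plain lists of rows, all function names and input values are hypothetical, and the treatment of the boundary cases that the text leaves open (obs equal to 0 or exactly N/2) is a choice made here.

```python
def plhiv_surface(population, positivity_pct):
    """Cell-by-cell PLHIV estimate: population x positivity (%) / 100."""
    return [[p * r / 100 for p, r in zip(prow, rrow)]
            for prow, rrow in zip(population, positivity_pct)]


def rescale(surface, published_total):
    """Proportionately rescale a surface so that it sums to a published total."""
    total = sum(sum(row) for row in surface)
    if total == 0:
        raise ValueError("cannot rescale an all-zero surface")
    factor = published_total / total
    return [[cell * factor for cell in row] for row in surface]


def aggregate(surface, unit_ids):
    """Sum cell values per administrative unit."""
    totals = {}
    for srow, urow in zip(surface, unit_ids):
        for value, unit in zip(srow, urow):
            totals[unit] = totals.get(unit, 0.0) + value
    return totals


def classify_uncertainty(obs, n_points):
    """Quality label from observations in the unit versus N.

    Assumption: obs == 0 is treated as "uncertain" and obs == N/2 as
    "moderately good", since the text leaves these cases undefined.
    """
    if obs >= n_points:
        return "good"
    if obs >= n_points / 2:
        return "moderately good"
    return "uncertain"


# Hypothetical 1 x 2 rasters and unit labels
population = [[100.0, 200.0]]
positivity = [[10.0, 20.0]]
units = [["Municipality A", "Municipality B"]]

unadjusted = plhiv_surface(population, positivity)
adjusted = rescale(unadjusted, published_total=100.0)
print(aggregate(adjusted, units))
print(classify_uncertainty(obs=30, n_points=100))
```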

Other methods that have been used to generate local level estimates include kernel density estimation with adaptive bandwidths, Bayesian modeling and small area estimation (14–19). These methods have mostly been applied to national household HIV surveys and the Antenatal Care Sentinel Surveillance datasets rather than to routinely collected health facility data. Our study demonstrates the use of routine data and spatial interpolation methods in estimating high disease burden areas, and the approach can be rolled out to other regions in South Africa and to other LMICs with available routine health data.

## RESULTS

**Figure 1**, left shows the "hotspots" with high HIV positivity rates surrounded by other facilities with high values.

FIGURE 1 | Estimation of HIV disease burden based on positivity rates data DHIS 2015/16. Left-HIV positivity at health facilities, 2015/16; Right-population unweighted HIV positivity surface at 100 m resolution.

The cold spots (green) have low HIV positivity and are surrounded by other facilities with low values. This map identifies locations of health facilities with a significantly high number of HIV positive cases. The "hotspots" are clearly identified in eThekwini, uMgungundlovu, iLembe, uThungulu and uMkhanyakude. From the map of the interpolated HIV positivity surface (**Figure 1**, right), the main "hotspot" areas are in eThekwini, iLembe and uMgungundlovu. Overlaying the major cities and national routes shows high burden areas (hotspots) around major cities and routes (**Figure 1**, right).

An additional approach to estimating the disease burden is the surface map of the number of PLHIV at grid level (**Figure 2A**) and estimates of the number of PLHIV aggregated to a selected administrative unit (municipality, **Figure 2B**) for decision making. Based on these maps, the major hotspots (red) are eThekwini municipality and a municipality in uMgungundlovu, identified as carrying the highest HIV disease burden with a high number of PLHIV. The rest of the municipalities are yellow-green. Because of the effect of area population size, areas with a high population, mostly the major urban areas, have higher absolute numbers of PLHIV than areas with a low population, even when both have the same HIV positivity rates. All these maps show complementary pictures of the burden of HIV in KwaZulu-Natal.

**Figure 3** shows the measure of uncertainty at municipality level, with estimates being uncertain in ∼16 out of 51 municipalities. These were mostly areas where estimates were based on sample data or on observations from neighboring areas.

## DISCUSSION

In a constrained funding environment, identifying areas of high disease burden allows decision-makers to target resources for the greatest impact. We provide a step-by-step approach that allows local decision makers to use routinely updated facility data autonomously to reproduce estimates at subnational levels and guide efficient allocation of resources. The study identified municipalities with a high HIV disease burden using routine public health facility data for 15–49 year olds. The high burden areas are around the major urban centres, including municipalities in uMgungundlovu and eThekwini; in places near major road networks; and along the coastal belt, as observed by overlaying layers of road networks and cities. The findings parallel those of recent studies (1, 20, 21) that have provided subnational HIV prevalence estimates, but using survey data. Wanyeki et al. (20) used routine facility-level Prevention of Mother to Child Transmission (PMTCT) data to indicate high burden areas at district level, showing that the HIV burden is concentrated in main urban centers, similar to the findings in this study. Dwyer-Lindgren et al. (1) explored within-country variation at a 5 × 5-km resolution, revealing substantial within-country variation in the prevalence of HIV among adults (aged 15–49 years) across sub-Saharan Africa in 2000–2017. Similarly, Gutreuter et al. (21) showed that using the antenatal surveillance survey (ANC) as a covariate substantially improved precision in many district-level estimates of HIV prevalence in the general population based on national survey data.

Our study shows different but complementary patterns of disease burden from the HIV positivity surface and the PLHIV surface map, which can be attributed to the large effect of area population size on the number of PLHIV. Gutreuter et al. (21) made similar observations in their work on district prevalence estimates. The concentration of population in urban centers, including eThekwini, means a high number of PLHIV around the urban centers and a smaller number of PLHIV in rural and remote areas. Accessibility also means that a high number of people use facilities in areas near major access routes. Similarly, a study by Tanser et al. (19) showed that HIV is localized in areas neighboring major routes. All these studies used seroprevalence and/or ANC data to map HIV prevalence at subnational levels. The current study focused on modeling and mapping geographical areas with high HIV disease burden using routine facility data. With the ability to delineate high disease burden areas, and with readily available tables for different administrative levels, local decision makers can generate the associated risk profiles to guide decision making.

Few peer-reviewed research articles have used routine facility-level data to model the disease burden for decision making. The method applied in this study is readily applicable in other settings, including other low- and middle-income countries (LMIC), but will involve working together with the data custodians in those countries. Working with data custodians will also help to improve the data systems by identifying data gaps and improving the tools for data collection, providing rich data for better analysis and local planning. The analysis did not take into account the contribution of associated risks, including male urethritis syndrome (MUS), other sexually transmitted infections, and teenage pregnancies. Further research will entail the application of spatial multi-criteria decision making (22) approaches to incorporate risk factors in a bid to further define potential high-risk areas. It is also important to explore data on HIV service coverage, as tremendous value can be derived from linking health facility data to community research datasets to generate population-level estimates of coverage with HIV services, which is at the heart of the South Africa National Strategic Plan (NSP) strategy to "focus for impact" (23).

## STUDY LIMITATIONS

There are limits to using available routine health data for spatial modeling. First, data on HIV testing are routinely collected in public health care facilities using paper-based registers corresponding to distinct parts of the HIV care and treatment service spectrum. The data are aggregated per health facility and fed into DHIS. Manual data entry has the potential to introduce data capture errors into the system. A roll-out of web-based data capture systems, currently underway in some regions, will help eliminate some of these problems, as users can capture the information directly at the source. However, this will also require good internet connectivity.

Second, routine data have an inherent limitation with respect to maintaining and accurately recording unique identifiers that can link patients across different facilities. This also poses challenges when compiling aggregate data because of possible double counting of patients who visit multiple facilities. For this study, we only used first test cases to avoid double counting of individuals. An improvement to the overall DHIS system should include the use of unique identifiers to help track individuals across health facilities, making them easily identifiable while maintaining their confidentiality. In addition, the fact that individuals tested at health facilities self-select means that those who do not access services at public health facilities are not part of the analysis.

Third, the routine health facility testing data do not include self-reported cases, which, despite the bias associated with self-reported measurements, can be used to complement the reported HIV positivity rates as a good proxy measure of HIV prevalence (23). The age breakdown of the routine health facility indicator data is limited to three age categories (0–14, 15–49, and 50 plus years), and the data capture neither sexual risk nor a breakdown by sex and gender. This means, for example, that no key population-specific data can be disaggregated from the available routine datasets, which is a serious limitation. Data from prisons and data on preferences for service utilization were excluded from the analysis since they are not part of the routinely collected data. We also focused on all adults aged 15–49 years, excluding antenatal data.

Fourth, the assumptions made in the spatial interpolation approach need to be noted: (i) the age structure of the population is taken to be the same across all administrative units (i.e., the spatial distribution of individuals aged 15–49 years is the same as the spatial distribution of the overall population as estimated by Worldpop); (ii) available data are assumed to be up to date, with data quality rules applied uniformly and consistently across the health facilities; (iii) all populations are assumed to have equal HIV risk, which is a concern as this could mask key and vulnerable populations; and (iv) neighboring areas are assumed to have similar HIV positivity [Tobler's first law (TFL) of geography (24)]. Additionally, because mobile clinic data are geographically mapped to the coordinates of the parent facility (the point from which the mobile clinic operates) and not according to service delivery routes, this is likely to skew the outcome when the data are projected geospatially.

The study is based on one time point and future analysis should include more time periods to establish trends.

## DATA AVAILABILITY STATEMENT

The datasets analyzed for this study are from the District Health Information System owned by the KwaZulu-Natal Department of Health. Data can be obtained by signing a data user agreement form with the Department of Health, which stipulates that the use of datasets in research communication, scholarly papers, journals and the like is encouraged with acknowledgment of the KwaZulu-Natal Department of Health as the data source.

## ETHICS STATEMENT

The study received full ethics approval from the UMgungundlovu Health Ethics Research Board (Ref: UHERB 003/2019) which is registered with the South Africa National Health Research Ethics Council (NHREC) under reference REC-051010-026.

## AUTHOR CONTRIBUTIONS

NW and IN conceptualized the validation methods, and wrote the first draft. EM, CS, and TN provided access to the data. NW incorporated all comments and generated final draft. All authors reviewed and approved the final manuscript for submission.

## FUNDING

This study was supported by the KZN Global Fund Supported Programme (KZN-GFSP), KZN provincial Treasury under grant name: ZAF-C-KZN, Grant No. 1028.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2019.00335/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Wabiri, Naidoo, Mungai, Samuel and Ngwenya. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# HIV Disease Progression Among Antiretroviral Therapy Patients in Zimbabwe: A Multistate Markov Model

Zvifadzo Matsena Zingoni<sup>1,2\*</sup>, Tobias F. Chirwa<sup>1</sup>, Jim Todd<sup>3</sup> and Eustasius Musenge<sup>1</sup>

*<sup>1</sup> Division of Epidemiology and Biostatistics, School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa, <sup>2</sup> Ministry of Health and Child Care, National Institute of Health Research, Harare, Zimbabwe, <sup>3</sup> Department of Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom*

### Edited by:

*Samuel Manda, South African Medical Research Council, South Africa*

### Reviewed by:

*Birhanu Ayele, Stellenbosch University, South Africa Patrick Musonda, University of Zambia, Zambia Lawrence Kazembe, University of Namibia, Namibia*

> \*Correspondence: *Zvifadzo Matsena Zingoni zmatsena28@gmail.com*

### Specialty section:

*This article was submitted to Digital Public Health, a section of the journal Frontiers in Public Health*

Received: *10 May 2019* Accepted: *23 October 2019* Published: *15 November 2019*

### Citation:

*Matsena Zingoni Z, Chirwa TF, Todd J and Musenge E (2019) HIV Disease Progression Among Antiretroviral Therapy Patients in Zimbabwe: A Multistate Markov Model. Front. Public Health 7:326. doi: 10.3389/fpubh.2019.00326*

Background: Antiretroviral therapy (ART) has prolonged the survival of people living with HIV. We evaluated HIV disease progression among ART patients using routinely collected patient-level data between 2004 and 2017 in Zimbabwe.

Methods: We partitioned HIV disease progression into four transient CD4 cell counts states: state 1 (CD4 ≥ 500 cells/µl), state 2 (350 cells/µl ≤ CD4 < 500 cells/µl), state 3 (200 cells/µl ≤ CD4 < 350 cells/µl), state 4 (CD4 < 200 cells/µl), and the absorbing state death (state 5). We proposed a semiparametric time-homogenous multistate Markov model to estimate bidirectional transition rates. Covariate effects (age, gender, ART initiation period, and health facility level) on the transition rates were assessed.

Results: We analyzed 204,289 clinic visits by 63,422 patients. There were 24,325 (38.4%) patients in state 4 (CD4 < 200) at ART initiation, and 7,995 (12.6%) deaths occurred by December 2017. The overall mortality rate was 3.9 per 100 person-years. The highest mortality rate, 5.7 per 100 person-years (4,541 deaths), was from state 4 (CD4 < 200). Mortality rates decreased with increasing time since ART initiation. Health facility type was the strongest predictor of immune recovery. Compared to patients at primary healthcare facilities, provincial or central hospital patients showed a diminishing dose–response effect on immune recovery by state, from a hazard ratio (HR) of 8.30 [95% confidence interval (95% CI), 6.64–10.36] (state 4 to 3) to an HR of 3.12 (95% CI, 2.54–4.36) (state 2 to 1). Male patients were more likely to experience immune deterioration, and they had a 32% increased mortality risk (HR, 1.32; 95% CI, 1.23–1.42) compared to female patients. Elderly patients (45+ years) were more likely to experience immune deterioration than the 25–34 years age group: HR, 1.35; 95% CI, 1.18–1.54; HR, 1.56; 95% CI, 1.34–1.81 and HR, 1.53; 95% CI, 1.32–1.79 for state 1 to 2, state 2 to 3, and state 3 to 4, respectively.

Conclusion: Immune recovery was pronounced among patients at provincial or central hospitals. Male patients with lower CD4 cell counts were at a higher risk of immune deterioration and mortality, while elderly patients were more likely to experience immune deterioration. Early therapeutic interventions when the immune system is relatively stable across gender and age may contain mortality and increase survival outcomes. Interventions which strengthen ART services in primary healthcare facilities are essential.

Keywords: antiretroviral therapy, disease progression, human immunodeficiency virus, mortality, multistate Markov models, Zimbabwe

## INTRODUCTION

Over the last 15 years, remarkable strides have been made to tackle the human immunodeficiency virus (HIV) pandemic globally. The Sub-Saharan Africa (SSA) region is disproportionately affected by the pandemic, accounting for more than 50% of people living with HIV (PLHIV) (1, 2). Antiretroviral therapy (ART) treatment remains the backbone of HIV treatment and prevention. Globally, it was estimated that 59% of PLHIV were receiving ART in 2017 (2).

Zimbabwe is one of the countries in SSA affected by HIV infection. The country had an estimated 1.3 million PLHIV and an adult prevalence of 13.3% in 2017 (3). The country's ART coverage was estimated at 84% for adult patients in the same year (3). There has been a reduction in the number of new HIV infections and HIV-related deaths between 2010 and 2016 (4), and this can be attributed to ART as the main driver. ART drugs help boost the immune system of the PLHIV (5), which leads to viral load suppression, and an increase in CD4 cell counts. Both CD4 cell counts and viral load are key prognostic markers in measuring HIV disease progression (6). The World Health Organization (WHO) recommends the use of viral load in monitoring HIV disease progression among ART patients. Viral load suppression has been incorporated as one of the ultimate indicators in the UNAIDS 90-90-90 fast track targets (7). However, over the years, CD4 cell counts have been extensively used as a marker for HIV disease progression.

Disease progression and immune recovery can be evaluated using either time homogenous or time inhomogenous semiparametric multistate Markov models using CD4 cell counts (8). Application of these models in the assessment of HIV progression has been used in the past decades (9), and many studies have recently employed them (9–14). The use of CD4 cell counts as a prognostic marker for HIV disease progression has been well-documented (11, 12, 15–17). However, across studies, there is variation in terms of the number of discrete multistate model states, the cutoff points defining each state, the type of transitions which can either be reversible or irreversible, and the number of transitions to be estimated.

In this new era of "test and treat all" regardless of CD4 cell counts, HIV patients are initiated on ART as soon as they test positive. However, this does not rule out the possibility of patients presenting late for HIV diagnosis with advanced immune deterioration. This underscores the importance of understanding HIV disease progression across all possible disease states, since patients are initiated on ART at different immune stages. Once HIV-infected patients are initiated on ART, they are still exposed to different factors which may affect their ART adherence. Therefore, it is important to understand the different trajectories that patients follow in HIV disease progression, to inform policy makers on possible interventions and to encourage patients to adhere to ART for their own improved health outcomes, all in the quest to achieve zero HIV incidence by 2030 (18).

Zimbabwe adopted the WHO recommendation on the decentralization of ART services from higher levels of care to primary healthcare (PHC) facilities to increase ART coverage, access and uptake to those in need, and increase ART patient retention. This approach resulted in lessening the work burden in the higher levels of care (19) through task shifting of HIV management and ART service cascading down to PHC facilities (20–23). As a result, the ART sites in Zimbabwe increased from 282 in 2008 to 1,556 in 2017 (3). However, in primary health care, patient turnaround time is increased, there is lack of resources and skilled personnel, which may compromise the quality of service delivery; consequently, ART outcomes are compromised. Therefore, there is a gap to understand HIV progression patterns among ART patients after ART decentralization since the health facility type that a patient is enrolled in may influence their progression or recovery patterns.

This study aims to describe HIV disease progression and immune recovery implementing the multistate model approach based on CD4 cell counts intermediate states among adult patients on ART in Zimbabwe using patient-level data adjusting for the health facility type. The multistate model provides an in-depth understanding on the general immune deterioration (decrease in CD4 cell count) patterns, immune recovery (increase in CD4 cell count) patterns, and death outcome. Unique to this study is the inclusion of the health facility type in the analysis to account for ART services decentralization effect on transition rates.

## MATERIALS AND METHODS

The study was carried out in Zimbabwe, a country with eight provinces and two metropolitan provinces. The country is landlocked, bordered by South Africa, Botswana, Mozambique, and Zambia. We conducted a retrospective analysis of cohort data from a sample of PLHIV receiving ART under the Zimbabwe national ART program. We used individual records from 538 health facilities linked to the electronic patient management system (ePMS) (3). For patients attending these health facilities, all routine clinic visits with CD4 count data from 1st January 2004 to 31st December 2017 were used in this analysis.

We included patients aged 15 years and above at ART initiation (baseline) with complete ART initiation dates, gender, and subsequent follow-up information from the dataset. We excluded patients with no information on CD4 cell counts and patients with baseline CD4 measurements only. We also excluded patients who were classified as lost to follow-up or who transferred to other health facilities to reduce the complexity of the multistate model. The patients who were alive at the end of the study were right censored at their last clinic visit before 31st December 2017.

We extracted demographic characteristics such as age (15–24, 25–34, 35–44, and 45+ years), gender, and education level (none, primary, secondary, tertiary). We also extracted data on health facility type (primary health care, district, provincial, or central hospitals) and time of ART initiation (2004–2007, 2008–2012, and 2013–2017). Clinical characteristics included for analysis were regimen type (first line, second line), WHO clinical staging (WHO I/II, WHO III/IV), and tuberculosis status (negative, positive, not assessed), taken from the routine monitoring records of each clinic visit. HIV disease progression was defined using the WHO-based CD4 cell count bands of HIV-related immunodeficiency: no significant immunodeficiency (CD4 ≥ 500 cells/µl) as state 1, mild immunodeficiency (350 cells/µl ≤ CD4 < 500 cells/µl) as state 2, advanced immunodeficiency (200 cells/µl ≤ CD4 < 350 cells/µl) as state 3, severe immunodeficiency (CD4 < 200 cells/µl) as state 4, and the absorbing state death as state 5.
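The mapping from a CD4 measurement to a model state can be written as a small function. The sketch below is illustrative only; the function name and the example visit values are hypothetical, and the study's own data preparation was done in Stata.

```python
def cd4_state(cd4_count: float) -> int:
    """Map a CD4 cell count (cells/µl) to transient states 1-4 as defined in the text."""
    if cd4_count < 0:
        raise ValueError("CD4 cell count cannot be negative")
    if cd4_count >= 500:
        return 1  # no significant immunodeficiency
    if cd4_count >= 350:
        return 2  # mild immunodeficiency
    if cd4_count >= 200:
        return 3  # advanced immunodeficiency
    return 4      # severe immunodeficiency


# Hypothetical sequence of CD4 counts from one patient's clinic visits.
visits = [150, 210, 360, 340, 520]
states = [cd4_state(count) for count in visits]
print(states)
```

State 5 (death) is not derived from a CD4 count; it would be assigned from the recorded outcome.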

## Statistical Analysis

The retrieved patient data were cleaned and managed in Stata 15.1 (24). All the preliminary analyses were conducted in Stata software. After data augmentation, the five-state semiparametric time homogenous multistate Markov model was fitted in R software (25) using the msm package. We fitted a model with reversible transitions (26); therefore, states 1–4 were transient states, while state 5 was an absorbing state, as depicted in **Figure 1**.

The fitted model was adjusted for demographic factors (sex, health facility type, and ART initiation period). The semiparametric time homogenous multistate Markov model estimated transition intensities (transition rates or hazard rates), transition probabilities (survival function) between the defined CD4 cell count states, the mean sojourn time (the time spent in a state before making any transition), and the total length of stay in each state. Time-varying mortality rates were estimated using a time inhomogenous model, which assumes that the transition intensities change with time; this reflects the reality of infectious disease progression, and hence this is normally the preferred model. These models assume a Markov process, in which the transition intensity depends only on the current time and the state occupied, i.e., it is independent of the previous transitions. In other words, the process is assumed to be memoryless. We used the markovchain library in R to test whether the Markov assumption was satisfied. The null hypothesis of this test is that the Markov property holds. We randomly selected patients' sequences to be tested and obtained p > 0.05; therefore, we failed to reject the null hypothesis that the sequences are Markovian.

## The Multistate Markov Model

A multistate process is a stochastic process [X(t), t ∈ T] with finite state space S = {1, 2, 3, 4, 5}, where T = [0, τ], τ < ∞, is the period of observation (27). These models can be either discrete-time Markov chains (transitions occur at fixed points in time) or continuous-time Markov chains (transitions occur at any point in time) (28). For a continuous-time Markovian process, the transition intensity (instantaneous incidence rate), λjk(t), of a patient from state X(t) = j at time t to state k at time t + δt is defined as:

$$\begin{split} \lambda\_{jk}(t) &= \left. \frac{d}{d\,\delta t}\, p\_{jk}(t, t + \delta t) \right|\_{\delta t = 0} = \lim\_{\delta t \to 0} \frac{p\_{jk}(t, t + \delta t)}{\delta t} \\ &= \lim\_{\delta t \to 0} \frac{P[X(t + \delta t) = k \mid X(t) = j]}{\delta t}, \quad j \neq k \end{split} \tag{1}$$

where pjk is the probability of a transition from state j to state k and δt is the change in time. In our case, the transition intensities in Equation (1) form the (j, k) entries of the transition rate matrix, denoted by Q(t):

$$Q(t) = \begin{pmatrix} -\lambda\_{1\bullet} & \lambda\_{12} & \lambda\_{13} & \lambda\_{14} & \lambda\_{15} \\ \lambda\_{21} & -\lambda\_{2\bullet} & \lambda\_{23} & \lambda\_{24} & \lambda\_{25} \\ \lambda\_{31} & \lambda\_{32} & -\lambda\_{3\bullet} & \lambda\_{34} & \lambda\_{35} \\ \lambda\_{41} & \lambda\_{42} & \lambda\_{43} & -\lambda\_{4\bullet} & \lambda\_{45} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

whose rows sum to 0, that is, ∑k∈S λjk = 0 for all j, and whose diagonal entries (interpreted as changes in transition probability, and written −λj• in the matrix) are defined by convention as λjj(t) = λj = −∑k≠j λjk(t) for all j ∈ S.

Under the Markov process, these transition intensities can be calculated as the product of the flow rate µj out of state j and the conditional probability ρjk of a transition to state k, given that a transition is made (j ≠ k). From the entries of Q(t), we can calculate the probability that the next state after state j is state k, for each j and k, as pjk = −λjk/λjj. Once the transition intensity matrix is obtained, the transition probability matrix can be obtained using the Chapman–Kolmogorov forward differential equations. The detailed explanation is provided in **Appendix**. The probability matrix can be computed from the estimated transition intensities using P(t) = exp[Q(t)], where P(t) is the transition probability matrix defined as:

$$P(t) = \exp\left[Q\left(t\right)\right] = \begin{pmatrix} \pi\_{11} & \pi\_{12} & \pi\_{13} & \pi\_{14} & \pi\_{15} \\ \pi\_{21} & \pi\_{22} & \pi\_{23} & \pi\_{24} & \pi\_{25} \\ \pi\_{31} & \pi\_{32} & \pi\_{33} & \pi\_{34} & \pi\_{35} \\ \pi\_{41} & \pi\_{42} & \pi\_{43} & \pi\_{44} & \pi\_{45} \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
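The relationship between the intensity matrix and the probability matrix can be checked numerically. The sketch below is a plain-Python illustration, not the msm implementation used in the study: all rates are hypothetical, the helper names are invented for this example, and the matrix exponential is computed with a simple scaled Taylor series. It assumes the time-homogeneous case, in which Q is constant and the exponential is usually written P(t) = exp(tQ).

```python
from typing import List

Matrix = List[List[float]]


def build_generator(off_diagonal: Matrix) -> Matrix:
    """Set each diagonal entry to minus the sum of the other entries in its row."""
    q = [row[:] for row in off_diagonal]
    for j, row in enumerate(q):
        row[j] = -sum(value for k, value in enumerate(row) if k != j)
    return q


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m: Matrix, terms: int = 30, squarings: int = 10) -> Matrix:
    """Matrix exponential via a Taylor series on m / 2**squarings, then repeated squaring."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[value / scale for value in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms + 1):
        term = [[value / order for value in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_probabilities(q: Matrix, t: float) -> Matrix:
    """P(t) = exp(tQ) for a constant intensity matrix Q."""
    return mat_exp([[value * t for value in row] for row in q])


def next_state_probabilities(q: Matrix, j: int) -> List[float]:
    """p_jk = -lambda_jk / lambda_jj for k != j, and 0 for k == j."""
    return [0.0 if k == j else -q[j][k] / q[j][j] for k in range(len(q))]


# Hypothetical off-diagonal transition rates per year; state 5 (death) is absorbing.
RATES = [
    [0.0, 0.10, 0.01, 0.005, 0.01],
    [0.16, 0.0, 0.11, 0.01, 0.02],
    [0.01, 0.08, 0.0, 0.05, 0.03],
    [0.005, 0.01, 0.05, 0.0, 0.06],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
Q = build_generator(RATES)
P1 = transition_probabilities(Q, 1.0)

for row in P1:
    print([round(value, 4) for value in row])
print("Next-state probabilities from state 1:", next_state_probabilities(Q, 0))
```

Each row of the resulting matrix sums to 1, and the last row stays (0, 0, 0, 0, 1) because death is absorbing, which mirrors the structure of the matrix above.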

The probability πjk(s, t) that a patient in state j at time s will be in state k at time t is given by:

$$
\pi\_{jk}(s, t) = P[X(t) = k \mid X(s) = j] \tag{2}
$$

where s, t ∈ T and s ≤ t. These transition probabilities satisfy the following conditions:

$$\begin{aligned} &\text{(i)}\ \pi\_{jk}(t+s) = \sum\_{r \in \mathcal{S}} \pi\_{jr}(t) \,\pi\_{rk}(s) \text{ for all } t \ge 0, s \ge 0 \text{ and } j, k \in \mathcal{S};\\ &\text{(ii)}\ \sum\_{k \in \mathcal{S}} \pi\_{jk}(t) = 1 \text{ for all } j \in \mathcal{S} \text{ and } t \ge 0; \text{ and} \\ &\text{(iii)}\ \pi\_{jk}(t) \ge 0 \text{ for all } j, k \in \mathcal{S} \text{ and } t \ge 0. \end{aligned}$$

Maximum likelihood procedures (8, 29) can be used to estimate these transition intensities. The likelihood is a product of the probabilities of transition between observed states, over all individuals i = 1, 2, ..., M and observation times r = 1, ..., ni, where individual i is observed ni times, as shown below:

$$L(Q) = \prod\_{i=1}^{M} \prod\_{r=1}^{n\_i - 1} L\_{i,r} = \prod\_{i,r} \pi\_{s(t\_{ir}),\, s(t\_{i,r+1})} \left( t\_{i,r+1} - t\_{ir} \right) \tag{3}$$

Each component Li,r is the entry of the transition probability matrix in the s(tir)th row and the s(ti,r+1)th column, evaluated over the interval between a pair of consecutive observation times tir and ti,r+1. This likelihood function, L(Q), is maximized with respect to log(λjk) to compute the estimates of λjk, using standard optimization algorithms that make use of the derivatives of the likelihood. This likelihood assumes that the sampling times are ignorable (non-informative).

## The Total Length of Stay and Mean Sojourn Time

The mean sojourn time is defined as the expected holding time, that is, the average time a patient spends in a state in a single stay before making any transition to another state. The average length of stay in a single state before making any transition to either lower or higher CD4 cell count states is estimated by the negative inverse of the jth diagonal entry of Q(t), that is, −1/λjj. The total length of stay, Lk, in each of the four states excluding death is defined as the anticipated exposure time spent by an individual in each state during the study period before death. This time is estimated as the time spent in state k between two successive time points (t1, t2), given by:

$$L\_k = \int\_{t\_1}^{t\_2} \pi\_{jk}\left(t\right) dt\tag{4}$$

where j is the initial state, which is usually taken to be state 1; this quantity is useful in the presence of reversible transitions.
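As a small worked example of the mean sojourn time, the sketch below applies −1/λjj to the diagonal of an intensity matrix. The matrix values are hypothetical and chosen only so that each row sums to zero; the function name is invented for this illustration.

```python
from typing import Dict, List


def mean_sojourn_times(q: List[List[float]]) -> Dict[int, float]:
    """Return -1 / lambda_jj for every transient state (states are numbered from 1)."""
    times = {}
    for j, row in enumerate(q):
        if row[j] < 0:  # an absorbing state has a zero diagonal and no finite sojourn time
            times[j + 1] = -1.0 / row[j]
    return times


# Hypothetical intensity matrix (rates per year); state 5 is the absorbing death state.
Q = [
    [-0.125, 0.10, 0.01, 0.005, 0.01],
    [0.16, -0.30, 0.11, 0.01, 0.02],
    [0.01, 0.08, -0.17, 0.05, 0.03],
    [0.005, 0.01, 0.05, -0.125, 0.06],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

sojourn = mean_sojourn_times(Q)
for state, years in sojourn.items():
    print(f"State {state}: mean sojourn time {years:.2f} years")
```

With these illustrative rates, a total exit rate of 0.125 per year from state 1 corresponds to a mean stay of 8 years per visit to that state.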

## Semiparametric Regression Model

To adjust for the effects of the covariates on the transition rates, we proposed a semiparametric Cox proportional hazard regression model. The transition rates depend on the covariate vector **Z**, that is,

$$
\lambda\_{jk}[t|\mathbf{Z}(t)] = \lambda\_{jk0} \exp[\boldsymbol{\beta}\_{jk}^T \mathbf{Z}(t)] \tag{5}
$$

where βjk = (βjk1, βjk2, ..., βjkz)ᵀ is a vector of the regression coefficients associated with the vector **Z**(t) for the transition from state j to state k. The baseline hazard function is denoted by λjk0. In this study, we assumed time-independent covariates. Parameter estimation was based on maximization of the likelihood with respect to the transition intensities. We fitted eight models in total [starting with a no covariates (unadjusted) model, followed by four univariate models and three with at least two covariates]. The additional covariates after the univariate models were added sequentially, and only covariates without missing information were considered in the adjusted model.
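Equation (5) can be illustrated with a short calculation. In the sketch below the baseline rate is hypothetical; the coefficient is back-calculated from the mortality hazard ratio for male patients reported in the abstract (HR, 1.32), purely to show how a coefficient scales a baseline transition rate.

```python
import math
from typing import Sequence


def transition_rate(baseline_rate: float, betas: Sequence[float], covariates: Sequence[float]) -> float:
    """lambda_jk(Z) = lambda_jk0 * exp(beta_jk^T Z) for one transition j -> k."""
    linear_predictor = sum(beta * z for beta, z in zip(betas, covariates))
    return baseline_rate * math.exp(linear_predictor)


baseline = 0.03               # hypothetical baseline rate per year for a transition to death
beta_male = math.log(1.32)    # log of the reported hazard ratio for male vs. female patients

rate_female = transition_rate(baseline, [beta_male], [0.0])
rate_male = transition_rate(baseline, [beta_male], [1.0])

print(f"Rate for female patients: {rate_female:.4f} per year")
print(f"Rate for male patients:   {rate_male:.4f} per year")
print(f"Hazard ratio:             {rate_male / rate_female:.2f}")
```

Because the covariate effect is multiplicative, the ratio of the two rates equals exp(β) whatever baseline rate is chosen.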

## Model Diagnostics

Selection of the best-fitting model with covariates was performed using a likelihood ratio test, defined as −2 ln[Ls(θ̂)/Lg(θ̂)], where Ls(θ̂) is the likelihood of the reduced (no covariate) model and Lg(θ̂) is the likelihood of the full (with covariates) model; the statistic follows a chi-square distribution with n degrees of freedom. Significance was set at the 5% level. The aim was to obtain a parsimonious model that best explains the data.
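The likelihood ratio test can be sketched as follows. The log-likelihood values and the degrees of freedom are hypothetical, and the chi-square tail probability is computed with a simple series for the regularized incomplete gamma function, which is adequate for illustration but is not a substitute for a statistical library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (series form of the incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= y / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Return the statistic -2 ln(L_s / L_g) and its chi-square p-value."""
    statistic = -2.0 * (loglik_reduced - loglik_full)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a reduced and a full model differing by 3 parameters.
statistic, p_value = likelihood_ratio_test(loglik_reduced=-1050.0, loglik_full=-1040.0, df=3)
print(f"LR statistic = {statistic:.2f}, p-value = {p_value:.5f}")
print("Prefer full model at the 5% level:", p_value < 0.05)
```

Working with log-likelihoods avoids forming the likelihood ratio directly, which would underflow for datasets of realistic size.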

## Ethical Considerations

We used data with no personal identification; however, we used the individual unique identifier for the analysis. We sought permission to use the dataset from the Ministry of Health and Child Care, Zimbabwe, and this study was granted ethical approval by the University of the Witwatersrand's Human Research Ethics Committee (Medical) (Clearance Certificate No. M170673).

## RESULTS

## Descriptive Characteristics of Patients and Total Transitions Observed

From the 538 clinics, a total of 390,771 patients were seen between 1st January 2004 and 31st December 2017. Of these, we excluded 197,618 (50.6%) patients with no CD4 cell counts and 129,731 (33.2%) patients with only one CD4 cell count measurement. The remaining 63,422 patients, of whom 65.4% were female, came from 491 health facilities and contributed 205,711 years of total analysis time at risk and under observation. The descriptive characteristics are shown in **Table 1**. Most patients were enrolled in district or mission hospitals (45.7%) and in facilities in rural areas (74.7%). The baseline characteristics differed significantly by CD4 count state in this cohort (p < 0.05). The median follow-up time was 2.63 [interquartile range (IQR), 1.14–4.94] years, the median duration between visits was 0.63 (IQR, 0.25–1.88) years, and the median number of visits was 3 (IQR, 2–4). Most patients were classified in WHO clinical stage III/IV (58.6%, n = 36,626).
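The cohort selection figures can be reproduced with simple arithmetic. The sketch below uses only the counts reported in this paragraph; the variable names are illustrative.

```python
total_seen = 390_771
no_cd4 = 197_618          # patients with no CD4 cell counts
single_cd4 = 129_731      # patients with only one CD4 measurement
person_years = 205_711    # total analysis time at risk

analysed = total_seen - no_cd4 - single_cd4
pct_no_cd4 = 100 * no_cd4 / total_seen
pct_single_cd4 = 100 * single_cd4 / total_seen
mean_follow_up = person_years / analysed

print(f"Patients analysed: {analysed:,}")
print(f"Excluded, no CD4: {pct_no_cd4:.1f}%")
print(f"Excluded, one CD4 measurement: {pct_single_cd4:.1f}%")
print(f"Mean follow-up per patient: {mean_follow_up:.2f} years")
```

The mean follow-up derived here is larger than the reported median of 2.63 years, as expected for a right-skewed follow-up distribution.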

## Observed Transitions Between States

As displayed in **Table 2**, the 63,422 patients contributed 140,867 transitions during the follow-up period, and 7,995 patients (12.6%) died. The largest share of the observed transitions, 114,561 (81.3%), came from patients who remained in the same state over time without making any transition to other states. At baseline, the majority of the patients were in state 4 (CD4 < 200) (38.4%, n = 24,325) and state 3 (200 ≤ CD4 < 350) (29.1%, n = 18,437). The picture was similar at the end of the study; however, relative to baseline numbers, there was a non-significant decline in the total number of patients in state 3 (200 ≤ CD4 < 350) (p = 0.2621), while a significant decline was observed in state 4 (p = 0.0478). The majority of the deaths at the end of the study came from state 4 (CD4 < 200) and state 3 (200 ≤ CD4 < 350), which accounted for 56.8% (n = 4,541) and 27.6% (n = 2,208), respectively.

Immune recovery is observed when a patient moves from a lower to a higher CD4 cell count state (350 ≤ CD4 < 500 to CD4 ≥ 500, 200 ≤ CD4 < 350 to 350 ≤ CD4 < 500, or CD4 < 200 to 200 ≤ CD4 < 350), while immune deterioration is observed when a patient moves from a higher to a lower CD4 cell count state (CD4 ≥ 500 to 350 ≤ CD4 < 500, 350 ≤ CD4 < 500 to 200 ≤ CD4 < 350, or 200 ≤ CD4 < 350 to CD4 < 200). There were more transitions from lower to higher CD4 cell count states (n = 8,031; state 2 to 1 = 2,493, state 3 to 2 = 2,606, and state 4 to 3 = 2,932) than corresponding reverse transitions from higher to lower CD4 cell count states (n = 5,425). This result is an indication of immune recovery in this cohort.
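As a quick consistency check on these counts, the recovery transitions can be summed and compared with the deterioration total. This is a minimal sketch with illustrative names; the text gives the deterioration count only as a total, so no breakdown is assumed.

```python
# Observed upward (recovery) transitions reported in the text.
recovery = {"2->1": 2_493, "3->2": 2_606, "4->3": 2_932}

# Only the total is reported for downward (deterioration) transitions.
deterioration_total = 5_425

recovery_total = sum(recovery.values())

# Crude ratio of observed recovery to deterioration transitions.
# This compares raw counts and does not account for time at risk.
ratio = recovery_total / deterioration_total

print(recovery_total, round(ratio, 2))
```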

## Time Homogenous Transition Rates and Probabilities

The transition rates and probabilities were estimated using the time-homogenous multistate Markov model incorporating the semiparametric Cox survival function, and the results are displayed in **Table 3**. Generally, transition rates from lower to higher CD4 cell count states were higher than those of the corresponding reverse transitions. Moving from state 2 (350 ≤ CD4 < 500) to state 1 (CD4 ≥ 500) was 1.49 (0.16085/0.10783) times more likely than moving from state 1 (CD4 ≥ 500) to state 2 (350 ≤ CD4 < 500), indicating a high probability of immune recovery. Patients in state 2 (350 ≤ CD4 < 500) were 1.38 (0.11264/0.08188) times more likely to move to state 3 (200 ≤ CD4 < 350) than patients in state 3 (200 ≤ CD4 < 350) were to move to state 2 (350 ≤ CD4 < 500). This finding was a clear indication of immune deterioration between the two states. The transition from state 4 (CD4 < 200) to state 3 (200 ≤ CD4 < 350) was 1.02 (0.05261/0.05147) times more likely than the transition from state 3 (200 ≤ CD4 < 350) to state 4 (CD4 < 200), indicating immune recovery from state 4 (CD4 < 200), but this was not statistically significant.
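The rate ratios quoted above are quotients of paired transition intensities. The following sketch recomputes them from the six rates given in the text; the dictionary layout and the helper name `rate_ratio` are illustrative choices, not part of the original analysis.

```python
# Transition intensities per person-year quoted in the text,
# keyed by (from_state, to_state).
rates = {
    (2, 1): 0.16085, (1, 2): 0.10783,
    (2, 3): 0.11264, (3, 2): 0.08188,
    (4, 3): 0.05261, (3, 4): 0.05147,
}


def rate_ratio(frm: int, to: int) -> float:
    """Ratio of the frm->to intensity to the reverse to->frm intensity."""
    return rates[(frm, to)] / rates[(to, frm)]


for pair in [(2, 1), (2, 3), (4, 3)]:
    print(pair, round(rate_ratio(*pair), 2))
```

A ratio above 1 means the first direction is favored; whether that indicates recovery or deterioration depends on which direction was placed in the numerator.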

TABLE 1 | Sociodemographic and clinical baseline characteristics at antiretroviral therapy (ART) initiation of all study participants from the Zimbabwe national ART program, 2004–2017.

\**The denominator of the proportions for this variable is 33,857.*

\*\**The denominator of the proportions for this variable is 61,334.*

\*\*\**The denominator of the proportions for this variable is 62,982.*

\*\*\*\**The denominator of the proportions for this variable is 62,616.*

\*\*\*\*\**The denominator of the proportions for this variable is 62,517.*

TABLE 2 | Number of the total observed patients' transitions between the five states, the total number of patients at antiretroviral therapy (ART) initiation ("beginning state") and the total number of patients at 31st December 2017 ("end state") among ART patients in Zimbabwe national ART from 2004 to 2017.

We estimated the probabilities for which state is next after the currently occupied state. The results show that an individual in state 1 (CD4 ≥ 500) had a probability of 41.2% to move to state 2 (350 ≤ CD4 < 500); an individual in state 2 (350 ≤ CD4 < 500) had a 45.7% probability to move to state 1 (CD4 ≥ 500); an individual in state 3 (200 ≤ CD4 < 350) had a 35.3% probability to move to state 1 (CD4 ≥ 500); and an individual in state 4 (CD4 < 200) had a 28.1% probability of death. The cumulative probability of moving from higher CD4 cell count states to lower CD4 cell count states increased over time. The probability of moving from state 1 (CD4 ≥ 500) to state 2 (350 ≤ CD4 < 500) changed from 8.3% at 1 year to 16.2% at 6 years; the state 2 (350 ≤ CD4 < 500) to state 3 (200 ≤ CD4 < 350) transition changed from 9.2% at 1 year to 20.2% at 6 years; and the state 3 (200 ≤ CD4 < 350) to state 4 (CD4 < 200) transition changed from 4.7% at 1 year to 15.3% at 6 years. Similarly, the probabilities of moving from lower CD4 cell count states to higher CD4 cell count states increased over time. The probability of moving from state 2 (350 ≤ CD4 < 500) to state 1 (CD4 ≥ 500) changed from 13.6% at 1 year to 26.2% at 6 years; the state 3 (200 ≤ CD4 < 350) to state 2 (350 ≤ CD4 < 500) transition changed from 6.8% at 1 year to 14.7% at 6 years; and the state 4 (CD4 < 200) to state 3 (200 ≤ CD4 < 350) transition changed from 4.5% at 1 year to 14.5% at 6 years.
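In a time-homogenous continuous-time Markov model, the transition probabilities over an interval of length t follow from the intensity matrix Q as P(t) = exp(Qt). The sketch below illustrates that relationship in plain Python. It is not the authors' semiparametric model and is not expected to reproduce Table 3: the six reversible intensities are the ones quoted in the text, while the death intensities are an assumption borrowed from the crude state-specific mortality rates reported in this section. All function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=20):
    """Approximate exp(m) by scaling and squaring with a Taylor series."""
    n = len(m)
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Indices 0-3 are CD4 states 1-4; index 4 is death (absorbing).
off_diag = {
    (0, 1): 0.10783, (1, 0): 0.16085,   # quoted in the text
    (1, 2): 0.11264, (2, 1): 0.08188,   # quoted in the text
    (2, 3): 0.05147, (3, 2): 0.05261,   # quoted in the text
    # ASSUMPTION: death intensities taken from crude rates per person-year.
    (0, 4): 0.018, (1, 4): 0.027, (2, 4): 0.033, (3, 4): 0.059,
}


def build_q(n=5):
    """Build an intensity matrix whose rows sum to zero."""
    q = [[0.0] * n for _ in range(n)]
    for (i, j), rate in off_diag.items():
        q[i][j] = rate
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal is minus the total exit rate
    return q


def transition_probs(q, t):
    """Transition probability matrix P(t) = exp(Q * t)."""
    return mat_exp([[x * t for x in row] for row in q])


Q = build_q()
p1 = transition_probs(Q, 1.0)
p6 = transition_probs(Q, 6.0)
print([round(x, 3) for x in p1[0]])
print([round(x, 3) for x in p6[0]])
```

Each row of P(t) gives the distribution over states after t years for a patient starting in that row's state, so the probability of having died grows with t while the death row stays fixed.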

## Time Inhomogenous Mortality Rates

The transition rates for mortality were also estimated, and the results are shown in **Table 4**. The overall mortality rate in this cohort was 3.9 (95% CI, 3.8–4.0) per 100 person-years. Stratifying by CD4 state, the mortality rates per 100 person-years increased with a decrease in CD4 cell counts: state 1 (CD4 ≥ 500) (rate = 1.8; 95% CI, 1.1–2.1), state 2 (350 ≤ CD4 < 500) (rate = 2.7; 95% CI, 2.4–3.1), state 3 (200 ≤ CD4 < 350) (rate = 3.3; 95% CI, 3.1–3.8), and state 4 (CD4 < 200) (rate = 5.9; 95% CI, 5.7–6.1). Hence, the mortality burden was highest in state 4 (CD4 < 200) compared to other states, and these mortality rates were significantly different (log rank test p < 0.001). The Kaplan–Meier curves stratified by state further confirmed that mortality risk increases with a decrease in CD4 cell count (**Figure 2**). However, the main difference was between the mortality in state 3 (200 ≤ CD4 < 350) and state 4 (CD4 < 200) vs. the mortality in state 1 (CD4 ≥ 500) and state 2 (350 ≤ CD4 < 500).
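The overall rate can be approximately reproduced from figures given elsewhere in the Results: 7,995 deaths over 205,711 person-years. The sketch below assumes a crude rate with a normal-approximation Poisson confidence interval; the paper does not state which interval method was used, so this is an illustration rather than the authors' calculation, and the function name is hypothetical.

```python
import math


def rate_per_100py(events: int, person_years: float, z: float = 1.96):
    """Crude rate per 100 person-years with a normal-approximation CI.

    Assumes the event count is Poisson, so its standard error is
    sqrt(events); the CI method is an assumption, not from the paper.
    """
    rate = events / person_years
    se = math.sqrt(events) / person_years
    return tuple(100 * x for x in (rate, rate - z * se, rate + z * se))


rate, lower, upper = rate_per_100py(7_995, 205_711)
print(round(rate, 1), round(lower, 1), round(upper, 1))
```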

TABLE 3 | Estimates of transition rates (intensities) per person-years and probability matrices and 95% confidence intervals (CI) for the time-homogenous multistate Markov model among antiretroviral therapy (ART) patients in Zimbabwe national ART from 2004 to 2017.

In general, the time-varying mortality rates decreased with increasing time since ART initiation. The cohort experienced high mortality rates in the first year of ART initiation, averaging 3.5 (95% CI, 3.4–3.7) per 100 person-years. There was a sharp (seven-fold) drop in the mortality rate from the first to the second year [hazard ratio (HR) = 6.95 (0.3512/0.0505); 95% CI, 6.78–7.14]. Mortality rates continued to decrease gradually until the end of the follow-up period. Mortality patterns within each state followed a similar trend to the overall pattern. In the first 3 years, mortality rates had an inverse relationship with CD4 cell counts, and the rates differed markedly between the states.

This study forecasted the total length of time spent in each of the CD4 states by HIV patients on ART before death and estimated the mean sojourn (holding) time for each state, as shown in **Table 5**. The results show that, when an individual enters state 4 (CD4 < 200), the time he or she spends in this state during a single stay before moving to another state was estimated to be 4.74 (4.64–4.83) years on average. This result could be linked to the time taken by a patient in this state to respond to ART and subsequently boost immunity, since this is the worst state in our HIV progression model. Since the holding times for all states are relatively long, HIV disease progression in this cohort was relatively slow.
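Under a time-homogenous Markov model, the mean sojourn time in a state is the reciprocal of the total rate of leaving it. The sketch below inverts the reported state 4 sojourn time to show the exit rate it would imply under that standard relationship. This is an assumption about how the figures relate: the paper's semiparametric estimates need not satisfy it exactly, and the function name is hypothetical.

```python
def exit_rate_from_sojourn(mean_sojourn_years: float) -> float:
    """Total exit intensity implied by a mean sojourn time (1 / time)."""
    return 1.0 / mean_sojourn_years


# State 4 (CD4 < 200): 4.74 (4.64-4.83) years, as reported in the text.
point = exit_rate_from_sojourn(4.74)

# A longer sojourn implies a lower exit rate, so the interval bounds swap.
low = exit_rate_from_sojourn(4.83)
high = exit_rate_from_sojourn(4.64)

print(round(point, 3), round(low, 3), round(high, 3))
```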

It was also of interest to forecast the total length of stay in states 1–4 before death, which is quite informative in the presence of reversible transitions. The results show that an individual will stay 11.3 years in state 1 (CD4 ≥ 500), 5.5 years in state 2 (350 ≤ CD4 < 500), 7.2 years in state 3 (200 ≤ CD4 < 350), and 6.9 years in state 4 (CD4 < 200) before death. In general, these results indicate that an HIV patient on ART is expected to spend more time in the highest CD4 cell count state than in the other states.

## Covariates Effects on Immune Recovery and Deterioration Transition Rates

We further included time-independent covariates (health facility level, ART initiation period, and sex) and age in the multistate Cox proportional hazard model, and the results are displayed in **Table 6**. A likelihood ratio test showed that this model fit better than the model without covariates (p < 0.001). Adjusting for other covariates, patients at higher levels of health facility were more likely to move from lower to higher CD4 cell count states. Compared to PHC patients, provincial or central hospital patients were predominantly more likely to move from state 4 (CD4 < 200) to state 3 (200 ≤ CD4 < 350) (HR = 8.30; 95% CI, 6.64–10.36), followed by the state 3 (200 ≤ CD4 < 350) to state 2 (350 ≤ CD4 < 500) transition (HR = 8.04; 95% CI, 6.41–10.10). Because both are transitions to higher CD4 cell count states, this means that patients at the provincial or central hospitals had a high probability of immune recovery once on ART compared to PHC patients. For district or mission hospital patients, state 3 (200 ≤ CD4 < 350) to state 2 (350 ≤ CD4 < 500) was the predominant transition (HR = 4.41; 95% CI, 3.96–4.87), followed by the state 4 (CD4 < 200) to state 3 (200 ≤ CD4 < 350) transition (HR = 3.97; 95% CI, 3.61–4.37), compared to PHC patients. Similarly, this was a positive indication of immune recovery for patients in district or mission hospitals compared to PHC patients.

TABLE 4 | Estimated time-varying mortality rates per person-years and 95% confidence intervals for the time-inhomogenous multistate Markov model among ART patients in Zimbabwe national ART from 2004 to 2017.


TABLE 5 | Estimates of mean sojourn time and the total length of stay for the time-homogenous multistate Markov model among antiretroviral therapy (ART) patients in Zimbabwe national ART from 2004 to 2017.


Adjusting for other covariates, age was generally associated with immune deterioration transitions (CD4 ≥ 500 to 350 ≤ CD4 < 500, 350 ≤ CD4 < 500 to 200 ≤ CD4 < 350, and 200 ≤ CD4 < 350 to CD4 < 200). Compared to the 25–34 years age group, there was no significant difference in immune deterioration transitions. However, the results showed that the older the patient, the more likely he or she was to experience immune deterioration. Elderly patients (45+ years) had the most pronounced risk of immune deterioration across age groups. With reference to the 25–34 years age group, both the 35–44 years and the 45+ years age groups were predominantly more likely to make the state 2 (350 ≤ CD4 < 500) to state 3 (200 ≤ CD4 < 350) transition (HR = 1.31; 95% CI, 1.14–1.51 and HR = 1.56; 95% CI, 1.34–1.81, respectively). Holding other covariates constant, sex was significantly associated with immune deterioration transitions. Male patients had an increased risk of immune deterioration compared to female patients: state 1 (CD4 ≥ 500) to state 2 (350 ≤ CD4 < 500) (HR = 1.15; 95% CI, 1.01–1.28), state 2 (350 ≤ CD4 < 500) to state 3 (200 ≤ CD4 < 350) (HR = 1.23; 95% CI, 1.10–1.38), and state 3 (200 ≤ CD4 < 350) to state 4 (CD4 < 200) (HR = 1.67; 95% CI, 1.49–1.86). The move from state 3 (200 ≤ CD4 < 350) to state 4 (CD4 < 200) was the predominant transition in male compared to female patients.

## Covariates Effects on Mortality Rates

Overall, mortality was high among patients in state 4 (CD4 < 200) in this cohort. The mortality risk was more pronounced among patients in provincial or central hospitals than among those in district hospitals if in state 1 (CD4 ≥ 500) (HR = 1.89; 95% CI, 1.32–2.67), state 2 (350 ≤ CD4 < 500) (HR = 3.36; 95% CI, 2.05–5.52), state 3 (200 ≤ CD4 < 350) (HR = 1.25; 95% CI, 0.73–2.16), and state 4 (CD4 < 200) (HR = 2.23; 95% CI, 1.80–2.74). State 2 (350 ≤ CD4 < 500) mortality risk was predominant in the provincial or central hospitals. This means that PHC facilities had a low risk of mortality in this cohort compared to both higher levels of care. Interestingly, the mortality risk was much more pronounced in the 15–24 years age group than in other age groups: state 1 (CD4 ≥ 500) (HR = 3.71; 95% CI, 2.90–4.76), state 2 (350 ≤ CD4 < 500) (HR = 1.66; 95% CI, 1.09–2.53), state 3 (200 ≤ CD4 < 350) (HR = 1.71; 95% CI, 1.32–2.21), and state 4 (CD4 < 200) (HR = 1.71; 95% CI, 1.47–1.98). Patients aged 45 years and above were more likely to experience immune deterioration compared to the 25–34 years age group: HR, 1.35; 95% CI, 1.18–1.54; HR, 1.56; 95% CI, 1.34–1.81; and HR, 1.53; 95% CI, 1.32–1.79 for state 1 to 2, state 2 to 3, and state 3 to 4, respectively. Male patients were more likely to die compared to female patients: state 1 to 5 (HR = 1.56; 95% CI, 1.26–1.92), state 3 to 5 (HR = 1.32; 95% CI, 1.15–1.51), and state 4 to 5 (HR = 1.32; 95% CI, 1.23–1.42). Considering the ART initiation period, mortality risks were pronounced among patients who initiated ART in 2013–2017: state 2 (350 ≤ CD4 < 500) (HR = 4.89; 95% CI, 2.22–10.79), state 3 (200 ≤ CD4 < 350) (HR = 4.14; 95% CI, 2.47–6.96), and state 4 (CD4 < 200) (HR = 9.15; 95% CI, 7.12–11.79).
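One way to read the hazard ratios above is to check whether each 95% confidence interval excludes 1, which corresponds to significance at the 5% level, and to back out the standard error on the log scale. The sketch below applies this to the provincial or central hospital mortality estimates quoted in this paragraph. The function names are illustrative, and the log-scale standard error assumes a symmetric Wald interval, which the paper does not state.

```python
import math

# (HR, lower, upper) by starting state for provincial or central hospital
# patients, as quoted in the text.
provincial_hr = {
    1: (1.89, 1.32, 2.67),
    2: (3.36, 2.05, 5.52),
    3: (1.25, 0.73, 2.16),
    4: (2.23, 1.80, 2.74),
}


def excludes_one(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1 or upper < 1


def log_se(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(HR), assuming a symmetric Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


for state, (hr, lo, hi) in provincial_hr.items():
    print(state, hr, excludes_one(lo, hi), round(log_se(lo, hi), 3))
```

In this sketch only the state 3 estimate has an interval that spans 1, so it is the one estimate in the group that is not significant at the 5% level.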

## DISCUSSION

This study's objective was to describe HIV disease progression (immune deterioration) and immune recovery among adult patients on ART in Zimbabwe using patient-level data after ART decentralization. The study used semiparametric time-homogenous and time-inhomogenous multistate Markov models based on four intermediate transient CD4 cell count states, with mortality as the absorbing state. It was a quantitative secondary analysis of patient-level data routinely collected through the ePMS among HIV-infected patients on ART in Zimbabwe between 2004 and 2017. The study findings were comparable to earlier studies and indicated poorer immune recovery in PHC facilities compared to facilities at higher levels of care. The study produced significant findings on HIV disease progression and immune recovery based on CD4 cell counts after the decentralization of ART services. The estimated mortality rate of 3.9 per 100 person-years is low, and patients in state 4 (CD4 < 200) had the highest risk of death (5.9 per 100 person-years on average) compared to other states. This finding was evident throughout the time-varying analysis of rates. The high rates in state 4 (CD4 < 200) were consistent over time; however, there was a sharp seven-fold drop from 1 to 2 years since ART initiation. The finding of high rates in lower CD4 cell count states is comparable to findings from previous work in India and South Africa (13, 14). Immune deterioration was pronounced in patients aged 45 years and above, in patients at provincial or central hospital levels of care, and in male patients. However, immune recovery was also observed in this cohort, since there were more transitions and higher transition rates from lower to higher CD4 cell count states.
Moreover, patients in the high levels of care (district and provincial or central hospitals) had an increased probability of immune recovery compared to PHC facilities; however, mortality was high in the high levels of care. Male patients had an increased risk of mortality compared to female patients in this cohort.

Generally, there was a gradual improvement in CD4 cell count after ART initiation. This was evidenced by the higher immune recovery rates compared to immune deterioration rates.


TABLE 6 | Multiple variable estimates of the hazard ratios and 95% confidence intervals from the time-homogenous multistate Cox proportional hazard model among antiretroviral therapy (ART) patients in Zimbabwe national ART from 2004 to 2017.

*The covariates included in the adjusted model were those with complete observation information and considered possible risk factors for the individual transitions.*

\**Reference: Primary health care facilities.*

\*\**Reference: 25–34 years age group.*

\*\*\**Reference: female patients.*

\*\*\*\**Reference: 2004–2007 time period.*

*Bold face values are significant at 5%.*

This is an indication of effective ART treatment for HIV-infected individuals and suggests that, if ART is initiated in the early phases of HIV infection (with a baseline CD4 cell count of at least 350), immune recovery and reduced progression can be achieved since the immune system is still intact. This matches the findings reported in South Africa in a similar population (17). This study also found that a patient in state 1 (CD4 ≥ 500) is estimated to spend 11.3 years in this highest CD4 cell count state before death, which is similar to other findings (11). This means that individuals with good immunity, which can be attributed to the efficacy of the ART regimen, tend to live longer than those with weak immunity. This study further found that the probability of mortality increases with a decrease in CD4 cell count, which concurs with findings from similar settings (17, 30). This is explained by the fact that being in an AIDS-defining stage leads to the highest probability of mortality. The highest mean sojourn time was in state 4 (CD4 < 200) compared to other states. This finding can be explained by the fact that patients with deteriorated immunity (low CD4 cell count) take a longer time to respond to treatment and boost their immunity before moving to lower-numbered (higher CD4) states (31). Research has shown that the CD4 cell count may remain unchanged despite a suppressed viral load because of weak CD4 cell recovery in some patients (32). This is a limitation of using CD4 cell count; hence, the use of viral load in monitoring the efficacy of ART treatment is recommended (33).

We found that the higher the level of care, the better the probability of immune recovery. Patients enrolled in provincial or central hospitals and in district facilities had an increased probability of immune recovery relative to those in PHC. The likelihood of immune recovery increased with the level of care regardless of the immune status of the patient. This result may reflect the greater resources, provided through government channels or donor funding, and the skilled personnel available at the higher levels of care (21). Although patients prefer PHC facilities for ART services because of reduced transport costs and ease of access (20), these facilities are most likely understaffed. In addition, PHC facilities are at times overburdened, resulting in a long patient care turnaround time (34–37). Surprisingly, we observed relatively high mortality rates among patients enrolled in higher levels of care, whereas one would anticipate the opposite. However, this finding could be explained by either the referral system of patients within the patient care cascade or the "silent transfer" of patients from one health facility to another in search of better care (38–40). This means that the tertiary health facilities were more likely to receive patients who were more seriously ill and had a greater likelihood of death (38, 41, 42).

As we accounted for interindividual variability effects to gain more insight into HIV disease progression in this cohort, we found that HIV patients who were aged 15–24 years at ART initiation tended to have higher mortality than patients aged 25–34 years, and the progression to death was much more pronounced if a patient was coming from state 1 (CD4 ≥ 500) or state 2 (350 ≤ CD4 < 500). This finding supports earlier studies which showed that adolescents are heavily burdened by chronic complications and hence require a high level of patient management (43). In addition, this group is vulnerable, prone to stigma and to various chronic comorbidities, and in the transitional stage of becoming independent without much parental care. Intensifying community-based support for caregivers can help reduce poor health outcomes in adolescence (44). However, more research is required to confirm the association observed in our study. Patients aged 45 years and above showed a higher risk of immune deterioration compared to younger patients (25–34 years), which was similar to other studies reporting that younger people have a higher probability of immune recovery than the elderly (11, 12). This could be explained by the weaker immune response in older patients compared to young people; that is, the capacity to generate CD4 cells and suppress viral load is reduced in elderly patients (45). Moreover, this age group is strongly associated with non-communicable diseases such as hypertension and diabetes. Managing an HIV patient with multiple comorbidities is known to be complex, and the intake of different drugs results in overlapping drug toxicity and a lowering of the ART drug effect (35). As a result, most patients with comorbidities (communicable or non-communicable diseases) may either default on ART treatment or find that the ART drug becomes less effective because of the presence of other medications; therefore, these patients subsequently get worse. These results confirm the need for test and treat regardless of disease stage and age, which has a strongly positive influence in patients aged 45 years and above (46, 47).

In our study, we found that male patients had higher rates of immune deterioration. This was quite pronounced for the transition from state 3 (200 ≤ CD4 < 350) to state 4 (CD4 < 200). In addition, we observed poor survival outcomes among male patients. This finding is consistent with results from Shoko and Chikobvu (17), who found that men were six times more likely to move to higher CD4 cell count state. Another study which supports this result reported that male patients gain fewer CD4 cells than female patients and have a higher rate of immunological non-response than female patients (48). However, this finding contradicts earlier studies which documented that gender does not exhibit any significant differences in HIV disease progression (11, 12). The participants in this study were predominantly female, and this could mirror the fact that female patients have better involvement in HIV issues and better health-seeking behavior compared to male patients. Female patients have multiple entry points into HIV care, such as efficient linkage to ART treatment in antenatal clinics and prevention-of-mother-to-child-transmission programs, which result in better immune recovery than in male patients (48). Male involvement in HIV care strategies needs to be enhanced to complement the female role in HIV prevention (49–53). Therefore, there is a need to scale up the HIV testing rate among men, intensify repeated testing, and increase acceptance of HIV care linkages. Given the critical societal role played by men, they can improve decision making within a household and society at large if they are fully involved in HIV programs (54). There is a need to intensify existing strategies such as male circumcision, self-testing, and HIV programs at workplaces and recreational places, and also to introduce flexible clinic hours and conditions which accommodate men, such as shorter clinic turnaround times and increased privacy (48).

Our results should be viewed in light of some limitations. The dataset used had incomplete information, especially in the clinical parameters, which resulted in dropping a considerable portion of the data. In addition, this study could not adjust for ART adherence, which is an important issue in HIV disease progression since it is directly associated with the probability of moving to a lower CD4 cell count state if a patient fails to adhere to treatment. This study also considered only patients from ART centers linked to the ePMS; this might have caused overestimation or underestimation of the transition intensities reported in this study. The analysis was solely based on the time-homogenous assumption, which is much more useful in the presence of heavy right censoring. Earlier studies have shown that a patient on ART who is virally suppressed and adheres to treatment is likely to continue recovering well. However, such dependence on history violates the Markov (memoryless) property of these models, and this limitation affects the time-homogenous Markov process models. Other assumptions, such as non-Markovian, semi-Markovian, or hidden Markovian processes, can be explored by incorporating interval censoring and assuming time-varying effects. This model could not account for frailty terms to explain unobserved individual heterogeneity or for spatial effects to show regions with an increased likelihood of a particular transition.

Moreover, this study covers a period in which ART initiation guidelines were changed three times; hence, there could be some bias in the estimates. In addition, the period covered is mainly one in which the country was conducting targeted differential monitoring, whereby most CD4 measurements were taken at the discretion of the physician. The authors acknowledge the measurement error (55) associated with CD4 cell counts in ART monitoring, since a patient's measurement may indicate a lower CD4 cell count when in fact the patient had recovered, hence the switch to viral load in ART monitoring.

There could be participant inclusion bias in this study since we excluded those who were lost to follow-up (LTFU), ending up with a subsample. This group was excluded to obtain a less complicated model with fewer states, since LTFU would have been a stand-alone compartment. However, this may have affected the generalizability of our research findings, in that the model is not a complete picture of the transition patterns in an ART program because some of the exit points were excluded. The majority of the patients who became LTFU were those who were very sick (with CD4 < 200), and had they been tracked, some of them may have been found to have died (56). Such an LTFU pattern normally leads to data missing not at random in longitudinal time-to-event studies. Had we included the LTFU group and right censored them in their last observed states, this would have caused an upward bias of the Kaplan–Meier curve, which at times may affect the generalizability of the findings (57). In future studies, it would be essential to include LTFU and withdrawal states in the model to obtain detailed transition patterns of these outcomes in an ART program. Our data did not allow us to estimate transitions to AIDS, since that information was not available, or to adjust exhaustively for comorbidities other than tuberculosis that might be linked to the observed transition patterns in this cohort. However, tuberculosis was not included as a covariate because fitting this large dataset becomes highly computationally intensive when many covariates are added. Hence, we restricted our analysis to demographic covariates to attain convergence. A notable limitation of this study is the low mortality rate, with most deaths occurring among patients who initiated ART in the 2013–2017 period. A plausible explanation is a biased dataset in terms of the capture of patients' information. It is likely that many of the deaths that occurred earlier were lost during data capture from patient files to the electronic database, since this was a retrospective exercise. Thus, we most likely have the long-term survivors from the early period.

## CONCLUSION

Multistate models are crucial in describing general disease trajectories through intermediate states, so that programs can respond before an adverse event occurs. Our findings have significant implications for the continuum of HIV care. It is prudent to target early ART initiation to prevent subsequent immune deterioration. Once this is achieved, survival outcomes and quality of life can be improved, with a subsequent reduction in opportunistic infections. Strengthening PHC facilities in ART provision is imperative in a decentralized environment. More aggressive strategies are needed to strengthen male involvement in HIV care, and the management of adolescents and young adults has to be scaled up to prevent ART defaulting and avert poor health outcomes.

## DATA AVAILABILITY STATEMENT

The dataset used for this study can be obtained through an application to the Ministry of Health and Child Care in Zimbabwe, which is the custodian of the ePMS data through the AIDS/TB Unit, which manages and oversees the ePMS data collection process.

## REFERENCES


## AUTHOR CONTRIBUTIONS

ZM cleaned and analyzed the data and drafted the manuscript. JT and TC reviewed the manuscript and advised on analysis. EM guided and oversaw the analysis and reviewed the manuscript. All authors reviewed the final manuscript before submission.

## FUNDING

This work was supported by the Developing Excellence in Leadership, Training and Science (DELTAS) Africa Initiative Sub-Saharan Africa Consortium for Advanced Biostatistics (SSACAB) (Grant No. 107754/Z/15/Z). The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS) Alliance for Accelerating Excellence in Science in Africa (AESA) and was supported by the New Partnership for Africa's Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust (Grant No. 107754/Z/15/Z) and the UK government. The views expressed in this publication are those of the authors and not necessarily those of the AAS, NEPAD Agency, Wellcome Trust, the UK government, or Zimbabwe Ministry of Health and Child Care.

## ACKNOWLEDGMENTS

Our acknowledgments go to the Ministry of Health and Child Care, AIDS/TB Unit, for the support and compilation of the data used in this study. We also thank the Division of Epidemiology and Biostatistics at the School of Public Health for their assistance in obtaining ethical approval for this study.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2019.00326/full#supplementary-material




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Matsena Zingoni, Chirwa, Todd and Musenge. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Effectiveness of Lifelong ART (Option B+) in the Prevention of Mother-to-Child Transmission of HIV Programme in Zambia: Observations Based on Routinely Collected Health Data

Brian Muyunda<sup>1,2</sup>\*<sup>†</sup>, Patrick Musonda<sup>1†</sup>, Paul Mee<sup>3†</sup>, Jim Todd<sup>3†</sup> and Charles Michelo<sup>1†</sup>

*<sup>1</sup> Department of Epidemiology and Biostatistics, The University of Zambia School of Public Health, Lusaka, Zambia, <sup>2</sup> Ministry of Health, University Teaching Hospital, Lusaka, Zambia, <sup>3</sup> Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom*

### Edited by:

*Mitsunori Ogihara, University of Miami, United States*

### Reviewed by:

*Shihao Yang, Harvard Medical School, United States Abu Saleh Mohammad Mosa, University of Missouri, United States*

> \*Correspondence: *Brian Muyunda muyundamwinanu@ymail.com*

*†These authors have contributed equally to this work*

### Specialty section:

*This article was submitted to Digital Public Health, a section of the journal Frontiers in Public Health*

Received: *07 May 2019* Accepted: *16 December 2019* Published: *17 January 2020*

### Citation:

*Muyunda B, Musonda P, Mee P, Todd J and Michelo C (2020) Effectiveness of Lifelong ART (Option B+) in the Prevention of Mother-to-Child Transmission of HIV Programme in Zambia: Observations Based on Routinely Collected Health Data. Front. Public Health 7:401. doi: 10.3389/fpubh.2019.00401*

Background: Mother-to-child transmission of HIV (MTCT) is a global challenge affecting many countries, especially in sub-Saharan Africa. In 2009, about 370,000 infants were infected with HIV, mainly through MTCT, and most of them were in sub-Saharan Africa. We aimed to determine the effectiveness of Option B+ compared to other options in reducing rates of early MTCT of HIV in Zambia.

Methods: This was a retrospective cohort study based on data routinely collected using SmartCare in Zambia. Survival analysis with Cox proportional hazards regression was used to determine the association between MTCT and the mother's regimen type. Kaplan-Meier (K-M) curves were used to compare MTCT for infants born to mothers on Option B+ with those born to mothers on other options, and the Wilcoxon (Breslow) test was used to establish statistical significance.

Results: Overall, 1,444 mother-baby pairs with complete data were included in the analysis (*n* = 1,444); the median age of the mothers was 33 (28–38) years, and 57% of these women were on Option B+. The MTCT rate was estimated at 5% (73/1,444) [*P* = 0.025]. A Kaplan-Meier estimate showed that HIV Exposed Infants (HEI) of mothers on Option B+ had a lower MTCT rate than those whose mothers were on other MTCT prevention interventions [Wilcoxon test; chi2 = 4.97; *P* = 0.025]. Furthermore, the Nelson-Aalen cumulative hazard estimates gave similar evidence of Option B+ being more effective than other options, with some statistical significance [HR = 0.63, *P* = 0.068]. HEI of Option B+ mothers had a 50% reduced risk of HIV infection compared to those of Option A/B mothers [adjusted HR = 0.4; 95% CI = 0.28–0.84; *P* = 0.010]. HEI of married women had a 50% increased risk of infection compared to those of unmarried women [adjusted HR = 1.5; 95% CI = 3.43–6.30; *P* < 0.001]. Exposed infants whose mothers had an assisted delivery had a 3 times increased risk of infection compared to those born through normal vaginal delivery [Adjusted HR = 3.2; 95% CI = 0.98–10.21; *P* = 0.050].


Conclusions: The use of Option B+ as PMTCT intervention was found to be more effective in reducing MTCT of HIV compared to other options. Scaling up access to life-long ART and improving retention for women on treatment can potentially reduce further vertical transmission.

Keywords: PMTCT, pregnant women, option B+, routine data, HEI, Zambia

## INTRODUCTION

Mother-to-child transmission of HIV (MTCT) is a global challenge affecting many countries, especially in sub-Saharan Africa. In 2009, about 370,000 infants were infected with HIV, mainly through MTCT, with most of them in sub-Saharan Africa (1). By 2012, the number of newly infected infants globally had come down to 260,000 (2, 3), and by 2015 only 150,000 children were newly infected with HIV at birth [(4); UNGASS]. In Zambia, MTCT is one of the key drivers of the HIV epidemic, accounting for 10% of all new HIV infections and 90% of infections in children. Without antiretroviral therapy, 15–30% of babies born to HIV-positive women are infected during pregnancy and delivery, while a further 5–20% become infected through breastfeeding (4–6). In resource-constrained countries, approximately one third of HIV-infected children die before 1 year and more than half die before their second year (7–12).

In 2008, 16.4% of women attending antenatal clinic (ANC) in Zambia were HIV positive, putting 80,000 infants at risk of getting infected through MTCT (5, 13). The Zambia ministry of health integrated PMTCT into Maternal and child health (MCH) to help reduce MTCT of HIV and to decrease both maternal and child mortality (4, 6). In an effort to further reduce MTCT of HIV, in 2013 Zambia adopted the World Health Organization (WHO) guidelines and introduced Option B+ as a new strategy within the PMTCT program. In the same year, the national PMTCT program recommended that all infants born to HIV positive mothers had a virological antigen test for HIV within the first 6 weeks and a second test at 6 months of life. HIV rapid antibody tests would only be used at the age of 12 and 18 months to check on the HIV status of the infants (14). Option B+ requires initiation of all HIV positive pregnant and breastfeeding women onto lifelong Antiretroviral therapy (ART), regardless of CD4<sup>+</sup> cell count or WHO clinical staging.

Before the adoption of Option B+, the Zambian National Guidelines for PMTCT, updated in 2007–2009, specified that women eligible for lifelong combination Antiretroviral Therapy (cART) under Option A/B were those with an absolute CD4 count ≤350 cells/mm<sup>3</sup> (regardless of clinical stage). The Option A regimen included, for the mother, AZT starting at 14 weeks gestation followed by single-dose Nevirapine (sd-NVP) and AZT/3TC at delivery and for 7 days postpartum and, for the infant, daily NVP from birth until 1 week after breastfeeding cessation, or for 4–6 weeks if the infant was not breastfed or the mother was on triple ART. Option B, on the other hand, included triple ARV prophylaxis for the mother starting at 14 weeks gestation and ending at delivery or 1 week after breastfeeding cessation and, for the infant, daily NVP or twice-daily AZT for 4–6 weeks with replacement feeding, or daily NVP for 6 weeks with breastfeeding. These criteria had a negative impact on the effectiveness of the Option A/B regimen because of resource constraints that included (1) the capacity of health centers to assess CD4 count, (2) the availability of CD4 count results at clinics for decision making, and (3) the capacity to initiate cART. One study conducted in Zambia showed that test results of 33.5% of blood samples collected for CD4 count were never returned to the clinic. Only a minority of HIV-positive pregnant women were assessed for CD4 count and had their test results available. Among HIV-positive women whose CD4 count results were available, 47% were eligible for cART under the cell count threshold of ≤350 cells/mm<sup>3</sup>. Frequent breakdown of CD4 count machines, an insufficient number of trained laboratory technicians to run CD4 count laboratory equipment, lab fees applied in some facilities for CD4 count, and clerical errors all compounded the problem (4, 5, 15, 16). Women who were not eligible for lifelong ART were given a short course of prophylactic treatment designed to protect the infant from MTCT of HIV (**Table 1**).

The Option B+ initiative started in Malawi in 2011 because of the country's high HIV prevalence, short birth intervals (median = 3 years), high fertility (total fertility rate = 5.7), extended breastfeeding and limited laboratory capacity (14, 17, 18). Zambia shared many of Malawi's characteristics for MTCT of HIV. However, there has been great controversy over whether the Option B+ strategy is the best approach to achieving elimination of mother-to-child transmission, from its inception in Malawi in 2011 through its adoption in other resource-constrained regions with limited laboratory capacity. This study aimed to establish the effectiveness of Option B+ compared to other PMTCT interventions and the factors associated with MTCT. The results will help fill the increasing gap between established policy on PMTCT strategies, particularly Option B+ as an effective approach to reduce HIV transmission, and the social practices associated with program feasibility, accessibility, uptake, and retention in care.

There is a substantial body of literature from elsewhere on the effectiveness of the PMTCT programme in reducing transmission from mother to child, but data from Africa about the operational effectiveness of Option B+ in PMTCT are sparse (17, 19–27). In Zambia in particular, the effectiveness of Option B+ has not been evaluated, as far as we are aware, which raises concerns about its effectiveness in the elimination of MTCT. This study measured the MTCT rate of HIV on Option B+ compared to other options in cohorts of mother-baby pairs that were part of the national PMTCT programme.

**Abbreviations:** MTCT, Mother to Child Transmission; PMTCT, Prevention of Mother to Child Transmission; EID, Early Infant Diagnosis; HEI, HIV Exposed Infants; ANC, Antenatal Care; eMTCT, elimination of Mother to Child Transmission; ART, Antiretroviral Therapy; cART, Combined Antiretroviral Therapy; HIV, Human Immunodeficiency Virus; DBS, Dry Blood Sample; DNA PCR, Deoxyribonucleic Acid Polymerase Chain Reaction; CDC, Centers for Disease Control and Prevention; UNZABREC, University of Zambia Biomedical Research Ethics Committee; MOH, Ministry of Health; WHO, World Health Organization.

TABLE 1 | Treatment algorithm and transition of PMTCT strategies in Zambia for HIV positive women and their exposed babies.

## METHODS

## SmartCare Design

SmartCare is an electronic health record system that stores individual patient information at about 600 government health facilities in Zambia. It can be used for monitoring patient treatment and outcomes and for reporting health service delivery at district, provincial and national levels across all districts of Zambia (14). SmartCare is a public-domain data system that uses microchip, touch screen, and solar technologies to improve health records of patient care and to enable public health reporting for persons attending health facilities. It records all patient interactions and subsequent visits to the health facility, including clinical appointments and laboratory and pharmacy data, under a unique identification number. This electronic health record system stores patient information on a computer, as well as on a smart card, and easily produces reports at facility, district, provincial or national levels. SmartCare provides greater continuity of clinic-based care and increases the privacy of sensitive medical information in services for family planning, sexually transmitted infections and HIV. For pregnant women, SmartCare is used for antenatal clinic visits and enrolment into PMTCT services. SmartCare aims to reduce the burden of paperwork on health staff and improve the quality of information and decision support for patients, while providing automated information flow into the government's existing Health Management Information System (ZHMIS) (28).

## Option B+ Effectiveness Design Sampling and Setting

This was a retrospective cohort study of HIV-infected women and their infants with data recorded in SmartCare. The Zambia ART programme has opt-out HIV testing for all eligible pregnant women, with all women who have a positive HIV test result, and those with a known positive status, being enrolled into the PMTCT programme. In the PMTCT programme they receive a comprehensive intervention to prevent MTCT of HIV. Since 2013, all women enrolled into the PMTCT programme have been treated with lifelong ART (Option B+) regardless of their CD4 cell count or WHO staging (Zambia Consolidated PMTCT Guidelines) (see **Table 1**).

All women enrolled in the PMTCT programme with records in the SmartCare database between 2007 and 2017 were included in the study. The outcome was measured in HEI born to women who were HIV positive.

## Data Extraction and Management

Data were abstracted for HIV-infected pregnant and breastfeeding women who were enrolled into the PMTCT and ART registers in the SmartCare database between 2007 and 2017. All records of HEI were paired with those of the HIV-infected women. The extracted data included the demographic characteristics of the pregnant women at enrolment; their entry point through HIV counseling and testing; history of antenatal care for the most recent birth; full birth history on labor and delivery; ART regimen type; mode of delivery; postnatal and follow-up data for both mother and new-born baby; socio-economic status; and educational attainment.

## Statistical Analysis

All analyses were done using Stata software version 14 (StataCorp, College Station, Texas). Descriptive analyses were used for the characteristics of the pregnant women and their babies. The outcome of interest was time to an HIV-positive result in the babies, with those who tested HIV negative censored at the date of the negative test.

TABLE 2 | Characteristics of HIV positive women on Lifelong ART (Option B+) from Zambia SmartCare routinely collected data, 2005–2017.

*Sample size* = *1,444.*

*Median (IQR) age* = *33 (28–38) years.*

*Mean baseline CD4*<sup>+</sup> *cell count* = *467 cells/ml (SD 246.5). 95% CI obtained using Pearson chi squared.*

Kaplan-Meier (K-M) graphs were used to show time to HIV positivity, comparing women with different characteristics. Survival analyses were done to determine and compare the rate of transmission between HIV-positive pregnant women on Option B+ and those on the other regimens in the PMTCT program. The Wilcoxon (Breslow) test was used to establish the statistical significance of the difference in survival between Option B+ and other interventions. Both single and multiple Cox proportional hazards regression models were fitted to determine the rate of transmission of HIV and potential confounders of MTCT. The Nelson-Aalen cumulative hazard estimates were used to assess the risk of transmission between the two regimens. The validity of the proportional hazards assumption was assessed using stph-plots for the treatment regimens. Bivariate analysis using Pearson's chi-squared test was used to determine the crude associations between Option B+ and infant HIV status. The rate of vertical transmission was used as a proxy for effectiveness, with a rate of 5% or less taken as effective according to the WHO universal goal (4).
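As an illustration of the time-to-event approach described above, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the authors' Stata code; the function name `kaplan_meier` and the toy follow-up data are hypothetical and are used only to show how infants with a negative test (censored) and infants with a positive test (events) enter the estimate.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each infant (e.g. months)
    events -- 1 if an HIV-positive result was observed at that time,
              0 if the infant was censored (last test negative)
    """
    n_at_risk = len(times)
    # Number of events (positive tests) at each distinct time.
    n_events = Counter(t for t, e in zip(times, events) if e == 1)
    # Number of infants leaving the risk set at each time (event or censored).
    n_leaving = Counter(times)
    survival = 1.0
    curve = []
    for t in sorted(n_leaving):
        d = n_events.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the share still uninfected at t.
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving[t]
    return curve


# Hypothetical toy data, not taken from the study.
toy_times = [2, 3, 3, 5, 8, 8, 12, 12]
toy_events = [1, 0, 1, 0, 1, 0, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"month {t}: estimated HIV-free proportion {s:.4f}")
```

Computing one such curve per regimen group and comparing them is the idea behind the K-M graphs; the Wilcoxon (Breslow) test then weights the differences between the curves by the number of infants still at risk at each event time.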

## Ethics Consideration

Permission was sought from the Zambia Ministry of Health (MOH) and the Centers for Disease Control Zambia (CDC) to use SmartCare patient data. A waiver was obtained from the University of Zambia Biomedical Research Ethics Committee (UNZABREC), reference number 010-04-18, which granted permission to conduct this study on the HIV cascade in PMTCT and associated factors. All SmartCare data had personal identifiers removed to maintain the confidentiality and anonymity of the participants.

TABLE 3 | Bivariate analysis of background characteristics and Regimen type of HIV Positive mothers from Zambia SmartCare routinely collected data, 2007–2017.

*Sample size* = *1,444.*

*Median (IQR) age* = *33 (28–38) years.*

*Mean baseline CD4*<sup>+</sup> *cell count* = *467 cells/ml (SD 246.5).*

\**P-Values obtained using Pearson chi squared.*

\*\**P-Values obtained using Fisher's exact test.*

*Bold values represent a significance of p* < *0.2.*

## RESULTS

## Participation and Distribution

A total of 1,444 mothers were matched to the SmartCare records of their infants and included in the analysis. The mothers were aged 15–50 years, with a median (IQR) age of 33 (28–38) years. Further, 87% (1,185) were married and 56% (660) had attained primary education only. In addition, 50% (710) of the mothers reported a parity of 0–1, whilst 10% (145) had five or more previous births; normal vaginal delivery was reported by almost all women, 97% (1,386).

The mean baseline CD4<sup>+</sup> cell count was 467 cells/ml (SD 246.5). Of all the women, 40% (580) had been enrolled on Option B+, of whom only 1.4% (12) were reported as non-adherent (**Table 2**).

HEI of mothers on the Option B+ regimen had a lower transmission rate, 2.9% (17/580), than those whose mothers were on other regimens (P = 0.003). MTCT was higher among women aged 25–34 years, at 7% (46/697) (P = 0.034). Furthermore, married women had a higher transmission rate, 12% (6/47), than those not married, 5% (60/1,125) (P < 0.001). HIV-positive mothers who delivered through assisted means were more likely to transmit the virus to their infants, 12% (3/25), than those with a spontaneous vaginal delivery, 5% (70/1,386), or cesarean section, 0% (0/18) (P = 0.184) (**Table 3**).
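The crude comparison of transmission by regimen can be sketched as a Pearson chi-squared test on a 2x2 table. The sketch below is illustrative only: the Option B+ counts (17 of 580) and the overall totals (73 of 1,444) are taken from the text, while the counts for the other regimens (56 of 864) are an assumption obtained by subtraction and may differ from the table used by the authors. The function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]] with 1 degree of freedom."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: Option B+, other regimens. Columns: infant HIV positive, HIV negative.
b_plus_pos, b_plus_total = 17, 580
all_pos, all_total = 73, 1444
other_pos = all_pos - b_plus_pos          # assumed: 56, derived by subtraction
other_total = all_total - b_plus_total    # assumed: 864, derived by subtraction

stat, p_value = chi2_2x2(
    b_plus_pos,
    b_plus_total - b_plus_pos,
    other_pos,
    other_total - other_pos,
)
print(f"chi-squared = {stat:.2f}, p = {p_value:.4f}")
```

Under these assumed counts the p-value rounds to the P = 0.003 reported for the regimen comparison, which suggests, but does not confirm, that the reported value comes from a table of this form.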

## Key Predictors for Mother to Child Transmission of HEI

In the survival analysis, the 1,444 children born to HIV-positive mothers contributed a total follow-up time of 38,520 months. There were 73 recorded HIV-positive tests, giving an MTCT rate of 5.1 per 100 live births. A Kaplan-Meier (K-M) estimate showed that HIV-exposed infants of mothers recruited on Option B+ had lower MTCT than those of mothers recruited on the other options. A Wilcoxon (Breslow) test for equality of survival functions showed that the observed difference in survival between Option B+ and other PMTCT options was statistically significant (chi2 = 4.95, P = 0.025). The proportional hazards assumption was assessed using stph-plots for treatment regimen and was satisfied.
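The summary figures in this paragraph follow from simple arithmetic on the three reported totals. The short sketch below reproduces that arithmetic; the variable names are hypothetical, and the rate per 1,000 child-months and the mean follow-up are derived here for illustration and are not quantities reported by the authors.

```python
positive_tests = 73      # recorded HIV-positive tests (from the text)
infants = 1444           # HIV-exposed infants in the analysis (from the text)
child_months = 38520     # total follow-up time in months (from the text)

# Proportion infected, expressed per 100 live births.
mtct_per_100_births = 100 * positive_tests / infants

# Derived for illustration: incidence rate per 1,000 child-months of follow-up.
rate_per_1000_child_months = 1000 * positive_tests / child_months

# Derived for illustration: average follow-up per infant, in months.
mean_follow_up_months = child_months / infants

print(f"MTCT per 100 live births: {mtct_per_100_births:.1f}")
print(f"Rate per 1,000 child-months: {rate_per_1000_child_months:.2f}")
print(f"Mean follow-up (months): {mean_follow_up_months:.1f}")
```

The first quantity is a cumulative proportion and the second is a rate with person-time in the denominator; the survival methods used in the study work with the second kind of quantity.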

Furthermore, the Nelson-Aalen cumulative hazard estimates (HR = 0.63, P = 0.025) and the smoothed hazard estimates gave similar evidence of a statistically significant difference in transmission of HIV between infants of Option B+ mothers and infants of Option A/B mothers (**Figures 1–3**).

Infants of HIV-positive mothers on Option B+ had a 50% reduced risk of HIV infection through vertical transmission compared to infants of mothers on Option A/B [adjusted HR = 0.4; 95% CI = 0.28–0.84; P = 0.010]. HEI of married Option B+ women had a 50% increased risk of infection compared to those of unmarried mothers [adjusted HR = 1.5; 95% CI = 3.43–6.30; P < 0.001]. Furthermore, exposed infants whose mothers had an assisted delivery had a 3 times increased risk of infection compared to those whose mothers had a normal vaginal delivery [Adjusted HR = 3.2; 95% CI = 0.98–10.21; P = 0.050] (**Table 4**).
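A common way to read a hazard ratio is as a percent change in the hazard relative to the reference group, (HR − 1) × 100. The sketch below applies that conversion to the hazard ratios as printed in the text. It is a reading aid under that standard interpretation and does not re-estimate any model; the function name `percent_change_in_hazard` is hypothetical.

```python
def percent_change_in_hazard(hazard_ratio):
    """Percent change in hazard relative to the reference group.

    Negative values are reductions, positive values are increases.
    """
    return round((hazard_ratio - 1.0) * 100.0, 1)


# Hazard ratios as printed in the text.
reported_hazard_ratios = {
    "Option B+ vs Option A/B (adjusted)": 0.4,
    "Married vs not married (adjusted)": 1.5,
    "Assisted vs normal vaginal delivery (adjusted)": 3.2,
    "Option B+ vs other options (Nelson-Aalen)": 0.63,
}

changes = {
    label: percent_change_in_hazard(hr)
    for label, hr in reported_hazard_ratios.items()
}
for label, change in changes.items():
    direction = "lower" if change < 0 else "higher"
    print(f"{label}: hazard {abs(change):.0f}% {direction}")
```

A hazard ratio is a ratio of instantaneous rates, so these percentages describe relative hazards and not differences in the cumulative proportion of infants infected.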

## DISCUSSION

In 2014, the recorded national HIV vertical transmission rate was 9% (29). Our current data suggest an HIV transmission rate of 5%. This is lower than the sub-Saharan average observed in one cohort study conducted in Ethiopia and other African countries, which showed that among 221 live births to HIV-positive mothers the MTCT rate was between approximately 8 and 10% (30). The Global eMTCT Plan recommends providing comprehensive PMTCT services to at least 95% of pregnant women, reducing MTCT to <5% by the year 2015 and achieving zero transmission by 2030 (31).

TABLE 4 | Cox proportional hazard analysis of background characteristics and Regimen type of HIV Positive women from Zambia SmartCare routinely collected data, 2018.


*[a] Missing Confidence intervals because of missing standard errors due to stratum with single sampling unit.*

The level of HIV testing uptake among ANC women has increased substantially since PMTCT was introduced in Zambia in 1999. However, failure to attend all clinical appointments and to adhere to treatment among Option B+ mothers contributes to missed opportunities for Early Infant Diagnosis (EID), and as a result many infants are not tested until after 24 months, as observed in this study. Similar findings were observed in a retrospective follow-up study in sub-Saharan Africa, from 2004 to 2009, on HIV testing of infants aged ≥18 months: only 896 (10.6%) of infants completed the follow-up HIV testing, of whom 106 were found to be positive, representing a 14.3% vertical transmission rate (30). Among the key challenges faced in the diagnosis of infants are lack of training to collect and handle Dry Blood Samples (DBS), results not being collected from the central laboratory, and misplacement of results within the health facility before they reach the mothers. An infant is presumed HIV uninfected after negative DNA PCR assays at 6 and 14 weeks of age. A child is classified as HIV uninfected if both antibody tests are negative at or after 18 months. It is only by addressing these challenges that the benefits of Option B+ can be realized.

Several potential limitations to this study were observed. Firstly, the main limitation was the inability of SmartCare to link mothers to their infants, which was crucial for this study as it affected the sample size. Secondly, Option B+ (lifelong ART) coverage in Zambia expanded gradually after its adoption in 2013, and many of the health facilities were still using Option A/B, which could have reduced the observed effect of the change. In addition, data were available only from facilities where the SmartCare system was active and functional. Another limitation is that, because the data were routinely collected, critical variables such as EID, the mother-baby pair link, retention patterns, loss to follow-up, and health care utilization, including the frequency of a patient's appearance in the ANC records, were missing. The frequency of health care utilization could have been used to adjust for patients who visited the facility more often, whose events would be recorded sooner and could thus bias the time-to-event analysis. Despite these challenges, the data provide better estimates of the effectiveness of Option B+. Moreover, most of the facilities affected by the Option B+ roll-out were in rural areas. Furthermore, considering that this is, as far as we are aware, one of the first studies to document the experience of implementing Option B+ in Zambia, which will help accelerate progress toward the ambitious 2030 goal of zero HIV transmission, we believe this study was worth undertaking.

The Option B+ regimen for mothers and infants offers significant benefits for transmission prevention, maternal health and public health program delivery. It presents distinct advantages in terms of preventing transmission to uninfected partners and in its greater simplicity, potentially improving program feasibility, access, uptake and cost effectiveness. Despite these benefits, concerns have been raised about the safety of ART exposure for fetuses and infants, as well as adherence challenges for pregnant and breastfeeding mothers (32). Similar Option B+ benefits were observed in a comparative cohort study conducted in Malawi of 102 women on ART prior to Option B+ and 109 women on Option B+, which showed that women on Option B+ had fewer WHO stage 3/4 conditions, higher CD4 counts and lower mortality than those in the pre-Option B+ cohort (22). The high mortality and poor health of pre-Option B+ women had a direct effect on the health and survival of their infants, because pregnant women with a high viral load and a lower CD4 count are more likely to transmit HIV to their new-born babies (33). Furthermore, in another study, women on Option B+ had lower mortality than those enrolled on the basis of the CD4 cell count or WHO clinical stage criterion. Mortality among the women on Option B+ during pregnancy was 0.4%, while among those enrolled based on the CD4 cell count or WHO clinical stage 3/4 criterion it was 3% (17).

Another study, on retention of pregnant and breastfeeding women in Malawi, observed that although most women (83%) starting ART with Option B+ remained in care, over 17% were lost to follow-up within the 6-month period, most of them in the first 3 months of therapy. The results further showed that Option B+ women who started therapy during pregnancy were 5 times more likely than pre-Option B+ women never to return for their next clinical follow-up (34). These results indicate that although Option B+ therapy possibly had a better outcome, retention of women in care and loss to follow-up, especially after delivery, were a challenge. A possible explanation is that women started on Option B+ were still in good health, with a good immune system, a high CD4 cell count and low viral copies, and so did not keep their clinical appointments or adhere to therapy, potentially increasing the chances of HIV transmission to the infant (22, 33). Universal access to HIV testing in ANC and 100% linkage to care and treatment, coupled with strategies to improve retention and adherence to treatment, are crucial to further reduce the vertical transmission rate.

## CONCLUSION

In Zambia, Option B+ was found to be more effective than other options in reducing MTCT rates to acceptably low levels, opening opportunities for scaling up access to lifelong ART and improving retention, which could contribute to a sustained reduction in vertical transmission. However, these findings also suggest the need for programmatic efforts to identify other maternal health survival bottlenecks that could hamper universal access to PMTCT interventions for all mother-baby pairs on lifelong ART in poorly accessed groups. This may include strategies to prevent missed clinical appointments, missed infant post-natal follow-up and eventual non-retention. Last but not least, these findings indirectly suggest the need for further integration of ANC services to include innovative PMTCT interventions as part of a total service package.

## RECOMMENDATIONS

Since Early Infant Diagnosis at the recommended time is directly linked to care and treatment, supporting existing Government measures to retain HIV-infected women in the eMTCT programme, in order to improve access to universal HIV treatment and care among women, is key to addressing barriers to increased uptake of PMTCT. Strengthening HIV testing in ANC, especially in rural health facilities, encouraging women to adhere to treatment and attend all clinical appointments, and providing initiatives that seek to overcome barriers to treatment are some of the ways that can help improve maternal and new-born health. Furthermore, health workers should ensure that HIV-infected women on Option B+ are retained in care and bring their babies for clinical appointments and testing at the recommended schedules.

## DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the Zambia Ministry of Health. These datasets can be obtained on request from the Zambia Ministry of Health.

## ETHICS STATEMENT

Permission was sought from the Zambia Ministry of Health (MOH) and the Centers for Disease Control Zambia (CDC) to use SmartCare patient data. A waiver was obtained from the University of Zambia Biomedical Research Ethics Committee (UNZABREC), reference number 010-04-18, which granted permission to conduct this study on the HIV cascade in PMTCT and associated factors. All SmartCare data had personal identifiers removed to maintain the confidentiality and anonymity of the participants.

## AUTHOR CONTRIBUTIONS

BM conceived the study ideas, design, analyzed data, wrote the draft manuscript, and wrote the final manuscript. CM participated in the study design, methods, analysis, and edited the final manuscript. PMe contributed to the analysis, edited the manuscript, and made contributions to the final analysis. JT participated in the study design, methods, analysis, edited the manuscript, and contributed to the final analysis. PMu edited the manuscript and contributed to the final analysis.

## FUNDING

This study was supported by the Bill and Melinda Gates Foundation (OPP1084472), which provided the means for the corresponding author to undertake the analysis of Zambian routine health data, through a SEARCH fellowship.

## ACKNOWLEDGMENTS

We would like to express our warm gratitude to all members of staff of the School of Public Health, University of Zambia for their technical support during the research process. Our special gratitude also goes to the SEARCH project, London School of Hygiene and Tropical Medicine for their technical support during manuscript writing and analysis. We are deeply grateful to the Zambia Ministry of Health (MOH) for granting us permission to use SmartCare patient data.

## REFERENCES


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Muyunda, Musonda, Mee, Todd and Michelo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Characterizing a Leak in the HIV Care Cascade: Assessing Linkage Between HIV Testing and Care in Tanzania

Richelle Harklerode<sup>1</sup>\*, Jim Todd<sup>2</sup>, Mariken de Wit<sup>2</sup>, James Beard<sup>3</sup>, Mark Urassa<sup>3</sup>, Richard Machemba<sup>3</sup>, Bernard Maduhu<sup>4</sup>, James Hargreaves<sup>2</sup>, Geoffrey Somi<sup>5</sup> and Brian Rice<sup>2</sup>

*<sup>1</sup> Institute for Global Health Sciences, University of California, San Francisco, San Francisco, CA, United States, <sup>2</sup> London School of Hygiene and Tropical Medicine, London, United Kingdom, <sup>3</sup> National Institute for Medical Research, Mwanza, Tanzania, <sup>4</sup> Magu District Council, Mwanza, Tanzania, <sup>5</sup> Ministry of Health, Community Development, Gender, Elderly and Children, Dar es Salaam, Tanzania*

Background: In Tanzania, HIV testing data are reported aggregately for national surveillance, making it difficult to accurately measure the extent to which newly diagnosed persons are entering care, which is a critical step of the HIV care cascade. We assess, at the individual level, linkage of newly diagnosed persons to HIV care.

### Edited by:

*Remco P. H. Peters, University of Pretoria, South Africa*

### Reviewed by:

*Kate Rees, University of the Witwatersrand, South Africa Joshua Murphy, Wits Health Consortium (WHC), South Africa*

\*Correspondence:

*Richelle Harklerode richelledh@yahoo.com*

### Specialty section:

*This article was submitted to Infectious Diseases - Surveillance, Prevention and Treatment, a section of the journal Frontiers in Public Health*

Received: *19 September 2019* Accepted: *19 December 2019* Published: *30 January 2020*

### Citation:

*Harklerode R, Todd J, de Wit M, Beard J, Urassa M, Machemba R, Maduhu B, Hargreaves J, Somi G and Rice B (2020) Characterizing a Leak in the HIV Care Cascade: Assessing Linkage Between HIV Testing and Care in Tanzania. Front. Public Health 7:406. doi: 10.3389/fpubh.2019.00406*

Methods: An expanded two-part referral form was developed to include additional variables and unique identifiers. The expanded form contained a corresponding number for matching the two parts between testing and care. Data were prospectively collected at 16 health facilities in the Magu District of Tanzania.

Results: The records of 1,275 unique people testing HIV positive were identified and included in our analysis. Of these, 1,200 (94.1%) responded on previous testing history, with 184 (15.3%) testing twice or more during the pilot, or having had a previous HIV positive test. Three-quarters (932; 73.1%) of persons were linked to care during the pilot timeframe. Health service provision in the facility carrying out the HIV test was the most important factor for linkage to care; poor linkage occurred in facilities where HIV care was not immediately available.

Conclusions: It is critical for persons newly diagnosed with HIV to be linked to care in a timely manner to maximize treatment effectiveness. Our findings show it is feasible to measure linkage to care using routinely collected data arising from an amended national HIV referral form. Our results illustrate the importance of utilizing individual-level data for measuring linkage to care, as repeat testing is common.

Keywords: HIV, surveillance, linkage to care, Tanzania, HIV care cascade

## INTRODUCTION

It is vital that persons who are newly diagnosed with HIV are linked to care in a timely manner to maximize treatment effectiveness and the potential of treatment as prevention (1–3). In recognition of this, in 2015 the World Health Organization (WHO) included linkage to care (the number and percentage of people living with HIV who are receiving HIV care) as one of its 10 global indicators to measure progress and drive action toward the United Nations Joint Program on AIDS (UNAIDS) 90-90-90 targets (4, 5).

Timely linkage to care remains an issue in sub-Saharan Africa, including Tanzania (6–10). The prevalence of HIV in Tanzania among adults aged 15–49 years is 5.1% (11), with an estimated 1.4 million people living with HIV (12). In 2016, Tanzania adopted the WHO guidance that antiretroviral therapy (ART) be initiated among all persons diagnosed with HIV regardless of CD4 count (12).

The National AIDS Control Program (NACP) is the coordinating unit for HIV patient monitoring and surveillance in Tanzania. To inform routine program monitoring activities, the NACP maintains aggregate HIV testing services (HTS) data, and anonymized individual-level care and treatment clinic (CTC) data. A CTC number (assigned as a patient unique identifier for all treatment within a clinic) facilitates some de-duplication of these individual-level data (13). With only aggregate HTS data being reported, it remains difficult to accurately enumerate those newly diagnosed with HIV, or to identify persons who have been diagnosed but have not entered care.

A situational assessment was conducted in Tanzania in 2015 to evaluate the feasibility of generating individual-level longitudinal HIV data from point of diagnosis to entry into care and initiation of ART, and to leverage such data in a comprehensive strategic HIV information system, such as case surveillance (14). The WHO recommends HIV case surveillance as an integral step in securing strategic information for public health surveillance (15, 16). The situational assessment identified linkage between HIV testing and entry to care and treatment as a weakness when considering the feasibility of establishing HIV case surveillance in Tanzania (14).

To enhance NACP's capacity to track progress toward the UNAIDS 90-90-90 targets, and potentially act as a precursor for a national HIV case surveillance system, we describe a pilot project where we developed surveillance methodology to routinely collect HIV program data using a standardized referral form to assess and characterize linkage to care among persons newly testing HIV positive.

## METHODS

The pilot included 16 health facilities offering HIV testing in the Magu District of the Mwanza region in Tanzania. The Magu district was chosen for the pilot as a similar expanded referral form had been utilized in one district health facility in 2008 (17). The pilot health facilities (all offering HTS) were purposely selected to ensure variation in type and size (health center, dispensary, antenatal clinic (ANC) or standalone testing site), services offered [voluntary counseling and testing (VCT), antenatal care and/or HIV care and treatment], ownership (government or private), and type of community served (rural settlement, fishing village or roadside community). The HTS offered within these facilities included VCT, ANC, laboratories [for provider-initiated testing and counseling (PITC)], and prevention of mother to child transmission of HIV (PMTCT).

Healthcare workers routinely collect data in registers on all persons testing for HIV at health facilities (18). We expanded the standard paper-based NACP referral form into a two-part paper form that included additional variables to be collected from all clients testing HIV positive. In this two-part form, health workers completed 19 questions on both parts of the form, with an additional four questions on the part retained by the health facility. These four questions were: other names used by the patient, whether they had previously received an HIV positive test result and, if they had, the time and place of that test (**Table 1**). The "other names used" variable was included to facilitate patient follow-up and case matching. The previous positive test variable was introduced to identify potential duplicate records arising from repeat testing. The expanded form also contained a corresponding form number for matching the two parts between testing and care.

We trained facility staff to collect pilot data prospectively on the referral form from 1st January 2017 to 31st December 2017. Facility staff were provided a small stipend (2,000 TSH ∼US\$1) for correct completion of each part of the form. Facility staff were instructed to complete both parts of the form; the first part of the form, with the extra questions, was retained, with the second part given to the client who was then instructed to provide it to the CTC at first attendance. Staff at the CTC were instructed to complete the bottom portion of the form (the section on HIV care) when the patient enrolled in HIV-related care.

A pilot fieldworker visited each participating facility once a month to gather completed referral forms and to review completeness. Data from the forms were entered into an Access database. The database was developed by the research team and included inbuilt validation rules to minimize data entry errors (for example, checking the age against year of birth, and the sex and age against pregnancy status), and to facilitate the matching of corresponding form numbers on the two parts. Records with missing key variables (name, referral form number, sex, or age) were excluded from data analysis. De-duplication of cases was performed using exact matches on personal identifying information: either the three given names, sex, and age of the participant, or one of the names, age, and sex together with the same residence.
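The validation rules are only named above, not specified. A minimal Python sketch of the two example checks might look like the following. The field names (`age`, `year_of_birth`, `sex`, `pregnant`), the one-year tolerance on age, and the 10–55 year plausibility window for pregnancy are illustrative assumptions, not the rules built into the pilot database.

```python
def validate_record(record, form_year=2017):
    """Return a list of validation messages for one referral-form record.

    Field names, the one-year age tolerance, and the pregnancy age window
    are illustrative assumptions, not the pilot's actual rules.
    """
    problems = []

    # Rule 1: reported age should agree with year of birth, allowing one
    # year of slack because the birthday may not yet have passed.
    expected_age = form_year - record["year_of_birth"]
    if abs(expected_age - record["age"]) > 1:
        problems.append("age does not match year of birth")

    # Rule 2: pregnancy status should be consistent with sex and age.
    if record.get("pregnant"):
        if record["sex"] != "F":
            problems.append("pregnancy recorded for a non-female client")
        elif not 10 <= record["age"] <= 55:
            problems.append("pregnancy recorded outside plausible age range")

    return problems


# Hypothetical records for illustration only.
consistent = {"age": 32, "year_of_birth": 1985, "sex": "F", "pregnant": True}
inconsistent = {"age": 32, "year_of_birth": 1970, "sex": "M", "pregnant": True}
print(validate_record(consistent))
print(validate_record(inconsistent))
```

Returning a list of messages rather than rejecting the record outright would let a data clerk review and correct flagged entries at the point of data entry.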

Linkage to care was assessed through matching corresponding parts of the two-part referral form based on the form number and review of patient name. For persons who presented at the CTC without their referral form, a probabilistic match, similar to one used in a South Africa study and based on a close match of three out of four variables (name, sex, age, and/or residence), was used to link the CTC attendance to the referral form (19). Our definition of linkage to care is therefore based on evidence (referral form or probabilistic matching) of having presented at a CTC at least once. Time elapsed between testing and linkage to care was measured according to a person's date of HIV diagnosis (initial diagnosis) and date first seen in CTC (as presented on their two-part referral form). To allow follow-up time for linkage to care of those diagnosed toward the end of 2017, the last extraction of data occurred on 24th March 2018.
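The three-out-of-four rule described above can be sketched as a simple agreement count. The sketch below is not the pilot's matching procedure: the record layout, the name-similarity threshold of 0.85, and the one-year age tolerance are assumptions made for illustration. It is also an unweighted count, not a weighted probabilistic linkage model.

```python
from difflib import SequenceMatcher


def fields_agreeing(test_rec, ctc_rec, name_threshold=0.85, age_tolerance=1):
    """Count how many of name, sex, age, and residence agree."""
    # "Close match" on name is approximated with a string-similarity ratio;
    # the threshold is an assumed value, not one reported by the pilot.
    name_similarity = SequenceMatcher(
        None, test_rec["name"].lower(), ctc_rec["name"].lower()
    ).ratio()
    checks = [
        name_similarity >= name_threshold,
        test_rec["sex"] == ctc_rec["sex"],
        abs(test_rec["age"] - ctc_rec["age"]) <= age_tolerance,
        test_rec["residence"].lower() == ctc_rec["residence"].lower(),
    ]
    return sum(checks)


def is_probable_match(test_rec, ctc_rec, required=3):
    """Link two records when at least `required` of the four fields agree."""
    return fields_agreeing(test_rec, ctc_rec) >= required


# Fictional records for illustration only.
referral = {"name": "Amina Juma Hassan", "sex": "F", "age": 32, "residence": "Kisesa"}
ctc_visit = {"name": "Amina Juma Hassan", "sex": "F", "age": 33, "residence": "Nyanguge"}
print(is_probable_match(referral, ctc_visit))
```

In this fictional pair the name, sex, and age agree while the residence differs, so three of four fields agree and the records are linked.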

To assess ART use among persons linked to care, data from participating CTCs were extracted from electronic systems or the paper-based CTC form. A person's CTC number (as presented

### TABLE 1 | Data availability for the expanded referral form.


\**Ten-cell leader is an appointed number of households (originally 10 but often now higher), reporting to village, and sub-village authorities.*

on their referral form) was used to match data to their CTC record. The records of persons linked to care for whom key variable information was missing, or for whom the CTC file could not be identified, were excluded from this analysis. In receipt of ART was defined as being on ART at any point during the follow-up period.

Ethical approval was obtained from the Tanzanian National Research Ethics Committee (NatREC Ref NIMR/HQ/R.8a/Vol.IX/2097 extended on 9th March 2018 with NIMR/HQ/R.8c/Vol.II/961) and the London School of Hygiene and Tropical Medicine (#11844). Data used in these analyses were password protected and all study coordinators, data abstractors, and analysts signed a confidentiality form.

Analysis was performed in Stata 14 (Stata Corp., USA). Frequencies and cross tabulations were conducted, as was logistic regression to obtain odds ratios and 95% confidence intervals for factors associated with linkage to care. P-values are reported to show statistical compatibility of the data with the null hypothesis.
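For a single binary factor, the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2×2 table, and the usual 95% confidence interval is a Wald interval on the log-odds scale. The sketch below illustrates that calculation in Python, not the Stata code used for the analysis, and the counts in the usage example are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a: factor present, linked      b: factor present, not linked
    c: factor absent, linked       d: factor absent, not linked
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the pilot data.
or_, lower, upper = odds_ratio_ci(a=80, b=20, c=60, d=40)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

A multivariate model adjusts each odds ratio for the other factors and has to be fitted iteratively, so this closed-form calculation applies only to the unadjusted, single-factor case.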

## RESULTS

Between 1st January 2017 and 31st December 2017, 1,312 persons were diagnosed with HIV across the 16 pilot facilities and had their information captured. Data quality activities resulted in the records of 32 (2.4%) people being further investigated. Of these, 13 were excluded from the analysis due to missing sex or age. The records of an additional 24 (1.8%) people were identified as having a duplicate record as a result of repeat testing within the follow-up period. Of these 24, five re-tested within the same facility and 19 re-tested at a different pilot facility. In total, the records of 1,275 unique people testing HIV positive were identified and included in our analysis.

Completeness for most variables exceeded 98%. The exceptions were name of a person who lived in the same house (1,119; 87.8%), telephone number (708; 55.5%), and the additional question on previous positive HIV test (1,200; 94.1%).

## Demographic and Testing Characteristics

**Table 2** presents the demographic and testing characteristics. The male to female ratio was 1:1.4, with pregnant women accounting for 21.4% (159) of female participants. Median age at diagnosis was 32 years. Nearly a quarter of people (298; 23.4%) came from Kisesa ward (where the main health center in the Magu District is situated), with three other wards contributing approximately 10% each (129 in Bujashi, 137 in Nyanguge, and 127 in Kitongo). The two most common diagnosing HTC types were PITC (556; 43.6%) and VCT (362; 28.4%). Interestingly, **Table 2** shows that almost one in five men testing HIV positive did so through partner testing in ANC/PMTCT.

## Previous HIV Positive Test

The characteristics of the 1,200 people reporting previous test history are presented in **Table 2**. In total, 160 (13.3%) people reported having had a previous HIV positive test result. In addition, 24 people tested positive twice during the pilot period, resulting in a total of 184 (15.3%) people having

TABLE 2 | Demographic, testing, and clinical characteristics of persons testing HIV positive\*.


\**Where cell counts* <*5 masked a range was given to protect anonymity of participants.*

\*\**N* = *817 [932 people were linked to care (as identified through matching the two parts of the referral form or probabilistic matching); of these it was possible to match the CTC data to the HTC referral testing records for 818; of these 818, 1 had missing ART status].*

more than one HIV positive test. Females were more likely to report a previous positive test (15.5%; 109) compared to males (10.2%; 51) (p = 0.008).

One in five (28/153; 18.3%) previously tested within 1 year of their current test, with four in five (122/153; 79.7%) previously testing within 4 years. Three quarters (120/158; 75.9%) previously tested positive in a site different to the one they presented at during the pilot.

## Linkage to Care

The 1,275 HIV positive people were asked to which facility they wished to be referred (all but two responded). Almost all people (1,223; 95.9%) chose a referral facility included in the pilot, with 80.0% (1,019/1,273) asking to be referred to the CTC situated in the facility in which they had just tested HIV positive.

In total, 932 (73.1%) of people testing HIV positive were successfully linked to care (as identified by a referral form or probabilistic matching) during the pilot period. The majority of persons linked to care were seen at the facility to which they were referred (880; 94.4%), and/or were first seen the same day they received their test result (756; 81.1%). Among those not seen the same day, median time from diagnosis to entering care was 3 days (IQR 2–7 days).

Forty (4.3%) of the 932 people linked to care did not provide their referral form when first attending. Of these, all attended on a different date to that on which they were referred, and 11 (27.5%) attended a different facility to the one in which they were tested.

**Table 3** presents rates of, and factors associated with, linkage between testing and care. In univariate analysis, ward of residence, pregnancy status, and testing facility service type were associated with linkage between testing and care (all p < 0.05). These associations all persisted after multivariate adjustment. In summary, the following groups were those most likely to be linked to care: those living within the Magu district or catchment wards; those currently pregnant; and those who received their HIV positive test in a dispensary facility that offers CTC services.

Of the 932 patients linked to care, we were able to find a CTC record for 909 (97.5%; 818 through matched CTC number and names on the referral form and 91 through probabilistic matching), confirming they were not only linked to, but also enrolled in, care. Of these 909 people, 756 (83.2%) initiated ART (representing 59.3% of total positives), with the remaining 153 (16.8%) being seen only once at the facility. The majority (489; 64.7%) of persons who initiated ART did so on their first visit to the clinic. Among those remaining, median time from entry to care to ART initiation was seven days (IQR 6–13 days).
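The cascade percentages reported in this section can be reproduced from the counts given in the text. The short Python sketch below uses only those reported counts; the variable names are chosen here for illustration.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * numerator / denominator, 1)


# Counts as reported in the Results.
tested_positive = 1275       # unique people testing HIV positive
linked = 932                 # linked to care
ctc_record_found = 818 + 91  # matched by CTC number/name + probabilistic match
initiated_art = 756          # initiated ART

cascade = {
    "linked to care, of all positives": pct(linked, tested_positive),
    "CTC record found, of those linked": pct(ctc_record_found, linked),
    "initiated ART, of those with a CTC record": pct(initiated_art, ctc_record_found),
    "initiated ART, of all positives": pct(initiated_art, tested_positive),
}

for step, value in cascade.items():
    print(f"{step}: {value}%")
```

Each step uses the preceding step as its denominator, except the last line, which expresses ART initiation against all people testing positive.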

Based on information on previous positive test, 108 (67.5%) people were determined to have previously attended a CTC. Three (1.9%) were determined to have transferred from another CTC having already been in receipt of ART, and eight (5.0%) had previously been in receipt of ART (all more than a year previously).

## DISCUSSION

The pilot findings support the use of an expanded referral form for assessing, at the individual-level, linkage of newly diagnosed persons to HIV care. The system is simple and replicable, providing quality data (variable completeness typically >90%) in a high HIV burden, resource-poor setting. The pilot demonstrated that the paper-based referral system facilitated the collection of key variables necessary to describe and track the demographic and clinical characteristics of persons diagnosed and receiving HIV care, to de-duplicate records in a systematic manner, and to enhance our understanding of the HIV epidemic.

Three in four people diagnosed with HIV subsequently enrolled in care, with the overwhelming majority doing so the same day they tested positive, and at the facility to which they were referred. Factors most strongly associated with being linked to care were being currently pregnant, and being diagnosed positive in a dispensary facility that offers CTC services. Although eight in 10 pregnant women diagnosed with HIV were subsequently linked to care (highlighting that there is still room for improvement within this highly important group), it is of interest that no significant association between HTC type (which includes ANC and PMTCT) and linkage to care was observed. A possible explanation for this is that testing in ANC/PMTCT included not only women but also a sizeable number of male partners.

The association with the type of CTC services being offered at the testing facility suggests structural barriers to access of care, which can only be resolved through enhanced service planning. Other studies conducted in Tanzania have found similar associations. A study conducted in the Mbeya region found that people testing positive at a facility with a CTC were 78% more likely to link to care compared to people testing at mobile/outreach sites (7). Another study in the Tanga region found a significant association between early entry in care and point of diagnosis, level of education, and CD4 count (9). A study in the Kilimanjaro region showed that people testing in a community VCT facility were twice as likely to delay care compared to people testing in a hospital outpatient department (10).

Our finding that approximately a quarter of cases did not link to care during the pilot is similar to the findings of other studies in Tanzania. The study in the Kilimanjaro region found that only 70% had presented for HIV care within 6 months after receiving a positive HIV test result (10), and in the Mbeya region study, 78% linked to care within 6 months (7). Linkage to care rates vary throughout sub-Saharan Africa. In Kenya, 88% of adults linked to care within three months (20); a field assessment in Mozambique showed 67% linked to care (21); a retrospective cohort study in Cape Town, South Africa identified linkage to care as 63% (22); in Lesotho, a study found linkage to care at a facility to be 43% versus 69% if ART was initiated at home (23); and a systematic review showed linkage to care of persons newly diagnosed with HIV in sub-Saharan Africa ranged from 10–79% (24). As the expanded referral form includes location information, it will be possible for facilities to intervene to locate and engage with persons not immediately linking to care; this important intervention was not assessed during the pilot.

Approximately 15% of people included in the pilot were found to have tested HIV positive more than once; of these, 70% were previously linked to care. This finding highlights the importance of obtaining individual-level data on previous HIV test results and, potentially, a need to consider adjusting for such duplicate records in prevalence estimates based on standard aggregate reporting (wider geographically based estimates would be required to inform such an adjustment).

Introducing a form number on both parts of a two-part referral form assisted with matching a person's records between testing and care. We should note, however, that the form number does not help match cases that retest for HIV. De-duplication of these records was conducted using additional identifiers available on the referral form. It is possible that the provision of a small stipend resulted in high variable completeness which, in turn, contributed to the ability to effectively match records. This process would be more difficult on a larger scale; an algorithm would need to be tested on a larger dataset, as is used in other countries without a national unique identifier (25–27).
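The two exact-match rules given in the Methods (all three given names with sex and age, or one shared name with age, sex, and the same residence) can be sketched as follows. The record layout, the case-insensitive comparison, and ignoring the order of the names are assumptions made for this illustration; the pilot's actual procedure may have differed.

```python
def same_person(rec_a, rec_b):
    """Apply the two exact-match de-duplication rules to a pair of records."""
    # Both rules require sex and age to agree exactly.
    if rec_a["sex"] != rec_b["sex"] or rec_a["age"] != rec_b["age"]:
        return False

    names_a = sorted(name.lower() for name in rec_a["names"])
    names_b = sorted(name.lower() for name in rec_b["names"])

    # Rule 1: all three given names agree (order ignored by assumption).
    if len(names_a) == 3 and names_a == names_b:
        return True

    # Rule 2: at least one name agrees and the residence is the same.
    shares_a_name = bool(set(names_a) & set(names_b))
    same_residence = rec_a["residence"].lower() == rec_b["residence"].lower()
    return shares_a_name and same_residence


def deduplicate(records):
    """Keep the first record seen for each person (pairwise comparison)."""
    unique = []
    for record in records:
        if not any(same_person(record, kept) for kept in unique):
            unique.append(record)
    return unique


# Fictional records for illustration only.
records = [
    {"names": ["Amina", "Juma", "Hassan"], "sex": "F", "age": 32, "residence": "Kisesa"},
    {"names": ["Hassan", "Amina", "Juma"], "sex": "F", "age": 32, "residence": "Kitongo"},
    {"names": ["Amina", "Saidi", "Ally"], "sex": "F", "age": 32, "residence": "Kisesa"},
    {"names": ["John", "Peter", "Mollel"], "sex": "M", "age": 40, "residence": "Kisesa"},
]
print(len(deduplicate(records)))
```

The pairwise comparison grows quadratically with the number of records, which is one reason a tested algorithm would be needed at national scale.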

The NACP already has a standard referral form and, since the majority of data are already routinely collected, the work effort for facility staff to continue to complete the two-part form would be minimal. The additional cost of data entry of the referral forms could also be seen as minimal when compared to the enhanced availability and accuracy of the data. The pilot utilized field staff to gather forms from the facility on a monthly basis. This was feasible at the scale the pilot was conducted but could prove to be a human resource burden at national scale. Additionally, the pilot utilized an Access database; for implementation at national level, data entry and analysis would need to be incorporated into the existing national system. An



*Numbers are as among those providing a response.*

\**Reference group.*

\*\**Significant associations at the 95% level are shown in italics.*

\*\*\**Variables identified in univariate analysis as being associated with linkage to care included in multivariate model.*

\$*Catchment areas for the 16 health facilities; dispensaries serve a village within a ward whereas health centers serve a ward.*

assessment should be undertaken on alternate strategies for data collection and entry.

Due to practical limitations, the pilot was conducted in only a portion of the facilities in one district. As such, linkage to care could have occurred at a facility outside of the pilot. No hospitals were included, therefore the pilot does not assess the burden the linkage system would have on a larger facility which has more cases of HIV. Despite these limitations, the pilot provides valuable information that is not available elsewhere and that can be used to determine the way forward for linking patients to HIV care and identifying potential duplicate records.

Case matching and de-duplication are of vital importance for accurately assessing the 90-90-90 targets and HIV care cascade, and for moving toward HIV case surveillance. The ability to distinguish between new and previously reported cases increases the quality of the data, and of analyses using these data. The 2017–2022 HIV Strategic Plan in Tanzania calls for strengthening linkage mechanisms, and notes that a challenge to achieving this is the lack of reliable referral data (28). To utilize de-duplicated data to monitor the HIV epidemic and ensure linkage from testing to care in a timely manner, we recommend a two-part referral form be used for all HIV diagnoses. A new initiative by NACP to collect all HTC data in the CTC electronic database may also provide a better understanding of HTC and CTC attendance, although this will require rigorous evaluation.

## REFERENCES


## DATA AVAILABILITY STATEMENT

The datasets for this article contain protected health information and are not publicly available. Requests to access the datasets should be directed to Jim Todd, Jim.Todd@LSHTM.ac.uk.

## AUTHOR CONTRIBUTIONS

The concept for the study was developed by JT, MU, BR, and RH. RM coordinated data collection. RH, MW, JT, JB, JH, RM, and BR conducted data analysis and/or data interpretation. All authors read the manuscript, provided feedback, and approved the final version.

## FUNDING

This work was supported by the Bill and Melinda Gates Foundation [OPP1120138L] and the Global Fund to Fight AIDS, Tuberculosis, and Malaria [TNZ-911-G14-S].

## ACKNOWLEDGMENTS

We wish to thank the MeSH Consortium for their oversight in this work and all of the staff from the health facilities who collected data for the pilot.

therapy initiation: a prospective 3.5 year cohort study of HIV positive testers in northern Tanzania. BMC Infectious Dis. (2016) 16:497. doi: 10.1186/s12879-016-1804-8


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Harklerode, Todd, de Wit, Beard, Urassa, Machemba, Maduhu, Hargreaves, Somi and Rice. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Performance of and Factors Associated With Tuberculosis Screening and Diagnosis Among People Living With HIV: Analysis of 2012–2016 Routine HIV Data in Tanzania

### Edited by:

*Zisis Kozlakidis, International Agency for Research on Cancer (IARC), France*

### Reviewed by:

*Carl-Magnus Svensson, Leibniz Institute for Natural Product Research and Infection Biology, Germany*

*Jose Roberto Lapa E. Silva, Federal University of Rio de Janeiro, Brazil*

\*Correspondence:

*Werner Maokola drwernerm@yahoo.com*

### Specialty section:

*This article was submitted to Infectious Diseases - Surveillance, Prevention and Treatment, a section of the journal Frontiers in Public Health*

Received: *17 June 2019* Accepted: *18 December 2019* Published: *06 February 2020*

### Citation:

*Maokola W, Ngowi B, Lawson L, Mahande M, Todd J and Msuya SE (2020) Performance of and Factors Associated With Tuberculosis Screening and Diagnosis Among People Living With HIV: Analysis of 2012–2016 Routine HIV Data in Tanzania. Front. Public Health 7:404. doi: 10.3389/fpubh.2019.00404*

Werner Maokola<sup>1,2</sup>\*, Bernard Ngowi<sup>3</sup>, Lovetti Lawson<sup>4</sup>, Michael Mahande<sup>2</sup>, Jim Todd<sup>2,5</sup> and Sia E. Msuya<sup>2</sup>

*<sup>1</sup> National AIDS Control Program/Ministry of Health, Community Development, Gender, Elderly and Children, Dar es Salaam, Tanzania, <sup>2</sup> Institute of Public Health, Kilimanjaro Christian Medical University College, Moshi Urban, Tanzania, <sup>3</sup> National Institute of Medical Research, Dar es Salaam, Tanzania, <sup>4</sup> Zanklin Medical Center, Abuja, Nigeria, <sup>5</sup> London School of Hygiene and Tropical Medicine, London, United Kingdom*

People Living with HIV (PLHIV) should be screened for tuberculosis (TB) at every visit to the HIV care and treatment clinic (CTC), and those with positive results on screening should undergo further diagnostic investigations. We evaluated the performance of the TB diagnosis cascade among PLHIV attending CTC between January 2012 and December 2016 in three regions of Tanzania: Dar es Salaam, Iringa, and Njombe. We used descriptive epidemiology to evaluate performance and logistic regression to determine odds ratios (OR) for factors associated with TB screening and further TB diagnosis after positive TB screening. We analyzed 169,741 PLHIV who made 2,638,876 visits to CTC between January 2012 and December 2016. We excluded 2,074 (0.80%) visits as these involved PLHIV enrolled in CTC with a prior TB disease diagnosis. Of the 2,636,802 visits, 2,524,494 (95.67%) had TB screening according to national guidelines, of which 88,028 (3.49%) had TB screening positive results. Of the 88,028 visits with a positive TB screening, 27,810 (31.59%) had no records for further TB diagnosis following positive TB screening. Of all visits with positive TB screening, 32,986 (37.50%) had a TB disease diagnosis. On multivariate logistic regression, those who visited with World Health Organization (WHO) clinical stage four (aOR = 3.61, 95% CI 3.48–3.75, *P* < 0.001), enrolled in health center (aOR = 1.26, 95% CI 1.24–1.29, *P* < 0.001), enrolled in Iringa region (aOR = 1.54, 95% CI 1.50–1.57, *P* < 0.001), and enrolled in 2015 (aOR = 1.20, 95% CI 1.18–1.24, *P* < 0.001) were more likely to have no TB screening. Visits involving those who were of the female sex (aOR = 1.14, 95% CI 1.11–1.18, *P* < 0.001), enrolled in Njombe region (aOR = 4.36, 95% CI 4.09–4.65, *P* < 0.001), and enrolled in 2016 (aOR = 2.62, 95% CI 2.49–2.77, *P* < 0.001) were more likely to have no further TB diagnosis after positive TB screening. 
The study documented high performance of TB screening for PLHIV in HIV CTCs but a low transition of presumptive TB cases to further investigation. Better systems are needed to ensure that presumptive TB cases are diagnosed, including the use of more efficient diagnostic methods such as GeneXpert.

Keywords: tuberculosis, screening, diagnosis, HIV, Tanzania

## INTRODUCTION

Despite wide use of antiretroviral (ARV) drugs, tuberculosis (TB) is still a public health challenge for People Living with HIV (PLHIV) (1). In autopsy studies among PLHIV in Africa, TB was present in 21–54% of people, and TB was the cause of death in 32–45% of the PLHIV studied (2). The risk of TB is higher among PLHIV; HIV-positive individuals are up to 26 times more likely to have active TB disease compared to HIV-negative individuals, and globally at least 30% of PLHIV have latent TB (3). Worldwide, in 2018, 10.0 million TB cases were notified, and of these 24% were from the African continent. TB/HIV accounted for 8.6% of the notified TB cases. Of the 1.5 million TB deaths reported in 2018, 17% (251,000) were among HIV-positive people (1). In Tanzania, which is among the 30 high TB burden countries, a total of 142,000 TB cases were diagnosed in 2018, of which 40,000 (28%) were HIV positive (1).

To reduce TB among PLHIV, the World Health Organization (WHO) recommends intensified TB case finding (ICF) among PLHIV attending HIV care and treatment clinics (CTC), which entails active TB screening for symptoms and signs using standardized TB screening questions (4). In Tanzania, those who screen positive undergo further TB diagnosis according to the National Guidelines (5). TB screening using symptoms and signs among PLHIV is important to increase TB disease diagnosis, and ICF is thus a gateway to TB management among PLHIV (6, 7). There is an urgency to build better TB prevention in resource-limited settings, but to do so we need to evaluate how well the current cascade is working (8).

The TB diagnosis cascade among PLHIV in CTCs faces a number of challenges, including loss to follow-up within the cascade (9, 10). A study in Northern Uganda looking at improved TB case notification among both HIV-positive and HIV-negative individuals found a TB positivity rate of 3.5% among the 385 HIV-positive individuals who were positive on the screening questions. A prospective study in India found that 30% screened TB positive, 35% of these were referred for TB diagnostic tests, and 15% had confirmed TB (11). In Kenya, a study reported high TB screening coverage among 1,020 newly diagnosed PLHIV; 98% of PLHIV were screened, but only 16% of those who screened positive underwent further TB diagnostic evaluation, and 26 (2.6%) were eventually diagnosed with TB (12). In Ethiopia, 72% of PLHIV with positive TB screening were linked to sputum microscopy for TB, with the remaining PLHIV diagnosed using other methods (13). A study conducted in Ghana also found that sputum for smear microscopy was requested for 58.7% of those who needed it (14).

In this paper, we analyzed routine HIV data from PLHIV enrolled in CTC from January 2012 to December 2016. We evaluated the performance of the TB diagnosis cascade for PLHIV attending CTC and determined factors associated with TB screening and further TB diagnosis after positive TB screening. The aim of the paper was to provide evidence to the Tanzanian Ministry of Health, Community Development, Gender, Elderly and Children (MoHCDGE) on the performance of the TB diagnosis cascade among PLHIV attending CTC in Tanzania for quality improvement. The findings from this study will also be applicable to other developing countries where HIV and TB are also prevalent.

## MATERIALS AND METHODS

The study involved electronic patients' records that were routinely collected and contained in the CTC database from 317 health facilities in three regions: Iringa, Njombe, and Dar-es-Salaam. The study used retrospective data from PLHIV enrolled in CTC from January 2012 to December 2016. PLHIV enrolled into CTC with an existing diagnosis of TB disease from TB clinics were excluded from the study.

The United Republic of Tanzania comprises mainland Tanzania and Zanzibar, with a population of about 44 million in 2012 (15). Tanzania Mainland has a generalized HIV epidemic, with the first case of HIV reported in 1983. By 1986, all regions in the country had reported at least one case (16). The Tanzania HIV Indicator Survey showed an overall prevalence of HIV of 5.1% in 2011/2012 and 4.7% in 2016/17. The three regions were purposely selected as they had higher HIV prevalence according to the recent National HIV survey (17).

Tanzania has rolled out widespread, freely available antiretroviral therapy services in HIV CTC since 2004. For TB management in CTC, PLHIV were screened for TB using the standardized WHO questions at every visit to the clinics. Those who screened positive were further evaluated for TB disease using either sputum examination, radiology, or clinical diagnosis according to a TB management algorithm (5). Those with confirmed TB disease were started on anti-TB therapy. PLHIV who were negative on the TB screening were assessed for their eligibility for Isoniazid Preventive Therapy (IPT). Those who were eligible for IPT were kept on daily Isoniazid tablets for 6 months to prevent active TB disease. TB infection control measures were another set of interventions implemented in HIV clinics to prevent TB transmission among PLHIV and staff (**Figure 1**).

## Data Extraction Process and Variables Descriptions

De-identified secondary data were extracted from the main national HIV care and treatment database. The independent variables of interest included age, sex, WHO clinical stage, health facility type (hospital, health center, or dispensary), health facility ownership (public or private/faith-based organization), region during CTC enrolment, TB screening status (screened or not screened), and TB diagnosis status after positive TB screening (further TB diagnosis or no further TB diagnosis). The final dataset excluded all PLHIV with known TB diagnosis before CTC enrolment.

## Data Analysis

Stata version 14 (Stata Corporation, College Station, Texas, USA) was used for data cleaning, merging, and analysis. Descriptive statistics used mean and standard deviation for continuous variables and frequencies and proportions for categorical variables. For each estimate, a 95% Confidence Interval (95% CI) was also calculated. Bivariate logistic regression was used to determine odds ratios (OR) and 95% CI for the association between independent and dependent variables. The likelihood ratio test was used to obtain the chi-square statistic and p-value for each analysis. Finally, all independent variables with p ≤ 0.2 in bivariate analysis were entered into a multivariate logistic regression to determine the adjusted OR and 95% CI for independent factors associated with TB screening and further TB diagnosis after positive TB screening. In the final analysis, p ≤ 0.05 was considered statistically significant.
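The two-step model-building rule described above (a likelihood ratio test for each variable, then entry into the multivariate model when p ≤ 0.2) can be illustrated with a short Python sketch. This is not the Stata code used in the study. The log-likelihoods and p-values below are hypothetical, and the p-value formula assumes a variable that adds a single parameter (1 degree of freedom); variables with several categories, such as region or facility type, would need the chi-square distribution with more degrees of freedom.

```python
import math


def lrt_p_value(loglik_null, loglik_full):
    """Likelihood ratio test for one added parameter (1 degree of freedom)."""
    statistic = 2 * (loglik_full - loglik_null)
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2))
    return statistic, p_value


def select_for_multivariate(bivariate_p_values, threshold=0.2):
    """Keep variables whose bivariate p-value is at or below the threshold."""
    return [name for name, p in bivariate_p_values.items() if p <= threshold]


# Hypothetical log-likelihoods and p-values, for illustration only.
statistic, p = lrt_p_value(loglik_null=-100.0, loglik_full=-98.0)
print(f"LR chi-square = {statistic:.2f}, p = {p:.3f}")

bivariate = {"sex": 0.01, "age_group": 0.15, "facility_ownership": 0.45}
print(select_for_multivariate(bivariate))
```

A screening threshold looser than the final 0.05 cut-off reduces the chance of dropping a variable whose association only becomes apparent after adjustment for the others.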

## Ethical Consideration

The study involved secondary analysis of unlinked data; hence, there was no contact with human subjects. Permission to conduct the study was obtained from the National Institute of Medical Research-Tanzania, and permission to use routinely collected data was obtained from the National AIDS Control Program by signing a data transfer agreement. This study was carried out as part of the PhD research of Werner Maokola, with ethical permission for the PhD from KCMU college.

## RESULTS

A total of 171,743 PLHIV were enrolled in HIV CTC in the Dar es Salaam, Iringa, and Njombe regions from January 2012 to December 2016. The mean age of the PLHIV was 35 years (range: 0–97 years). A total of 2,074 PLHIV were not included in the analysis as they had TB disease before CTC enrolment. Of the remaining 171,669, most PLHIV were aged 25–49 years (123,581; 71.99%), were female (117,925; 68.69%), and had working functional status (164,076; 95.58%); the most common WHO clinical stage was stage one (62,535; 36.43%). The largest groups of study participants were enrolled in 2016 (37,788; 22.01%), were from dispensary level (65,405; 38.10%), and were from public health facilities (123,964; 72.2%) (**Table 1**).

The study cohort made a total of 2,638,876 visits during the follow-up period; however, 2,074 visits (0.80%) were excluded from the analysis as they involved TB disease diagnosis before CTC enrolment. Of the remaining 2,636,802 visits, 2,524,494 (95.67%) had TB screening. Of these, 88,028 (3.49%) had TB-positive screening results. Of the visits with a positive TB screening, 48,930 (55.58%) had sputum examinations, 10,496 (11.92%) had a chest X-ray, 792 (0.90%) had a TB diagnosis through clinical criteria, and 27,810 (31.59%) had no records of further TB diagnosis. Of all visits with a positive TB screening, 32,986 (37.50%) had a TB disease diagnosis (**Figure 2**).

TABLE 1 | Baseline characteristics of PLHIV enrolled in CTC between 2012 and 2016 in three regions of Tanzania *N* = 171,669.


*WHO, World Health Organization.*
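The follow-up percentages in the visit-level cascade reported above can be checked with simple arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative. The four follow-up categories sum to the 88,028 screen-positive visits, which suggests that the reported percentages use screen-positive visits as the denominator.

```python
# Counts taken from the RESULTS text; names are illustrative
screened_positive = 88_028
follow_up = {
    "sputum examination": 48_930,
    "chest X-ray": 10_496,
    "clinical diagnosis": 792,
    "no further diagnosis recorded": 27_810,
}


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Share of screen-positive visits in each follow-up category
shares = {step: percent(n, screened_positive) for step, n in follow_up.items()}

# The categories should account for every screen-positive visit
total_accounted = sum(follow_up.values())
```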

### Factors Associated With TB Screening

On multivariate logistic regression analysis, visits by PLHIV with WHO clinical stage four (aOR = 3.61, 95% CI 3.48–3.75, P < 0.001), enrolled in a health center (aOR = 1.26, 95% CI 1.24–1.29, P < 0.001), enrolled in the Iringa region (aOR = 1.54, 95% CI 1.50–1.57, P < 0.001), or enrolled in 2015 (aOR = 1.20, 95% CI 1.18–1.24, P < 0.001) were more likely to have no TB screening. Visits that involved females (aOR = 0.68, 95% CI 0.67–0.69, P < 0.001) and PLHIV with working functional status (aOR = 0.45, 95% CI 0.43–0.48, P < 0.001) were less likely to have no TB screening (**Table 2**).

### Factors Associated With Further TB Diagnosis After Positive TB Screening

Upon multivariate logistic regression analysis, visits involving the female sex (aOR = 1.14, 95% CI 1.11–1.18, P < 0.001), enrolment in the Njombe region (aOR = 4.36, 95% CI 4.09–4.65, P < 0.001), and enrolment in 2016 (aOR = 2.62, 95% CI 2.49–2.77, P < 0.001) were more likely to have no further TB diagnosis after a positive TB screening. Visits involving PLHIV with working functional status (aOR = 0.62, 95% CI 0.58–0.67, P < 0.001), enrolment in hospitals (aOR = 0.39, 95% CI 0.38–0.41, P < 0.001), and attendance at public health facilities (aOR = 0.68, 95% CI 0.65–0.71) were less likely to have no further TB diagnosis after positive TB screening (**Table 3**).

## DISCUSSION

The present study provided an analysis of the TB diagnosis cascade for PLHIV enrolled in HIV CTCs in three regions with high HIV prevalence in Tanzania. The study cohort consisted of all PLHIV attending CTC, with the majority at early stages of the infection (WHO clinical stage one and working functional status). Efforts have been made to make sure that people know their HIV status and that those found to be infected enroll into care. TB is the most common comorbidity among PLHIV, and it is important to ensure all those enrolled in CTC are assessed, diagnosed, and treated at every opportunity in accordance with Tanzania national guidelines.

Ninety-five percent (95.67%) of the visits recorded in the study cohort had TB screening. Routine TB screening is in line with the WHO recommendations (18), and such high TB screening coverage among PLHIV has also been reported in other studies (12, 19–21). TB screening is known to be good at ruling out TB disease in PLHIV, with a negative predictive value of 97.7% (22). It is important that TB screening is consistently carried out in all CTCs, but we found that TB screening was less likely among PLHIV with advanced disease (WHO clinical stage 4), those enrolled in health centers, those enrolled in the Iringa region, and those who enrolled in 2015. Health care providers may find it easier to administer the TB screening questionnaire to healthier PLHIV than to those with unfavorable functional status who may already have had other investigations for comorbidities. Our observation that TB screening performance varied across health facility types, across administrative regions, and across implementation time calls for tailored interventions to improve TB screening.

TABLE 2 | Factors associated with TB screening: *N* = 2,636,691.

Our study found that only 3.49% of screened visits resulted in a positive screening score over a 4-year period. This was similar to a study in Kenya where routine screening in a cohort of PLHIV attending HIV services had <1% of visits where a positive screen was recorded (21). In another study in Kenya among 1,060 newly enrolled PLHIV, 62% reported symptoms of TB, but only 26 cases of TB were found. This demonstrates the higher prevalence of symptoms among PLHIV prior to attending HIV services and the low sensitivity of TB screening (12). Our finding is similar to other longitudinal repeated screening of cohorts of PLHIV (21). The lower positive TB screening in cohorts attending HIV services can be attributed to several factors. In these programs, ART is now available to all PLHIV, and this is known to reduce TB morbidity, mortality, and incidence (8). Moreover, in 2014, isoniazid preventive therapy (IPT) was introduced in Tanzania for PLHIV attending CTC, with a decline in TB incidence over the 6 years of follow up (23). Conversely, some PLHIV may not have typical symptoms of TB disease and may be missed by the screening, although screening is reported to have a 97% NPV for ruling out TB infection in PLHIV (24). Whether the low rate of positive TB screening reflects reality or weaknesses in the screening process will be evaluated in subsequent analyses.

TABLE 3 | Factors associated with no further TB diagnosis among screened positive: *N* = 88,008.

Up to 31.59% of visits with a positive TB screening did not receive a further TB diagnosis procedure. In our study, we found that lack of further follow up after a positive TB screening was higher among females, those enrolled in the Njombe region, and those enrolled in 2016. Studies in Tajikistan and Uganda reported a loss to follow up in TB diagnosis of 15% and 33.4%, respectively. A case-control study in Tajikistan documented risk factors for loss to follow up in TB diagnosis: movements, drug side effects, previous TB treatment, patient refusal, stigma, and family problems (25). A retrospective cohort study of 646 records of PLHIV initiated on ART in Uganda found that loss to follow up was associated with good health (normal weight), attendance at hospitals, and having no telephone contact (26). Resources to train health care workers have been set aside to make sure that PLHIV are screened at every clinic visit in Tanzania, but further resources are needed to ensure that the next step in the TB cascade is taken and that further investigations are undertaken on all who screen positive.

The study found low use of sputum examination for TB diagnosis (55.58%). This finding is lower than those reported in Ethiopia and in another study in Tanzania. In Ethiopia, AFB microscopy accounted for up to 72% of all the TB diagnostic methods used (13), whereas in Tanzania, AFB microscopy was reported to be the most available TB diagnostic method (27). In Tanzania, a sputum examination is the first test for TB disease diagnosis. All people presumed to have TB disease have to undergo a sputum examination. Other diagnostic tests such as chest radiography and clinical diagnosis come later in the algorithm (5).

Our study found that only 37.50% of visits with a positive TB screening had a TB disease diagnosis. Such a high rate of TB disease diagnosis among PLHIV with positive TB screening results was also recorded in Kenya (21). It is important to note that TB screening tools with high specificity for TB symptoms are important in TB control, as resources for TB diagnosis will then be used efficiently for those who most likely have TB disease.

## CONCLUSION

The study documented the high performance of TB screening for PLHIV in HIV CTCs, but a low transition of suspected TB cases to further investigation. Better systems are needed to ensure suspected TB cases are diagnosed, and this includes using more efficient diagnostic methods like GeneXpert.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## AUTHOR CONTRIBUTIONS

WM conceptualized the idea of the study, designed the study, analyzed data, interpreted the results, and drafted the manuscript. MM and JT helped to analyze the data and interpret the results. BN, LL, MM, JT, and SM reviewed the manuscript. All authors approved the final version of the manuscript.

## FUNDING

This research was funded by the Bill and Melinda Gates Foundation.

## ACKNOWLEDGMENTS

The authors would like to acknowledge financial support from the Bill and Melinda Gates Foundation, as well as support from MOHCDGEC Tanzania for allowing analysis of routine data.

## REFERENCES

tuberculosis in HIV-infected adults: a cohort study performed at ethiopian health centers. Open Forum Infect Dis. (2014) 1:ofu095. doi: 10.1093/ofid/ofu095


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Maokola, Ngowi, Lawson, Mahande, Todd and Msuya. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Discrete Survival Time Constructions for Studying Marital Formation and Dissolution in Rural South Africa

Jesca M. Batidzirai<sup>1\*</sup>, Samuel O. M. Manda<sup>1,2,3</sup>, Henry G. Mwambi<sup>1</sup> and Frank Tanser<sup>4,5,6</sup>

*<sup>1</sup> School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa, <sup>2</sup> Biostatistics Unit, South African Medical Research Council, Pretoria, South Africa, <sup>3</sup> Department of Statistics, University of Pretoria, Pretoria, South Africa, <sup>4</sup> Africa Health Research Institute, KwaZulu-Natal, South Africa, <sup>5</sup> Lincoln Institute for Health, University of Lincoln, Lincoln, United Kingdom, <sup>6</sup> School of Nursing and Public Health, University of KwaZulu-Natal, Durban, South Africa*

### Edited by:

*Jim Todd, London School of Hygiene and Tropical Medicine, University of London, United Kingdom*

### Reviewed by:

*Mwita Wambura, National Institute of Medical Research, Tanzania*

*Asungushe Bonaventura Kayombo, London School of Hygiene and Tropical Medicine, University of London, United Kingdom*

### \*Correspondence:

*Jesca M. Batidzirai batidzirai@ukzn.ac.za*

### Specialty section:

*This article was submitted to Health Psychology, a section of the journal Frontiers in Psychology*

Received: *17 May 2019* Accepted: *21 January 2020* Published: *18 February 2020*

### Citation:

*Batidzirai JM, Manda SOM, Mwambi HG and Tanser F (2020) Discrete Survival Time Constructions for Studying Marital Formation and Dissolution in Rural South Africa. Front. Psychol. 11:154. doi: 10.3389/fpsyg.2020.00154*

Introduction: Marriage formation and dissolution are important life-course events which impact psychological well-being and health of adults and children experiencing the events. Family studies have usually concentrated on analyzing single transitions including *Never Married* to *Married* and *Married* to *Divorced*. This does not allow understanding and interrogation of dynamics of these life changing events and their effects on individuals and their families. The objective of this study was to assess determinants associated with transitions between and within marital states in South Africa.

Methods: The population-based data available for this study consists of over 55,000 subjects representing over 340,000 person-years exposure from the Africa Health Research Institute (AHRI) in rural KwaZulu-Natal, South Africa. It was collected from 1 January 2004 to 31 December 2016. Multilevel multinomial, binary and competing risks regression models were used to model marital state occupation, transitions between marital states as well as investigate determinants of marital dissolution, respectively.

Results: Between the years 2006 and 2007, a subject was more likely to be married than never married when compared to years 2004–2005. After 2007, subjects were less likely to be married than never married, and the odds declined over the years up to 2016 [*OR* = 0.86, *CI* = (0.78; 0.94); *OR* = 0.71, *CI* = (0.64; 0.78); *OR* = 0.60, *CI* = (0.54; 0.67); *OR* = 0.50, *CI* = (0.44; 0.56); and *OR* = 0.43, *CI* = (0.38; 0.48) for periods 2008–2009, 2010–2011, 2012–2013, 2014–2015, and 2016, respectively]. In 2008–2009, subjects were more likely to experience a marital dissolution than in the period 2004–2005, and the trend slightly reduces from 2010 until 2013 [*OR* = 24.49, *CI* = (5.53; 108.37)]. Increasing age at sexual debut was found to be inversely associated with marital dissolution [*OR* = 0.97, *CI* = (0.95; 0.99)]. Highly educated subjects were more likely to stay in one marital state than those who never went to school [*OR* = 6.43, *CI* = (4.89; 8.47); *OR* = 18.86, *CI* = (1.14; 53.31); and *OR* = 2.96, *CI* = (1.96; 4.46) for being married, separated, and widowed, respectively, among subjects with tertiary education]. As the age at first marriage increased, subjects became less likely to experience a marital separation [*OR* = 0.06, *CI* = (0.00; 1.11); *OR* = 0.05, *CI* = (0.00; 0.91); and *OR* = 0.04, *CI* = (0.00; 0.76) for subjects who entered a first marriage at ages 18–22, 23–29, and 30–40, respectively].

Conclusion: The study found that marrying at later ages is associated with a lower rate of marital dissolution while more educated subjects tend to stay longer in one marital state. Sexual debut at later ages was associated with a lower likelihood of experiencing a marital dissolution. There could, however, be some factors that are not accounted for in the model that may lead to heterogeneity in these dynamics in our model specification which are captured by the random effects in the model. Nonetheless, we may postulate that existing programs that encourage delay in onset of sexual activity for HIV risk reduction for example, may also have a positive impact on lowering rates of marital dissolution, thus ultimately improving psychological and physical health.

Keywords: discrete time survival, multi-state models, multilevel models, competing risks, state transition

## 1. INTRODUCTION

Timing of marriage and marriage dissolution are associated with the psychological well-being and health of adults and children experiencing the events. Evidence suggests that early marriages and marital dissolutions increase the rates of stress, depression, high blood pressure, anxiety, aggression, suicidal thoughts, and many other mental health disorders (Amato, 2005; Moon, 2011; Hashemi and Homayuni, 2017). Early marriages may also affect a woman's chance of educational and economic empowerment (Heward and Bunwaree, 1999; Amato, 2005; Hasselmo et al., 2015). On the other hand, one of the consequences of a marital dissolution where children are involved is child-headed families, which in turn adversely affects the development of the children themselves. Children raised with divorced parents experience differentially worse health and developmental profiles and lower survival rates compared to children living with stable and in-union parents (Mackay, 2005). The rate of suicide (and suicidal thoughts) has been found to be associated with family dissolution, both for partners (Gove, 1973; Lillard and Panis, 1996; Kazan et al., 2016) and children (Kreitman, 1977; Gould et al., 1998). These health outcomes in vulnerable individuals could be due to stigma and societal norms that frown upon women (or men) who are divorced or separated and their children, or due to thoughts of loss of material or financial belongings (Konstam et al., 2016).

While the above might be true, subjects may also leave bad marriages so as to free themselves and improve their well-being. Despite these intertwining relationships and consequences, South Africans, including those in the rural areas, still find themselves in a system where they marry, separate, remarry, or become widowed while others remain in one marital state for a long period of time. For family planning practitioners and demographers to make informed decisions (for their intervention programs), they need to understand the patterns of movements or transitions between these marital states. Limited research on transitions between marital states has been done, and in some instances researchers have looked only at transitions between two non-recurrent states, such as first marriage (see Bramlett and Mosher, 2001; Manda and Meyer, 2005; Hosegood et al., 2009) or divorce from marriage (Clark and Brauner-Otto, 2015). Robust statistical models have been developed and may be used to highlight issues in the dynamics of marital formation and dissolution. However, as Tanser et al. (2003) point out, a drawback of some statistically-driven models is that the data sets used to develop the models are often of uncertain accuracy, the models are not easily reproducible, and the results are often applicable only at national or subregional scales.

We use multilevel discrete time to event models on rich prospective data from a population-based cohort to study three marriage dynamics scenarios. We explore substantive issues concerning marital formation and dissolution in rural South Africa. Specifically, we investigate the determinants of marital state occupation, transition between marital states, and marriage failure.

## 2. METHODS

In this study, we will consider three scenarios: the marital state a subject occupies at given ages, transitions between marriage states and how certain states end. These are discussed below.

## 2.1. Marital State Occupation

At a given age, a subject is known to have a particular marital status. We are interested in determining which age groups are likely to occupy a particular marital state. However, subjects do not occupy marital states without some contributing factors. For researchers to ascertain why a subject may be in a given marital state at a particular age group, they must first understand different marriage formation and dissolution dynamics. Therefore, it is important to study the factors leading to marital state occupation.

## 2.2. Single Transitions Between Marital States

It is important to understand transitions between marital states. For instance, the age of entering a first-time marriage might be of interest to family planners and to the civilization of a society at large, as age at first marriage is one example of a transition that is directly associated with a country's health, fertility, and economic state (Manda and Meyer, 2005; Jones and Gubhaju, 2009). Furthermore, a widowed or separated subject may end up re-marrying (making a transition into a Married state again). For those who have experienced a marital dissolution, it is important to determine what factors make them re-marry and which age groups tend to re-marry or remain in a marital dissolution state. Marriage counselors and other organizations seek to give advice to those considering re-marriage.

## 2.3. Termination of a Marriage State

Some marriage-supporting institutions, including religious communities, advocate for married subjects to stay married. However, due to life's unfortunate events, some marriages may fail. There are two main possible ways of marriage dissolution: separation and widowhood (death of a partner). As such, it is important to study how married subjects make a transition out of the Married state. Again, understanding these possible transitions out of a marriage would assist those organizations that seek to empower widowed women, for example.

## 2.4. Data

### 2.4.1. Study Area

The data used in this study were collected by the Africa Health Research Institute (AHRI), which is situated in the rural area of northern KwaZulu-Natal of South Africa (Tanser et al., 2007). The surveillance area is near Mtubatuba, in the Umkanyakude rural district. It constitutes an area of 438 square kilometers with a highly mobile population of 90,000 people who are members of 11,000 households. These include all individuals reported by household informants as household members regardless of them being resident or non-resident (Tanser et al., 2007; Hosegood et al., 2009). In the event that a subject is not available to respond, the head of household would be the suitable alternative informant. The Africa Centre Demographic Information System (ACDIS) started data collection on 1 January 2004 and it is an ongoing study. However, for the purposes of this research, we only considered subjects who were followed up from January 2004 to December 2016. For all registered households and individuals, demographic and health information was collected every 4 months. Because the surveillance cohorts are dynamic, subjects may enter or leave the cohort through migration or births and deaths at any time (Tanser et al., 2007), thus we have a case of ragged study entries. The migration might be within the surveillance area itself (where subjects change households) or as a result of in-migration and out-migration, as defined by Dobra et al. (2017). The participation rate at each wave is about 95% for household data collection. To minimize non-response, where respondents are either non-resident or unavailable, suitable household members are selected as alternative informants.

### 2.4.2. Data for This Study

Among other data sets, AHRI collects longitudinal data on life course history of marriage and marriage dissolution events. To the best of our knowledge, this study is the first attempt to analyze these data with advanced multilevel discrete time to event methods to study dynamics in family formation and dissolution. Previous analyses have used different statistical methods, such as Hosegood et al. (2009), who used descriptive statistics to compare the 2000 and the 2006 cohorts. Where discrete time event history analysis methods were used, it was on HIV (Tomita et al., 2017) or on different study cohorts with a different subject area (Clark et al., 2013; Houle et al., 2014; Clark and Brauner-Otto, 2015). Of the 69,134 observations (across all ages), this study considered 59,792 episodes from 56,308 unique subjects (comprising 343,758 person-years of follow up) who were aged between 17 and 65 years. These were enrolled between January 2004 and December 2016, with an average of 1.06 episodes per person. The subject who experienced the highest number of episodes in the data set had 8 episodes, and subjects stayed in a state for an average of 3.07 years. From the available and usable data, four marital states were considered (Never Married, Married, Separated, and Widowed) and each subject would be in one of these states at each time of visit. Separation in this context refers to any marital dissolution other than death of a partner, which could be a legal divorce or an informal divorce, both permanent and temporary. The Married state includes any type of marriage, such as traditional African marriage, polygamous marriage, and civil marriage.

Subjects would move between the four marital states and the possible transition types to be considered for binary regression models are entry into first marriage (Never Married → Married), exiting a marriage (Married → Separated and Married → Widowed), and remarriage (Separated → Married and Widowed → Married). For the competing risks model, we considered only three marital states, Married, Separated, and Widowed. A married subject has two possible ways that the marriage can end: either through separation or death of a partner.
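The transition types listed above can be written down as a small lookup structure. This is only an illustrative sketch: the dictionary, the function name, and the labels are assumptions introduced here, not part of the authors' analysis code.

```python
# Transition types named in the text, keyed by (origin, destination).
# The labels are illustrative groupings, not the authors' variable names.
ALLOWED_TRANSITIONS = {
    ("Never Married", "Married"): "first marriage",
    ("Married", "Separated"): "marital dissolution",
    ("Married", "Widowed"): "marital dissolution",
    ("Separated", "Married"): "remarriage",
    ("Widowed", "Married"): "remarriage",
}


def classify_transition(origin, destination):
    """Return the transition type for a pair of consecutive marital states."""
    if origin == destination:
        return "no transition"
    # Pairs outside the five listed types are not modelled in this sketch
    return ALLOWED_TRANSITIONS.get((origin, destination), "not modelled")
```

For example, `classify_transition("Widowed", "Married")` returns `"remarriage"`, while a pair such as Never Married to Widowed falls outside the five listed types and is flagged as not modelled.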

### 2.4.3. Censoring

Due to the dynamic nature of the population, some subjects would not be available to respond to the interview at certain follow-ups and there might be no one in the household who could respond on their behalf. As is common in longitudinal studies, this results in heavy censoring of the data. For subjects who have intermittent missing data, the last observation carried forward (LOCF) approach to data imputation was used. This method has been widely used in longitudinal studies; it assumes that the outcome remains constant at the last observed value after the dropout. For more details on LOCF, we refer to Shao and Zhong (2003).
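A minimal sketch of LOCF on one subject's sequence of visits is given below, assuming missing visits are represented by `None`. The function name and the example sequence are hypothetical; values missing before the first observation are left missing, since there is nothing to carry forward.

```python
def locf(values):
    """Fill missing entries (None) with the last observed value.

    Leading missing entries stay None because no earlier observation exists.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical marital status at six consecutive visits for one subject
visits = ["Never Married", None, "Married", None, None, "Widowed"]
imputed = locf(visits)
```

Note that LOCF can never introduce a transition at a missing visit, so any change of state is dated to the first visit at which the new state is actually observed.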

### 2.4.4. Ethical Clearance

Ethical approval for the study conducted by AHRI was obtained from the University of KwaZulu-Natal's Ethics Committee (BE 169/15). Both informed verbal and written consent were sought from study participants and details regarding operational and methodological procedures of ACDIS are well-documented by Muhwava et al. (2008). In cases where a participant was under the age of 16, written consent was provided by their parent or guardian. However, in this study, we do not consider participants under the age of 16.

## 2.5. Statistical Analysis

TABLE 1 | Distribution of explanatory variables for the 56,308 subjects aged between 17 and 65 years with marital status at entry to the study.

For the analysis, we use age as the time scale and consider it as a discrete variable. The main advantage of discrete time analysis lies in its flexibility when modeling time-varying covariates. We consider subjects between ages 17 and 65. In the African context, demographic studies usually take the age at which marriage can start to be 15 years (Manda and Meyer, 2005). However, since no events occurred before 17 years of age in this study, we opted to use a starting age of 17 years. The age range is then subdivided into q = 24 intervals between 17 and 65 years (which do not necessarily need to be equal). A subject's marital status is assumed to change not more than once within a period of 2 years, hence we chose to use 2-year time intervals in the analysis. Thus, we have intervals [17, 19], [19, 21], . . . , [63, 65], which we denote as t = 1, 2, . . . , 24.
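The mapping from age to the discrete time index t can be sketched as follows. The text does not say how ages on an interval boundary are assigned, so this sketch assumes two-year intervals that are closed on the left and open on the right, with age 65 placed in the last interval; the constant and function names are illustrative.

```python
START_AGE = 17
END_AGE = 65
WIDTH = 2  # years per interval

# Number of intervals implied by the age range and the width
N_INTERVALS = (END_AGE - START_AGE) // WIDTH


def age_interval_index(age):
    """Map an age in [17, 65] to its interval index t = 1, ..., 24.

    Assumption: intervals are [17, 19), [19, 21), ..., with 65 itself
    assigned to the final interval.
    """
    if not START_AGE <= age <= END_AGE:
        raise ValueError("age outside the 17-65 analysis window")
    if age == END_AGE:
        return N_INTERVALS
    return int((age - START_AGE) // WIDTH) + 1
```

With these settings `N_INTERVALS` equals 24, matching q in the text; ages 17 and 18 map to t = 1, and age 19 maps to t = 2.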

For assessing factors impacting the state occupied by a subject who possesses a number of characteristics at a particular age interval, a multinomial regression model is used (Grilli and Rampichini, 2007). For single transitions between marital states, into and out of a marriage, separate binary logistic regression models were utilized (Allison, 1982; Manda and Meyer, 2005; Steele, 2011; Clark et al., 2013). Separate response variables were created for the separate transition types, namely; marital dissolution (i.e., a dummy variable of 0 if subject is married and 1 if widowed or separated) or re-marriage (i.e., 0 if still widowed or separated and 1 if re-married). For analyzing ways of dissolving a marriage, a competing risks model (Allison, 1982; Jenkins, 1995; Steele, 2011) was used. We considered transitions out of a Married state, where Separated and Widowed are the competing events. A random variable, Yit, is created such that it is coded 0 whenever subject i is at risk of leaving a Married state, 1 when a transition is made into a Separated state and 2 when a transition is made into a Widowed state (the competing event).
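The coding of the competing risks response described above can be illustrated by expanding one marriage episode into one record per time interval. This is a sketch under stated assumptions: the function name, the record layout, and the example episode are hypothetical, and only the 0/1/2 coding is taken from the text.

```python
# Response coding from the text: 0 = still at risk (Married),
# 1 = transition to Separated, 2 = transition to Widowed
EXIT_CODES = {None: 0, "Separated": 1, "Widowed": 2}


def competing_risk_records(subject_id, first_interval, last_interval, exit_state=None):
    """Expand one marriage episode into person-period records.

    exit_state is None for an episode that is censored while still Married.
    """
    if exit_state not in EXIT_CODES:
        raise ValueError("exit_state must be None, 'Separated' or 'Widowed'")
    records = []
    for t in range(first_interval, last_interval + 1):
        # The event code is recorded only in the interval where the episode ends
        y = EXIT_CODES[exit_state] if t == last_interval else 0
        records.append({"id": subject_id, "t": t, "y": y})
    return records


# Hypothetical episode: married from interval 3, widowed in interval 6
widowed_episode = competing_risk_records("A", 3, 6, "Widowed")
# Hypothetical censored episode: still married at the last observed interval
censored_episode = competing_risk_records("B", 5, 7)
```

The binary transition models use the same layout with a 0/1 response, for example 1 in the interval where a married subject becomes widowed or separated.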

TABLE 2 | Number of transitions between marital states from 1 January 2004 to 31 December 2016 for the 56,308 subjects aged between 17 and 65 years.

Additionally, since observations are made on each individual repeatedly, there is possible correlation among observations within the same subject. There may exist other unobserved subject-specific factors which may influence marital state occupation or transition between marital states. For example, one person may have reasons for not remarrying after the death of a partner, while another may remarry shortly afterwards. It is, therefore, important to account for these variations by including a subject-specific random effect in the analysis. In the binary logistic regression models, there is a separate random effect for each type of transition, and these are assumed to be normally distributed with a mean of 0 and a variance of σ². Moreover, since the marital states are possibly recurrent in a subject, the competing risks regression model controls for this using a multilevel model. The correlations will enable us to understand how one subject who experienced one type of marital transition is likely to experience the other. Tomita et al. (2017) used the same data set on HIV, accounting for its multilevel nature. However, they used continuous-time survival methods. Using discrete-time methods, we fitted these models in Stata 15.1 using its built-in tools **xtlogit** for the random effects logit model and **gsem** for the multinomial logit model.
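For orientation, a standard way of writing such models in the discrete-time literature cited above (Allison, 1982; Steele, 2011) is sketched below. The notation is an assumption introduced here, since the authors' exact specification is not given in this section: h_{it} is the probability that subject i makes the transition in interval t given no transition before t, α_t is an interval-specific baseline, x_{it} is the covariate vector, and u_i is the subject-specific random effect.

```latex
% Binary transition model (one model per transition type)
\operatorname{logit}\bigl(h_{it}\bigr)
  = \log\frac{h_{it}}{1 - h_{it}}
  = \alpha_t + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + u_i,
\qquad u_i \sim N(0, \sigma^2)

% Competing risks out of Married: r = 1 (Separated), r = 2 (Widowed),
% with r = 0 (remaining Married) as the reference category
\log\frac{h_{it}^{(r)}}{h_{it}^{(0)}}
  = \alpha_t^{(r)} + \mathbf{x}_{it}^{\top}\boldsymbol{\beta}^{(r)} + u_i^{(r)},
\qquad r = 1, 2
```

Under this formulation, exp(β) is read as an odds ratio for the transition within an interval, which is how the results are reported; a correlation between the random effects for the two exit routes would link a subject's propensity for separation with that for widowhood.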

## 3. RESULTS

## 3.1. Descriptive Statistics

The explanatory variables considered in this study include gender, whether income is earned, whether the subject is employed, highest education level attained by the subject, age at first sexual intercourse, and age at first marriage for subjects who have ever been married. The median age at sexual debut was 18 (mean = 18.03, min = 5, and max = 48) years for those who ever had sex. **Table 1** shows the distribution of some variables of interest at entry into the study where applicable. In all categories of the variables, never married subjects had the highest proportion. At entry into the study, 13.6% of the female participants were married and 10.3% of the males were married. Most participants did not earn an income and of those who did, 24.4% were married and 69.5% were never married. Of those who did not earn an income, 5.9% were widowed.

The numbers of transitions (in person-years) into the different marital states are presented in **Table 2**. During the follow-up period, most subjects remained in the Never Married state, with 267,284 person-years. The most common transition type was Married to Widowed with 1,561 transitions, followed by the Never Married to Married transition with 1,274 transitions. The median age at first marriage was found to be 34 years. Of the 17,124 who ever married, 29%, 38%, 28%, and 5% had their first marriage in the ≤22, 23–30, 31–40, and ≥41 years age groups, respectively.

**Figure 1** displays the proportion at each age, of subjects who were occupying each state. It is clear that below 46 years of age, the biggest part of the population was constituted by subjects who were never married. Moreover, as shown in **Figure 2**, the hazards of entering a first marriage were very low (close to 0) in the younger ages, but increased just slightly with age. **Figure 3** also displays the trends over time for marital state occupation. Over the whole study period, the proportion of never married subjects was the highest. It decreased until 2008 and then stabilized afterwards. Proportion of married subjects was considerably low but slightly increased with time, then declined after 2009. The rates of marital separation were almost constant over time while those of widowhood started to increase slightly after 2008.

## 3.2. Results for Marital State Occupation

We began by considering factors associated with occupying a marital state, as shown in **Table 3**. The odds ratios are also displayed in **Figure 4**. All reported confidence intervals (CI) are at the 95% level. We first assessed the odds of being married relative to never married. Males had a significantly lower likelihood of being married than females [odds ratio (OR) = 0.84 (CI = 0.76; 0.93)]. The odds ratios for primary, high school, and tertiary education were 1.68 (CI = 1.43; 1.98), 2.42 (CI = 2.04; 2.86), and 6.43 (CI = 4.89; 8.47), respectively; thus, relative to those who never went to school, the likelihood of being married rather than never married was highest among subjects with tertiary education. Those who did not earn income and those who were not employed had a lower likelihood of being in a Married state relative to Never Married [OR = 0.93 (CI = 0.87; 1.01) and 0.93 (CI = 0.86; 0.99), respectively], although the interval for income includes 1. Between 2006 and 2007, a subject was more likely to be married than never married when compared to 2004–2005. After 2007, subjects were less likely to be married than never married, and the odds declined steadily up to 2016 [OR = 0.86 (CI = 0.78; 0.94), 0.71 (CI = 0.64; 0.78), 0.60 (CI = 0.54; 0.67), 0.50 (CI = 0.44; 0.56), and 0.43 (CI = 0.38; 0.48) for the periods 2008–2009, 2010–2011, 2012–2013, 2014–2015, and 2016, respectively]. However, the likelihood of being married varied substantially between subjects [the standard deviation of the subject-level random effect was 0.22 (CI = 0.21; 0.23)].
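The odds ratios and intervals above can be related to coefficients on the log-odds scale. The following is a minimal sketch, assuming the estimates were obtained on the log-odds scale with Wald-type intervals (an assumption; the estimation details are not restated in this section). The function names are illustrative only.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_odds_from_ci(or_hat, lower, upper, z=Z_95):
    """Recover the log-odds estimate and an approximate SE from a reported OR and CI."""
    beta = math.log(or_hat)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported gender effect for Married vs. Never Married: OR = 0.84 (CI = 0.76; 0.93)
beta, se = log_odds_from_ci(0.84, 0.76, 0.93)
or_hat, lower, upper = odds_ratio_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"OR = {or_hat:.2f} (CI = {lower:.2f}; {upper:.2f})")
```

An interval that excludes 1 on the odds ratio scale corresponds to a log-odds interval that excludes 0, which is how the significance statements in this section can be read.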

For Separated vs. Never Married state occupation, although there was heterogeneity among subjects [standard deviation of the heterogeneity component of 0.09 (CI = 0.08; 0.10)], males and those without employment were less likely to be in a Separated state than females and those who were employed [OR = 0.42 (CI = 0.24; 0.73) and OR = 0.52 (CI = 0.35; 0.78) for gender and employment, respectively]. When compared to those who never went to school, the more educated a subject was, the higher the likelihood of being in a Separated state [OR = 1.62 (CI = 0.59; 4.46), 4.50 (CI = 1.52; 13.34), and 14.86 (CI = 4.14; 53.31) for primary, secondary, and tertiary education, respectively].

When comparing subjects who occupied a Widowed state relative to those who never married, males were less likely to be widowed than females [OR = 0.09 (CI = 0.07; 0.11)]. When compared to subjects without any education, the likelihood of occupying a Widowed state increased significantly with increasing level of education [OR = 1.34 (CI = 1.11; 1.62), 1.62 (CI = 1.29; 2.02), and 2.96 (CI = 1.96; 4.46) for primary, secondary, and tertiary education, respectively]. There was significant subject-to-subject variation in the likelihood of Widowed vs. Never Married state occupation [standard deviation of the heterogeneity component, SD = 0.431 (CI = 0.231; 0.631)].

## 3.3. Results for Single Marital State Transitions

Results of the binary transitions are displayed in **Table 4**. These transitions include marital dissolution, re-marrying after a dissolved marriage (Separated or Widowed → Married), marital dissolution due to separation, and dissolution due to the death of a partner. The hazards of transitions over the years are displayed in **Figure 5**. For each transition type, a separate baseline hazard was allowed. Based on the smallest BIC, all transitions had a linear baseline effect of age except for the Married → Separated transition, whose baseline was constant. These are represented graphically in **Figure 2**. Additionally, there was homogeneity among subjects' re-marriage rates, since the standard deviation of the heterogeneity component was estimated as SD = 0. Increasing age significantly reduced the rate of re-marriage [OR = 0.92 (CI = 0.88; 0.97)]. Males were significantly less likely to experience a marital dissolution than females [OR = 0.27 (CI = 0.17; 0.45)]. A higher age at first sex was associated with a lower rate of marital dissolution [OR = 0.97 (CI = 0.95; 0.99)]. The likelihood of a marital dissolution was significantly associated with more recent periods, with the period 2008–2009 having the largest effect [OR = 24 (CI = 5.53; 108.37)], and there was between-subject heterogeneity [standard deviation of 1.44 (CI = 1.04; 2.01)].

TABLE 4 | Results for single transitions for the different family dynamics for subjects aged between 17 and 65 years between 2004 and 2016 for the 56,308 subjects.

\**Statistically significant variable.*

Males were significantly less likely to experience a Married to Widowed transition when compared to females [OR = 0.20 (CI = 0.12; 0.36)]. The rest of the covariates did not significantly affect transition into widowhood. However, there was some variation between subjects that was unaccounted for in the analysis [standard deviation of 1.50 (CI = 1.07; 2.09)]. Lastly, higher ages at first marriage were associated with lower rates of experiencing a Married → Separated transition for those who entered first marriages before the age of 40. The likelihood of a Married → Separated transition decreased slightly with increasing age at first marriage [OR = 0.55 (CI = 0.10; 2.94) and OR = 0.38 (CI = 0.06; 2.56) for age groups 23–29 and 30–40, respectively], although both intervals include 1. The standard deviation of the random effects was 3.43 (CI = 2.50; 4.72), which implies that factors not captured by the model may be driving the variation in transitions from the Married to the Separated state.

TABLE 5 | Results for exiting a marriage for subjects aged between 17 and 65 years since 2004–2016 from a population of 56,308 subjects.

\**Statistically significant variable.*

## 3.4. Results for Termination of a Marriage

Results of the competing risks model using the multivariate binary response are displayed in **Table 5**. The hazards of transitions over the years are displayed in **Figure 6**. These concern transitions out of the Married state, for which the competing destination states were Separated and Widowed. The Married → Separated transition had a constant baseline age effect, while the Married → Widowed transition had a linear age effect. Marrying at later ages was associated with a lower likelihood of transition from Married to Widowed [OR = 0.65 (CI = 0.43; 0.99) and OR = 0.40 (CI = 0.19; 0.88) for those who entered first marriage at 30–40 and ≥ 41 years, respectively]. Males were significantly less likely than females to experience a Married → Widowed transition [OR = 0.20 (CI = 0.12; 0.36)]. Substantial heterogeneity between subjects was observed for the Married → Separated and Married → Widowed transitions [SD = 3.56 (CI = 2.66; 4.77) and SD = 1.50 (CI = 1.08; 2.09), respectively].

## 4. DISCUSSION

Marriage and marriage dissolution are important life events that affect adults in a society. We used the rich data from one of Africa's largest population-based cohorts, based in KwaZulu-Natal, South Africa, to model family formation and dissolution using a series of multilevel discrete time event models. In line with Garenne (2004) and Hosegood et al. (2009), our results demonstrate that marital rates are low (13.6% of females and 10.3% of males) and that age at first marriage is remarkably high in this population (median of 34 years). In addition, the results of the multilevel discrete time to event models revealed that marrying at later ages had a clear association with a low rate of marital dissolution, whilst more educated subjects and later age of sexual debut were associated with a lower likelihood of experiencing a marital dissolution. The role of these factors in marriage warrants further discussion and could be investigated in future research where supporting data are available.

The study has several strengths, including the use of rich data from one of Africa's largest population-based cohorts and state-of-the-art statistical methods. However, more advanced statistical models could still be used to model the transitions in this process simultaneously. In addition to modeling the hazards and covariate effects, other methods could then be used to derive further useful statistics, such as expected waiting times, sojourn times, and transition probabilities, which may then be used for prediction. Additionally, the random effects in this study were modeled using a normal distribution, but a different distribution (such as a gamma mixture; Jenkins, 1997) may also be considered where different assumptions are made about the subjects. Different approaches may also be used to handle missing data.

Although marriages may be terminated as a result of poor (psychological) health (Wang and Amato, 2000), marital dissolution might also lead to undesired health outcomes for both partners and children, such as stress and high blood pressure. We found significant factors contributing to marital dissolution, which may help with policy decision making. Many HIV prevention programs encourage delay in sexual debut in order to reduce the risk of HIV acquisition among highly vulnerable youth (Karim et al., 2017). Thus, programs that encourage delay in sexual debut may have additional benefits in terms of reducing rates of marital dissolution, ultimately improving psychological and physical health.

## DATA AVAILABILITY STATEMENT

The datasets for this study are available upon request from the Africa Health Research Institute (AHRI), whose website is https://www.ahri.org/. Data sets with restricted access require a data access agreement to be completed. A request is then submitted to the applicable data custodian for specific data sets on the repository. Requests for ad hoc data sets, beyond the data sets archived on the Africa Centre data repository, should be directed to Africa Centre's Helpdesk (help@africacentre.ac.za).

## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the University of KwaZulu-Natal's Ethics Committee (BE 169/15), from which ethical approval for all the data collected by AHRI was obtained. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

## AUTHOR CONTRIBUTIONS

JB performed the literature review, data management, statistical analysis, and wrote the initial draft of the manuscript. SM conceived, designed, and developed the methodology of the study. HM helped with the design and methodology of the study. SM and HM reviewed the statistical analysis and helped with revision of the manuscript. FT contributed to the contextualization of the study findings and substantively helped with the revision of the manuscript. JB, SM, HM, and FT read and approved the final version of the manuscript.

## FUNDING

This work was funded and supported through the DELTAS Africa Initiative [SSACAB]. The DELTAS Africa Initiative was an independent funding scheme of the African Academy of Sciences (AAS)'s Alliance for Accelerating Excellence in Science in Africa (AESA) and supported by the New Partnership for Africa's Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust [grant 107754/Z/15/Z-DELTAS Africa Sub-Saharan Africa Consortium for Advanced Biostatistics (SSACAB) programme] and the UK government. The views expressed in this publication are those of the author(s) and not necessarily those of AAS, NEPAD Agency, Wellcome Trust or the UK government. We also received funding from the University of KwaZulu-Natal's University Capacity Development Programme (UCDP) and the Teaching development grant [APP-TDG-226], both from South African Department of Higher Education and Technology (DHET) through University of Pretoria. We acknowledge the Africa Health Research Institute's Demographic Surveillance Information System and Population Intervention Programme which was funded by the Wellcome Trust (201433/Z/16/Z) and the South Africa Population Research Infrastructure Network (funded by the South African Department of Science and Technology and hosted by the South African Medical Research Council).

## ACKNOWLEDGMENTS

We gratefully acknowledge AHRI for providing the datasets. We are also grateful to the study participants and for the work and support of the fieldwork and database teams at AHRI. The Biostatistics Research Unit of the South Africa Medical Research Council provided research visits to their Pretoria offices. We also acknowledge the support we received from the University of KwaZulu-Natal through the provision of research facilities.

## REFERENCES


death: a rural South African population-based surveillance study. PLoS Med. 10:e1001409. doi: 10.1371/journal.pmed.1001409


Hashemi, L., and Homayuni, H. (2017). Emotional divorce: child's well-being. J. Divorce Remarriage 58, 631–644. doi: 10.1080/10502556.2016.1160483


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Batidzirai, Manda, Mwambi and Tanser. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Progression of HIV Disease Among Patients on ART in Ethiopia: Application of Longitudinal Count Models

### Belay Desyebelew Andualem<sup>1</sup> \* and Birhanu Teshome Ayele<sup>2,3</sup>

*<sup>1</sup> Department of Statistics, Dire Dawa University, Dire Dawa, Ethiopia, <sup>2</sup> Department of Statistics, Addis Ababa University, Addis Ababa, Ethiopia, <sup>3</sup> Division of Epidemiology and Biostatistics Unit, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa*

Although the world has been fighting HIV disease in unity and patients are receiving antiretroviral therapy, HIV disease continues to be a serious health issue in some parts of the world. A large number of AIDS-related deaths and co-morbidities are registered every year in resource-limited countries like Ethiopia. Most studies that have assessed the progression of the disease have used models that require a continuous response. The main objective of this study was to use appropriate statistical models to analyze routinely collected HIV data and identify risk factors associated with the progression of the CD4<sup>+</sup> cell count of patients under ART treatment in Debre Markos Referral Hospital, Ethiopia. In this longitudinal retrospective study, routine data of 445 HIV patients registered for ART treatment in the hospital were used. As overdispersion was detected in the data, Poisson-Gamma, Poisson-Normal, and Poisson-Gamma-Normal models were applied to account for overdispersion and correlation in the data. The Poisson-Gamma-Normal model with a random intercept was selected as the best model to fit the data. The findings of the study revealed time on treatment, sex of patients, baseline WHO stage, and baseline CD4<sup>+</sup> cell count as significant factors for the progression of the CD4<sup>+</sup> cell count.

Keywords: HIV/AIDS, CD4 count, longitudinal data, Poisson-Normal model, Poisson-Gamma-Normal model, antiretroviral therapy (ART), Ethiopia, Debre Markos

## 1. INTRODUCTION

HIV disease continues to be a serious health issue for resource-limited countries like Ethiopia. According to the UNAIDS (2016) fact sheet, there were about 2.1 million new cases of HIV in 2015 globally (1). About 36.7 million people were living with HIV around the world, and, as of June 2016, 18.2 million people living with HIV were receiving medicine to treat HIV, called antiretroviral therapy (ART). An estimated 1.1 million people died from AIDS-related illnesses in 2015, and 35 million people have died from AIDS-related illnesses since the start of the epidemic. CD4<sup>+</sup> cells are the primary targets of HIV. The relentless destruction of CD4<sup>+</sup> cells by HIV, either directly or indirectly, results in the loss of HIV-specific immune responses and, finally, of non-specific immune responses in the AIDS stage. The estimation of peripheral CD4<sup>+</sup> cell counts has been used as a tool for monitoring disease progression and the effectiveness of antiretroviral treatment (ART) (2). Changes in CD4<sup>+</sup> cell counts are important indicators of the response to ART.

### Edited by:

*Jim Todd, University of London, United Kingdom*

### Reviewed by:

*Edson Mollel, Kilimanjaro Christian Medical University College, Tanzania Kathryn Risher, University of London, United Kingdom*

### \*Correspondence:

*Belay Desyebelew Andualem belaydesyebelew@gmail.com*

### Specialty section:

*This article was submitted to Digital Public Health, a section of the journal Frontiers in Public Health*

Received: *20 May 2019* Accepted: *23 December 2019* Published: *19 February 2020*

### Citation:

*Andualem BD and Ayele BT (2020) Progression of HIV Disease Among Patients on ART in Ethiopia: Application of Longitudinal Count Models. Front. Public Health 7:415. doi: 10.3389/fpubh.2019.00415*

Initial CD4<sup>+</sup> cell count, age, gender, smoking, unemployment, WHO stage, hospital, opportunistic infections, body mass index, changing doctors during outpatient follow up, use of alcohol and drugs, and duration of treatment (in months) are some of the significant determinants that affect CD4<sup>+</sup> cell count progression of patients on ART (3–5).

Most studies conducted in the area fitted statistical models that require a (multivariate) Normal distribution by treating CD4<sup>+</sup> cell counts as a continuous variable. When this assumption is violated, even after transformation, considering Poisson-related models is a natural choice. One of the common problems in analyzing count data like the CD4<sup>+</sup> cell count is overdispersion. A Negative Binomial model can be considered to overcome this problem. Trindade et al. applied Poisson and Negative Binomial models using the multilevel (ML) approach and generalized estimating equations (GEE) to model CD4<sup>+</sup> cell counts of 587 HIV seropositive patients, and they stated that the best marginal model to fit the data was the Negative Binomial (NB) with an exchangeable correlation structure (6). Tekle et al. also employed different count data analysis methods, starting from the ordinary Poisson regression model, to study CD4<sup>+</sup> cell counts of 222 HIV positive patients, and they found that the Poisson-Normal-Gamma model was the best fit for their data (7). In this study, we applied various count data models to study the progression of the CD4<sup>+</sup> cell count of HIV patients and identified risk factors for progression of patients' CD4<sup>+</sup> cell count in Debre Markos Referral Hospital, Ethiopia.

## 2. MATERIALS AND METHODS

In practice, it is common to have response variables of a count type, such as the number of CD4<sup>+</sup> cells in a cubic millimeter of blood. Some data analysts treat the CD4<sup>+</sup> cell count as a continuous measure and apply the linear mixed effects model. But that practice ignores two facts: the data are really discrete, and the distributions of count variables are usually skewed. For these reasons, the use of models that assume (multivariate) normality might not be efficient (8). Even if the data are transformed and these models are applied, the interpretation might not be straightforward. In scenarios like this, it is better to apply statistical models that account for the nature of the data.

Our data include 445 HIV-positive patients who started ART treatment between December 2005 and July 2014 in Debre Markos Referral Hospital, Ethiopia. The minimum number of measurements was two and the maximum was seven. Patients with fewer than two measurements and those aged <15 years were excluded from the study.

For our data, the assumption of multivariate normality failed, and this suggested that use of a linear mixed model was not appropriate (**Table 1**). The Poisson regression model with normal random effects and models that account for both correlation between repeated measures and overdispersion simultaneously were thus considered in line with Booth et al. (9) and Molenberghs et al. (10, 11).

## 2.1. Variables in the Study

### 2.1.1. Dependent Variable

The dependent variable of this study was the CD4<sup>+</sup> cell count per cubic millimeter of blood of HIV-infected patients who are under ART treatment.

### 2.1.2. Independent Variables

The independent variables considered in this study were selected based on related literature (5, 7). These include the sex of patients, age of patients (age at the initiation of the treatment), baseline CD4<sup>+</sup> cell count (the CD4<sup>+</sup> cell count of the patients at the start of the treatment), WHO clinical stage at baseline (stage I, stage II, stage III, and stage IV), marital status at baseline, baseline weight, level of education at baseline, functional status at baseline, TB status at baseline, and time in months. Functional status was defined using the WHO categories Ambulatory and Working. Patients who were able to perform activities of daily living but not able to work or play were classified as ambulatory, and those who were able to perform usual work in or out of the house, harvest, go to school, or, for children, carry out normal activities or play were classified as working.

## 2.2. Poisson Model

Let Y<sub>i</sub> be the ith CD4<sup>+</sup> cell count, assumed to be Poisson distributed with mean λ<sub>i</sub>. The probability mass function of Y<sub>i</sub> can then be written as

$$f(Y_i = y_i \mid \lambda_i) = \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!} = \exp\{y_i \ln \lambda_i - \lambda_i - \ln y_i!\}. \tag{1}$$

The Poisson distribution belongs to the exponential family, with natural parameter θ<sub>i</sub> equal to ln λ<sub>i</sub>, scale parameter φ = 1, and variance function v(λ<sub>i</sub>) = λ<sub>i</sub> (12). The logarithm is the natural link function, leading to the classical Poisson regression model Y<sub>i</sub> ∼ Poisson(λ<sub>i</sub>), with log(λ<sub>i</sub>) = X<sub>i</sub><sup>T</sup>β.
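As a small numerical illustration of Equation (1) and the log link, the sketch below evaluates the Poisson probability in both its direct and exponential-family forms. The covariate vector and coefficients are hypothetical values chosen for illustration, not estimates from this study.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability e^{-lam} lam^y / y! computed directly."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def poisson_pmf_expfam(y, lam):
    """Same probability in exponential-family form exp{y ln(lam) - lam - ln(y!)}."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def poisson_mean(x, beta):
    """Mean under the log link: lambda = exp(x^T beta)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))


# Hypothetical covariate vector (intercept, months on ART) and coefficients.
lam = poisson_mean([1.0, 6.0], [1.2, 0.05])
for y in range(4):
    print(y, poisson_pmf(y, lam), poisson_pmf_expfam(y, lam))
```

The two forms agree up to floating-point error; the log-scale version is usually the safer one for large counts because it avoids computing large powers and factorials directly.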

## 2.3. Poisson-Gamma Model

TABLE 2 | Summary of CD4<sup>+</sup> cell count at different time points.

TABLE 3 | Summary of CD4<sup>+</sup> cell count progression for some categorical covariates.

The standard Poisson distribution requires the mean and variance to be equal. When this assumption fails, the Poisson-Gamma model can be used to fit the data. Assume that Y<sub>i</sub> | θ<sub>i</sub> ∼ Poi(θ<sub>i</sub>λ<sub>i</sub>), where the θ<sub>i</sub> denote an independent and identically distributed (iid) sample of unit-mean Gamma random variables with shape parameter α (9). Conditional on θ<sub>i</sub>, the CD4<sup>+</sup> cell count of the ith patient follows a Poisson distribution with mean θ<sub>i</sub>λ<sub>i</sub>. The counts are then marginally independent Poisson-Gamma random variables [Y<sub>i</sub> ∼ NB(α, λ<sub>i</sub>)] with mean λ<sub>i</sub> and variance λ<sub>i</sub> + λ<sub>i</sub><sup>2</sup>/α. Hence, the parameter α quantifies the amount of overdispersion, with α = ∞ corresponding to no overdispersion with respect to the Poisson distribution. The mass function of the Poisson-Gamma random variables is given by

$$\Pr(Y_i = y_i; \alpha, \lambda_i) = \frac{\Gamma(y_i + \alpha)}{\Gamma(\alpha)\, y_i!} \left(\frac{\alpha}{\lambda_i + \alpha}\right)^{\alpha} \left(\frac{\lambda_i}{\lambda_i + \alpha}\right)^{y_i} \tag{2}$$

The Poisson-Gamma model (also known as the Negative Binomial model) is given by log(λ<sub>i</sub>) = X<sub>i</sub><sup>T</sup>β.
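The stated mean and variance of the Poisson-Gamma distribution can be checked numerically from the mass function in Equation (2). The sketch below sums the mass function over a long range of counts; the values of α and λ are illustrative only and are not estimates from this study.

```python
import math


def nb_pmf(y, alpha, lam):
    """Poisson-Gamma (Negative Binomial) mass function of Equation (2)."""
    log_p = (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(alpha / (lam + alpha))
        + y * math.log(lam / (lam + alpha))
    )
    return math.exp(log_p)


def nb_moments(alpha, lam, y_max=5000):
    """Total probability, mean, and variance by summing the mass function up to y_max."""
    probs = [nb_pmf(y, alpha, lam) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var


alpha, lam = 2.0, 10.0  # illustrative values, not estimates from the study
total, mean, var = nb_moments(alpha, lam)
print(total, mean, var, lam + lam**2 / alpha)
```

The truncation point `y_max` is an assumption of this sketch; it needs to be large relative to λ and λ/α for the sums to be accurate.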

## 2.4. Poisson-Normal Model

For μ<sub>ij</sub> = E(Y<sub>ij</sub> | b<sub>i</sub>) and a known link function η(·), the generalized linear mixed model can be expressed as

$$\eta(\mu_{ij}) = \eta[E(Y_{ij} \mid b_i)] = X_{ij}^T \beta + Z_{ij}^T b_i \tag{3}$$

where Y<sub>ij</sub> is the CD4<sup>+</sup> cell count of the ith patient at the jth visit (measurement), β is a p-dimensional vector of unknown fixed regression coefficients, and b<sub>i</sub> is a q-dimensional vector of unknown random regression coefficients for the ith individual. The b<sub>i</sub> are often assumed to be drawn independently from N(0, D), where D is the variance-covariance matrix of the random effects. X<sub>ij</sub> and Z<sub>ij</sub> are p-dimensional and q-dimensional vectors of known covariate values, respectively (10). The generalized mixed Poisson model with normal random effects (Poisson-Normal model) becomes

$$\ln(\lambda_{ij}) = X_{ij}^T \beta + Z_{ij}^T b_i \tag{4}$$

This model is referred to as the Poisson-Normal model because it assumes a Poisson distribution for the counts and a normal distribution for the random effects b<sub>i</sub> (10, 11).
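As a worked illustration, consider the special case of a random intercept only, so that Z<sub>ij</sub><sup>T</sup>b<sub>i</sub> = b<sub>i</sub> with b<sub>i</sub> ∼ N(0, d). The symbols d and m<sub>ij</sub> are introduced here for illustration and do not appear in the original text. Using the moments of the log-normal distribution, the marginal mean and variance are

$$\begin{aligned} m_{ij} = E(Y_{ij}) &= E[E(Y_{ij} \mid b_i)] = \exp(X_{ij}^T \beta)\, E(e^{b_i}) = \exp(X_{ij}^T \beta + d/2), \\ \mathrm{Var}(Y_{ij}) &= E[\mathrm{Var}(Y_{ij} \mid b_i)] + \mathrm{Var}[E(Y_{ij} \mid b_i)] = m_{ij} + m_{ij}^2\,(e^{d} - 1). \end{aligned}$$

Under this assumption, the random intercept alone makes the marginal variance exceed the marginal mean whenever d > 0, in addition to inducing correlation between repeated counts on the same patient.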

## 2.5. Poisson-Gamma-Normal Model

According to Molenberghs et al. (10, 11), a model combining the ideas of the Poisson-Normal and overdispersion models for repeated, overdispersed Poisson data can be specified as Y<sub>ij</sub> ∼ Poi(θ<sub>ij</sub>λ<sub>ij</sub>), with

$$\lambda_{ij} = \exp(X_{ij}^T \beta + Z_{ij}^T b_i) \tag{5}$$

where the θ<sub>ij</sub> capture overdispersion and denote an independent and identically distributed (iid) sample of unit-mean Gamma random variables with shape parameter α and scale parameter 1/α, that is, θ<sub>ij</sub> ∼ Gamma(α, 1/α), and where b<sub>i</sub> ∼ N(0, D). This model is called the Poisson-Gamma-Normal (combined) model because it includes both Normal (b<sub>i</sub>) and Gamma (θ<sub>ij</sub>) random effects to account for correlation and overdispersion, respectively.
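To see how the two sets of random effects contribute to the marginal variance, the sketch below computes the marginal mean and variance of a count under the combined model. It assumes a scalar random intercept with variance d that is independent of the Gamma effects; the closed forms follow from the law of total variance under those assumptions and are not quoted from the paper. The function name and example values are hypothetical.

```python
import math


def combined_moments(eta, d, alpha=math.inf):
    """Marginal mean and variance for a random-intercept count model.

    eta   : fixed-effects linear predictor x^T beta
    d     : variance of the normal random intercept (d = 0 means no random effect)
    alpha : shape of the unit-mean Gamma effect (inf means no overdispersion effect)
    """
    mean = math.exp(eta + d / 2.0)
    inv_alpha = 0.0 if math.isinf(alpha) else 1.0 / alpha
    var = mean + mean**2 * (math.exp(d) * (1.0 + inv_alpha) - 1.0)
    return mean, var


eta = math.log(150.0)  # hypothetical linear predictor
print("Poisson:              ", combined_moments(eta, d=0.0))
print("Poisson-Gamma:        ", combined_moments(eta, d=0.0, alpha=2.0))
print("Poisson-Normal:       ", combined_moments(eta, d=0.3))
print("Poisson-Gamma-Normal: ", combined_moments(eta, d=0.3, alpha=2.0))
```

Setting d = 0 recovers the Poisson-Gamma variance λ + λ²/α, and letting α go to infinity recovers the Poisson-Normal case, so the three simpler models appear as special cases of the combined one.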

## 2.6. Methods of Parameter Estimation

In this study, we used the glmer and glmer.nb functions in R (packages lme4 and MASS). A Laplace approximation was used to obtain parameter estimates. The R code used to fit the models is available in **Supplementary Material**.

## 2.7. Model Comparison

To select the important variables, the main effects, main effect by time interactions, and plausible main effect by main effect interactions were first incorporated into the initial candidate models. Non-significant interaction effects were then removed and the models refitted, and this process was repeated. The best-fitting model was selected using information criteria (AIC, BIC, and −2 log-likelihood) (**Table 7**); the model with the smallest values was chosen as the final model.
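The comparison by information criteria can be sketched as follows. The −2 log-likelihood values, parameter counts, and number of observations below are hypothetical placeholders, not the values reported in Table 7, and the choice of sample size in the BIC for a mixed model is itself an assumption of this sketch.

```python
import math


def aic(minus2_loglik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return minus2_loglik + 2 * n_params


def bic(minus2_loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k ln(n)."""
    return minus2_loglik + n_params * math.log(n_obs)


# Hypothetical fits: (label, -2 log-likelihood, number of parameters).
candidates = [
    ("Poisson", 52000.0, 16),
    ("Poisson-Gamma", 29000.0, 17),
    ("Poisson-Normal", 31000.0, 18),
    ("Poisson-Gamma-Normal", 27400.0, 19),
]
n_obs = 2000  # hypothetical number of CD4+ measurements

table = [(name, m2ll, aic(m2ll, k), bic(m2ll, k, n_obs)) for name, m2ll, k in candidates]
best_by_aic = min(table, key=lambda row: row[2])[0]
best_by_bic = min(table, key=lambda row: row[3])[0]
print(best_by_aic, best_by_bic)
```

Both criteria penalize the number of parameters, with BIC penalizing more heavily once ln(n) exceeds 2, so the two can disagree when a richer model improves the likelihood only slightly.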

## 3. RESULTS AND DISCUSSION

## 3.1. Descriptive Analysis

In this section, CD4<sup>+</sup> cell count data obtained from 445 HIV patients on ART treatment in Debre Markos Referral Hospital are summarized. The majority of the HIV patients [347 (78.0%)] started antiretroviral treatment with CD4<sup>+</sup> cell counts <200 cells/mm<sup>3</sup>. At the start of the treatment, the median CD4<sup>+</sup> cell count of the patients was 145 cells/mm<sup>3</sup> of blood, with an IQR of 107.00 cells/mm<sup>3</sup> of blood. The minimum and maximum baseline CD4<sup>+</sup> cell counts were 3 and 971 cells/mm<sup>3</sup> of blood, respectively.

The summary of CD4<sup>+</sup> cell counts at different time points is given in **Table 2**. As can be seen in **Table 2**, the median CD4<sup>+</sup> cell count increased over time. The IQR of CD4<sup>+</sup> cell counts increased at some points and then started to decrease after the 24th month. The number of patients decreased at some points and increased at others, which implies the presence of intermittent missingness in the data. That means some patients were falling out of care and then re-engaging, or they did not have CD4<sup>+</sup> cell counts that were spaced perfectly every 6 months.
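The distinction between patients who drop out and patients with intermittent gaps can be made explicit by classifying each patient's pattern of recorded visits. The following is a minimal sketch on a hypothetical 6-monthly visit grid; the patient labels and patterns are invented for illustration.

```python
def missingness_pattern(observed):
    """Classify one patient's visit indicators (True = CD4+ count recorded).

    Returns "complete", "dropout" (monotone: once missing, always missing)
    or "intermittent" (a missing visit followed by a recorded one).
    """
    if all(observed):
        return "complete"
    first_gap = observed.index(False)
    return "intermittent" if any(observed[first_gap:]) else "dropout"


# Hypothetical visit grid: months 0, 6, 12, 18, 24, 30.
patients = {
    "A": [True, True, True, True, True, True],
    "B": [True, True, False, False, False, False],
    "C": [True, False, True, True, False, True],
}
patterns = {pid: missingness_pattern(obs) for pid, obs in patients.items()}
print(patterns)
```

A count of patients that falls and then rises again across time points, as described above, can only arise when at least some patients follow the intermittent pattern.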

Data on demographic and clinical characteristics of the patients were collected at the start of antiretroviral treatment. Among the 445 patients, 280 (62.9%) were females. The mean baseline CD4<sup>+</sup> cell count was 134.84 for male patients and 168.91 for female patients; on average, female patients therefore started ART treatment at a relatively higher CD4<sup>+</sup> cell count. The average CD4<sup>+</sup> cell count of females was higher than that of males at all time points, and the difference increased over time.

WHO stage III had the highest number of patients [282 (63.4%)] compared with the other three stages. WHO stage II had the second highest number of patients [77 (17.3%)], and WHO stage IV had the smallest number [27 (6.1%)]. As expected, patients in WHO stage I had a higher CD4<sup>+</sup> cell count at all time points than patients in the other three stages of the disease.

Patients with a working functional status had a higher mean CD4<sup>+</sup> cell count at all time points than patients with an ambulatory functional status. Among the 445 HIV patients included in this study, 346 (77.8%) had a working functional status and 99 (22.3%) were ambulatory (**Table 3**). About 27.3% of the 445 HIV patients were TB positive at baseline. TB negative patients had a higher mean CD4<sup>+</sup> cell count at all time points compared to TB positive patients, suggesting an impact of HIV-TB coinfection.

## 3.2. Exploratory Data Analysis

**Figure 1** depicts the individual profile plot of the CD4<sup>+</sup> cell count of HIV-infected patients included in the study. The plot provides some information on the between-patient variability in CD4<sup>+</sup> cell count and illustrates the change in patients' CD4<sup>+</sup> cell count over time. Some individuals have an erratic CD4<sup>+</sup> cell count and others have a CD4<sup>+</sup> cell count that slowly increases over time. As one can see from the graph, there is a considerably large difference in the intercepts of individual trajectories. Similarly, some trajectories are steeper, while others are almost horizontal, indicating possible variability in the slope of CD4<sup>+</sup> cell counts. Therefore, because of the variability in the intercept and slope of the trajectories, a mixed model is a natural candidate for these data. The overall mean profile plot of the CD4<sup>+</sup> cell count shows a roughly linear increasing pattern over time (**Figure 2**), suggesting that a linear time effect is reasonable. The mean CD4<sup>+</sup> cell count increases at a high rate from baseline to the 6th month, then increases more slowly from the 6th to the 24th month, and decreases at month 30.

TABLE 5 | Poisson-Normal model.

## 3.3. Model Results

**Table 4** summarizes the parameter estimates of the Poisson and Poisson-Gamma regression models fitted to the CD4<sup>+</sup> cell count. All parameters included in the Poisson regression model are significant at the 5% level. The data were overdispersed, as the sample variance of the CD4<sup>+</sup> cell count at every time point was greater than the corresponding sample mean (**Table 2**). A likelihood ratio (LR) test was used to test the null hypothesis that the restriction imposed by the Poisson model (no overdispersion) holds. The null hypothesis was rejected, implying the presence of overdispersion in our data. The Poisson-Gamma model fits the data better than the Poisson model, as it has the smaller AIC value. Among the Poisson-Normal models, the one with both a random intercept and a random slope was found to be the best fit, since it has smaller information criteria values than the random-intercept-only model. The parameter estimates of this model are displayed in **Table 5**. Based on this model, time, WHO stage, and initial CD4<sup>+</sup> cell count were found to be significant factors of patients' CD4<sup>+</sup> cell count progression.
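The two overdispersion checks described above can be sketched as follows. The log-likelihoods and counts are hypothetical, and the halving of the p-value is one common convention for testing a parameter on the boundary of its space; whether that correction was applied in this study is not stated, so it is an assumption of the sketch.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


def lr_test_overdispersion(loglik_poisson, loglik_nb, boundary=True):
    """Likelihood ratio test of Poisson (restricted) against Negative Binomial."""
    stat = max(0.0, 2.0 * (loglik_nb - loglik_poisson))
    p = chi2_sf_1df(stat)
    # Under H0 the dispersion parameter sits on the boundary of its space, so the
    # reference distribution is often taken as a 50:50 mixture of chi2(0) and chi2(1).
    if boundary:
        p = 0.5 * p if stat > 0 else 1.0
    return stat, p


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; values well above 1 suggest overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


# Hypothetical log-likelihoods and CD4+ counts, for illustration only.
stat, p_value = lr_test_overdispersion(-26000.0, -14500.0)
ratio = dispersion_ratio([50, 120, 300, 410, 95])
print(stat, p_value, ratio)
```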

The improvement of both the Poisson-Gamma and Poisson-Normal models over the Poisson model in fitting the data indicates that both correlation and overdispersion are present in the data. The Poisson-Gamma-Normal (Negative Binomial log-linear mixed) model proposed by Booth et al. (9) and Molenberghs et al. (10, 11) was fitted to handle correlated and overdispersed count data, and the random intercept Poisson-Gamma-Normal model is a much better fit because of its lower AIC (27,379.9), BIC (27,488.4), and −2 log-likelihood (27,342) values as compared to the Poisson-Normal models (**Table 6**). Therefore, the final model for our data was the random intercept Poisson-Gamma-Normal model. We also tried the Poisson-Gamma-Normal model with random linear slopes for time, but found that the Poisson-Gamma-Normal model with a random intercept only was better based on the information criteria (AIC and BIC).
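The information criteria used for this comparison follow the standard definitions, AIC = −2 log L + 2k and BIC = −2 log L + k ln(n). The sketch below applies them to the values quoted above. The function names are illustrative, and the parameter count it backs out is an inference from the reported figures, not a number stated in the source.

```python
import math


def aic(neg2_loglik, n_params):
    """Akaike information criterion from -2 log-likelihood and parameter count."""
    return neg2_loglik + 2 * n_params


def bic(neg2_loglik, n_params, n_obs):
    """Bayesian information criterion; n_obs is the number of observations."""
    return neg2_loglik + n_params * math.log(n_obs)


# Values reported for the random intercept Poisson-Gamma-Normal model.
reported_neg2_loglik = 27342.0
reported_aic = 27379.9

# Parameter count implied by the AIC penalty (assumes the reported
# -2 log-likelihood is rounded, so the result need not be an integer).
implied_k = (reported_aic - reported_neg2_loglik) / 2
print(f"implied number of parameters: {implied_k:.2f}")
```

Lower values of either criterion indicate a better trade-off between fit and complexity, which is the basis on which the random intercept model was preferred.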

Based on the results obtained from the Poisson-Gamma-Normal model, time in months, sex, and baseline CD4<sup>+</sup> cell count were found to be significant factors of the CD4<sup>+</sup> cell count of a patient (**Table 8**). For a given patient, keeping the random intercept and other covariates constant, one more month on ART increased the CD4<sup>+</sup> cell count by a multiplicative factor of e<sup>0.0243</sup> = 1.0246.

A female patient had a CD4<sup>+</sup> cell count 1.1215 times that of a male patient, adjusting for the other covariates and the random intercept. A unit increase in baseline CD4<sup>+</sup> cell count increased the CD4<sup>+</sup> cell count of a patient by a factor of 1.0034, holding the other covariates and the random intercept constant.
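Because the model uses a log link, each coefficient acts multiplicatively on the expected count. The sketch below reproduces the monthly factor quoted above and shows how such factors compound over larger changes in a covariate. The function name `rate_ratio` and the 12-month and 100-cell horizons are illustrative choices, not results from the paper.

```python
import math


def rate_ratio(beta, units=1.0):
    """Multiplicative change in the expected count for a `units` increase in a covariate."""
    return math.exp(beta * units)


# Time coefficient reported in the text (per month on ART).
per_month = rate_ratio(0.0243)
per_year = rate_ratio(0.0243, units=12)  # the monthly factor compounded over 12 months

# Log-scale coefficients implied by the reported ratios for sex and baseline CD4+.
beta_female = math.log(1.1215)
beta_baseline = math.log(1.0034)
per_100_cells = rate_ratio(beta_baseline, units=100)  # 100-cell higher baseline (illustrative)

print(round(per_month, 4))
```

The compounding step matters for interpretation: a factor that looks small per month or per cell can correspond to a sizeable difference over a year of treatment or across the observed range of baseline counts.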

The dispersion parameter (1/α) has been estimated, in the final model, as 7.7009, and the Gamma (overdispersion) random effects are assumed to follow a Gamma distribution with unit mean and shape parameter α (0.130).
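The link between the two numbers quoted above can be checked directly: a Gamma random effect with unit mean and shape α has variance 1/α, and mixing a Poisson count with mean μ over such an effect gives variance μ + μ²/α. The sketch below assumes that standard parameterization, conditional on the random intercept; the function names and the example mean of 200 are hypothetical.

```python
def gamma_effect_variance(alpha):
    """Variance of a Gamma random effect with unit mean and shape alpha."""
    return 1.0 / alpha


def mixed_count_variance(mu, alpha):
    """Variance of a Poisson count with mean mu after mixing over the Gamma effect."""
    return mu + mu ** 2 / alpha


dispersion = 7.7009            # 1/alpha as reported in the text
alpha = 1.0 / dispersion       # shape parameter implied by the dispersion
print(round(alpha, 3))

# Illustrative comparison for a hypothetical conditional mean of 200 cells.
poisson_var = 200.0
mixed_var = mixed_count_variance(200.0, alpha)
```

Under these assumptions the mixed variance exceeds the Poisson variance by the term μ²/α, which is how the Gamma component absorbs overdispersion.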

TABLE 6 | Poisson-Gamma-Normal model.



TABLE 7 | Summary of information criteria of different models.

TABLE 8 | Poisson-Gamma-Normal model.


## 3.4. Discussion

The effects of demographic and clinical factors on the progression of CD4<sup>+</sup> cell counts over time of HIV patients taking ART treatment in Debre Markos Referral Hospital were assessed using Poisson longitudinal models since the response variable of interest CD4<sup>+</sup> cell count is a count variable.

The summary statistics showed that the IQR is high at all time points, which might indicate high variation among the patients' CD4<sup>+</sup> cell counts at baseline as well as at different time points after the initiation of ART. This variation might have been caused by the year in which the patients started ART, as the WHO's CD4<sup>+</sup> cell count cut-off points for initiating ART have differed over time. Although most of the patients included in our study started with lower CD4<sup>+</sup> cell counts (<200 cells/mm<sup>3</sup>), there were patients who had higher baseline CD4<sup>+</sup> cell counts (971 cells/mm<sup>3</sup>). Despite the continuous effort to initiate early, some patients still presented with lower CD4<sup>+</sup> cell counts, which might be due to patients' unwillingness to get tested (13, 14) or difficulties in providing treatment to all patients in lower-income countries, including Ethiopia. Hence, we believe that our result could be generalizable. The final model also indicated that the initial CD4<sup>+</sup> cell count (CD4<sup>+</sup> cell count at the start of treatment) significantly affects CD4<sup>+</sup> cell count progression. Therefore, based on our findings, we recommend that patients start treatment early, in line with the WHO's "treat all" recommendation.

The sign of the parameter estimate for WHO stage III is positive, which implies that a patient with WHO stage III has a higher CD4<sup>+</sup> cell count than a patient with WHO stage II. This might be because the number of patients with WHO stage III is much higher than (and not comparable to) the number of patients with WHO stage II. The relationship between CD4<sup>+</sup> cell count and WHO stage III might also be explained by the baseline CD4<sup>+</sup> cell count. Duration of treatment also has a positive effect on the CD4<sup>+</sup> cell count progression of HIV patients. This means that patients with a longer time on ART have better recovery of CD4<sup>+</sup> cell count than patients with a short duration on treatment.

## 4. CONCLUSION

An analysis of CD4<sup>+</sup> cell count data using conventional models such as linear mixed models might be inadequate, as the data were highly skewed and may not satisfy the (multivariate) normality assumption, as demonstrated in our data.

In this study, CD4<sup>+</sup> cell count data of 445 HIV patients under ART in Debre Markos Referral Hospital was analyzed using different longitudinal count models, and the Poisson-Gamma-Normal model was selected as the final model to fit the data based on different selection criteria. The Poisson-Gamma-Normal model handles overdispersion and correlation simultaneously.

The duration on ART treatment (time in months), sex of patients, and baseline CD4<sup>+</sup> cell count were all identified as potential risk factors of CD4<sup>+</sup> cell count progression. Having a good CD4<sup>+</sup> cell count at baseline had a positive impact on CD4<sup>+</sup> cell count evolution over time.

Although good CD4<sup>+</sup> cell count progress in response to ART was observed, most of the patients (78.0%) were at decreased CD4<sup>+</sup> cell counts (<200 cells/mm<sup>3</sup> ) when enrolled for ART treatment, which might have contributed to low CD4<sup>+</sup> count recovery in some patients.

## 5. LIMITATIONS AND RECOMMENDATION

In our study, we only considered patients from one hospital. The likelihood inference of the models considered in this study is valid under MCAR (missing completely at random). In the current study, we did not carry out a sensitivity analysis, and we only considered linear slope models, although a different linear slope for different time periods seems reasonable. Hence, we recommend that researchers consider sensitivity analysis and data obtained from different hospitals. The age and weight of patients might have a non-linear relationship with the CD4<sup>+</sup> cell count. We recommend that smoothing techniques such as splines be explored in further studies. The assumption of multivariate normality that is made by most statistical models used in longitudinal data analysis should be checked before analysis. Efficient methods like the ones used in this study could be considered if the assumption is violated.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

Before data collection, a letter of support written by the Statistics Department of Addis Ababa University was submitted to Debre Markos Hospital and permission to collect anonymized data was obtained. The data were extracted by trained data clerks in the ART Clinic and none of the researchers had access to the original cards of patients. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

## REFERENCES

## AUTHOR CONTRIBUTIONS

BAn conceived the idea, performed the data cleaning and analysis, interpreted the ensuing results, and drafted the manuscript. BAy supervised the study, contributed to the conception, and revised the manuscript. Both authors read and approved the final draft.

## ACKNOWLEDGMENTS

We acknowledge the ART case unit information center of Debre Markos Referral Hospital, Ethiopia for the data they supplied. The manuscript was prepared on the basis of my Masters' thesis.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2019.00415/full#supplementary-material

R codes used to fit the models are attached as supplemental data.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Andualem and Ayele. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Survival of Children Living With HIV on Art in Zambia: A 13-Years Retrospective Cohort Analysis

Tendai Munthali<sup>1,2\*</sup>, Charles Michelo<sup>1</sup>, Paul Mee<sup>3</sup> and Jim Todd<sup>4</sup>

*<sup>1</sup> School of Public Health, University of Zambia, Lusaka, Zambia, <sup>2</sup> Department of Public Health, Ministry of Health, Lusaka, Zambia, <sup>3</sup> MeSH Consortium, Department of Public Health Environments and Society, Faculty of Public Health and Policy, London School of Hygiene and Tropical Medicine, London, United Kingdom, <sup>4</sup> Department of Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom*

Background: Research conducted before the introduction of anti-retroviral therapy (ART), showed that the majority of children living with HIV (CLHIV) would die before their second birthday. In Zambia, ART was rolled out to the public health system in 2004 with subsequent improved survival in CLHIV. However, the survival rates of CLHIV on ART in Zambia since 2004 have not been extensively documented. We assessed survival experiences and the factors associated with survival in CLHIV on ART in Zambia.

### Edited by:

*Philip Ayieko, University of London, United Kingdom*

### Reviewed by:

*Samuel Manda, South African Medical Research Council, South Africa Lucas Malla, KEMRI Wellcome Trust Research Programme, Kenya*

> \*Correspondence: *Tendai Munthali munthalitendai@gmail.com*

### Specialty section:

*This article was submitted to Digital Public Health, a section of the journal Frontiers in Public Health*

Received: *10 May 2019* Accepted: *09 March 2020* Published: *31 March 2020*

### Citation:

*Munthali T, Michelo C, Mee P and Todd J (2020) Survival of Children Living With HIV on Art in Zambia: A 13-Years Retrospective Cohort Analysis. Front. Public Health 8:96. doi: 10.3389/fpubh.2020.00096*

Methods: We conducted a retrospective cohort analysis of CLHIV (aged up to 15 years) using routinely collected data from health facilities across Zambia, over 13 years to ascertain mortality rates. We explored survival factors using Cox regression giving adjusted hazard ratios (AHR) and 95% confidence intervals (95% CI). Nelson-Aalen estimates were used to show the cumulative hazards of mortality for different levels of explanatory factors.

Results: A total of 65,448 eligible children were initiated on ART between 2005 and 2018, of whom 33,483 (51%) were female. They contributed a total survival time of 275,715 person-years at risk, during which 3,265 children died, which translated into an incidence rate of 1.1 deaths per 100 person-years during the review period. Mortality rates were highest in children in the first year of life (mortality rate 2.24; 95% CI = 2.08–2.42) and during the first year on ART (mortality rate 3.82; 95% CI = 3.67–3.98). Over 50% of the children had been on ART for 5–10 years by 2018, and they had the lowest risk of mortality compared to children who had been on ART for <5 years.

Conclusions: Children with HIV in Zambia are surviving much longer than was predicted before ART was introduced 14 years ago. This key finding adds to the literature on analysis of survival in CLHIV in low income settings like Zambia. However, this survival is dependent on the age at which ART is initiated and the time on ART, highlighting the need to increase investments in early infant diagnosis (EID) to ensure timely HIV testing and ART initiation for CLHIV.

Keywords: survival, HIV-infected, children, Zambia, anti-retroviral treatment (ART), initiation

## INTRODUCTION

UNAIDS has a target to end all new HIV infections by 2030, but in 2018 there were still 1.7 million children living with HIV (CLHIV) (1). Worldwide, every year there are 110,000 deaths in CLHIV under the age of 15 years, and 180,000 new HIV infections from mother-to-child transmission (1, 2). In Zambia, in 2018 there were ∼62,000 children living with HIV, of whom 79% were on ART, and there were 3,000 AIDS-related deaths annually (3, 4). Without ART, more than 50% of children infected with HIV will die before their second birthday (5, 6). From its inception, ART has been shown to reduce mortality among adults and children living with HIV by more than 50% since the year 2000 (7–9). With the roll out of HIV treatment services, the initiation of CLHIV on ART in Zambia increased steadily from 24,000 in 2010 to 49,116 in 2018 (2, 3). This translated into a reduction in HIV-related mortality from 3,600 in 2016 to 3,000 in 2018 (3, 4).

However, studies show that mortality among CLHIV on ART continues to be high, especially among infants with advanced disease, and in the first 12 months following ART initiation despite the use of prevention of mother-to-child transmission of HIV (PMTCT) drugs (10, 11). Studies have shown that initiating children on ART early (birth to 7 weeks) reduced mortality by 76% compared to postponing ART until progression to advanced HIV disease or until CD4 count thresholds are reached (11, 12). However, other studies have shown that WHO clinical stage 3 or 4 and young age at the time of ART initiation were associated with a 5-fold increased risk of death, while absolute CD4 counts of <350 cells/mm<sup>3</sup> at ART initiation and underweight (weight-for-age Z-score <-2) were associated with 3.6- and 3.5-fold increases in mortality among CLHIV (5, 13–15). In addition, overall mortality rates among CLHIV on ART vary from country to country depending on several factors. In a 7-year study in South Africa, a mortality rate of 4.7 per 100 person-years was reported (16), whereas in Ethiopia a mortality rate of 1.2 per 100 person-years was reported in Addis Ababa and 2.1 per 100 person-years in Eastern Ethiopia after a 2-year follow up (14). Furthermore, a mortality rate of 3.0 per 100 person-years was reported in a 1-year follow up study in Nigeria (17).

Despite the varied mortality rates, CLHIV are surviving longer on ART. Studies in Asia and Africa have shown survival rates that ranged from 84 to 97% after 12 months of ART for children initiated on ART at a median age of 5–7 years. Another multi-country study in Africa also reported a survival rate above 90% (93%) after 2 years of follow up on ART (18). There are limited global data on the long-term survival of children on ART. In Zambia, the long-term survival of CLHIV and the clinical factors associated with patterns of survival have not been extensively documented. We therefore assessed the probability of survival and factors associated with mortality among CLHIV in Zambia.

## MATERIALS AND METHODS

## Study Population and Sample

This study was a quantitative retrospective cohort study of CLHIV with records in the Zambian SmartCare data. SmartCare is an electronic patient monitoring system which is used in about 600 government health facilities in all districts in Zambia (19). The system records patient characteristics with a unique identification number, at the first contact with the clinic, and then records clinical information at each subsequent health facility visit. A total sampling from the SmartCare records of all children with a positive HIV result, aged 15 years and younger at the time of ART initiation, regardless of missing data during the period under review, was conducted for this analysis.

## Clinical Procedures

In the Zambian health care system, all children with a positive HIV test result and diagnosis are eligible to receive HIV care according to national ART prevention and treatment guidelines. This includes documenting medical history, physical examination, anthropometric measurements, socio-demographic information, and WHO clinical stage. In addition, CD4+ T-cell counts or percentages, hemoglobin levels (HB), renal and liver function tests, and HIV viral loads are measured. Eligible children are then treated with a first-line regimen, either Nevirapine based (NVP), Efavirenz based (EFV), or Nucleoside reverse transcriptase inhibitors (NRTI). Children are then asked to return for clinical evaluation every 3 months (20, 21).

## Data Definitions and Statistical Methods

Follow up of children started at the date they initiated ART, and continued until death or the date of censoring. The event of interest, death, was coded as 1 for those who died and 0 for those who were censored. The end date was the date of death for those that died and the last clinic date for those who were lost to follow up, with all other children censored on 1st January 2018 or the date of their 15th birthday, whichever came first. The variable age at ART initiation was obtained by subtracting the child's date of birth from the date of ART initiation and was categorized into 5 age bands (birth to <1 year, 1–2, 3–5, 6–9, and 10–15 years). Time on ART was divided at four cut points: 1, 2, 5, and 10 years on ART. Children's weight was used to calculate the weight-for-age Z-scores (WAZ) for children in the dataset using a WHO standard (22).
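The follow-up and censoring rules above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function and argument names are hypothetical, age bands are assumed to use completed years, and a 29 February birthday is assumed to map to 1 March.

```python
from datetime import date

ADMIN_CENSOR = date(2018, 1, 1)  # administrative censoring date stated in the text


def fifteenth_birthday(dob):
    """Date of the 15th birthday; 29 February is mapped to 1 March (an assumption)."""
    try:
        return dob.replace(year=dob.year + 15)
    except ValueError:
        return date(dob.year + 15, 3, 1)


def follow_up(dob, art_start, death=None, last_visit_if_lost=None):
    """Return (end_date, event, years_at_risk); event is 1 for death, 0 for censoring."""
    if death is not None:
        end, event = death, 1
    elif last_visit_if_lost is not None:
        end, event = last_visit_if_lost, 0
    else:
        end, event = min(ADMIN_CENSOR, fifteenth_birthday(dob)), 0
    return end, event, (end - art_start).days / 365.25


def age_band(dob, art_start):
    """Age band at ART initiation, using completed years (an assumption)."""
    before_birthday = (art_start.month, art_start.day) < (dob.month, dob.day)
    age = art_start.year - dob.year - before_birthday
    if age < 1:
        return "<1"
    if age <= 2:
        return "1-2"
    if age <= 5:
        return "3-5"
    if age <= 9:
        return "6-9"
    return "10-15"


# Hypothetical child: born 10 May 2002, started ART 1 January 2010, still in care.
end, event, years = follow_up(date(2002, 5, 10), date(2010, 1, 1))
print(end, event, round(years, 2), age_band(date(2002, 5, 10), date(2010, 1, 1)))
```

In this hypothetical case the child turns 15 before 1 January 2018, so follow-up is censored at the 15th birthday rather than at the administrative end date.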

Categorical variables were summarized across all children as frequencies and percentages. Mortality rates were defined as the number of deaths divided by the person-years of follow-up, with 95% confidence intervals (95% CI) for mortality rates defined for each subgroup of children. Nelson-Aalen cumulative hazards were used to graphically show the differences in mortality by subgroups of children. A modified Cox regression model, adjusting for possible heterogeneity at the facility level, was used to obtain hazard ratios (HR) and 95% confidence intervals (95% CI) for factors associated with mortality, with subsequent adjustment for sex, age at ART initiation, and province to obtain adjusted HR (aHR) and 95% CI. In order to model the unobserved covariates at health facility level, an additional parameter, theta, drawn from a Gamma distribution, was estimated in the modified Cox regression model, representing the frailty or variation in the log of the survival function across health facilities.
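The mortality rate definition above (deaths divided by person-years) can be illustrated as follows. The source does not state how its confidence intervals were computed, so the log-scale normal approximation used here is an assumption, as is the function name. The inputs are the overall totals reported in the Results; the output is close to, though not identical with, the rounded overall rate quoted there.

```python
import math


def mortality_rate(deaths, person_years, per=100, z=1.96):
    """Rate per `per` person-years with an approximate 95% CI on the log scale."""
    rate = deaths / person_years * per
    factor = math.exp(z / math.sqrt(deaths))  # SE of log(rate) is about 1/sqrt(deaths)
    return rate, rate / factor, rate * factor


rate, lower, upper = mortality_rate(deaths=3265, person_years=275715)
print(f"{rate:.2f} per 100 person-years (95% CI {lower:.2f} to {upper:.2f})")
```

The same function can be applied within any subgroup (for example, an age band or a period on ART) by passing that subgroup's deaths and person-years.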

For socio-demographic variables, such as age, sex, year, health facility and province, a complete case analysis was conducted, using all the children, assuming the small amount of missing data were missing at random. The results present the crude analysis, the age-adjusted analysis and the frailty model that adjusts for the effect of clustering. For variables where a large number of children had missing data, a subset analysis on children with valid data was performed to assess the mortality risks for those variables.

TABLE 1 | Background characteristics of CLHIV on ART in Zambia from 2004 to 2018.

\**Tested using Cox proportional Hazards ratio.*

\*\**Adjusted for age at ART start, sex and province.*

## RESULTS

A total of 70,718 children were recorded as having been initiated on ART from 496 health facilities and 71 districts across Zambia during the period under review. Of these, 65,448 were included in the survival analysis: 33,483 (51%) were female and 31,965 (49%) were male.

The largest proportion of children analyzed were between 10 and 15 years and these accounted for about 25% of the sample while the smallest proportion was accounted for by children under 1 year of age at 10%. Mortality was also highest among children that initiated ART under 2 years, and lowest among children that initiated ART at older ages (10–15 years). Children that initiated ART during the period under review, were largely initiated on NRTIs (47%) and NVP based regimen (29%) (**Table 1**).

## Health Facility and Provincial Characteristics of CLHIV on ART

Out of the three levels of health facilities that were analyzed, hospitals and health centers accounted for the highest number of person-years of follow-up (140,270 and 133,904, respectively), with health posts having the lowest (402). Out of the 3,265 children that died, hospitals in the country accounted for about 1,542 deaths, or about 47% of deaths recorded during the reporting period. By province, facilities in Lusaka (812, 24%), Copperbelt (583, 18%) and Southern (467, 14%) provinces had the highest number of deaths amongst all the ten provinces, while facilities in Muchinga province had the lowest number of deaths with 81 deaths (2.5%). Children in Muchinga, Luapula and North-Western provinces also had the lowest chances of survival compared to children in Central province (North-Western, aHR 2.05, 95% CI = 1.69–2.4; Muchinga, aHR 1.95, 95% CI = 1.50–2.50; Luapula, aHR 1.60, 95% CI = 1.30–1.95) (**Table 2**).

## Survival of CLHIV on ART and Associated Factors

A total of 65,448 children were analyzed using proportional hazards survival analysis, with a total of 275,715 person-years at risk. Individual follow-up times ranged from 1 month to 13.5 years. Survival on ART was highest among children who initiated ART between 10 and 15 years of age, who had a 72% lower hazard of death than children initiating ART between birth and 1 year (aHR 0.28, 95% CI = 0.25–0.32). There was considerable variability in survival amongst the children initiating ART from the different provinces (**Table 2**). Factors associated with mortality included age when initiating ART and time on ART, although these two variables were correlated, as only children who initiated ART at younger ages could have a longer duration on ART. Compared to children who were diagnosed and initiated ART in the first year of life, children initiating ART between the first and the second years of life were 32% less likely to die (aHR 0.68, 95% CI = 0.61–0.75), with lower hazard ratios in each older age group (**Table 3**). Compared to WHO clinical stage 1, children with WHO clinical stages 3 (aHR 10.6, 95% CI = 9.70–11.74) and 4 (aHR 16.49, 95% CI = 14.56–18.67) had increased mortality (**Table 3**). The frailty coefficient was >0, showing that the survival rates differed by a factor of 2.18 across the health facilities.

\**Tested using Cox proportional Hazards ratio.*

\*\**Hazards adjusted for age at ART start and sex.*

\*\*\**Frailty model adjusting for clustering at facility level.*

*The variability of survival between the health facilities 2.18 (95% CI 1.86–2.63).*
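The percentage statements in this section follow directly from the hazard ratios: an aHR of 0.68 corresponds to a 32% lower hazard, and an aHR of 0.28 to a 72% lower hazard. The sketch below makes that arithmetic explicit and also recovers an approximate standard error on the log scale from a reported 95% CI. The function names are illustrative, and the standard error calculation assumes the interval is symmetric on the log scale.

```python
import math


def percent_change(hr):
    """Percent change in hazard relative to the reference group (negative = lower)."""
    return (hr - 1.0) * 100.0


def log_hr_se(lower, upper, z=1.96):
    """Approximate standard error of log(HR) recovered from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Hazard ratios quoted in the text.
print(round(percent_change(0.68)))   # ages 1-2 vs. first year of life
print(round(percent_change(0.28)))   # ages 10-15 vs. first year of life
se_1_to_2 = log_hr_se(0.61, 0.75)    # CI reported for the aHR of 0.68
```

Hazard ratios above 1 give positive percentages; for large ratios such as those for WHO stages 3 and 4, it is usually clearer to quote the ratio itself than a percentage.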

Three separate models were used, restricting the analysis to subsets of children with data for CD4 counts at initiation, WAZ and the first-line drug regimen used at ART initiation. All models were adjusted for sex, WHO clinical stage, and time on ART, and the estimates for these variables were similar to the full model, although the significance of the effects was greatly reduced (**Table 4**). Higher mortality was observed in those who were severely malnourished (aHR 4.51, 95% CI 2.57–7.92) compared to those who were not severely malnourished. Higher values of frailty for the model including malnourishment indicate the additional clustering of malnourishment in health facilities, increasing the variability of the survival rates by 3.22 across the health facilities. In the same model, those who initiated ART with a CD4 count of <500 cells/µl (aHR 1.76, 95% CI 1.22–2.54) had higher mortality than those who initiated ART with a CD4 count >500 cells/µl (**Table 4**).

## Estimation of Cumulative Hazards Curves of Death Among CLHIV on ART

Nelson-Aalen cumulative hazard estimates were analyzed for factors affecting survival in CLHIV on ART in Zambia, and cumulative hazard curves were derived. Factors analyzed included WHO staging, age at ART initiation, ART drug regimen and weight-for-age Z-scores. Overall, cumulative hazards increased rapidly between the first and third year of follow up across all factors that were estimated. The cumulative proportion of children who died on NVP-based drug regimens was 15% higher than that of children who were on NRTI- and EFV-based regimens in the first 5 years of follow up. Similar increases in mortality were also evident among children that initiated ART in WHO clinical stage 4, where steady increases of up to 30% were noticed for close to 10 years of follow up (**Figure 1**).
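The Nelson-Aalen estimator behind these curves sums, over the distinct death times, the number of deaths divided by the number of children still at risk. The sketch below is a small pure-Python illustration on made-up data; the function name and the five hypothetical follow-up records are not from the study.

```python
import math


def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct death time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    cum_hazard = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            cum_hazard += deaths / n_at_risk
            curve.append((t, cum_hazard))
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up times in years (illustrative, not study data).
curve = nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])

# A cumulative hazard H converts to an approximate cumulative proportion dying.
proportion_dead = [(t, 1 - math.exp(-h)) for t, h in curve]
```

Running the same function separately on each subgroup (for example, by WHO stage or drug regimen) gives the curves that are compared graphically.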

## DISCUSSION

Our study outlines survival of children on ART over a 13-year period (2004–2017) in Zambia. Studies in the past showed that without ART, more than 50% of CLHIV would die before their second birthday (5). For all children, mortality rates at ART initiation were high but lower than those reported in CLHIV prior to ART availability. Overall, the SmartCare database showed that 3,265 children on ART died during the 13 years, which gave an overall estimate of 1.1 deaths per 100 person-years.

Mortality rates in our study decreased with longer duration on ART. Mortality rates in children who had been on ART for 5 years or more were only slightly higher than those reported for HIV-negative children in Zambia (23). The overall mortality rate of 1.1 per 100 person-years reported in this study changed with duration on ART. In the first year on ART, the mortality rate was 12 per 100 person-years, which reduced to 8 per 100 person-years between the first and second year, 1.6 per 100 between the second and fifth year, and 0.62 after the fifth year. Our mortality rates over the first 2 years on ART were higher than those reported in a 2-year follow up study in Addis Ababa, Ethiopia (1.2 per 100 person-years), a 1-year follow up study in Makurdi, Nigeria (3.0 per 100 person-years), and a 2-year follow up study in Eastern Ethiopia (2.1 per 100 person-years). However, our overall mortality rate was lower than those reported in a 7-year South African study (4.7 per 100 person-years) and an eight-year study in Jos, Nigeria (14, 16, 17). The variation in mortality rates could be due to the larger sample size in our study compared to the other studies used in the comparison. Another reason could be the year in which each study was conducted, as the WHO guidelines in use at that time could have affected time to ART initiation, drug regimen and, in turn, mortality (10, 14).

Children that initiated ART in Lusaka, Copperbelt, and Southern provinces as well as children that initiated ART in hospitals and health centers had higher rates of mortality in our study. Male children on ART in our study had an increased risk of mortality, in contrast to findings by Zanoni et al. (15) in a South African study, where female children had a higher risk of mortality than their male counterparts. Nonetheless, our findings are in line with those reported in (16), where higher mortality was experienced among males (66.7%) than females (33.3%), as well as with a cohort of HIV-uninfected children in sub-Saharan Africa in which male children had 2% higher mortality than female children (24).

TABLE 4 | Multi level adjusted and unadjusted hazards ratios for factors associated with mortality among CLHIV on ART in Zambia clustered at facility level.

*All models adjusted for sex and WHO clinical stage clustering at health facility level.*

This study used a modified Cox regression to allow for differences in survival due to unobserved covariates at facility level. In these models, the estimated variance of the health facility frailty terms was significantly >0, indicating large variability in the survival rates across facilities. The coefficient of variation for mortality across health facilities showed that mortality rates were more than twice as high (coefficient of variation = 2.18) in some facilities than the average. This difference could be due to varying resources in different facilities, training and availability of staff, and the quality of services. This high inequality needs to be addressed by the Ministry of Health to ensure equitable services to reduce HIV mortality for all Zambian CLHIV.
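A common way to write a shared gamma frailty model of the kind described above is sketched below. The notation ($h_{ij}$, $z_j$, $\theta$, $\beta$) is an assumption for illustration and may differ in detail from the parameterization the authors used.

$$
h_{ij}(t \mid z_j) = z_j \, h_0(t) \, \exp\left(x_{ij}^{\top} \beta\right), \qquad z_j \sim \mathrm{Gamma}\left(\text{shape} = \tfrac{1}{\theta},\ \text{rate} = \tfrac{1}{\theta}\right), \qquad \mathrm{E}[z_j] = 1, \quad \mathrm{Var}(z_j) = \theta
$$

Here $h_{ij}(t \mid z_j)$ is the hazard for child $i$ in facility $j$, $h_0(t)$ is the baseline hazard, $x_{ij}$ holds the child's covariates, and $z_j$ is the facility-level frailty shared by all children in that facility. When $\theta = 0$ every $z_j$ equals 1 and the model reduces to the ordinary Cox model, so a variance estimate clearly above zero points to between-facility heterogeneity.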

We also found that the risk of mortality was highest among infants and younger children initiated on ART, which is also the period of highest risk of death for HIV-negative children. The risk of death reduced with increasing age. This could be due to non-adherence, sub-optimal doses or exposure to perinatal antiretroviral therapy, which may lead to rapid disease progression and later death in infants and younger children (25). Another reason could be that diagnosing HIV infection in infants is intricate, and this can delay ART initiation in this age group (26). However, the proportion of children dying in this study did not exceed 50% of children analyzed in any age group, showing improvements in survival compared to studies by Mulugeta et al. (5) and Tariro (6). Our study was in line with studies in Asia and Africa that showed survival rates above 50% (84% to 97%) after 12 months of ART for children initiated on ART at a median age of 5–7 years, as well as another multi-country study in Africa that reported a survival rate above 90% (93%) after 2 years of follow up on ART (18).

We found that late initiation onto ART after HIV testing was predictive of reduced survival. Studies have shown that late presentation for early infant diagnosis (EID) services leads to poor prognosis (27, 28). This is also evidenced by the findings of Bolton-Moore et al. (29) in earlier studies in Zambia. Our findings showed high mortality rates among children that initiated ART between 1 week and 1 month after HIV diagnosis. Current WHO guidelines call for same-day ART initiation after HIV diagnosis, which gives children the greatest benefit in survival (30). Our findings are in line with those of a study done in South Africa that reported that 64% of deaths occurred within 3 months of ART initiation (15); a Nigerian study, where 81.3% and 84.4% of deaths occurred in the first 6 months and within 1 year, respectively (16); and an Ethiopian study that reported a high number of deaths within the first six months of starting ART (31, 32). This pattern of increased mortality could suggest that HIV-infected children presented with poor immunological or clinical states at ART initiation, or that CLHIV require some time before the benefits of ART are fully realized, especially when ART is initiated early (13, 18, 33). However, our study showed reducing mortality as time to ART initiation increased beyond 1 month. This is similar to findings in a 5-year cohort study in Thailand, where children that initiated ART in infancy had higher risks of mortality compared to children that initiated ART at older ages (18). The cohorts of children initiating ART at older ages probably reflect those who survived with HIV, and therefore reflect a "survivor effect" of the older children (18).

The SmartCare database showed that 3,265 children on ART died over the 13 years, but some deaths could have been missed and not reported to the health facility staff. In the SmartCare data, CD4 counts, weight and ART regimen were not routinely recorded for every child but may only have been recorded when triggered by a clinical event. We did not attempt to impute these data but instead carried out subgroup analyses in those children for whom the data were recorded, to assess whether risk factors differed in the subgroups. The measures of advanced HIV disease (low CD4 count, severe acute malnutrition, and WHO clinical stages 3 and 4) were all associated with reduced survival, as seen in our results (13, 15). These results are consistent with an earlier study in Zambia which showed that mortality was higher in children with lower WAZ scores (29).

This study was not without limitations. The main purpose of the SmartCare data is for clinical care, treatment and follow-up of CLHIV in health facilities where staff are overburdened. In these busy health care facilities, vital information such as laboratory results and weight was not collected at every visit, resulting in a large amount of missing data, which meant these factors could not be included in our main analysis. The results for these variables should be interpreted with caution, as they did not represent all children. However, the consistency of the demographic risk factors across the subgroup analyses is encouraging. The SmartCare database may have missed some deaths when they did not occur at the health facility, as these might not have been reported or entered into the database, which could lead to underestimation of mortality in children on ART. Moreover, loss to follow-up is common among children initiated on ART in Zambia and has been documented in earlier studies (20, 34).

## CONCLUSIONS

Children with HIV in Zambia are surviving much longer than was predicted before ART was introduced 14 years ago. This key finding adds to the literature on analysis of survival in CLHIV in low income settings like Zambia. However, survival was dependent on the age at which ART was initiated, the time on ART, type of health care facility used at ART initiation and the province where services were accessed. This highlights the need to increase investments in early infant diagnosis (EID) and make services more equitable across Zambia to ensure timely HIV testing and ART initiation for CLHIV.

## DATA AVAILABILITY STATEMENT

The Zambian Ministry of Health has the sole authority over the dataset that was used in this study, thus, it cannot be shared online. Any further information or interest to use the data should be addressed to the Zambian Ministry of Health (www.moh.gov.zm).

## AUTHOR CONTRIBUTIONS

TM, JT, and CM participated in the conception of the study, co-ordination, acquisition of data and drafted the manuscript. TM and JT carried out the statistical analysis. CM, PM, JT, and TM reviewed all the content related to interpretation of the findings and participated in the critical review and editing of the manuscript drafts for scientific merit and depth. All authors read and approved the final manuscript.

## FUNDING

This study was supported by the SEARCH (Sustainable Evaluation through Analysis of Routinely Collected HIV data) Project. We acknowledge funding by the Bill & Melinda Gates Foundation grant [OPP1084472] entitled 'Using routinely collected public facility data for program improvement in Tanzania, Malawi and Zambia'.

## ACKNOWLEDGMENTS

We would like to thank the Ministry of Health for granting access to the data that was analyzed. We would also like to thank the Bill & Melinda Gates Foundation for providing funding that made this study possible.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor declared a shared affiliation, though no other collaboration, with several of the authors JT and PM within the last 2 years, at time of review.

Copyright © 2020 Munthali, Michelo, Mee and Todd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Misreporting of Patient Outcomes in the South African National HIV Treatment Database: Consequences for Programme Planning, Monitoring, and Evaluation

David Etoori<sup>1</sup>\*, Alison Wringe<sup>1</sup>, Chodziwadziwa Whiteson Kabudula<sup>2</sup>, Jenny Renju<sup>1,3</sup>, Brian Rice<sup>4</sup>, F. Xavier Gomez-Olive<sup>2</sup> and Georges Reniers<sup>1,2</sup>

*<sup>1</sup> Department of Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom, <sup>2</sup> MRC/WITS Rural Public Health Transitions Research Unit (Agincourt), Faculty of Health Sciences, School of Public Health, University of Witwatersrand, Johannesburg, South Africa, <sup>3</sup> Kilimanjaro Christian Medical University College, Moshi, Tanzania, <sup>4</sup> MeSH Consortium, Department of Public Health Environments and Society, Faculty of Public Health and Policy, London School of Hygiene and Tropical Medicine, London, United Kingdom*

### Edited by:

*Marc Jean Struelens, European Centre for Disease Prevention and Control (ECDC), Sweden*

### Reviewed by:

*Philip Ayieko, University of London, United Kingdom Christen Ann Said, University of California, San Francisco, United States Deborah K. Glencross, University of the Witwatersrand, South Africa*

> \*Correspondence: *David Etoori david.etoori@lshtm.ac.uk*

### Specialty section:

*This article was submitted to Infectious Diseases – Surveillance, Prevention and Treatment, a section of the journal Frontiers in Public Health*

> Received: *25 April 2019* Accepted: *12 March 2020* Published: *07 April 2020*

### Citation:

*Etoori D, Wringe A, Kabudula CW, Renju J, Rice B, Gomez-Olive FX and Reniers G (2020) Misreporting of Patient Outcomes in the South African National HIV Treatment Database: Consequences for Programme Planning, Monitoring, and Evaluation. Front. Public Health 8:100. doi: 10.3389/fpubh.2020.00100*

Background: Monitoring progress toward global treatment targets using HIV programme data in sub-Saharan Africa has proved challenging. Constraints in routine data collection and reporting can lead to biased estimates of treatment outcomes. In 2010, South Africa introduced an electronic patient monitoring system for HIV patient visits, TIER.Net. We compare treatment status and outcomes recorded in TIER.Net to outcomes ascertained through detailed record review and tracing in order to assess discrepancies and biases in retention and mortality rates.

Methods: The Agincourt Health and Demographic Surveillance System (HDSS) in north-eastern South Africa is served by eight public primary healthcare facilities. Since 2014, HIV patient visits have been logged electronically at these clinics, with patient records individually linked to their HDSS record. These data were used to generate a list of patients >90 days late for their last scheduled clinic visit and deemed lost to follow-up (LTFU). Patient outcomes were ascertained through a review of the TIER.Net database, physical patient files, registers kept by two non-government organizations that assist with patient tracing, cross-referencing with the HDSS records and supplementary physical tracing. Descriptive statistics were used to compare patient outcomes reported in TIER.Net to the outcomes ascertained in the study.

Results: Of 1,074 patients that were eligible for this analysis, TIER.Net classified 533 (49.6%) as LTFU, 80 (7.4%) as deceased, and 186 (17.3%) as transferred out. TIER.Net misclassified 36% of patient outcomes, overestimating LTFU and underestimating mortality and transfers out. TIER.Net missed 40% of deaths and 43% of transfers out. Patients categorized as LTFU in TIER.Net were more likely to be misclassified than patients classified as deceased or transferred out.

Discussion: Misclassification of patient outcomes in TIER.Net has consequences for programme forecasting, monitoring and evaluation. Undocumented transfers accounted for the majority of misclassification, suggesting that the transfer process between clinics should be improved for more accurate reporting of patient outcomes. Processes that lead to correct classification of patient status including patient tracing should be strengthened. Clinics could cross-check all available data sources before classifying patients as LTFU. Programme evaluators and modelers could consider using correction factors to improve estimates of outcomes from TIER.Net.

Keywords: HIV, retention in care, bias, South Africa, health information systems

## INTRODUCTION

At the end of 2017, it was estimated that there were 36.9 million people living with HIV (PLHIV) worldwide, with 70% of the disease burden situated in sub-Saharan Africa (1, 2). The World Health Organization's (WHO) revised HIV treatment guidelines in 2015 call for immediate provision of lifelong antiretroviral therapy (ART) to all people testing positive for HIV. By the end of 2017, 60% of PLHIV in sub-Saharan Africa were on ART (1, 2). Whilst ART initiation rates have been increasing over time, in order to reduce HIV transmission rates and achieve 90-90-90 AIDS elimination goals, there is a need for accelerated increases in treatment adherence and retention in care (3–5). South Africa has the largest population of PLHIV worldwide, with an estimated 18.8% of the adult population aged 15–49 years old living with HIV, representing 7.2 million people (1, 2). By the end of 2018, an estimated 68% of PLHIV in South Africa were on ART (1, 2).

The rapid growth in access to ART has accentuated the need for an affordable and accurate way to monitor and evaluate treatment programmes (6–8), including documenting the number of people alive and on treatment, and programme impact on mortality. In the past, the progress of patients on ART was mainly monitored through patient cohorts (9) and tallying numbers of services rendered to inform resource allocation (8). However, evaluation of HIV programmes has proved challenging due to multiple data constraints. These include concerns about data reliability (8), and continued use of paper registers which often lack unique identifiers, suffer from incompleteness (10), and are cumbersome to use with increasing patient numbers and length of patient follow-up (11, 12). Another major concern is "silent transfers" whereby patients change clinics informally and without accompanying documentation, a phenomenon which has become more prevalent with the expansion of ART programmes (13, 14). As a result, there is concern that many high-burden countries are ill-equipped to report on the outcomes of patients in care and on treatment (6, 7, 15–17).

In order to address these concerns, many countries are scaling-up the use of electronic patient registers (11). However, challenges persist including insufficient linkages between clinics (10), insufficient training of staff who are responsible for entering this information (10), and staff shortages (18–21), resulting in some staff responsible for data management being stretched across multiple roles (22). This sometimes leads to poor workflow, and staff resistance which results in poor change management. Privacy and security issues (22) are also a major concern.

In 2010, South Africa adopted TIER.Net, a three-tiered monitoring approach involving paper registers (TIER 1, recommended for facilities with <500 patients), an offline electronic register (TIER 2, recommended for facilities with 500–2,000 patients) and networked electronic medical records (TIER 3, recommended for facilities with more than 2,000 patients) (11). This allowed for different tiers to be implemented in each facility based on the context and resources available at the time of implementation and typically involved a phased evolution, beginning with preparation for TIER.Net, installation and training, back capturing, live capturing and finally a live site able to produce monthly and quarterly reports with staff on-site to manage it. In 2014, an estimated 3,000 out of 4,000 public sector clinics in South Africa were using TIER.Net (23, 24) in one of the three phases of implementation. As of 2017, TIER 3 was still in its pilot phase (25).

ART patient outcomes have evolved since the start of national HIV treatment programmes. In several cohort studies of ART programmes in sub-Saharan Africa, there have been reports of higher rates of LTFU among patients who initiated ART in later years compared to earlier years (26–28). This may be explained by patients increasingly initiating treatment while less severely ill (29), as well as by negative consequences of increasing patient numbers, such as higher facility workloads (30), raising concerns about the sustainability of these programmes. Some systematic reviews have shown that the percentage of patients LTFU who have died has decreased in later years as eligibility criteria have evolved to include less immunologically compromised patients, and as the proportion of patients LTFU has increased (13, 31). Furthermore, scale-up and decentralization of these programmes means ART can be offered at clinics closer to patients' homes, which may serve as an incentive to self-transfer in order to continue treatment at more convenient locations (13, 32).

Unpublished TIER.Net analyses from 2018 showed LTFU rates to range from 11 to 15% in the first three months and from 27 to 34% in the first year of ART (Y. Pillay, HIV Think-tank update, March 19, 2019). The high percentages of LTFU present many issues. Firstly, if these patients have really stopped ART, then they have a higher mortality risk (33–35) and are more likely to transmit HIV (36–39). Given that patients who are LTFU have poorer outcomes, LTFU can also, through misclassification, bias event rates such as mortality downwards (40), leading to biased performance indicators for ART programmes. Accurate mortality rates are also important as they are used as parameters for projections such as in the UNAIDS Spectrum package (41, 42).

We compare patient outcomes recorded in TIER.Net to the outcomes ascertained through a record review and tracing study for patients deemed lost to follow-up in eight public sector health facilities in rural north-eastern South Africa. We aimed to assess misreporting in TIER.Net and potential biases in the national programme statistics reported from the TIER.Net database.

## METHODS

### Setting

The Agincourt Health and Demographic Surveillance System (Agincourt HDSS) is located in rural north-eastern South Africa, in Mpumalanga province, which has the second highest prevalence of HIV at 14.1% (43). HIV prevalence among people 15 years of age or older in the HDSS was estimated at 19.4% in 2010 (44). The Agincourt HDSS comprises 31 villages covering an area of 475 square kilometers with an estimated population of 115,000 people (45, 46).

There are five primary health facilities and three secondary community health centers located within the Agincourt HDSS. Every HIV-positive patient has a clinical file that is opened when they first register at an ART clinic and updated at each clinic visit. Following the clinic visit, visit-level information from the patient file is entered into the national electronic database, TIER.Net. All health facilities routinely trace patients that are late for a scheduled clinic appointment. This tracing is done in conjunction with two non-profit organizations, Right to Care (RtC) and Home-Based Carers (HBC). Clinic staff must contact all patients first by phone and if this does not yield a satisfactory outcome, a home visit is organized. Patients are classified as lost to follow-up (LTFU) if they have not returned 90 days after their scheduled visit.

### Demographic Surveillance

Data collection aims to capture all demographic events for the Agincourt HDSS population. Fertility, mortality and migration data are based on a comprehensive household registration system that has been in operation since 1992. Following the baseline demographic surveillance survey in 1992 and three update rounds until 1998, the site has conducted annual surveys since 1999 (45, 47–49). Trained fieldworkers visit each household and interview the most knowledgeable adult available. During the visit, individual-level information on all household members is checked and updated and any events that have occurred since the last census round are recorded. Starting in 2017, data have been collected utilizing an electronic data collection system using tablets (50).

### Point-of-Contact Interactive Record Linkage (PIRL)

A key element of the data infrastructure for this study consists of HIV patient visit logs collected by a study fieldworker in the health facilities that provide ART in the area. This work started in April 2014 at seven government facilities and was extended in 2016 to include one additional health facility. In addition to logging patient visits, these records are linked to the Agincourt HDSS using a procedure that we have previously described as Point-of-Contact Interactive Record Linkage (PIRL) (51, 52). In brief, a fieldworker conducts a short uptake interview with patients in the waiting area of the clinic. Patients who consent are asked to declare a few personal identifiers that are used to search a local copy of the Agincourt HDSS database using a probabilistic algorithm. Matches are confirmed in interaction with the patients, and the names of other household members are used as a key attribute to adjudicate between possible matches.
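The text does not specify which probabilistic algorithm PIRL uses. As an illustration only, the sketch below scores candidate HDSS records with Fellegi-Sunter style agreement weights, a common approach to probabilistic linkage. The field names and the `m`/`u` probabilities are hypothetical assumptions, not values from the study.

```python
import math

# Hypothetical per-field probabilities (assumptions, not taken from the study):
# m = P(field agrees | records refer to the same person)
# u = P(field agrees | records refer to different people)
FIELD_PARAMS = {
    "first_name": (0.95, 0.02),
    "surname": (0.95, 0.05),
    "birth_year": (0.90, 0.03),
    "village": (0.85, 0.10),
}


def match_weight(patient, candidate):
    """Sum log2 likelihood-ratio weights over the compared identifiers."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if patient.get(field) == candidate.get(field):
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def rank_candidates(patient, candidates):
    """Return candidates sorted from most to least likely match."""
    return sorted(candidates, key=lambda c: match_weight(patient, c), reverse=True)
```

In an interactive setting such as the one described, a ranked list like this would only propose candidates; the final decision would still rest on confirmation with the patient.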

### Record Review and Tracing Study

Through the PIRL database, we identified patients who were more than 90 days late for a scheduled HIV clinic appointment as of August 15, 2017 (the date of data extraction) at any of the eight health facilities located in the Agincourt HDSS area. These patients were recruited into a cohort and followed up to ascertain their treatment and vital status, i.e., whether they were still alive.
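The 90-day rule can be expressed as a simple date comparison. The following is a minimal sketch, assuming each patient record carries the date of the last scheduled appointment; the function and constant names are hypothetical and do not come from the PIRL or TIER.Net software.

```python
from datetime import date

EXTRACTION_DATE = date(2017, 8, 15)  # date of data extraction reported in the text
LTFU_THRESHOLD_DAYS = 90             # more than 90 days late counts as LTFU


def days_late(next_scheduled_visit, as_of=EXTRACTION_DATE):
    """Days elapsed since the scheduled appointment (negative if not yet due)."""
    return (as_of - next_scheduled_visit).days


def is_ltfu(next_scheduled_visit, as_of=EXTRACTION_DATE):
    """True when the patient is more than 90 days late for the scheduled visit."""
    return days_late(next_scheduled_visit, as_of) > LTFU_THRESHOLD_DAYS
```

Under this sketch a patient who is exactly 90 days late is not yet counted as LTFU, which matches the "more than 90 days" wording used here.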

All PLHIV aged 18 years or over, who had ever declared residency in Agincourt HDSS, and had enrolled in the HIV treatment and care programme since 2014 (after the Agincourt HDSS record linkage was established at the health facilities) were eligible for inclusion in the study.

Trained and supervised fieldworkers conducted a thorough record review, comparing the list of patients LTFU against (a) TIER.Net, (b) patient clinic files, and (c) logbooks kept by the RtC and the HBC. The PIRL database was also reviewed for duplicate patients (different clinical records linking to the same individual in the Agincourt HDSS database, which was taken as evidence of silent transfers), and residency and vital status were extracted from the Agincourt HDSS database. This was done on a case-by-case basis.

HBC conducted a visit to the households of all patients for whom a definitive outcome (defined as death, transfer out, stopped ART, migrated, re-engaged in care, and alive with ART status unknown) could not be established, or for whom routine patient tracing was not done. Finally, all patients who remained LTFU after the HBC visit were searched for in TIER.Net databases at clinics in close proximity to their home residence to capture any further silent transfers.

We also reviewed the records for a stratified random sample of 162 patients who were not LTFU as of August 15, 2017, in order to assess whether TIER.Net misclassified any patients that were still in care. This sample was chosen to include 18 patients from every clinic (six men, six non-pregnant women, and six women who initiated ART while pregnant) with the exception of one clinic which had recently merged with another, and from which we sampled 18 patients who had enrolled whilst in each of the clinics prior to the merger.

### Definitions

Definitions of terms used in this article are provided in **Table 1**.

### Statistical Analyses

For patients included in the record review and tracing study, we calculated counts and proportions for socio-demographic and baseline clinical characteristics, TIER.Net treatment status, the final outcome, and cross-tabulated TIER.Net status and the final outcome.
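A cross-tabulation of this kind can be sketched with the standard library alone. The records below are hypothetical placeholders used only to show the mechanics; they are not study data, and the authors report using Stata rather than Python.

```python
from collections import Counter

# Hypothetical illustrative records: (TIER.Net status, final outcome after tracing).
records = [
    ("LTFU", "transferred"),
    ("LTFU", "LTFU"),
    ("LTFU", "deceased"),
    ("deceased", "deceased"),
    ("transferred", "transferred"),
    ("still in care", "re-engaged"),
]


def cross_tabulate(pairs):
    """Count how often each (recorded status, final outcome) pair occurs."""
    return Counter(pairs)


def row_proportions(table):
    """Convert counts to proportions within each recorded status (row)."""
    row_totals = Counter()
    for (recorded, _), n in table.items():
        row_totals[recorded] += n
    return {
        (recorded, final): n / row_totals[recorded]
        for (recorded, final), n in table.items()
    }


table = cross_tabulate(records)
proportions = row_proportions(table)
```

Row proportions answer the question "of the patients TIER.Net placed in this category, where did they actually end up?", which is the comparison the cross-tabulation is meant to support.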

To assess the degree and direction of mis-reporting of patient outcomes in TIER.Net, we graphically present TIER.Net treatment status and the final outcome proportions by selected patient characteristics. A Pearson's chi-square test was used to assess whether TIER.Net treatment status and the final outcome varied by each of the categorical variables. We also present a cross-tabulation of patient outcomes from the two sources.

### TABLE 1 | Definitions of terms used.

A binary outcome variable was created to indicate whether TIER.Net had misclassified a patient's treatment status, with a second outcome created to identify whether the patient was recorded as LTFU in TIER.Net. All cases where an electronic record could not be found were removed from further analysis.

To explore factors associated with misclassification in TIER.Net, we ran bi-variate analyses with patient-level treatment characteristics, demographic characteristics and facility-level characteristics. All variables with p < 0.1 were included in the multivariable logistic regression model. A parsimonious model was achieved using Wald tests. This same procedure was followed in order to understand what factors were associated with being reported LTFU in TIER.Net. All analyses were conducted in Stata 15 (53) and all data visualization was done using R (54).

### Ethics

Ethical approval was obtained from the London School of Hygiene and Tropical Medicine, the University of Witwatersrand and the Mpumalanga Department of Health.

## RESULTS

### Database Population Characteristics

Over the study period, 4,089 patients were added to the PIRL database and met the inclusion criteria. Of these 4,089, 1,325 (32.4%) met the LTFU criteria and were eligible for inclusion into the record review and tracing study. Of these 1,325 patients, 166 (12.5%) did not have an ART initiation date and were assumed to be pre-ART. Further investigation of these 166 patients found 46 (27.7%) had initiated ART after record linkage, 59 (35.5%) were genuine pre-ART patients and 61 (36.7%) had initiated ART before record linkage began. These 61 patients were excluded from further analyses. Of the remaining 1,264 patients, 190 (15.0%) were found to have data errors (mostly due to missing clinic visits in the PIRL database) and were excluded from further analysis (**Supplementary Figure 1**).

Of 1,074 remaining patients, 280 (26.1%) initiated ART for prevention of mother-to-child transmission (PMTCT), 737 (68.6%) met the ART initiation criteria for non-pregnant adults, and 57 (5.3%) had not initiated ART yet (pre-ART).

Thirteen (8.0%) of the 162 patients still in care were excluded from the analysis because they had not declared residency in the HDSS. The remaining 149 from the random sample of patients still in care were also assessed to see if misclassification also occurred for those who remained engaged in care (**Table 2**).

### TIER.Net Treatment Status

Of the 1,074 patients who remained eligible for this analysis, 533 (49.6%) were categorized as LTFU, 222 (20.7%) as still in care, 186 (17.3%) as transferred out, 80 (7.4%) as deceased, and 53 (4.9%) could not be found in the TIER.Net database (**Table 2**).

There was a statistically significant difference (all p < 0.001) in the TIER.Net treatment status by sex, age, ART initiation status and reason, year of ART initiation, baseline CD4 count, time on ART, clinic visit schedule, health facility, and time since a missed appointment (see **Supplementary Figure 2**). Women who initiated ART for PMTCT were less likely to be categorized as deceased and more likely to be LTFU. All 149 patients sampled as still in care were also reported as still in care in TIER.Net.

### Outcomes After Record Review and Tracing Study

Of the 1,074 patients who remained eligible for this analysis, 326 (30.3%) were found to have transferred to another clinic, 234 (21.8%) to have re-engaged in care, 132 (12.3%) were deceased, 117 (10.9%) were alive with ART status unknown, 81 (7.5%) were alive but not on treatment, 53 (4.9%) had migrated to another place of residence, and 131 (12.2%) were still LTFU (**Table 3**). These outcomes differed (all p < 0.001) by sex, age, ART initiation status and reason, baseline CD4 count, time on ART, clinic visit schedule, health facility, whether the patient record was successfully linked to an Agincourt HDSS record, and time since a missed appointment (some selected variables illustrated in **Figure 1**).

### TABLE 2 | Database population characteristics.

*Not in sample: All patients eligible for the study but not LTFU in the PIRL database and not included in the still in care sample; Sample: All patients included in the study (149 still in care, 1,264 LTFU = 1,074 really LTFU + 190 data errors); Data error: Patients included as LTFU but found to be still in care and <90 days late for their last appointment; For ART start year, data from 2017 reflect the number of ART initiations up to mid-August, when data extraction occurred.*

### TABLE 3 | A cross-tabulation of TIER.Net treatment status and the final outcome.

### Differences Between TIER.Net Treatment Status and Final Outcomes

Records of deceased or transferred out patients documented in TIER.Net aligned with patients' final outcome (i.e., no inaccuracies found for these two statuses). However, TIER.Net misclassified 52 (39.4%) of 132 deaths. Of these 52, 38 (73.1%) were classified as LTFU, 6 (11.5%) as still in care, and 8 (15.4%) were not found in the system at all.

TIER.Net also misclassified 53 patients as still in care. Of these, 10 (18.9%) were found to be LTFU, 16 (30.2%) to have transferred, 12 (22.6%) as alive with unknown ART status, 8 (15.1%) alive but not on treatment, 6 (11.3%) to have died, and 1 (1.9%) to have migrated to another place of residence. TIER.Net correctly captured 186 (57.1%) of 326 transfers.
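The headline percentages for missed deaths and transfers follow directly from the counts reported above. The short check below recomputes them. It relies on the statement in the text that every death and transfer recorded in TIER.Net was confirmed, so the missed events are simply the difference between ascertained and recorded counts. Variable names are illustrative.

```python
# Counts as reported in the text.
deaths_final = 132        # deaths ascertained by record review and tracing
deaths_tiernet = 80       # deaths recorded in TIER.Net
transfers_final = 326     # transfers ascertained by record review and tracing
transfers_tiernet = 186   # transfers recorded in TIER.Net


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


deaths_missed = deaths_final - deaths_tiernet            # 52
transfers_missed = transfers_final - transfers_tiernet   # 140

pct_deaths_missed = pct(deaths_missed, deaths_final)
pct_transfers_captured = pct(transfers_tiernet, transfers_final)
pct_transfers_missed = pct(transfers_missed, transfers_final)
```

These values correspond to the 39.4% of deaths misclassified and the 57.1% of transfers captured reported here, and to the roughly 40% and 43% quoted in the abstract.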

Of 533 patients classified as LTFU by TIER.Net, 116 (21.8%) were found to have transferred to another clinic, 70 (13.1%) to be alive but not on treatment, 47 (8.8%) to have migrated to another place of residence, 38 (7.1%) to have died, and 56 (10.5%) to have re-engaged in care (38 of whom were resolved by new visit data in the PIRL database and so it is possible that their TIER.Net status could have also changed back to still in care) (**Figure 2** and **Supplementary Figure 3**).

As patients classified as LTFU in TIER.Net were more likely to be misclassified, we report on the factors associated with being classified as LTFU in TIER.Net (**Supplementary Data** and **Supplementary Table 1**).

### Factors Associated With Misclassification

In the multivariable model (**Table 4**), men (OR: 1.47, p = 0.021) had higher odds of misclassification when compared to women who initiated ART for non-PMTCT reasons (CD4, WHO stage, tuberculosis coinfection). Higher baseline CD4 (CD4 100–199 OR: 1.95, p = 0.002; CD4 ≥ 500 OR: 1.81, p = 0.014) was also associated with higher odds of misclassification when compared to patients who initiated treatment with CD4 < 100. Health facility also remained statistically significant, suggesting that facility-level variability plays a role in misclassification. Patients who were linked to an Agincourt HDSS record in the PIRL database (OR: 2.09, p < 0.001) were more likely to be misclassified. Finally, patients who were between 1 and 2 years late (OR: 1.62, p = 0.001) were more likely to be misclassified. Older age (30–44 years OR: 0.73, p = 0.046; 45–59 years OR: 0.63, p = 0.046) was associated with lower odds of misclassification, and patients on longer refill schedules (>3 months OR: 0.31, p = 0.009) were less likely to be misclassified.

## DISCUSSION

In this paper, we described the discrepancies between the treatment, vital and residency status of HIV patients enrolled in care between April 2014 and August 2017 in a rural South African setting as recorded in the national treatment database (TIER.Net) and their treatment outcome following a comprehensive record review and tracing study.

We found that TIER.Net misclassified 36% of the patient outcomes. ART initiation reason, baseline CD4, health facility attended, PIRL linkage, time since the last appointment, age, and ART refill schedule were all found to be significantly associated with misclassification. TIER.Net underestimated mortality and overestimated the number of patients who were LTFU. Seventy-nine percent of patients classified as LTFU in TIER.Net had a final outcome ascertained, mirroring findings from a systematic review of low and middle income country ART programmes which found that tracing generated higher estimates of mortality and lower estimates of LTFU (55). Our findings show that LTFU is still an important problem in ART programmes in this setting, even with routine patient tracing in place. TIER.Net also missed 43% of transfers, with these silent transfers being the biggest contributor to misclassification among those documented as LTFU. We also found that 21.8% of patients had re-engaged in care, a phenomenon that was previously not well understood, but which is now increasingly recognized as a common feature of ART programmes (55). Using our findings to revise LTFU figures to reflect re-engagement may help improve programme evaluation and forecasting.

In our study, we found that 40% of deaths were missed by TIER.Net, indicating that mortality of ART patients would be underestimated if relying on this data source. Given the role that national statistics play in HIV/AIDS projections (41, 42, 56), our findings suggest a need for correction factors for the estimates of the effect of ART on mortality. Although South Africa has a good vital registration system in place (57), these data are not currently linked to clinic-based information. However, with the move to registering patient national ID numbers, clinics should consider matching patients that are LTFU to the national death registry and other available databases such as the national health laboratory services database to ascertain vital status as this has proved useful in other studies in South Africa (58–60). Clinics in the Agincourt HDSS study area and other HDSS sites could also consider using vital status data from annual demographic surveillance to ascertain vital status for all patients.
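The authors do not specify how a correction factor would be constructed. One simple possibility, shown here purely as an assumption, is a multiplicative ratio of ascertained to recorded events taken from this study. Applying it elsewhere presumes that misclassification in the other setting resembles that in the Agincourt clinics, which the text does not claim. Function names are hypothetical.

```python
def correction_factor(events_ascertained, events_recorded):
    """Ratio by which recorded events must be scaled to match traced outcomes."""
    if events_recorded <= 0:
        raise ValueError("events_recorded must be positive")
    return events_ascertained / events_recorded


def corrected_count(recorded_in_new_data, factor):
    """Apply the factor to a recorded count from a comparable setting."""
    return recorded_in_new_data * factor


# Study figures from the text: 132 deaths ascertained, 80 recorded in TIER.Net.
death_factor = correction_factor(132, 80)
```

With these figures the factor is about 1.65, so recorded deaths would be scaled up by roughly two thirds. A more careful adjustment would likely stratify by the characteristics found to be associated with misclassification.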

The number of patient transfers to another clinic that were missed in TIER.Net suggests that communication between clinics is sub-optimal and that the current system for transferring patients between clinics can be improved. With studies reporting patient fear and concern about provider reactions if they return to care after a treatment interruption (61, 62), it is possible that some patients considered it less stressful to self-transfer or restart treatment at another nearby clinic, rather than returning to the facility where they had initiated treatment. These silent transfers could lead to double counting of patients currently on treatment, the second of the 90-90-90 targets, potentially suggesting that the programme is performing better than it is. Furthermore, given that the national treatment programme relies on data from TIER.Net to plan and procure ART based on active patient numbers, misclassification in the database, and more specifically double-counting due to silent transfers, may lead to inaccurate drug forecasts and misestimation of medicines and other commodities at the national level. This bias will only increase as the South African ART programme expands, with more patients potentially moving into new clinics closer to their homes and more people initiating ART with the move to test and treat. Future work will consider how application of correction factors from this research would change programme statistics and drug forecasts.

It is also important to consider the risk that silent transfers pose with regards to drug resistance, as this misclassifies treatment experienced patients as treatment naïve and may lead to patients being offered regimens that have lost their optimal therapeutic benefit. This is particularly concerning because resistance testing is not commonly used in these settings, and can potentially lead to increases in levels of transmitted drug resistance (63). Better referral systems, patient education, regular information exchange between clinics, and provider training (64), could improve recording of transfers and clinic staff attitudes toward less adherent patients. The WHO also recommends enforcement of unique identifiers as paramount to improve patient safety, improve the efficient use of programme resources by reducing duplications, and to improve programme monitoring and evaluation (65). With national IDs becoming mandatory at clinic registration, information exchange could prove useful in identifying silent transfers. This should also become less of an issue when TIER.Net is upgraded to a fully networked database.

We found several factors to be associated with misclassification of outcome in TIER.Net, with older age and longer ART refill schedules found to be protective factors. Older patients were less likely to be classified as LTFU in TIER.Net, which probably explains why they were subsequently less likely to be misclassified. Longer ART refill schedules are synonymous with previous good adherence (66); patients on these schedules accounted for 11% of the patients LTFU and were also more likely to have re-engaged in care, a category that contributed very little to misclassification. Patients whose clinic record was successfully linked to an Agincourt HDSS record in the PIRL database were more likely to be misclassified. Their outcomes were also more likely to have been resolved, which could explain this association. Health facility was also associated with misclassification: the facilities with the highest proportion of patients classified as LTFU in TIER.Net were more likely to misclassify patients. Two of these clinics also had issues with routine tracing, with one clinic not undertaking any physical tracing at all, emphasizing an additional benefit of routine tracing. Finally, patients who had been LTFU for a longer duration were more likely to be classified as LTFU in TIER.Net, more likely to have transferred to another clinic, and less likely to have re-engaged in care, which probably explains their higher likelihood of misclassification.

This analysis has several limitations. Firstly, TIER.Net was only consulted at a specific point in time. The cross-sectional nature of TIER.Net outcomes means that some may have changed, but we would have no way to ascertain this. We checked TIER.Net 12 months after the initial record review for all patients whose outcome after record review and tracing was still LTFU, and 85% of the outcomes had not changed. However, for patients whose final outcome was resolved through new visit data in the PIRL database, it is likely that their TIER.Net outcome also changed. It is also possible that some patients were categorized as LTFU in TIER.Net because of the rigidity of the system, as TIER.Net only allows for four possible outcomes: still in care, transferred out, LTFU and deceased (11, 67, 68). For some patients, outcomes may have been ascertained but could not be recorded in the database because of this rigidity, which may call for the inclusion of other possible outcomes in the database. The exclusion of patients for whom an electronic record could not be found from the multivariable analyses might bias our findings. However, given the relatively small number of such patients, we expect that this bias is fairly small. Finally, we did not adjudicate causes of death, so it is possible that patients died from causes other than those related to HIV/AIDS. A strength of this study is that we attempted to trace all patients who were LTFU rather than a sample. Therefore, the findings might be more generalizable to other settings. The multiple methods, data sources and levels of follow-up used to trace patients are also a strength.

In conclusion, although TIER.Net misclassified 36% of patient outcomes, this reflects the various challenges with the processes and upstream factors that lead to misclassification, and calls for their improvement, rather than reflecting on the utility of the database itself, as patients classified as LTFU were most likely to be misclassified. Clinics should consider training staff in ascertaining patient outcomes, putting more emphasis on patient tracing, and using other data sources such as the national death register to improve ascertainment of patient treatment outcomes. For policy and planning purposes, programme evaluators should consider using correction factors to improve the accuracy of estimates from TIER.Net.
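To illustrate what applying correction factors could involve, the following is a minimal sketch under stated assumptions, not the method used in this study. The notation is hypothetical and does not appear in the original analysis: $N_k$ is the number of patients recorded with outcome $k$ in TIER.Net, $N_{\mathrm{LTFU}}$ is the number recorded as LTFU, and $p_k$ is the proportion of traced LTFU patients whose true outcome was found to be $k$, with $\sum_k p_k = 1$ when the sum includes those who remain truly LTFU. Assuming that misclassification arises only among patients recorded as LTFU, corrected counts could be written as:

$$
\hat{N}_k = N_k + p_k \, N_{\mathrm{LTFU}} \quad (k \neq \mathrm{LTFU}), \qquad \hat{N}_{\mathrm{LTFU}} = p_{\mathrm{LTFU}} \, N_{\mathrm{LTFU}}
$$

Under these assumptions the total number of patients is unchanged, since $\sum_k \hat{N}_k = \sum_{k \neq \mathrm{LTFU}} N_k + N_{\mathrm{LTFU}}$; patients are only reallocated between outcome categories. Whether proportions estimated in one setting can be applied elsewhere is a separate question that this sketch does not address.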

## DATA AVAILABILITY STATEMENT

The data used in these analyses are not yet publicly available as they are currently being utilized for the first author's Ph.D. research. They will be deposited with the Agincourt HDSS data manager and made available on request at the end of his Ph.D. in 2021. Data from the PIRL database are also available by making a data request to the Agincourt HDSS data manager.

## ETHICS STATEMENT

Ethical approval was obtained from the London School of Hygiene and Tropical Medicine, the University of the Witwatersrand and the Mpumalanga Department of Health.

## AUTHOR CONTRIBUTIONS

The study was conceived by DE, AW, and GR. Fieldwork was planned and executed by DE, FG-O, and CK. Data collection was supervised by DE. Analyses were conducted by DE with input from all authors. The manuscript was drafted by DE with input from BR, JR and all the other authors. All authors contributed to the interpretation of the findings and read and approved the final manuscript.

## FUNDING

This study was made possible with support from the Economic and Social Research Council (ES/JS00021/1), the Bill and Melinda Gates Foundation for the MeSH Consortium (OPP1120138), the Bill and Melinda Gates Foundation ALPHA grant (OPP1164897), and the MRC SHAPE UTT grant (MR/P014313/1).

## ACKNOWLEDGMENTS

The authors would like to thank all the participants in the study.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2020.00100/full#supplementary-material




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer PA declared a shared affiliation, with no collaboration, with the authors to the handling editor at the time of review.

Copyright © 2020 Etoori, Wringe, Kabudula, Renju, Rice, Gomez-Olive and Reniers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.