Accounting for clustering and attrition for self-reported outcomes in the design and analysis of population-based surveys: A case study of estimation of prevalence of epilepsy in Nairobi, Kenya

Mwanga, Daniel; Kipchirchir, Isaac C; Muhua, George O; Newton, Charles; Kadengye, Damazo T

doi:10.3389/frma.2025.1583476

ORIGINAL RESEARCH article

Front. Res. Metr. Anal.

Sec. Research Methods

Volume 10 - 2025 | doi: 10.3389/frma.2025.1583476

Provisionally accepted

Daniel Mwanga¹*

Isaac C Kipchirchir²

George O Muhua²

Charles Newton³

Damazo T Kadengye⁴

¹African Population and Health Research Center (APHRC), Nairobi, Kenya
²Department of Mathematics, Faculty of Science, University of Nairobi, Nairobi, Kenya
³Nuffield Department of Psychiatry, University of Oxford, Oxford, United Kingdom
⁴Data Synergy and Evaluations, African Population and Health Research Center (APHRC), Nairobi, Kenya

The final, formatted version of the article will be published soon.

Population-based surveys are commonly used to estimate important public health metrics such as prevalence. Survey data often have a hierarchical structure in which households are clustered within villages or sites and interviewers are assigned to specific locations. Self-reported outcomes, such as a diagnosis of epilepsy, present a more complex structure in which interviewer- or physician-related effects may bias the results. Standard estimation techniques that ignore clustering may yield underestimated standard errors and overconfident inferences. This paper examines these effects in estimating the prevalence of epilepsy in a two-stage population-based survey in Nairobi and discusses how clustering can be taken into account in design and analysis. We used data from the Epilepsy Pathway Innovation in Africa project conducted in Nairobi and simulated attrition levels of 10% and 20%, assuming a missing at random (MAR) mechanism. Attrition was accounted for using the sequential k-nearest neighbor method. We adjusted the expected prevalence for clustering at multiple levels (site, interviewer and household) using a random effects model. An intraclass correlation (ICC) > 0.1 was taken to indicate substantial clustering. We report point estimates with 95% confidence intervals (CI). The crude prevalence of epilepsy was 9.40 cases per 1,000 people (95% CI: 8.60–10.20). Clustering was substantial at household level (ICC = 0.397) and interviewer level (ICC = 0.101) and weaker at site level (ICC = 0.070). Prevalence adjusted for clustering at household, interviewer and site levels was 9.15/1,000 (95% CI: 7.11–11.20). Overall, clustering increased the standard error of the estimates, and attrition led to underestimation of prevalence when not addressed. Imputation methods can mitigate this bias under appropriate assumptions.
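As a rough illustration of how much clustering widened the uncertainty, the standard errors implied by the two reported intervals can be back-calculated. This is a minimal sketch, assuming both intervals are symmetric normal-approximation intervals with z = 1.96; the abstract does not state how the intervals were constructed, and the function name `se_from_ci` is hypothetical.

```python
# Back-calculate the standard errors implied by the reported 95% CIs.
# Assumption (not stated in the abstract): symmetric Wald-type intervals, z = 1.96.

def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric confidence interval."""
    return (upper - lower) / (2 * z)

# Reported intervals, in cases per 1,000 people.
crude_se = se_from_ci(8.60, 10.20)      # crude prevalence, 9.40/1,000
adjusted_se = se_from_ci(7.11, 11.20)   # clustering-adjusted prevalence, 9.15/1,000

se_ratio = adjusted_se / crude_se       # how much wider the adjusted SE is
variance_inflation = se_ratio ** 2      # implied inflation of the variance

print(f"crude SE:           {crude_se:.3f} per 1,000")
print(f"adjusted SE:        {adjusted_se:.3f} per 1,000")
print(f"SE ratio:           {se_ratio:.2f}")
print(f"variance inflation: {variance_inflation:.2f}")
```

Under these assumptions the adjusted standard error is roughly 2.6 times the crude one, which corresponds to a variance inflation of about 6.5. This is a summary of the reported intervals only, not a design effect estimated from the survey data.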
Accounting for clustering, particularly at household, interviewer and site levels, is critical for valid estimation of standard errors in population-based surveys. Rigorous training and pre-survey testing can minimize measurement error in self-reported outcomes. Attrition can lead to underestimation of prevalence if not properly addressed; this bias can be minimized through targeted mobilization of participants to improve response rates and through statistical methods such as multiple imputation or machine learning-based imputation.
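The mechanism by which MAR attrition can pull a prevalence estimate downward can be sketched with a toy simulation. Everything here is hypothetical: the two strata, their prevalences and their dropout probabilities are invented for illustration and are not taken from the study. The correction shown is simple stratum reweighting, used as a stand-in for the sequential k-nearest neighbor imputation the authors applied.

```python
import random

def simulate(n: int = 300_000, seed: int = 42):
    """Toy MAR attrition: dropout depends only on an observed stratum."""
    rng = random.Random(seed)
    # Hypothetical strata: (true prevalence, dropout probability).
    strata = {"low_risk": (0.004, 0.05), "high_risk": (0.016, 0.40)}
    counts = {s: {"n": 0, "cases": 0, "resp_n": 0, "resp_cases": 0} for s in strata}

    for _ in range(n):
        stratum = "high_risk" if rng.random() < 0.5 else "low_risk"
        prevalence, p_drop = strata[stratum]
        is_case = rng.random() < prevalence
        responded = rng.random() >= p_drop  # independent of outcome within stratum
        c = counts[stratum]
        c["n"] += 1
        c["cases"] += is_case
        if responded:
            c["resp_n"] += 1
            c["resp_cases"] += is_case

    # Prevalence if nobody had dropped out.
    full = sum(c["cases"] for c in counts.values()) / n
    # Complete-case estimate: ignores attrition entirely.
    complete_case = (sum(c["resp_cases"] for c in counts.values())
                     / sum(c["resp_n"] for c in counts.values()))
    # Reweighted estimate: responder prevalence per stratum, weighted by
    # each stratum's share of the full sample.
    reweighted = sum((c["n"] / n) * (c["resp_cases"] / c["resp_n"])
                     for c in counts.values())
    return full, complete_case, reweighted

full, complete_case, reweighted = simulate()
print(f"full sample:   {1000 * full:.2f} per 1,000")
print(f"complete case: {1000 * complete_case:.2f} per 1,000")
print(f"reweighted:    {1000 * reweighted:.2f} per 1,000")
```

Because the higher-prevalence stratum drops out more often in this toy setup, the complete-case estimate falls below the full-sample value, while the reweighted estimate lands closer to it. The direction of the bias depends entirely on which groups are lost, so this shows one scenario consistent with the underestimation reported above rather than a general rule.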

Keywords: Prevalence, Epilepsy, Interviewer effects, clustering, hierarchical structure, Multi-level modeling

Received: 25 Feb 2025; Accepted: 24 Jul 2025.

Copyright: © 2025 Mwanga, Kipchirchir, Muhua, Newton and Kadengye. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Daniel Mwanga, African Population and Health Research Center (APHRC), Nairobi, Kenya

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.