Attrition in the European Child Cohort IDEFICS/I.Family: Exploring Associations Between Attrition and Body Mass Index

Attrition may lead to bias in epidemiological cohorts, since participants who are healthier and have a higher social position are less likely to drop out. We investigated possible selection effects regarding key exposures and outcomes in the IDEFICS/I.Family study, a large European cohort on the etiology of overweight, obesity and related disorders during childhood and adulthood. We applied multilevel logistic regression to investigate associations of attrition with sociodemographic variables, weight status, and study compliance and assessed attrition across time regarding children's weight status and variations of attrition across participating countries. We investigated selection effects with regard to social position, adherence to key messages concerning a healthy lifestyle, and children's weight status. Attrition was associated with a higher weight status of children, lower children's study compliance, older age, lower parental education, and parents' migration background, consistent across time and participating countries. Although overweight (odds ratio 1.17, 99% confidence interval 1.05–1.29) or obese children (odds ratio 1.18, 99% confidence interval 1.03–1.36) were more prone to drop-out, attrition only seemed to slightly distort the distribution of children's BMI at the upper tail. Restricting the sample to subgroups with different attrition characteristics only marginally affected exposure-outcome associations. Our results suggest that IDEFICS/I.Family provides valid estimates of relations between socio-economic position, health-related behaviors, and weight status.


INTRODUCTION
Epidemiological cohort studies are not only prone to nonresponse at baseline, but also to drop-out of participants during follow-up (1), called cohort attrition. Since non-response and drop-out are more likely among less healthy and disadvantaged study participants (2)(3)(4), it is especially important for cohort studies to assess selection effects. IDEFICS (Identification and prevention of dietary and lifestyle-induced health effects in children and infants) (5) and I.Family (IDEFICS/I.Family cohort) (6) is a large European prospective cohort including children from eight countries (Belgium, Cyprus, Estonia, Germany, Hungary, Italy, Spain, and Sweden) that has been investigating dietary, behavioral and socioeconomic factors in relation to non-communicable chronic diseases and disorders with a focus on overweight and obesity (5,6). In IDEFICS/I.Family, a total of 16,228 children and their parents took part in up to three physical examinations between 2007 and 2014 and completed questionnaires on medical history, dietary behavior and other aspects of children's life. The present analysis complements the IDEFICS/I.Family cohort profile (5,6). We extend the attrition analysis that included only the first follow-up examination (7) and we build on the observed selection effects at baseline (8) and the association between recruitment effort and dropout (9). Here we investigate the association of cohort attrition with sociodemographic characteristics, weight status, and study compliance in IDEFICS/I.Family ("study compliance" marks how far child and parents undertook all the requested measures and questionnaires). We also consider variations of attrition across the first and second follow-up and between the participating countries, focusing on selection effects by children's weight status.

Analysis Group
In IDEFICS/I.Family, data were collected in each country in two or more selected communities. The sociodemographic profile and infrastructure of the communities were similar and typical for their region. All children aged 2-9.9 years attending kindergarten or primary school within each community were eligible. Parents of potential study subjects were either approached directly by mail or by letters delivered through teachers and caretakers in kindergartens and schools. They were asked for consent to examine their children as well as to answer a number of questionnaires. Children and parents were informed about all aspects of the study. Parents gave their written informed consent prior to inclusion into the study; children 12 years or older signed a simplified consent form. Immediately before each examination, a study nurse informed each child orally about the module using a simplified preformulated text. Children were informed that they do not have to participate if they don't want to and examinations were only performed if children assented and parents consented. Consent could be given to single components of the study while refusing others.
All procedures performed in IDEFICS/I.Family were in accordance with the ethical standards of the institutional committee and the 1964 Declaration of Helsinki and its later amendments. Approval was obtained by each of the centers engaged in the fieldwork by its appropriate ethics committees. To ensure that data collection and study parameters were similar between countries, a common manual of operations containing standard operating procedures for all examinations was developed, and site visits were conducted in all study centers by a central quality control to ensure compliance.
In total, 16,228 children participated in the IDEFICS baseline examination (T0), carried out between September 2007 and May 2008 (Figure 1). All children who took part in the baseline examination were invited to the first follow-up (T1) between September 2009 and May 2010 where 11,041 children participated. Baseline and first follow-up included identical examination modules.
A second follow-up examination (I.Family, T3) was conducted between 2013 and 2014, again with similar examination modules (6). Children who participated at baseline, their siblings, and their parents were invited to take part in I.Family, and a total of 6,055 IDEFICS children were examined. Of the 11,041 children examined at the first follow-up, 5,097 children took part in I.Family. In addition, 958 children took part in I.Family who participated at baseline, but not in the first follow-up. Due to model constraints, these children were considered first follow-up drop-outs, that is, only baseline data were included in the analysis. A complete-cases analysis reduced the sample size to 15,618 children at baseline, 10,314 children at the first follow-up, and 4,852 children at the second follow-up. This resulted in a total of 25,932 person-wave observations at baseline and the first follow-up being included in the analysis. Baseline characteristics of the IDEFICS/I.Family baseline sample and the subsamples that participated in the two follow-ups are summarized in Table 1 (percentages may not sum to 100% due to rounding).
Frontiers in Pediatrics | www.frontiersin.org

Outcome
The outcome cohort attrition was defined with respect to participation in the first (T1) and the second (T3) follow-up examination (0: participation vs. 1: dropout).

Exposures
The social position of families was classified according to the International Standard Classification of Education (ISCED) (10) using the highest educational attainment of mother or father (low: ISCED levels 0-2; medium: ISCED levels 3-4; high: ISCED levels 5 and higher). The household composition was described as the presence of non-adult siblings besides the participating child (yes vs. no) and the number of adults (age 18 or older) living in the household. The place of birth of parents served to define the migration background (full migrant: both parents foreign-born; partly migrant: one parent foreign-born; not migrant: otherwise). Children's age and mother's age on the day of the examination were recorded in years. For drop-outs at the first or second follow-up, children's and mother's age were estimated by adding the mean duration between two examinations to the age at the previous examination. Because of collinearity and a higher percentage of missing values, the father's age was not considered in the analysis. The weight status was determined using the body mass index (BMI). Children's weight status (thin and normal weight, overweight, obese) was categorized according to Cole and Lobstein (11). Parents' weight status (self-reported) was categorized as "no parent overweight," "at least one parent overweight," and "missing." Overweight was defined as having a BMI ≥25. A score of study compliance was constructed separately for children and parents based on the number of key examination modules they participated in at baseline and at first follow-up (Table 2). This was done by counting the number of completed modules (0: module not completed; 1: module completed). For children, key modules were blood pressure, bioelectrical impedance analysis (fasting state), waist-to-hip ratio, skinfold thickness (subscapularis and triceps), blood sample (fasting state), morning urine, and saliva.
Key modules provided by parents (mother or father) included the general questionnaire, food frequency questionnaire, medical history, and the 24-h dietary recall. At the first follow-up, the collection of saliva was restricted to children without a saliva sample at baseline. Therefore saliva was defined as being available at first follow-up if a sample was available at baseline or first follow-up.
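The construction of the compliance scores can be illustrated with a short sketch. The analyses in this study were performed in R; the Python code below is only an illustration, and the module identifiers and function names are hypothetical labels chosen for readability, not variable names from the study database.

```python
# Illustrative sketch of the study compliance score (count of completed key modules).
# All identifiers below are hypothetical; only the module lists follow the text.

CHILD_MODULES = [
    "blood_pressure", "bioimpedance", "waist_to_hip", "skinfold",
    "blood_sample", "morning_urine", "saliva",
]
PARENT_MODULES = [
    "general_questionnaire", "food_frequency", "medical_history",
    "dietary_recall_24h",
]


def compliance_score(completed, modules):
    """Count completed modules (1 per completed module, 0 otherwise)."""
    return sum(1 for module in modules if completed.get(module, False))


def child_score_first_followup(baseline, followup):
    """Child score at first follow-up: saliva counts as available if a
    sample exists from baseline or from the first follow-up."""
    merged = dict(followup)
    merged["saliva"] = baseline.get("saliva", False) or followup.get("saliva", False)
    return compliance_score(merged, CHILD_MODULES)
```

Under these assumptions a child can score between 0 and 7 and a parent between 0 and 4.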

Statistical Analysis
The association between attrition and sociodemographic variables, weight status, and study compliance was assessed by estimating odds ratios (ORs) and 99% confidence intervals (CIs) using a multivariable multilevel logistic regression with respondents as the second-level variable and country as the third level to account for clustering (12). To prevent meaningless associations from becoming statistically significant merely because of the large sample size, and to account for multiple testing of associations, a more stringent criterion for statistical significance (α = 0.01) was chosen. Data were transformed such that each unit of analysis represented a person-wave observation (13,14).
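The person-wave layout can be sketched as follows. This is an illustration in Python rather than the R code used for the analyses, and the record structure (`id`, `country`, `waves`) is a hypothetical one. It assumes that each child contributes one row per follow-up at which drop-out could be observed, with predictors taken from the preceding examination.

```python
# Illustrative sketch: reshape one record per child into person-wave rows.
# The input structure is hypothetical; a wave value of None means "not examined".

def to_person_wave(children):
    rows = []
    for child in children:
        waves = child["waves"]  # e.g. {"T0": {...}, "T1": {...} or None, "T3": ...}
        t0, t1, t3 = waves.get("T0"), waves.get("T1"), waves.get("T3")
        if t0 is None:
            continue  # children without baseline data contribute no rows
        # Row 1: baseline predictors, drop-out observed at the first follow-up.
        rows.append({"id": child["id"], "country": child["country"],
                     "followup": "T1", **t0, "dropout": int(t1 is None)})
        # Row 2: first follow-up predictors, drop-out observed at the second
        # follow-up. Children absent at T1 contribute baseline data only,
        # even if they returned at T3.
        if t1 is not None:
            rows.append({"id": child["id"], "country": child["country"],
                         "followup": "T3", **t1, "dropout": int(t3 is None)})
    return rows
```

With this layout, the 15,618 complete cases at baseline and the 10,314 at the first follow-up would yield the 25,932 person-wave observations reported above.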
Variables included in the model were either time-constant (e.g., sex of the child) or time-variant predictors (e.g., weight status of the child). Time-variant predictors were modeled as lagged covariates, that is, attrition at the first follow-up was regressed on information at baseline and attrition at the second follow-up was regressed on information at the first follow-up. Sensitivity analyses were carried out to check for non-independence of siblings in the sample. In each of 100 random samples, one child per family was selected and a random intercept logistic regression model was fitted to obtain a mean odds ratio and a corresponding confidence interval for each predictor. The odds ratios of a logistic regression model with all children and the mean odds ratios for the 100 samples did not differ substantially. To assess the variation of attrition across time, all possible interaction terms between potential predictors of attrition and time point of follow-up examination were calculated in separate models [time × (sex of child, age of child, weight status of child, compliance score of child, compliance score of parent(s), mother's age, weight status of parents, migration background, educational level, number of adults in household, siblings aged <18 years, and region)]. The heterogeneity between the countries was investigated by means of meta-analyses: Country-stratified logistic regression models with attrition as the dependent variable and the same predictors as in the random intercept logistic regression model were fitted and a random-effects meta-analysis (RE model) (15) was calculated for each predictor of the country-stratified logistic regression models. To evaluate the heterogeneity of attrition between the countries, the percentage of variation that is due to heterogeneity, I² (16), and forest plots were used.
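The pooling of country-specific estimates and the computation of I² can be sketched as below. The text does not state which between-country variance estimator was used, so the DerSimonian-Laird estimator here is an assumption, as are the function and key names. The sketch is written in Python for illustration, whereas the analyses themselves were run in R.

```python
import math

# Illustrative random-effects meta-analysis of country-specific log odds ratios.
# Assumption: DerSimonian-Laird estimator for the between-country variance tau^2.

Z_99 = 2.5758293035489  # standard normal quantile for a two-sided 99% CI


def random_effects_meta(log_ors, ses):
    """Pool log odds ratios with standard errors `ses`; return OR, CI, tau^2, I^2."""
    w = [1.0 / se ** 2 for se in ses]                     # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # % variation due to heterogeneity
    return {
        "pooled_or": math.exp(pooled),
        "ci_99": (math.exp(pooled - Z_99 * se_pooled),
                  math.exp(pooled + Z_99 * se_pooled)),
        "tau2": tau2,
        "i2": i2,
    }
```

A leave-one-country-out sensitivity check of the kind reported in the Results would amount to calling the function again with one country's estimate removed and comparing the resulting I².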
Selection effects on children's BMI across time were assessed with quantile-quantile plots (Q-Q plots) and Kolmogorov-Smirnov tests (KS test) (17). We explored the impact of selection effects on the cross-sectional association of social position and weight status. Children's weight status was converted into a binary variable (0: normal weight including thin vs. 1: obese including overweight) further referred to as overweight/obesity. Social position included educational level (as described above) and income level (low, low/medium, medium, medium/high, vs. high income). We estimated baseline associations and then estimated identical associations with subsamples restricted to first follow-up participants (T1) and second follow-up participants (T3) as well as associations at the first follow-up (T1) and the restricted sample of second follow-up participants (T3). In addition, we explored selection effects on the association between adherence to key messages of a healthy lifestyle promoted by IDEFICS/I.Family and overweight/obesity published by Kovacs et al. (18). In this analysis we included total screen time, moderate to vigorous physical activity (MVPA), and sleep duration as measures of adherence [see (18) for detailed information on instruments and operationalization].
In accordance with Kovacs et al. (18), we calculated a binary indicator for adherence on respective cut points for screen time, MVPA, and sleep. We estimated the baseline association of adherence and overweight/obesity and then estimated the identical association with subsamples restricted to first follow-up participants (T1) and second follow-up participants (T3). For the exposure-outcome association of adherence to key messages of a healthy lifestyle and overweight/obesity, as well as social position and overweight/obesity we estimated odds ratios and confidence intervals with multivariable multilevel logistic regression models. For the sake of comparability we used 95% confidence intervals in the analysis reproducing the association of adherence to key messages and overweight/obesity published by Kovacs et al. (18) (described above). All other analyses, as pointed out above, utilized 99% confidence intervals.
To quantify a potential bias we calculated the percent change in point estimates ($\mathrm{CPE} = \mathrm{OR}_{\text{subsample}} / \mathrm{OR}_{\text{full sample}} \times 100 - 100$). We considered a CPE above 10% an indicator of bias. For Table 6 we stratified overweight/obesity by the combination of adherence to key messages regarding media consumption, physical activity and sleep. Children who did not adhere to any of the recommendations on screen time, physical activity, and sleep duration were assigned to the group − − − (1,666 children in T0 full sample). In contrast, children who adhered to all recommendations on screen time, physical activity, and sleep duration were assigned to the group + + + (263 children in T0 full sample). Children who adhered only to some of the recommendations were assigned accordingly. A full description of the analysis is given in Kovacs et al. (18). Analyses were performed using R version 3.3.3 (http://www.r-project.org/).
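The CPE calculation is simple enough to illustrate directly. The sketch below is in Python for illustration only; the function names are hypothetical, and applying the 10% threshold to the absolute value of the CPE is an assumption, since the text does not state how negative changes were handled.

```python
# Illustrative sketch of the percent change in point estimates (CPE).

def cpe(or_subsample, or_full_sample):
    """CPE = OR_subsample / OR_full_sample * 100 - 100."""
    return or_subsample / or_full_sample * 100.0 - 100.0


def indicates_bias(or_subsample, or_full_sample, threshold=10.0):
    """Flag a change in the point estimate larger than `threshold` percent.
    Assumption: the threshold is applied to the absolute CPE."""
    return abs(cpe(or_subsample, or_full_sample)) > threshold


# Odds ratios for low income at baseline as reported in the Results:
# full sample 1.43, T1 participants 1.34, T3 participants 1.55.
cpe_t1 = cpe(1.34, 1.43)   # negative: estimate moved toward 1
cpe_t3 = cpe(1.55, 1.43)   # positive: estimate moved away from 1
```

Both example changes stay below 10% in absolute value, in line with the description of the low-income estimates in the Results.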

RESULTS
The multilevel logistic regression model with cohort attrition as dependent variable (Table 3) [...]
Variations of attrition across time are depicted in Figure 2 as probabilities predicted from separate random intercept logistic regression models containing interaction terms between potential predictors of attrition and time point of follow-up examination. Age of child was not associated with attrition at the first follow-up but was positively associated with attrition at the second follow-up. Higher parents' study compliance was associated with lower attrition at the second follow-up, but was not associated with attrition at the first follow-up. A higher age of the mother was associated with lower attrition at first follow-up but not at the second follow-up. To assess how well the model represented data of individual countries, we explored with forest plots whether single countries differed notably from the overall pattern, that is, whether the sign of a country's odds ratio for a given exposure variable differed from the pooled estimate (Figure 3). For 14 out of 17 predictors, estimates for all countries were in line with the pooled estimate. Female children in Belgium had a lower chance of attrition, whereas no association of sex was found for the pooled estimate. A medium educational level was associated with a lower chance of attrition in Italy, while the pooled estimate indicated a higher chance of attrition for a low or high educational level. Further, children from the control region in Belgium had a higher chance of attrition while no association for the region was evident in the pooled estimate. Between countries, substantial heterogeneity was observed for study compliance of children, weight status (overweight/obese; I² from 50 to 70%), age of the child, study compliance of parents, full migrant status, low or medium education, and control region (I² from 70 to 100%).
Sensitivity analyses showed that exclusion of country-stratified odds ratios identified as exceptions attenuated I²: Excluding Belgium decreased I² to zero for the predictor female and decreased I² for the control region; excluding Italy decreased the I² of low education.
Since IDEFICS/I.Family was a multi-purpose cohort focusing on overweight and obesity, we further investigated selection effects of children's BMI. BMI distributions for all children and the corresponding BMI distributions for children that did not drop out at a particular follow-up are displayed in Figure 4, columns 1-3. The histograms of BMI at baseline and BMI at baseline without the children that dropped out at the first follow-up differed in the number of observations per bin but the Q-Q plot as well as the KS test (two-sided P-value of 0.30) (Figure 4B, column 1) indicated equal distributions. Similar results were obtained for the distribution of BMI at the first follow-up and the resulting distribution when second follow-up drop-outs were excluded (KS test: two-sided P-value of 0.73) (Figure 4B, column 2) as well as for the distribution of BMI at baseline and the corresponding distribution without second follow-up drop-outs (KS test: two-sided P-value of 0.76) (Figure 4B, column 3). Density scatter plots with children's BMI at baseline plotted against BMI at the first follow-up (respectively BMI at first follow-up vs. BMI at second follow-up; BMI at baseline vs. BMI at second follow-up) and β coefficients of linear regression models were used to evaluate selection effects of BMI across time (Figure 4C). The correlation between children's BMI at different time points was consistent across time, both in the shape of the scatter plot and the β coefficients (baseline vs. first follow-up: β = 1.15, R² = 0.79; first follow-up vs. second follow-up: β = 1.14, R² = 0.76; baseline vs. second follow-up: β = 1.28, R² = 0.57).
We explored the impact of selection effects due to the association between childhood overweight and social position [e.g., (19); for a review, (20)] at baseline and both follow-ups. To this end, we estimated associations between BMI and social variables in the complete baseline sample and compared them to associations between the same variables in two subsamples restricted to participants of the first follow-up and participants of the second follow-up, respectively (Table 4). We repeated this procedure with data from the first follow-up for all participants of the first follow-up and a subsample restricted to all participants of the second follow-up (Table 4). At all time points, a lower income level was associated with a higher chance of overweight/obesity. Restricting the baseline association (T0) of income level to T1 participants only marginally affected the odds ratios, as indicated by a CPE of <10%, but led to wider confidence intervals [e.g., low income at baseline (T0): full sample (OR 1.43, 99% CI 1.12-1.82) vs. T1 participants (OR 1.34, 99% CI 1.00-1.81) vs. T3 participants (OR 1.55, 99% CI 1.02-2.38)]. A restriction to T3 participants resulted in a CPE for medium income level of 11%. For the restricted subsample at first follow-up (T1 full sample restricted to T3 participants), odds ratios tended to be higher as compared to other estimates of income level. Apart from medium income, CPE of income level was well above 10%.

FIGURE 3 | Odds ratios for attrition (with 99% confidence intervals, CI) from country-stratified logistic regression models (ordered by baseline response) with attrition as dependent variable and same predictors as in the random intercept logistic regression model (Table 3) as well as a pooled estimate of a random-effects meta-analysis (RE model), and I² (%) as a measure of heterogeneity between the countries. I² values above 50% indicating substantial heterogeneity were observed for 9 out of 17 variables. Arrows at the upper limit of a CI: confidence interval extends past four.
A lower educational level was associated with a higher chance of overweight/obesity at baseline and first follow-up. Restricting the association of educational level and overweight/obesity did not affect the trend of this association, with a CPE for the baseline association restricted to T3 participants of 14.8% for medium educational level and the confidence intervals.
IDEFICS/I.Family covered multiple topics including diet, physical activity, sleep, and stress. Results from the baseline examination showed that adherence to key behaviors of a healthy lifestyle was associated with a lower chance of overweight/obesity (18). We checked whether BMI-related selection effects changed the results of Kovacs et al. (18) if the sample was restricted to participants of the first follow-up and the second follow-up, respectively. At baseline, adherence to the key messages total screen time, MVPA and sleep duration [...] (Table 6, rightmost columns) and the identical analysis restricted to subsamples of T1 or T3 participants (Table 6). However, the majority of CPEs for a restricted subsample of T3 participants were well above 10%.

DISCUSSION
In line with earlier research, our results suggest that higher attrition at the follow-ups was associated with a higher weight status of children, lower children's study compliance, older age, lower parental education, and parents' migration background (2,3,21,22). For a multi-purpose cohort focusing on overweight and obesity, the observed association between weight status and attrition was perhaps to be expected. For instance, children with higher BMI might have felt more uncomfortable having their weight measured (in underwear) at baseline, causing them to refuse participation in follow-ups. Or participation in IDEFICS/I.Family might not have met the expectations of children and/or parents concerning a health study, leading them to leave the cohort that "did not work out for them." However, while selection effects on children's BMI did occur, they appeared to only slightly distort the distribution at the upper tail, mainly above the 99th percentile.
We found that older children were less likely to take part in the second follow-up as compared to younger ones. In contrast to studies on adults, the consent of both parents and children was required for inclusion into this study, and it has been shown that this makes recruitment particularly challenging (23). In particular, it remains unclear to what degree the opinions of parents and/or children were decisive for participating. It is reasonable to assume that, as they get older, children act more autonomously and hence have more say regarding whether or not to participate. As children transition into puberty, they might find epidemiological studies less interesting or might get increasingly uncomfortable with getting examined in underwear. Unfortunately, although puberty status was part of the study protocol at the second follow-up, it was not included at baseline and at first follow-up, rendering it impossible to investigate links between puberty status and attrition.
Furthermore, the association between children's age and attrition might also be influenced by residential mobility, which has been shown to be highly associated with attrition as it can lead to invalid contact data (14,22,24). In most of the participating countries, the transition from primary to secondary school happens when children are between 10 and 12 years old (except for Estonia's and Sweden's single structure school systems) and for many of the children, this transition took place between the first and second follow-up. Hence many families might have used this opportunity to relocate, possibly leading to dropouts if the family moved out of the study region or their contact data became invalid.
As participants were free to decide whether or not to take part in individual study modules, we used the study compliance of both parents and children as proxy measures of motivation. The fact that both parents' and children's study compliance clustered among high values indicates that once people made the decision to take part they completed the study program as a whole. Nevertheless, children's participation was noticeably lower for the collection of the invasive biosamples. Parents were more likely to complete all modules that took place in the study center (general questionnaire, food frequency questionnaire, and medical history), and less likely to complete the take-home questionnaires (24-h dietary recall). This could be due to the fact that the latter questionnaires were more time-consuming and involved setting aside additional time for the study.
Unfortunately, it cannot be ruled out that parents of children with certain diagnoses covered by the medical questionnaire might have been more reluctant to complete it to avoid stigmatization. Previous results from the cohort published elsewhere showed that, for instance, the prevalence of ADHD in the cohort was somewhat lower as compared to the whole population (25).
Heterogeneity analyses revealed that countries differed considerably in how well the overall model captured the influence of different predictors on attrition. However, although the heterogeneity between the countries was high in terms of I², a closer look using country stratified forest plots revealed that for many predictors all countries showed similar trends. For some predictors, high heterogeneity estimates appeared to be caused by single outliers because excluding these outliers improved I² considerably. While there are plausible explanations for some of the deviations from the general trend, for others there are none. For instance, Italy's estimates for the influence of educational level probably deviated because of the small proportion of parents with high educational level in their sample. However, it is not clear why female children in Belgium were more likely to take part in the follow-ups, whereas no such association was obvious for other countries. Similarly, we cannot explain why attrition differed for control and intervention regions in Belgium, but not in other participating countries. Often such inconsistencies can be explained by investigating paradata recorded during recruitment [i.e., information about the process of the data collection (26)] with dedicated documentation systems [e.g., (9,27)]. Unfortunately paradata were only available for the German study cohort (9), rendering an analysis for the whole cohort impossible. The collection of paradata might thus be especially crucial in multicenter cohort studies, where documentation is often difficult to coordinate between different survey teams operating over long periods of time.
Analysis of selection effects on cross-sectional exposure-outcome associations revealed few effects on point estimates when restricting the full sample at baseline (T0) to participants of the first follow-up (T1). Results on CPEs after restricting the exposure-outcome associations to a subsample of second follow-up participants (T3) were mixed. In particular, in the detailed analysis of adherence and overweight/obesity, CPEs exceeded 10%, potentially caused by a sharp decline in the number of observations for the subgroups.

STRENGTHS AND LIMITATIONS
Strengths of our study include the large sample size from an international population and the highly standardized procedures for data collection that were enforced by a central quality control. As noted previously, interpretation of our results would have benefitted if information about puberty status had been gathered at each time point and if more centers had collected paradata.

CONCLUSION
Potential bias in cohort studies induced by attrition may vary according to exposure and outcome (28) and even a high level of attrition may have a limited effect on estimates of associations between exposure and outcome (2,28,29). Our results, however, suggest that the IDEFICS/I.Family cohort gives valid estimates of the associations of interest.

AUTHOR CONTRIBUTIONS
HP, FL, TV, MT, DM, GE, SdH, LM, GW, and WA contributed to data collection; ML, HP, WA, and SR designed and implemented the research; ML, HP, WA, and SR analyzed and interpreted the data; ML and SR drafted the manuscript. All authors discussed the results, critically commented on the manuscript, and gave their final approval to the submitted version of the manuscript.

ACKNOWLEDGMENTS
The baseline data collection and the first follow-up work as part of the IDEFICS Study [www.idefics.eu] were financially supported by the European Commission within the Sixth RTD Framework Programme Contract No. 016181 (FOOD). The most recent follow-up was conducted in the framework of the I.Family study [www.ifamilystudy.eu] which was funded by the European Commission within the Seventh RTD Framework Programme Contract No. 266044 (KBBE 2010-14). The research presented here incorporates data from both projects. Additional resources were invested by all participating partners. GE acknowledges the Swedish Research Councils (VR and Forte) for support of the IDEFICS and I.Family studies. The authors gratefully acknowledge the work of the IDEFICS consortium.
We are grateful to Florence Samkange-Zeeb for critically reviewing an earlier version of the manuscript.