Cognitive Development Trajectories in Preterm Children With Very Low Birth Weight Longitudinally Followed Until 11 Years of Age

Background: There is a high prevalence of cognitive dysfunction in very low birthweight (VLBW; 500–1250 g) infants. Understanding long-term risk factors associated with cognitive development in preterm children requires longitudinal characterization. Thus, follow-up evaluations, including identification of risk and resilience influences, are important to promote the health and cognitive abilities of children born preterm. Aim: To examine changes in cognitive development from birth until 11 years of age in preterm children with very low birthweight. Methods: Twenty-four VLBW infants at the Karolinska University Hospital, Stockholm, were assessed with regard to cognitive functioning at three time points during development (18 months, 5 years, and 11 years of age) using standardized tests. Longitudinal data were analyzed using Generalized Estimating Equation (GEE) univariate and multivariate models. Results: The follow-up rate was 100%. Level of cognitive functioning at 18 months and at 11 years was similar. Females had higher cognitive scores than males at all three timepoints. We found that intraventricular hemorrhage (IVH) and prolonged invasive ventilatory support (>7 days) had a negative effect on cognitive functioning. Higher levels of parental education had a favorable influence on cognitive functioning over time. Conclusion: Level of cognitive development at 18 months was highly predictive of level of cognitive function at 11 years of age, and differences in assessment scores between male and female VLBW infants persisted. Additional longitudinal studies, performed before school entry and across childhood, are needed to further elucidate the cognitive trajectories of preterm children.


INTRODUCTION
Pre-term birth is associated with dysfunctional development of vital organs and increased risk of cognitive impairment later in life. Some problems appear during the first weeks of life and can be successfully treated, whilst others have a permanent influence on development. Brain injuries such as intraventricular hemorrhage (IVH) and periventricular leukomalacia (PVL) are associated with a high risk of neurodevelopmental disability (Volpe, 1980;Volpe et al., 2011). Preventing brain injury by supporting the respiratory control systems in the preterm infant is crucial. Apnea of prematurity can prolong the need for invasive ventilatory support and bronchopulmonary dysplasia, which are both associated with neurodevelopmental impairment (Janvier et al., 2004;Hofstetter et al., 2008;Doyle and Anderson, 2009). The degree of prematurity and the presence of comorbidities of more than one harmful factor influence the severity of developmental deficiencies in cognitive functioning, as well as in academic achievements. Inflammation has emerged as a critical contributor to both normal development and injury outcome in the immature brain (Hagberg et al., 2015). Neonatal factors found to predict a lower adulthood IQ include respiratory distress syndrome, IVH, mechanical ventilation, mobility problems, parenteral nutrition, low to middle socioeconomic status of parents, and poor parent-infant relationship (Breeman et al., 2017). Furthermore, a range of perinatal vulnerability factors have been associated with male sex, supporting the concept that male sex is an important biological risk factor in extremely preterm infants. Future prospects for preterm children are of utmost interest for parents, pediatric medicine, schools, and society (Moore et al., 2012;Aylward, 2014).
The results of cognitive assessments in children who were born very (week 28-32) or extremely (<week 28) preterm range from severe or mild levels of intellectual disability to cognitive levels above average. The prevalence of severe cognitive delay is higher in populations of very (Murray et al., 2014) and extremely premature children (Johnson et al., 2009). Though the majority of preterm children perform within the normal range of general cognitive functioning, as a group they perform 0.5-1 SD below full-term children (Bohm et al., 2002;Rose et al., 2011;Luttikhuizen dos Santos et al., 2013;Linsell et al., 2018). Specific cognitive functions such as attention, working memory, and processing speed are also often delayed (Rose et al., 2012;Murray et al., 2014).
Early developmental assessments of cognition from 18 to 24 months post-term age (corrected for prematurity) tend to be stable in preterm children with average cognitive development, but future cognitive functioning seems harder to predict when children are performing 1 to 2 SD below expectation, especially in VLBW infants (Roberts et al., 2010;Luttikhuizen dos Santos et al., 2013;Wong et al., 2016).
There is strong evidence that parental education acts as a predictor for cognitive development in preterm children (Bohm et al., 2002;Breeman et al., 2017). In addition, parental level of education, employment, and income have shown independent and additive effects on cognitive gain across preschool years (Manley et al., 2015;Beauregard et al., 2018). Cognitive outcome after preterm birth is heterogeneous, and group-level analyses may disguise individual variability in development.
Thus, long-term studies that address individual patterns and explore trajectories in cognitive development are emerging (Stalnacke et al., 2015;Wong et al., 2016;Mangin et al., 2017). A significant number of children born very preterm or VLBW experience difficulties in school. To be able to reduce the long-term risks associated with VLBW birth, an improved understanding of the mechanisms and risk factors placing these children at risk of cognitive delay and dysfunction is necessary. Identifying factors affecting the predictive accuracy of early neurodevelopmental assessments and individual trajectories of overall, as well as specific, cognitive function is important in order to enable earlier support and intervention.

Aim
The aim of the present study was to investigate trajectories of cognitive functioning at the age of 18 months, 5 years, and 11 years in a Swedish cohort of preterm children with very low birth-weight (500-1250 g). The concordance over time in different aspects of cognition was studied, as well as the differences within the cohort predicted by sex, preterm birth factors, medical risk factors, and parental level of education.

MATERIALS AND METHODS
The Swedish cohort is part of an international multisite follow-up study, the Caffeine for Apnea of Prematurity trial (CAP), a randomized and placebo-controlled study of the safety and efficacy of neonatal caffeine citrate (a methylxanthine) for management and/or prevention of apnea in premature children with a birth-weight of 500-1250 g (Schmidt, 2005;Schmidt et al., 2011). Information about the random assignment is confidential to members of the double-blind CAP trial. Therefore, effects of drug treatment are not evaluated in this study.

Participants
The present Swedish cohort consists of 24 VLBW infants born at the Karolinska University Hospital between 2001 and 2004. They were enrolled in the study during the first week after birth and received caffeine therapy or placebo until it was no longer needed during the neonatal period (Schmidt et al., 2006). The characteristics of the preterm infants and parental education are presented in Table 1.

Ethics Statement
The study was performed in accordance with European Community guidelines. The regional ethics committees at the Karolinska Institutet and Stockholm County approved the study (2012/1401). Informed written consent was obtained from the parents. Feedback to parents was communicated after assessments.

Procedure
Follow-up in terms of medical, motor, and cognitive assessment was performed three times. The cognitive assessment was performed by clinical psychologists at 18-24 months and at 5 and 11 years of age. The assessments at 18 months, 5 years, and 11 years were all corrected for preterm birth. Child and parent ratings of behavior were collected at 11 years. Presentation of motor assessments and behavior ratings is planned for the near future.

Table note: Respiratory support (invasive and non-invasive ventilatory support combined) <8 days. CLD = chronic lung disease, BPD = bronchopulmonary dysplasia, IVH = intraventricular hemorrhage, ROP = retinopathy of prematurity.

Tests and Measures
General cognitive development/functioning was estimated at the three assessment points with the second edition of the Bayley Scales of Infant Development (BSID-II) mental development index (MDI), the WPPSI-III full scale index (FSIQ), and the WISC-IV full scale index (FSIQ), respectively. A validated WISC-IV short form was used (Crawford et al., 2010) in exchange for WASI-II, and DLS Swedish Reading and Spelling tests replaced the corresponding sub-tests from WRAT-4. Standard scores on cognitive indexes have a mean of 100 and standard deviation (SD) of 15, with higher scores indicating a higher level of cognitive development/functioning (Bayley, 1993;David, 2006, 2007). Tests and measures evaluated in the study are presented in Table 2.

Statistical Methods
Descriptive statistics are presented either as means, SD, and medians (ranges) for continuous data, or as frequencies or percentages for categorical variables. For continuous variables the paired t test was used to examine within subject changes between two assessment time points and the independent t test was applied to examine difference between males and females. The Mann-Whitney test was employed when there was a violation of the assumptions of normality and equal variance.
Based on the three indexes, BSID-II (MDI), WPPSI-III (FSIQ), and WISC-IV (FSIQ), a standardized cognitive scale was created (−2 to +2). The Friedman ANOVA was performed to compare the three scales, considering the ordinal property of the outcome variable. Post hoc analysis with Wilcoxon signed-rank tests was conducted with a Bonferroni correction applied, since the overall Friedman ANOVA was significant.
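The Bonferroni step described above amounts to dividing the family-wise significance level by the number of pairwise comparisons. The following is a minimal sketch of that arithmetic only; the family-wise level of 0.05 and the function name `bonferroni_alpha` are assumptions made for illustration and are not stated in the text.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Return the per-comparison threshold under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# The three pairwise comparisons among the three assessment time points.
pairs = [
    ("18 months", "5 years"),
    ("5 years", "11 years"),
    ("18 months", "11 years"),
]

# Assumed family-wise level of 0.05 (hypothetical; not given in the text).
adjusted = bonferroni_alpha(0.05, len(pairs))
print(round(adjusted, 3))  # 0.017
```

With three comparisons, this gives a threshold of roughly 0.017 for each Wilcoxon signed-rank test.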
We employed Generalized Estimating Equations (GEE) to analyze the change over time and to examine the fixed effects of male/female sex. A time x sex interaction term was introduced in the model to examine heterogeneity of effect. GEE was also performed to control for potential confounders. Explanatory variables were selected based on clinical relevance, earlier research findings, and univariate GEE models. Upon completion of the univariate analyses, we selected variables for the multivariate analyses, including factors judged to be potential confounders. We also employed a linear mixed model (LMM) to examine the relationship between the IQ score as a continuous outcome variable and clinical and demographic explanatory variables. Both GEE and LMM simultaneously examine the relationship between each predictor and the outcome variable and the relationship between changes in the predictors and changes in the dependent variable. The dependent variable, cognitive level, was coded as −2, −1, 0, 1, and 2 for each timepoint, based on the standardized cognitive level; these codes represent severe delay (≤70), moderate delay (71-85), normal (86-114), high (115-130), and superior (≥131) cognitive levels, respectively. Categorical explanatory variables were coded depending on their level. If only two levels existed, the reference category was the category with the higher code number: the variable sex was coded as female = 1 and male = 2, thus male was the reference category. The categorical variable "educational level" comprised six levels, which were coded as 1 = less than elementary, 2 = elementary, 3 = high school, 4 = diploma, 5 = university degree (Bachelor or Masters), and 6 = Ph.D. holder. Less than elementary was the reference category. IVH was coded as 0 = no IVH, 1 = Grade 1, 2 = Grade 2; Grade 2 was the reference category. Time was coded as 1 = 18 months, 2 = 5 years, and 3 = 11 years (the reference category).
Level of respiratory support was coded as: 0 = Invasive (endotracheal tube in situ) and noninvasive (continuous positive airway pressure-CPAP) ventilatory support combined <8 days, 1 = mild (combined ventilatory support >8 days but invasive ventilatory support ≤6 days) and 2 = prolonged (invasive ventilatory support >6 days).
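The coding of the dependent variable described above can be sketched as a small lookup. This is an illustrative sketch only, assuming integer standard scores on a scale with mean 100 and SD 15; the function names `z_score` and `cognitive_level` and the `LEVEL_LABELS` dictionary are hypothetical and do not come from the study's analysis code.

```python
MEAN, SD = 100, 15  # standard-score scale of the cognitive indexes

# Labels for the five ordinal codes used as the dependent variable.
LEVEL_LABELS = {
    -2: "severe delay",
    -1: "moderate delay",
    0: "normal",
    1: "high",
    2: "superior",
}


def z_score(standard_score: int) -> float:
    """Express a standard score in SD units from the normative mean."""
    return (standard_score - MEAN) / SD


def cognitive_level(standard_score: int) -> int:
    """Map an integer standard score to the ordinal code -2..2."""
    if standard_score <= 70:
        return -2  # severe delay (<=70)
    if standard_score <= 85:
        return -1  # moderate delay (71-85)
    if standard_score <= 114:
        return 0   # normal (86-114)
    if standard_score <= 130:
        return 1   # high (115-130)
    return 2       # superior (>=131)


for score in (70, 85, 100, 115, 131):
    code = cognitive_level(score)
    print(score, z_score(score), code, LEVEL_LABELS[code])
```

The cut-offs at 70, 85, 115, and 130 sit at whole SD units from the mean, which is why the ordinal codes run from −2 to +2.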

RESULTS
All 24 children in the Swedish cohort participated in all three assessments, thus the follow-up rate was 100%. Table 1 shows the characteristics of preterm birth, medical and social background factors for the complete group as well as separately by sex. At the 5-year and 11-year assessments, no child was deaf, blind or had cerebral palsy. Preterm medical risk variables that were considered to have too few participants to be included in further analyses are not shown in the tables. The levels of the cognitive index score at the three assessment points for each sex are summarized in Figure 1. Table 2 shows all tests, including abbreviations, cognitive, and academic measures for the Swedish CAP cohort. The test results are summarized in Table 3. The mean test score of the study subjects is presented by sex, maternal education level, paternal education level, and IVH (Figures 2A-D, respectively). The independent sample t-test was employed to compare means for between-group differences and the paired t-test for within-subject change (Table 4).

Cognitive Level Through Development
The Friedman test revealed a statistically significant difference in the standardized cognitive index over time, χ² = 18.7, df = 2, p < 0.001. Post hoc analysis with Wilcoxon signed-rank tests was conducted with a Bonferroni correction applied, resulting in a significance level set at p = 0.017. We observed a statistically significant difference between BSID-II MDI (at 18 months) and WPPSI-III FSIQ (at 5 years) (p = 0.001) and between WPPSI-III FSIQ (at 5 years) and WISC-IV FSIQ (at 11 years) (p = 0.002). However, we did not observe a significant difference between BSID-II MDI and WISC-IV FSIQ (Z = −1.67, p = 0.096). WISC-FI was significantly related to WISC-SI, Mathematics SS, sex, and IVH (Supplementary Table 1). LMM also revealed that time was a statistically significant predictor of cognitive score. We observed a statistically significant difference between WPPSI-III FSIQ (at 5 years) and WISC-IV FSIQ (at 11 years) (p = 0.02). However, we did not observe a significant difference between BSID-II MDI and WISC-IV FSIQ (p = 0.79). The univariate and multivariable Generalized Estimating Equation (GEE) analyses were carried out and are presented in Tables 5 and 6, respectively. Sex was observed to be a predictor of standardized cognitive score. Female sex was positively associated with standardized score in all multivariable models (Table 6). Univariate GEE revealed that the likelihood of having a higher cognitive score was positively related to time at 5 years (B = 1.58, Wald Chi-square = 10.8, p = 0.001) with 11 years as a reference. However, when the dependent variable was the standardized Z score, GEE revealed that the likelihood of having a higher Z cognitive score was positively related to time (Wald Chi-square = 18.7, df = 2, p < 0.001). A higher Z score value was positively related to time at 5 years (B = 2.1, Wald Chi-square = 12.9, df = 1, p < 0.001) with 11 years as a reference.
GEE analysis showed that IVH was a predictor of standardized cognitive score. In addition, level of respiratory support was a significant predictor of standardized cognitive score in the univariate GEE analysis (p = 0.03) and in the multivariable GEE analysis when controlling for sex and time (p = 0.04). SGA and BPD were not statistically significant in either the univariate or the multivariable GEE analyses when controlled for sex and time (Tables 5, 6).
Univariate GEE analysis revealed a positive relationship between parental education levels and the standardized score (Table 5). Both maternal and paternal educational levels were positively associated with the standardized score in all multivariable models when controlling for sex and time (Supplementary Table 2). We did not observe interactions between time and other clinical and demographic factors in any of the multivariable models.

Table 4. Mean differences, 95% confidence intervals (CI), and p-values for paired t and independent t tests.

The independent t-test revealed statistically significant score differences between the sexes for many items, indicating higher scores for girls than for boys (Table 4). The paired mean differences were also statistically significant for most items (Table 4).

Cognitive Functioning and Academic Achievement at 11 Years of Age
Data analyses revealed broad confidence intervals for results of specific cognitive measures (TEA-Ch, RCFT, and WISC-IV digit span) at 11 years of age. WISC-IV SI was found to be significantly lower than WISC-IV FSIQ at 11 years (p ≤ 0.001). Measures of academic achievement showed sex differences in the WRAT-4 Mathematics (p = 0.04) and DLS Spelling (p = 0.01) measures, favoring girls. We did not observe differences in reading ability.

DISCUSSION
This prospective cohort study has three assessment points of cognitive development and a follow-up rate of 100%, which adds stability to the results and bolsters our conclusions. We found that male sex and parental education had a significant impact on cognitive test results. This concurs with previous studies, but the present data further underline the effect of sex and parental education on long-term cognitive and academic outcomes (Bohm et al., 2002;Linsell et al., 2015, 2018;Mangin et al., 2017). IVH was identified as a strong predictor of cognitive outcome, and so was the cumulative duration of invasive ventilatory support. This is in accordance with recent studies (e.g., Breeman et al., 2017). However, other medical complications (ROP, CLD/BPD, SGA, and sepsis) did not contribute to the explained variance. Sex differences were seen in all tests given, with females acquiring higher scores than males. Parental education was on average high in both mothers and fathers, and all levels significantly influenced the cognitive outcome in the multivariate analyses.
GEE is an appropriate statistical method for fitting a marginal model in longitudinal data analysis, since we have repeated measures over time. We measured 24 children at three time points with three different cognitive tests to examine their cognitive development/functioning. The repeated measurements thus provide a multivariate response for each individual. GEE is a common approach to longitudinal data based on a population-averaged (marginal) approach. GEE models the average response over the subpopulation sharing a common value of the predictors, as a function of the predictors (Liang and Zeger, 1986).
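For orientation, the population-averaged model described above can be written in the general form introduced by Liang and Zeger (1986). The notation below is an illustrative assumption rather than the authors' own: $i$ indexes the 24 children, $j$ the three assessment points, $g$ is a link function, and $R(\alpha)$ is a working correlation matrix. The text does not state which link or working correlation structure was used, so both are left unspecified.

```latex
% Marginal mean of child i at time point j, given covariates x_{ij}
\mathrm{E}\left[ Y_{ij} \mid x_{ij} \right] = \mu_{ij},
\qquad
g(\mu_{ij}) = x_{ij}^{\top} \beta

% Working covariance of the three repeated measures of child i
V_i = \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2}

% Estimating equations solved for \beta, with D_i = \partial \mu_i / \partial \beta
\sum_{i=1}^{24} D_i^{\top} V_i^{-1} \left( Y_i - \mu_i \right) = 0
```

Here $A_i$ is the diagonal matrix of variance functions and $\phi$ a scale parameter. The coefficients $\beta$ describe how the average response differs across subpopulations defined by the predictors (for example, sex or time point), not how an individual child changes.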
Cognitive z-scores from −2 to +2 are represented in this Swedish cohort of VLBW infants. GEE showed that the cognitive results were similar at 18 months and 11 years of age. We find these results important, since the predictability of early assessments varies. Robust findings based on meta-analyses and single studies imply that the predictability of later cognitive functioning in pre-school and school-aged children varies, from moderate in very preterm to poor in extremely preterm infants. In contrast, several studies of cognitive outcome between pre-school and middle-school age, as well as adolescence, report stability in cognitive development (Stalnacke et al., 2015;Mangin et al., 2017).
Summarizing other studies of cognitive development from infancy to adolescence is challenging due to methodological deficiencies. Outcome studies often have only one or two assessment points, lack a control group, suffer loss to follow-up, and vary in their application of standardized test norms and statistical methodologies, rendering the conclusions of these studies hard to compare (Wong et al., 2016). Based on our study design and results, we find it important to take into account that the time points of, and the time between, cognitive assessments can affect findings in cognitive follow-up of preterm children. Of equal importance are longitudinal study designs and maintaining high retention throughout follow-up (Doyle and Anderson, 2018). Recently, the longitudinal EPICure study showed that cognitive test scores in infancy and early childhood reflect early adult outcomes (Linsell et al., 2018).
The cognitive results at 5 years of age were markedly higher compared to cognitive levels at both 18 months and 11 years of age, which requires an account of potential methodological issues in the performance of said test. The Swedish WPPSI-III norms are based on British norms and validated in a limited sample of Swedish children (David, 2006). Thus, the high results at 5 years of age might be due to an incomplete Swedish validation of the British WPPSI-III norms. Test construction, validity, and reliability issues of cognitive assessment in preterm children can have an impact on cognitive results and are also addressed in other studies (Luttikhuizen dos Santos et al., 2013;Spencer-Smith et al., 2015). The results are similar at 18 months and 11 years, both at individual and group level. Thus, the apparent increase at the 5-year assessment is likely due to test norm differences. Nevertheless, the heterogeneous nature of cognitive outcomes in individuals emphasizes the importance of long-term follow-up and monitoring of infants born VLBW (Manley et al., 2015;Beauregard et al., 2018).
We know little about the developmental pace regarding different aspects of cognition, especially for preterm children, with their risk of suboptimal neurocognitive development/ cognition. There is still considerable debate and uncertainty with regards to whether very preterm children grow into or out of their cognition problems (Mangin et al., 2017). Our and other long-term studies, that address individual patterns and explore trajectories in cognitive development, indicate that cognitive test scores in infancy and early childhood may reflect cognition and academic performance during early school years (Stalnacke et al., 2015;Wong et al., 2016;Mangin et al., 2017;Linsell et al., 2018). Notably, cognitive trajectories may differ. Recently, several distinct language trajectories were revealed, in very preterm and full term infants examined at 2, 5, 7, and 13 years (Nguyen et al., 2018). This and our study underline the importance of monitoring cognition in children born very preterm before school entry and across childhood.
The process of cognitive maturation is complex, multidimensional and influenced by genetic predisposition, environmental factors and experience (Fuster, 2005) and ruptures in development (Anderson, 2001;Anderson et al., 2008). Effects on brain development (Haynes et al., 2011;Volpe et al., 2011) and coherent alternated cognitive trajectories (Mangin et al., 2017) are found in preterm children (Thompson et al., 2018). Comparing the cognitive results at 5 years and 11 years of age indicated both stability and change in the cohort. Level of verbal intelligence (VI) was significantly lower at 11 years and so was the visuo-constructive measure (Beery VMI). Perceptual intelligence (PI) was found to be stable. Processing speed (SI) was higher at 11 years. Declining results and differences between cognitive functions may be affected by specific deficits, especially in executive functions. In our study it was not possible to draw a conclusion from executive tests at 11 years, since the single specific cognitive measures (TEA-Ch, RCFT, and Digit Span) had broad confidence intervals, which in combination with the small cohort size could lead to random results. Problems with attention, working memory and processing speed are significantly more common among preterm children and tend to emerge during development (Rose et al., 2011, 2012;Murray et al., 2014). Deficits in attention and processing speed have been identified as important contributors to a lower level of cognitive intelligence in preterm children (Rose et al., 2009, 2011, 2012). Cognitive functioning in preterm children is of great importance for school performance. Results in academic achievement are of special interest, as they reveal issues that may cause problems in school. Our test battery included mathematics, reading and spelling.
Mathematics scores were significantly correlated with FSIQ at 11 years (r = 0.77), and WISC PI, WISC SI, and IVH-2 were found to significantly explain most of the variance (R² = 0.64) of the dependent variable Mathematics.
Sex differences were seen in all tests given, with females acquiring higher scores than males. Except for the VMI visual perception (VP) and fine motor coordination, this was also consistent between the two assessment points. Females also performed higher in academics, mathematics and spelling. Male sex is known to be a disadvantage with regards to mortality, morbidity and incidence of brain injury in preterm children (Marlow et al., 2005;Hintz et al., 2006;Skiold et al., 2014). The major type of brain injury involves cerebral white matter, and the principal cellular target is the developing oligodendrocytes. Neonatal white matter abnormalities are associated with cognitive impairment across childhood (Mangin et al., 2017). The view of male sex as a risk factor for cognitive development varies. Extremely premature boys have an increased risk of lower cognitive outcome (Linsell et al., 2018) and of developing severe cognitive disability (Marlow et al., 2005). However, in very preterm children the difference based on male sex decreases with age, and environmental factors become more significant (Linsell et al., 2015;Mangin et al., 2017). The present results are in line with previous as well as recent studies (Linsell et al., 2018;Thompson et al., 2018) and underline the importance of considering sex in the potential cognitive development of preterm infants. Structural asymmetries and sexual brain dimorphism already exist at 1 month after birth in healthy term infants (Dean et al., 2018). In general, males have larger total brain volume, and volumes differ by sex in specific brain regions. We speculate that the different rates of maturation between the sexes render the male brain, and hence male sex, a risk factor for long-term cognitive outcome, especially in VLBW preterm infants.
Some of the underlying mechanisms for the sex differences found are delayed myelinization in preterm males, lower white matter volumes in males, and differences in cerebral white matter microstructure compared to preterm females (Constable et al., 2008;Skiold et al., 2014).
We cannot exclude the effect of neonatal caffeine therapy from our findings. However, no adverse long-term effects of caffeine on development have been shown in the CAP studies (e.g., Schmidt et al., 2006, 2017;Mürner-Lavanchy et al., 2018). The CAP trial indicated a positive developmental trend in cognitive scores between 18 months and 5 years of age, independent of treatment. Thus, our data are consistent with the results from this larger cohort (Schmidt et al., 2012). However, in the present cohort, the data and trajectories suggest that the increase was temporary and likely due to test norm differences. Nevertheless, cognitive results in our cohort indicate similar levels between 18 months and 11 years of age. Thus, on a group level, this does not support the suggestion that cognitive outcomes for preterm VLBW infants may improve throughout childhood (Ment et al., 2003). However, our results are in line with, and underscore, previous findings with regards to the importance of IVH, level of ventilatory support, sex, and parental education for long-term cognitive and academic outcomes. These data emphasize the importance of early and repeated individual assessment enabling early intervention as well as adequate support during early childhood into adolescence, especially for those with cognitive delay and at risk for cognitive decline (Stalnacke et al., 2015;Mangin et al., 2017).
A strength of the present study is the 100% follow-up rate. The small group size is a limitation, which is why all medical variables and possible background variables could not be included in the model. The specific cognitive measures at 11 years of age were found to have broad confidence intervals; in combination with the small cohort size, these results could therefore be random.
In summary, analyzing measures of cognitive development across time better clarifies changes and risk factors. We suggest long-term study designs with more than two assessment points and a substantial time between follow-ups. It is possible to explore trajectories of cognitive function in a small cohort when the cohort remains the same. Exploring individual patterns of cognition and brain development, as well as the underlying mechanisms associated with these, is necessary to increase knowledge about the maturation of preterm children. Cognitive development extends well beyond adolescence, and thus future follow-up should continue beyond 11 years. Early identification of children in need of support to promote development is imperative, especially for those with cognitive delay and at risk for cognitive decline.

DATA AVAILABILITY
All datasets generated for this study are included in the manuscript and/or the Supplementary Files.

AUTHOR CONTRIBUTIONS
EH and BB conceptualized and designed the study. EH, SS, and BB acquired the data and revised the manuscript. All authors analyzed the data, drafted a significant portion of the manuscript or figures, and accepted the final version of the manuscript.

FUNDING
Freemasons Children's House, Astrid Lindgren Children's Hospital and Swedish National Heart and Lung (2015-0558) Foundations. The funding sources of the study had no role in the study design, data collection, analysis, interpretation, or writing of the results of this study, or in the decision to submit.

ACKNOWLEDGMENTS
We acknowledge research nurse Lena Legnevall for providing technical assistance and psychologists Anette Holm and Stephanie Cullberg-Sundén for assisting with the testing procedures. We thank Dr. Louise Steinhoff and Ph.D. Wiktor Phillips for English language assistance and Prof. Peter J. Anderson for discussion and advice. We are indebted to the CAP investigators who made this study possible and, importantly, to the children and their families who participated in this follow-up study.