The Usefulness of Electronic Health Records From Preventive Youth Healthcare in the Recognition of Child Mental Health Problems

Background and Objectives: Early identification of child mental health problems (MHPs) is important to provide adequate, timely treatment. Dutch preventive youth healthcare monitors all aspects of a child's healthy development. We explored the usefulness of their electronic health records (EHRs) in scientific research and aimed to develop prediction models for child MHPs. Methods: Population-based cohort study with anonymously extracted electronic healthcare data from preventive youth healthcare centers in the Leiden area, the Netherlands, from the period 2005–2015. The data were analyzed with respect to continuity, percentage of cases, and completeness. Logistic regression analyses were conducted to develop prediction models for the risk of a first recorded concern for MHPs in the next scheduled visit at age 3/4, 5/6, 10/11, and 13/14 years. Results: We included 26,492 children. The continuity of the data was low and the number of concerns for MHPs varied greatly. A large number of determinants had missing data for over 80% of the children. The discriminatory performance of the prediction models was poor. Conclusions: This is the first study exploring the usefulness of EHRs from Dutch preventive youth healthcare in research, especially in predicting child MHPs. We found the usefulness of the data to be limited, and the performance of the developed prediction models was poor. If data quality can be improved, e.g., by facilitating accurate recording or by enriching the data with other available sources, the analysis of EHRs might help to better identify child MHPs.


INTRODUCTION
Despite having different healthcare systems, most high-income countries provide some form of preventive child care that aims to monitor a child's healthy development during the first years of life (1)(2)(3). In the Netherlands, preventive well-child care is separated from curative care. Nurses and community pediatricians [preventive youth healthcare professionals (PYHPs)] provide free-of-charge preventive healthcare for all children aged 0 to 19 years during periodic health check-ups (4). The goal of these check-ups is to prevent disease, promote health, and allow early identification of health risks, disease, and developmental problems (4). Around 80-90% of children are regularly seen in preventive youth healthcare (PYH) (5,6). PYHPs work closely together with, amongst others, professionals in schools. When issues arise, PYHPs can provide additional advice, schedule extra visits, or refer children to family physicians (FPs) or to specialized care (4). Part of the role of PYHPs also concerns the prevention and early identification of mental health problems. Mental health problems (MHPs) affect 10-20% of children and adolescents worldwide (7). MHPs are the leading cause of health-related burden in the first three decades of life (8). Half of all lifetime MHPs have their onset by the age of 14 years and 75% by the age of 24 years (9). To minimize the impact of MHPs, early identification is important so that adequate treatment can be provided (10).
Although PYH has an important role in the identification of MHPs, as most children are regularly seen in PYH, a substantial proportion of MHPs is not recognized by PYHPs (11). To improve the identification of child MHPs, several studies have developed prediction models to identify MHPs using routine healthcare data from British and Dutch FPs; these models showed moderate predictive performance (12,13). In the Dutch study, information on risk factors for MHPs related to the child's family (e.g., parental education level, parental MHPs), environment (life events), and school performance was not well recorded in the children's electronic health records (EHRs) (13). These risk factors were important predictors of MHPs in a prospective cohort study among Dutch children from the general population, in which the developed prediction model showed good discriminative performance (14).
PYHPs gather this information on children and their families during check-ups and record it in the children's EHRs, so the information in these EHRs might be useful in the identification of MHPs. For EHR data to be suitable for reuse in scientific research, the data need to be complete, accurate, and consistent (15). To our knowledge, it is not yet known how accurately and completely this information is recorded in the EHRs. The aim of this study was to explore the usefulness of EHR data from Dutch PYH in predicting MHPs. Our research questions were: what is the quality of the data, and how well do the data predict child MHPs?

METHODS
Study Design and Setting
A population-based cohort study was carried out using data from children aged 0-19 years visiting PYH centers of the Regional Public Health Service Hollands Midden, located in the greater Leiden area, the Netherlands. The anonymously extracted EHR data included demographics, information regarding pregnancy, family and social circumstances, and information from scheduled visits and extra consultations with PYH.
Abbreviations: CMHPs, concern for mental health problems; EHR, electronic health record; FPs, family physicians; KIVPA, short indicative questionnaire for psychosocial problems among adolescents; MHPs, mental health problems; PYH, preventive youth healthcare; PYHPs, preventive youth healthcare professionals; SDQ, strengths and difficulties questionnaire; T0, timepoint 0; T1, timepoint 1.
The data consisted of all EHR data from 2010-2015 and all summary data from a prior electronic registration system from 2005-2010, for children born between 1994 and 2012. During the first four years of life, around 15 PYH visits are scheduled. In both primary school (children aged 4-11 years) and secondary school (children aged 12-18 years), children are generally seen twice (4). The routine visit in grade 4/5 of secondary school was implemented in 2014. For all school-aged children, we used the data recorded up to one routine visit [timepoint 0 (T0)] to predict the presence of MHPs at the next routine visit [timepoint 1 (T1)], thereby creating four subpopulations (Table 1). For example, for children visiting PYH at age 5/6, we used the data up to the previous standard routine visit at the age of 4 years to predict MHPs at age 5/6. We did the same for the other subpopulations.
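The T0/T1 design can be illustrated with a small Python sketch. It assumes a hypothetical, simplified record layout (one `Visit` per child per routine visit, with a boolean CMHP flag) and an illustrative ordering of routine-visit age groups in `AGE_ORDER`; the study's actual subpopulation definitions are those in its Table 1. A child enters a subpopulation only if seen at both T0 and T1, and, as described under Statistical Analyses, children with a CMHP at or before T0 are excluded.

```python
from dataclasses import dataclass

# Illustrative ordering of routine-visit age groups (an assumption for this sketch).
AGE_ORDER = ["3/4", "5/6", "10/11", "13/14", "15/16"]


@dataclass(frozen=True)
class Visit:
    """One routine PYH visit in a hypothetical, simplified record layout."""
    child_id: str
    age_group: str  # one of AGE_ORDER
    cmhp: bool      # concern for MHPs recorded at this visit


def build_subpopulation(visits, t0, t1):
    """Map child_id -> CMHP at T1 for children seen at both T0 and T1
    who had no CMHP recorded at or before T0."""
    rank = {label: i for i, label in enumerate(AGE_ORDER)}
    by_child = {}
    for v in visits:
        by_child.setdefault(v.child_id, []).append(v)

    outcome = {}
    for child_id, child_visits in by_child.items():
        by_age = {v.age_group: v for v in child_visits}
        if t0 not in by_age or t1 not in by_age:
            continue  # not seen at both timepoints: no T0 -> T1 pair
        if any(v.cmhp for v in child_visits if rank[v.age_group] <= rank[t0]):
            continue  # CMHP at or before T0: not a first (incident) concern
        outcome[child_id] = by_age[t1].cmhp
    return outcome


EXAMPLE_VISITS = [
    Visit("a", "3/4", False), Visit("a", "5/6", True),
    Visit("b", "3/4", True), Visit("b", "5/6", True),
    Visit("c", "5/6", False),
    Visit("d", "3/4", False), Visit("d", "5/6", False),
]

print(build_subpopulation(EXAMPLE_VISITS, "3/4", "5/6"))  # {'a': True, 'd': False}
```

Child b is dropped as a prevalent case and child c because it has no T0 visit. The sketch also shows the limitation discussed for population C: if a child's earlier visits are absent from the extract, a prevalent concern cannot be detected and is counted as a first CMHP.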

Outcomes
PYHPs are trained to recognize problems at an early stage. They can refer children to primary and secondary (mental) healthcare for further diagnostics or treatment. A PYHP's concern about MHPs can therefore be an early signal of child MHPs. Our main outcome was a first PYHP-recorded concern for MHPs (CMHPs). We defined a CMHP as present 1) when the PYHP reported abnormal psychosocial functioning in the child's record, e.g., problems in making contact with others or hyperactive behavior, and/or 2) when the child received extra healthcare regarding mental health (within PYH or within curative care) (Supplementary Table 1). We also performed analyses in which the outcome was restricted to the second element, extra healthcare use for CMHPs, as this reflects more severe MHPs.

Determinants
Possible determinants were selected based on a PYH guideline for psychosocial problems and a systematic review of determinants for identified MHPs in primary care (Supplementary Table 2) (16,17). In addition, an expert panel consisting of authors NK and MC, two FPs, a pediatrician, and a PYHP was consulted on possible determinants based on their knowledge and experience, complementing the systematic review and guidelines (13,16). The determinants were measured up until T0. Most data were already labeled normal/abnormal. Validated cut-off points used in PYH were applied to continuous data, e.g., to the results of the validated screening instruments Strengths and Difficulties Questionnaire (SDQ) and short indicative questionnaire for psychosocial problems among adolescents (KIVPA). The determinants number of extra healthcare visits in PYH and number of referrals were dichotomized into ≥1 (yes/no). Some determinants can change over time; for these, we included either the first or the last value registered up to T0. For the other determinants, we included the first known registered value. Due to the sparseness of the data, we clustered closely related determinants: for example, the determinant "Substance use" consisted of the items "alcohol use," "drugs use," "smoking," "water pipe use," and a more general item "substance abuse/addiction" (Supplementary Table 2). PYHPs can also record information in free-text fields; for privacy reasons, we did not have access to this free text.
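The clustering, dichotomization, and first/last-value steps can be sketched as follows. The item names, the `CLUSTERS` mapping, and the record formats are hypothetical simplifications (the study's full mapping is in its Supplementary Table 2); treating an absent count as 0 mirrors the "not recorded means normal" assumption the article applies to most determinants.

```python
# Hypothetical clustering of closely related EHR items into one determinant.
CLUSTERS = {
    "substance_use": {"alcohol use", "drugs use", "smoking",
                      "water pipe use", "substance abuse/addiction"},
}
# Count determinants that are dichotomized into >=1 (yes) vs 0 (no).
COUNT_DETERMINANTS = ("extra_pyh_visits", "referrals")


def value_up_to_t0(records, t0_age, which="last"):
    """records: (age_in_years, value) pairs; return the first or last value
    registered up to and including T0, or None if nothing was registered."""
    eligible = sorted((r for r in records if r[0] <= t0_age), key=lambda r: r[0])
    if not eligible:
        return None
    return eligible[0][1] if which == "first" else eligible[-1][1]


def derive_determinants(abnormal_items, counts):
    """abnormal_items: set of item names recorded as abnormal up to T0.
    counts: dict with the number of extra PYH visits and referrals up to T0."""
    features = {}
    for cluster, items in CLUSTERS.items():
        # The cluster is positive if any member item was recorded as abnormal.
        features[cluster] = bool(abnormal_items & items)
    for name in COUNT_DETERMINANTS:
        # Absent counts are treated as 0, i.e. "no".
        features[name] = counts.get(name, 0) >= 1
    return features


print(derive_determinants({"smoking"}, {"extra_pyh_visits": 2}))
print(value_up_to_t0([(1, "normal"), (3, "abnormal"), (5, "normal")], t0_age=4))
```

Clustering trades detail for power: a single "substance use" indicator is positive far more often than any of its sparse component items.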

Usefulness of the Data for Research
The usefulness of the data, including completeness and validity, was assessed by investigating the number of cases (children with CMHPs), the amount of missing data, and the continuity of the data, i.e., the overlap in children between subpopulations. As children are followed over time, we expected continuity in the data, resulting in overlapping subpopulations.
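One simple way to quantify continuity is the share of children in a later subpopulation who were also present in the preceding one. The sketch below assumes each subpopulation is available as a set of pseudonymized child identifiers; the exact overlap measure the authors used is not described in this section, so this metric is an illustrative choice.

```python
def overlap_share(earlier_ids, later_ids):
    """Fraction of children in the later subpopulation also found in the earlier one."""
    later = set(later_ids)
    if not later:
        return 0.0
    return len(later & set(earlier_ids)) / len(later)


def consecutive_overlap(populations):
    """populations: list of (name, child_ids) in chronological order of T0."""
    return {
        f"{prev_name}->{name}": overlap_share(prev_ids, ids)
        for (prev_name, prev_ids), (name, ids) in zip(populations, populations[1:])
    }


example = [("A", {"c1", "c2", "c3", "c4"}), ("B", {"c2", "c3", "c9"}), ("C", {"c7", "c8"})]
print(consecutive_overlap(example))
```

Low values mean that earlier records, and therefore earlier CMHPs, are unavailable for many children in the later subpopulation.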
Most determinants should either always be present in the EHRs, because they are checked at every visit (e.g., length and weight), or be recorded only in case of an abnormality (e.g., smoking). The SDQ and KIVPA should always be recorded, so their absence could be meaningful: missingness might indicate an abnormal value and could itself be predictive. We therefore included a missing category in the analyses for the SDQ and KIVPA (18,19). For the other determinants, we assumed that when a determinant was not recorded, its value was normal (20).
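The two missing-data rules can be written down explicitly. In this sketch, screening results are assumed to arrive as already-labeled categories; the levels `normal`, `borderline`, and `increased` are an assumption (the Discussion mentions borderline and increased SDQ-scores, but the exact KIVPA categories are not stated). A missing screening result becomes its own indicator, whereas any other determinant that is not recorded is coded as normal.

```python
from typing import Dict, Optional

# Assumed screening levels; "missing" is kept as a separate, possibly predictive level.
SCREENING_LEVELS = ("normal", "borderline", "increased", "missing")


def encode_screening(label: Optional[str], prefix: str) -> Dict[str, bool]:
    """Dummy-code an SDQ/KIVPA result with 'normal' as the reference level."""
    level = "missing" if label is None else label
    if level not in SCREENING_LEVELS:
        raise ValueError(f"unexpected screening label: {label!r}")
    return {f"{prefix}_{lvl}": level == lvl for lvl in SCREENING_LEVELS if lvl != "normal"}


def encode_other(recorded_abnormal: Optional[bool]) -> bool:
    """Other determinants: 'if it is not mentioned, it is not there'."""
    return recorded_abnormal is True


print(encode_screening(None, "sdq"))
print(encode_other(None))
```

Dummy coding with `normal` as the reference lets a logistic regression estimate a separate coefficient for missingness, which is what allows the missing level to carry predictive information.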

Statistical Analyses
Descriptive statistics were computed with SPSS (version 25). If a determinant was present in <1% of the children in a subpopulation, it was not included in the analysis of that subpopulation. As we aimed to predict a first recorded CMHP, we excluded children with CMHPs before or at T0. To develop prediction models for a first recorded CMHP, we performed logistic regression analyses with R (version 3.5.3) (21)(22)(23)(24). The ability of the model to distinguish between children who are recognized with a first CMHP and those who are not (discrimination) was assessed using the c-statistic, or concordance statistic (25). The c-statistic ranges from 0 to 1: a value of 0.5 means that the model predicts CMHPs no better than chance, and the closer the value is to 1, the better the model discriminates. The in-sample calibration of the model was assessed with a calibration plot of observed vs. predicted probabilities. The models were internally validated using bootstrap resampling (500 bootstrap samples) and by estimating shrinkage factors (26). Brier scores were calculated to assess the average prediction error. The Brier score quantifies how close the predictions are to the actual outcomes and ranges from 0 for a perfect model to 0.25 for a non-informative model when the incidence of the outcome is 50%; with a lower incidence of the outcome, the maximum score for a non-informative model is lower (27,28).
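Both performance measures can be computed directly from predicted probabilities and observed outcomes. The sketch below is plain Python, not the R code used in the study, and covers only the apparent (in-sample) c-statistic and Brier score; the bootstrap optimism correction and shrinkage are not shown. It also shows why the Brier score of a non-informative model depends on incidence: assigning the incidence p to every child gives p(1 - p), which equals 0.25 only when p = 0.5.

```python
def c_statistic(y_true, y_prob):
    """Share of (case, non-case) pairs in which the case has the higher predicted risk;
    tied predictions count as half-concordant."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for pn in non_cases:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(non_cases))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def non_informative_brier(incidence):
    """Brier score when every child is assigned the incidence as predicted risk."""
    return incidence * (1 - incidence)


y = [1, 0, 1, 0]
p = [0.9, 0.1, 0.4, 0.4]
print(c_statistic(y, p))           # 0.875
print(brier_score(y, p))           # approximately 0.135
print(non_informative_brier(0.5))  # 0.25
```

For population A, where 35.8% of the children had a first CMHP, `non_informative_brier(0.358)` is about 0.23, so a Brier score should be judged against that incidence-specific reference rather than against 0.25.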
The Ethics Committee of the Leiden University Medical Centre issued a waiver of consent (G16.018).

Role of the Funding Source
This study was supported by ZonMw, the Netherlands Organization for Health Research and Development (grant 839110012). ZonMw had no role in the study design; the collection, analysis, and interpretation of data; the writing of the report; or the decision to submit the paper for publication.

RESULTS
Usefulness of the Data for Research
This study included 26,492 children. The number of children per subpopulation ranged between 1,265 (population D) and 10,789 (population C) (Table 2). The number of children excluded because of CMHPs at or before T0 varied between 402 (population A) and 3,088 (population D). The overlap in children between subpopulations was low, and the number of CMHPs varied greatly between subpopulations. Population C had a much higher number of CMHPs than the other subpopulations, which might be largely explained by the limited overlap in children between populations B and C. We assumed that population C contained not only incident but also prevalent cases of CMHPs, which could not be excluded since no information on these children from before the age of 10 was present. For population B, the overlap with previous years was also small, but in that population it concerned data from the pre-school period. During the pre-school period, MHPs are less frequently identified, and the CMHPs in population B were therefore more likely to be incident CMHPs (29,30).
Since our aim was to predict incident CMHPs and different determinants can play a role in incident or prevalent cases, we excluded population C from further analyses.
The amount of missing data per determinant ranged from 4.4 to 100%; a large number of determinants had missing data for over 80% of the children (Supplementary Table 3).
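The missingness percentages, together with the <1% prevalence rule from the Statistical Analyses, can be sketched as follows. Each child is assumed to be a dictionary of determinant values in which an absent key or `None` means "not recorded"; the 80% flag and the 1% threshold follow the text, while the data layout and determinant names are hypothetical.

```python
def missing_percentages(children, determinants):
    """Percentage of children without a recorded value (absent key or None) per determinant."""
    n = len(children)
    return {d: 100.0 * sum(1 for c in children if c.get(d) is None) / n for d in determinants}


def keep_prevalent(children, determinants, min_share=0.01):
    """Drop yes/no determinants recorded as present (True) in fewer than min_share of children."""
    n = len(children)
    return [d for d in determinants
            if sum(1 for c in children if c.get(d) is True) / n >= min_share]


children = [
    {"smoking": True, "sdq": "normal"},
    {"smoking": None, "sdq": "increased"},
    {"sdq": None},
    {"smoking": False},
]
pct = missing_percentages(children, ["smoking", "sdq", "birth_weight"])
print(pct)                                    # percentage missing per determinant
print([d for d, v in pct.items() if v > 80])  # determinants missing for >80% of children
print(keep_prevalent(children, ["smoking", "birth_weight"]))
```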

Prediction of a First Concern of Mental Health Problems
Population A
Population A consisted of 10,146 children aged 3-4 years, of whom 3,628 (35.8%) had a first recorded CMHP during the next routine visit at age 5-6 years (Table 2). Determinants

Model Performance
The models' discriminatory accuracies for a first recorded CMHPs were low with corrected c-statistics of, respectively, 0.54,

DISCUSSION
In this population-based cohort study, we explored the usefulness of routine healthcare data from Dutch PYH in predicting MHPs. The usefulness of the data was suboptimal: the number of cases differed greatly between subpopulations, a substantial part of the data was missing, and the continuity of the data, i.e., following children over a longer period resulting in overlapping subpopulations, was much lower than expected. We aimed to develop prediction models in school-aged children visiting PYH that would predict first concerns for MHPs during the next routine check-up in PYH. Unfortunately, the discriminatory performances of the models were poor, and the models in their current form do not appear to be useful for the early identification of MHPs.
The use of data from routine EHRs has become increasingly popular over the past years, including for policy purposes (31). To our knowledge, this is the first study exploring the usefulness of EHRs from Dutch PYH in predicting child MHPs. Our population-based cohort study reflects Dutch routine PYH and gives insight into the current state of electronic healthcare registration in PYH. Although we expected continuity in the data, as we aimed to follow children over a longer period, we observed little overlap between the different subpopulations. Our time window of 2005-2015 might play a role, as might the fact that children can attend secondary schools outside the region, where they are monitored by a different regional PYH organization whose data we did not possess. However, we expect that other (technical) reasons of which we are not yet aware also play a role, such as changes in registration systems (e.g., the change from paper to digital in 2010) that required data from the old system to be migrated to the new system. This made it difficult to exclude prevalent CMHP cases from successive subpopulations.
In population C, for instance, 58% of the children had CMHPs, much higher than expected from the literature (7,17). Population D was small because the T1 visit was only implemented in 2014, which resulted in less stable models.
The electronic system that PYHPs use to record findings from clinical care is technically built in such a manner that important information from previous consultations should remain present in the system. For instance, information on ethnicity, pregnancy, and birth weight would still be present during visits in primary school. However, in our extracted data this was not always the case, resulting in substantial missing data for many of these unchangeable determinants. We do not think that missing data played a large role in our outcome, because a CMHP (or extra healthcare use for a CMHP), when present, is a specific finding that PYHPs would register, as it is part of the basic tasks of PYH. Missing data in routine healthcare datasets are a known problem (20). One way to reduce the effect of missing data is imputation. However, within routine healthcare data, missing data are seldom missing at random, so the imputation method must be chosen carefully, and choosing not to impute might even be the better option (18,20). For most determinants in this study, we applied the commonly used assumption that a missing value indicates a negative value, in other words, "if it is not mentioned, it is not there" (20). Given the large amount of missing data, we question whether this assumption holds, as the prevalence rates of determinants such as family MHPs or smoking were lower than expected from the literature (32,33). For the SDQ and KIVPA, which should be completed prior to the PYH visit by all parents of primary school children and by adolescents in secondary school, and which are registered as standard in the registration system, we included a missing category, as missing data could reflect parents being unable (e.g., due to illiteracy or not speaking Dutch) or unwilling to complete the questionnaire, which could be predictive. This did not result in better-performing models.
Our study is the first to examine routine healthcare data from preventive youth healthcare with regard to the identification of child MHPs. Such medical registries were originally built to assist healthcare professionals in daily practice; they were not built for research purposes. It is known that it takes time to improve medical registries in such a way that they can be better used for research (34). Several strategies to improve the quality of electronic healthcare data have been suggested in the literature, which could also apply to the electronic health data of PYH (20). Training professionals in accurate recording has been shown to enhance the quality of registered data in primary care (34). Another suggested strategy is the addition of information from external sources (20). Part of the missing data in this study, e.g., information regarding parental educational level, financial problems, and birth and pregnancy, could possibly be supplemented by linking data from Statistics Netherlands and the Dutch Perinatal Registry (35,36). Another solution might be the implementation of short electronic questionnaires prior to scheduled visits, in which parents fill out relevant information that is automatically uploaded into the child's EHR. Alternatively, as with the Dutch Perinatal Registry, a national dataset could be created with key information gathered in a standardized way. An even more advanced option would be a shared digital record in which parents and PYHPs can both record information. PYHPs can also record relevant information regarding determinants in free text, which was not included in our extraction for privacy reasons. We recommend repeating this study with improved data and investigating the usefulness of free text, for instance with natural language processing techniques (37).
The models developed in this study had poor predictive performance; however, we found that some known risk factors for MHPs had predictive value. In addition, several determinants, such as previous extra PYH visits and school problems, were associated with CMHPs but not with extra healthcare use for CMHPs, suggesting that PYHPs have concerns and monitor, but do not opt for extra care. Determinants such as environmental stressors and parental concerns regarding parenting skills were even associated with a decreased risk of extra healthcare use for CMHPs. This could indicate that PYHPs have concerns regarding the child's environment rather than regarding MHPs of the child itself. In that case, PYHPs might use preventive interventions aimed at the child's environment, such as Triple P, which could affect children positively (38). Regarding life events, our study suggests that PYHPs are less likely to only monitor, as life events in the older age groups were associated with an increased risk of (extra healthcare use for) CMHPs. Finally, because our outcome, CMHPs, is based on the judgement of PYHPs rather than on an objective measurement, CMHPs are inherently more difficult to predict.
Increased SDQ-scores for psychosocial problems had limited prognostic value, whereas borderline increased SDQ-scores were associated with an increased risk of (extra healthcare use for) CMHPs. This may be explained by the fact that SDQ-scores were measured at T0: children with increased SDQ-scores at T0 were more likely to already have a registered CMHP at T0 and were therefore excluded from our study. This was less likely for borderline scores. Another explanation may be that screening instruments do not always predict PYHPs' actions and concerns. Mieloo and colleagues found that, when a screening instrument was used, 38% of the children with an increased score on that instrument were registered as such by the PYHP and 22% of the children with an increased score were referred for extra care (39). It would be interesting to investigate what PYHPs do with increased SDQ-scores, also during later visits.
In contrast to our findings, a prospective cohort study in the Dutch general population that developed models estimating the risk of MHPs in adolescents showed good performance (14). In that study, information on determinants was collected via questionnaires sent to the parents. Important determinants for MHPs were, amongst others, maternal educational level, family history of psychopathology, and environmental stressors such as frequently moving house, severe disease or death in the family, and parental divorce (14). Many of these determinants did not show a positive association with CMHPs in our study, although they are known risk factors for MHPs (16). A possible explanation is the high number of missing values in our study.
We are aware that the data used in this study are specific to the Dutch healthcare system and to the registration used in this particular region, and we expect the generalizability of our findings to other settings to be limited. However, many countries have a form of preventive youth healthcare or well-child clinics that monitor a child's healthy development in some way (1)(2)(3). In addition, validated mental health screening instruments are widely used (40).
Depending on the type of preventive youth healthcare and digital registration used, we would recommend adapting our current approach to different settings and available routine healthcare data to explore the possibilities of digital information from preventive youth healthcare for the early identification of child MHPs.

CONCLUSION
In conclusion, this study explored the usefulness of data from the EHRs of Dutch PYH in estimating the risk of mental health problems in children. The data quality was suboptimal and the developed prediction models performed poorly. If data quality can be improved by facilitating accurate recording and increasing the proportion of data entered through structured input, EHR data from PYH are likely to be valuable in contributing to the timely recognition of child MHPs.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because of the nature of the included data.
Requests to access the datasets should be directed to n.r.koning@lumc.nl.

AUTHOR CONTRIBUTIONS
NK conceptualized and designed the study, carried out the analyses, drafted the initial manuscript, and revised the manuscript. FB, MC, and MN conceptualized and designed the study, had access to the data and supervised the analyses, and critically reviewed and revised the manuscript. AB and SC carried out the analyses and revised the manuscript. IP, NL, and DD conceptualized the study, provided the data, participated in the interpretation of the data, and reviewed and revised the manuscript for important intellectual content. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.