ORIGINAL RESEARCH article

Front. Dev. Psychol., 17 December 2025

Sec. Development in Infancy

Volume 3 - 2025 | https://doi.org/10.3389/fdpys.2025.1727158

Monitoring development from 0–6 years: an online system, standardized for Dutch children

  • 1Department of Clinical Child and Family Studies, Utrecht University, Utrecht, Netherlands
  • 2Department of Methodology and Statistics, Utrecht University, Utrecht, Netherlands

Background: Monitoring and screening children's development from infancy into early childhood is important for prevention purposes, as potential delay may best be addressed as soon as possible. For an efficient monitoring system, standardized online tools with population specific norm scores are needed. An online system, “Ontwikkeling Voorop! 0–6,” was created to evaluate Communication, Gross and Fine Motor functioning, Problem Solving and Personal-Social Behavior of children aged 0–6, using caregiver reports.

Methods: Pilot studies evaluated the first versions of the system. Next, data were collected from a representative sample of 1690 Dutch children aged 0–6 years (mean = 24.6 months, SD = 19.1, 50.5% male). Norm scores were modeled for each developmental domain on the conditional distribution of the total raw scores given the children's age. Reliability and validity of the system were investigated.

Results: Norm scores on a scale from 0–100 centiles were created per age in months. Internal consistency measures (ω = 0.88–0.93 per domain) as well as repeated and inter-rater assessments indicated good reliability with moderate to strong correlations. Convergent validity was sufficient, with moderate to strong correlations between the results of the monitoring system and the Bayley-III-NL, WPPSI-IV-NL and Schlichting test for domains measuring similar constructs. Cut-off scores based on the 3rd, 10th and 90th centiles can be used to identify children in need of attention.

Discussion: The online system is a feasible and efficient way to monitor development of young children. It has standardized norm scores that are reliable and valid for the Dutch population.

Introduction

Monitoring and screening of development and functioning from infancy into early childhood is important for early detection of children who may benefit from extra attention and stimulation in different domains. Signs of developmental delay may not be very clear at a young age, even in children at risk, so a monitoring process with repeated screening for slow development or delay can improve the chances of detection. A monitoring system increases the potential for secondary prevention, because early signs of delay can be addressed as soon as possible. Parental awareness of child development, supported by standardized monitoring systems, helps in identifying children who may need extra attention (Cuomo et al., 2021; Marshall et al., 2016). This is important as early identification of emotional and behavioral challenges was found to reduce the need for specialized services (Barger et al., 2022; Theunissen et al., 2022). Moreover, high-quality monitoring systems help tailor support to children's specific needs (Hutchins et al., 2023).

Monitoring systems entail repeated assessments of child development from infancy or toddler age onwards. Assessments of young children have been found to predict later developmental outcomes and show an age-related pattern. First, developmental outcomes in infancy are related to outcomes at toddler age, which in turn are related to outcomes at preschool age, and these are related to outcomes in early childhood and school age (Van Baar and de Graaff, 1994; Van Baar et al., 2006; Bogičević et al., 2021; Schonhaut et al., 2021; Schlichting et al., 2023). Effect sizes of these longitudinal associations are generally moderate, but stronger for assessments with a smaller age gap in between, and also stronger for assessments carried out from preschool age onwards. Assessments done by trained professionals are costly, so accessible alternatives are important. In some studies parents were found to provide reliable information regarding their child's language and fine motor skills when compared to direct assessments carried out by a trained professional (Miller et al., 2017). Another study found that day care professionals also were able to provide reliable information on development of the children they have in care (Filgueiras et al., 2013). An instrument based on information from parents and/or day care professionals might be preferred as a first step in an efficient routine monitoring or surveillance program, aiming at identification of children who may need extra attention from parents and professionals. A next step would be to differentiate which smaller group of children would need to be targeted for either further structured monitoring or referral to more elaborate developmental assessments or interventions (Cairney et al., 2021).

To use such an efficient monitoring system, standardized tools are needed with population specific norm scores that underlie and strengthen decisions for further action regarding the children who need extra attention. As such a system was not yet available, the current study describes the creation of an online monitoring system for Dutch children, based on information from parents and day care professionals, called “Ontwikkeling Voorop! 0–6.”

A comparison of different screening tools showed that three frequently used questionnaires [Ages & Stages Questionnaire, Third Edition (ASQ-3); Parents' Evaluation of Developmental Status (PEDS); and Survey of Wellbeing of Young Children (SWYC): Milestones] offered adequate specificity, but modest sensitivity, for detecting developmental delays among children aged 9 months to 5 years (Sheldrick et al., 2020). Not one questionnaire emerged superior overall, but shared decision making by parents and day care professionals based upon the results was found to be important. Therefore, questions in a screening tool need to be easily understood by caregivers (Sheldrick et al., 2020).

The quality of the content of the questions within a standardized tool to evaluate early child development needs to be based on clear descriptions of behaviors, skills and responses of young children, preferably with supporting illustrations that allow immediate and systematic observations in playful situations. Distinguishing different developmental domains in the questions, like communication or gross motor functioning, yields a profile of capacities that reflects well-known milestones in development. Toys or other materials described in the questions to evoke and evaluate the children's skills should be available in daily life at home. Describing a series of small tasks with a clear aim, which need to be performed in a structured manner and which increase in difficulty with age (e.g., the ability to stack 2, 3 or 6 small blocks), increases standardization. Such a series of related tasks makes it possible to observe the quality of the child's performance, reflecting what the child can or cannot yet do.

The quality of the skills of very young children, who show a fast developmental pace, also needs to be related to the children's age in months. Therefore, a tool should also provide a quantitative evaluation to allow a comparison to the functioning of same-aged peers. The extent to which the skills and behaviors of the children actually fit a certain age needs to be determined through detailed standardization, resulting in statistically derived, reliable and valid norm scores for the population. The importance of population specific norm scores for evaluation of child development was illustrated by the creation of norm scores for the Dutch version of the Bayley-III-NL, which evaluates the cognitive, language and motor development of children from 1–42 months of age (Van Baar et al., 2014). The use of US norm scores for Dutch children led to substantial differences in Bayley scale outcomes compared to the Dutch norm scores, with some age groups showing discrepancies of up to one standard deviation on the gross motor subscale (Steenis et al., 2015). This indicates that the developmental pace of Dutch children differs from that of the US population.

In line with this, population specific norms are also necessary for screening tools. When a short screening list with six items per domain, scored with the US norm scores, was compared to the Bayley-III-NL scored with the Dutch norm scores, sensitivity and specificity varied but were mostly insufficient in two cohorts of children aged 9–16 months and 18–42 months (Steenis et al., 2015).

Hence, our objective was to develop an online monitoring system that met three requirements. First, it had to contain standardized questions suitable for tracking the development of young children in The Netherlands. Second, it had to allow multiple informants, such as parents and day care professionals, to complete assessments for the same child, as well as enable repeated measurements over time to monitor developmental progress. Third, it had to provide population-specific norm scores for each child to ensure accurate comparisons. The purpose of the current paper is to present the process of creating norm scores, as well as determining the reliability and validity of a newly developed online monitoring system.

Methods

Design of the online monitoring system

Questions

The questions of the monitoring system were largely based on the Ages and Stages Questionnaire (ASQ) (Squires and Bricker, 2009). All questions on age-related skills and behavior of the children are formulated in Dutch in a positive tone and answered with “yes,” “sometimes” or “not yet.” Small pictures are frequently added for illustration, and for some questions further information can be provided in an open field. Nine to 16 questions are asked regarding general concerns of caregivers related to their child's health and functioning. For Communication, Gross Motor, Fine Motor, Problem Solving, and Personal-Social behaviors, all questions are arranged sequentially, forming one long questionnaire per developmental domain with gradually increasing, age-related difficulty of the skills. This led to a total of 55 items for Communication, 63 for Gross Motor, 65 for Fine Motor, 65 for Problem Solving, and 58 for Personal-Social.

For the development of the norm scores, demographic information on the participants was needed. Therefore, several demographic background questions were included in the online system, to be able to compose a representative sample for The Netherlands. These demographic questions concerned educational level, relationship status and ethnic background of the parents, the language spoken at home, the postal code and the child's sex, date of birth and gestational age.

Starting-, reverse-, and discontinuation rules

Next, an online and adaptive system, including starting points, reverse rules and discontinuation rules, was made to allow easy implementation and good feasibility for answering the questions. The starting question for each respondent is determined by the child's age and was chosen to be a relatively easy, age-appropriate question for each age group. If a “not yet” answer is given to any of the first three questions after a starting question, the system automatically reverses and presents four easier questions. After “not yet” answers to three consecutive questions, the system automatically discontinues the domain and moves on to the next domain, finishing with the general and the demographic questions. The online system requires an answer to every item before it moves on. After completing the questionnaire, respondents can download a report containing the questions and their answers. In addition, the end of the report provides suggestions for enjoyable activities suited to children of the age concerned.

Use of the system

Within the online system, specific roles are assigned to the users. “Respondents” can either be parents or day care professionals. “Administrators,” who know the parents and day care professionals, are able to invite both kinds of respondents, who then receive a link to the questionnaire. These “administrators” cannot access the answers and results of the questionnaire. For clinical use and for practitioners, a person with the separate and exclusive role of “mentor” (e.g., a doctor or a day care professional who already knows the child) is able to download a report that includes the results for that individual child. This procedure enables the “mentor” to communicate the results to the parents. Parents do need to provide explicit consent that their information can be shared with a “mentor” for monitoring and follow-up purposes. For scientific use, the role of “researcher” was created, which allows pseudonymized data on a group of children to be downloaded in an Excel file for research projects. Through a connection code entered by the “administrators” of a specific project, data of the monitoring system can be coupled anonymously to other data a “researcher” may already have acquired.

Procedure

Parents could scan a QR code that directed them to the website of our monitoring system “Ontwikkeling Voorop! 0–6.” Here, they could fill in their e-mail address and their child's sex, date of birth, and gestational age, after which they received an automated e-mail containing the link to their personal questionnaire. Upon clicking the link, participants were first presented with an information letter, including a consent box. All participants had to provide active consent before entering the questionnaire by agreeing that their data could be used anonymously for scientific research. The child's age, which determined the starting question, was calculated from the current date and the child's date of birth. Upon registration, parents could select an option indicating that their child was born preterm, which resulted in a questionnaire starting with a question based on the age corrected for the number of weeks of premature birth. Administrators could also choose this option, in which case they also had to indicate how many weeks preterm the child was born. Participants could pause the questionnaire and resume at a later moment. In addition, after the first click, the online system automatically encrypted personal details and created a code for every participant, so that outcome data were pseudonymized. The GDPR regulations were also followed by informing the participants about the purpose of the study on our website and through flyers or letters used for acquisition. The studies involving human participants were reviewed and approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences of Utrecht University, number 21–0065. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Design of the research project: pilot studies

To evaluate the clarity of the questions in Dutch and the feasibility of the online system, a pre-pilot study was conducted from May 2020 to December 2020 using the first version of the online system. In addition, the appropriateness of the starting questions used for children of different ages was evaluated. Furthermore, the sequence of questions was evaluated to ensure that the difficulty level gradually increased with age, and replacements were made when needed. Next, a pilot study was carried out from February 2021 to September 2021 in a larger sample using the second version of the online system, in which adjustments based on the pre-pilot study were incorporated. Again, the appropriateness of the starting questions used for children of different ages and the order of the questions were evaluated. See Supplementary File A for further explanation.

Design of the research project: creation of norms

A third and final version of the monitoring system was used to collect data from October 2021 to October 2023 in a representative sample of parents and young children in The Netherlands, in order to create norm scores and to evaluate reliability and validity of the system.

Acquisition of participants

Caregivers of children aged 0–72 months were asked to participate in the pre-pilot study, the pilot study, or the final study to create Dutch norm scores. Participants were recruited by researchers and students, through personal networks, and through day care institutions. In addition, participants in the final study were recruited through advertisements on social media, through our website and through websites for parents with young children.

We aimed to obtain around 200 answers per item, as this would provide sufficient information to estimate the norm scores.

Based on the characteristics of the representative sample collected in a previous study to create norms for the Bayley-III-NL (Van Baar et al., 2014) we intended to compose a sample that was representative for sex and maternal education, and that consisted of sufficient children to allow estimation of percentile ranks for children from 1 month to 72 months of age who were living in the Netherlands. We did not use ethnic background as a representative characteristic, as this is entangled with the language used, which may be different from Dutch, the language used in the system. As region was not found to show a significant difference in the analyses for the Bayley-III-NL (Van Baar et al., 2014) that characteristic was not used, although we tried to acquire participants from all over the country. Children with a known risk factor like premature birth or a health problem were not excluded, as they are also part of the general population.

Data analyses

Creation of norms

A total raw score per domain was computed for each individual child. All questions preceding the starting question, or preceding the question to which a “reversal” was needed, were assigned 10 points. The questions that were shown were answered with “yes,” “sometimes” or “not yet,” which were scored with 10, 5, and 0 points, respectively. If three consecutive questions received a “not yet” response, all remaining questions were automatically scored with 0 points. Total raw scores were calculated by summing all points, with higher scores indicating a more advanced developmental level.
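The scoring rule described above can be illustrated with a minimal Python sketch. This is not the code of the online system; the function name `raw_score` and its arguments are hypothetical, and it is assumed that the index of the first presented item (the starting question, or the item reached after a reversal) is already known.

```python
from typing import Sequence

# Points per answer option, as described in the text.
POINTS = {"yes": 10, "sometimes": 5, "not yet": 0}


def raw_score(first_shown: int, answers: Sequence[str], n_items: int) -> int:
    """Total raw score for one domain (illustrative sketch).

    first_shown: zero-based index of the first item actually presented
        (assumption: already adjusted for any reversal).
    answers: responses to the presented items, in order.
    n_items: number of items in the domain (e.g. 55 for Communication).
    """
    if first_shown + len(answers) > n_items:
        raise ValueError("more answers than items in the domain")

    # Items before the entry point are credited with 10 points each.
    score = 10 * first_shown

    consecutive_not_yet = 0
    for answer in answers:
        score += POINTS[answer]
        consecutive_not_yet = consecutive_not_yet + 1 if answer == "not yet" else 0
        if consecutive_not_yet == 3:
            # Discontinuation: all remaining items count as 0 points.
            break
    return score


example = raw_score(
    20,
    ["yes", "yes", "sometimes", "not yet", "yes", "not yet", "not yet", "not yet"],
    55,
)
```

In this example, the 20 items before the entry point contribute 200 points and the presented items contribute 35, so `example` equals 235.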

The R package gamlss (R Core Team, 2025; Stasinopoulos and Rigby, 2007) was used to model the conditional distribution of the total raw scores in relation to the child's age in months, for each developmental domain. The conditional distribution used for each total score was the Box-Cox t distribution. This distribution was selected as it suits the situation of a positive total score and because it is a very flexible distribution with four parameters: location, scale, skewness, and kurtosis. Each of these parameters was modeled as a function of age in months. A smoothing-spline was used as the function for each developmental domain and each parameter. A smoothing-spline is a smooth piece-wise cubic polynomial that is able to realistically follow the relevant conditional parameter over age, avoiding the knot-selection issue in ordinary piece-wise regression by just selecting the effective degrees of freedom. The degrees of freedom for each smoothing-spline were determined by a search procedure that took into account theoretical considerations (monotone increasing percentile curves), satisfactory visual worm-plot results (Van Buuren and Fredriks, 2001), and especially low cross-validation prediction error. Once a model was selected for a developmental domain, percentile ranks for all combinations of the raw score and age were tabulated for that domain.
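For readers less familiar with this distribution, the model can be summarized in the following sketch, which uses the standard definition of the Box-Cox t distribution. The symbols are notation introduced here for illustration: $Y$ is the total raw score of a domain, $a$ is age in months, and $\mu$, $\sigma$, $\nu$, $\tau$ are the four parameters, each a smooth function of age (possibly on a link scale). The exact model settings used in the study are not specified beyond the description above.

```latex
% Y: total raw score; a: age in months
% mu, sigma, nu, tau: parameters modeled by smoothing splines in a
\[
Z =
\begin{cases}
  \dfrac{1}{\sigma\nu}\left[\left(\dfrac{Y}{\mu}\right)^{\nu} - 1\right], & \nu \neq 0,\\[2ex]
  \dfrac{1}{\sigma}\,\log\!\left(\dfrac{Y}{\mu}\right), & \nu = 0,
\end{cases}
\qquad Z \sim t_{\tau}\ \text{(truncated so that } Y > 0\text{)}
\]
\[
\text{percentile rank}(y \mid a) = 100 \cdot F_{Y}\bigl(y \mid \mu(a), \sigma(a), \nu(a), \tau(a)\bigr)
\]
```

Here $F_Y$ denotes the cumulative distribution function of the fitted conditional distribution, so tabulating percentile ranks amounts to evaluating $F_Y$ for every combination of raw score and age.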

To assist with a quick identification of children that may need further attention based on a quantitative statistical comparison to agemates, cut-offs were selected to identify children who did not show age-appropriate functioning. As percentiles of 3, 10, and 90 are frequently used as cut-off for this purpose, a classification reflecting an attention and traffic light score was created for each domain separately, with the cut-off scores based on the following standardized percentile rank scores.

• Percentile scores ≥90% result in blue

• Percentile scores >10% and < 90% result in green

• Percentile scores ≤ 10% and >3% result in orange

• Percentile scores ≤ 3% result in red

Children showing a red score in one domain, or an orange score in at least two domains may need further attention, in line with the cut-off scores used for other screening instruments (Squires and Bricker, 2009).
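The classification above can be written as a short Python sketch. The function names are hypothetical, and "a red score in one domain" is read here as "at least one domain," which is an assumption.

```python
from typing import Mapping


def traffic_light(percentile: float) -> str:
    """Map a percentile rank (0-100) to the colour categories in the text."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if percentile >= 90:
        return "blue"
    if percentile > 10:
        return "green"
    if percentile > 3:
        return "orange"
    return "red"


def needs_attention(domain_percentiles: Mapping[str, float]) -> bool:
    """True for red in at least one domain or orange in at least two domains."""
    colours = [traffic_light(p) for p in domain_percentiles.values()]
    return colours.count("red") >= 1 or colours.count("orange") >= 2


# Hypothetical child: orange on two domains, green on the other three.
child = {
    "Communication": 8.0,
    "Gross Motor": 55.0,
    "Fine Motor": 10.0,
    "Problem Solving": 40.0,
    "Personal-Social": 72.0,
}
flagged = needs_attention(child)
```

For this hypothetical child, Communication and Fine Motor both fall in the orange range, so `flagged` is `True`.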

Reliability

Internal consistency

Internal consistency was calculated per domain using McDonald's omega, for which estimates of >0.70 are considered to be sufficient (Cohen, 1969). Omega was used to take into account differences in variances of the questions and their contribution to a latent factor. A one-factor model was fitted for each domain, and the estimated factor loadings and unique variances were used to compute an estimate of omega for every observed pattern of scores.
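As a reminder of how omega follows from a one-factor model, the usual formula is given below. The notation is introduced here for illustration: $\lambda_i$ is the factor loading and $\theta_i$ the unique variance of item $i$, with the factor variance fixed to 1. That the sum runs over the $k$ items contained in a given observed pattern of scores is an assumption about how the pattern-specific estimates were obtained.

```latex
\[
\omega = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^{2}}
              {\left(\sum_{i=1}^{k} \lambda_i\right)^{2} + \sum_{i=1}^{k} \theta_i}
\]
```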

Repeated assessments and interrater reliability

Repeated assessments by the same assessors, a group of parents and a group of day care professionals, were analyzed using Pearson correlations, which may indicate a weak (< 0.30), moderate (0.30–0.50) or strong (> 0.50) effect size (Cohen, 1969). In addition, interrater correlations between the assessments of parents and day care professionals referring to the same child were evaluated.

Validity

Convergent and divergent validity were evaluated with data from samples where the monitoring system was used next to other age-appropriate standardized tests. Pearson correlations were used to evaluate the relation between the monitoring system and data of the Dutch version of the Bayley scales, Bayley-III-NL (Bayley, 2006; Van Baar et al., 2014), for children from 6 to 30 months of age. The Bayley-III-NL is administered by a trained instructor and provides scores for Cognition, Language (using subscales for comprehension and expression) and Motor functioning (with subscales for gross and fine motor development). For older children of 2.6–6.11 years of age, data were used from the Dutch version of the Wechsler Preschool and Primary Scale of Intelligence – Fourth Edition (WPPSI-IV-NL; Hurks and Hendriksen, 2020), which measures cognitive development. In addition, data from the new Schlichting language test for children between 2.3 and 6.3 years of age could be used (Schlichting et al., 2024). Correlations below 0.30 were interpreted as weak, between 0.30 and 0.50 as moderate, and >0.50 as strong (Cohen, 1969).

Results

Participants

The respondents of the norm sample were parents of children aged 1–72 months. The sample was composed to be representative of the Dutch population of adults between 25 and 45 years of age with regard to maternal education, sex of the children and children's age distribution, see Table 1. The age distribution of the children in our sample is shown in Supplementary File B.


Table 1. Characteristics of the representative sample and the Dutch population.

Questions were answered mostly by mothers (96%) and complete data were available for 2,990 children. Participating parents lived across the Netherlands (12% east, 52% west, 30% north, 6% south). Concerning risk factors, 7.2% of the parents reported that their child had a health problem, such as allergies, asthma, epilepsy, or heart problems, and 5.5% of the children (n = 100) were born preterm, before 37 weeks of gestation. For 70% of these preterm-born children, the questionnaire was answered with a starting question based on the age corrected for prematurity; for the other 30%, the option to use corrected age was not selected. Most parents identified themselves as Dutch and most spoke Dutch as first language at home. As parents with a high educational level (i.e., higher education or university level) were overrepresented, we weighted the sample for educational level. All data from parents with a low or medium educational level were retained, and a random subsample of highly educated participants was added such that the post-stratification weight (psw) for the lowest educational level did not exceed 2.0, hence 2 = π1/p1, where π1 = 0.11 is the population proportion and p1 = 93/n is the sample proportion. This resulted in a total sample of 1,690 children to determine the norm scores, with the following weights: psw = 1.998 for the 93 children assessed by parents with the lowest educational level; psw = 1.03 for the 539 children in the medium educational category; and psw = 0.89 for the randomly chosen subsample of 1,058 children assessed by parents in the highest educational category.
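The arithmetic behind the weight cap can be retraced with a small Python sketch. The function names are hypothetical, and only the values reported above for the lowest educational level (π1 = 0.11, 93 children, maximum weight 2.0) are used; the population proportions for the other two categories are not given in this passage and are therefore not reproduced.

```python
import math


def post_stratification_weight(population_share: float, stratum_n: int, total_n: int) -> float:
    """Weight = population proportion divided by sample proportion."""
    return population_share / (stratum_n / total_n)


def total_n_for_cap(population_share: float, stratum_n: int, max_weight: float) -> int:
    """Largest total sample size that keeps the stratum weight at or below the cap."""
    return math.floor(max_weight * stratum_n / population_share)


# Values reported in the text for the lowest educational level.
n_total = total_n_for_cap(population_share=0.11, stratum_n=93, max_weight=2.0)
psw_low = post_stratification_weight(0.11, 93, n_total)
```

With these inputs, `n_total` equals 1690 and `psw_low` is just under 2, close to the reported value of 1.998.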

Mean age of the children in this sample was 24.6 (SD = 19.1) months. In Table 1 the characteristics of the sample used to create the norm scores are presented, as well as the reference percentages in the Dutch population (Centraal Bureau voor de Statistiek | CBS, 2024).

The data of the representative sample of 1,690 children was used to create the norm scores for the monitoring system. All domains together contained 306 items. Per item, the number of answers given varied from 277 to 848, see Supplementary File B for the ranges per domain. Based on the models for the conditional distribution of the total score for the child's age in months, the norm scores were created for each domain by estimation of the best percentile curves, as shown in Figure 1.

Figure 1
Graphs displaying centile lines for children's development across five areas: Communication, Gross Motor, Fine Motor, Problem Solving, and Personal Social. Each graph plots total scores against age in months, with centile lines labeled from one to ninety-nine. Data points are represented as scattered dots.

Figure 1. Centile lines estimated for age in months, per developmental domain.

An evaluation of the “traffic light” outcomes of the children over all domains together showed that 160 (9.5%) children had a red score, ≤ P3, for at least one domain; 428 children (25.3%) had a score ≤ P10, reflecting orange or red, for at least one domain; and 123 (7.3%) had an orange score, > P3 and ≤ P10, for at least two domains. A “blue” score, ≥ P90, was seen in 507 (30%) children for at least one domain.

Internal consistency

The omega reliability estimates indicated good internal consistency for all domains, as these were all ≥ 0.88: Communication = 0.93; Gross Motor = 0.90; Fine Motor = 0.88; Problem Solving = 0.91; and Personal-Social behavior = 0.93.

Repeated assessments and interrater correlations

Repeated assessments by parents and by day care professionals, as well as interrater assessments by parents and day care professionals of the same child at around the same age, were collected in two waves between September 2021 and February 2023 at one day care center.

For 51 children (55% boys), parents answered the questionnaire twice, with 6 months in between. The mean ages of the children at the first and second wave of assessments by the parents were 24 (SD = 12) and 30 (SD = 10) months, respectively. Day care professionals at the same day care center assessed 124 children (51% boys) twice. The mean ages of the children at the first and second wave of assessments by the day care professionals were 24 (SD = 12) and 30 (SD = 10) months, respectively. In addition, 100 children at the first wave and 46 children at the second wave were assessed by both parents and day care professionals; their mean ages were 24 (SD = 12) months at the first wave and 30 (SD = 10) months at the second wave.

Pearson correlations (Table 2) are all moderate to strong in size and significant with p < 0.001, except for the Personal-Social domain (r = 0.23) at the first wave of assessments by parents and day care professionals. Most correlations between the parents and day care professionals concerning the same children are somewhat stronger for the second wave of assessments compared to the first wave.


Table 2. Pearson correlations of the repeated assessments and interrater assessments for all domains for their first and second waves.

We checked whether the scores from the 51 parents who provided repeated assessments at the first and second wave differed from the scores of the 53 other parents who only answered the questions at the first wave, but no significant differences were found (results not shown). Scores from the day care professionals for the 124 children with repeated assessments at the first and second wave also did not differ from the scores for the 68 children whom day care professionals assessed only at the first wave (results not shown).

Validity

Validity of the monitoring system was evaluated by computing Pearson correlations with other age-appropriate and standardized assessments. Data of the monitoring system were available from parents as well as day care professionals. Trained instructors administered standardized assessments using three different instruments: the Bayley-III-NL (Van Baar et al., 2014) for children (61% boys) with a mean age of 18.6 months (SD = 5.6; range 8–28), the WPPSI-IV-NL (Hurks and Hendriksen, 2020) (53% boys) with a mean age of 3.3 years (SD = 1.0; range 2–5) or the Schlichting Language test (Schlichting et al., 2024) (44% boys) with a mean age of 3.8 years (SD = 1.2; range 2–6).

Parent-completed assessments showed some significant correlations with the Bayley scales: the Communication domain correlated moderately with the Bayley Language scale, and the Gross Motor domain correlated moderately with the Motor scale of the Bayley scales. For the older children, most domains of the monitoring system, except the Gross Motor domain, correlated significantly and with a moderate to strong effect size with the WPPSI scales, specifically with the Total and the Verbal indices. All domains of the monitoring system correlated significantly, with weak to moderate strength, with the general Language Quotient (TQ) of the Dutch language test for children from 2–6 years (Schlichting et al., 2024). Communication was related to most facets of the language test, whereas the Gross Motor domain was related to the fewest facets, see Table 3.


Table 3. Pearson correlations of the monitoring system with the Bayley-III-NL, WPPSI-IV-NL, and Schlichting language test.

The assessments by the day care professionals regarding Communication, Problem Solving and Personal Social behavior correlated significantly with the Cognition and Language scales of the Bayley-III-NL, see Table 3. Although the Gross Motor domain showed moderate to strong correlations, these were not statistically significant in this small group. For the older children, most domains of the monitoring system showed moderate to strong correlations with the WPPSI-IV-NL Indices.

Discussion

This study reported the creation of an online system to monitor the development of young children from 0–6 years. In addition, the creation of the norm scores for the Dutch population and the reliability and validity of the system are reported. After a careful preparation that included a pre-pilot and a pilot study, Dutch norm scores could be created for the questions of the system, based on a representative sample of the Dutch population. The norm scores are presented automatically in the system per domain in percentile rank scores, as well as in a “traffic light” indicator. This online monitoring system was found to be reliable and it resulted in valid outcomes. All five developmental domains showed high internal consistency, indicating that each domain measures one underlying construct. In addition, the results of the repeated assessments within parents and within day care professionals showed moderate to strong correlations for nearly all domains, even though the second measurement was 6 months later. Therefore, the system shows predictive potential for the future developmental level of the children. Interrater reliability between the assessments of parents and day care professionals regarding the same children was moderate to strong. The stronger correlations between parents and day care professionals observed in the second wave may be explained by the day care professionals' increased familiarity with the children. The questions answered in these repeated assessments were age-appropriate, so some of the questions differed between the two waves. The results are based on standardized percentile scores, so these outcomes concern the general developmental level of the children and not whether the parents or day care professionals gave the same answer to the same question at the second wave.

We have developed an online system that can be used both to screen and to monitor child development. A screening tool classifies whether a child is delayed or not, which is prone to classification errors (Sheldrick et al., 2015). A monitoring system provides more detailed and precise information regarding children's developmental level, and it enables the detection of children with severe or borderline developmental delay, as well as children who are ahead in their development. In our new online monitoring system, respondents are presented with questions appropriate to the child's age, and the questions are adapted to the answers given. If children do not yet show skills that are expected based on their chronological age, easier questions are provided. If children are faster, the system provides more difficult questions up until the point that the respondent answers “not yet” three times in a row. This provides a complete picture of the children's development. This new system enabled us to create norms (i.e., percentile score per age in months), which makes it suitable for monitoring children's development over time for ages between 1 and 72 months. The online tool also allows parents and professionals to answer the same questions about the same child, thereby providing a clear basis for conversations and advice about the developmental progression of the child. In addition, now that the norm scores have been built into the system, the final standardized scores for a preterm born child are based on the corrected age of the child when the option for prematurity is used. Our paper adds to the scientific literature by presenting how such a system can be created and how norms can be established. The online monitoring tool can also be useful for both research and practice, by expanding the opportunities for evaluation, screening and monitoring of development in early childhood.
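The stopping rule described above can be illustrated with a minimal Python sketch. It is not the implementation of Ontwikkeling Voorop! 0–6: the function name `administer_upward`, the `ask` callback, the item labels and the answer label "yes" are hypothetical, and only the rule of stopping after three consecutive “not yet” answers is taken from the text.

```python
from typing import Callable, List, Tuple

NOT_YET = "not yet"  # answer label taken from the text; other labels are placeholders


def administer_upward(
    items: List[str],
    start: int,
    ask: Callable[[str], str],
    stop_after: int = 3,
) -> List[Tuple[str, str]]:
    """Present items of increasing difficulty, beginning at index `start`.

    Stops as soon as `stop_after` consecutive answers equal NOT_YET.
    `items` is assumed to be ordered from easiest to most difficult.
    """
    responses: List[Tuple[str, str]] = []
    streak = 0  # current run of consecutive "not yet" answers
    for item in items[start:]:
        answer = ask(item)
        responses.append((item, answer))
        streak = streak + 1 if answer == NOT_YET else 0
        if streak >= stop_after:
            break
    return responses


# Hypothetical scripted respondent for a seven-item domain.
scripted = {
    "q1": "yes", "q2": NOT_YET, "q3": "yes",
    "q4": NOT_YET, "q5": NOT_YET, "q6": NOT_YET, "q7": "yes",
}
answered = administer_upward(list(scripted), start=0, ask=scripted.__getitem__)
print([item for item, _ in answered])  # q7 is never presented
```

In this example the isolated “not yet” on the second item does not end the questionnaire, because the following answer resets the count; only the run on items four to six does.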

Regarding the validity of the system, the assessments of all developmental domains can be considered sufficiently valid for children of different ages from 0–6 years. Generally, significant associations were found between domains and scales that are conceptually expected to correlate, whereas correlations were absent or lower between domains that are expected to be less related, such as gross motor functioning and cognitive tasks.

For (clinical) practice, the “traffic light” scores of the system provide a quick indication per domain to identify children who may be at risk, as they show skills that are “not yet age appropriate” or “below age expectations” in comparison to their peers, or indeed children who may be fast in development, as they show a “fast” score. This indicator may help practitioners to decide which children need further attention and in what way. Children with a red score (≤3rd percentile) in at least one developmental domain need to be referred for further assessment of their development and functioning with an established developmental test, in order to evaluate whether extra stimulation or intervention might be appropriate. Children who show an orange score (>3rd to ≤10th percentile) in at least two domains may need to be monitored closely, perhaps first with a repeated assessment within 6 months to evaluate their developmental progression and, if needed, also with a referral for an evaluation by a professional using an established developmental test. These criteria for flagging children have frequently been reported and advised as cut-off scores in practice (e.g., Squires and Bricker, 2009). Children who show a “blue” score (≥90th percentile) in any domain might develop faster than most of their same-aged peers, and they might also benefit from specific stimulation adapted to their needs and interests. For them, a referral for further assessment with an established developmental test might also be helpful. For all children, questions marked as “not yet” at the end of the questionnaire indicate their zone of proximal development (Vygotsky, 1978), which may suggest ideas for potentially useful stimulating activities.
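As an illustration, the cut-offs and flagging criteria above can be written as a short Python sketch. The function names and return strings are hypothetical; only the percentile boundaries and the “at least one red”, “at least two orange” and “any blue” criteria come from the text. Treating every score above the 10th and below the 90th percentile as “green” is an assumption of this sketch.

```python
from typing import Dict, List


def traffic_light(percentile: float) -> str:
    """Map a domain percentile (0-100) to a traffic light color."""
    if percentile <= 3:
        return "red"
    if percentile <= 10:
        return "orange"
    if percentile >= 90:
        return "blue"
    return "green"  # assumption: everything between the cut-offs


def follow_up_flags(domain_percentiles: Dict[str, float]) -> List[str]:
    """Collect the follow-up suggestions named in the text for one child."""
    colors = [traffic_light(p) for p in domain_percentiles.values()]
    flags: List[str] = []
    if colors.count("red") >= 1:
        flags.append("refer for further assessment")
    if colors.count("orange") >= 2:
        flags.append("monitor closely, e.g. repeat assessment within 6 months")
    if colors.count("blue") >= 1:
        flags.append("consider adapted stimulation")
    return flags


# Hypothetical percentile scores for one child.
child = {
    "Communication": 8.0,
    "Gross Motor": 45.0,
    "Fine Motor": 10.0,
    "Problem Solving": 95.0,
    "Personal-Social": 30.0,
}
print({domain: traffic_light(p) for domain, p in child.items()})
print(follow_up_flags(child))
```

A single orange score produces no flag in this sketch, because the text states no criterion for that case. The flags are returned as a list so that no precedence between the criteria has to be assumed.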

Incorporating this system in day care institutions will result in standardized outcomes regarding the developmental level of the children that may underlie their reports and conversations with parents or other health care providers. Including the system, and hence the parents' view on child development, in existing early childhood monitoring programs, such as the follow-up programs for extremely preterm born children, may further improve potential referrals and support for children's cognitive, motor, and socio-emotional skills. A standardized online monitoring system may form an important anchor point in a specific follow-up or a community-based prevention program. Other strengths of this monitoring system are its cost efficiency and its easy and quick data collection, compared to other questionnaires or developmental tests. Moreover, it may help parents and day care professionals to observe child development in more detail, and to increase their knowledge about child development based on the detailed questions (Hughes et al., 2016), particularly for disadvantaged children (Busse and Gathmann, 2020; Felfe and Lalive, 2018). In conversations with parents, the categories may need to be presented in positively framed terms, such as “he develops at his own pace and may enjoy specific stimulation.”

In addition, the use of an online system provides an efficient way to connect science and practice, as it allows monitoring of individual children as well as the collection of pseudonymized data from larger groups. As data collection continues, norms for specific subgroups can be created.

A limitation of our study may be found in the representativeness of our sample for the Dutch population. In composing the sample, we followed accepted procedures for generating norm scores in developmental assessments as closely as possible (Van Baar et al., 2014). However, we could not include ethnic background as a key characteristic for representativeness, as proficiency in the Dutch language was needed to fill out our questionnaire. Regarding prematurity, it turned out that for 30% of the preterm born children a questionnaire was used that did not start with a question based on the age corrected for prematurity. This concerned only a small number of children (1.78% of the sample). Moreover, due to the reversal rule, respondents would automatically be routed back to four questions of an easier level if a “not yet” answer was given for one of the first three items, which provided some safeguard against too difficult questions in these cases. Including or excluding children with risk factors has also been a point of discussion. Parental characteristics such as conscientiousness or anxiety might also have influenced the results. However, as we used a large sample, representative for the Dutch population, such factors may not have been overrepresented in our sample. Refraining from using risk factors as either an inclusion or an exclusion criterion has indeed resulted in a sample that encompasses children with a variety of risk factors, just as would be expected in the population. Further study could evaluate whether such personality characteristics show a specific effect and would need to be taken into account in representative samples.
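The reversal rule mentioned above can be sketched as follows. This is one possible reading, not the system's actual code: the function name `apply_reversal` and the index-based representation are hypothetical, and the sketch assumes that “four questions of an easier level” means stepping back four positions in a list ordered from easiest to most difficult.

```python
from typing import Sequence

NOT_YET = "not yet"


def apply_reversal(
    start: int,
    first_answers: Sequence[str],
    window: int = 3,
    step_back: int = 4,
) -> int:
    """Return the item index from which questioning would continue.

    If any of the first `window` answers is NOT_YET, step back `step_back`
    positions (never below the first item); otherwise keep `start`.
    """
    if NOT_YET in first_answers[:window]:
        return max(0, start - step_back)
    return start


# A preterm child started at an item matching chronological age (index 20):
# one "not yet" among the first three answers moves the start to index 16.
print(apply_reversal(20, ["yes", NOT_YET, "yes"]))
```

Under this reading, a questionnaire that started at the chronological rather than the corrected age would still reach easier items as soon as the respondent reports a skill as “not yet” shown.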

Although the current study provides evidence of the validity of the system, further studies that compare Ontwikkeling Voorop! 0–6 to standardized tests regarding different domains of functioning of infants, toddlers, and preschool-aged children could strengthen the validation of the system. We have not yet performed sensitivity analyses concerning the traffic light scores, as only a very small number of children scoring “slow” or “very slow” had also been examined with another test. Such sensitivity analyses warrant investigation in future studies with adequate sample sizes. The correlations between our online monitoring system and the developmental assessments were stronger for the older children (WPPSI) than for the younger children (Bayley). In addition, the patterns of correlations were somewhat different for the parents and the day care workers. This finding was considered important, but also beyond the scope of the current study, and it should be addressed in further research.

A limitation of the monitoring system may be that for some children many questions need to be answered, especially for children with “blue” scores (≥90th percentile). As respondents could stop and resume their answers later, and the number of questions could differ even for children of the same age, we were unable to determine the exact time needed to fill out the complete questionnaire. Nevertheless, we are currently working on shorter versions that will stop at the cut-off for a “green” score (i.e., 10th percentile), which indicates that a child shows age-appropriate functioning, which may suffice for certain screening or monitoring aims. Further study may also include deleting redundant items. A Computer Adaptive Test (CAT) version is now under development, based on an IRT model, which may also result in a shorter version. Finally, a more general limitation concerns the use of a questionnaire, as this requires quite some language capacity and concentration of the respondents to answer the questions—even if these are formulated at a basic level and include pictures for clarification.

All in all, the monitoring system is a reliable and valid tool, standardized with norm scores for the Dutch population. It offers a feasible and efficient way to monitor young children's development, which can contribute to further supporting their development.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving human participants were reviewed and approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences of Utrecht University, number 21-0065. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

AVB: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Supervision, Validation, Writing – original draft, Writing – review & editing. MV: Conceptualization, Data curation, Investigation, Methodology, Validation, Writing – review & editing. DH: Visualization, Formal analysis, Methodology, Writing – review & editing. LP-T: Data curation, Investigation, Validation, Writing – review & editing. LK: Conceptualization, Investigation, Methodology, Validation, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. The pre-pilot and the pilot studies were funded by the city Council of Amsterdam and the Kinderopvangfonds.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdpys.2025.1727158/full#supplementary-material

References

Barger, B., Rice, C., Benevides, T., Salmon, A., Sanchez-Alvarez, S., and Crimmins, D. (2022). Are developmental monitoring and screening better together for early autism identification across race and ethnic groups? J. Autism Dev. Disord. 52, 203–218. doi: 10.1007/s10803-021-04943-8

Bayley, N. (2006). Bayley Scales of Infant and Toddler Development, 3rd Edn. Bloomington: NCS Pearson, Inc.

Bogičević, L., Verhoeven, M., and van Baar, A. L. (2021). Exploring predictors at toddler age of distinct profiles of attentional functioning in 6-year-old children born moderate-to-late preterm and full term. PLoS ONE 16:e0254797. doi: 10.1371/journal.pone.0254797

Busse, A., and Gathmann, C. (2020). Free daycare policies, family choices and child development. J. Econ. Behav. Organ. 179, 240–260. doi: 10.1016/j.jebo.2020.08.015

Cairney, D. G., Kazmi, A., Delahunty, L., Marryat, L., and Wood, R. (2021). The predictive value of universal preschool developmental assessment in identifying children with later educational difficulties: a systematic review. PLoS ONE 16:e0247299. doi: 10.1371/journal.pone.0247299

Centraal Bureau voor de Statistiek | CBS (2024). Available online at: https://www.cbs.nl/nl-nl (Accessed October 29, 2024).

Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press.

Cuomo, B., Joosten, A., and Vaz, S. (2021). Scoping review on noticing concerns in child development: a missing piece in the early intervention puzzle. Disabil. Rehabil. 43, 2663–2672. doi: 10.1080/09638288.2019.1707296

Felfe, C., and Lalive, R. (2018). Does early child care affect children's development? J. Public Econ. 159, 33–53. doi: 10.1016/j.jpubeco.2018.01.014

Filgueiras, A., Pires, P., Maissonette, S., and Landeira-Fernandez, J. (2013). Psychometric properties of the Brazilian-adapted version of the Ages and Stages Questionnaire in public child daycare centers. Early Hum. Dev. 89, 561–576. doi: 10.1016/j.earlhumdev.2013.02.005

Hughes, M., Joslyn, A., Wojton, M., O'Reilly, M., and Dworkin, P. H. (2016). Connecting vulnerable children and families to community-based programs strengthens parents' perceptions of protective factors. Infants Young Child 29, 116–129. doi: 10.1097/IYC.0000000000000059

Hurks, P., and Hendriksen, J. (2020). WPPSI-IV-NL Wechsler Preschool and Primary Scale of Intelligence, Fourth Edition, Nederlandstalige bewerking, Technische handleiding. Amsterdam: Pearson Benelux B.V.

Hutchins, H., Abercrombie, J., and Lipton, C. (2023). Promotion of early childhood development and mental health in quality rating and improvement systems for early care and education: a review of state quality indicators. Early Child Res. Q. 64, 229–241. doi: 10.1016/j.ecresq.2023.03.006

Marshall, J., Coulter, M. L., Gorski, P. A., and Ewing, A. (2016). Parent recognition and responses to developmental concerns in young children. Infants Young Child 29, 102–115. doi: 10.1097/IYC.0000000000000056

Miller, L. E., Perkins, K. A., Dai, Y. G., and Fein, D. A. (2017). Comparison of parent report and direct assessment of child skills in toddlers. Res. Autism Spectr. Disord. 41–42, 57–65. doi: 10.1016/j.rasd.2017.08.002

R Core Team (2025). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. Available online at: https://www.R-project.org/ (Accessed May 15, 2025).

Schlichting, L., Duinmeijer, I., and Diender, M. (2024). Schlichting Test voor Taalbegrip en Taalproductie. Houten: Bohn Stafleu van Loghum.

Schlichting, L. E., Vivier, P. M., Berger, B., Parrillo, D., and Sheldrick, R. C. (2023). From descriptive to predictive: linking early childhood developmental and behavioral screening results with educational outcomes in kindergarten. Acad. Pediatr. 23, 616–622. doi: 10.1016/j.acap.2022.07.022

Schonhaut, L., Maturana, A., Cepeda, O., and Serón, P. (2021). Predictive validity of developmental screening questionnaires for identifying children with later cognitive or educational difficulties: a systematic review. Front. Pediatr. 9:698549. doi: 10.3389/fped.2021.698549

Sheldrick, R. C., Benneyan, J. C., Kiss, I. G., Briggs-Gowan, M. J., Copeland, W., and Carter, A. S. (2015). Thresholds and accuracy in screening tools for early detection of psychopathology. J. Child Psychol. Psychiatry 56, 936–948. doi: 10.1111/jcpp.12442

Sheldrick, R. C., Marakovitz, S., Garfinkel, D., Carter, A. S., and Perrin, E. C. (2020). Comparative accuracy of developmental screening questionnaires. JAMA Pediatr. 174, 366–374. doi: 10.1001/jamapediatrics.2019.6000

Squires, J., and Bricker, D. (2009). Ages and Stages Questionnaires® A Parent-Completed Child Monitoring System Third Edition. Baltimore, MD: Paul H. Brookes Publishing Co.

Stasinopoulos, D. M., and Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. J. Stat. Softw. 23, 1–46. doi: 10.18637/jss.v023.i07

Steenis, L. J. P., Verhoeven, M., Hessen, D. J., and van Baar, A. L. (2015). Performance of Dutch children on the Bayley III: a comparison study of US and Dutch norms. PLoS ONE 10:e0132871. doi: 10.1371/journal.pone.0132871

Theunissen, M. H. C., Bezem, J., Reijneveld, S. A., and Velderman, M. K. (2022). Developmental monitoring: benefits of a preventive health care system. Eur. J. Pediatr. 181, 3617–3623. doi: 10.1007/s00431-022-04577-7

Van Baar, A. L., and de Graaff, B. M. T. (1994). Cognitive development at preschool-age of infants of drug-dependent mothers. Dev. Med. Child Neurol. 36. doi: 10.1111/j.1469-8749.1994.tb11809.x

Van Baar, A. L., Steenis, L. J. P., Verhoeven, M., and Hessen, D. J. (2014). Bayley-III-NL. Technische Handleiding. Amsterdam: Pearson Assessment and Information B.V.

Van Baar, A. L., Ultee, K., Gunning, W. B., Soepatmi, S., and Leeuw, R. (2006). Developmental course of very preterm children in relation to school outcome. J. Dev. Phys. Disabil. 18, 273–293. doi: 10.1007/s10882-006-9016-6

Van Buuren, S., and Fredriks, M. (2001). Worm plot: a simple diagnostic device for modelling growth reference curves. Stat. Med. 20, 1259–1277. doi: 10.1002/sim.746

Vygotsky, L. S. (1978). Mind and Society: The Development of Higher Psychological Processes. Cambridge: Harvard University Press.

Keywords: monitoring, screening, child development, early identification, norm scores, day care, parent report

Citation: Van Baar A, Verhoeven M, Hessen D, de Paauw-Telman L and Krijnen L (2025) Monitoring development from 0–6 years: an online system, standardized for Dutch children. Front. Dev. Psychol. 3:1727158. doi: 10.3389/fdpys.2025.1727158

Received: 17 October 2025; Revised: 18 November 2025;
Accepted: 18 November 2025; Published: 17 December 2025.

Edited by:

Marco Lunghi, University of Padova, Italy

Reviewed by:

Lucia Ráczová, Constantine the Philosopher University in Nitra, Slovakia
Stefania Zoia, University of Padova, Italy

Copyright © 2025 Van Baar, Verhoeven, Hessen, de Paauw-Telman and Krijnen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Anneloes Van Baar, a.l.vanbaar@uu.nl
