
ORIGINAL RESEARCH article

Front. Pediatr., 02 December 2025

Sec. Children and Health

Volume 13 - 2025 | https://doi.org/10.3389/fped.2025.1706162

Case-control accuracy study for TOY8 digital developmental screening tool for detecting delays among children aged 3–5 years


Teck-Hock Toh1,2,3,4*, Yvonne Mei-Fong Lim5,†, Jeffrey Soon-Yit Lee1,2, Wai-Nam Chan1, Zhen-Ying Low6, Kamilah Dahian1, Siok-Cheng Chew3,7, Sheamini Sivasampu5, William Kian-Boon Law5,‡ and Amar-Singh HSS3,‡ for TOY8 Case-Control Study Team
  • 1Clinical Research Centre, Sibu Hospital, Ministry of Health Malaysia, Sibu, Malaysia
  • 2Department of Paediatrics, Sibu Hospital, Ministry of Health Malaysia, Sibu, Malaysia
  • 3National Early Childhood Intervention Council, Kuala Lumpur, Malaysia
  • 4Faculty of Medicine, Nursing & Health Sciences, SEGi University, Kota Damansara, Malaysia
  • 5Institute for Clinical Research, National Institutes of Health, Ministry of Health Malaysia, Setia Alam, Malaysia
  • 6Department of Paediatrics, Kapit Hospital, Ministry of Health Malaysia, Kapit, Malaysia
  • 7Department of Early Childhood Education, Methodist Pilley Institute, Sibu, Malaysia

Background: Developmental delays affect up to 18% of children worldwide, particularly in disadvantaged populations. Early identification is critical; however, existing tools are often resource-intensive, language-dependent, and unsuitable for large-scale use in low- and middle-income countries. TOY8 is a smartphone-based, play-oriented developmental screening tool developed in Malaysia for children aged 3–5 years, available in Malay and English.

Purpose: To validate TOY8 against the Griffiths Scales of Child Development, 3rd Edition (Griffiths III), determine optimal cut-offs, and assess parental perceptions of feasibility and acceptability.

Methods: We conducted a case-control study in Sarawak, Malaysia. Participants underwent TOY8 screening followed by Griffiths III assessment. Screening performance was evaluated using sensitivity, specificity, likelihood ratios, and receiver operating characteristic (ROC) analyses. Optimal cut-offs were derived by maximizing sensitivity while maintaining specificity at ≥0.6. Parental perceptions were measured using questionnaires.

Results: We recruited 127 children (64 with developmental delay, 63 without). TOY8 demonstrated good sensitivity (0.77) for detecting any developmental delay and higher sensitivity for severe delay (0.84). The cognitive, speech-language, and fine motor domains demonstrated excellent discrimination (AUC 0.82–0.84), whereas sensitivity was lower for the gross motor (0.41–0.54) and personal-adaptive (0.59–0.64) domains. Refined domain-specific cut-offs derived from ROC analysis (44–50) improved screening accuracy. Parents rated TOY8 highly: 98.4% found it easy/very easy to use, 99.2% useful, and 96.9% acceptable.

Conclusion: TOY8, the first digital developmental screening tool validated in Malaysia, demonstrated good accuracy, particularly in domains predictive of school readiness. Its brevity, ease of use, and strong parental acceptability support its feasibility for community and preschool settings. TOY8 offers a scalable solution for early detection in resource-limited contexts, directly advancing United Nations Sustainable Development Goal (SDG) 3 on health and well-being, and SDG 10 on reducing inequalities by improving access to developmental screening in underserved populations.

Introduction

Early childhood is a period of rapid physical, cognitive, motor, and socio-emotional development, typically monitored through developmental milestones that provide benchmarks for expected achievements at different ages (1). Inability to achieve age-based developmental milestones can signal underlying developmental disorders, or global developmental delay, defined as a significant lag in two or more domains (2, 3). Reported prevalence is as high as 16% to 18% and appears to have increased, largely reflecting improved awareness and detection rather than a true rise in incidence (4). Rates are higher in disadvantaged populations, underscoring the importance of timely detection and intervention (4, 5).

In Malaysia, developmental surveillance is routinely provided only until 18 months of age, integrated with the Ministry of Health's immunization schedule. Between the ages of two and six, children generally do not receive structured surveillance, while private sector services are limited and costly (6). On the other hand, some disorders often become more evident in the preschool years, including developmental language disorder, autism spectrum disorder, intellectual disability, attention deficit hyperactivity disorder, and specific learning disorders (e.g., dyslexia) (3, 6). This gap mirrors international concerns, where structured developmental surveillance after age three is inconsistently practiced (3).

Various instruments are available for developmental screening and assessment. Common screening tools, such as the Brigance Early Childhood Screen, Schedule of Growing Skills, and Denver Developmental Screening Test, rely on parental or teacher reports, direct observation, or both. More comprehensive assessments, including the Griffiths Scales of Child Development, 3rd Edition (Griffiths III), and the Bayley Scales of Infant and Toddler Development, provide in-depth evaluations across multiple domains. However, these assessments are resource-intensive, requiring trained healthcare professionals and lengthy administrative time, which limits their feasibility for community screening, especially in low- and middle-income countries (LMIC) with limited specialist services. Furthermore, few culturally adapted tools have been validated for children under five years in such contexts (7). Parent-report instruments, such as the Child Behavior Checklist and the Gilliam Autism Rating Scale, are easier to use but focus largely on behavioral domains and also require professional interpretation. Although effective in high-risk groups, they are not practical for universal screening. Accordingly, a pressing need remains for an accessible and scalable tool designed to be used with minimal specialized expertise. Given recommendations that developmental surveillance continue through preschool years, there is a clear need for practical tools applicable to 3- to 5-year-olds in community and preschool settings (3).

TOY8 is a digital, play-based developmental screening tool developed in Malaysia for children aged 3–5 years, available in both Malay and English. Created by an ex-Nintendo game developer, it uses artificial intelligence (AI) to analyze children's interactions with engaging, game-like tasks that assess five developmental domains: gross motor, fine motor, speech-language, cognitive, and personal-adaptive skills. Designed for use by trained non-healthcare professionals on a smartphone, TOY8 provides an immediate, domain-specific developmental profile to support early identification and communications with parents (8). This approach bridges technology, play, and accessibility, enabling culturally appropriate developmental screening in community and preschool settings.

Early internal validation of TOY8 involving 9,000 Malaysian children demonstrated good psychometric reliability and cross-language consistency, reflecting potential as a scalable community tool (9). However, formal validation against a gold-standard developmental assessment is necessary before widespread adoption. This study, therefore, aimed to evaluate the sensitivity, specificity, and discrimination of TOY8 compared with Griffiths III for screening developmental delays in children. In addition, we aimed to establish optimal cut-off scores for detecting developmental delays and to evaluate the tool's feasibility, acceptability, and usability in community settings.

By addressing a critical gap in preschool developmental surveillance, this work aligns with the United Nations Sustainable Development Goals (SDGs) (10). It supports SDG 3, ensuring healthy lives and promoting well-being for all at all ages, and SDG 10, reducing inequalities by expanding access to early identification and intervention in underserved populations (11). Findings from this study will inform the role of TOY8 as a culturally adapted, practical, and scalable tool for early detection of developmental delays in Malaysia and other LMICs.

Materials and methods

Study design and reporting

This prospective, case-control validation study employed stratified random sampling of children, enabling evaluation of TOY8's accuracy across a broad spectrum of developmental delays (12). Recruitment was conducted between April and August 2024. Parents provided written informed consent. All children first completed TOY8, followed by the Griffiths III as the reference standard. The study was approved by the Medical Research and Ethics Committee, Ministry of Health Malaysia [NMRR ID-24-00262-TV1 (IIR)]. It was conducted in accordance with the principles outlined in the Declaration of Helsinki. The manuscript is reported in accordance with the Standards for the Reporting of Diagnostic Accuracy (STARD) (13).

Participants and sites

Children aged 3 years 0 months to 5 years 11 months (calculated from date of birth to the date of Griffiths III assessment) were eligible. A case was defined as a child with a known developmental delay identified by site investigators at the Lau King Howe Memorial Children's Clinic @ Agape Centre, a community-based child development clinic. A control was a child without developmental, behavioral, or learning concerns who was a sibling of a child attending the clinic, visited the Mother-and-Child Health Clinic (MCH) or the general pediatric clinics at Sibu Hospital, or attended a local preschool.

Eligibility criteria

All children could understand and communicate in simple Malay or English. Cases were children with developmental disabilities confirmed by a pediatrician, including autism spectrum disorder (DSM-5 Level I or II) or global developmental delay (significant delay in ≥2 developmental domains). Children were excluded if they were unable to ambulate unassisted (because of cerebral palsy, spina bifida, limb abnormalities, muscular dystrophy, or any other developmental disability), had a confirmed hearing impairment (except for those not requiring hearing aids) and/or a visual impairment affecting the ability to read that was not corrected by spectacles, or were too hyperactive or uncooperative to complete the tests. Children with obvious genetic conditions (e.g., Down syndrome or other genetic disorders) were also excluded.

Developmental screening tool

TOY8 was administered on a smartphone. It engages children with a cartoon character through interactive tasks such as answering questions, drawing, or stacking. Screenings were supervised by trained non-healthcare personnel, such as early childhood education teachers with knowledge and experience in child development. Minimal training is required to use the tool effectively. TOY8 evaluates five domains: cognitive, speech-language, fine motor, gross motor, and personal-adaptive behavior. A domain score <40 indicated developmental delay. Further details on the TOY8 tool are described in the article by Wo et al. (9). Parents also completed a questionnaire component of TOY8 on gross motor and social-emotional skills.

Developmental reference assessment test

Griffiths III measures child development from birth to six years across five subscales: Foundations of Learning, Language and Communication, Eye-Hand Coordination, Personal–Social–Emotional, and Gross Motor (14). Standardization of the Griffiths Mental Development Scales—Extended Revised was conducted by the Association for Research in Infant and Child Development in 2015 on samples from the United Kingdom and Ireland (15). In this study, Griffiths III was administered by trained research nurses or a pediatrician under the supervision of the first author (16). Children scoring below the 10th centile in any subscale were categorized as having mild-moderate delay; those below the 3rd centile were categorized as severe delay.

For validation, TOY8 domains were mapped against Griffiths III subscales:

• Cognitive → Foundations of Learning

• Speech-Language → Language and Communication

• Fine Motor → Eye and Hand Coordination

• Gross Motor → Gross Motor

• Personal-Adaptive Behavior → Personal-Social-Emotional
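
The domain mapping and the classification rules described above can be summarized in a short sketch. It is illustrative only, not the study's analysis code: the function names are hypothetical, and the thresholds are the ones stated in the text (TOY8 domain score <40; Griffiths III subscale below the 10th or 3rd centile).

```python
# Mapping of TOY8 domains to Griffiths III subscales, as listed above.
TOY8_TO_GRIFFITHS = {
    "Cognitive": "Foundations of Learning",
    "Speech-Language": "Language and Communication",
    "Fine Motor": "Eye and Hand Coordination",
    "Gross Motor": "Gross Motor",
    "Personal-Adaptive Behavior": "Personal-Social-Emotional",
}


def classify_griffiths(centile: float) -> str:
    """Reference category for one Griffiths III subscale centile (hypothetical helper)."""
    if centile < 3:
        return "severe"          # below the 3rd centile
    if centile < 10:
        return "mild-moderate"   # below the 10th centile but not below the 3rd
    return "no delay"


def toy8_screen_positive(domain_score: float, cutoff: float = 40) -> bool:
    """A TOY8 domain score below the cut-off counts as screen-positive."""
    return domain_score < cutoff


# Example: a child scoring 38 on TOY8 Cognitive, at the 8th centile on the mapped subscale.
subscale = TOY8_TO_GRIFFITHS["Cognitive"]
print(subscale, toy8_screen_positive(38), classify_griffiths(8))
```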

Reliability testing

To assess inter-rater reliability, a subset of 10 children (from those with and without a history of developmental delay) underwent Griffiths III with simultaneous scoring by two assessors. While a child was undergoing the Griffiths III test, a second rater sat and observed the assessment without interruption and scored the assessment. The intraclass correlation coefficient was 0.99 (95% CI: 0.998, 0.999), indicating excellent agreement. Bland-Altman analysis showed a mean difference of 0.01 (95% CI: −0.884, 0.894), with a maximum raw score difference of two points, suggesting no systematic bias (17).
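
For readers unfamiliar with Bland-Altman analysis, the sketch below shows the conventional calculation of the mean difference (bias) between two raters and the 95% limits of agreement. The scores are invented for illustration, the function name is hypothetical, and the text does not state whether the interval reported above was derived in exactly this way.

```python
from statistics import mean, stdev


def bland_altman(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired scores from two raters."""
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)   # mean difference between raters
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented raw scores for four children (not study data).
first_rater = [10, 12, 14, 16]
second_rater = [10, 13, 13, 16]
bias, lower, upper = bland_altman(first_rater, second_rater)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias close to zero with narrow limits suggests that neither rater scores systematically higher than the other.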

Children in the control group identified with developmental delays on Griffiths III were offered a free follow-up assessment by a consultant pediatrician at the same clinic. Those without concern or delay continued routine developmental surveillance at their MCH clinics.

Sample size calculation, sampling and study procedure

The sample size was estimated using the method of Buderer et al. (18). At least 124 children were required, in a 1:1 ratio: 62 with and 62 without a known history of developmental delay. Details of the sample size calculation can be found in the Supplementary File S1.

To ensure demographic representation, sampling was stratified by age and sex. Details of the approximate distributions are summarized and available in the Supplementary File S1. All eligible records were identified from patient registration books, categorized by age and sex strata, and randomly sampled.
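
The stratified random sampling step can be sketched as follows. This is a minimal illustration, assuming each record carries an age in completed years and a sex field; the field names, the per-stratum quota, and the fixed seed are assumptions, not details taken from the study.

```python
import random
from collections import defaultdict


def stratified_sample(records, per_stratum, seed=0):
    """Randomly draw up to `per_stratum` records from each (age, sex) stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["age_years"], rec["sex"])].append(rec)
    chosen = []
    for key in sorted(strata):  # deterministic stratum order
        pool = strata[key]
        chosen.extend(rng.sample(pool, min(per_stratum, len(pool))))
    return chosen


# Hypothetical register: five children in each of the six age-by-sex strata.
register = [
    {"id": f"{age}{sex}{i}", "age_years": age, "sex": sex}
    for age in (3, 4, 5)
    for sex in ("F", "M")
    for i in range(5)
]
sample = stratified_sample(register, per_stratum=2)
print(len(sample))  # 2 children from each of the 6 strata
```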

Each child attended at least two visits. At enrollment (days −14 to day 0), procedures included obtaining informed consent, screening for eligibility criteria, and collecting demographics and medical history. TOY8 screening was generally conducted on the same day as enrollment, though in some instances it occurred within one week [n = 15 (21.1%) for those with a history of delay; n = 8 (12.5%) for those without]. Because both TOY8 and Griffiths III measure developmental status at a specific age, the interval between the two tests was kept short. Griffiths III testing was performed at least seven days (and up to 14 days) after TOY8 to minimize potential carry-over effects, such as improved task familiarity, motor coordination, or memory recall from the first test. On the TOY8 screening day, parents were also surveyed on their perceptions of the tool's ease of use, usefulness, and acceptability.

If a child was unwell on the test day (e.g., fever, acute respiratory illness, recent seizure, or medication use within 24 h), testing was postponed. Parents were allowed to be present during both TOY8 and Griffiths III sessions, but were not permitted to intervene, prompt, or correct the child. Those who intervened were cautioned or asked to leave the room. Families received a one-off reimbursement of MYR30 (about USD7) for transport per visit involving TOY8 and Griffiths III. Assessment reports were provided free of charge. Data from participants who withdrew were not included in the analysis.

Statistical analysis

Analyses were conducted using R version 4.4.1 (19), with statistical significance set at p < 0.05. Missing data were minimal; only ethnicity data were incomplete (2%). For parental education, the highest level attained by either parent was recorded. Sensitivity and specificity of the TOY8 were reported, and 0.7 was the accepted minimum sensitivity (20). To facilitate interpretation of screening results at an individual child level, likelihood ratios were computed in addition to sensitivity and specificity. The positive likelihood ratio (LR+), calculated as sensitivity/(1 − specificity), indicates how much more likely a positive screening result is in a child with the outcome than in a child without it. An LR+ of 1 to 2 was taken to indicate a minimal increase in the probability of the outcome, 2–5 a small increase, 5–10 a moderate increase, and ≥10 a large, often conclusive increase (21).
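
As an illustration of these measures, the sketch below computes sensitivity, specificity, and likelihood ratios from a 2×2 table, using the standard definitions LR+ = sensitivity/(1 − specificity) and LR− = (1 − sensitivity)/specificity. The counts are invented and the function name is hypothetical; the study itself used R.

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and likelihood ratios from 2x2 counts."""
    sensitivity = tp / (tp + fn)   # delayed children who screen positive
    specificity = tn / (tn + fp)   # non-delayed children who screen negative
    # Guard against division by zero at the extremes.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Invented counts: 50 children with delay, 50 without (not study data).
m = screening_metrics(tp=40, fp=15, fn=10, tn=35)
print({k: round(v, 2) for k, v in m.items()})
```

With these invented counts, sensitivity is 0.80 and specificity 0.70, giving an LR+ of about 2.7, which falls in the "small increase" band described above.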

Receiver operating characteristic (ROC) curves illustrated the trade-off between sensitivity and specificity. The area under the ROC curve (AUC) was used to quantify discriminative performance: 0.7–0.8 acceptable, 0.8–0.9 excellent, and >0.9 outstanding (22, 23). Finally, optimal cut-off scores for the TOY8 subscales were determined by maximizing sensitivity while maintaining a specificity of ≥0.6. This criterion was chosen to reflect the screening purpose of TOY8, the high expected prevalence of developmental delay in the community, and the associated costs of missed cases (24, 25).
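
The cut-off rule described above (maximize sensitivity subject to specificity ≥0.6) can be sketched as a simple search over candidate thresholds. The scores, the candidate range, and the tie-breaking choice (higher specificity first, then the lowest cut-off) are assumptions for illustration; the text does not specify how ties were resolved.

```python
def best_cutoff(scores, delayed, candidates, min_specificity=0.6):
    """Pick the cut-off with the highest sensitivity among those meeting the
    specificity floor. A child is screen-positive when score < cut-off.
    Assumes at least one delayed and one non-delayed child.
    Returns (cutoff, sensitivity, specificity), or None if no candidate qualifies.
    """
    pairs = list(zip(scores, delayed))
    best = None
    for c in candidates:
        tp = sum(1 for s, d in pairs if d and s < c)
        fn = sum(1 for s, d in pairs if d and s >= c)
        tn = sum(1 for s, d in pairs if not d and s >= c)
        fp = sum(1 for s, d in pairs if not d and s < c)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        if spec < min_specificity:
            continue
        # Strict comparison keeps the first (lowest) cut-off on exact ties.
        if best is None or (sens, spec) > (best[1], best[2]):
            best = (c, sens, spec)
    return best


# Invented domain scores: five children with delay, five without.
scores = [30, 35, 42, 45, 48, 41, 47, 52, 55, 60]
delayed = [True] * 5 + [False] * 5
print(best_cutoff(scores, delayed, candidates=range(40, 51)))
```

In this toy data set, raising the threshold from 40 captures more delayed children at the cost of more false positives, which mirrors the trade-off the criterion is meant to manage.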

Results

Participants

Of 135 eligible children, 127 (94.1%) completed the TOY8 and Griffiths III assessments. Eight participants were excluded due to incomplete study procedures (seven with history of developmental delays, one without): unable to complete the Griffiths III because of hyperactivity and/or lack of cooperation (n = 5), violation of eligibility criteria (n = 1), and withdrawal due to transport issues (n = 2). The final cohort had a median age of 4.5 years (range 3.1–5.96 years); 51.2% (n = 65) were boys. The case-control design yielded equal representation by age strata: 3 years (n = 42, 33.1%), 4 years (n = 43, 33.9%), and 5 years (n = 42, 33.1%). In terms of ethnic distribution, 42.5% (n = 54) were Chinese, 26.8% (n = 34) were Malays, 21.3% (n = 27) were Iban, and the remaining 7.9% (n = 10) belonged to other ethnicities. Socio-economic status was balanced between groups, as proxied by parents’ educational level, where 32.3% (n = 41) had secondary education or below, 24.4% (n = 31) had a Certificate/Diploma, and the remaining 43.3% (n = 55) had a Bachelor's degree or higher. TOY8 was substantially quicker to administer than Griffiths III [median 21 min (IQR 17–26) vs. 90 min (IQR 75–105)].

Table 1 presents the socio-demographic characteristics of the children with developmental delay (i.e., cases) and those without delay. All socio-demographic characteristics were similar between the two groups (p > 0.05) except for ethnicity and parental education. Notably, 10 children initially recruited as controls were later found to have developmental disorders on Griffiths III, highlighting undetected problems within the community sample.


Table 1. Baseline characteristics of children with and without a known history of developmental delay.

TOY8 scores in children with and without developmental delays

The study population was diverse with respect to the number of subscales affected by delay on the Griffiths III assessments. Thirty-one percent (n = 39) of the children had no delay; 21.3% had a delay in only one subscale; and 12.6%, 8.7%, 11.0%, and 15.7% had delays in two, three, four, and five subscales, respectively. Table 2 shows the percentage of children with positive TOY8 scores for any developmental delay. Delays were most common in Speech-Language (66.9%, n = 85), followed by Personal-Adaptive Behavior, and least common in Gross Motor (29.1%, n = 37). TOY8 scores were consistently lower among children with delays compared to those without (p < 0.001). The largest score gaps were observed in Cognitive (median difference = 23.02), Personal-Adaptive Behavior (21.92), and Speech-Language (18.80).


Table 2. TOY8 T scores between any developmental delays and no delay groups by subscales.

Screening performance of TOY8

With a cut-off score of 40, TOY8 demonstrated adequate sensitivity (0.77) for detecting mild-to-moderate developmental delay and good sensitivity (0.84) for severe delays (Table 3). Cognitive and Fine Motor domains demonstrated acceptable to good sensitivity for mild-to-moderate delays (0.71 and 0.73) and severe delays (0.80 and 0.79).


Table 3. Screening performance of TOY8 for mild-to-moderate delays and severe delays.

At the individual level, a positive TOY8 result was 2.32 times more likely in children with delay than in those without (LR+ 2.32). The Speech-Language domain showed the strongest discriminative power, with LR+ values of 5.34 and 7.19 for mild-to-moderate and severe delays, respectively, indicating a moderate increase in the probability of true delay. Positive and negative predictive values were not reported as they depend on prevalence. ROC analysis (Figure 1) showed excellent discrimination for the Cognitive, Speech-Language, and Fine Motor domains (AUC 0.82–0.84). Personal-Adaptive Behavior and Gross Motor were acceptable (AUC 0.74 and 0.71). For severe delays, AUCs were similar or slightly higher. Notably, TOY8 showed the strongest correspondence with the Griffiths III in the Cognitive, Speech-Language, and Fine Motor domains, which align closely with the Griffiths III Foundations of Learning, Language and Communication, and Eye-Hand Coordination subscales.
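
Because predictive values depend on prevalence, a likelihood ratio is usually applied to an assumed pre-test probability through the odds form of Bayes' theorem: post-test odds = pre-test odds × LR. The sketch below illustrates this with the LR+ values reported above and a pre-test probability of 18%, which is borrowed from the upper prevalence figure cited in the Introduction purely as an assumption; the function name is hypothetical.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability and a likelihood ratio to a post-test probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


ASSUMED_PREVALENCE = 0.18  # illustrative assumption, not a study estimate

for label, lr in [("any domain", 2.32), ("speech-language", 5.34)]:
    p = post_test_probability(ASSUMED_PREVALENCE, lr)
    print(f"{label}: LR+ {lr} -> post-test probability {p:.2f}")
```

Under this assumed prevalence, an LR+ of 2.32 raises the probability of delay from 18% to roughly one in three; a different community prevalence would give a different figure.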

Figure 1
ROC curve displaying the sensitivity and specificity of various developmental subscales. Five lines represent Cognitive, Speech and Language, Fine Motor, Personal/Adaptive, and Gross Motor with respective area under the curve values and confidence intervals.

Figure 1. Receiver operating characteristics curve by subscale with their corresponding AUC and 95% CI for mild-to-severe delay.
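
The AUC values in Figure 1 can be read as the probability that a randomly chosen child with delay has a lower TOY8 score than a randomly chosen child without delay. The sketch below computes the AUC this way (the pairwise, Mann-Whitney formulation, with ties counted as one half) and applies the interpretation bands given in the Statistical analysis section. The scores are invented, the function names are hypothetical, the handling of the exact band boundaries is an assumption, and confidence intervals are not computed.

```python
def auc_lower_is_positive(delayed_scores, non_delayed_scores):
    """AUC when lower scores indicate delay: share of (delayed, non-delayed)
    pairs in which the delayed child scores lower, counting ties as 0.5."""
    total = 0.0
    for d in delayed_scores:
        for n in non_delayed_scores:
            if d < n:
                total += 1.0
            elif d == n:
                total += 0.5
    return total / (len(delayed_scores) * len(non_delayed_scores))


def interpret_auc(auc):
    """Bands from the text; assignment of exact boundary values is an assumption."""
    if auc > 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"


# Invented scores (not study data).
auc = auc_lower_is_positive([30, 35, 42, 45, 48], [41, 47, 52, 55, 60])
print(round(auc, 2), interpret_auc(auc))
```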

While an initial threshold of 40 was used, ROC analysis indicated that optimal cut-offs varied by domain (44–50). These refinements improved accuracy by reducing false negatives (missed cases) and false positives (unnecessary referrals). Optimal domain-specific cut-offs and corresponding sensitivity/specificity are presented in Table 4.


Table 4. Optimal cut-off for TOY8 screening tool subscales and the corresponding sensitivity, specificity and accuracy.

Parental perceptions of TOY8 were highly positive. Most rated it easy to use [34.1% (n = 44) very easy; 64.3% (n = 83) easy] and useful for detecting developmental issues [50.4% (n = 65) very useful; 48.8% (n = 63) useful]. Acceptability was high, with 96.9% agreeing that TOY8 was suitable for screening. Only two parents (1.6%) found it difficult, one (0.8%) not useful, and four (3.1%) did not find it acceptable. Perceptions did not differ significantly by parent education, gender, or ethnicity (p > 0.05).

Discussion

This study presents the first validation of a digital developmental screening tool in Malaysia, demonstrating that TOY8 can accurately identify delays among children aged 3–5 years. TOY8 performed well in detecting cognitive, speech-language, and fine motor delays, with AUCs ranging from 0.82 to 0.84. These domains are crucial because deficits in cognitive and speech-language development strongly predict later academic achievement and social adjustment, while fine motor skills underpin school readiness tasks such as writing, drawing, and self-care. A not unexpected yet important finding was that ten children without a history of delay were later identified with developmental disorders on Griffiths III. Although the study was not designed to estimate the prevalence of undetected difficulties, this observation underscores the shortcomings of current Malaysian surveillance practices, which often end after 18 months. TOY8 may therefore be useful in identifying subtle or previously unrecognized concerns in preschoolers, enabling earlier referral and intervention.

Language delay is one of the most common developmental concerns, affecting up to 20% of preschoolers, and even higher proportions in disadvantaged groups (26–28). Persistent language impairment is associated with long-term academic difficulties, including reduced literacy, comprehension, and increased grade repetition (29–31). Similarly, fine motor delays contribute to challenges in handwriting, mathematics, and self-care, which impair school readiness (32, 33). Conversely, higher levels of executive function and fine motor ability predict stronger kindergarten performance (34). Thus, a tool like TOY8, sensitive to these critical domains, addresses an important gap in early identification.

TOY8 appeared to be less sensitive in detecting gross motor and personal-social delays, though specificity remained high (0.82–0.84). Several factors explain this. First, the study excluded children with severe developmental disabilities, including those with significant physical limitations or hyperactivity, leading to more conservative estimates of accuracy in these domains. Second, gross motor delays in preschoolers are often subtle, involving coordination or motor planning rather than fundamental motor skills, and are less detectable through parent report than through professional observation (35). To enhance detection, future versions of TOY8 could integrate brief video-based motion-capture modules or combine screening with direct observational assessments such as the Test of Gross Motor Development, or teacher- or clinician-administered tools like the Movement Assessment Battery for Children—Second Edition (Movement ABC-2) Checklist and the Developmental Coordination Disorder Questionnaire (DCDQ). These complementary approaches emphasize the importance of multimethod assessment to identify and describe motor difficulties in young children, although each tool has its own practical limitations (35, 36). Likewise, personal-social skills are complex, context-dependent, and difficult to capture solely via parental questionnaires. The limitations of parent-reported tools are well-documented: they exhibit variable sensitivity, can diverge from direct assessments, and are influenced by caregiver perceptions (20, 37, 38). Incorporating refined cut-offs for both domains, or supplementing TOY8 with additional validated social-behavioral questionnaires, may strengthen detection in this area while maintaining feasibility for community-based use. This reflects broader evidence that the real-world accuracy of screening tools remains variable across settings and requires ongoing adaptation (39). 
Despite these limitations, TOY8 and Griffiths III assess overlapping constructs, with TOY8's digital, play-based design providing a scalable alternative for community-based screening.

Malaysia's multilingual context adds complexity. Most children are exposed to more than one language at home and preschool, making monolingual development uncommon. For example, a Malay family may use a Malay dialect at home, while the children are exposed to English and formal Malay when they attend preschool. Similarly, Chinese or Iban children might use Chinese (or one of the Chinese dialects) or Iban (or another Sarawak native language), respectively, at home. At preschool, all of these children are exposed to English and the national language (i.e., Malay). Such diversity can influence both performance and parental reporting. This study did not systematically record children's primary and secondary languages, which is a limitation. Although some inferences can be drawn from speech-language domain performance, they do not fully capture the complexities of the multilingual environment. Future studies should examine the language environment in detail, though TOY8's bilingual design (Malay and English) already reflects local realities.

ROC analysis underscored the importance of domain-specific cut-offs rather than a single threshold. While an initial cut-off score of 40 was proposed, optimal thresholds varied by domain (44–50), improving sensitivity without major loss of specificity. Since the costs of missed cases generally outweigh those of false positives, higher sensitivity is desirable in screening contexts. These refined cut-offs provide a practical, evidence-based guide for referrals.

Parents rated TOY8 very positively, reporting high ease of use, usefulness, and acceptability. These findings are important, as parental acceptance influences both uptake and sustainability of screening programs, increasing the chances of timely detection and intervention. Positive parental engagement also supports collaboration with teachers and healthcare providers, enhancing developmental support networks for children.

Beyond its clinical implications, TOY8 has the potential to contribute to global child health and equity agendas (10). It supports SDG 3 by ensuring early detection of developmental delays and promoting well-being from the preschool years (11). Its scalability and bilingual design also address SDG 10 by reducing inequalities in access to developmental screening across diverse communities (11). Given the close links between child development, nutrition, and educational attainment, TOY8 indirectly supports SDG 2, which aims to end hunger and promote the growth and development of children (11).

The study has several strengths. Its case-control design with age- and sex-matching, use of Griffiths III as a gold standard, and bilingual format ensured methodological rigor and cultural relevance. Together, these features enhance the robustness of the findings and support the tool's cultural adaptability within Malaysia. TOY8's strong performance in speech-language, cognition, and fine motor detection addresses the critical preschool window highlighted by international guidelines. The American Academy of Pediatrics emphasizes continued surveillance between ages 3 and 5, when conditions like language disorder, autism, ADHD, and learning disabilities often emerge (3). TOY8's feasibility in community and preschool settings offers a solution to this widely recognized gap.

However, the study sample was confined to Sarawak and overrepresented Chinese participants in the control group, which may affect generalizability. Children with severe motor or sensory impairments were excluded, which restricted the applicability to this subgroup, although such children are typically identified earlier during routine MCH visits. The lack of systematic language environment data also limited the interpretation of language outcomes. Future work should validate TOY8 across different Malaysian regions and ethnic groups, in longitudinal cohorts, and in real-world preschool settings.

Despite these limitations, TOY8 shows important advantages. It demonstrated accuracy in core developmental domains, required minimal training, and was time-efficient. These features make it scalable in community and preschool settings where specialists are scarce. An additional strength is TOY8's integration with a digital intervention program, offering immediate, customized support for parents and teachers. This positions TOY8 not only as a screening tool but also as part of a broader digital ecosystem for early identification and intervention (8).

Implications for policy and practice

TOY8 has significant potential for integration into early childhood and public health systems. Its short administration time and ease of use by trained non-specialists enable screening to take place in preschools and community health services, where children naturally spend time, extending reach beyond hospital-based care. Embedding TOY8 within national maternal and child health programs and preschool policies could strengthen developmental surveillance during the critical 3- to 5-year gap before school entry, with bilingual availability enhancing equity across Malaysia's diverse communities (3). Earlier and more efficient identification of at-risk children can reduce waiting times and optimize the use of scarce specialist resources. Importantly, TOY8 is designed to complement, not replace, comprehensive assessment tools such as the Griffiths III, serving as a scalable first-line screener to prioritize children for detailed evaluation. As a smartphone-based tool, it can integrate into digital health ecosystems, facilitating longitudinal monitoring, data aggregation, and evidence-informed planning. Finally, the domain-specific cut-offs identified in this study provide refined referral thresholds that balance sensitivity with health system capacity.

Future research directions

This study provides strong initial validation of TOY8, but further work is needed. Longitudinal studies should assess whether children identified through TOY8 benefit from earlier referral and intervention, and whether these benefits translate into improved school readiness and long-term outcomes. The tool's digital design also allows integration with personalized learning and parent-guided activities. Finally, research on international adaptation, particularly in LMICs, could expand its role as a culturally accessible and adaptable tool for developmental screening.

Conclusion

TOY8 is the first digital developmental screening tool formally validated in Malaysia, demonstrating good performance in detecting cognitive, speech-language, and fine motor delays—domains most predictive of later outcomes. Its short administration time, ease of use, and high parental acceptability support feasibility for community deployment. Although refinement is needed for the gross motor and personal-adaptive domains, TOY8 represents a culturally adapted and scalable solution.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Medical Research & Ethics Committee (MREC). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation in this study was provided by the participants’ legal guardians/next of kin.

Author contributions

THT: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Visualization, Writing – original draft, Writing – review & editing. YL: Formal analysis, Writing – original draft, Writing – review & editing. JL: Data curation, Formal analysis, Writing – review & editing. WNC: Project administration, Writing – review & editing. ZYL: Project administration, Writing – review & editing. KD: Project administration, Writing – review & editing, Resources. SCC: Project administration, Writing – review & editing. SS: Writing – review & editing. WL: Writing – review & editing, Methodology, Supervision. ASHSS: Supervision, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This study was sponsored by Toybox Creations & Technology Sdn. Bhd. (1290599-A), with support from the Institute for Clinical Research, National Institutes of Health Malaysia, and the Clinical Research Centre, Sibu Hospital. The funders had no role in study design, data analysis, or interpretation of the results.

Acknowledgements

We thank the Director General of Health Malaysia for his permission to publish this paper. We would also like to thank the rest of the DADOC-TOY8 Study Team for their assistance during the study, namely local team members in Sibu and Kapit (Nur Alfreena Alfie, Siew-Ming Ting, Christina Connie Anak Agong, Macdonna Anak Augustine, Dr Vicky Wai-Kay Ng, Dr Kat-Siong Pau, Jennifer Saleh, Dr Justina Yih-Yann Ting, Pei-See Yong, Ting-Ting Yong), those from the Toybox Creations & Technology Sdn. Bhd., Malaysia (Masaki Ishibashi, Shun Matsuzaka, Maika Muraguchi), and Early Childhood Education Students from Sibu Methodist Pilley Institute (Suman Priya Devi, Stephanie Ya-Ting Then, Luoy-Ya Wong). We also thank the parents (and children) for their willingness to participate in this study, as well as the clinic staff/school teachers (especially those from Sibu Fu Yuan Kindergarten and Woodlands International School) who helped identify participants.

Conflict of interest

THT and ASHSS have been involved in research on TOY8 and other digital health or developmental screening tools, and have received funding.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fped.2025.1706162/full#supplementary-material

References

1. Bellman M, Byrne O, Sege R. Developmental assessment of children. Br Med J. (2013) 346:e8687. doi: 10.1136/bmj.e8687

2. Vasudevan P, Suri M. A clinical approach to developmental delay and intellectual disability. Clin Med (Lond). (2017) 17(6):558–61. doi: 10.7861/clinmedicine.17-6-558

3. Lipkin PH, Macias MM, Council on Children with Disabilities, Section on Developmental and Behavioral Pediatrics. Promoting optimal development: identifying infants and young children with developmental disorders through developmental surveillance and screening. Pediatrics. (2020) 145(1):e20193449. doi: 10.1542/peds.2019-3449

4. Kim S. Worldwide national intervention of developmental screening programs in infant and early childhood. Clin Exp Pediatr. (2022) 65(1):10–20. doi: 10.3345/cep.2021.00248

5. Demirci A, Kartal M. The prevalence of developmental delay among children aged 3–60 months in Izmir, Turkey. Child Care Health Dev. (2016) 42(2):213–9. doi: 10.1111/cch.12289

6. Amar-Singh HSS, Mohamad II, Selva KS, Zulkifli I, CodeBlue. Universal Screening for Developmental Disabilities for Children in Malaysia—Malaysian Paediatric Association. Kuala Lumpur: Galen Centre, Social Health Analytics Sdn. Bhd. (2025). Available online at: https://codeblue.galencentre.org/2025/02/universal-screening-for-developmental-disabilities-for-children-in-malaysia-malaysian-paediatric-association/ (Accessed September 1, 2025).

7. Faruk T, King C, Muhit M, Islam MK, Jahan I, Baset KU, et al. Screening tools for early identification of children with developmental delay in low- and middle-income countries: a systematic review. BMJ Open. (2020) 10(11):e038182. doi: 10.1136/bmjopen-2020-038182. Erratum in: BMJ Open. 2021 November 16;11(11):e038182corr1. doi: 10.1136/bmjopen-2020-038182corr1

8. Toybox Creations and Technology. Screening for Educators. Kuala Lumpur: TOY EIGHT (2022). Available online at: https://www.toyeight.com/screening/educators (Accessed August 31, 2025).

9. Wo SW, Alagappar PN, Yahya AN, Woo PJ. Validation of the English version of the TOY8 developmental screening tool: examining measurement invariance across languages, gender and income groups. BMC Psychol. (2025) 13(1):214. doi: 10.1186/s40359-025-02489-3. Erratum in: BMC Psychol. 2025 May 19;13(1):521. doi: 10.1186/s40359-025-02845-3

10. United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development. New York: United Nations (2015). Available online at: https://sdgs.un.org/2030agenda (Accessed September 1, 2025).

11. United Nations. The Sustainable Development Goals. New York, NY: United Nations (2015). Available online at: https://sdgs.un.org/goals (Accessed September 1, 2025).

12. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med. (2013) 4(2):627–35. PMID: 24009950

13. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Fam Pract. (2004) 21(1):4–10. doi: 10.1093/fampra/cmh103

14. Green E, Stroud L, O’Connell R, Bloomfield S, Cronje J, Foxcroft C, et al. Griffiths Scales of Child Development, 3rd Ed. Part II: Administration and Scoring. Oxford: Hogrefe (2016).

15. Gee M, Gee K. Association for Research in Infants and Children Development. Manchester: Griffiths III (2025).

16. Chan WN, Lee JSY, Dahian K, Toh TH, DADOC—TOY8 Study Team. Factors influencing the duration of griffith III assessment. Malays J Paediatr Child Health. (2025) 31(S1):85–91. doi: 10.51407/mjpch.v31iS1.354

17. Alfie NA, Lee JSY, Ting SM, Toh TH, DADOC—TOY8 Study Team. Inter-rater reliability of Griffiths III in a multi-lingual community. Malays J Paediatr Child Health. (2025) 31(S1):25–30. doi: 10.51407/mjpch.v31iS1.359

18. Buderer NM. Statistical methodology: I. Incorporating the prevalence of disease into the sample size calculation for sensitivity and specificity. Acad Emerg Med. (1996) 3(9):895–900. doi: 10.1111/j.1553-2712.1996.tb03538.x

19. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing (2024). Available online at: https://www.R-project.org/ (Accessed August 31, 2025).

20. Glascoe FP. Screening for developmental and behavioral problems. Ment Retard Dev Disabil Res Rev. (2005) 11(3):173–9. doi: 10.1002/mrdd.20068

21. Edman EW, Runge TJ. Sensitivity, Specificity, LR+, and LR−: What Are They and How Do You Compute Them? Indiana, PA: Indiana University of Pennsylvania (2014).

22. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. (2010) 5(9):1315–6. doi: 10.1097/JTO.0b013e3181ec173d

23. White N, Parsons R, Collins G, Barnett A. Evidence of questionable research practices in clinical prediction models. BMC Med. (2023) 21(1):339. doi: 10.1186/s12916-023-03048-6

24. Nahm FS. Receiver operating characteristic curve: overview and practical use for clinicians. Korean J Anesthesiol. (2022) 75(1):25–36. doi: 10.4097/kja.21209

25. Florkowski CM. Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: communicating the performance of diagnostic tests. Clin Biochem Rev. (2008) 29(Suppl 1):S83–7. PMID: 18852864

26. Horwitz SM, Irwin JR, Briggs-Gowan MJ, Bosson Heenan JM, Mendoza J, Carter AS. Language delay in a community cohort of young children. J Am Acad Child Adolesc Psychiatry. (2003) 42(8):932–40. doi: 10.1097/01.CHI.0000046889.27264.5E

27. Feldman HM, Messick C. Chapter 13: language and speech disorders. In: Wolraich ML, Drotar DD, Dworkin PH, Perrin EC, editors. Language and Speech Disorders in Developmental-Behavioural Pediatrics Evidence and Practice. Philadelphia, PA: Mosby Elsevier (2008). p. 467–82.

28. Coplan J. Normal speech and language development: an overview. Pediatr Rev. (1995) 16(3):91–100. doi: 10.1542/pir.16-3-91

29. Lara-Díaz MF, Mateus-Moreno A, Beltrán-Rojas JC. Reading and oral language skills in children with developmental language disorder: influence of socioeconomic, educational, and family variables. Front Psychol. (2021) 12:718988. doi: 10.3389/fpsyg.2021.718988

30. Hall NE, Segarra VR. Predicting academic performance in children with language impairment: the role of parent report. J Commun Disord. (2007) 40(1):82–95. doi: 10.1016/j.jcomdis.2006.06.001

31. Catts HW, Bridges MS, Little TD, Tomblin JB. Reading achievement growth in children with language impairments. J Speech Lang Hear Res. (2008) 51(6):1569–79. doi: 10.1044/1092-4388(2008/07-0259)

32. Li Y, Wu X, Ye D, Zuo J, Liu L. Research progress on the relationship between fine motor skills and academic ability in children: a systematic review and meta-analysis. Front Sports Act Living. (2025) 6:1386967. doi: 10.3389/fspor.2024.1386967

33. Grissmer D, Grimm KJ, Aiyer SM, Murrah WM, Steele JS. Fine motor skills and early comprehension of the world: two new school readiness indicators. Dev Psychol. (2010) 46(5):1008–17. doi: 10.1037/a0020104

34. Cameron CE, Brock LL, Murrah WM, Bell LH, Worzalla SL, Grissmer D, et al. Fine motor skills and executive function both contribute to kindergarten achievement. Child Dev. (2012) 83(4):1229–44. doi: 10.1111/j.1467-8624.2012.01768.x

35. Ke L, Barnett AL, Wang Y, Duan W, Hua J, Du W. Discrepancies between parent and teacher reports of motor competence in 5–10-year-old children with and without suspected developmental coordination disorder. Children (Basel). (2021) 8(11):1028. doi: 10.3390/children8111028

36. Eddy LH, Bingham DD, Crossley KL, Shahid NF, Ellingham-Khan M, Otteslev A, et al. The validity and reliability of observational assessment tools available to measure fundamental movement skills in school-age children: a systematic review. PLoS One. (2020) 15(8):e0237919. doi: 10.1371/journal.pone.0237919

37. Sheldrick RC, Marakovitz S, Garfinkel D, Carter AS, Perrin EC. Comparative accuracy of developmental screening questionnaires. JAMA Pediatr. (2020) 174(4):366–74. doi: 10.1001/jamapediatrics.2019.6000

38. Schonhaut L, Maturana A, Cepeda O, Serón P. Predictive validity of developmental screening questionnaires for identifying children with later cognitive or educational difficulties: a systematic review. Front Pediatr. (2021) 9:698549. doi: 10.3389/fped.2021.698549

39. Rah SS, Jung M, Lee K, Kang H, Jang S, Park J, et al. Systematic review and meta-analysis: real-world accuracy of children’s developmental screening tests. J Am Acad Child Adolesc Psychiatry. (2023) 62(10):1095–109. doi: 10.1016/j.jaac.2022.12.014

Keywords: screening, digital tool, developmental delay, preschool children, TOY8, Malaysia

Citation: Toh T-H, Lim YM-F, Lee JS-Y, Chan W-N, Low Z-Y, Dahian K, Chew S-C, Sivasampu S, Law WK-B and HSS A-S (2025) Case-control accuracy study for TOY8 digital developmental screening tool for detecting delays among children aged 3–5 years. Front. Pediatr. 13:1706162. doi: 10.3389/fped.2025.1706162

Received: 15 September 2025; Accepted: 31 October 2025;
Published: 2 December 2025.

Edited by:

Supriya Bhavnani, Sangath, India

Reviewed by:

Francesca Felicia Operto, University of Salerno, Italy
Jieun Jeong, Catholic University of Daegu, Republic of Korea

Copyright: © 2025 Toh, Lim, Lee, Chan, Low, Dahian, Chew, Sivasampu, Law and HSS. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Teck-Hock Toh, tohth@moh.gov.my

†These authors have contributed equally to this work and share first authorship

‡These authors have contributed equally to this work and share last authorship
