A predictive model for stunting among children under the age of three

Background In light of the global effort to eradicate childhood stunting, this study aimed to assess the prevalence of stunting and its associated factors, and to construct and validate a risk prediction model for stunting among children under the age of three in Shenzhen, China. Methods Using stratified random sampling, we selected 9,581 children under the age of three for analysis. The dataset was randomly allocated into training and validation sets at an 8:2 ratio. Within the training set, LASSO regression combined with binary logistic regression was used to select the predictive variables for the model. The model was then constructed in the training set, followed by model evaluation, visualization, and internal validation. Finally, the held-out validation set was used to assess the model's generalizability. Results A total of 684 children (7.14%) were identified as stunted. The combined LASSO and logistic regression approach identified the following key predictors of stunting among children under three years of age: sex, age in months, mother's education, father's age, birth order, feeding pattern, delivery mode, average daily parent-child reading time, average daily time spent interacting with other children, and average daily outdoor time. These variables were used to develop a prediction model for childhood stunting, which was presented as a nomogram. Calibration curves showed agreement between the nomogram predictions and the actual observations, and ROC and DCA analyses supported the predictive performance of the nomogram. Conclusions The developed model for predicting stunting risk integrates a spectrum of variables. It offers actionable information to medical professionals and lays a foundation for designing and implementing preventive strategies and therapeutic interventions.


Introduction
Stunting is conceptualized as a protracted insufficiency of adequate nutritional intake in pediatric populations that impedes linear growth and culminates in cumulative growth deficits (1-3). The first 1,000 days of life are a pivotal phase in early growth and development, and children who fail to reach their developmental potential during this period often experience enduring adverse outcomes (4). The prompt identification of developmental disorders is therefore of paramount importance for the welfare of children and their families.
The growth and development of children have been studied in many countries. According to estimates reported in the 2017 Lancet Series, approximately 249 million children (43%) under the age of 5 in low- and middle-income countries (LMICs) were at heightened risk of suboptimal development in 2010, primarily because of stunting or exposure to extreme poverty, with the highest concentration in South Asia and sub-Saharan Africa (5). One study assessed the nutritional condition of 293 infants and toddlers aged 0 to 24 months in the Ecuadorian highlands and found stunting in 56.2% of the children (6). Another study reported that 19.5% of Argentinean children were at risk of neurodevelopmental disorders (7). A further study covered 330,613 children across 63 countries. Across all the countries studied, approximately 25% of the children were suspected of being stunted. Regionally, the prevalence ranged from 10% in Europe and Central Asia to 42% in West and Central Africa (8). A prospective birth cohort analysis in Shanghai, China found that the overall prevalence of suspected developmental delay at 2, 6, and 12 months of age was 15.6%, 15.8%, and 12.6%, respectively (9). An additional investigation found that 14.6% of children had suboptimal scores in the cognitive domain of the Early Childhood Development Index (ECDI), and that these low development scores were positively correlated with stunting (10). In impoverished regions of China, 39.7% of children under three years of age are at heightened risk of developmental delay (11).
To devise effective preventive strategies, it is essential to understand the risk factors associated with stunting. Iwayama's research showed that both advancing maternal age and advancing paternal age were significantly related to child growth and development in the general population (12). A comparative analysis of matched groups indicated that being underweight and having a shortened gestation period may contribute to suboptimal weight gain and impaired head growth during infancy (13). According to the World Health Organization's national assessment of infant and young child feeding (IYCF) practices, the duration of breastfeeding is classified as highly satisfactory, whereas early initiation of breastfeeding and exclusive breastfeeding (EBF) are classified as satisfactory (14). Animal-sourced foods, which are rich in essential amino acids, play a crucial role in promoting linear growth and developmental outcomes among young children in low- and middle-income countries (15). A study conducted between 2009 and 2012 among 1,324 children aged 0-24 months in rural Pakistan showed that child diet and mother-child interactions improved children's cognitive, language, and motor development (16). White matter hyperintensity (WMH) in patients with cerebral small vessel disease (CSVD) is strongly associated with cognitive impairment (17).
The Bayley Scales of Infant Development (BSID) is a long-standing and widely used instrument for identifying and diagnosing neurodevelopmental delays among infants. The Bayley-III assessment is widely recognized as the "gold standard" in diagnostic evaluations of early childhood development (ECD) (18). However, the BSID must be administered by a trained professional and is impractical for routine screening in low-resource settings (19-21). Tracking the development of large cohorts of children requires a reliable yet cost-effective method of assessment, and individual examination of each child by a professional would be prohibitively expensive. A viable alternative is to ask parents directly about their infant's behavior. The Parents' Evaluation of Developmental Status (PEDS), the Child Development Inventories (CDI), and the Ages and Stages Questionnaires (ASQ) have been highly regarded by the American Academy of Pediatrics as instruments with excellent psychometric properties (22). The ASQ is an inexpensive instrument that takes approximately 15 minutes to complete (23). Furthermore, the ASQ has been translated into 16 languages in LMICs, and at least 23 LMICs have used the questionnaire (24). Many studies have formally addressed the reliability and application of the Ages and Stages Questionnaires, 3rd edition (ASQ-3) (25-28). In addition, the United Nations International Children's Emergency Fund (UNICEF) has endorsed the ASQ-3 as a means of verifying whether children are experiencing normal neurological development (7).
Although a number of studies have explored the factors associated with stunting, they have some limitations. For example, most sample sizes are below 2,000 and the data collection areas are relatively concentrated, which may limit the representativeness of the samples (29,30). Additionally, few studies have explored in depth the relationship between gestational diseases and developmental delay.
Against this backdrop, the present study aimed to assess the prevalence of stunting and its associated factors among children under three years of age in Shenzhen, China (31), to identify risk factors for stunting comprehensively, and thereby to establish an accurate predictive model. Nomograms, as efficient and precise assessment tools, have the potential to help clinical staff objectively identify children under three years of age who are vulnerable to stunting.

Sample selection
Participants were 9,581 infants and toddlers (0-3 years of age) who received health checkups at the Child Health Clinic of Shenzhen Social Health Center between August 2021 and June 2022. The screening questionnaire was completed by parents or primary caregivers to identify children at risk of developmental delay. The questionnaire investigators were all child health care doctors who had been professionally trained and had passed the assessment.

Inclusion and exclusion criteria
The inclusion criteria were as follows: (1) infants and toddlers aged between 0 and 36 months; (2) voluntary participation and informed consent; (3) the primary caregiver present at the assessment site together with the child. The exclusion criteria were: (1) infants and young children with developmental delays or serious congenital hereditary diseases; (2) parents (caregivers) with intellectual disability or emotional problems; (3) children older than 36 months, or caregivers who refused to sign the informed consent; and (4) only non-primary caregivers present at the assessment site. Of 9,805 children under the age of three, 212 individuals with missing main information and 8 individuals with repeated measurements were excluded, and 9,581 individuals meeting the inclusion and exclusion criteria were included in the analysis.

Outcome variable
Stunting in infants and toddlers was screened and assessed with the ASQ-3. The ASQ-3 is used worldwide for comprehensive developmental screening of children aged 1-66 months (32,33), and it is the most validated scale and is recommended by UNICEF to determine whether children have normal neurological development (7). A study has substantiated the high reliability of the ASQ, establishing its efficacy as a screening method for developmental delays (33).
The ASQ-3 results are classified into three categories, where x̄ denotes the mean and s the standard deviation of the reference scores: (1) above the threshold (score > x̄ − 1s), indicating that the child is developing normally; (2) near the threshold (score between x̄ − 2s and x̄ − 1s), indicating that the child may need additional help in one or more areas but shows no obvious abnormality and requires further monitoring; (3) below the threshold (score < x̄ − 2s), indicating abnormal development. An ASQ result below the threshold in one or more areas indicates a developmental abnormality, whereas results above or near the threshold are regarded as normal.
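The three-way classification above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the mean and standard deviation used in the example (50 and 10) are placeholder values rather than ASQ-3 norms, which differ by age interval and domain.

```python
def classify_asq_domain(score, mean, sd):
    """Classify one domain score against the mean-based cutoffs.

    below: score < mean - 2*sd
    near:  mean - 2*sd <= score <= mean - 1*sd
    above: score > mean - 1*sd
    """
    if score < mean - 2 * sd:
        return "below"
    if score <= mean - 1 * sd:
        return "near"
    return "above"


def is_screen_positive(domain_results):
    """A child screens positive if any domain is below the threshold."""
    return any(result == "below" for result in domain_results)


# Placeholder norms (mean 50, SD 10), NOT real ASQ-3 reference values.
results = [classify_asq_domain(s, mean=50, sd=10) for s in (45, 35, 25)]
print(results, is_screen_positive(results))
```

Under this rule a child with results in the "near" zone only is treated as normal for the outcome variable, which matches the definition given in the text.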

Variables
Information about the infants and their parents was collected with a basic information form, which mainly covered: (1) Basic information of children: age and gender; (2) Birth status: gestational week, mode of delivery, birth weight, and birth order; (3) Feeding situation: feeding mode within six months of age, classified as exclusive breastfeeding (only the mother's milk, without any other dairy or animal milk), mixed feeding (breastfeeding along with milk powder), or bottle-feeding (only milk powder); complementary food (food other than breast milk and milk powder in infancy); and colostrum feeding (breast milk secreted by the mother within 2-3 days after delivery); (4) Parental information: parental education level, parental childbearing age, maternal employment status, occupation, and maternal health status during pregnancy (gestational diabetes mellitus, hypertension during pregnancy, anemia during pregnancy, bacterial vaginitis, placenta previa, prenatal depression); (5) Family social and economic status: family type, housing area, and district; (6) Others: average parent-child reading time, average daily time of interaction with other children, and average time of outdoor exercise per day.

Data analysis
To balance the risks of overfitting and underfitting while ensuring that the model could fully learn the data features during training, the dataset was randomly partitioned into training and validation sets at an 8:2 ratio. The training set was used to select salient features and to develop the predictive model, and the validation set was used to evaluate the performance of the trained model. Categorical variables were presented as frequency (percentage), and the chi-square test was used to compare groups. To address potential collinearity among candidate variables, LASSO regression was used to identify the optimal predictor variables. Subsequently, univariate and multivariate logistic regression analyses were performed to estimate odds ratios and their 95% confidence intervals.
The discriminative capacity of the model was assessed using the area under the receiver operating characteristic curve (AUROC). A calibration curve was used to assess the concordance between predicted probabilities and actual observations. To evaluate the clinical validity of the model, decision curve analysis (DCA) was performed. All data were processed using R software (version 4.3.1). All statistical tests were two-tailed, and P < 0.05 was considered statistically significant.
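The random 8:2 partition described above can be illustrated with a short Python sketch. The authors worked in R and do not report their seed or rounding rule, so the function name, the seed, and the use of `round` here are assumptions made purely for illustration.

```python
import random


def split_train_validation(records, train_fraction=0.8, seed=2024):
    """Randomly partition records into training and validation sets.

    The seed and the rounding rule are illustrative assumptions; the
    paper does not state how the split sizes were determined.
    """
    rng = random.Random(seed)      # local generator keeps the split reproducible
    shuffled = list(records)       # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


child_ids = list(range(9581))      # stand-in identifiers for the 9,581 children
train, validation = split_train_validation(child_ids)
print(len(train), len(validation))
```

Fixing the seed makes the partition repeatable, which matters because feature selection and model fitting are performed on the training portion only.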

Descriptive analysis
The cross-sectional analysis included 9,581 respondents, with an average age of 13.42 ± 9.16 months (mean ± SD); 43.7% were female. Most birth weights in the sample were within the normal range (93.2%). Additionally, 49.2% of the infants were exclusively breastfed until 6 months of age. More than half of the infants were first-born (55.5%). Regarding family factors, over half of the mothers (56.7%) had a college education. Regarding social and environmental factors, 11.8% of respondents read for more than half an hour per day, while 19.8% spent more than 2 h outdoors per day (Table 1). The selection process of subjects is shown in Figure 1.

Prevalence and associated variables of stunting
The prevalence of delayed growth and development was found to be 7.14% in this study. The 18 variables, including gender, age in months, maternal education, health status during pregnancy, father age, birth order, delivery mode, feeding mode, average parent-child reading time and average outdoor time per day, average interaction time per day with other children, average outdoor time per day,

LASSO logistic regression
To reduce noise, variables that were not significant in the initial univariate analysis were excluded from the LASSO regression. A LASSO regression model was then used to identify potential predictors of stunting (Figures 2A,B). The identified factors were subsequently entered into a logistic regression model. In the end, gender, age in months, health status of children, birth order, feeding patterns, average parent-child reading time per day, average interaction time per day with other children, and average outdoor time per day were found to be associated with growth retardation (Table 2).

Developing predictive models
Using tenfold cross-validation, Least Absolute Shrinkage and Selection Operator (LASSO) regression was applied to identify the most influential predictors for the model. Subsequently, a multiple logistic regression analysis was performed to construct the predictive model. The variance inflation factor (VIF) of every variable was below the threshold of 4, indicating the absence of problematic multicollinearity. The model was developed in RStudio with the "rms" package, and a nomogram was created with the "nomogram" function from the same package (Figure 3; Table 3).

Validating predictive models

Assessment of the model-based discriminative ability
The area under the ROC curve (AUC) is a statistical measure of the performance of a classification algorithm: it is the probability that a randomly chosen positive sample will be ranked higher than a randomly chosen negative sample. This metric is frequently used to evaluate machine learning classifiers. Here, the discriminative capacity of the predictive model was assessed for developmental delay among infants aged 0-3 years in both the training set and the validation set. As illustrated in Figures 4A,B, the AUC of the predictive model in the training cohort was 0.678 (95% confidence interval, 0.6555 to 0.7002). The AUC in the validation cohort was slightly higher, at 0.734 (95% confidence interval, 0.6892 to 0.7782). The nomogram thus shows moderate discriminative ability in distinguishing between typical developmental trajectories and growth impediments.

FIGURE 2 The variable filtering process of the LASSO regression. (A) LASSO coefficient profiles of the candidate features; (B) the selection of optimal parameters (lambda) by tenfold cross-validation, in which dotted vertical lines were drawn at the optimal values by using the minimum criteria and limits defined by 1 standard deviation.
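The ranking interpretation of the AUC given above can be made concrete with a small Python sketch. It compares every positive-negative pair directly and counts ties as one half; the function name and the toy data are illustrative assumptions and are unrelated to the study data.

```python
def auc_by_ranking(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Every positive/negative pair is compared; ties count as half a win.
    This is O(n_pos * n_neg), which is fine for a small illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = screen positive, 0 = screen negative.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.7]
auc = auc_by_ranking(labels, scores)
print(auc)  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, which gives a scale against which the reported values of 0.678 and 0.734 can be read.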

Calibrating the predictive model
The calibration of the model was evaluated with calibration plots and the Hosmer-Lemeshow goodness-of-fit statistic, for which a non-significant p-value (greater than 0.05) indicates no evidence of poor fit. The goodness-of-fit test showed that the model was well calibrated in the training dataset (χ² = 6.2051, df = 8, p = 0.6243) as well as in the validation dataset (χ² = 11.794, df = 8, p = 0.1606). The calibration curves for the training and validation cohorts, derived from the logistic regression analysis, are shown in Figures 5A,B, respectively.
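As a rough illustration of the test reported above, the Python sketch below computes a Hosmer-Lemeshow statistic from grouped predictions and converts a chi-square value into a p-value. The function names and the equal-size grouping rule are assumptions, and the closed-form tail probability used here is valid only for even degrees of freedom, which covers the df = 8 case that results from ten groups.

```python
import math


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return the Hosmer-Lemeshow chi-square statistic and its degrees of freedom."""
    pairs = sorted(zip(y_prob, y_true))        # order subjects by predicted risk
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)    # expected number of events
        observed = sum(y for _, y in chunk)    # observed number of events
        mean_p = expected / size
        if 0.0 < mean_p < 1.0:
            statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i                       # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# Chi-square values as reported in the text for df = 8.
p_train = chi2_sf_even_df(6.2051, 8)
p_valid = chi2_sf_even_df(11.794, 8)
print(round(p_train, 3), round(p_valid, 3))
```

Feeding the reported chi-square values into the tail function gives p-values that agree with the reported 0.6243 and 0.1606 to about three decimal places, which is a quick consistency check on the published figures.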

Clinical validity assessment
The practical applicability of the model was assessed with decision curve analysis (DCA), and the resulting curves are presented in Figures 6A,B. The DCA showed that the net benefit of the predictive model in the internal validation group markedly exceeded that of the two reference strategies (intervening on all children or on none). This finding intimates that the nomogram-based

FIGURE 3 Proposed nomogram for stunting.
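The net benefit that a decision curve plots can be sketched as follows. This is a generic illustration of the standard net-benefit formula, not the authors' R code; the function names and the toy data are assumptions.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every child regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data for illustration only; the "treat none" strategy has net benefit 0.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_prob = [0.6, 0.1, 0.1, 0.3, 0.4, 0.05, 0.05, 0.05, 0.05, 0.05]
nb_model = net_benefit(y_true, y_prob, threshold=0.2)
nb_all = net_benefit_treat_all(y_true, threshold=0.2)
print(nb_model, nb_all)
```

A decision curve repeats this calculation over a range of thresholds; the model is considered useful at thresholds where its net benefit lies above both reference strategies.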

Discussion
Using stratified sampling to select 9,581 participants, this study found that the prevalence of stunting among children under the age of three was 7.14%. In this retrospective study, we developed and validated a nomogram to predict stunting among children under the age of three. The model showed good discrimination, clinical applicability, and calibration as assessed by the AUC, DCA, calibration plots, and the Hosmer-Lemeshow goodness-of-fit test. Moreover, the logistic regression analysis showed that age, gender, health status, birth order, feeding patterns, and average parent-child reading time per day were the best predictors of stunting.
This research found an association between stunting and gender, with boys more prone to stunting, corroborating previous findings reported by Hong Zhou and colleagues (34). One possible explanation is that stunting is related to genetic inheritance or gene mutations. Furthermore, our study found no significant association between stunting and birth weight, maternal gestational diabetes, or gestational hypertension. In contrast, other research has indicated that low birth weight, maternal gestational hypertension, premature rupture of membranes (PROM), and smoking during pregnancy all elevate the risk of neurodevelopmental impairments (NIs), whereas EBF and higher socioeconomic status appear to be protective factors (35). New evidence suggests that antenatal depression, that is, maternal depression experienced during pregnancy, has profound implications not only for the mother's own well-being but also for the emotional and behavioral trajectory of the developing child. This finding underscores the link between maternal mental health and the future psychosocial development of offspring (36).

Our research highlights feeding patterns as predictors of stunting, and numerous studies support the benefits of EBF for child development. A cluster-randomized controlled trial conducted in the western Kenyan sub-county of Bondo found positive correlations between EBF during the 3- to 6-month age window and several aspects of child development, particularly communication, gross motor skills, and problem-solving abilities (37). Wallenborn JT and colleagues observed that adhering to the World Health Organization's (WHO) recommendations for EBF remains crucial for promoting healthy physical growth and cognitive development, even in environments where complementary foods are readily accessible (38). A cohort analysis conducted in rural South Africa found that EBF was associated with a reduced prevalence of conduct disorders and was modestly correlated with enhanced cognitive development among male children (39). A prospective birth cohort study from Brazil found that breastfeeding was positively correlated with better performance in intelligence assessments administered three decades later, potentially affecting real-life outcomes such as educational attainment and income during adulthood (40). In contrast, a Polish Mother and Child Cohort Study did not identify any significant association between the duration of breastfeeding and child development (41). A study among Chinese toddlers aged 1-3 years found a negative correlation between feeding difficulty and overall health and development (42). For these reasons, breastfeeding during infancy is emphasized as a pivotal strategy to foster growth and development, even among healthy young children (43).
This study also noted a weak association between the mode of delivery and the occurrence of stunting. Prior research has identified certain adverse outcomes associated with cesarean delivery. Specifically, it has been linked to a heightened risk of Attention Deficit Hyperactivity Disorder (ADHD) and Autism Spectrum Disorder (ASD) (44,45). Nevertheless, the current evidence remains inconclusive regarding the precise effect of cesarean section on child health outcomes. For example, some studies have suggested that cesarean delivery does not markedly increase or decrease the likelihood of suspected developmental delay in children compared with vaginal delivery (46-48). In addition, this study showed that maternal educational level is associated with stunting. A study conducted in Uganda found that children whose mothers had secondary education had lower odds of stunting and underweight than children whose mothers had no formal education (49). A comparative cross-sectional study found that the prevalence of stunting and wasting was lower among children of employed mothers than among children of unemployed mothers (50). The reasons for this difference require further research.
Finally, our study found that reading time and time spent interacting with other children were significantly associated with the occurrence of stunting, which is consistent with previous research findings. One study found a significant positive association between longer reading time across multiple time intervals and higher ASQ-3 scores in the fine motor, gross motor, personal-social, and comprehensive development domains over a prolonged period (26). A prospective longitudinal cohort study in Canada found that toddlers' exposure to informal play opportunities, picture-book reading, and supervision in childcare centers were protective factors against delayed speech development (51). Results from the China ECD Program have additionally shown that a lack of books and toys (52), coupled with inadequate learning activities (53), substantially elevates the risk of developmental delay in children aged 0-35 months living in rural areas. Furthermore, a longitudinal birth cohort study found that responsive caregiving and learning opportunities serve as protective factors for young children, mitigating the negative impact of early adversity on their human capital development in adolescence (54).
Constructed through regression analysis, the nomogram integrates multiple predictive indicators and visually represents the relationships among the variables in the predictive model. Scaled line segments are drawn on a single plane according to a fixed ratio. The nomogram serves as a predictive instrument that estimates the likelihood of a specific clinical outcome by summing the scores assigned to the individual predictors into an overall score. The development of a prediction model for stunting among children under three years of age is a novel contribution of this study. Notably, the nomogram translates regression coefficients into scores, which makes outcomes straightforward to calculate. This approach enables an individualized risk assessment for each child, thereby enhancing both relevance and accuracy.
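The scoring idea described above can be sketched in Python under simplifying assumptions: all predictors are binary and coded so that their coefficients are positive, and the largest coefficient is mapped to 100 points. The intercept, coefficients, and predictor names below are hypothetical placeholders and are not the fitted values from Table 2 or Table 3.

```python
import math

# Hypothetical log-odds coefficients for illustration; NOT the study's estimates.
INTERCEPT = -3.0
COEFFICIENTS = {
    "male": 0.4,
    "formula_fed": 0.6,
    "reads_under_30min": 0.8,
}


def points_table(coefficients):
    """Scale each coefficient so that the largest effect is worth 100 points."""
    largest = max(coefficients.values())
    return {name: 100.0 * beta / largest for name, beta in coefficients.items()}


def total_points(child, coefficients):
    """Sum the points of every predictor that is present (coded 1) for the child."""
    table = points_table(coefficients)
    return sum(table[name] * child.get(name, 0) for name in table)


def predicted_risk(points, intercept, coefficients):
    """Map total points back to the log-odds scale, then to a probability."""
    largest = max(coefficients.values())
    linear_predictor = intercept + points * largest / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


child = {"male": 1, "formula_fed": 0, "reads_under_30min": 1}
pts = total_points(child, COEFFICIENTS)
risk = predicted_risk(pts, INTERCEPT, COEFFICIENTS)
print(pts, round(risk, 3))
```

Because the points are a linear rescaling of the regression coefficients, the risk read from the total score equals the risk obtained by evaluating the logistic model directly.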
This study had several limitations. Firstly, the cross-sectional design limited the capacity to establish causal relationships. Secondly, the nomogram developed in this study is based on Chinese data, and its applicability to other regions and countries needs to be determined through external validation. Thirdly, the data originate from the verbal reports of the infants' parents, who may not be fully aware of their child's actual condition, which could result in discrepancies between the data and the actual situation.
This study also had some strengths. Firstly, it has a large and representative sample. Secondly, we explored the relationship between gestational diseases and developmental delay.

Conclusion
Our stunting risk prediction model provides a reliable and accurate tool for children under the age of three in Shenzhen, China. The model is a valuable asset to clinical practitioners, offering a theoretical foundation and a starting point for the development of prevention and intervention strategies.

FIGURE 1 Flowchart of the study.

FIGURE 4 ROC curves of the nomogram generated in the study: (A) for the training set; and (B) for the internal validation set.

FIGURE 5 Calibration curves of the nomogram in the study: (A) for the training set; and (B) for the internal validation set.

FIGURE 6 Decision curve analysis (DCA) for the study's nomogram: (A) for the training set; and (B) for the internal validation set.

TABLE 1
Sample characteristics and prevalence of stunting.

TABLE 2
Results of binary logistic regression analysis.

TABLE 3
Nomogram of relevant factors in the assignment method.