Identifying major predictors for parenting stress in a caregiver of autism spectrum disorder using machine learning models

Introduction: Previous studies have investigated predictive factors for parenting stress in caregivers of autism spectrum disorder (ASD) patients using traditional statistical approaches, but their study settings and results were inconsistent. This study aimed to identify major predictors of parenting stress in this population by developing explainable machine learning models.
Methods: Study participants were collected from the Department of Child and Adolescent Psychiatry, Severance Hospital, Yonsei University College of Medicine, Seoul, the Republic of Korea between March 2016 and October 2020. A total of 36 model features were used, including subscales of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) for caregivers’ psychopathology, the Social Responsiveness Scale-2 for core symptoms, and the Child Behavior Checklist (CBCL) for behavioral problems. Machine learning classifiers [eXtreme Gradient Boosting (XGBoost), random forest (RF), logistic regression, and support vector machine (SVM) classifier] were developed to predict severe total parenting stress and its subscales (parental distress, parent-child dysfunctional interaction, and difficult child). Model performance was assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, positive predictive value, and negative predictive value. We used the SHapley Additive exPlanations tree explainer to investigate major predictors.
Results: A total of 496 participants were included [mean age of ASD patients 6.39 (SD 2.24); 413 males (83.3%)]. The best-performing models achieved an AUC of 0.831 (RF model; 95% CI 0.740–0.910) for parental distress, 0.814 (SVM model; 95% CI 0.720–0.896) for parent-child dysfunctional interaction, 0.813 (RF model; 95% CI 0.724–0.891) for difficult child, and 0.862 (RF model; 95% CI 0.783–0.930) for total parenting stress on the test set. For total parenting stress, ASD patients’ aggressive behavior and anxious/depressed scores, and caregivers’ depression, social introversion, and psychasthenia were the top 5 predictors.
Conclusion: By using explainable machine learning models (XGBoost and RF), we investigated major predictors for each subscale of the parenting stress index in caregivers of ASD patients. The identified predictors might help alert clinicians to whether a caregiver is at high risk of experiencing severe parenting stress and, if so, to provide timely interventions, which could eventually improve treatment outcomes for ASD patients.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by two core symptoms: difficulties with social communication and interaction, and the presence of repetitive and restricted behaviors or interests (American Psychiatric Association, 2013). Parents of ASD patients have been found to experience greater parenting stress than parents of typically developing children and even parents of children with other disabilities (Hayes and Watson, 2013). This is an important issue because high parenting stress is associated with lower effectiveness of parent-mediated intervention (Osborne et al., 2008). Therefore, helping stressed parents can improve treatment outcomes for ASD patients.
Numerous studies have explored factors associated with parenting stress in caregivers of ASD patients. Personality traits and mood problems of caregivers (Falk et al., 2014; Leonardi et al., 2021), ASD core symptoms (Miranda et al., 2019; Scibelli et al., 2021), and behavioral problems of ASD patients (Yorke et al., 2018; Miranda et al., 2019; Siu et al., 2019; Scibelli et al., 2021) were found to be significantly linked with parenting stress. Recently, associated factors for each dimension of parenting stress (parental distress, parent-child dysfunctional interaction, and difficult child) were investigated (Mello et al., 2022). However, previous studies have used different combinations of study variables and statistical tests, leading to inconsistent results that are difficult to interpret (Voliovitch et al., 2021; Mello et al., 2022). Moreover, some studies did not address the study variables at the same level; for example, the total score was used for ASD core symptoms, but subscales were used for behavioral problems (Siu et al., 2019; Mello et al., 2022). Lastly, none of them attempted to apply machine learning (ML) methods, which offer distinct advantages over traditional approaches because ML can handle multi-dimensional and non-linear relationships (Schwalbe and Wahl, 2020).
Herein, our study aimed to identify predictive features for parenting stress (parental distress, parent-child dysfunctional interaction, difficult child, and total parenting stress) in caregivers of ASD patients by developing explainable ML models. Additionally, we included only subscales of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) for caregivers' psychopathology (Graham, 1990; Han et al., 2006), Social Responsiveness Scale-2 (SRS-2) for ASD core symptoms (Constantino, 2012; Chun et al., 2021), Child Behavior Checklist (CBCL) for behavioral problems (Han and Yoo, 1995; Achenbach, 1999), and other additional features as model features. We expected that the identified predictive features would help alert clinicians to whether a caregiver is at high risk of severe parenting stress and help them provide timely interventions to stressed parents, which would eventually enhance treatment outcomes for ASD patients.

Materials and methods
We followed the STROBE guideline (Supplementary material, pp. 6-7) (von Elm et al., 2007). The present study was approved by the Institutional Review Board of the Severance Hospital of Yonsei University, Seoul, the Republic of Korea. Informed consent was waived since we used retrospective and deidentified patient data (IRB number: 4-2022-0803). The overall process of ML models is displayed in Figure 1.

Participants recruitment
Study participants were retrospectively collected from the Department of Child and Adolescent Psychiatry, Severance Hospital, Yonsei University College of Medicine, Seoul, the Republic of Korea, between March 2016 and October 2020. Child and adolescent psychiatrists conducted a semi-structured interview to confirm ASD based on DSM-5. Patients under 19 years of age who were identified as having ASD and their primary caregivers were included.
Patients under the following conditions were excluded: those who did not report SRS-2, CBCL, or MMPI-2; those who had organic brain diseases (e.g., epilepsy, encephalitis, and demyelinating disease); and those who had a comorbid mental disorder (e.g., bipolar and related disorders and schizophrenia spectrum and other psychotic disorders).

Outcome variables
The outcome of interest was "parenting stress" in the primary caregiver of an ASD patient. Parenting stress was assessed by the Parenting Stress Index-Short Form (PSI-SF), which contains 36 items (Abidin, 1990; Lee et al., 2008). A total of four scales (parental distress, parent-child dysfunctional interaction, difficult child, and total parenting stress) were set as outcome variables for prediction. As the models were designed to distinguish caregivers with severe parenting stress from those with mild-to-moderate levels, we set the threshold for severity at the 80th percentile, following the formal documentation (Abidin, 1990).
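The dichotomization described above can be sketched as follows. This is a minimal illustration, not the study's code: the reference scores, the caregiver scores, the nearest-rank percentile rule, and the choice to treat scores at or above the cutoff as severe are all assumptions made for the example; the actual cutoff comes from the PSI-SF documentation.

```python
import math

def percentile_cutoff(reference, pct=80):
    """Return the pct-th percentile of `reference` (nearest-rank method)."""
    ordered = sorted(reference)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]

def label_severe(scores, reference, pct=80):
    """Label a score 1 (severe) if it is at or above the cutoff, else 0.

    Assumption: 'at or above' is used here; the source does not state
    whether the boundary value itself counts as severe.
    """
    cutoff = percentile_cutoff(reference, pct)
    return [1 if s >= cutoff else 0 for s in scores]

# Hypothetical reference distribution and caregiver scores (illustrative only).
reference = [55, 60, 66, 70, 74, 79, 83, 88, 95, 110]
caregiver_scores = [62, 90, 88, 120, 71]
labels = label_severe(caregiver_scores, reference)
```

The resulting 0/1 labels are what a binary classifier would be trained to predict for each of the four PSI-SF scales.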

Model features
Our dataset included a total of 36 variables, which are listed in Table 1 as model features. Both SRS-2 and CBCL were rated by a caregiver. We noted that subscales of the CBCL differed by age (1.5-5 versus 6-18); hence, we used only the common ones (anxious/depressed, aggressive behavior, attention problems, somatic complaints, withdrawn, and other problems) when using the overall sample. The definitions of each model feature are provided in Supplementary material, pp. 8-10.

Data pre-processing
Missing value imputation was performed using k-nearest neighbor imputation with k = 5 for continuous features and mode

Model development
The datasets were randomly partitioned into two groups: a training set (80%) and a test set (20%). To avoid distribution shift between the two subsets, the random split was stratified with respect to the outcome variable. Four supervised ML classifiers (eXtreme Gradient Boosting (XGBoost), random forest (RF), logistic regression, and support vector machine (SVM) classifier) were developed for each outcome (parental distress, parent-child dysfunctional interaction, difficult child, and total parenting stress); that is, 16 models in total were developed. Hyperparameter optimization was performed by random search over 200 different combinations with 10-fold cross-validation (Bergstra and Bengio, 2012). We assessed model performance with the area under the receiver operating characteristic curve (AUC) and selected the best-performing model (i.e., the model with the largest AUC). Then, we validated the model with the remaining 20% test set.
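A stratified 80/20 split of the kind described above can be sketched in plain Python. This is only an illustration under assumed inputs: the label vector, the function name `stratified_split`, and the seed are hypothetical, and the hyperparameter search and cross-validation steps are not shown.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) that preserve the class ratio in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)  # random assignment within each class
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical outcome labels: 400 mild-to-moderate (0) and 100 severe (1).
labels = [0] * 400 + [1] * 100
train_idx, test_idx = stratified_split(labels)
```

Because the shuffle happens within each class, the share of severe cases is the same in the training and test subsets, which is the point of stratifying on the outcome.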
We performed subgroup analyses for different forms of CBCL (1.5-5 and 6-18) and comorbid attention-deficit/hyperactivity disorder (ADHD) status (with and without ADHD). We utilized the SHapley Additive exPlanations (SHAP) tree explainer method for RF and XGBoost classifiers to investigate major predictors (Lundberg et al., 2020).
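To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a three-feature toy model by enumerating all feature subsets. It is not the tree explainer algorithm used in the study: the toy model, its coefficients, the feature names, and the use of a fixed baseline for "absent" features are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`."""
    n = len(instance)

    def value(subset):
        # Features in `subset` take the instance's values; the rest stay at baseline.
        x = [instance[j] if j in subset else baseline[j] for j in range(n)]
        return model(x)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

# Hypothetical toy risk score with an interaction term (not the study's model).
def toy_model(x):
    aggression, depression, introversion = x
    return 0.5 * aggression + 0.3 * depression + 0.2 * aggression * introversion

instance = [2.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to the difference between the model output at the instance and at the baseline, and the interaction term is shared equally between the two features involved. Ranking features by the mean absolute attribution across a sample is one common way to obtain an ordered list of predictors.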

Statistical analysis
FIGURE 1
Overall process of the ML models. AUC, area under the receiver operating characteristic curve; LR, logistic regression; ML, machine learning; NPV, negative predictive value; PPV, positive predictive value; RF, random forest; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting.

We utilized the t-test for continuous variables (e.g., age, subscales of CBCL, SRS-2, and MMPI-2) and the χ² test for categorical variables (e.g., sex, assistant caregiver status, and psychotropic medication status) to assess statistical differences of included variables between subgroups. Multicollinearity refers to a condition in which two or more variables show a strong correlation, which can be problematic in some ML models since it hinders the ability of models to distinguish their individual impacts on the dependent variable. We calculated variance inflation factors for each continuous variable in each sample to detect whether multicollinearity exists (Neter et al., 1996). Conventionally, a variance inflation factor greater than 5 is considered indicative of a problematic level of multicollinearity. Model performance was assessed by AUC, sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. The formulas for each metric are displayed in Supplementary material, p. 13. The 95% confidence intervals (CIs) for each estimate were obtained using a bootstrap of 10,000 resamples. Bootstrap is a statistical method that involves drawing multiple random samples with replacement from the original data to create new datasets, allowing us to estimate the uncertainty related to a point estimate.
Statistical analyses were two-tailed, and p < 0.05 was deemed to indicate statistical significance. Statistical analyses were performed using R software (version 4.1.3), and all ML models were implemented using Python (version 3.8.1).
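The performance metrics and the bootstrap confidence interval described in this section can be sketched as follows. The metric definitions are the standard ones and are assumed to match the Supplementary material, which is not reproduced here. The data, the 0.5 decision threshold, the percentile method for the interval, and the default of 1,000 resamples (the study used 10,000) are illustrative assumptions.

```python
import random

def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV, and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "accuracy": safe_div(tp + tn, len(pairs)),
    }

def auc(y_true, scores):
    """AUC as the chance that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling cases with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int(alpha / 2 * n_resamples)]
    high = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# Hypothetical test-set labels and predicted probabilities (illustrative only).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.55, 0.90]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
point = auc(y_true, scores)
low, high = bootstrap_ci(y_true, scores)
```

Returning NaN for an undefined ratio keeps the function usable when a model predicts no positive cases at all, a situation that can arise in small subgroups.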

Study dataset
A total of 496 ASD patients and their caregivers were included [mean age of ASD patients 6.39 (standard deviation 2.24); 413 males (83.3%)]. Detailed participant information is displayed in Table 2. None of the included variables showed significant multicollinearity (Supplementary material, p. 14).

Model performance and major predictors
Among the total 496 participants, 396 (80%) were assigned to the training set and 100 (20%) to the test set. Receiver operating characteristic curves for parental distress, parent-child dysfunctional interaction, difficult child, and total parenting stress for the test set are presented in Figure 2.
For predicting total parenting stress, the performance of the RF model (AUC 0.862, 95% CI 0.783-0.930; sensitivity 0.708, 95% CI 0.578-0.833; specificity 0.865, 95% CI 0.764-0.951) was the best on the test set. The top 5 predictors of the RF model were ASD patients' CBCL scores of aggressive behavior and anxious/depressed, and caregivers' MMPI-2 scores of depression, social introversion, and psychasthenia (Table 3 and Figure 3).
Detailed results for each estimate on both training and test sets and the SHAP summary plots for RF and XGBoost are provided in Supplementary material, pp. 15-26.

Results of CBCL subgroups
In both the CBCL 1.5-5 and 6-18 subgroups, the prediction of parental distress and parent-child dysfunctional interaction was unsuccessful, showing low sensitivity (ranging from 0.000 to 0.679) and high specificity (ranging from 0.606 to 1.000), while the model performances were retained for difficult child and total parenting stress. The major predictors for total parenting stress appeared to differ between CBCL 1.5-5 and CBCL 6-18. For CBCL 1.5-5, caregivers' MMPI-2 scores, including psychasthenia, depression, social introversion, and schizophrenia, ranked highest. In contrast, for CBCL 6-18, ASD patients' CBCL scores of aggressive behavior and anxious/depressed were more important than the caregivers' MMPI-2 scores (Table 4 and Figure 3).
Detailed results for each estimate on both training and test sets and the SHAP summary plots for RF and XGBoost are displayed in Supplementary material, pp. 27-50.

Results of ADHD subgroups
For the sample with ADHD, the performance was only retained for total parenting stress: the RF model showed an AUC of 0.865 (95% CI 0.726-0.969), sensitivity of 0.882 (95% CI 0.706-1.000), and specificity of 0.706 (95% CI 0.474-0.917). Notably, ASD patients' SRS-2 scores of social communication appeared among the top 5 predictors of the RF model in this population. For the sample without ADHD, the performance was only retained for difficult child: the RF model showed an AUC of 0.854 (95% CI 0.755-0.933), sensitivity of 0.714 (95% CI 0.538-0.875), and specificity of 0.842 (95% CI 0.714-0.949) (Table 5 and Figure 3).
Detailed results for each estimate on both training and test sets and the SHAP summary plots for RF and XGBoost are displayed in Supplementary material, pp. 51-74.

Discussion
We evaluated ML models predicting severe parenting stress and its components (parental distress, parent-child dysfunctional interaction, and difficult child) in caregivers of ASD patients and investigated major predictors. Our key finding was that the ML models could predict severe parental distress, parent-child dysfunctional interaction, difficult child, and total parenting stress with AUC values greater than 0.80. Moreover, we identified major predictors for each outcome of interest by utilizing explainable ML models, which provided valuable insights into the underlying factors contributing to severe parenting stress in caregivers of ASD patients.
Parental distress measures a parent's experiences of their role as parents (Abidin, 1990). Among the top 10 predictors for parental distress, seven were associated with the personality traits of caregivers [depression (code 2), schizophrenia (code 8), psychopathy (code 4), psychasthenia (code 7), paranoia (code 6), social introversion (code 0), and hypomania (code 9)], which suggests that caregivers' perceived hardship related to their role as parents may be determined primarily by their psychopathology. However, an observational study using regression analysis reported that ASD patients' emotional problems (regression coefficient = 0.31) may also play a significant role in parental distress (Mello et al., 2022). Since our main ML models using the overall sample included only the subscales common to CBCL 1.5-5 and 6-18, scores of emotional problems were excluded; the potential impact of patients' behavioral problems on caregivers' parental distress should therefore not be ignored. Indeed, our subgroup analysis using the CBCL 6-18 sample showed that patients' behavioral problems were also essential predictors of parental distress in caregivers of ASD patients. Nevertheless, considering that most previous studies utilized only ASD patient factors as associated/predictive factors for parental distress (Scibelli et al., 2021; Mello et al., 2022), our study provides new insight into parental distress by utilizing explainable ML models with model features related to caregivers' psychopathology. However, the interpretation requires caution since MMPI-2 clinical scales should not be addressed independently (Levak et al., 2012). A comprehensive approach to the significant MMPI-2 clinical scales might be appropriate. For example, we may expect that a caregiver with a 2-4-8 code type (the top 3 predictors for parental distress) would experience substantial difficulties in their role as a parent owing to vulnerability to substance abuse, poor impulse control, emotional dysregulation, posttraumatic stress disorder symptoms, thought disorder, and borderline personality disorder, which might represent moderate to severe psychopathology and require major psychiatric interventions (Archer et al., 1995; Bell-Pringle et al., 1997; Donovan et al., 1998).
Parent-child dysfunctional interaction measures parents' feelings about the interaction with their child with ASD (Abidin, 1990). Interestingly, our findings implied that behavioral problems of children (withdrawn, aggressive behavior, and attention problems) contributed more to caregivers' negative feelings about their interaction with the children than ASD core symptoms did, even though the latter seem to be more directly related to that interaction. Furthermore, a cross-sectional study observed that the correlation coefficient (r) between total ASD core symptoms and parent-child dysfunctional interaction (0.305) was attenuated when considering its individual subscales. Specifically, the correlations with parent-child dysfunctional interaction were estimated to be 0.184 for reciprocal social interaction, 0.212 for social communication, and 0.288 for repetitive and restricted behaviors (Scibelli et al., 2021). However, a separate study that employed a regression analysis highlighted the significance of ASD core symptoms for parent-child dysfunctional interaction (ASD core symptom severity: regression coefficient = 0.48, p < 0.001) (Mello et al., 2022). This indicates that the impact of ASD core symptoms should not be underestimated and remains a significant concern. In fact, two of the top 5 predictors were related to ASD core symptoms (social communication and cognition) in the subgroup analysis using the CBCL 1.5-5 sample. Taken together, our results suggested that both ASD core symptoms and behavioral problems of ASD patients are important in predicting parent-child dysfunctional interaction.
Difficult child measures a parent's perception of whether the child is easy or difficult to nurture (Abidin, 1990). Our models found that ASD patients' behavioral problems (aggressive behavior, anxious/depressed, other problems, attention problems, and withdrawn), caregivers' psychopathology [social introversion (code 0) and depression (code 2)], and ASD patients' core symptoms (autistic mannerisms and social cognition) mainly contributed to the caregivers' perception that their child was difficult to care for. These results are consistent with previous reports (Scibelli et al., 2021). Mello et al. (2022) have also reported significant impacts of ASD core symptoms (regression coefficient = 0.46) and aggressive behavior (regression coefficient = 0.36) on difficult child. Together with our findings, this suggested that the severity of core symptoms and behavioral problems directly contributes to the difficulties caregivers face in raising their child. However, our study was the first to report that caregivers' personality traits were also major predictors for difficult child, even outranking ASD core symptoms. That is, the caregivers' personality traits also significantly affect how they perceive difficulty in raising their child. For example, a caregiver with the 2-0 code type may be particularly challenged by aspects of their child that make the child feel hard to raise, such as aggressive behavior, because people with this code type tend to present with chronic depression, guilt, social withdrawal, and lack of confidence (Levak et al., 2012). Additionally, higher scale scores of the 2-0 code type are clinically associated with unipolar or remitted depression and with pessimism, and the accompanying negative cognition might bias a caregiver toward rating the child as more difficult (Wetzler et al., 1995; Suzuki et al., 2014).

FIGURE 2
Receiver operating characteristic curves on the test set for subscales of parenting stress index (A, parental distress; B, parent-child dysfunctional interaction; C, difficult child; D, total parenting stress). AUC, area under curve; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting.

Total parenting stress measures the overall stress level in the role of parents (Abidin, 1990). The main contributors to the overall stress of caregivers were the ASD patients' behavioral problems (aggressive behavior, anxious/depressed, attention problems, withdrawn, and other problems) and caregivers' psychopathology [depression (code 2), social introversion (code 0), psychasthenia (code 7), and schizophrenia (code 8)], whereas ASD core symptoms showed less predictive power. This suggested that individualized interventions for caregivers targeting their mental health in the context of their psychopathologic profile might help alleviate parenting stress and subsequently improve treatment outcomes for ASD patients (Osborne et al., 2008). Indeed, current mainstream interventions generally aim at maximizing ASD patients' functioning by improving their core symptoms or specific behavioral problems (Lai et al., 2014). Psychological interventions targeting parenting stress have also been suggested, and their efficacy was confirmed by a meta-analysis of 16 randomized controlled trials with moderate certainty of evidence (standardized mean difference −0.33, 95% CI −0.46 to −0.19) (Kulasinghe et al., 2022). However, none of the meta-analyzed trials included caregivers' psychopathology as a treatment target. Our finding that caregivers' psychopathology was also a reliable predictor of total parenting stress suggested that interventions may benefit from including caregivers' psychopathology as a novel therapeutic target.
in the CBCL 6-18 sample. This might be because behavioral problems tended to be more severe in the CBCL 6-18 sample than in the CBCL 1.5-5 sample. For the subgroup analysis by comorbid ADHD, the model features related to ASD core symptoms rose in rank in the sample with ADHD compared to the main analysis. This may indicate that additive deficits in the social domain of both ASD and ADHD contributed more to parenting stress than the predictors they displaced (Mikami et al., 2019).
This study has some limitations. First, the performance of the ML models was not confirmed on an external validation set. Second, given that the models did not achieve an AUC larger than 0.90, some potential predictors of parenting stress, such as family income, may have been missed. Third, a thorough interpretation of the MMPI was not possible because the MMPI should be addressed in the context of validity scales, but only clinical scales were employed as model features. Fourth, the SRS-2 and CBCL were rated by a caregiver, possibly leading to overestimation of the ASD patients' status, especially in those with high parenting stress (Schwartzman et al., 2021). Lastly, the study period was insufficient to investigate the impact of the COVID-19 pandemic on the parenting stress of this population, which calls for further studies.
In conclusion, we identified major predictors for each component of parenting stress in ASD patients' primary caregivers using explainable ML models. This study revealed specific components of caregivers' psychopathology, ASD patients' core symptoms, and behavioral problems that mainly contribute to parenting stress. Our ML models and the identified predictors could help alert physicians to whether a caregiver is at high risk of experiencing severe parenting stress and, if so, help them provide timely interventions, which could eventually improve treatment outcomes for ASD patients.

TABLE 1
The list of model features.

TABLE 2
Sample information.
a p-value for group difference; p < 0.05 indicates a statistically significant between-group difference.

TABLE 3
Model performances on the test set with the top 10 major predictors for subscales of parenting stress index (n = 496).
a Results of the model (RF or XGBoost model) with the higher ROC AUC on the test set are presented.
b