Multivariate Analysis and Machine Learning in Cerebral Palsy Research

Cerebral palsy (CP), a common pediatric movement disorder, causes the most severe physical disability in children. Early diagnosis in high-risk infants is critical for early intervention and possible early recovery. In recent years, multivariate analytic and machine learning (ML) approaches have been increasingly used in CP research. This paper aims to identify such multivariate studies and provide an overview of this relatively young field. Studies reviewed in this paper have demonstrated that multivariate analytic methods are useful in identification of risk factors, detection of CP, movement assessment for CP prediction, and outcome assessment, and ML approaches have made it possible to automatically identify movement impairments in high-risk infants. In addition, outcome predictors for surgical treatments have been identified by multivariate outcome studies. To make the multivariate and ML approaches useful in clinical settings, further research with large samples is needed to verify and improve these multivariate methods in risk factor identification, CP detection, movement assessment, and outcome evaluation or prediction. As multivariate analysis, ML and data processing technologies advance in the era of Big Data of this century, it is expected that multivariate analysis and ML will play a bigger role in improving the diagnosis and treatment of CP to reduce mortality and morbidity rates, and enhance patient care for children with CP.

neurodevelopment, and detect or predict CP. Neuroimaging such as magnetic resonance imaging (MRI) and cranial ultrasound is useful to detect structural changes [intraventricular hemorrhage, periventricular leukomalacia (PVL), etc.] in the newborn brain, monitor lesion progression, and assess treatment effects, although cranial ultrasound is less sensitive than MRI to lesions in the gray matter or malformations. Severe CP (caused by severe brain lesions such as PVL) can be identified by MRI or cranial ultrasound as soon as the lesions become recognizable on imaging after birth. However, 12-14% of children with CP have negative MRI scans due to subtle lesions in the brain (6). Thus, an integrated approach (imaging, motor assessment, and neurological examinations) is needed to predict mild or moderate CP.
To predict CP in infants, Prechtl has described a general movement assessment method as a clinical assessment approach to identify CP motor impairments in infants by evaluating their spontaneous general movements (7). In particular, two atypical motor development features [(1) the presence of cramped-synchronized general movements at a preterm or term age and (2) the absence of small smooth movements or fidgety movements at 3-5 months] have been defined (8)(9)(10), which can identify CP in high-risk infants reliably (11). However, only well-trained physicians can perform such an assessment, and general movement assessment based on visual observation by physicians is often influenced by subjective impressions and observer fatigue. Therefore, there is growing interest in developing multivariate and machine learning (ML)-based movement assessment tools for a more objective and quantitative motor assessment to detect movement impairments in high-risk infants (12,13).
Multivariate analysis is a statistical approach that evaluates multiple variables simultaneously. Compared with univariate analysis, it may offer advantages (e.g., freedom from some restrictive assumptions of univariate analysis) in identifying associations among multiple variables (e.g., variables associated with CP outcomes), grouping data into groups or subgroups (e.g., different CP subtypes), and developing new diagnostic tests (e.g., differentiating CP subtypes with key feature variables). Multivariate analysis includes statistical methods such as principal components analysis (PCA), canonical variate analysis, independent components analysis, and multivariate regression. ML (or statistical learning) is a group of multivariate analytic methods that first identify, in the training dataset, the data features or patterns that best separate the data into different classes, and then apply these features or patterns to the test dataset for data classification or prediction. ML has been increasingly applied to the biomedical field (14,15), and examples of ML methods include linear discriminant analysis (16), support vector machine (SVM) (17), artificial neural networks (ANN) (18), random forest (19), and cluster analysis (20).
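The train-then-test workflow described above can be illustrated with a minimal sketch. It uses a nearest-centroid classifier, chosen here only because it is short; it is not a method attributed to any of the reviewed studies. The two-feature data values, the class labels, and the function names `fit_centroids` and `predict` are hypothetical.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Training step: compute one mean feature vector (centroid) per class."""
    centroids = {}
    for label in set(labels):
        rows = [x for x, y in zip(features, labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Test step: assign x to the class with the closest centroid."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical training data: two movement features per subject.
train_X = [[0.9, 1.1], [1.2, 0.8], [1.0, 1.0], [3.1, 2.9], [2.8, 3.2], [3.0, 3.0]]
train_y = ["typical", "typical", "typical", "atypical", "atypical", "atypical"]

# Hypothetical held-out test data, never seen during training.
test_X = [[1.1, 0.9], [2.9, 3.1]]
test_y = ["typical", "atypical"]

centroids = fit_centroids(train_X, train_y)
predictions = [predict(centroids, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

Keeping the test subjects out of the training step is what allows the reported accuracy to estimate performance on new data.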
With growing interest, multivariate analysis has been increasingly employed in CP research in recent years, although research with multivariate analyses in CP is still in its infancy (14). To provide an overview of this relatively young field, a PubMed search was performed with the keywords "multivariate analysis cerebral palsy pediatric," "machine learning cerebral palsy," or "multivariate analysis cerebral palsy imaging." The search yielded 126 articles. Articles were excluded if their subjects were not pediatric, if the statistical methods used were not multivariate, or if the article was published before 1990. This paper assessed the studies that used multivariate analysis in CP research and found that multivariate studies in CP fall mainly into four categories: (1) risk factor identification; (2) detection of CP and identification of CP abnormalities; (3) movement assessment for CP prediction; and (4) outcome evaluation.

MULTIVARIATE ANALYSIS IN RISK FACTOR IDENTIFICATION
Early work on CP risk factor identification started from birth certificates. In a large population-based cohort study, data from birth certificates for 192 children with CP in four counties in California were compared with those of 155,636 healthy children in the same regions, and the study found that low birth weight and (early or late) gestational age at birth were associated with a high prevalence of CP, but early prenatal care and delivery at a hospital (for low birth weight neonates) were not associated with the prevalence of CP (21). Using multivariate analysis on clinical data of 113 CP infants (identified from 1,105 infants), Pinto-Martin et al. found that in low birth weight infants, cranial ultrasound imaging abnormalities such as parenchymal echodensities/lucencies (or ventricular enlargement) and germinal matrix/intraventricular hemorrhage were strong risk factors for disabling CP, but factors such as birth weight, gestational age, and Apgar score were not associated with it (22). In addition, a multicenter, large sample study (across eight European study centers, n = 585) revealed that there was a high rate of infection in mothers of CP children during their pregnancy and that major CP abnormalities on structural MRI included white-matter damage due to immaturity (e.g., PVL) (42.5%), lesions in the basal ganglia (12.8%), cortical or subcortical lesions (9.4%), and malformation (9.1%) (23). A number of studies that identified CP risk factors have performed both univariate and multivariate analyses (24)(25)(26), where the risk factors identified by multivariate analyses were a subset of those identified by univariate analyses (24,25), and the results of multivariate analyses were more rigorous and valid.
Further, the risk factors for CP revealed by the multivariate studies are useful to prevent CP. Several CP risk factors such as brain injury and infection can be managed and avoided by preventing their causative mechanisms, and preventive efforts such as rubella vaccination, anti-D vaccination, and preventing methylmercury contamination are effective in preventing CP (2). In addition, meta-analysis has indicated that CP may be reduced by 30% in premature infants (<32 weeks gestation) by giving magnesium sulfate to mothers in imminent labor for neuroprotection of their babies (43,44). Further, early interventions such as hypothermia have prevented CP in 12.5% of infants with neonatal encephalopathy following an acute intrapartum hypoxic event (45). Since there is currently no cure for CP, CP prevention is critical to reduce the prevalence of CP and save children from CP and CP-caused life-long disabilities. Multivariate analysis may help identify significant risk factors and early interventions in order to prevent CP. Taken together, multivariate analysis is important in the identification of risk factors for CP, and the risk factors identified, such as premature birth and abnormal (cranial ultrasound or structural MRI) imaging findings, are useful not only for CP cause identification and diagnosis but also for CP prevention. Further research is needed to identify more manageable and avoidable risk factors and early interventions (such as neuroprotective drugs or therapies) to prevent CP and reduce the CP morbidity rate.

MULTIVARIATE ANALYSIS IN DETECTION OF CP AND IDENTIFICATION OF CP ABNORMALITIES
Since the current average age for diagnosis of CP is around 2 years and infants have higher potential for neural recovery (1, 2), early detection of CP is critical to make early intervention possible. To identify CP in high-risk infants, neuroimaging such as cranial ultrasound and MRI is important for lesion detection and for determining the timing of the lesion. Studies that applied multivariate analytic methods to detection of CP and identification of CP abnormalities are summarized in the latter part of Table 1 (32)(33)(34)(35)(36)(37)(38)(39)(40)(41)(42). A multicenter study of very-low birth weight infants (n = 381; survival rate = 87%; 36, or 9.4%, with CP) indicated that cranial ultrasound findings such as grade 3-4 intraventricular hemorrhage and PVL were useful in predicting CP; in particular, PVL and ventriculomegaly were related to a high detection rate (≥30%) for CP (24). Further, to differentiate CP subtypes, Griffiths et al. examined the T2 MRI images of children with spastic or dyskinetic CP (n = 20 in each group), and found that patients with spastic CP had more severe injury to white matter near the paracentral lobule, while patients with dyskinetic CP had more injury to the subthalamic nucleus (STN) (33). Multivariate logistic regression further identified the associated factors (i.e., lesions in distinctly different anatomical locations) for differentiation of spastic and dyskinetic CP (33). When brain injuries in patients with CP are subtle, advanced imaging such as diffusion tensor imaging (DTI) and diffusion-weighted imaging is useful to detect them. DTI metrics such as mean diffusivity (MD) and fractional anisotropy (FA) are often used to identify injuries in white matter tracts.
The value of DTI in identifying degenerative changes in patients with spastic CP due to periventricular white matter injury has been demonstrated by an early study, which reported that children with spastic hemiparetic CP (caused by periventricular white matter injury) had reduced DTI fiber count on the ipsilateral side (the same side as the lesion) of the corticospinal tract (CST), corticobulbar tract (CBT), and superior thalamic radiation, and had MD and FA changes that reflected neurodegeneration of the motor and sensory pathways (n = 5) (46). Further DTI studies found that white matter damage in the posterior thalamic radiation pathways was more severe than that in the CSTs in children with CP (n = 28) (47), and that DTI abnormalities in several white matter tracts, such as the posterior thalamic radiation pathways or superior regions of the thalamocortical and corticomotor tracts, correlated with motor function measured by, e.g., Gross Motor Function Classification System (GMFCS) level (n = 28-34) (34,41,47). A review paper summarized the results of 22 DTI studies in CP and reported common findings of decreased FA (or increased MD) in the corticomotor and sensorimotor pathways, which correlated with clinical measures (48). Some research findings suggested that the CP injury in the somatosensory circuits might be more severe than that in the motor circuits, which may contribute more to motor impairment in CP (49). Further research is needed to unfold the mechanisms that underlie sensorimotor impairment in CP and to improve detection of CP through neuroimaging.
Apart from neuroimaging, the abnormalities of CP have been identified via multivariate analysis using patients' clinical data (perinatal history, CP type, CP frequency, motor function, degree of disability, etc.) and data from other sources such as electromyography (EMG) and bone mineral density (BMD) (32,35-40,42). For example, multivariate analyses have indicated that 85% of CP patients have oropharyngeal dysphagia (36), 13% have sleep disturbance (37), 70% have an abnormal optic nerve head (40), and 50% have low BMD (42). Factors associated with these CP abnormalities have also been identified (35,37,39). In addition, since neonatal encephalopathy can cause CP, detection of neonatal encephalopathy helps detect potential CP. Structural brain connectivity networks of infants with neonatal encephalopathy have been examined using diffusion tractography extracted from DTI images, and ML methods such as SVM have been applied to structural connectivity features to detect neonatal encephalopathy (50). Moreover, since epilepsy and seizure disorders are common in children with CP, electroencephalography (EEG) is used to detect co-occurring seizures in high-risk infants or children. ML approaches such as SVM and ANN have been applied to EEG features to identify ictal and interictal spikes and have achieved a high detection rate for seizures (51). Further, multivariate analysis has found that children with CP after perinatal or neonatal stroke are more likely to have severe disability, cognitive impairment, or epilepsy than children with CP after delayed stroke (32).
Taken together, multivariate analytic studies in CP detection have identified imaging markers such as intraventricular hemorrhage and PVL on cranial ultrasound (24), injury to white matter near the paracentral lobule or to the STN on T2 MRI images (33), and injury in the CSTs and the posterior thalamic radiation pathways on DTI images (34,41,46,47), which are useful in detecting CP and differentiating CP subtypes. Multivariate analyses have also identified CP abnormalities and their associated factors from non-imaging data (32,35-40,42). Further research is needed to identify biomarkers at the early stages of the disease to improve the diagnosis of CP, reduce diagnosis delay, and allow early identification and intervention for CP.

ML IN MOVEMENT ASSESSMENT FOR CP PREDICTION
Studies that reported applications of ML in movement assessment for CP prediction are summarized in Table 2 (52)(53)(54)(55)(56)(57)(58)(59)(60)(61). Early identification of motor impairments in high-risk infants enables early detection of CP. The two atypical movement features (related to the cramped-synchronized general movements and the absence of fidgety movements) in general movement assessment are strong predictors for CP diagnosis (8)(9)(10). Based on these key motor impairment features, their movement characteristics and associated movement variables have been identified to detect movement impairments in high-risk infants (53)(54)(55)(57)(58)(59). ML approaches have made it possible to analyze recorded movement data and identify motor impairments automatically.
In a pioneering study, Meinecke et al. analyzed the 3D movement data of infants (n = 22, seven with CP) from video recordings, extracted an optimal combination of movement features with cluster analysis, and identified CP motor impairments with quadratic discriminant analysis (overall detection rate: 73%) (52). Further, the characteristics of fidgety movements and associated movement measurements have been identified to distinguish infants with movement impairments from those without (53,54,58). For example, extracting motion features related to fidgety movements (such as motion distance and relative frequency) from video recordings with an optical flow-based method, Stahl et al. examined the motion patterns of 82 infants (15 with CP), applied an SVM classifier to detect CP movement impairments, and achieved a good classification accuracy (93.7 ± 2.1% with features of relative frequency; sensitivity: 85.3 ± 2.8%; specificity: 95.5 ± 2.5%) (58).
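The accuracy, sensitivity, and specificity figures quoted for these classifiers all derive from the same four counts of a two-class confusion matrix. The sketch below shows that arithmetic under stated assumptions: the labels are hypothetical (1 = impairment present, 0 = absent) and do not reproduce data from any cited study, and `classification_metrics` is an illustrative helper name.

```python
def classification_metrics(true_labels, predicted_labels, positive=1):
    """Return sensitivity, specificity, and accuracy for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive and pred == positive:
            tp += 1  # impaired infant correctly flagged
        elif truth == positive:
            fn += 1  # impaired infant missed
        elif pred == positive:
            fp += 1  # unimpaired infant wrongly flagged
        else:
            tn += 1  # unimpaired infant correctly cleared
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true positives detected
        "specificity": tn / (tn + fp),  # fraction of true negatives detected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical labels for 10 infants (4 with impairment, 6 without).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(truth, predicted)
print(metrics)
```

When the group with CP is much smaller than the control group, as in most of the studies above, accuracy alone can look high even if many affected infants are missed, which is why sensitivity and specificity are reported alongside it.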
Apart from video recordings, other movement recording systems such as accelerometers and electromagnetic movement tracking systems have been employed for movement assessment and CP prediction. Heinze et al. examined the general movements of a group of newborns and infants (n = 23, 4 with CP) with accelerometers, selected optimal (combined) movement features with a genetic algorithm, classified CP motor impairments with a decision tree-based classifier, and obtained overall detection rates of 88-92% (55). To distinguish the gait patterns of patients with CP (n = 4), patients with multiple sclerosis (n = 4), and healthy controls (n = 12), Alaqtash et al. extracted gait features from 3D ground reaction force data, compared the gait patterns of the three groups, and applied a nearest-neighbor classifier and ANN to gait feature classification, which led to overall classification accuracies of 85% (ANN, with a combination of gait features) and 95% (after optimizing the gait features to an optimal set of six gait features) (56). Moreover, using electromagnetic movement tracking recordings, Karch et al. studied the general movements of 63 infants (10 with CP), extracted movement features such as joint centers, and computed stereotype scores with dynamic time warping, yielding a high CP classification accuracy with the stereotype score of upper limb movement (sensitivity: 90%; specificity: 96%) (57). For a review on movement recognition techniques in general movement assessment for CP prediction in high-risk infants, see Ref. (12).
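Dynamic time warping, mentioned above as the basis of the stereotype scores, compares two time series while allowing one to be locally stretched or compressed in time. The following is a minimal sketch of the textbook dynamic-programming form of the distance, not the scoring procedure of Karch et al.; the trajectories are hypothetical one-dimensional joint-angle traces and `dtw_distance` is an illustrative name.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats a sample
                cost[i][j - 1],      # b advances, a repeats a sample
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Hypothetical trajectories: `delayed` is `reference` with a held first sample,
# `larger` has the same timing but twice the amplitude.
reference = [0, 1, 2, 1, 0]
delayed = [0, 0, 1, 2, 1, 0]
larger = [0, 2, 4, 2, 0]

same_shape = dtw_distance(reference, delayed)
different_shape = dtw_distance(reference, larger)
print(same_shape, different_shape)
```

Because timing differences are absorbed by the warping, a repeated movement performed at a slightly different speed still scores as similar, which is the property that makes the measure suitable for quantifying stereotyped (repetitive) movement.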
In addition, multivariate and ML approaches have been used in the assessment of physical therapy and of the effect of orthotic devices such as ankle foot orthosis on CP patients (55,56). To evaluate the quality of exercises in CP physical therapy, Parmar and Morris (n = 5) applied four classifiers (SVM, neural networks, AdaBoosted decision tree, and dynamic time warping) to the classification of movement features (joint and angle data in the time or frequency domain) to identify correct or wrong exercises, and found that among the four classifiers, the AdaBoosted decision tree performed the best, with high classification accuracies (94.68% for joint data; 90.3% for angle data) (61).
However, these machine-learning-based movement assessment studies are at an early stage of research. For example, the subjects in the study of Parmar and Morris (61) were healthy subjects (n = 5) and the movement data of wrong exercises (with errors) were simulated; thus, the results were preliminary. Further research is needed to apply machine-learning methods to real movement data of patients with CP. In addition, the sample sizes of patients with CP in these machine-learning-based [...] speech-language development of new-born babies and infants at risk and automatically detects neurodevelopmental disorders such as CP by multidimensional data analysis and machine-learning approaches (13). Although it is technically challenging, the fingerprint model enables neurological assessment of at-risk infants in an objective and quantitative manner and facilitates early detection of CP and other neurodevelopmental disorders, which may be the future direction of pediatric clinical practice.

MULTIVARIATE ANALYSIS IN CP OUTCOME EVALUATION
Although there is currently no cure for CP, treatment effects and outcomes in CP patients have been studied extensively. Multivariate approaches have been applied to outcome assessment (including survival analysis) in CP, and Table 2 (the latter part) provides a summary of these studies (62)(63)(64)(65)(66)(67)(68)(69)(70)(71)(72)(73)(74)(75)(76)(77)(78). The majority of the outcome studies employed a two-step approach: first, univariate analysis is used to identify variables that are associated with outcome; second, multivariate analysis is used to further examine the variables indicated by the univariate analysis and identify outcome predictors. A large sample of children with CP (n = 4,007) in the UK was studied, and multivariate survival analysis indicated that the death rate was ~8%, that 85-94% of the children survived to age 20 years, and that the best predictors of CP survival were the number and severity of impairments (62). The multivariate outcome studies in CP fall into three categories: (1) outcome evaluation of medication and supportive treatments; (2) surgical outcome evaluation; and (3) quality of life (QOL) evaluation.

Outcome Evaluation of Medication and Supportive Treatments
The effect of the commonly used medication oral baclofen on children with CP has been assessed with a multivariate model for population pharmacokinetic analysis (n = 61), and it has been found that baclofen dosage based on body weight was appropriate to treat patients (≥2 years old), and that determinants of apparent clearance in these children included body weight, a possible genetic factor, and age (70). Plasticity (shown as increased FA in the CSTs on DTI and improved motor function measures) induced by combined therapy (botulinum followed by physiotherapy) in children with spastic quadriplegia (n = 8) has been reported (80), while a later DTI study indicated that the addition of botulinum to physiotherapy did not influence the outcome at 6 months in children with spastic diplegic CP (n = 18) (81). DTI has also been used to evaluate the motor function outcomes of hemiplegic CP patients after rehabilitation treatment, and DTI measurements such as the fiber number and FA of bilateral CSTs were correlated with the functional level of hemiplegia scale (82). The quality of exercises in CP physical therapy has been evaluated with several classifiers, and the AdaBoosted decision tree achieved a good detection rate for exercise errors (61). In addition, multivariate outcome evaluation of therapies such as transcranial direct current stimulation (tDCS) has been performed (75).
Grecco et al. investigated the functional outcome of tDCS in children with CP (n = 56), and multivariate logistic regression analyses identified that the presence of motor evoked potential was a predictor for walk test and gait speed, subcortical injury was a predictor for gross motor function, and both were predictors of the motor function gain arising from tDCS combined with gait training in these patients (75). However, there are few outcome studies of medications and supportive treatments in CP using multivariate analyses, and multivariate analysis may play a bigger role in such outcome evaluation to reveal the true therapeutic effects of these treatments and their outcomes in CP patients.

Surgical Outcome Evaluation
[...] and found that 6.4% of patients developed deep wound infection following surgery, and presence of a gastrostomy/gastrojejunostomy tube was the factor associated with infection (69). In addition, Kato et al. investigated the cervical spine in patients with athetoid CP who underwent posterior decompression surgery (n = 31), and multivariate analysis showed that pedicle sclerosis was associated with a higher risk of breach of cervical pedicle screws (71). Further, Minhas et al. evaluated the effect of body mass index class on complications after orthopedic surgery in children with CP (n = 1,746), and multivariate logistic regression analysis revealed that underweight status was the risk factor for complications in osteotomies and spine surgery (76). The risk factors identified by these studies are helpful to avoid surgical complications and improve surgical treatments in CP.
In children with CP who underwent surgery, intraoperative neuromonitoring (IONM) often fails (failure rate 61%) (74). Mo et al. studied IONM in children with CP who underwent surgical scoliosis correction (n = 206), and multivariate logistic regression analysis revealed that PVL, hydrocephalus, and encephalomalacia were the predictors of poor IONM signals, while moderate or marked hydrocephalus and encephalomalacia were the predictors of no signals (74). Further, outcome prediction of CP surgical procedures has been explored in a recent study. Galarraga et al. examined children with CP who underwent (hip, ankle, foot, etc.) surgery (n = 115), and multiple regression analysis revealed that preoperative and surgical data could predict postoperative kinematics, and that mean prediction errors (varying from 4° to 10°) were smaller than the variability of gait parameters (77). These results are encouraging because they indicate that the postsurgical kinematics of patients with CP can be predicted with relatively small mean prediction errors using presurgical and surgical data, which allows an estimate of postsurgical outcome ahead of time.
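Judging a prediction error against the natural variability of the predicted quantity, as in the gait kinematics result above, amounts to comparing two summary statistics. A minimal sketch follows, assuming mean absolute error as the error measure and sample standard deviation as the variability measure; both choices, and the joint-angle values in degrees, are hypothetical and not taken from the cited study.

```python
from statistics import mean, stdev


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return mean([abs(o - p) for o, p in zip(observed, predicted)])


# Hypothetical postoperative joint angles (degrees) for five patients.
observed = [12.0, 25.0, 8.0, 30.0, 18.0]
# Hypothetical model predictions made from preoperative and surgical data.
predicted = [15.0, 21.0, 10.0, 26.0, 20.0]

mae = mean_absolute_error(observed, predicted)
variability = stdev(observed)  # spread of the observed outcome across patients

# A prediction is informative only if its error is below this spread;
# otherwise predicting the group mean for everyone would do as well.
informative = mae < variability
print(mae, round(variability, 2), informative)
```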

QOL Evaluation
Quality of life in physical ability, intellectual ability, self-care, and other aspects of life is an important outcome in CP. Multivariate analysis has been frequently used to assess QOL in patients with CP (65,66,68,72,78), and factors associated with physical QOL and self-care have been identified. For example, a multivariate analysis of QOL data of infants with CP (n = 92) identified GMFCS and intellectual capacity as the factors associated with the development of self-care activities, and GMFCS as the factor associated with the development of mobility activities (72). Further, a recent multivariate analysis showed that physical activity was positively associated with physical and total QOL in patients with CP (n = 128), and walking performance was positively associated with physical QOL (78). The factors identified by these studies may help improve the QOL of patients with CP.
Taken together, since there is no cure for CP yet, and the death rate of CP is high (~8%), there is much to do to improve the outcomes of CP, and multivariate analytic approaches may play a bigger role in meeting such clinical demands. Surgical outcome predictors and risk factors for complications in CP surgical treatments have been identified by a number of multivariate outcome studies (63,64,67,69,71,73,76), which are useful not only for outcome evaluation and prediction but also for avoiding complications and improving surgical treatments in CP. However, there are few outcome studies for medications and supportive treatments (such as physical therapy) in CP using multivariate analysis. Thus, further research is needed to evaluate the outcomes of medications and supportive treatments, and multivariate analysis may play a bigger role in such outcome evaluation to reveal the true therapeutic effects of these treatments and their outcomes in CP patients, and help improve the outcomes of these treatments for patients with CP.

SUMMARY
Multivariate analysis has been applied to several areas in CP research such as identification of risk factors for CP, detection of CP and identification of CP abnormalities, movement assessment for CP prediction, and outcome assessment. The studies reviewed in this paper have demonstrated that multivariate analytic and ML approaches have made it possible to analyze movement recordings and identify CP movement impairments automatically. In addition, outcome predictors for surgical treatments have been identified by multivariate outcome studies. To make the multivariate analytic and ML approaches useful in clinical settings, further research with large samples is needed to verify and improve these methods in CP detection, movement assessment, and outcome evaluation/prediction. As multivariate analysis, ML and data processing technologies advance in the era of Big Data, it is expected that multivariate analysis and ML will play a bigger role in improving the diagnosis and treatment of CP to reduce mortality and morbidity rates, and enhance patient care for children with CP.

AUTHOR CONTRIBUTIONS
JZ reviewed the multivariate analytic studies in cerebral palsy and wrote up the manuscript.

ACKNOWLEDGMENTS
This work was inspired by the clinical team led by Dr. Manish Shah at the University of Texas Health Science Center (in Houston) and affiliated hospitals who are dedicated to patient care for children with cerebral palsy. Proofreading was kindly provided by Drs. Gary and Carla Brandenburger.