Prediction of Gait Impairment in Toddlers Born Preterm From Near-Term Brain Microstructure Assessed With DTI, Using Exhaustive Feature Selection and Cross-Validation

Aim: To predict gait impairment in toddlers born preterm with very-low-birth-weight (VLBW) from near-term white-matter microstructure assessed with diffusion tensor imaging (DTI), using exhaustive feature selection and cross-validation.
Methods: Near-term MRI and DTI of 48 bilateral and corpus callosum regions were assessed in 66 VLBW preterm infants; at 18–22 months adjusted age, 52/66 participants completed follow-up gait assessment of velocity, step length, step width, single-limb support and the Toddle Temporal-spatial Deviation Index (TDI). Multiple linear models with exhaustive feature selection and leave-one-out cross-validation were employed in this prospective cohort study: linear and logistic regression identified three brain regions most correlated with gait outcome.
Results: Logistic regression of near-term DTI correctly classified infants high-risk for impaired gait velocity (93% sensitivity, 79% specificity), right and left step length (91% and 93% sensitivity, 85% and 76% specificity), single-limb support (100% and 100% sensitivity, 100% and 100% specificity), step width (85% sensitivity, 80% specificity), and Toddle TDI (85% sensitivity, 75% specificity). Linear regression of near-term brain DTI and toddler gait explained 32%–49% variance in gait temporal-spatial parameters. Traditional MRI methods did not predict gait in toddlers.
Interpretation: Near-term brain microstructure assessed with DTI and statistical learning methods predicted gait impairment, explaining substantial variance in toddler gait. Results indicate that at near-term age, analysis of a set of brain regions using statistical learning methods may offer more accurate prediction of outcome at toddler age. Infants high risk for single-limb support impairment were most accurately predicted. As a fundamental element of biped gait, single-limb support may be a sensitive marker of gait impairment, influenced by early neural correlates that are evolutionarily and developmentally conserved.
For infants born preterm, early prediction of gait impairment can help guide early, more effective intervention to improve quality of life. What This Paper Adds: • Accurate prediction of toddler gait from near-term brain microstructure on DTI. • Use of machine learning analysis of neonatal neuroimaging to predict gait. • Early prediction of gait impairment to guide early treatment for children born preterm.


INTRODUCTION
At near-term age, the infant brain is rapidly developing (Brody et al., 1987;Dubois et al., 2006;Huang et al., 2006;Oishi et al., 2011;Nossin-Manor et al., 2013;Rose et al., 2014). Brain microstructure abnormalities assessed at this age have been found to correlate with neurodevelopment in preterm children, suggesting potential as early biomarkers for neurodevelopmental impairment (Mukherjee et al., 2002;Arzoumanian et al., 2003;Rose et al., 2007, 2009, 2015;Alvarez et al., 2011;Van Kooij et al., 2011;Woodward et al., 2012;Aeby et al., 2013). Although advances in neonatal medicine have improved outcomes among children born preterm, 40% of very preterm infants develop motor impairments such as cerebral palsy (CP) and developmental coordination disorder, rates that are substantially higher than the general population (Williams et al., 2010;Spittle et al., 2011). Neonatal identification of at-risk children could enable high-impact early intervention during optimal developmental periods of rapid growth and neuronal plasticity.
Diffusion tensor imaging (DTI) is a promising neuroimaging technique that reflects white matter (WM) microstructural injury and can be used to assess early brain development. DTI reveals the amount and direction of water diffusion. In the brain, water diffusion is restricted by neural development, in particular, the presence and isotropy of cellular membranes and myelination. Thus, brain DTI can be used as a metric of brain neurodevelopment and organization (Hüppi et al., 1998;Counsell et al., 2002;Basser and Pierpaoli, 2011). DTI quantifies fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (RD), and axial diffusivity (AD). FA represents the anisotropy of diffusion, i.e., the extent to which water diffuses in one particular direction (Basser and Pierpaoli, 2011); in WM it is altered by fiber coherence, diameter, density, and myelination. MD is the average amount of water diffusion; AD is the amount of diffusion occurring in the dominant direction or primary axis; and RD is the amount of diffusion occurring perpendicular to the dominant direction. Generally, higher FA and AD indicate more developed microstructure, whereas higher MD and RD indicate less developed microstructure. Brain development alters the dynamics of diffusion, e.g., decreased water content, contraction of extracellular space, myelination, and increased coherence of axonal structures (Kinney et al., 1994;Dubois et al., 2008;Nossin-Manor et al., 2013). The DTI metrics of FA, MD, AD, and RD are affected by these changes and therefore reflect brain development and maturation.
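The four scalars described above follow from the three eigenvalues of the diffusion tensor. The following is a minimal sketch using the standard textbook definitions, not the processing software used in this study; the function name `dti_scalars` and the eigenvalues are illustrative assumptions, not study data.

```python
import math

def dti_scalars(l1, l2, l3):
    """Return FA, MD, AD, and RD from the three diffusion tensor eigenvalues."""
    # Sort so that l1 is the eigenvalue along the dominant diffusion direction.
    l1, l2, l3 = sorted((l1, l2, l3), reverse=True)
    md = (l1 + l2 + l3) / 3.0   # mean diffusivity: average diffusion
    ad = l1                     # axial diffusivity: along the primary axis
    rd = (l2 + l3) / 2.0        # radial diffusivity: perpendicular to it
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy: 0 for isotropic diffusion, approaching 1 when
    # diffusion occurs along a single direction.
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}

# Made-up eigenvalues in mm^2/s, chosen only to illustrate the formulas.
coherent = dti_scalars(1.8e-3, 0.9e-3, 0.9e-3)
isotropic = dti_scalars(1.2e-3, 1.2e-3, 1.2e-3)
print(coherent["FA"], isotropic["FA"])
```

Both example voxels have the same MD, yet only the first has non-zero FA, which illustrates why the four metrics carry complementary information.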
For preterm infants, prognosis based on structural brain MRI findings has demonstrated partial success (Miller and Ferriero, 2009;Spittle et al., 2011;Hintz et al., 2015;Anderson et al., 2017). Children at high risk, such as VLBW preterm infants, generally undergo neuroimaging as standard-of-care prior to discharge from the neonatal intensive care unit (NICU). Although DTI is not currently obtained routinely in NICU clinical brain imaging assessment, it is a promising extension of neuroimaging techniques that may better identify WM microstructural injuries affecting early development (Arzoumanian et al., 2003;Rose et al., 2007, 2009, 2015). We previously reported on neonatal correlates of toddler gait, analyzing near-term DTI in six subcortical WM regions. We analyzed four bilateral regions and two regions of the corpus callosum (CC), which were selected based on previously reported relevance, using standard statistical techniques in the same cohort of VLBW preterm children (Rose et al., 2015). The current study aims to improve the predictive value of DTI assessment at this early age of brain development.
Here we include a broader set of brain regions that may more precisely predict gait impairments and ultimately, may inform neuroprotective treatment to improve outcomes for preterm children.
The current study applies exhaustive feature selection and leave-one-out cross-validation to WM in 99 brain regions, including 48 bilateral regions and three regions of the CC, in order to investigate the use of linear statistical models on DTI metrics for early prognosis of toddler gait impairment. A prior study of this cohort employed similar statistical learning methods to investigate prediction of cognitive and motor neurodevelopment, as measured by the Bayley Scales of Infant Development-Third Edition (Schadl et al., 2018). In this study, we employed a supervised statistical learning approach to determine the predictive value of near-term WM microstructure in VLBW preterm neonates in relation to temporal-spatial gait metrics at 18-22 months adjusted age. We hypothesized that a more comprehensive approach using DTI metrics of FA, MD, AD, and RD in a subset of three near-term brain regions, identified using the statistical learning approach of exhaustive feature selection and cross-validation, would demonstrate higher predictive value for gait impairment at 18-22 months adjusted age than standard techniques.

MATERIALS AND METHODS
Participants born with VLBW (birth weight ≤1500 g), gestational age at birth ≤32 weeks, and no evidence of genetic disorder or congenital brain abnormalities were recruited. 102 infants treated at Lucile Packard Children's Hospital (LPCH) NICU from 1/1/10-12/31/11 participated, representing 76% of eligible infants admitted over the 2-year period. All parents of eligible infants were approached prior to scheduled routine MRI and written informed consent was obtained for this IRB-approved prospective cohort study. 66 of 102 neonates had successful DTI scans at near-term age, collected at end of routine MRI scan, prior to discharge from the NICU.
Of the 66 neonates who had both near-term MRI and DTI, 52 completed follow-up gait assessment at 18-22 months of age, adjusted for prematurity (Table 1). Gait was assessed for 2-3 walking trials on an instrumented mat, as described previously (Cahill-Rowley and Rose, 2016a). Walking trials included at least four consecutive footfalls with at least one foot always touching the ground. Temporal-spatial gait metrics included walking velocity, step length, step width, and single-limb support as a percent of the gait cycle (SLS). These temporal-spatial parameters are accurately assessed at toddler age, reflect different aspects of gait function such as symmetry, single limb balance, dynamic postural balance, and overall gait function, and are sensitive to differences in gait pattern and impairment (Cahill-Rowley and Rose, 2016a,b). The Toddle Temporal-spatial Deviation Index (Toddle TDI), an assessment which quantifies deviation of temporal-spatial gait parameters from normal and is sensitive to gross motor function in toddlers (Cahill-Rowley and Rose, 2016b), was calculated. Gait impairment was defined as having a gait outcome score worse than one standard deviation from the mean value of a previously published typically-developing cohort (n = 42) at 18-22 months adjusted age (Cahill-Rowley and Rose, 2016a,b).
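The impairment definition above can be sketched as a simple threshold rule. This is an illustration only: the function name `is_impaired`, the `higher_is_worse` flag (which direction counts as "worse" differs by metric and is an assumption here), the use of the sample standard deviation, and the reference values are all hypothetical and are not the published normative data.

```python
from statistics import mean, stdev

def is_impaired(value, reference_values, higher_is_worse):
    """Flag a score lying more than 1 SD on the 'worse' side of the reference mean."""
    mu = mean(reference_values)
    sd = stdev(reference_values)  # sample SD; an assumption, not stated in the text
    if higher_is_worse:
        return value > mu + sd    # e.g., an unusually wide step width
    return value < mu - sd        # e.g., an unusually slow velocity

# Made-up reference velocities (cm/s) standing in for a typically developing cohort.
reference_velocity = [80.0, 90.0, 100.0, 110.0, 120.0]
slow_walker = is_impaired(70.0, reference_velocity, higher_is_worse=False)
typical_walker = is_impaired(95.0, reference_velocity, higher_is_worse=False)
print(slow_walker, typical_walker)
```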

Radiological Assessment
Structural MRI was assessed for degree of White Matter Abnormality (WMA) and significant cerebellar abnormality. Radiological evaluation was performed by an experienced pediatric neuroradiologist (XS) and confirmed by a second (KY); both were masked to all other participant data. A form validated for near-term neuroradiological assessment (Hintz et al., 2015) was used to score WMA (1-4) according to a widely used classification system (Horsch et al., 2010;Hintz et al., 2015): (i) extent of WM signal abnormality, (ii) periventricular WM volume loss, (iii) cystic abnormalities, (iv) ventricular dilation, and (v) thinning of the CC. High inter-rater agreement (96-98%) for moderate-severe WMA using this classification has been reported (Hintz et al., 2015). Significant cerebellar abnormalities included significant cerebellar lesions as defined by Hintz et al. and/or significant cerebellar asymmetry of ≥3 mm in the anterior-posterior or medial-lateral direction (Hintz et al., 2015). The structural MRI findings in this cohort were previously reported (Rose et al., 2015).
DTI was calculated from diffusion-weighted images (DWI) obtained along 25 orientations with slice thickness of 3 mm, matrix size of 128 × 128, and 90-degree flip angle on a 3T MRI (GE Discovery MR750, 8-Channel HD head coil) at LPCH at the end of routine MRI acquisition. A repetition of the DTI sequence was successfully collected in 64 of 66 cases; thus, in 64 subjects, a full scan was motion free. For 2 of 66 cases, a composite image was generated by manually selecting the best slices from two repetitions and combining them. Infants were swaddled and fed and typically remained asleep during the scan. Sedation typically was not utilized for routine near-term MRI and was not utilized as part of the research protocol.

DTI Processing
A trained inspector selected the best DTI repetition to eliminate MRI scans with artifacts or evidence of motion. As noted above, for 2 of the 66 cases, no usable full repetition was available, so a composite repetition was generated from the best image slices. Eddy current distortions were corrected by applying an affine transformation. Skull stripping was performed based on B0 and trace (vectorial sum of diffusivity) maps using a ROI editor, and images were manually rotated to align with the JHU neonatal template, which is based on a neonatal brain atlas integrating DTI data with co-registered anatomical MRI (Oishi et al., 2011). Scans were analyzed in a semi-automated, atlas-based manner, using DTI-studio with settings detailed in Oishi et al. (2011).
DTI images were processed in DiffeoMap, using the FA and trace maps to perform a large deformation diffeomorphic metric mapping transformation. Voxels with trace > 0.006 mm²/s and voxels with FA < 0.15 were considered cerebrospinal fluid (CSF) and gray matter, respectively, and were used to obtain the mask of WM regions. WM regions were then segmented into 126 regions based on the JHU parcellation atlas, and the average FA, MD, AD, and RD values were calculated for each region. The number of regions was then narrowed to apical regions, ensuring quality of registration, resulting in 48 bilateral regions in addition to the splenium and genu of the CC, and the overall CC (Table 2). Further examination was performed on the FA, MD, AD, and RD values of a total of 99 regions adjusted for postmenstrual age (PMA) at scan.
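The masking and regional averaging steps described above can be illustrated with a short NumPy sketch. It is not the DiffeoMap or DTI-studio implementation: the function names, the treatment of label 0 as background, and the toy arrays are assumptions introduced for illustration, while the two thresholds are taken from the text.

```python
import numpy as np

TRACE_CSF_THRESHOLD = 0.006  # mm^2/s; above this a voxel is treated as CSF
FA_GM_THRESHOLD = 0.15       # below this a voxel is treated as gray matter

def white_matter_mask(fa, trace):
    """True where a voxel is neither CSF (high trace) nor gray matter (low FA)."""
    return (trace <= TRACE_CSF_THRESHOLD) & (fa >= FA_GM_THRESHOLD)

def regional_means(scalar_map, labels, wm_mask):
    """Average a DTI scalar over the white-matter voxels of each atlas label."""
    means = {}
    for region in np.unique(labels):
        if region == 0:  # assumption: label 0 marks background
            continue
        voxels = wm_mask & (labels == region)
        if voxels.any():
            means[int(region)] = float(scalar_map[voxels].mean())
    return means

# Toy one-dimensional "image" of six voxels split across two atlas regions.
fa = np.array([0.10, 0.30, 0.40, 0.20, 0.05, 0.50])
trace = np.array([0.003, 0.003, 0.004, 0.007, 0.003, 0.002])
labels = np.array([1, 1, 1, 2, 2, 2])

mask = white_matter_mask(fa, trace)
fa_by_region = regional_means(fa, labels, mask)
print(mask.tolist(), fa_by_region)
```

The same `regional_means` call would be repeated for the MD, AD, and RD maps to obtain the four values per region.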

Statistical Analysis
For each temporal-spatial gait metric, including velocity, step length, step width, SLS, and Toddle TDI, distinct linear models were generated to examine their correlations with DTI measures. Using an exhaustive search in the feature space, multiple linear regression models were evaluated with leave-one-out cross-validation, L2 regularization, and regularization strength 1.0 to find a set of 3 regions (features) most correlated with gait metrics. Logistic regression models were evaluated with leave-one-out cross-validation, L2 regularization and regularization strength 1.0 on DTI to find a set of 3 regions that best classified high-risk infants scoring worse than one standard deviation from typically developing mean values previously reported (Cahill-Rowley and Rose, 2016a). Best models were selected based on the leave-one-out cross-validated, adjusted coefficient of determination (R²) for linear regression, and the leave-one-out cross-validated area under the curve (AUC) of the receiver operating characteristic (ROC) for the binary classification with logistic regression. Logistic regression models were also evaluated on the structural MRI for presence of WMA, cerebellar signal abnormality, cerebellar asymmetry, and intraventricular hemorrhage (IVH, grade 3 or 4). DTI scalars were adjusted for PMA at scan and normalized to have zero mean and unit variance. To ensure model generalization and robustness, and avoid overfitting, performance measures, i.e., adjusted R² and AUC, were evaluated with leave-one-out cross-validation (LOOCV), such that for both regression and classification tasks, N distinct models (N = number of subjects) using the same set of features were evaluated leaving out the n-th subject during the determination of the model parameters.
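The regression branch of this procedure can be sketched as follows. This is a NumPy stand-in under stated assumptions, not the Scikit-learn pipeline used in the study: it reads "regularization strength 1.0" as the ridge penalty `alpha = 1.0`, leaves the intercept unpenalized, and runs on synthetic data; all function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def loocv_ridge_predictions(X, y, alpha=1.0):
    """Predict each subject from an L2-regularized model fitted on all the others."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i          # leave subject i out
        Xtr, ytr = X[train], y[train]
        x_mean, y_mean = Xtr.mean(axis=0), ytr.mean()
        Xc = Xtr - x_mean                  # centering keeps the intercept unpenalized
        w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                            Xc.T @ (ytr - y_mean))
        preds[i] = y_mean + (X[i] - x_mean) @ w
    return preds

def adjusted_r2(y, preds, n_features):
    """Adjusted coefficient of determination from held-out predictions."""
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = len(y)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

def exhaustive_search(X, y, k=3, alpha=1.0):
    """Score every k-feature subset by cross-validated adjusted R^2; keep the best."""
    best_score, best_subset = -np.inf, None
    for subset in combinations(range(X.shape[1]), k):
        preds = loocv_ridge_predictions(X[:, list(subset)], y, alpha)
        score = adjusted_r2(y, preds, k)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score

# Synthetic data: 30 "subjects", 6 standardized features, 3 of them informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 1.0 * X[:, 4] + 0.1 * rng.standard_normal(30)

best_subset, best_score = exhaustive_search(X, y)
print(best_subset, round(best_score, 3))
```

The classification branch would follow the same loop with a logistic model and cross-validated AUC as the selection score.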
Ultimately, each model was evaluated on the left-out subject, and the N distinct results, paired with their ground truth values, were used to calculate the cross-validated performance metrics. For the classification tasks (high-risk vs. low-risk), balanced sensitivity and specificity were determined by maximizing the sum of the squares of sensitivity and specificity (sensitivity² + specificity²). Coefficients of logistic regression reported in Table 3 determine the magnitude and direction with which a feature contributed to the probability of considering a subject as high risk. A positive coefficient implies that a higher feature value increases the risk, whereas a negative coefficient indicates that a higher feature value lowers the risk and shifts the evaluation of the logistic function toward the normally developing range. Results were obtained using Scikit-learn (Pedregosa et al., 2011) and Statsmodels (Seabold and Perktold, 2010).
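The balanced operating point described above can be found by scanning candidate cut-offs on the cross-validated probabilities. The sketch below assumes that a subject is called high risk when the predicted probability is at or above the cut-off; the function name and the scores and labels are made up for illustration.

```python
def balanced_threshold(scores, labels):
    """Return the cut-off maximizing sensitivity^2 + specificity^2, with both rates."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for threshold in sorted(set(scores)):
        # A subject is called high risk when its score is at or above the cut-off.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        sensitivity = tp / positives
        specificity = tn / negatives
        objective = sensitivity ** 2 + specificity ** 2
        if best is None or objective > best[0]:
            best = (objective, threshold, sensitivity, specificity)
    return best[1], best[2], best[3]

# Made-up cross-validated probabilities and true labels (1 = high risk).
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

threshold, sensitivity, specificity = balanced_threshold(scores, labels)
print(threshold, sensitivity, specificity)
```

Squaring both rates penalizes operating points that buy high sensitivity at the cost of very low specificity, or the reverse.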

RESULTS
Near-term MRI and DTI were collected at 36.6 ± 1.8 weeks postmenstrual age in 66 children born preterm (28.9 ± 2.3 weeks postmenstrual age) with very-low-birth-weight (1090 ± 266 g). Follow-up gait temporal-spatial parameters were collected in 52 children at 20.2 ± 1.0 months adjusted age; all participants had complete neuroimaging and gait data sets. Table 3 reports the prediction of gait impairment classification based on DTI and MRI using logistic regression with exhaustive feature selection and cross-validation.
Clinical findings on structural MRI were also evaluated for their predictive value (Table 4) using logistic regression with LOOCV. Presence of WMA, cerebellar signal abnormality, cerebellar asymmetry, and IVH (grade 3 or 4) identified on MRI did not correctly classify children with impaired gait compared to children with typical gait based on TDI; no clinical findings on structural MRI correctly identified more than one child with a particular gait metric abnormality. Logistic regression of near-term structural MRI results did not correctly classify infants as high-risk for impaired velocity, step width, SLS, or step length. Models built on structural MRI assessments, which included only six metrics, were not sufficiently robust for their findings to hold under cross-validation. Thus, impairments in gait were not predicted from traditional MRI findings.
Gait temporal-spatial values were predicted using cross-validated linear regression on near-term DTI with exhaustive feature selection of three brain regions (Table 5). The three most predictive brain regions for gait explained 22% of variance in velocity; 34% in step width; 16 and 15% in right and left step length, respectively; 19 and 16% in right and left SLS, respectively; and 16% of variance in Toddle TDI.

DISCUSSION
Statistical learning is an area of statistics comprising a set of tools for modeling complex datasets. It blends with parallel developments in computer science, in particular machine learning, and has been successfully applied to numerous fields. It is also a promising method to improve prognostic accuracy and guide early treatment of preterm infants. We built supervised statistical models using exhaustive feature search applied to near-term brain microstructure assessed on DTI to predict temporal-spatial gait in preterm toddlers at 18-22 months adjusted age. Due to the multiple comparisons inherent to exhaustive search, leave-one-out cross-validation was applied to reduce over-fitting and optimize robustness of generalization. Application of exhaustive feature search with cross-validation on DTI generated relatively high predictive values, compared to standard techniques using structural MRI at near-term age.
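The scale of the multiple-comparisons problem mentioned above can be made concrete by counting candidate three-feature subsets. The first count treats each of the 99 regions as one feature; the second assumes that every region-metric pair (99 regions × 4 DTI metrics) is a separate candidate feature, which is an interpretation of the methods rather than a stated fact.

```python
from math import comb

N_REGIONS = 99      # regions analyzed, from the text
N_DTI_METRICS = 4   # FA, MD, AD, RD
SUBSET_SIZE = 3

# Subsets of 3 regions when a single DTI metric is considered at a time.
subsets_per_metric = comb(N_REGIONS, SUBSET_SIZE)

# Assumption: each region-metric pair is its own candidate feature.
n_features = N_REGIONS * N_DTI_METRICS
subsets_all_pairs = comb(n_features, SUBSET_SIZE)

print(subsets_per_metric)  # 156849
print(subsets_all_pairs)   # 10271580
```

With only 52 subjects, a search over this many candidate models can find well-fitting subsets by chance, which is why selection was based on cross-validated rather than in-sample performance.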
Infants were classified with high sensitivity and specificity as high-risk for gait impairment based on near-term WM microstructure (Table 3). Most commonly, the genu of the CC contributed to best performing logistic and linear models (for 3/6 gait parameters and 4/6 gait parameters, respectively) as one of the three selected features, suggesting its strong predictive value for gait metrics. The CC has been previously found to be associated with neurodevelopmental outcome. Anderson et al. (2006) examined 61 VLBW infants and found that poor growth of the CC length was associated with severe motor delay and cerebral palsy by age 2. Mathew et al. (2013) found associations between WM microstructure of the CC as assessed on DTI and motor function in eight very preterm infants. Rose et al. (2008) examined 23 preterm infants and found reduced FA mainly within the posterior regions of the CC. Malavolti et al. (2017) found that adverse motor outcome at 18 months corrected age was associated with smaller neonatal CC size in the posterior subdivision (p = 0.003).
FIGURE 4 | (A) Receiver Operating Characteristic curve of leave-one-out cross-validated classification of toddlers having step width above one standard deviation of the mean. (B) Balanced confusion matrix of leave-one-out cross-validated classification of toddlers having step width above one standard deviation of the mean.
Both the hippocampus and the inferior frontal gyrus contributed to several best performing logistic and linear models. The hippocampus contributed to logistic regression of 4/6 gait parameters (Table 3) and is involved in working memory. The inferior frontal gyrus contributed to logistic regression of 3/6 gait parameters (Table 3) and to linear regression of 3/6 gait parameters (Table 5), and has previously been shown to control motor responses (Swick et al., 2008).
Toddlers with impaired SLS time were most accurately predicted from near-term brain microstructure. This may be because, as a toddler learns to walk, achieving sufficient SLS requires single limb strength and balance as well as bilateral stability and symmetry. As a fundamental element of human biped gait, SLS may be a sensitive marker of toddler gait impairment influenced by early neural correlates that are evolutionarily and developmentally conserved. Classification with the logistic function fitted on the right fusiform gyrus FA, splenium AD, and genu FA predicted right SLS with 100% sensitivity and 100% specificity; the logistic function fitted on the right middle frontal gyrus RD, right superior occipital RD, and left lateral fronto-orbital gyrus FA predicted left SLS with 100% sensitivity and 100% specificity (Table 3). In addition, linear regression of left anterior limb of the internal capsule MD, genu RD, and right inferior frontal gyrus RD was most predictive of right SLS; and right retrolenticular part of the internal capsule RD, genu RD, and right superior occipital gyrus FA were most predictive of left SLS (Table 5).
Step width was also well predicted in the present study: linear regression with exhaustive feature search and cross-validation found that the left and right globus pallidus, along with the right tapetum, were predictive of step width, a gait metric that typically reflects development of postural balance.

Logistic regression with exhaustive feature selection was able to predict gait impairment, while standard MRI findings were not. Single-limb support (SLS) was perfectly predicted by logistic regression, bilaterally.
Step length (R): L cingulum cingular part (AD); L cuneus (FA); L superior occipital gyrus (FA); 0.28; 0.16
Coefficients of logistic regression models (Table 3) reinforce prior findings that in white matter regions with negligible crossing fibers, as compared to gray matter regions, fiber coherence is well measured and reflects neurodevelopment. The direction of the DTI metrics of FA, MD, and AD of the CC was as expected in affecting the probability of being high risk. Gray matter features are harder to interpret due to higher cortical connectivity, relatively later development, and associated variability in direction of DTI metrics. We found that prediction by the models using DTI outperformed models using structural MRI (Table 4), consistent with prior studies that found DTI provided higher predictive value for neurodevelopmental outcome compared to structural MRI (Arzoumanian et al., 2003;Rose et al., 2007;De Bruïne et al., 2013). In our comparison, however, we used metrics that were derived by manual inspection from structural MRI. In further studies, we encourage comparing and examining structural MRI that is segmented and assessed on a regional basis similar to our approach with DTI. Analysis with DTI using the statistical learning approach of exhaustive feature selection and cross-validation has potential to improve prognostic accuracy of neonatal neuroimaging. The clinical feasibility of using DTI is increased by advances in automated data processing that improve its ease of use, repeatability, and thus prognostic value. In this study we used a linear model, logistic regression, so both its implementation and interpretation are relatively straightforward. These methods could be implemented clinically to improve prognostic accuracy, if replicated in a larger population.
Individual patient DTI metrics of most predictive brain regions could be input into a simple spreadsheet to identify infants at high risk for cognitive and motor impairment. We previously reported data from the same cohort, and evaluated velocity and SLS with respect to WM and cerebellar abnormality as assessed on structural MRI, and in 6 different subcortical WM regions assessed on DTI, using standard partial correlation analyses (Rose et al., 2015). The MRI findings did not correlate with velocity or SLS; genu DTI did correlate with both velocity and SLS. DTI metrics of the other five regions (splenium, anterior limb of the internal capsule, posterior limb of the internal capsule, thalamus, and globus pallidus) did not.
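The spreadsheet-style calculation mentioned at the start of this paragraph amounts to evaluating a fitted logistic function on three standardized DTI values. The sketch below shows the arithmetic only: the coefficients and intercept are hypothetical placeholders, not values from Table 3, and the function name is an assumption.

```python
import math

def high_risk_probability(z_scores, coefficients, intercept):
    """Logistic-model probability of high risk from standardized DTI features."""
    linear_term = intercept + sum(b * z for b, z in zip(coefficients, z_scores))
    return 1.0 / (1.0 + math.exp(-linear_term))

# Hypothetical model for three standardized features (NOT taken from Table 3).
coefficients = [-1.2, 0.8, -0.5]
intercept = -0.3

# A negative coefficient means that a higher feature value lowers the risk.
lower_feature = high_risk_probability([-1.0, 0.0, 0.0], coefficients, intercept)
higher_feature = high_risk_probability([1.0, 0.0, 0.0], coefficients, intercept)
print(round(lower_feature, 3), round(higher_feature, 3))
```

Each input would first be adjusted for PMA and standardized with the same mean and variance used when the model was fitted.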
In the current study, the genu and splenium of CC, as well as fusiform gyrus, superior-occipital gyrus, lateral fronto-orbital gyrus, and right middle frontal gyrus contributed to 100% accurate prediction of SLS impairment. Further, the anterior limb of the internal capsule and retrolenticular part of the internal capsule and inferior frontal gyrus also contributed to explaining approximately 15% of the variation of toddler SLS, a sensitive gait metric that reflects gait stability, weight bearing, and symmetry. These findings suggest that a set of brain regions taken together may be more sensitive to outcome than a single region in isolation.
To ensure ease of interpretability we used linear models, which limit accuracy due to the highly non-linear nature of the solution space. Study limitations also include that we analyzed a relatively small cohort, which requires the use of statistical tools that are less robust compared to state-of-the-art machine learning approaches (e.g., deep learning), and that classification on an imbalanced dataset can be biased, even when using ROC-AUC or precision and recall as performance metrics. Furthermore, evaluation of cortical WM can be confounded by imaging resolution and signal-to-noise ratio. Methods outlined in this study need to be validated on larger preterm populations.
Applying machine learning algorithms on near-term regional WM microstructure may help identify risk of neurodevelopmental delay, guide early intervention and ultimately, may inform neuroprotective treatment to improve quality of life for preterm children. In this study, we employed an exhaustive feature selection algorithm to identify a set of 3 brain regions that best predicted outcomes. Results indicate a relatively high prognostic value for temporal-spatial gait parameters, in particular SLS, and warrant further investigation in larger preterm populations.

DATA AVAILABILITY
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
This study was approved by the Stanford University Institutional Review Board and consent was obtained from parents or guardians.

AUTHOR CONTRIBUTIONS
KC-R, RV, JR, and KS contributed to the concept, data collection, data analysis and interpretation, and writing the manuscript. KS, KY, and DS contributed to the concept, data analysis and interpretation, and writing the manuscript.