Defining a universal measurement unit and scale for gross motor development

Introduction: The ability of children to accomplish progressively more difficult gross motor tasks follows a predictable sequence that has been well documented as part of development. Existing instruments were developed independently using classical test theory methods, which has led to the lack of a universal measurement scale and unit. The purpose of this study was to test a specification equation, anchored to commonly accepted and reproducible tasks in gross motor development, to generate a universal measurement scale and unit of measurement, called the Gross Motor (GM) unit.

Methods: We rated component measures for each of the gross motor development tasks on the Gross Motor Function Measure-66 (GMFM). The GMFM is a gross motor development measure created with Rasch measurement theory to quantify observed difficulty levels measured on an interval scale. Component measures for body position, movement, and support were based on their hypothesized, theory-derived contributions to gross motor development. Forward stepwise linear regression was used to test a specification equation. The specification equation was anchored to reference points to fix a unit size.

Results: Our specification equation explained 87% of the variance in observed gross motor task difficulty. Predicted difficulty for gross motor tasks was strongly associated with observed task difficulty (r = 0.94, p < 0.0001). Our specification equation was anchored to (1) lying supine (0 GM units) and (2) walking unsupported (100 GM units), setting the size of the GM unit to 1/100 of the distance between lying supine and unsupported walking.

Discussion: Our specification equation allows for experimental testing of gross motor development theories. This approach provides a framework for refining our understanding and measurement of gross motor development and creates a universal scale and unit. We expect that this will facilitate placing many, if not all, current gross motor development instruments on the same measurement scale.


Introduction
The progression of gross motor development during childhood is well known, in part, because it is an observable, global human experience. Gross motor development typically occurs through predictable sequencing and patterns including proximal to distal stability, cephalic to caudal progression of movement against gravity, movement of the body in stationary positions which allows for the development of balance and coordination skills, and finally manipulation of objects within the environment for further exploration (1). Identifying a child's location along the development progression is important because early detection and prevention of gross motor delays have proven to be effective and efficient in maximizing physical health (2). Measurement of gross motor development allows for quantifying a child's location and is imperative to identify any delays. Currently, there are more than 200 pediatric assessments or instruments used to measure a child's gross motor development (3-5). There are several challenges that can arise from having multiple measurement instruments (6, 7), but the greatest is arguably the inability to easily equate measures across tools for meaningful decision making (8, 9).
Measures from existing instruments cannot be equated because of two critical problems: (1) they have been developed from data rather than theory, and (2) they lack a universal measurement unit that is standardized and reproducible (10-14). Developing instruments from data sets with group-level analytical approaches will always cause measures to be inextricably tied to the sample studied (i.e., sample dependent) (15). Sample dependent measures are very common in social and behavioral sciences because of classical test theory methodology for instrument development (10, 12). Classical test theory emphasizes developing test items that describe an underlying construct and analyzing the items' performance collectively by examining the relationship between total scores and other metrics across groups to establish validity (15). This methodology causes scores or tests to be the focus of measurement rather than the underlying constructs that contribute to the observable phenomenon of interest (10). The contrasting approach would be to develop measures from mathematical models built from testing underlying constructs informed by an overall theory for explaining the phenomenon of interest (13, 16-19).
The physical sciences provide an ideal example of this theory-driven approach for measurement development (10). An example of this approach is how temperature is measured. Physicists initially observed temperature by the expansion of liquid in glass tubes (20). This then led to a series of experiments by Daniel Fahrenheit that showed temperature manipulation yielded a reliable change in mercury's expansion (21). The amount of chemical expansion could then be mathematically modeled to derive a formula, or specification equation, for quantifying temperature (21). This methodological approach demonstrates how construct theory for an observed phenomenon (i.e., a chemical's expansion is explained by temperature) can be exposed to falsification. Karl Popper's Falsification Principle states that a theory is falsifiable or refutable if it can be disproven by experiments and empirical evidence (22). Falsification can be accomplished by testing the accuracy of a specification equation to predict changes in the construct. When contributing components to a specification equation are not able to accurately predict changes in the construct then the theory underlying the specification equation has been falsified or disproved. Daniel Fahrenheit demonstrated this by testing whether mercury expanded by the mathematically predicted amount when exposed to a specific temperature (21). The ability to expose theory to falsification provides a better understanding of the construct theory and a clear path for how to improve the instrument and theory (16).
Another critical problem with measures for gross motor development is the lack of a universal measurement unit (10, 11, 16, 17). A measurement unit is defined as a "scalar quantity, defined and adopted by convention, with which any other quantity of the same kind can be compared to express the ratio of the two quantities as a number" (23). Existing instruments rely on total scores that are not composed of linear or interval units, which has led to measurement scales that are independent from one another without comparable quantities (15). As a result, the meaning of measurement values must be derived through relation to normative data (e.g., T-scores), predictive ability, or showing differences across groups (e.g., normal children vs. individuals with disability) rather than through a comparable unit connected to theory (10). The emergence of item response theory and Rasch measurement theory has made it possible to create linear and interval measurement units for many existing scales (12). However, the measurement scales derived from item response theory models are limited to local objectivity, and their measurement units are not analogous because of sample dependency (10). The physical sciences have been able to achieve general objectivity through imposing anchors and standardized unit sizes (10, 13). For example, the Celsius measurement scale is anchored to the temperatures at which water freezes (0°C) and boils (100°C), with 1°C (i.e., the unit size) equal to 1/100 of the distance between these two points (20). Stenner and colleagues (10, 16, 17) have argued that anchoring measures provides general objectivity because the unit size and scale are based on theory rather than observations from a specific sample of people (i.e., local objectivity). Anchor points that are reproducible and widely recognized provide valuable reference points for interpreting measurement values across the scale (10). They also allow for consolidating instruments by placing them on the same scale (10, 11).
Given the proliferation of gross motor development measures and critical concerns related to instrument development, there is an explicit need to develop an explanatory model based on theory and create a universal measurement unit of gross motor development. An explanatory model will facilitate exposing the theory of gross motor development to falsification through the creation and testing of a specification equation (10, 13, 16). Additionally, developing a universal measurement unit and imposing general objectivity will equate existing measures to a standardized scale and provide meaning to units that is reflective of gross motor development theory (10, 11). The best example of this measurement approach being applied in the social and behavioral sciences is the Lexile measurement scale and unit (Lexile; L) for reading ability and passage difficulty developed by Stenner (10). The Lexile was developed from theory by testing the relationship between reading difficulty and various hypothesized variables thought to contribute to how challenging a passage of text is to comprehend. The Rasch measurement model calibrated passage difficulty and an individual's reading ability onto a linear, interval scale, allowing the research team to quantify the phenomenon of reading comprehension. Stenner then used linear regression to derive a specification equation for testing the relationship between hypothesized variables explaining passage difficulty and actual passage difficulty (10). This process exposed their theory to falsification by testing which hypothesized variables accounted for the most variance in passage difficulty (10). Stenner demonstrated that a semantic component (mean log of word frequencies in a passage) and a syntactic component (log mean sentence length in a passage) were all that was required to explain the variance in passage difficulty and, by extension, a person's reading ability (R² = 0.85) (10). Stenner then established general objectivity through a universal unit of reading ability by algebraically anchoring the specification equation to derive Lexiles. The specification equation was anchored to the difficulty of a set of basal primer texts (200 Lexiles) and encyclopedia texts (1,200 Lexiles), with one Lexile equal to 1/1,000 of the distance between the two sets of texts (10). The Lexile measurement scale has since equated reading comprehension tests and provided meaning for interpreting the difficulty of a book or passage and the reading ability of a person (10). The Lexile scale is recognized as the standard for matching readers with texts, being reported for 65 popular reading assessments/programs (24). Over 35 million students per year receive a Lexile measure, allowing them to be matched with over 100 million articles, books, and websites (24).
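The two-point anchoring described for the Lexile can be written as a single linear transformation. The block below is a sketch only: the symbols d (the specification equation's predicted difficulty for a passage), d_primer, and d_encyclopedia (the predicted difficulties of the two anchor text sets) are introduced here for illustration and do not appear in the source, which does not report their numeric values.

```latex
% Sketch of two-point anchoring, assuming a linear specification equation.
% d, d_{\mathrm{primer}}, d_{\mathrm{encyclopedia}} are illustrative symbols.
\[
  L(d) \;=\; 200 \;+\; 1000 \times
  \frac{d - d_{\mathrm{primer}}}{d_{\mathrm{encyclopedia}} - d_{\mathrm{primer}}}
\]
% Check: L(d_{\mathrm{primer}}) = 200 and L(d_{\mathrm{encyclopedia}}) = 1200,
% so one Lexile is 1/1000 of the distance between the two anchors.
```

The same form applies to any pair of anchors: the additive term fixes the position of the low anchor, and the multiplier fixes the unit size.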
Stenner and colleagues have long advocated for the methodology used in creating the Lexile to be applied in the social and behavioral sciences (10, 17). Recently, other research groups have applied a methodology similar to Stenner's. These groups include Stenner and Smith (11), Fisher (25), Hong and colleagues (26), Adroher and Tennant (27), and Melin and Pendrill (28-33). In all cases, the research groups created a specification equation using linear regression, principal component regression, or linear logistic test models composed of component measures to explain the majority of variance in the observed construct of interest. Constructs of interest were quantified on linear interval scales using Rasch measurement theory models and included visual attention and short term memory with items from the Knox Cube Test (16, 32, 33) (31), auditory learning with items from the Rey's Auditory Verbal Learning Test (30), and balance with items from the Berg Balance Scale (28). Component measures consisted of continuous or ordinal variables that represent underlying constructs hypothesized to causally contribute to the observed phenomenon. The observed phenomenon was represented and quantified by the item difficulty hierarchies for each construct. The majority of the tested component measures were logically related to task characteristics of the items (16, 25-33). Recently, Melin and Pendrill have also included entropy and measurement uncertainty in their specification equations to account for additional variance (28-33). While each group has been able to account for significant variance (77.5%-94%) (16, 25-33) with their specification equations, which is comparable to Stenner's Lexile work (85%) (10), there have not been any attempts to anchor their specification equations to create universal measurement units.
The body of literature above demonstrates that a theory-based approach can be used to derive mathematical measurement models in the social and behavioral sciences. Furthermore, the Lexile methodology demonstrated the ability to create a universal unit and measurement scale with general objectivity by imposing recognizable anchor points. Application of this methodology to gross motor development should provide a similar quantification of theory and a universal unit for the field. The purpose of this study was to develop a specification equation and a universal measurement unit for gross motor development anchored to well-recognized reference points. We hypothesized that our specification equation would explain significant variation in observable gross motor development.

Methods
Our methodological approach to develop a specification equation and universal measurement unit for gross motor development included the following steps: (1) selection of gross motor development tasks with observable difficulty levels that are measured on an interval scale, (2) development and scoring of component measures based on hypothesized contributions to gross motor development, (3) testing a specification equation using linear regression modeling, and (4) imposing reference points and unit size by anchoring the specification equation. All analyses were completed in SAS version 9.4.

Selection of gross motor development tasks with observable difficulty levels
We used the Gross Motor Function Measure-66 (GMFM) as a set of gross motor development tasks with observable difficulty levels. The GMFM is a clinician-observed outcome measure for quantifying child motor development (35). Items were developed with respect to theoretical motor development milestones based heavily on clinical observation of normal child development from 0 to 5 years old (35). Items include tasks that span movements in supine through jumping and hopping. Avery and colleagues (36) reduced the original GMFM from 88 to 66 items by using Rasch measurement theory to identify the set of items that best contributed to unidimensionality of gross motor development. Avery and colleagues (36) used the partial credit model to quantify observed item difficulty along a linear, interval scale using GMFM-66 data from a sample of 537 children with cerebral palsy. We defined gross motor development by increasing difficulty of gross motor tasks. Observed item

Development and scoring of component measures for motor development tasks
We identified body position, movement, and support as potential component measures for creating a formula to measure gross motor development. We developed an ordinal rating system for each component based on theoretical concepts of gross motor development and control. Each ordinal rating system was created using task analysis to facilitate reproducibility of component measure scores for raters with knowledge of movement (i.e., task) requirements. Since the tasks on the GMFM-66 are reflective of typical development (35), we selected and rated our component measures based on task characteristics performed by healthy children. Evidence to support our ordinal rating system was based on published observational studies of typical human development and pediatric rehabilitation (35, 37-42). Body position was rated with respect to the theoretical concept that gross motor task difficulty increases as head position and a person's center of mass are further from the ground or are over a smaller base of support (37-39). Movement was rated with respect to established motor development milestones and task difficulty (37, 38, 40). Support was rated based on the amount of support involved in completing the task (35). We quantified support based on the concept of proximal to distal progression of motor control development and considering that more support makes motor tasks easier (41-43). Each member of our authorship team gave each item on the GMFM-66 a rating for one of the three component measures. Three of the authors reviewed the component measure ratings and identified items that did not have consensus for discussion. Items without consensus were discussed and a final decision was made. A full description of the ordinal ratings for each component is presented in Table 1.

Testing a specification equation using linear regression modeling
We used linear regression to create a specification equation for measuring gross motor development (10, 16, 25, 26). We used the observed difficulty of all items on the Gross Motor Function Measure-66 (GMFM) as the dependent variable and tested the ability of the component measures (i.e., body position, movement, and support) to explain the variance in gross motor task difficulty. Pearson's correlation was used to quantify the linear association between observed task difficulty and each component measure to screen for collinearity and inform a forward stepwise approach for the regression analysis. Collinearity was quantified using the Variance Inflation Factor (VIF). We considered a VIF greater than or equal to 10 as the threshold for determining whether two variables were collinear (44). If collinearity was found, one of the two collinear variables would be removed after evaluating each variable for statistical significance and theoretical implications. Forward stepwise linear regression was used to quantify the amount of variance in observed task difficulty of items on the GMFM-66 explained by the component measures for body position, movement, and support for each item. The final regression model was used to calculate predicted difficulty for each GMFM-66 item based on component measures. We used Pearson's correlation to quantify the agreement between observed and predicted task difficulty of each item.
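The analysis steps in this section (collinearity screening with VIF, forward selection, and correlating observed with predicted difficulty) can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' SAS code. The data are synthetic, the variable names are hypothetical, and the forward selection uses a minimum gain in R² as its entry criterion, which is an assumption because the source does not state the criterion used.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot

def vif(X):
    """Variance Inflation Factor of each column: 1 / (1 - R^2 on the others)."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, r2 = ols_fit(others, X[:, j])
        factors.append(1.0 / (1.0 - r2))
    return factors

def forward_stepwise(X, y, names, min_gain=0.01):
    """Greedy forward selection; entry criterion (R^2 gain) is an assumption."""
    chosen, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        r2, j = max((ols_fit(X[:, chosen + [k]], y)[1], k) for k in remaining)
        if r2 - best_r2 < min_gain:
            break
        chosen.append(j)
        remaining.remove(j)
        best_r2 = r2
    return [names[j] for j in chosen], best_r2

# Synthetic ordinal ratings for 66 hypothetical items (NOT the GMFM-66 data).
rng = np.random.default_rng(0)
n = 66
names = ["body_position", "movement", "support"]
X = np.column_stack([
    rng.integers(1, 8, n),   # hypothetical body position rating
    rng.integers(1, 10, n),  # hypothetical movement rating
    rng.integers(1, 5, n),   # hypothetical support rating
]).astype(float)
y = 5.0 + 1.2 * X[:, 0] + 1.2 * X[:, 1] + 4.9 * X[:, 2] + rng.normal(0, 3, n)

vifs = vif(X)                                  # screen for collinearity (threshold 10)
selected, r2_step = forward_stepwise(X, y, names)
beta, r2_full = ols_fit(X, y)                  # full three-component model
predicted = np.column_stack([np.ones(n), X]) @ beta
r_obs_pred = float(np.corrcoef(y, predicted)[0, 1])
```

For an ordinary least squares model with an intercept, the squared Pearson correlation between observed and predicted values equals the unadjusted R², so the two quantities carry the same information about fit.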

Measurement unit and scale anchoring with reference points
We used the final regression model as a specification equation to define the gross motor development measurement scale and finalize a formula for measuring the difficulty of any gross motor task. We imposed reference points on the measurement scale by identifying two anchor points. We selected the item "supine brings hands to midline" as the low anchor and "walking with hands free" as the high anchor. The specification equation was used to calculate the predicted task difficulty measures for the low and high anchors. These predicted task difficulty measures were used to algebraically anchor the specification equation to the points 0 ("supine brings hands to midline") and 100 ("walking with hands free"), establishing a unit size equal to 1/100 of the distance between lying supine and walking unsupported.
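The anchoring step can be sketched as a small helper that builds a rescaling function from two anchor predictions. This is a minimal sketch under the assumption that anchoring is a linear rescaling; the function name `make_anchored_scale` and the example predictions (10.0 and 60.0) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def make_anchored_scale(low_pred, high_pred, low_value=0.0, high_value=100.0):
    """Return a function mapping predicted difficulty onto an anchored scale.

    low_pred / high_pred: predicted task difficulty of the two anchor tasks.
    low_value / high_value: scale values assigned to those anchors.
    """
    if high_pred == low_pred:
        raise ValueError("anchor predictions must differ")
    # Predicted-difficulty distance that corresponds to one scale unit.
    unit_size = (high_pred - low_pred) / (high_value - low_value)

    def to_units(predicted_difficulty):
        return low_value + (predicted_difficulty - low_pred) / unit_size

    return to_units

# Hypothetical anchor predictions, for illustration only.
to_units = make_anchored_scale(low_pred=10.0, high_pred=60.0)
low_anchor_units = to_units(10.0)    # low anchor maps to 0
high_anchor_units = to_units(60.0)   # high anchor maps to 100
midpoint_units = to_units(35.0)      # halfway between the anchors maps to 50
```

Because the mapping is linear, it preserves the interval property of the underlying difficulty scale and is not restricted to the range between the two anchors.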

Results
Table 2 presents the component measure ratings for all items on the GMFM. All component measures have a positive, significant linear association with observed rank difficulty. Figure 1 presents scatterplots for the ratings of each component measure, body position (Figure 1A), movement (Figure 1B), and support (Figure 1C), against the item difficulty for each item on the GMFM. All correlations were significant (p < 0.001); body position and movement each correlated with item difficulty at 0.87, and support ratings correlated at 0.62 (Figure 1). A linear regression model with all three component measures explained the most variance in GMFM item observed rank order difficulty (adjusted R² = 0.87; F-value = 147.01, p < 0.0001, RMSE = 6.09). None of the VIF values exceeded our threshold of collinearity. The estimated effect and standard error of each component measure on gross motor task difficulty were: body position, β = 1.23, SE = 0.59 (p = 0.0415); movement, β = 1.21, SE = 0.30 (p = 0.001); and support, β = 4.93, SE = 0.74 (p < 0.0001).
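As a consistency check on the reported fit statistics, the unadjusted R² can be recovered from the F-value and then converted to an adjusted R². The sketch below assumes the model was fit to all 66 items with three predictors and an intercept; the sample size of 66 is inferred from the methods rather than stated alongside the F-value, and the function names are illustrative.

```python
def r2_from_f(f_value, n_predictors, n_obs):
    """Unadjusted R^2 implied by the overall F statistic of an OLS model."""
    df_resid = n_obs - n_predictors - 1
    return (n_predictors * f_value) / (n_predictors * f_value + df_resid)

def adjusted_r2(r2, n_predictors, n_obs):
    """Adjusted R^2, which penalizes R^2 for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

N_ITEMS = 66        # assumption: every GMFM-66 item entered the regression
N_PREDICTORS = 3    # body position, movement, support
F_VALUE = 147.01    # reported overall F-value

r2 = r2_from_f(F_VALUE, N_PREDICTORS, N_ITEMS)
adj = adjusted_r2(r2, N_PREDICTORS, N_ITEMS)
print(round(r2, 3), round(adj, 3))
```

Under these assumptions the implied adjusted R² rounds to the reported 0.87, which suggests the F-value and the adjusted R² describe the same three-component model.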
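The final specification equation is:

$$\text{Predicted task difficulty} = \beta_0 + 1.23\,(\text{body position}) + 1.21\,(\text{movement}) + 4.93\,(\text{support})$$

where $\beta_0$ is the regression intercept and the coefficients are the component measure estimates from the final regression model. Predicted task difficulty has a positive, significant linear association with observed rank difficulty (r = 0.93, p < 0.001). Figure 2 presents a scatterplot of predicted task difficulty against observed task difficulty for each item on the GMFM.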

Gross motor development measurement scale and unit
The predicted task difficulty for our low anchor ("supine brings hands to midline") was 14.13. We assigned a value of 0 to the low anchor. We used the following equation to derive a constant that can be used to anchor the specification's predicted task difficulty of "supine brings hands to midline" to 0:

$$0 + (\text{constant}_{\text{low anchor}} + 14.13) = 0 \ \text{Gross Motor (GM) units}, \quad \text{where } \text{constant}_{\text{low anchor}} = -14.13$$

The predicted task difficulty for our high anchor ("walking with hands free") was 61.88. We assigned a value of 100 to the high anchor to set our unit size to 1/100. We used the following equation to derive a constant that can be used to anchor the specification's predicted task difficulty of "walking with hands free" to 100:

$$(61.88 - 14.13) \times \text{constant}_{\text{high anchor}} = 100 \ \text{GM units}, \quad \text{where } \text{constant}_{\text{high anchor}} = \frac{100}{47.75} \approx 2.09$$

The final equation that places predicted (i.e., theoretical) gross motor task difficulty on the measurement scale in Gross Motor units can be written using the high and low anchor constants as follows:

$$(\text{Task Difficulty} - 14.13) \times 2.09 = \text{Gross Motor (GM) units}$$

where task difficulty is the predicted task difficulty from the specification equation. Table 3 presents the GM units for each item on the GMFM.
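The conversion to GM units can be checked numerically from the two anchor predictions reported in this section (14.13 and 61.88). The sketch below uses illustrative function names and also shows a small side effect of rounding the multiplier to 2.09: the high anchor then lands slightly below 100, whereas the unrounded multiplier reproduces 100 exactly.

```python
LOW_ANCHOR_PRED = 14.13    # predicted difficulty, "supine brings hands to midline"
HIGH_ANCHOR_PRED = 61.88   # predicted difficulty, "walking with hands free"
MULTIPLIER_ROUNDED = 2.09  # multiplier as reported to two decimals

def gm_units(task_difficulty, multiplier=MULTIPLIER_ROUNDED):
    """Convert predicted task difficulty to Gross Motor (GM) units."""
    return (task_difficulty - LOW_ANCHOR_PRED) * multiplier

def task_difficulty_from_gm(gm, multiplier=MULTIPLIER_ROUNDED):
    """Inverse mapping: GM units back to predicted task difficulty."""
    return gm / multiplier + LOW_ANCHOR_PRED

# Unrounded multiplier implied by the two anchors.
exact_multiplier = 100.0 / (HIGH_ANCHOR_PRED - LOW_ANCHOR_PRED)

low_gm = gm_units(LOW_ANCHOR_PRED)                            # 0 by construction
high_gm_rounded = gm_units(HIGH_ANCHOR_PRED)                  # slightly under 100
high_gm_exact = gm_units(HIGH_ANCHOR_PRED, exact_multiplier)  # 100
```

The discrepancy from rounding is about 0.2 GM units at the high anchor, so it matters only when values are reported to more than the nearest whole unit.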

Discussion
We demonstrated that component measures of body position, movement, and support during gross motor tasks can explain the majority of variance in gross motor development (adjusted R² = 0.87) using linear regression. The findings from our analysis were concordant with previously published results where linear regression models were used: Stenner (10), Stenner and Smith (16), Fisher (25), and Hong and colleagues (26). The 87% of variance explained by our model was similar to the 85% of variance explained by the Lexile linear regression model (10) and the 83% of the variance explained by the ICF-GUE linear regression model (26).
The linear regressions for the Knox Cube Test (16) and physical function (25) explained considerably more variance, 95% (16) and 94% (25). This may suggest that there is a better understanding of, or ability to quantify, the components accounting for task difficulty in visual attention and short-term memory, and in physical function, compared to our understanding of gross motor task difficulty. While linear regression is a popular method for deriving specification equations, there are groups testing other methods such as linear logistic models, partial least squares models, and principal component regression. Adroher and Tennant (27), Pendrill (32), and Melin and colleagues (28-31, 33) have used these models to address limitations of linear regression when using ordinal variables or to account for more complex variables such as measurement uncertainty or entropy.
Our approach to establishing a universal unit for gross motor development applied the same methodology used by Stenner (10) to create the Lexile for reading. We anchored our equation and measurement scale to lying supine (0 GM units, low level) and walking (100 GM units, high level) because, like basal level and encyclopedia level texts, these tasks are widely recognized and easily reproducible (37, 40-42). Lying supine is widely recognized as the first developmental task a child can do after birth, while walking is a critical childhood developmental milestone and universal characteristic of the human experience. We created a 100-point gross motor development scale since 100-point scales are commonly understood and easily communicated between healthcare providers and patients or their families (12). It is important to recognize that measurement scales extend to infinity in both directions despite anchoring. For example, temperature can still be measured on the Celsius scale below 0° and above 100°. The gross motor development scale begins at lying supine (which we identified as the lowest level of gross motor development) but extends beyond walking to more difficult gross motor tasks. This is apparent on our measurement scale with tasks like jumping and stair navigation receiving measures of 110 and 120 GM units, respectively.
The process of imposing anchor points and unit size to create universal units also provides frames of reference for interpretation of measures (11, 32). Currently, interpretation of measurement values from existing instruments typically requires normative data and large group comparisons. Anchoring measurement scales to well-known references and universal units removes this barrier. For example, Lexile measures are interpreted with respect to the anchor texts and Lexile unit size: "a passage with a Lexile of 1,000 is much more difficult than a text with a Lexile of 600", or "a passage with a Lexile of 800 is 600 Lexiles away from the difficulty of a basal reading text and 400 Lexiles away from the difficulty of an encyclopedia passage". Additionally, Lexiles have an interval, fixed unit size, which means a change from 200 to 400 is the same "distance" as, and comparable to, a change from 600 to 800. The gross motor development measurement scale is also linear and interval, which allows for the same interpretation of GM units. For example, a child who can sit unsupported (46 GM units) is 54 GM units away from walking, and a child who can walk with handheld support (79 GM units) is 21 GM units away from walking unsupported. This is also similar to the way a ruler is used to measure length and then compare the length of two objects or a change in an object's length. Theoretically, the Gross Motor unit also has a degree of general objectivity since the unit size and measurement scale are not derived from a specific sample (local objectivity) but rather from the theoretical model (specification equation). This should allow GM units to be interpreted the same way regardless of whether a child is healthy or has a condition that results in developmental delays. Future research is needed to confirm this capability.
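The distance-based interpretation in this paragraph reduces to simple subtraction on an interval scale. The sketch below reproduces the two GM unit examples and the Lexile interval comparison from the text; the function name `distance_to` is illustrative.

```python
WALKING_UNSUPPORTED_GM = 100.0  # high anchor on the GM scale

def distance_to(target, current):
    """Distance on an interval scale from a current measure to a target."""
    return target - current

# Examples taken from the text.
sit_unsupported_gap = distance_to(WALKING_UNSUPPORTED_GM, 46.0)   # 54 GM units
handheld_walking_gap = distance_to(WALKING_UNSUPPORTED_GM, 79.0)  # 21 GM units

# Fixed unit size: equal differences mean equal distances anywhere on the scale.
lexile_change_low = distance_to(400, 200)
lexile_change_high = distance_to(800, 600)
same_distance = lexile_change_low == lexile_change_high
```

This kind of comparison is meaningful only because the scale is interval with a fixed unit; the same subtraction on ordinal total scores would not carry a consistent meaning.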
The theory behind the construct of gross motor development is a narrative about how humans gain the ability to do more difficult tasks based on evidence and observation. Our work demonstrates that movement along the developmental progression can be mathematically modeled using a regression formula informed by theory that accurately predicts the gross motor task difficulty hierarchy. The extent to which the formula can accurately predict informs the theory's adequacy by testing and exposing the theory to falsification (10, 16). We tested aspects of gross motor development theory by including body position, movement, and support component measures

Implications and future research
Creation of specification equations holds three key benefits for social and behavioral research (11, 16). First, it states theory in a way that makes falsification possible. Second, component measures in a specification equation can often be measured with more precision, ultimately reducing error as equations are refined (16). Third, experimental manipulation is feasible because items or tasks can be derived from the specification equation and their difficulty can be tested (19). It is important to remember that as theories are exposed to falsification, specification equations can be improved through the refinement or removal of component factors and the inclusion of new ones as greater understanding of constructs is uncovered. Furthermore, exposing constructs to falsification is the "challenge research" that should focus and accelerate future theory-based discovery (45).
The implications of creating a universal measurement unit and scale in the social and behavioral sciences using theoretical considerations independent of sampling are far reaching. First, and most important, a specification equation provides a means to calibrate most, if not all, items measuring a construct onto the same scale. Future research will need to test and validate our specification equation against findings from (1) other regression modeling methods, (2) other populations, and (3) independent item sets like the Peabody Developmental Motor Scales and Denver Developmental Screening Test. This would establish whether all gross motor development items can be placed onto the same measurement scale in GM units. This should allow for most, if not all, existing gross motor development instruments to be anchored on a single measurement scale with linear, interval units derived from theory. Second is the ability to facilitate efficient measurement of children's gross motor development. Children can be given a GM unit measure by logically selecting a few tasks to see if the child can accomplish them or by observing the child completing tasks in a natural play setting. Additionally, we no longer need to be concerned with item banks or completing all items on an instrument because any gross motor task can be created and rated using the component measures and the specification equation. Furthermore, additional component measures should be explored, such as measurement uncertainty, entropy, and other hypothesized contributing factors to development, to refine our understanding of and ability to measure gross motor development (32, 33). Lastly, future research should aim to identify objective or continuous variables for component measures.

Limitations
Our study has several limitations. First, our regression analysis and specification equation were derived from observed item difficulty for children with cerebral palsy (36). Future studies should observe item difficulty in a sample of typically developing children to determine if disease condition influences the regression findings. Second, we did not compare our results from linear regression to those from other regression models. Although linear regression can accommodate ordinal independent variables, the beta estimates cannot be interpreted on an interval scale (46). Researchers should be aware of this limitation when predicting changes in gross motor task difficulty from beta estimates for component measures. Third, our specification equation accounted for 87% of the variance, which indicates that 13% of the variance remains unexplained. While no equation can account for 100% of the variance due to error, additional components of gross motor development theory can be tested, which may improve the accuracy of the specification equation. Fourth, we did not account for measurement

Conclusion
We have demonstrated that the methodology to develop anchored specification equations, like the Lexile measurement scale, can be applied to gross motor development. We have shown that a specification equation for gross motor development can account for the majority of variance in task difficulty. Additionally, we showed that anchoring specification equations algebraically can achieve general objectivity to create a universal unit of gross motor measurement (i.e., the GM unit). Equipped with a measurement equation and universal unit of measurement, most, if not all, existing gross motor development instruments should be able to be calibrated to the same scale with a linear, interval, fixed unit.
Table 3 presents the observed task difficulty and measure in Gross Motor (GM) units for each item on the GMFM. Items are ordered based on GM units. Fewer GM units indicate an easier task and more GM units indicate a harder task. Reference points for anchors are denoted with *. GM units are derived from predicted task difficulty calculated by the specification equation based on theory.

Figure 1
Figure 1 presents the linear association between observed task difficulty and each component measure for each item on the GMFM. Each circle represents an individual item on the GMFM. The dashed line represents the best fit linear model. (A) presents the relationship for body position rating, (B) for movement rating, and (C) for support rating.

Figure 2
Figure 2 presents the linear association between observed task difficulty and GM units for each item on the GMFM. Each circle represents an individual item on the GMFM. The dashed line represents a line of equality (or identity line) where r = 1.0. The correlation between observed task difficulty and GM units is r = 0.93.

TABLE 1
Component measure rating system.
Table 1 presents the rating system for each component measure. Gross motor tasks receive a rating for each component measure. Values for each component measure are used in the specification equation to calculate a measure in Gross Motor (GM) units for the task.

TABLE 2
Component measure ratings for each item on the GMFM.

Table 2 presents the consensus component measure ratings for each item on the GMFM. Items are listed in order of item number on the GMFM with their short name reported in Avery et al., 2003.

Our equation's ability to accurately predict developmental progression demonstrates that several theoretical aspects of the gross motor development narrative and construct hold up to falsification. This methodology to measure gross motor development centers on elucidating the relationships between item characteristics and gross motor task difficulty, as this provides a more thorough understanding of what critical components underlie gross motor development and its variations (32).

TABLE 3
Gross motor (GM) units for each item on the GMFM.
Seamon et al. 10.3389/fresc.2024.1243336. Frontiers in Rehabilitation Sciences. frontiersin.org

uncertainties or entropy in our specification equation. This is an emerging methodology and may allow for deeper understanding of constructs in social and behavioral sciences (28-33). Fifth, future research is needed to evaluate our measurement theory's ability to hold up to falsification. This future research should further our understanding of gross motor development and improve the accuracy and precision of our measurement specification equations.