The influence of sociodemographic factors and response style on caregiver report of infant developmental status

Caregiver report is the most feasible way to assess early childhood development but is susceptible to the influences of response style and sociodemographic factors. In a sample of 571 caregiver-infant dyads (47.8% female; 48% White), we compared caregiver reports on the Ages and Stages Questionnaire-Third Edition (ASQ-3) with reports on a novel, web-based assessment, PediaTrac™. Ratings on PediaTrac correlated with ratings on the ASQ-3 at all time points (2, 4, 6, and 9 months). Caregiver age, response style, and sociodemographic factors accounted for significant variance on both measures. Developmental reporting of early childhood skills is influenced by caregiver response style and sociodemographic factors. These influences must be considered in order to ensure the accurate identification of infant developmental status.


Introduction
While there is widespread consensus that the identification of and early intervention for childhood developmental delays can improve outcomes and prevent long-term problems (1,2), there are limited reliable and valid methods available for detecting these delays within the first two years of life (3). Reliable methods of assessing children during infancy and early toddlerhood are needed to improve developmental surveillance and better target interventions (4,5).
Tests of early developmental skills with strong psychometric properties do exist (6,7). While these performance-based tests are often considered to be the "gold standard," they are costly, time- and resource-intensive, and require specialized personnel for administration and interpretation (8). Typically, these tests are not feasible in the context of well child medical visits, routine developmental monitoring programs, and large-scale research projects. Moreover, they are often only available to children already identified as "at risk" and to children living in more urban or highly resourced areas.
A more feasible method of developmental monitoring and assessment is caregiver (e.g., parent) report. Caregiver report is commonly used in pediatric well child visits and developmental outcome research (4,5). Research outcomes and conclusions, as well as decisions by primary care providers to refer children for further assessment or interventions, are frequently based on caregiver reports, and governmental and educational institutions utilize these types of metrics for financial and programming decisions (9). However, these tools are not without limitations, and these shortcomings are frequently underestimated, underappreciated, or not considered in clinical and policy decision making and research, leading to increased measurement error at the individual level and group misassignment in research designs (8).
While caregiver reports can assess a wide spectrum of child behaviors observed in daily life by adults who spend substantial time with the child, these reports can be subject to bias. Factors contributing to bias include differing levels of caregiver knowledge about typical and atypical development (9,10), concerns about stigmatization (11,12) and cultural differences in what constitutes normal and abnormal development (12,13,14). High levels of caregiver stress can also influence reports of a child's developmental status (15,16). Some caregiver report measures available for children over 2 years of age include embedded measures of response style (17,18). However, these metrics are not routinely available and, to our knowledge, have not been included in any caregiver report of infant and toddler development (6,19,20).
In addition to response style, caregiver reports are also influenced by sociodemographic factors. While one might anticipate that children with less socioeconomic advantage [e.g., lower levels of caregiver education or other indices of socioeconomic status (SES)] would receive less favorable developmental ratings than those from more advantaged backgrounds, findings from several studies fail to confirm this expectation. Two studies found that caregivers with lower levels of education tended to rate their children higher on a developmental language inventory than those with higher levels of education (21,22). Several other studies have also shown that caregivers with lower SES may tend to rate their child's abilities more favorably than those from higher SES backgrounds (23,24,25,26,27). These findings are theorized to be due to concern about the child "failing" the measure, avoidance of stigmatization, differing levels of knowledge about early development or developmental expectations, or cultural differences in interpretation of item content. Despite these findings, there is no early developmental assessment tool that is capable of systematically accounting for sociodemographic factors in score interpretation.
Some researchers have suggested that caregiver report is not sufficiently sensitive or specific to early signs of delayed development to justify its widespread use (8,28,29); however, a recent investigation suggests that caregivers of term and preterm infants can report newborn motor abilities with high reliability (30). Data that are inaccurate can lead to diagnostic errors, the over- or under-identification of children in need, or misleading data for financial or policy programming. However, because caregiver report is likely the most feasible and the most cost- and time-efficient way to identify children early, and because early identification and intervention are vital, further efforts are warranted to improve the accuracy of this method of early developmental assessment.
PediaTrac™ is a multi-dimensional, online survey tool constructed with contemporary item response theory (IRT) modeling methods to monitor and track infant and toddler development (30). PediaTrac queries caregivers about their child's development at multiple time points, measures caregiver response style, and gathers additional information known to influence caregiver report, including sociodemographic factors.
The current study focused on assessment of early social, communication, and cognitive skills. These domains were selected based on their greater sensitivity to caregiver characteristics and response style. Specifically, research has shown that caregiver report of language and cognitive functioning may be more subjective than more directly observable domains, such as motor skills (31,32,33).
In order to examine the impact of response style and sociodemographic characteristics on caregiver report, we first attempted to establish that PediaTrac, a recently developed measure, was measuring early childhood developmental constructs in the same manner as another established and widely used caregiver report measure. To establish convergent validity, we compared the PediaTrac Social/Communication/Cognition domain (SCG) to comparable scales of the Ages and Stages Questionnaire-Third Edition (ASQ-3; Personal-Social, Communication, and Problem Solving). We hypothesized that the convergent validity of the PediaTrac SCG domain would be demonstrated by its association with caregiver ratings on the three ASQ-3 scales. We then hypothesized that significant variance in caregiver report on both measures would be accounted for by sociodemographic factors (i.e., maternal age and education level) and caregiver response style.

Participants
This investigation is part of the larger PediaTrac study, a prospective, longitudinal investigation of a sample of 571 caregivers of infants (48% female) who were born either at term (n = 331; 49% female) or preterm (n = 240; 46% female) (30). The sample was recruited from three sites that included academic medical centers and a local community clinic: 100 from Site #1, 239 from Site #2, and 232 from Site #3. Site #1 did not have access to a preterm sample and recruited only term caregiver/infant dyads from an urban academic medical center and a socio-demographically high-risk community clinic to ensure representation of systematically excluded communities in our larger term sample. Sites #2 and #3 were large academic medical centers from which both term and preterm infants were recruited; a socio-demographically high-risk population was also recruited from the latter. Term infants had a gestational age (GA) of ≥37 weeks at birth and a minimum birth weight of 2500 g, with no history of prenatal or intrapartum complications, neonatal abstinence syndrome, neurological injury/disease, or known genetic disorder. Preterm infants had a GA of <37 weeks. Birth weight was allowed to vary, but exclusions from the preterm group included neonatal abstinence syndrome and Down syndrome. A single infant from each multiple birth was randomly selected for enrollment. Caregivers were a minimum of 18 years old and had access to a personal device such as a smartphone, tablet, or computer. Ninety-eight percent of the respondents were biological mothers. English-language proficiency was required for participation. All American Psychological Association (APA) ethical guidelines were followed and Institutional Review Board (IRB) approval was obtained.

Study procedures
Caregivers were recruited in their last trimester of pregnancy, after their infants' birth in the hospital, or at their first newborn visit, with consent obtained after birth. Primary caregivers of term infants completed PediaTrac soon after birth, whereas caregivers of preterm infants completed it when their infants reached a postmenstrual age of 39 weeks. Sampling periods were thus based on the corrected age for preterm infants.

Study measure and variables
PediaTrac v3.0 is a web-based survey comprised of between 511 and 558 unique items covering the age range from birth to 18 months (30). Information describing the original item bank and domain development, expert panel reviews, interviews with caregivers, and the pilot validation results of PediaTrac 2.0 have been previously published (34). In PediaTrac v3.0, caregivers complete subsets of the survey ranging from ∼220-340 items, depending on the sampling period of the assessment. PediaTrac queries multiple developmental domains, including Feeding/Eating/Elimination, Sleep, Motor, Social/Communication/Cognition (SCG), Early Relational Health and Social/Sensory Information Processing, at each of 8 sampling periods [newborn (NB) and 2, 4, 6, 9, 12, 15, and 18 months].
PediaTrac has been developed using item response theory (IRT) methodology, which is a measurement framework that uses mathematical models to explain the relationship between latent traits (attributes) and their observed outcomes. IRT models the likelihood of a given response to an item as a probabilistic function of the individual's score on a latent trait of interest, referred to as theta (35). IRT offers benefits over classical test theory (CTT), ranging from sample-invariant parameter estimates (i.e., assuming no differential item functioning across populations) to metrics of reliability at both the item and test level (36,37,38). It is an item-oriented rather than a test-oriented test construction method. As such, it lends itself to an individualized medicine approach in assessment and subsequent care.
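To make the idea of a "probabilistic function of theta" concrete, the following is a minimal sketch of a two-parameter logistic (2PL) item response function. The 2PL form, the parameter names `a` (discrimination) and `b` (difficulty), and the example values are assumptions chosen for illustration; the text above does not state which IRT model or item parameters PediaTrac uses.

```python
import math


def irt_2pl_probability(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under a 2PL model (illustrative only).

    theta: respondent's latent trait score
    a:     item discrimination (assumed parameter name)
    b:     item difficulty/location (assumed parameter name)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item with discrimination 1.5 and difficulty 0.0:
# endorsement probability rises as theta increases.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(irt_2pl_probability(theta, a=1.5, b=0.0), 3))
```

When theta equals the item difficulty, the modeled probability is exactly one half, which is why difficulty can be read as the trait level at which endorsement becomes more likely than not.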
The focus of this investigation is on the SCG domain across 2, 4, 6, and 9 month sampling periods. IRT modeling was used to estimate theta (θ), an index of the latent trait of social/communication/cognition, for each infant at each period using the SCG domain items (35,39). Mean theta values are on a scale similar to a z-score: a distribution centered at zero with a standard deviation metric (40). Reliability estimates ranged from .97 to .99 across all time periods and the dimensionality of the items at each sampling period has been established via exploratory factor analyses (under review).
Survey questions about sociodemographic characteristics, including maternal age and level of education, were completed during the NB period, with relevant information updated at all subsequent assessments. The degree of neighborhood deprivation was also calculated for all participants using the 2018 Area Deprivation Index (ADI), which is a validated, neighborhood-level composite of 17 education, employment, housing-quality, and poverty variables extracted from the American Community Survey and US Census Survey data (41). The ADI is represented as a state decile ranking score ranging from 1 to 10, with the least resourced neighborhoods, or census block groups, characterized by higher scores and the most resourced by lower scores.
Thirty-two validity items were specifically developed for PediaTrac to help assess response style and ensure that any variabilities in responding, if present, are accounted for in the prediction models. Validity items target three potential sources of distortion: atypical responding (ATP; previously referred to as random or RND), positive (PRS) and negative (NRS) response style. Atypical responding is operationalized as unusual endorsement of statements that have an obvious answer. The logic behind the ATP scale is that all bona fide examinees who are literate, proficient in English and attend to item content should be able to choose the one correct option. The NRS scale consists of items that provide an evaluative statement of the infant in the negative direction (i.e., indicating harsh judgment or an overly pessimistic outlook on the child's future). Conversely, the PRS scale consists of items that provide an evaluative statement of the infant in the positive direction (i.e., indicating unrealistically positive opinion or an overly optimistic outlook on the child's future). Caregivers were required to respond to a 5-point Likert scale with response anchors as follows: 1 = never; 2 = rarely; 3 = sometimes; 4 = often; 5 = always. Whether a given validity item endorsement was indicative of noncredible responding was determined based on the frequency of that response in the sample. If <10% of the individuals endorsed the item, it was considered invalid (i.e., strong evidence of non-credible responding). Items were scaled such that higher scores represented increased deviation from a more typical response pattern (42).
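The frequency-based rule described above can be sketched as follows. This is one plausible reading of the rule, not the scoring algorithm used in the study: the function names, the per-option interpretation of "endorsed", and the toy responses are all assumptions introduced for illustration.

```python
from collections import Counter


def rare_response_options(responses, threshold=0.10):
    """Return the Likert options chosen by fewer than `threshold` of the sample.

    `responses` holds one validity item's answers (1-5) across all caregivers.
    Hypothetical helper: treats any sufficiently rare option as non-credible.
    """
    counts = Counter(responses)
    n = len(responses)
    return {option for option, count in counts.items() if count / n < threshold}


def count_flagged_items(caregiver_answers, rare_by_item):
    """Count the items on which one caregiver gave a rare (flagged) response."""
    return sum(
        1
        for item, answer in caregiver_answers.items()
        if answer in rare_by_item[item]
    )


# Toy sample for a single hypothetical item "v1": 18 of 20 caregivers answer 5.
sample = [5] * 18 + [4] + [1]
rare = {"v1": rare_response_options(sample)}
print(rare)
print(count_flagged_items({"v1": 1}, rare))
```

Summing flagged items per caregiver gives a score in which higher values indicate greater deviation from the typical response pattern, in line with the scaling described above.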
The Ages and Stages Questionnaire (ASQ-3) is a caregiver report for children 1-66 months of age and is one of the most widely used developmental caregiver instruments (20). The ASQ-3 is comprised of five scales (Communication, Gross Motor, Fine Motor, Problem Solving, and Personal-Social) as assessed using 21 questionnaires, one for each 2 to 3 month age interval. For the purpose of the current project, only Communication, Problem Solving, and Personal-Social scales were utilized in the analyses given that the content of these scales most logically maps to the SCG domain of PediaTrac. The ASQ-3 provides cut-off scores that designate either no concerns, the need for monitoring, or the need for further assessment. However, for the purposes of the current study, ASQ-3 scores were converted to z-scores based on means and standard deviations reported in the administration manual. For infants born preterm, caregivers completed the age-corrected ASQ-3 measure.
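The conversion of raw scale scores to z-scores follows the usual standardization, z = (raw − mean) / SD. A minimal sketch is shown below; the normative mean and standard deviation used in the example are placeholders, not values from the ASQ-3 administration manual.

```python
def to_z_score(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a raw scale score against normative mean and SD."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw_score - norm_mean) / norm_sd


# Placeholder norms (NOT taken from the ASQ-3 manual): mean 50, SD 10.
print(to_z_score(40, norm_mean=50, norm_sd=10))
```

Placing the ASQ-3 scales on a z metric gives them a scale comparable to the theta estimates, which are also centered at zero with a standard deviation metric.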

Statistical analyses

Preliminary analyses
Descriptive and exploratory analyses were conducted to examine the impact of covariates such as demographic (i.e., caregiver age at study enrollment) and sociodemographic characteristics (i.e., caregiver education, ADI), and social/communication/cognition theta values and ASQ-3 z-scores for Communication, Problem Solving, and Personal-Social scales at the 2, 4, 6, and 9 month sampling periods.

Main analysis
First, to demonstrate convergent validity between the established scales of the ASQ-3 and the PediaTrac SCG domain, Pearson correlation coefficients were computed to examine the relationships between caregiver-reported social/communication/cognitive skills using the PediaTrac SCG domain thetas and z-scores for the ASQ-3 scales at each sampling period. This was a necessary first step in order to ensure that the possible effects of sociodemographic factors and response style could be validly interpreted as impacting related outcomes and caregiver report generally.
Second, and the primary aim of the investigation, to examine the role of sociodemographic variables and caregiver response style on SCG domain theta and ASQ-3 scales, separate cross-sectional linear regression models were conducted at each sampling period. Caregiver response styles (positive, negative or random) were the predictors in the regression models and the SCG domain thetas and z-scores for the ASQ-3 scales were the outcomes. All models were adjusted for sociodemographic characteristics (i.e., caregiver age, infant age, ADI), as well as caregiver race, which, as a categorical variable, was dichotomized for these analyses (Black vs. non-Black and White vs. non-White). Infant age in weeks (uncorrected for premature infants) was used in analysis of age effects to obtain a more precise estimate than would be possible using sampling period as a proxy for age. Post hoc partial correlations were used to examine whether the relationship between caregiver response style and PediaTrac (SCG domain theta) outcomes was moderated by SES, for linear regression models that showed significant main effects.
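As a rough illustration of the adjusted regression models described above, the sketch below fits an ordinary least squares model with an intercept and two predictors by solving the normal equations. The solver, the column labels, and the toy data are assumptions for illustration only; the study's actual software and model specification are not described here.

```python
def ols_coefficients(predictors, outcome):
    """Return [intercept, b1, b2, ...] from ordinary least squares.

    Solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination.
    """
    rows = [[1.0] + [float(v) for v in row] for row in predictors]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, outcome))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(p):
            if k != col:
                factor = aug[k][col]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[col])]
    return [aug[i][p] for i in range(p)]


# Hypothetical toy data: columns are [response_style_score, centered_caregiver_age].
predictors = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
outcome = [1 + 2 * x1 - 3 * x2 for x1, x2 in predictors]
coefs = ols_coefficients(predictors, outcome)
print([round(c, 6) for c in coefs])
```

Each slope is the expected change in the outcome per unit change in that predictor with the other predictors held constant, which is the sense in which the models above are "adjusted" for sociodemographic characteristics.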

Sample characteristics
The current study sample included 571 caregiver-infant dyads. Caregivers had a mean (standard deviation) age of 30.1 (6.04) years; 53.5% were married; and 76.7% had some college or higher education. Forty-eight percent of the infants were female, 58% were born full-term, and 34% were African American/Black. See Table 1 for descriptive statistics for the full sample. Table 2 reports descriptive statistics of the SCG domain theta scores and ASQ-3 domain z-scores. Regarding interpretation of the response style scales, the cutoffs associated with the top 5% of the distribution (i.e., most deviant scores) for the PRS, NRS, and ATP response styles are ≥7, 5, and 5, respectively. The base rates of "failure" (%) on the PRS, NRS, and ATP scales at these cutoffs were 9.1, 5.8, and 8.0, respectively.

Bivariate Relationship between SCG domain theta and ASQ-3 domain z-score
SCG domain theta scores were significantly positively correlated across all sampling periods with z-scores for ASQ-3 Communication (rs range .41-.53), Personal-Social (rs range .33-.44), and Problem Solving (rs range .34-.36) scales. See Table 3.
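For readers less familiar with the statistic, the following is a minimal sketch of how a Pearson correlation between two score vectors (for example, SCG theta values and ASQ-3 z-scores) is computed. The function name and the toy numbers are illustrative assumptions and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy vectors (not study data): hypothetical theta values and z-scores.
theta_scores = [-1.2, -0.4, 0.1, 0.6, 1.3]
asq_z_scores = [-0.9, -0.6, 0.3, 0.2, 1.1]
print(round(pearson_r(theta_scores, asq_z_scores), 3))
```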

Multivariable linear regression models
Separate multivariable linear regression models were run to examine the impact of sociodemographic variables and caregiver response style on SCG domain theta and ASQ-3 scales at each sampling period. All models included maternal education, maternal age, maternal race, infant age, and ADI state rank, in addition to the three caregiver response styles (NRS, PRS, ATP). Overall model results and significant associations are detailed below; full results are in Table 4.

Caregiver response styles
NRS and ATP were not significantly associated with the SCG domain thetas at any sampling period. However, at 9 months for ASQ-3 Communication and Problem Solving, overall models indicated that NRS and ATP explained a significant moderate to large proportion of variance, R² = .17, F(9, 425) = 9.66, p < .001, and R² = .04, F(9, 423) = 2.17, p = .02, respectively, with maternal age also explaining a significant proportion of variance in ASQ-3 Problem Solving at 9 months. NRS was significantly negatively associated with ASQ-3 Communication and Problem Solving at 9 months (b = −.08, t = −.30, p = .003 and b = −.07, t = −2.49, p = .01, respectively), indicating that higher negative perceptions of infants were associated with lower reported ASQ-3 communication and problem solving abilities.
In addition, at 9 months, the overall model for ASQ-3 Personal-Social indicated that ATP and maternal age explained a moderate significant proportion of variance, R² = .05, F(9, 424) = 2.4, p = .01. As such, at 9 months, ASQ-3 Communication, Personal-Social and Problem Solving z-scores were positively significantly associated with ATP (b = .05, t = 2.25, b = .07, t = 2.76, and b = .06, t = 2.15, respectively; all ps < .01), indicating that higher random responding was related to higher reported communication, personal-social and problem-solving skills at 9 months. See Table 4 for details.
The overall model for ASQ-3 Communication at 4 months indicated that both ATP and PRS at 4 months explained a significant moderate proportion of variance, R² = .07, F(9, 472) = 4.11, p < .001. Here, ATP was inversely significantly associated with ASQ-3 Communication at 4 months (b = −.07, t = −2.92, p = .004), suggesting that higher random responding was related to lower reported communication at 4 months. All SCG domain models indicated that PRS explained a significant moderate to large proportion of variance, R² range = .16-.22, ps < .001. In addition, overall models indicated that PRS explained a small but significant proportion of variance in the ASQ-3 Personal-Social model at 2 months, R² = .04, F(9, 482) = 2.45, p = .01, and a significant moderate proportion of variance in ASQ-3 Communication at 4 months, R² = .07, F(9, 472) = 4.11, p < .001. In these models, PRS was significantly positively associated with SCG domain thetas at all sampling periods (bs range .06-.10, ps < .001), ASQ-3 Personal-Social at 2 months (b = .04, t = 2.03, p = .04) and ASQ-3 Communication at 4 months (b = .06, t = 3.16, p = .002), indicating that higher positive perceptions of their infants were related to higher reported PediaTrac social/communication/cognition abilities and ASQ-3 personal-social and communication skills at 2 and 4 months.
With respect to other sociodemographic factors, results of adjusted multivariable regression models indicated caregiver race, along with ATP and NRS, accounted for a significant large proportion of variance in ASQ-3 Communication scores at 9 months, R² = .17, F(9, 425) = 9.66, p < .001. Here, non-White caregivers reported higher ASQ-3 Communication scores at 9 months compared to White mothers only (b = −.38, t = −2.21, p = .03).
Finally, preterm birth (i.e., as measured by weeks since date of birth), along with PRS, explained a significant large proportion of variance in SCG theta at 9 months, R² = .16, F(9, 475) = 10.10, p < .001. Preterm birth was related to lower SCG theta values at the 9 month sampling period (b = −.0005, t = 3.21, p = .001) in the adjusted regression model. That is, the more preterm the infant was at testing, the lower their SCG abilities were rated, despite administration of the age-corrected version of the test. Preterm birth was not associated with lower ASQ-3 domains in any sampling period.
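The overall-model F tests reported above are tied to R² through a standard identity, F = (R² / k) / ((1 − R²) / df_residual), where k is the number of predictors. The sketch below applies it to the rounded values reported for ASQ-3 Communication at 9 months (R² = .17 with 9 and 425 degrees of freedom); the function name is an assumption for illustration.

```python
def f_from_r_squared(r_squared: float, n_predictors: int, df_residual: int) -> float:
    """Overall-model F statistic implied by R-squared and its degrees of freedom."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return (r_squared / n_predictors) / ((1.0 - r_squared) / df_residual)


# Rounded values reported for ASQ-3 Communication at 9 months.
print(round(f_from_r_squared(0.17, n_predictors=9, df_residual=425), 2))
```

With these inputs the identity gives approximately 9.67, close to the reported F of 9.66; a small gap of this size is expected because the published R² is rounded to two decimals.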

Post-hoc analyses
To further explore the significant relationship between SCG domain thetas, PRS and sociodemographic characteristics, post hoc partial correlations were computed for models with significant main effects from the multivariable linear regression analyses.
At all sampling periods, a significant amount of variance (rs range = .30-.35) remained between SCG domain theta and PRS, after accounting for the effects of sociodemographic variables (i.e., caregiver education, ADI, infant age). However, the correlation between PRS and infant age (uncorrected for premature infants) at the 9 month sampling period was not significant when accounting for the main effect of SCG domain theta (partial r = −.06, p = .15); and PRS and infant age were not correlated when accounting for the relationship between PRS and SCG domain theta.
Significant main effects for caregiver response patterns and sociodemographic characteristics were found for the ASQ-3 Communication domain at 9 months. Therefore, partial correlations were computed to examine the associations of ASQ-3 Communication at 9 months with NRS and ATP, accounting for the main effects of maternal race [White vs. non-White (i.e., Black or African American, multiracial)] and ADI. Caregiver race (White vs. non-White) was unrelated to the ASQ-3 Communication domain after removing the effects of NRS (partial r = −.33, p = .11). Similarly, ADI was unrelated to ASQ-3 Communication at 9 months after removing the effects of NRS (partial r = .31, p = .09). Additionally, maternal race (White vs. non-White) and ADI were unrelated to ASQ-3 Communication after accounting for the effects of ATP at 9 months (partial rs = −.31 and .30, ps < .10 for maternal race and ADI, respectively). Associations of ASQ-3 Communication with NRS or ATP at 9 months remained significant even after accounting for maternal race or ADI in respective partial correlations.
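A partial correlation expresses the association between two variables after removing the linear effect of a third. The sketch below uses the first-order formula for a single control variable; this is a simplification, since several of the analyses above adjust for more than one covariate, and the input correlations shown are made-up values rather than study results.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# Made-up correlations: x and y correlate .50, and each correlates .50 with z.
print(round(partial_r(0.5, 0.5, 0.5), 3))
```

When the control variable is uncorrelated with both x and y, the partial correlation equals the original correlation, so any shrinkage reflects variance shared with the control variable.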

Discussion
As expected, convergent validity was established for the Social/Communication/Cognition domain of the recently developed PediaTrac: theta scores were consistently and positively correlated with the derived z-scores of the Personal-Social, Communication, and Problem Solving scales of the more widely used and established ASQ-3 at the 2, 4, 6, and 9 month sampling periods. Caregiver response style, maternal age, and level of neighborhood deprivation accounted for significant variance in outcomes on both measures, demonstrating the importance of including these variables in newly developed measures, as well as in measures already in widespread use.
While the base rates of caregiver positive, negative or atypical response styles were low in the overall sample, these response tendencies accounted for significant variance in the ASQ-3 and PediaTrac caregiver ratings of infant development. PRS accounted for significant variance in caregiver ratings on the PediaTrac SCG domain at all sampling periods and on one ASQ-3 scale at a single sampling period. In contrast, although NRS and ATP were unrelated to PediaTrac caregiver-reported social/communication/cognitive development, both measures of response style were associated with ASQ-3 scores at one or more sampling periods. Maternal age and area deprivation were also associated with PediaTrac and ASQ-3 scores, such that being a younger caregiver and living in an area with more socioeconomic deprivation were associated with caregiver ratings of better infant developmental ability. ASQ-3 Communication scores were also higher for infants of caregivers who identified as non-White compared to those who identified as White, but only at one sampling period.
Some studies have found that performance-based assessments yield higher rates of developmental problems than caregiver report, suggesting under-identification of these problems based on caregiver report only (43). Prior research has identified at least two factors that can compromise the validity of caregiver report, including response style (44) and sociodemographic factors (10, 12, 45). However, to our knowledge no existing caregiver report of early development considers these factors in interpreting caregiver ratings. The findings of this study demonstrate the importance of incorporating these factors into caregiver report and accounting for them in drawing conclusions about children's actual level of developmental functioning.
Because response style and sociodemographic information are not typically incorporated into screening measures of early infant development, normative reference groups may not provide accurate estimates of an individual child's abilities. The absence of objective metrics on response style may also lead to the misidentification of children and to inaccurate estimates of the base rates of developmental problems in young children. The possibility of misidentification of developmental problems is particularly urgent in view of the fact that caregiver report is typically the only feasible method for formally assessing an infant's developmental status and milestone acquisition as part of pediatric well child visits. Integrating reporter response style and sociodemographic characteristics into the most commonly used assessments is thus well justified as a means for improving the accuracy of these reports. Inclusion of measures of response bias is also consistent with the need for accurate and systematic early childhood developmental surveillance systems (4,5).
The present study has several noteworthy limitations. First, although caregiver reports were collected longitudinally through 24 months of age, the current investigation focused on cross-sectional findings during early infancy, limiting our ability to examine potential differences in the nature and implications of response styles and sociodemographic factors across a wide span of early childhood development (46). Additionally, the study design of PediaTrac excluded caregivers under 18 years of age, which may have limited our ability to capture the full range of influence of younger caregiver age on caregiver report. Lastly, restriction of the study sample to caregivers with English language proficiency and functional literacy skills may also have restricted our ability to identify the full range of sociodemographic influences on caregiver reporting.

Future directions
The present study demonstrates the need to take caregiver response styles and sociodemographic factors into account in interpreting caregiver ratings of infant development. Although additional investigation of the effects of these factors on caregiver ratings is planned as part of the larger PediaTrac study, further research is needed to better understand the factors that contribute to variations in response style. A deeper understanding of the reasons for the different ways in which caregivers approach child developmental reporting and an understanding of how persons from various backgrounds approach caregiver questionnaires would allow for more inclusive screening of infants and improve the accuracy of these data.
The development of PediaTrac continues as an ongoing longitudinal, multi-site investigation. To reduce burden and enhance utilization and clinical acceptance, we anticipate designing an adaptive [i.e., computer adaptive test (CAT)] version of PediaTrac that caregivers can complete on an iPad or other digital device, that can be integrated into the electronic medical record (EMR) system, and for which trajectories of development can be visualized in real time. The investigators are currently using a host of data analytic methods to develop complex algorithms with which sociodemographic and response style sources of variation can be corrected or systematically accounted for at the time of assessment so that more precise and individualized estimates of infant or toddler developmental status (e.g., motor, social/communication/cognition, sleep, etc.) can be obtained.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding author.

Ethics statement
The studies involving human participants were reviewed and approved by Multi-site reliant IRB at the University of Michigan. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.