A Researcher’s Guide to the Measurement and Modeling of Puberty in the ABCD Study® at Baseline

The Adolescent Brain Cognitive Development℠ (ABCD) Study is an ongoing, diverse, longitudinal, and multi-site study of 11,880 adolescents in the United States. The ABCD Study provides open access to large-scale data on pubertal development, and this article is a researcher’s guide that describes the study's pubertal variables and outlines recommendations for their use. These considerations are contextualized with reference to cross-sectional empirical analyses of pubertal measures within the baseline ABCD dataset by Herting, Uban, and colleagues (2021). For researchers using pubertal variables within the ABCD dataset, we discuss strategies to capitalize on strengths, mitigate weaknesses, and appropriately interpret study limitations, with the aim of building toward a robust science of adolescent development.


INTRODUCTION
Pubertal measures provide critical information about maturation beyond chronological age, and the timing of pubertal milestones varies substantially across individuals (1,2). The Adolescent Brain Cognitive Development (ABCD) Study measures puberty across adolescence in a large and diverse sample (baseline ages = 9-10 years; pubertal measures sampled annually for 10 years (ongoing); 21 research sites across the United States; N = 11,880; 48% female at baseline (3); see (4) for recruitment details). This represents an unprecedented opportunity to better understand relationships among puberty, sociodemographic variables, neurodevelopment, and health. The purpose of this article is to support scientists in planning and interpreting research about puberty from this dataset, particularly in light of recent analyses of the baseline data by Herting, Uban, and colleagues (5). We, a collaboration of puberty researchers including authors within and external to the ABCD consortium, seek to provide a balanced presentation of the study's strengths, limitations, opportunities (areas of growth/innovation), and threats (issues endangering the validity of potential interpretations) as they pertain to puberty. We further explore practical considerations for planning investigations with these data, including a brief discussion of longitudinal analyses.

ABCD STUDY DESIGN: IMPLICATIONS FOR PUBERTAL RESEARCH
This section addresses aspects of the study design that are relevant to studying puberty, including sample composition and assessment frequency [for information about specific measures in the ABCD Study, see the section on Pubertal Measures in the ABCD Dataset; for general recommendations regarding pubertal measurement/modeling, see (6,7)]. Puberty comprises distinct (albeit temporally overlapping) hormonal processes that drive specific physical changes. Adrenarche involves a rise in adrenal hormones [testosterone (T); dehydroepiandrosterone (DHEA) and its sulfate], while gonadarche involves a rise in gonadal hormones [estradiol (E2) and progesterone from the ovaries; T from the testes; for a review, see (8)]. Because the suffix -arche refers to first occurrences, we use the terms adrenal and gonadal processes to refer to the multiyear maturation of each endocrine axis (hypothalamic-pituitary-adrenal and hypothalamic-pituitary-gonadal, respectively). Maturation of the growth (hypothalamic-pituitary-somatic) axis is associated with increases in growth hormone and regulation of overall growth and metabolism (9).

Strengths
The large sample size and narrow age range facilitate well-powered investigations of pubertal development that can be contextualized using sociodemographic variables (5), and the data will likely provide a basis for describing normative pubertal development in the US population at the beginning of the 21st century. Because participants will be followed over a decade, growth curves will describe average growth and typical variation across adolescence. Analyses of the baseline data suggest that there is sufficient variance to relate individual differences in pubertal development to other measures (5). Furthermore, pubertal research has largely focused on girls of European descent, so examining puberty in males and in racially/ethnically diverse participants fills critical knowledge gaps (10,11).

Limitations
Because of participants' ages at recruitment, the study is unable to capture many adrenal processes for nearly all participants and early gonadal processes for many participants, especially females (5). This limitation is more pronounced for groups that start puberty earlier than average, including Black and non-White Hispanic participants (12,13), those with lower socioeconomic status (14), and those with higher BMI (15) [trends replicated in the ABCD data at baseline (5)]. Puberty is classically considered complete following development of secondary sexual characteristics and reproductive capability (roughly mid-adolescence; later for boys than for girls). This prevailing definition is complicated by the fact that hormone levels (16) and body composition (17) continue to mature into the late teens and 20s. The study plans to follow participants to 19-20 years of age, which would fall short of addressing questions about how late hormonal changes may affect processes like risky decision-making at the ages when binge drinking (18) and other health risk behaviors peak (19). Additionally, pubertal development is non-linear, with meaningful changes occurring at sub-annual time scales (e.g., growth spurts) that annual assessments do not capture (20).

PUBERTAL MEASURES IN THE ABCD DATASET
Puberty-related variables in the ABCD Study are listed in Table 1. For descriptive statistics of these variables in the baseline data, see (5).

Strengths
The study assesses physical development using the Pubertal Development Scale (PDS), a minimally invasive, text-only measure designed for ease of administration (21). On the PDS, individuals rate their own or their child's development on a four-point Likert scale from "had not begun" to "already complete" with respect to specific physical characteristics (e.g., skin changes, breast development; a subset of the items was administered based on sex). The PDS is versatile: researchers might use the mean PDS score, converted scores designed to be more comparable to Tanner staging (22,23), derived scores intended to reflect adrenal and gonadal processes separately, and/or a specific item (e.g., age at menarche). Collecting data from multiple informants allows researchers to use caregiver reports at earlier ages and adolescent reports at later ages [for examples, see (24,25)]. Prioritizing caregiver reports may be useful at baseline given the large number of "I don't know" responses to several items (notably 34-42% for growth spurts; 10% for menstruation) (5); at these ages, caregivers may know more about where adolescents are in the process of change. Adolescent reports may better reflect intimate experiences with body changes over time, particularly at later ages (26). Adolescent self-report may be an ideal measure for studies focused on the consequences of puberty for social- or self-related processes (21), and it is sometimes as closely or more closely associated with hormone levels than is Tanner staging via clinical examination (27,28).

Notes to Table 1: Adapted from ABCD data release 2.0.1, last updated 05/06/2020 (doi: 10.15154/1506087; wave 01/baseline). The ABCD data repository grows and changes over time. Entries under NDAR Element Name (Alias) refer to abbreviations in the ABCD dataset codebook and may be useful only to those who have successfully applied for (free) access to the data.
* Sex-specific scores were calculated based on responses to the sex_at_birth variable, which had only binary response options; we note that there are various sex- and gender-related variables, and that a separate scale was administered to address gender identity (not described above).
** Throughout, "y" denotes youth and "p" denotes parent/caregiver versions.
*** Item numbers do not match because youth and caregiver measures were presented in a different order.
**** Due to space constraints, only abbreviations for repetition 1 are shown.
1 Only boys responded to these questions. This limits the documentation of girls with androgen excess who may also experience facial hair and deepening of the voice.
2 The lower limit of sensitivity is reported for each hormone in Herting, Uban, and colleagues (2021).
3 Converted Pubertal Development Scale values are not provided and must be calculated by researchers [see (5) for more information on possible calculations].
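As a minimal sketch of the scoring choices described in the Strengths paragraph above, the Python below computes a mean PDS score that treats "I don't know" as missing and prefers the caregiver report whenever it has enough valid items. The sentinel code `777`, the three-item minimum, and the function names are hypothetical assumptions rather than ABCD conventions; sex-specific item selection and any Tanner-like conversion are deliberately left out.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# Hypothetical conventions (check the ABCD data dictionary for actual codes):
DONT_KNOW = 777  # assumed code for an "I don't know" response
MIN_ITEMS = 3    # assumed minimum number of valid items needed for a mean score


def pds_mean(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of valid 1-4 PDS ratings; None if too few valid items remain."""
    # Anything outside 1-4 (including the "I don't know" sentinel) counts as missing.
    valid = [r for r in responses if r is not None and r != DONT_KNOW and 1 <= r <= 4]
    if len(valid) < MIN_ITEMS:
        return None
    return float(mean(valid))


def caregiver_first(caregiver: Sequence[Optional[int]],
                    youth: Sequence[Optional[int]]) -> Tuple[Optional[float], str]:
    """Prefer the caregiver-report mean; fall back to youth report if it is unusable."""
    for informant, items in (("caregiver", caregiver), ("youth", youth)):
        score = pds_mean(items)
        if score is not None:
            return score, informant
    return None, "missing"


# Example: each informant answered "I don't know" once.
print(caregiver_first([2, 1, DONT_KNOW, 2, 1], [3, 2, 2, 3, DONT_KNOW]))  # → (1.5, 'caregiver')
```

Returning the informant alongside the score keeps the source of each value visible, which matters later if caregiver and youth reports are combined across waves.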

Limitations
When using the PDS and its derived scales, consider the following limitations. First, the PDS does not evaluate pubertal stage directly, so the construct is better described as "perceived pubertal stage," although this is uncommon in practice (29). While conversions exist that transform the PDS into values more comparable to Tanner staging, some recommend against using the PDS when precise Tanner staging is of interest (20). Second, the PDS does not cover the full range of puberty equally well; it solicits less information about earlier changes (29). Third, the PDS exhibits systematic discrepancies with clinician ratings: consistent with social desirability effects, relatively less advanced adolescents tend to overestimate their PDS scores, while more advanced adolescents tend to underestimate theirs (27,30).

Strengths
The study provides objective measures of the quantity of biologically available DHEA, T, and E2 [E2 in girls only; for details on hormone methods, see (5)] at remarkable scale via salivary measures. From gonadarche onwards, increases in DHEA and T in males reflect adrenal and gonadal processes, respectively. In females, DHEA and T largely reflect adrenal processes, while E2 levels reflect gonadal ones. Hormone levels are not redundant with information about physical maturation. In the ABCD data at baseline, PDS summary scores were modestly correlated with hormone levels [r = 0.12-0.20 in males; r = 0.10-0.34 in females (5)]. In another study of 9-14 year-olds, PDS-derived scores accounted for 35-40% of the variance in DHEA and T levels in males and 15-27% of DHEA, T, and E2 levels in females (27). Associations between circulating hormone levels and physical changes vary by factors including sex, race/ethnicity, and body mass index (5,15,31). Examining hormone levels may be particularly useful during the earliest and latest pubertal stages: adrenal hormone levels rise prior to physical changes associated with adrenarche (32), and circulating levels of DHEA, T, and E2 continue to rise after the ages at which adolescents reach Tanner stage V (sometimes considered the last pubertal stage) (16). Data on potential confounds of hormone levels were recorded at the time of sampling, including time of day, caffeine consumption, and medication use (for a list, see Table 1); their linear effects on hormone levels have been estimated at baseline (5). We encourage researchers to use transparent and reproducible processing procedures (e.g., following a pre-specified decision tree, as implemented in publicly available scripts by Herting, Uban, and colleagues (5); https://figshare.com/articles/software/R_scripts/12673754).
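The value of a pre-specified decision tree is that every exclusion or flag follows from rules fixed before any results are seen. The Python sketch below is hypothetical and is not the Herting, Uban, and colleagues pipeline: the rule order, the upper-range check, the placeholder limits, and the class and field names are assumptions chosen for illustration. For actual analyses, the published R scripts and the reported assay limits remain the reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SalivaSample:
    value: Optional[float]  # assay estimate (units as reported); None if no result
    lower_limit: float      # lower limit of sensitivity for this hormone's assay
    upper_limit: float      # hypothetical upper bound of the assay's reliable range
    caffeine: bool          # caffeine consumed before sampling (recorded confound)
    medication: bool        # relevant medication use (recorded confound)


def classify(sample: SalivaSample) -> str:
    """Apply a fixed, ordered set of rules; the first matching rule decides the label."""
    if sample.value is None:
        return "exclude: no value"
    if sample.value < sample.lower_limit:
        return "flag: below sensitivity"
    if sample.value > sample.upper_limit:
        return "flag: above range"
    if sample.caffeine or sample.medication:
        return "keep: confound recorded"
    return "keep"


# Placeholder limits (5.0 and 500.0) are made up for illustration only.
samples = [
    SalivaSample(None, 5.0, 500.0, False, False),
    SalivaSample(2.0, 5.0, 500.0, False, False),
    SalivaSample(80.0, 5.0, 500.0, True, False),
    SalivaSample(80.0, 5.0, 500.0, False, False),
]
for s in samples:
    print(classify(s))
```

Keeping recorded confounds as a "keep" label rather than an exclusion leaves the decision to model them (or to run sensitivity analyses without those samples) explicit and auditable.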

Limitations
Due to feasibility constraints, saliva was sampled once per visit, and time of day varied widely [7 am-7 pm (5)]. Hormone levels fluctuate dynamically and non-linearly over various time scales, and reliance on a single biospecimen leaves researchers unable to account for momentary, daily/diurnal, or monthly hormonal fluctuations. Time of day is a major source of variability; early in puberty, each of these hormones exhibits non-linear diurnal rhythms with peaks in the morning, and these diurnal patterns further vary across pubertal development (33-35). Another major source of complex unmeasured variability is menstrual cyclicity in females: even prior to menarche, cyclic changes in hormone levels can be detected, menstrual cycle variability persists for almost 2 years following menarche (7), and diurnal E2 rhythms attenuate approximately a year after menarche (35). Including time of day as a linear covariate may therefore not sufficiently account for such effects. Sensitivity analyses restricted to participants sampled within a narrow time window (provided that this subsample resembles a random subsample across sites and demographics) may improve the validity of investigations employing these measures. Additionally, estimates from salivary assays are known to be less reliable at the extremes of their range. Finally, other pubertal hormones were not assessed (notably progesterone, luteinizing hormone, and follicle stimulating hormone).
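To make the concern about a linear time-of-day covariate concrete, the Python sketch below simulates data (every value is an assumption, not an ABCD estimate) in which hormone levels peak in the morning and more mature youth tend to be sampled later in the day. It compares a linear time covariate, a cubic polynomial in time (one of several possible non-linear adjustments; cosinor or spline terms are alternatives), and a restricted-window sensitivity analysis, and then checks whether the restricted subsample is spread across sites like the full sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Simulated data (all assumptions): more mature youth tend to be sampled later,
# and log hormone levels fall steeply after a morning peak.
puberty = rng.normal(size=n)
hour = np.clip(13 + 1.5 * puberty + rng.normal(scale=3, size=n), 7, 19)
log_hormone = 0.3 * puberty + 1.5 * np.exp(-(hour - 7) / 2) + rng.normal(scale=0.5, size=n)
site = rng.integers(0, 21, size=n)  # 21 sites, assigned at random in this simulation


def ols(columns, y):
    """OLS with intercept; returns coefficients and the residual sum of squares."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


h = hour - hour.mean()  # centered hour of sampling
# (1) Time of day as a single linear covariate
beta_lin, rss_lin = ols([puberty, h], log_hormone)
# (2) Quadratic and cubic terms let the time-of-day adjustment bend
beta_poly, rss_poly = ols([puberty, h, h**2, h**3], log_hormone)
# (3) Sensitivity analysis restricted to a narrow (hypothetical) sampling window
window = (hour >= 12) & (hour < 14)
beta_win, _ = ols([puberty[window], h[window]], log_hormone[window])

# Does the restricted subsample resemble the full sample across sites?
full_share = np.bincount(site, minlength=21) / n
win_share = np.bincount(site[window], minlength=21) / window.sum()

print("puberty slope, linear time:   ", round(beta_lin[1], 3))
print("puberty slope, cubic time:    ", round(beta_poly[1], 3))
print("puberty slope, 12-14 h window:", round(beta_win[1], 3), "n =", int(window.sum()))
print("max site-share difference:    ", round(np.abs(full_share - win_share).max(), 3))
```

In practice the same representativeness check would extend to demographic variables, and the window would be chosen before looking at outcomes.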

Opportunities to Advance Our Understanding of Puberty
The ABCD Study presents the opportunity to parse the relative contributions of puberty and sociodemographic variables to adolescent development. Another opportunity, arising from the narrow age range and large sample size, is to disentangle effects of puberty and age (see also the section on Considerations for Building Models Incorporating Puberty). The study may also advance the development of multimethod approaches to pubertal measurement. Herting, Uban, and colleagues (5) implemented group factor analyses with the ABCD baseline data to identify latent pubertal factors while accounting for method-related variance. They found a two-factor structure accounting for a combined 47.4% of the variability in pubertal measures in females and 38.6% in males. Researchers have typically focused on physical maturation or hormone levels separately, and novel multimethod approaches may help address longstanding questions in the field. (However, combining methods may not be necessary, and single-method approaches may be preferable for targeted research questions; for more on variable selection, see the section on Strategizing for Open and Reproducible Analyses.) For example, one question is the extent to which obesity itself is directly linked to early pubertal development rather than to systematic measurement error, particularly overestimation of breast development in girls (15,36). Group factor analyses with the ABCD baseline data found that greater body mass index was associated with higher-than-average hormone and physical maturation levels (higher Factor 1 scores), as well as more advanced physical maturation relative to hormone levels compared with the sample average (higher Factor 2 scores). Advancement along both of these axes, relative to the overall sample, is consistent with earlier pubertal development and suggests that measurement error in reports of physical characteristics may not fully account for associations between obesity and more advanced puberty.
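The idea of a two-dimensional structure that "accounts for" a share of the variability in pubertal measures can be illustrated with a much simpler technique. The Python sketch below uses principal components of a correlation matrix on simulated data; this is a plainly swapped-in stand-in, not group factor analysis, which additionally separates variance specific to each method (questionnaire versus hormone). The latent structure, loadings, and measure counts are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Simulated latent structure (assumption): a shared maturation signal plus a
# discrepancy between physical and hormonal maturation.
overall = rng.normal(size=n)
discrepancy = rng.normal(size=n)
physical = overall[:, None] + 0.7 * discrepancy[:, None] + rng.normal(size=(n, 3))
hormones = overall[:, None] - 0.7 * discrepancy[:, None] + rng.normal(size=(n, 3))
data = np.hstack([physical, hormones])  # 3 PDS-like items + 3 hormone measures

# Principal components of the correlation matrix (a simpler stand-in for GFA)
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

share_two = eigvals[:2].sum() / eigvals.sum()    # share of total variance, 2 components
z = (data - data.mean(axis=0)) / data.std(axis=0)
scores = z @ eigvecs[:, :2]                      # component scores per participant

print(f"variance explained by two components: {share_two:.1%}")
print("loadings on component 2:", np.round(eigvecs[:, 1], 2))
```

In this simulation the second component contrasts the physical and hormone columns, loosely analogous to a "physical relative to hormonal maturation" axis; with real data, the dedicated GFA approach described above is the appropriate tool.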

Threats Endangering the Validity of Potential Interpretations
Conclusions drawn from the ABCD Study will undoubtedly carry impact. However, we should be cautious when drawing major conclusions about certain aspects of puberty. The lack of pre-pubertal female participants is more pronounced in groups found to start puberty earlier (5). These sociodemographic differences must be carefully considered in work addressing pubertal onset and/or the very earliest stages of puberty. Because early pubertal onset and timing contribute to risk for health problems (37,38), this underrepresentation may bias estimates of pubertal timing-related effects. The extent of this bias will remain unknown until scientists and funding agencies invest widely in puberty-related research at earlier ages, particularly for Black and non-White Hispanic girls.
To mitigate general design and measurement limitations and to guard against misinterpretation, we recommend the following minimum standards for puberty research using this dataset: (a) when using physical maturation data, language reflects that the PDS is a measure of self- or parent-perceived pubertal maturation; (b) when using salivary hormone data, data are processed using a standardized and/or publicly available analysis pipeline, and sensitivity analyses consider the effect of time of day on hormone levels; and (c) conclusions regarding precise pubertal stages and hormone levels are drawn with caution, in part by acknowledging measurement limitations (see Pubertal Measures in the ABCD Dataset for details on strengths and limitations).

Strategizing for Open and Reproducible Analyses
Like other complex constructs, puberty can be operationalized and modeled in ways that affect conclusions about its effects. For example, conclusions as to whether early pubertal maturation affects adult height in females differ when maturation is defined in terms of menarche versus breast development (39). This analytic flexibility (40) can have untoward effects in null-hypothesis significance testing, some of which can be mitigated by preregistration (41). While not comprehensive, potential a priori justifications for variable selection are presented in the section on Pubertal Measures in the ABCD Dataset. Decisions might also be informed by psychoneuroendocrinology, and neuroimaging analyses might consider what is known about hormone receptor types and density in brain tissue (42). Otherwise, Specification Curve Analysis, which is closely related to multiverse analysis, can facilitate reporting of results from multiple model specifications that are consistent with the underlying theory; this approach can estimate the robustness of effects across numerous operationalizations of pubertal development (43-45) [for an example using pubertal variables, see (46)]. Pre-specifying the smallest effect size of interest (47) and/or using discovery samples for model fitting and replication or holdout samples for model testing might also be useful (48), especially because even small effects are likely to reach statistical significance when sample sizes are large.
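A specification curve can be sketched in a few lines: fit the same model across every combination of defensible choices, then sort and inspect the estimates. The Python below uses simulated data, and the three operationalizations, two covariate sets, and smallest effect size of interest (SESOI) are hypothetical placeholders; dedicated specification-curve tools also add inferential tests that this sketch omits. The last line illustrates the large-sample point: with N = 11,880, a correlation of only about 0.018 reaches two-sided p < .05 (normal approximation).

```python
import itertools

import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Simulated data (assumptions, not ABCD estimates)
age = rng.uniform(9, 11, n)
sex = rng.integers(0, 2, n)
maturation = 0.5 * (age - 10) + rng.normal(size=n)
operationalizations = {                      # hypothetical ways to measure puberty
    "pds_youth": maturation + rng.normal(scale=1.0, size=n),
    "pds_caregiver": maturation + rng.normal(scale=0.7, size=n),
    "log_hormone": 0.5 * maturation + rng.normal(size=n),
}
covariate_sets = {"age": [age], "age+sex": [age, sex]}
outcome = 0.1 * maturation + 0.2 * sex + rng.normal(size=n)

SESOI = 0.05  # hypothetical smallest effect size of interest (standardized beta)


def zscore(x):
    return (x - x.mean()) / x.std()


def standardized_slope(x, y, covariates):
    """Standardized coefficient for x in an OLS model adjusting for covariates."""
    X = np.column_stack([np.ones(len(y)), zscore(x)] + [zscore(c) for c in covariates])
    beta, *_ = np.linalg.lstsq(X, zscore(y), rcond=None)
    return float(beta[1])


# One estimate per combination of operationalization and covariate set, sorted
curve = sorted(
    (standardized_slope(x, outcome, covs), name, cov_name)
    for (name, x), (cov_name, covs) in itertools.product(
        operationalizations.items(), covariate_sets.items()
    )
)
for est, name, cov_name in curve:
    print(f"{name:14s} {cov_name:8s} beta = {est:+.3f}  |beta| >= SESOI: {abs(est) >= SESOI}")

# Smallest |r| with two-sided p < .05 at N = 11,880, using t ~ r * sqrt(N - 2) for small r
print("critical |r|:", round(1.96 / np.sqrt(11880 - 2), 3))
```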

Considerations for Building Models Incorporating Puberty
There is no ideal or standard modeling approach; decisions should reflect the research questions at hand. Drawing from our own work, we provide a few examples of possible approaches and highlight relevant considerations.
In one study, Ladouceur and colleagues (49) used measures of physical maturation (standardized composite scores) to consider effects of adrenal and gonadal processes separately. These scores, as well as hormone levels, were used as predictors in separate multiple linear regression models examining effects of puberty (controlling for age) on neural indices of reward processing in 10- to 13-year-olds (N = 79). This approach separately considered adrenal and gonadal processes, as well as physical and hormone measures, to disentangle their effects. Another study by Whittle and colleagues (50) showcased the use of exploratory factor analysis across multiple age-adjusted questionnaire items (including parent- and child-report PDS) to create a standardized pubertal timing measure (calculated at approximately age 12; N = 155). This measure was used in longitudinal analyses examining associations with pituitary volume and depressive symptoms. Longitudinal work by Vijayakumar et al. (51) compared nested mixed-effects models (ages 9-18; N = 82) that predicted signal from a functional neuroimaging task from linear and quadratic effects of maturation (age and self-reported PDS scores converted into a Tanner-like scale). These analyses used both hormone and questionnaire measures by examining the impact of adding T to the best-fitting PDS models.
We recommend that researchers account for age when examining puberty, despite the narrow age window, because age was associated with each pubertal measure in the baseline data (5). Each of these studies highlights a different approach to disentangling puberty and age effects: including age as a covariate (49), creating age-adjusted pubertal indices (50), or comparing age and puberty models (51).
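Of the three approaches, the age-adjusted index is the least obvious to implement. A minimal Python sketch, assuming a single PDS summary score, follows: it regresses PDS on age separately within each sex and standardizes the residuals, so the index is uncorrelated with age by construction. This is a simplified illustration, not the factor-analytic procedure used by Whittle and colleagues (50), and the binary sex coding in the simulated data is an illustrative simplification given the cautions about sex and gender variables.

```python
import numpy as np


def timing_index(pds, age, sex):
    """Age-adjusted timing: standardized residual of PDS on age, fit within each sex.

    Positive values indicate more advanced maturation than expected for age and sex.
    """
    pds, age, sex = (np.asarray(a) for a in (pds, age, sex))
    index = np.full(pds.shape, np.nan)
    for s in np.unique(sex):
        m = sex == s
        X = np.column_stack([np.ones(m.sum()), age[m]])
        beta, *_ = np.linalg.lstsq(X, pds[m], rcond=None)
        resid = pds[m] - X @ beta
        index[m] = (resid - resid.mean()) / resid.std()
    return index


# Simulated data (assumptions, not ABCD values)
rng = np.random.default_rng(4)
n = 2000
age = rng.uniform(9, 11, n)
sex = rng.integers(0, 2, n)
pds = 1.5 + 0.4 * (age - 9) + 0.3 * sex + rng.normal(scale=0.5, size=n)
idx = timing_index(pds, age, sex)
for s in (0, 1):
    m = sex == s
    print(s, round(np.corrcoef(idx[m], age[m])[0, 1], 6))  # near zero by construction
```

Because the index is residualized on age, entering it alongside age in a later model no longer mixes the two; the trade-off is that the index describes timing relative to this sample rather than an absolute stage.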
Selecting and discretizing sex and gender variables requires careful theoretical and ethical consideration. Researchers should take care to use the terms sex and gender appropriately (52), to consider that they act on developmental outcomes via different mechanisms (53), and to avoid essentializing either as strictly binary (54). Youth and caregivers report on gender identity (55), but information about sex chromosomes or endocrine disorders is not formally collected. In the analyses highlighted above, researchers used composite or factor scores computed within sex, included sex as a covariate, and tested for interactions between sex and conditions of interest (49,50). When evaluating hormones as putative sex-specific mechanisms, authors ran separate models by sex (49,51).
Finally, physical maturation as measured by the PDS is ordinal, but many analyses (including those highlighted above) treat it as continuous. This specification is flawed because it implies that differences between all pubertal stages are equally meaningful, or equally spaced, with respect to a given outcome. Some developmental shifts, e.g., in human face perception, are associated only with transitions between certain pubertal stages (56).
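The Python sketch below contrasts the two codings on simulated data in which the outcome changes only between stages 2 and 3 (an assumed threshold pattern, not an ABCD finding). Dummy coding estimates each stage separately, and because the continuous model is nested within it, an F statistic can test whether equal spacing is tenable. With only four stages, dummy coding costs just two extra parameters; ordinal models are another option.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3000
stage = rng.integers(1, 5, size=n)  # hypothetical PDS-derived stage, 1-4
# Simulated outcome that shifts only at the transition from stage 2 to stage 3
outcome = 0.5 * (stage >= 3) + rng.normal(size=n)


def ols(X, y):
    """OLS with intercept; returns coefficients and the residual sum of squares."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


# Continuous coding: one slope, i.e., equal spacing between adjacent stages
beta_lin, rss_lin = ols(stage.astype(float), outcome)
# Dummy coding: stage 1 is the reference, one coefficient per remaining stage
dummies = np.column_stack([(stage == k).astype(float) for k in (2, 3, 4)])
beta_cat, rss_cat = ols(dummies, outcome)

# The linear model is nested in the dummy model (2 vs. 4 parameters), so an
# F statistic with (2, n - 4) degrees of freedom tests the equal-spacing assumption.
f_stat = ((rss_lin - rss_cat) / 2) / (rss_cat / (n - 4))
step_sizes = np.diff(np.concatenate([[0.0], beta_cat[1:]]))  # change between adjacent stages

print("slope under continuous coding: ", round(beta_lin[1], 3))
print("adjacent-stage differences:    ", np.round(step_sizes, 3))
print("F for departure from linearity:", round(f_stat, 2))
```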

Opportunities and Challenges for Longitudinal Analyses
As more waves of ABCD data are released, there will be opportunities to describe sample norms and normed relative categorizations (e.g., describing timing as early, typical, or late) with accompanying hormone trajectories. Longitudinal data will also allow for new variables, such as age at menarche or peak height velocity (7), and modeling techniques, such as non-linear latent growth modeling (57). Challenges include measurement invariance, as pubertal measures may change in their accuracy (e.g., parent versus child reports) and/or substantive meaning (e.g., E2 reflecting diurnal versus monthly patterns). Another documented phenomenon is that some adolescents regress on pubertal maturation as measured by the PDS (21). While such regression is commonly attributed to measurement error and the coarseness of the measure (21), further investigation within the ABCD Study is warranted, including consideration that adolescents' self-perceptions may change over time. Finally, as the number of self-identified sexual and gender minority youth increases longitudinally (55), there will be opportunities to study puberty in gender-diverse youth, and these groups should be consulted when conducting and interpreting this research.
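Several of these longitudinal quantities reduce to simple operations on repeated measures. The Python sketch below works under stated assumptions (complete, ordered waves; annual height measurements; a hypothetical plus or minus 1 SD cutoff for early and late timing): it flags decreases in PDS scores between waves, estimates a crude peak height velocity by finite differences, and labels relative timing from a normed score. With annual visits the velocity estimate is coarse, which echoes the point that sub-annual changes are not captured; growth-curve models would be preferable for real analyses.

```python
import numpy as np


def flag_regressions(pds_by_wave, tolerance=0.0):
    """Indices of waves whose PDS score falls below the previous wave by more than tolerance."""
    pds = np.asarray(pds_by_wave, dtype=float)
    drops = np.diff(pds) < -tolerance
    return [int(i) + 1 for i in np.flatnonzero(drops)]


def peak_height_velocity(ages, heights_cm):
    """Crude peak height velocity from annual visits using finite differences.

    Returns (age at the midpoint of the fastest interval, velocity in cm/year).
    """
    ages = np.asarray(ages, dtype=float)
    heights = np.asarray(heights_cm, dtype=float)
    velocity = np.diff(heights) / np.diff(ages)
    midpoints = (ages[:-1] + ages[1:]) / 2
    k = int(np.argmax(velocity))
    return float(midpoints[k]), float(velocity[k])


def timing_category(z, cutoff=1.0):
    """Label relative timing from an age- and sex-normed score (cutoff is an assumption)."""
    if z >= cutoff:
        return "early"   # more advanced than same-age peers
    if z <= -cutoff:
        return "late"
    return "typical"


print(flag_regressions([1.2, 1.8, 1.6, 2.4]))                                 # → [2]
print(peak_height_velocity([10, 11, 12, 13, 14], [140, 145, 152, 158, 161]))  # → (11.5, 7.0)
print(timing_category(1.3), timing_category(0.2))                              # → early typical
```

The `tolerance` argument lets a study ignore drops smaller than what it considers measurement noise; choosing that value is itself an analytic decision worth preregistering.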

Conclusion
This article outlines critical considerations for investigations using pubertal data from the ABCD Study. Major strengths include the size and demographic diversity of the sample and the use of both questionnaires and salivary hormones. Limitations include the inability to investigate earlier aspects of puberty and to account for multiscale hormone fluctuations, such that smaller longitudinal studies are still needed to answer open questions about puberty. Overall, we recommend that researchers using these data describe the PDS as a measure of perceived rather than objective pubertal development, use sensitivity analyses to account for diurnal hormone rhythms, prioritize open and reproducible science, and account for age in their models despite the narrow age band. We hope that facilitating a better understanding of the ABCD Study's design and measures will ultimately support a stronger science of adolescent development.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. Researchers can apply for access to these data at https://nda.nih.gov/abcd/request-access.