
ORIGINAL RESEARCH article

Front. Endocrinol., 06 February 2026

Sec. Pediatric Endocrinology

Volume 17 - 2026 | https://doi.org/10.3389/fendo.2026.1741927

This article is part of the Research Topic Musculoskeletal Markers of Healthy Development in Youth

Automated bone age assessment in rare pediatric growth disorders: a comparative study using Deeplasia

Kyra Skaf1†, Minu Fardipour1†, Philipp Schmidt1, Eike Bolmer2, Alexandra Keller3, Christina Lampe4, Julian Jurgens5, Mona Lindschau6, Katja Palm1, Sophie Ruckdeschel1, Behnam Javanmardi2, Klaus Mohnike1*
  • 1 Medical Faculty, Otto-Von-Guericke-University Magdeburg, Magdeburg, Germany
  • 2 Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn, Germany
  • 3 Kinderzentrum Am Johannisplatz, Leipzig, Germany
  • 4 Centre for Rare Diseases, University Hospital of Giessen, Giessen, Germany
  • 5 Division of Pediatric Radiology, Department of Radiology, University Hospital Hamburg, Hamburg, Germany
  • 6 International Center for Lysosomal Disorders (ICLD), Department of Pediatrics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

Objective: Bone age (BA) assessment is essential for monitoring growth and maturation and guiding therapeutic interventions. While deep learning (DL) models offer high-speed automated BA prediction, their generalizability to rare pathological and diagnostically complex populations remains a significant concern. This study aims to validate the open-source DL system Deeplasia on external data from pediatric patients with various syndromic, endocrine, and lysosomal storage disorders (LSDs) and to compare its accuracy and consistency against multiple expert human raters.

Methods: We retrospectively assembled 1,136 hand radiographs from multiple centers, including patients with SHOX deficiency; Noonan syndrome; Silver–Russell syndrome; Ullrich–Turner syndrome; pseudohypoparathyroidism; congenital adrenal hyperplasia (CAH); precocious puberty and precocious pseudopuberty (cohort 1); mucopolysaccharidosis types I, II, III, IV, and VI; alpha-mannosidosis; and unclassified LSDs (cohort 2). For each radiograph, BA was evaluated with the Greulich and Pyle method by two to five human experts, and their mean served as the reference BA. Model performance was assessed using the mean absolute error (MAE), root mean squared error (RMSE), and 1-year accuracy for each cohort and for subgroups defined by underlying condition, sex, and age group. Furthermore, Deeplasia’s performance was compared with that of individual raters by testing each rater and the model against the mean of the remaining experts.

Results: Deeplasia achieved an MAE of 5.95 months, an RMSE of 8.01 months, and a 1-year accuracy of 89.9% in cohort 1 (endocrine and syndromic conditions). In cohort 2 (lysosomal storage disorders), it achieved an MAE of 7.13 months, an RMSE of 9.56 months, and a 1-year accuracy of 81.2%. In direct comparisons in which Deeplasia and each individual rater were tested against the mean of the remaining experts, Deeplasia outperformed all human raters.

Conclusion: Deeplasia was validated as a highly consistent, robust, and reliable tool for BA assessment in complex cases. It demonstrated superior accuracy compared with individual human raters and may assist clinicians in BA evaluation.

1 Introduction

Bone age (BA) assessment is a fundamental tool in pediatrics, orthopedics, and forensic medicine, providing a proxy for skeletal maturity that is crucial for diagnosing growth disorders, monitoring pubertal development, and guiding therapeutic decisions such as growth hormone treatment (1–14). Delayed ossification may indicate an underlying nutritional deficiency, endocrine dysfunction, or developmental disorder, whereas accelerated maturation may signal precocious puberty or other endocrine pathologies. BA assessment may therefore serve as a practical screening tool for pediatricians to identify children requiring further evaluation or referral. Reliable and reproducible methods for BA assessment have been pursued for decades (15–18). Traditionally, BA is evaluated on left-hand and wrist radiographs using atlas-based methods such as the Greulich–Pyle (GP) atlas (19) or scoring-based methods such as the Tanner–Whitehouse (TW) system (17, 18, 20). While the GP method is widely used because of its relative simplicity, it is inherently subjective and prone to substantial inter- and intraobserver variability (9). TW methods, on the other hand, are time-intensive, limiting their efficiency in busy clinical workflows. The rapid evolution of deep learning (DL) and convolutional neural networks (CNNs) has since transformed the field, especially following the 2017 RSNA Pediatric Bone Age Challenge (21). Modern CNN-based approaches frequently achieve mean absolute errors (MAEs) below 6 months, in many cases outperforming traditional manual scoring methods (3, 4, 14, 22, 23). Crucially, the fully automated nature of these systems offers a substantial gain in clinical workflow efficiency, reducing assessment time from minutes to seconds. The drive for automated, objective assessment has fostered the development of a wide range of specialized deep learning architectures (24–28).
Beyond simply automating existing methods, research has explored advanced CNN applications to maximize accuracy and robustness. Architectures such as MABAL (machine-assisted bone age labeling) (28) and models based on regression CNNs (24) have been employed for direct age estimation. Other methods have sought to improve performance by mimicking the radiologist’s diagnostic workflow with stacked neural networks (26) or by optimizing feature extraction from the entire radiograph (27). This collective effort, including performance evaluations across large datasets (28), highlights the field’s vigorous pursuit of reliable and scalable AI solutions. Despite these advances, most AI models for BA assessment are trained and validated predominantly on standardized datasets such as the RSNA challenge set, which primarily includes images from healthy populations (21–23, 29). Consequently, these models may generalize poorly to real-world clinical radiographs, particularly those from pediatric patients with underlying pathological skeletal changes (30), growth disorders (22, 23, 31), or dysplasias (32, 33). To address this limitation, Rassmann et al. (34) introduced Deeplasia, an open-source deep learning-based BA assessment system specifically validated on skeletal dysplasias. In the present work, we evaluate the performance of Deeplasia on two external, diagnostically complex clinical cohorts with various syndromic, endocrine, and lysosomal storage disorders. By comparing Deeplasia against multiple expert human raters, we aim to rigorously assess the system’s robustness, accuracy, and reliability under real-world clinical conditions and to demonstrate its potential for effective and efficient deployment in specialized pediatric radiology.

2 Materials and methods

2.1 Study design and cohorts

The study included two pediatric cohorts: cohort 1, comprising syndromic and endocrine disorders, and cohort 2, comprising lysosomal storage disorders. Radiographs were evaluated by human raters using the Greulich–Pyle method. In cohort 1, each radiograph was independently assessed by two or three raters, whereas in cohort 2, evaluations were performed by five raters. The panel of raters intentionally covered a broad spectrum of clinical expertise, ranging from medical students and pediatric residents to senior pediatric endocrinologists with more than 30 years of clinical experience. This design allowed interrater variability to be assessed across different levels of training. In parallel, BA was predicted by Deeplasia, a deep learning-based algorithm, which provided an estimate of skeletal maturity independent of the human raters. For comparative analyses, patients were stratified by sex and further subdivided into four age groups based on chronological age at the time of imaging: 1) younger than 48 months (<4 years), 2) 48–96 months (4–8 years), 3) 96–144 months (8–12 years), and 4) older than 144 months (>12 years). This stratification was chosen to reflect clinically relevant developmental stages of childhood and adolescence, facilitating both sex-specific and age-specific comparisons of bone maturation patterns across diagnostic groups. Because chronological age at the time of image acquisition was not available for all images, complete age group assignment was not possible; images with unknown chronological age were therefore assigned to an additional group 5.
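The age stratification above can be expressed as a small binning function. This is a minimal sketch, not the study’s analysis code: it assumes ages are given in months and that the interval boundaries are half-open (a child aged exactly 48 months falls into group 2, and one aged exactly 144 months into group 4), because the text does not state how boundary values were assigned. The function name `age_group` is hypothetical.

```python
from typing import Optional


def age_group(age_months: Optional[float]) -> int:
    """Map chronological age (months) to age groups 1-5 as described in Section 2.1.

    Assumption: boundaries are half-open [lower, upper); the source does not
    specify how ages of exactly 48, 96, or 144 months were assigned.
    """
    if age_months is None:
        return 5  # chronological age unknown
    if age_months < 48:
        return 1  # younger than 4 years
    if age_months < 96:
        return 2  # 4-8 years
    if age_months < 144:
        return 3  # 8-12 years
    return 4  # 12 years and older
```

Encoding a missing age as `None` keeps group 5 explicit instead of silently dropping those images from age-stratified analyses.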

2.1.1 Cohort 1

This cohort comprises 950 left-hand radiographs from patients with congenital adrenal hyperplasia (n = 95), precocious pseudopuberty (n = 11), precocious puberty (n = 96), pseudohypoparathyroidism (n = 30), Noonan syndrome (n = 116), SHOX deficiency (n = 292), Silver–Russell syndrome (n = 69), and Ullrich–Turner syndrome (n = 241) who were diagnosed and treated at the University Hospital of Magdeburg and at a specialized pediatric endocrinology practice in Leipzig, Germany. Radiographs were acquired between 2006 and 2024. Patient ages at the time of imaging ranged from 1.1 to 18.5 years. The sex distribution was 580 female and 370 male patients. The age group distribution was n = 51 (group 1), n = 216 (group 2), n = 261 (group 3), and n = 220 (group 4).

2.1.2 Cohort 2

This cohort comprises 186 hand radiographs from patients with mucopolysaccharidosis (MPS) I (n = 31), MPS II (n = 19), MPS III (n = 24), MPS IV (n = 28), MPS VI (n = 15), alpha-mannosidosis (n = 26), and unclassified lysosomal storage diseases (n = 43). Pseudonymized radiographs were collected from the pediatric departments in Hamburg-Eppendorf (2009–2022), Münster (2010–2015), Giessen (2016–2022), and Bochum (from 2014 onward). In addition, a large collection was provided by Prof. J. Spranger, whose archive contains historical patient records from 1961 to 1988. Patient ages at the time of imaging ranged from 0 to 40 years. The sex distribution was 69 female and 117 male patients. The age group distribution was n = 46 (group 1), n = 48 (group 2), n = 26 (group 3), and n = 30 (group 4).

2.2 AI system: Deeplasia

Deeplasia (34) is a state-of-the-art, prior-free deep convolutional neural network developed for fully automated estimation of skeletal maturity from pediatric hand radiographs. In contrast to traditional atlas-based or explicit region-of-interest (ROI) approaches, the system uses an end-to-end learning strategy that captures both global and local image features in a data-driven manner, without relying on handcrafted priors. This design choice is critical for generalizing to atypical skeletal morphology, which is frequently encountered in patients with syndromic and dysplastic disorders. The core models were trained on the Radiological Society of North America (RSNA) Pediatric Bone Age dataset, which comprises over 14,000 radiographs, with Greulich–Pyle-based readings as the consensus reference standard. The network architecture is based on EfficientNet variants. To account for known sex-specific maturation patterns, Deeplasia integrates sex as a binary covariate, which is embedded and concatenated with the image features before the final regression. The system is an ensemble of three CNN variants derived from this architecture, and the final prediction is the mean of the three model estimates. The pipeline begins with an automated hand-segmentation step that normalizes the image input and minimizes the influence of artifacts, labels, and borders. As open-source software, the entire pipeline, including preprocessing masks and source code, has been made publicly available (34) to ensure transparency and reproducibility. Deeplasia has previously been validated on external clinical cohorts, demonstrating high test–retest precision suitable for longitudinal applications and high accuracy in patients with genetically confirmed skeletal dysplasias (32, 34).
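The two final steps described above (concatenating a binary sex covariate with image features, then averaging three ensemble members) can be illustrated with a toy model. This is only a sketch of the general idea, not Deeplasia’s code: each member here is a hypothetical linear head over a precomputed feature vector (`ToyRegressionHead` and `ensemble_bone_age` are invented names), sex is appended as a raw 0/1 value rather than through a learned embedding, and coding male as 1.0 is an arbitrary assumption.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Sequence


@dataclass
class ToyRegressionHead:
    """Hypothetical stand-in for one ensemble member's final regression layer."""

    weights: Sequence[float]  # one weight per image feature, plus one for sex
    bias: float

    def predict(self, image_features: Sequence[float], is_male: bool) -> float:
        # Concatenate the binary sex covariate with the image features
        # (assumed coding: male = 1.0, female = 0.0).
        x = list(image_features) + [1.0 if is_male else 0.0]
        if len(x) != len(self.weights):
            raise ValueError("feature length does not match weights")
        return self.bias + sum(w * v for w, v in zip(self.weights, x))


def ensemble_bone_age(heads, image_features, is_male) -> float:
    """Final bone age estimate in months: the mean of the member predictions."""
    return fmean(h.predict(image_features, is_male) for h in heads)
```

Averaging several independently trained members is a common way to reduce the variance of individual networks; the toy heads only make that arithmetic visible.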

2.3 Statistical analysis

Accuracy was assessed using the MAE, root mean squared error (RMSE), and the proportion of predictions within ±12 months of the reference (1-year accuracy). Agreement between Deeplasia and the reference standard was assessed graphically using scatter plots and Bland–Altman analysis (see Figures 1–3). Subgroup analyses were conducted by diagnosis, sex, and age group as defined in Section 2.1 (see Tables 1, 2). In cohort 1, 849 images were assessed by three human raters and the remaining 101 images by two raters; all 950 images were included in the overall statistical analysis, with the mean BA of the available raters serving as the reference. To enable a fair comparison between Deeplasia and individual human raters, we employed a leave-one-rater-out procedure: each rater was excluded in turn, and the mean BA assessed by the remaining experts served as the reference standard. We then computed the MAE, RMSE, 1-year accuracy, and Pearson correlation coefficient for both the excluded rater and Deeplasia relative to this reference. This procedure was repeated for every individual rater in both cohorts (see Tables 3, 4). All images and their corresponding BA estimations were included in the statistical analysis. Outliers, as indicated in Figures 1–3, were defined as images for which the absolute difference between the Deeplasia prediction and the mean expert BA exceeded 30 months. These BA estimations and predictions were subsequently examined individually (see Table 5).
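The error metrics, the leave-one-rater-out comparison, and the 30-month outlier rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors’ analysis code: it assumes every rater scored every image (the study’s handling of images with only two raters is not reproduced), errors are defined as prediction minus reference, "within ±12 months" is taken to be inclusive, and all function names are hypothetical. Pearson correlation is left out here for brevity.

```python
import math
from statistics import fmean


def error_metrics(pred, ref, tolerance=12.0):
    """Return MAE, RMSE, 1-year accuracy, and mean bias (all in months)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and equally long")
    errors = [p - r for p, r in zip(pred, ref)]
    return {
        "mae": fmean(abs(e) for e in errors),
        "rmse": math.sqrt(fmean(e * e for e in errors)),
        # Assumed inclusive: an error of exactly 12 months counts as accurate.
        "acc_1y": sum(abs(e) <= tolerance for e in errors) / len(errors),
        "bias": fmean(errors),  # positive = prediction older than reference
    }


def leave_one_rater_out(ratings, model_pred):
    """Score each held-out rater and the model against the mean of the other raters.

    ratings: dict mapping rater name -> list of BA values (months), one per image.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two raters")
    results = {}
    for held_out in ratings:
        others = [vals for name, vals in ratings.items() if name != held_out]
        # Per-image reference: mean BA of the remaining raters.
        reference = [fmean(image_vals) for image_vals in zip(*others)]
        results[held_out] = {
            "rater": error_metrics(ratings[held_out], reference),
            "model": error_metrics(model_pred, reference),
        }
    return results


def flag_outliers(pred, ref, threshold=30.0):
    """Indices of images where |prediction - reference| exceeds the threshold."""
    return [i for i, (p, r) in enumerate(zip(pred, ref)) if abs(p - r) > threshold]
```

Scoring the model against exactly the same reference as the held-out rater is what makes the comparison fair: neither the rater nor the model contributes to the reference it is judged against.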


Figure 1. Comparing the performance of Deeplasia and expert raters in cohort 1 “syndromic and endocrine disorders” for conditions with accelerated bone age. Panel (A) shows a scatter plot comparing Deeplasia bone age predictions with expert evaluations for disorders with accelerated growth, with some outliers. Panel (B) shows the corresponding Bland–Altman plot, displaying the difference between Deeplasia and the expert evaluations plotted against their average. Horizontal lines indicate the mean bias and limits of agreement, with a small number of outliers.


Figure 2. Comparing the performance of Deeplasia and expert raters in cohort 1 “syndromic and endocrine disorders” for conditions with delayed bone age. Panel (A) shows a scatter plot comparing Deeplasia bone age predictions with expert evaluations for conditions with delayed growth, with several outliers. Panel (B) shows the corresponding Bland–Altman plot, displaying the difference between Deeplasia and the expert evaluations plotted against their average. Horizontal lines indicate the mean bias and limits of agreement, with a small number of outliers.


Figure 3. Comparing the performance of Deeplasia and expert raters in cohort 2 “lysosomal storage disorders”. Panel (A) shows a scatter plot comparing Deeplasia bone age predictions with expert evaluations in patients with lysosomal storage disorders, including different mucopolysaccharidosis subtypes and related conditions. A small number of outliers is visible. Panel (B) shows the corresponding Bland–Altman plot, displaying the difference between Deeplasia and the expert evaluations plotted against their mean. Horizontal lines indicate the mean bias and limits of agreement, with a small number of outliers.


Table 1. Performance of Deeplasia across different subgroups in cohort 1 (syndromic and endocrine disorders).


Table 2. Performance of Deeplasia on different disease subgroups in cohort 2 (LSDs).


Table 3. Comparison of the performance of Deeplasia and individual raters in terms of mean absolute error (MAE), root mean squared error (RMSE), 1-year accuracy, and Pearson correlation coefficient, each calculated relative to the mean of the remaining raters in cohort 1 (syndromic and endocrine disorders) for every testing combination.


Table 4. Comparison of the performance of Deeplasia and individual raters in terms of mean absolute error (MAE), root mean squared error (RMSE), 1-year accuracy, and Pearson correlation coefficient, each calculated relative to the mean of the other four raters in cohort 2 (LSDs).


Table 5. Deeplasia’s predictions and individual as well as mean rater bone age (BA) evaluations for all outlier cases in which the absolute difference between the mean expert BA evaluation and Deeplasia’s prediction exceeded 30 months.

3 Results

3.1 Cohort 1: syndromic and endocrine disorders

The radiographs from cohort 1 were analyzed in two primary groups based on clinical presentation: disorders typically associated with accelerated skeletal maturation (see Figure 1) and disorders leading to delayed skeletal maturation (see Figure 2). In conditions with accelerated maturation, Deeplasia demonstrated strong agreement with the expert consensus: the scatter plot showed tight clustering along the line of identity, and Bland–Altman analysis revealed no systematic bias. In disorders with delayed maturation, Deeplasia also showed strong agreement, although the mean difference line in the Bland–Altman analysis lay slightly above zero. Across all radiographs, a minor positive bias of 1.52 months was recorded (see Table 1). Outliers, as shown in Figures 1 and 2, were defined as cases with a BA difference greater than 30 months between the mean expert evaluation and Deeplasia’s estimate. Four outliers were identified: one case of precocious puberty, one of SHOX deficiency, and two of Ullrich–Turner syndrome. The absolute BA differences in these cases ranged from 30.4 to 57.6 months. Three of these images had been rated by only two raters and the remaining one by all three raters. These images also showed marked interrater disagreement, with standard deviations of expert evaluations ranging from 25.5 to 82.7 months (almost 7 years), corresponding to standard errors of the mean (SEMs) of 18.0 to 58.5 months (see Table 5).
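The SEM values quoted above follow from the rater standard deviations through the usual relation, where n is the number of raters who scored the image. Assuming n = 2 for the two extreme cases (three of the four outliers were rated by only two raters), the reported bounds are reproduced:

$$
\mathrm{SEM} = \frac{\mathrm{SD}}{\sqrt{n}}, \qquad \frac{25.5}{\sqrt{2}} \approx 18.0\ \text{months}, \qquad \frac{82.7}{\sqrt{2}} \approx 58.5\ \text{months}.
$$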

Table 1 summarizes Deeplasia’s performance across diagnostic, sex-, and age-specific subgroups in cohort 1. Across diagnostic categories, MAE values ranged from 5.44 to 9.10 months, RMSE values from 7.21 to 11.11 months, and 1-year accuracies from 70.0% to 92.47%. The lowest MAE values were observed in SHOX deficiency (5.44 months), Noonan syndrome (5.47 months), Silver–Russell syndrome (5.76 months), and congenital adrenal hyperplasia (5.76 months); the highest MAE was seen in pseudohypoparathyroidism (9.10 months), which also showed the highest standard deviation (SD) of rater BA evaluations (11.88 months).

When stratified by sex, MAE was 6.04 months in female patients and 5.80 months in male patients. Deeplasia’s performance varied across age groups. The model performed best in early childhood (<4 years), with an MAE of 3.76 months, an RMSE of 5.1 months, and a 1-year accuracy of 96.1%. Performance declined in mid-childhood and early puberty (8–12 years), where the MAE peaked at 6.78 months, with an RMSE of 9.02 months and the lowest 1-year accuracy of 85.8%, before improving again in adolescence (>12 years). The Pearson correlation coefficient followed a similar trend. The bias showed an inverse pattern, with the lowest absolute bias (0.82 months) in the 8–12-year age group.

Against individual raters (Table 3), Deeplasia achieved lower MAE and RMSE values and higher 1-year accuracies and Pearson correlation coefficients than any single human observer.
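Tables 3 and 4 also report Pearson correlation coefficients. For reference, the standard product-moment coefficient can be computed as below; this is a minimal sketch with a hypothetical function name (`pearson_r`), and Python 3.10 and later also provide `statistics.correlation` for the same quantity. Note that r measures linear association rather than agreement: adding a constant offset to every prediction leaves r unchanged, which is why it is reported alongside MAE, RMSE, and bias.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```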

3.2 Cohort 2: lysosomal storage disorders

Figure 3 illustrates the agreement between Deeplasia and the expert consensus in the lysosomal storage disorder cohort. The scatter plot shows a strong linear relationship across the entire BA range, with data points closely aligned along the line of identity and no systematic deviation. The Bland–Altman plot and Table 2 reveal a minimal bias of +1.04 months. Only two outliers were identified, one with MPS I (BA difference −32.47 months) and one with MPS II (BA difference −35.80 months). In both cases, Deeplasia’s estimate lay outside the range of the five expert evaluations (see Table 5).
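The Bland–Altman quantities referred to above (per-image averages and differences, mean bias, and limits of agreement) can be computed as follows. This sketch assumes the conventional 95% limits, bias ± 1.96 × SD of the differences, and differences defined as Deeplasia minus the expert mean; the text does not state which multiplier the figures use, and `bland_altman` is a hypothetical name.

```python
from statistics import fmean, stdev


def bland_altman(pred, ref, z=1.96):
    """Return plot coordinates plus bias and limits of agreement (months)."""
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [p - r for p, r in zip(pred, ref)]  # y-axis: Deeplasia minus experts
    means = [(p + r) / 2 for p, r in zip(pred, ref)]  # x-axis: average of both
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "loa_lower": bias - z * sd,
        "loa_upper": bias + z * sd,
    }
```

Plotting the differences against the average of both methods, rather than against the expert value alone, avoids treating either method as error-free.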

Table 2 summarizes Deeplasia’s performance across diagnostic, sex-, and age-specific subgroups in cohort 2. MAE values across diagnostic categories ranged from 3.87 to 9.53 months, RMSE values from 4.70 to 13.90 months, and 1-year accuracies from 63.16% to 100%. The lowest MAE and the highest accuracy were observed in MPS III (MAE 3.87 months), while MPS II showed the largest deviations (MAE 9.53 months).

When stratified by sex, MAE was 5.81 months in female patients and 7.90 months in male patients.

Across age groups, the best performance was observed in the youngest children (0–4 years), with an MAE of 5.22 months, an RMSE of 6.68 months, and a 1-year accuracy of 91.3%. After this age range, both MAE and RMSE increased, reaching their peak in children aged 8–12 years, with an MAE of 9.74 months, an RMSE of 13.88 months, and a corresponding minimum 1-year accuracy of 69.2%, before improving again in older age groups.

Table 4 presents the comparison between Deeplasia and individual human raters in the lysosomal storage disorder cohort. Across all raters, Deeplasia achieved lower MAE and RMSE values and higher 1-year accuracies and Pearson correlation coefficients than any single observer.

4 Discussion

Many artificial intelligence methods rely predominantly on the publicly available dataset released by the RSNA in 2017 for its pediatric BA challenge (21). Consequently, published AI methods (28–30, 32) have been trained primarily on images from the general population. Our study, by contrast, validates Deeplasia’s performance in diagnostically complex pathological subgroups.

In a 2024 study in which Deeplasia was first tested on a dataset of skeletal dysplasias, it achieved an MAE of 5.84 months (34). This is broadly in line with the mean MAE across diagnostic subgroups of 6.24 months in cohort 1 and 7.14 months in cohort 2. Across the subgroups of both cohorts, the MAE ranged from 3.87 to 9.53 months, with a mean of 6.66 months, reflecting the diagnostic complexity of these cohorts. Compared with its performance on the RSNA dataset [MAE of 3.87 months (34)], Deeplasia was less accurate on radiographs from children with syndromic and endocrine disorders (cohort 1), and predicting bone age was even more challenging in patients with lysosomal storage disorders (cohort 2). This suggests that, as for human raters, Greulich–Pyle-based bone age assessment is more challenging for Deeplasia on pathological hands than on unaffected hands.

Nevertheless, in both cohorts, Deeplasia consistently achieved higher accuracy across all evaluated metrics (MAE, RMSE, 1-year accuracy, and Pearson correlation coefficient) than individual human raters when each was tested against the mean of the other raters (Tables 3, 4). This was notably the case in cohort 2, even though a large portion of its radiographs were historical acquisitions (1961–1988) whose lower image quality complicated bone age assessment for all evaluators.

Based on these results, Deeplasia can be considered successfully validated for its generalizability to rare pathological cohorts, which are widely regarded as challenging for current AI-based bone age assessment systems.

Independent of specific radiographs or cohorts, there is a fundamental difference between BA assessment with the traditional human Greulich–Pyle method and Deeplasia. Whereas the former relies on a stepwise, discontinuous process of visual atlas matching, Deeplasia provides a continuous BA prediction with sub-month resolution. This continuous output, which is not bound to the discrete stages of the GP atlas, constitutes a methodological advantage over the human reference standard. As a result, exact numerical agreement with stepwise human bone age ratings is rarely achievable.
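The effect of stepwise ratings can be illustrated with a toy example. Assume, purely hypothetically, that a rater can only report values on a fixed grid of atlas standards spaced 12 months apart (the grid below is invented and does not reproduce the actual Greulich–Pyle standards). Even a perfect continuous estimate then rarely coincides exactly with the rater’s value, and the rounding alone contributes an error of up to half the grid spacing. The helper `snap_to_stage` is a hypothetical name.

```python
def snap_to_stage(age_months, stages):
    """Return the nearest stage value, i.e., what a grid-bound rater would report."""
    return min(stages, key=lambda s: abs(s - age_months))


# Hypothetical atlas grid with 12-month spacing (NOT the real GP standards).
HYPOTHETICAL_STAGES = [12 * k for k in range(1, 19)]  # 12 ... 216 months

# Continuous "true" bone ages, as a perfect continuous predictor would output them.
true_ages = [100.0, 105.5, 113.0, 120.0]
snapped = [snap_to_stage(a, HYPOTHETICAL_STAGES) for a in true_ages]

# Only values that happen to fall on the grid match exactly.
exact_matches = sum(a == s for a, s in zip(true_ages, snapped))
max_rounding_error = max(abs(a - s) for a, s in zip(true_ages, snapped))
```

Here only one of the four continuous values matches its snapped rating exactly, and the largest discrepancy (5 months) arises from discretization alone, without any actual assessment error.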

4.1 Cohort 1

The subgroup analysis of cohort 1 (Table 1) shows that Deeplasia maintains high and consistent accuracy across most of the syndromic and endocrine disorders, with MAEs tightly clustered between 5.44 and 6.50 months. There was no systematic difference in performance between conditions characterized by accelerated and by delayed skeletal maturation, as is also visible in the Bland–Altman analyses (Figures 1B, 2B). The only notable deviation was the elevated MAE in pseudohypoparathyroidism (9.10 months), which is most likely attributable to the marked and specific skeletal atypia associated with this rare condition (35). Even this peak value corresponds to an average prediction error well below 12 months. Pseudohypoparathyroidism also showed the highest standard deviation among rater bone age evaluations within this cohort, suggesting that determining bone age with the Greulich–Pyle atlas is intrinsically difficult in such cases. Notably, the model achieved excellent results in the remaining disorders, despite the known presence of pronounced hand dysplasias in many of them (36, 37).

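The most striking observation in the age stratification was a U-shaped accuracy curve: accuracy was highest in early childhood, lowest in mid-childhood and early puberty (8–12 years), and improved again in adolescence. This pattern mirrors clinical experience with human BA evaluation, in which this transitional growth phase is methodologically the most challenging to assess.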

Furthermore, sex stratification yielded balanced results, supporting robust model performance irrespective of sex. The slightly poorer performance in female patients in this cohort is most likely related to the large subgroup of Ullrich–Turner syndrome patients (n = 241), all of whom are female; this subgroup was more difficult to assess (MAE = 6.5 months) than the average of all eight diagnostic classes (mean MAE 6.24 months; Table 1).

The four outliers in cohort 1 (see Table 5), where Deeplasia’s error relative to the mean of the raters exceeded 30 months, are most plausibly attributable to errors in the human rating process, such as data entry errors (e.g., typographical mistakes) or other rating errors. Such errors would explain the unusually high standard deviations and SEMs of the rater evaluations for these cases. Given the large number of images evaluated in this cohort (n = 950), a few extreme deviations of this kind are to be expected and do not undermine the overall robustness of the study.

A fast and reliable bone age assessment is highly valuable for the conditions in cohort 1 for several reasons. It can serve as a diagnostic tool, for example, to detect advanced skeletal maturation and predict adult height in precocious puberty (38), or to identify the typical bone age delay seen in Noonan syndrome (37). Another important application is the estimation of remaining growth potential and therapy monitoring during growth hormone treatment in SHOX deficiency (39, 40), Noonan syndrome (41), Silver–Russell syndrome (42), and Turner syndrome (43). In congenital adrenal hyperplasia, skeletal maturation may be accelerated and needs to be monitored during glucocorticoid treatment to optimize growth outcomes (44). In all these conditions, an improved, standardized, and faster bone age assessment could substantially enhance patient management and treatment planning.

4.2 Cohort 2

Performance across the diverse diagnostic subgroups in cohort 2 (LSDs) reflected the substantial impact of phenotypic variability on model accuracy. The higher accuracy observed for MPS III (Sanfilippo syndrome) can be attributed to its relatively mild skeletal involvement and preserved bone proportions, which remain closer to the standard Greulich–Pyle reference morphology (19, 45). In sharp contrast, the lower accuracy for MPS II (Hunter syndrome) is consistent with its pronounced dysostosis multiplex and greater intradiagnostic heterogeneity, which deviate substantially from the reference images (33). Similarly, the unclassified LSD group exhibited reduced precision, likely reflecting the broad radiographic and etiologic diversity within that category. Accuracy also declined beyond early childhood, a pattern associated with the smaller number of available images from older patients and the intrinsic difficulty of evaluating skeletal maturity in later stages of growth, when deformities become more pronounced (33). The performance difference between sexes may be explained by the inheritance patterns of the included disorders: MPS II is inherited in an X-linked manner and therefore occurs almost exclusively in male patients (46, 47), whereas all other included lysosomal storage diseases, namely MPS I, III, IV, and VI and alpha-mannosidosis (45, 48–52), are autosomal recessive. As a result, the male subgroup contained a higher proportion of severely affected MPS II patients with pronounced skeletal deformities, whereas the female subgroup included a less heterogeneous and overall milder spectrum of diseases.

The outliers in cohort 2 represent instances of disagreement between Deeplasia and all five raters (Table 5). However, because Deeplasia overall achieved higher accuracy than any individual rater when compared with the mean of the other raters’ evaluations (Table 4), this disagreement does not appear to be systematic. These findings indicate that Deeplasia maintains consistent accuracy and reproducibility even under the challenging skeletal conditions typical of lysosomal storage disorders.

4.3 Limitations

A primary constraint is the geographic and ethnic homogeneity of the investigated cohorts, which predominantly reflect the demographic profile of Central Europe (Germany). While the AI model was trained on the globally used RSNA dataset, the generalizability of our specific validation findings to non-European or mixed-ethnicity populations remains limited, suggesting a need for future geographically diverse validation studies (23). Furthermore, the reliance on the Greulich–Pyle method (19) as the reference standard means that the AI system inherits the subjectivity and variability of this atlas-based approach, especially in transitional growth phases and in patients with pronounced dysostosis multiplex. Crucially, in the absence of a true, independent ground truth, the consensus of human raters, itself prone to variability, must serve as the reference for the AI system’s performance. Methodological limitations also include the uneven distribution and small sample size of certain rare diagnostic subgroups (e.g., precocious pseudopuberty and MPS VI). These small numbers increase the uncertainty of error metrics in these populations and may explain the observed performance drop compared with more prevalent disorders. Additionally, the multicenter data collection, spanning several decades, introduces unavoidable heterogeneity in image acquisition protocols. In particular, the older radiographs in cohort 2 often have image quality and resolution substantially inferior to modern clinical standards, which may attenuate the system’s and the raters’ performance compared with uniformly acquired datasets.

To limit the influence of these small subgroups on cohort-level statistics, we report each metric (MAE, RMSE, 1-year accuracy, and bias) as the mean across all conditions in a cohort. This ensures that no condition is weighted disproportionately because of its sample size. Finally, as with most deep learning models, the model's individual assessments lack explainability.
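The condition-weighted averaging described above can be sketched as follows. This is a hedged illustration rather than the authors' pipeline. The function names are hypothetical. Two definitions are also assumptions: bias is taken as predicted minus reference BA, and 1-year accuracy as the share of absolute errors of at most 12 months. The toy data contrast a pooled MAE with an MAE that weights each condition equally.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def error_metrics(predicted, reference):
    """MAE, RMSE, 1-year accuracy and bias for paired BA values in months.

    Assumptions: error = predicted - reference (positive bias = overestimation),
    and 1-year accuracy = share of absolute errors of at most 12 months.
    """
    errors = [p - r for p, r in zip(predicted, reference)]
    return {
        "MAE": mean(abs(e) for e in errors),
        "RMSE": sqrt(mean(e * e for e in errors)),
        "1-year accuracy": mean(1.0 if abs(e) <= 12 else 0.0 for e in errors),
        "bias": mean(errors),
    }


def condition_weighted_metrics(records):
    """records: iterable of (condition, predicted_ba, reference_ba) tuples.

    Computes the metrics within each condition, then averages them with equal
    weight per condition, so large subgroups do not dominate small ones.
    """
    grouped = defaultdict(lambda: ([], []))
    for condition, predicted, reference in records:
        grouped[condition][0].append(predicted)
        grouped[condition][1].append(reference)
    per_condition = {c: error_metrics(p, r) for c, (p, r) in grouped.items()}
    metric_names = next(iter(per_condition.values())).keys()
    return {m: mean(pc[m] for pc in per_condition.values()) for m in metric_names}


# Invented example (BA in months): four well-predicted radiographs of a common
# condition and one poorly predicted radiograph of a rare condition.
records = [("common", 100, 100)] * 4 + [("rare", 124, 100)]
pooled = error_metrics([r[1] for r in records], [r[2] for r in records])
print("pooled MAE:", pooled["MAE"])
print("condition-weighted MAE:", condition_weighted_metrics(records)["MAE"])
```

With these invented values, pooling over all five radiographs gives an MAE of 4.8 months. Weighting both conditions equally gives 12 months instead, so the poorly assessed rare subgroup is not diluted by the larger, well-assessed one.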

5 Conclusion

The present study successfully validated the open-source deep learning system Deeplasia for automated BA assessment on external cohorts of pediatric patients with rare and diagnostically complex growth disorders. These included syndromic and endocrine disorders as well as lysosomal storage disorders. Our results indicate that, for Deeplasia, hand radiographs of unaffected children are easier to assess than those of patients with endocrine or syndromic conditions (MAE = 5.95 months across all 950 radiographs). These, in turn, are easier to assess than those of patients with lysosomal storage disorders (MAE = 7.13 months across all 186 images). The same trend was observed for the human raters, and Deeplasia consistently outperformed individual experts in both cohorts.

While the reliance on the Greulich–Pyle method as the reference standard and the geographic homogeneity of the cohorts remain limitations, Deeplasia offers a promising path toward standardizing BA assessment in pediatric practice.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Ethik-Kommission, Med. Fakultät, OvG-Universität, Leipziger Str. 44, 39120 Magdeburg, Germany (22/27). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

KS: Writing – original draft, Data curation, Investigation. MF: Writing – original draft, Investigation, Data curation. PS: Methodology, Validation, Formal Analysis, Writing – original draft. EB: Software, Data curation, Writing – review & editing, Methodology. AK: Writing – review & editing, Data curation, Investigation. CL: Investigation, Data curation, Writing – review & editing. JJ: Data curation, Writing – review & editing, Investigation. ML: Writing – review & editing, Data curation, Investigation. KP: Data curation, Investigation, Writing – review & editing. SR: Data curation, Investigation, Writing – review & editing. BJ: Data curation, Methodology, Conceptualization, Software, Writing – review & editing, Project administration, Writing – original draft. KM: Writing – original draft, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by a federal grant from the BM Forschung, Technologie und Raumfahrt (FKZ: 03LW0629K) and by Otto von Guericke University Magdeburg, which covered the open access publication fee.

Conflict of interest

The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author KM declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Breen AB, Steen H, Pripp A, Niratisairak S, and Horn J. Accuracy of 4 different methods for estimation of remaining growth and timing of epiphysiodesis. J Bone Joint Surg Am. (2024) 106:1888–94. doi: 10.2106/JBJS.23.01483

2. Satoh M and Hasegawa Y. Factors affecting prepubertal and pubertal bone age progression. Front Endocrinol (Lausanne). (2022) 13:967711. doi: 10.3389/fendo.2022.967711

3. Blum WF, Ranke MB, Keller E, Keller A, Barth S, de Bruin C, et al. A novel method for adult height prediction in children with idiopathic short stature derived from a German-Dutch cohort. J Endocr Soc. (2022) 6:bvac074. doi: 10.1210/jendso/bvac074

4. Wang D, Zhang K, Ding J, and Wang L. Improve bone age assessment by learning from anatomical local regions. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer (2020). p. 631–40.

5. Shim YS, Lim KI, Lee HS, and Hwang JS. Long-term outcomes after gonadotropin-releasing hormone agonist treatment in boys with central precocious puberty. PloS One. (2020) 15:e0243212. doi: 10.1371/journal.pone.0243212

6. Wit JM, Kamp GA, Oostdijk W, on behalf of the Dutch Working Group on Triage and Diagnosis of Growth Disorders in Children. Towards a rational and efficient diagnostic approach in children referred for growth failure to the general paediatrician. Horm Res Paediatr. (2019) 91:223–40. doi: 10.1159/000499915

7. Boeyer ME, Sherwood RJ, Deroche CB, and Duren DL. Early maturity as the new normal: A century-long study of bone age. Clin Orthop Relat Res. (2018) 476:2112–22. doi: 10.1097/CORR.0000000000000446

8. Creo AL and Schwenk WF 2nd. Bone age: A handy tool for pediatric providers. Pediatrics. (2017) 140:e20171486. doi: 10.1542/peds.2017-1486

9. Martin DD, Wit JM, Hochberg Z, van Rijn RR, Fricke O, Werther G, et al. The use of bone age in clinical practice - part 2. Horm Res Paediatr. (2011) 76:10–6. doi: 10.1159/000329374

10. Cole TJ. The evidential value of developmental age imaging for assessing age of majority. Ann Hum Biol. (2015) 42:379–88. doi: 10.3109/03014460.2015.1031826

11. De Sanctis V, Di Maio S, Soliman AT, Raiola G, Elalaily R, and Millimaggi G. Hand X-ray in pediatric endocrinology: Skeletal age assessment and beyond. Indian J Endocrinol Metab. (2014) 18:S63–71. doi: 10.4103/2230-8210.145076

12. Martin DD, Schittenhelm J, and Thodberg HH. Validation of adult height prediction based on automated bone age determination in the Paris Longitudinal Study of healthy children. Pediatr Radiol. (2016) 46:263–9. doi: 10.1007/s00247-015-3468-8

13. Lee SH, Modi HN, Song HR, Hazra S, Suh SW, and Modi C. Deceleration in maturation of bone during adolescent age in achondroplasia–a retrospective study using RUS scoring system. Skeletal Radiol. (2009) 38:165–70. doi: 10.1007/s00256-008-0544-2

14. Salih RM and Basheer NM. Pediatric radiology: An analysis of AI-powered bone age determination methods. NTU J Eng Technol. (2025) 4:24–35. doi: 10.56286/ntujet.v4i1

15. Thodberg HH, Neuhof J, Ranke MB, Jenni OG, and Martin DD. Validation of bone age methods by their ability to predict adult height. Horm Res Paediatr. (2010) 74:15–22. doi: 10.1159/000313592

16. Martin DD, Neuhof J, Jenni OG, Ranke MB, and Thodberg HH. Automatic determination of left- and right-hand bone age in the First Zurich Longitudinal Study. Horm Res Paediatr. (2010) 74:50–5. doi: 10.1159/000313369

17. Tristan-Vega A and Arribas JI. A radius and ulna TW3 bone age assessment system. IEEE Trans BioMed Eng. (2008) 55:1463–76. doi: 10.1109/TBME.2008.918554

18. Albanese A, Hall C, and Stanhope R. The use of a computerized method of bone age assessment in clinical practice. Horm Res. (1995) 44 Suppl 3:2–7. doi: 10.1159/000184665

19. Greulich WW and Pyle SI. Radiographic Atlas of Skeletal Development of the Hand and Wrist. 2nd ed. Stanford, CA: Stanford University Press (1959).

20. Tanner JM, Healy MJ, Goldstein H, and Cameron N. Assessment of Skeletal Maturity and Prediction of Adult Height (TW3 Method). 3rd ed. London: Saunders (2001).

21. Halabi SS, Prevedello LM, Kalpathy-Cramer J, Mamonov AB, Bilbily A, Cicero M, et al. The RSNA pediatric bone age machine learning challenge. Radiology. (2019) 290:498–503. doi: 10.1148/radiol.2018180736

22. Ruiz-Arana IL, Lechanteur V, Busiah K, Bouthors T, Antoniou MC, Stoppa-Vaucher S, et al. Comparison of BoneXpert and IB-lab-PANDA automated bone age evaluation in children with growth and puberty disorders. J Endocr Soc. (2025) 9:bvaf122. doi: 10.1210/jendso/bvaf122

23. Rassmann S, Abashishvili L, Melikidze E, Sukhiashvili A, Lartsuliani M, Chkhaidze I, et al. Population-specific calibration and validation of an open-source bone age AI. Sci Rep. (2025) 15:32673. doi: 10.1038/s41598-025-20148-w

24. Ren X, Li T, Yang X, Wang S, Ahmad S, Xiang L, et al. Regression convolutional neural network for automated pediatric bone age assessment from hand radiograph. IEEE J BioMed Health Inform. (2019) 23:2030–8. doi: 10.1109/JBHI.2018.2876916

25. Mutasa S, Chang PD, Ruzal-Shapiro C, and Ayyala R. MABAL: a novel deep-learning architecture for machine-assisted bone age labeling. J Digit Imaging. (2018) 31:513–9. doi: 10.1007/s10278-018-0053-3

26. Koitka S, Kim MS, Qu M, Fischer A, Friedrich CM, and Nensa F. Mimicking the radiologists’ workflow: Estimating pediatric hand bone age with stacked deep neural networks. Med Image Anal. (2020) 64:101743. doi: 10.1016/j.media.2020.101743

27. Spampinato C, Palazzo S, Giordano D, Aldinucci M, and Leonardi R. Deep learning for automated skeletal bone age assessment in X-ray images. Med Image Anal. (2017) 36:41–51. doi: 10.1016/j.media.2016.10.010

28. Larson DB, Chen MC, Lungren MP, Halabi SS, Stence NV, and Langlotz CP. Performance of a deep-learning neural network model in assessing skeletal maturity on pediatric hand radiographs. Radiology. (2018) 287:313–22. doi: 10.1148/radiol.2017170236

29. Maratova K, Zemkova D, Sedlak P, Pavlikova M, Amaratunga D, Krasnicanova H, et al. A comprehensive validation study of the latest version of BoneXpert on a large cohort of Caucasian children and adolescents. Front Endocrinol (Lausanne). (2023) 14:1130580. doi: 10.3389/fendo.2023.1130580

30. Martín Pérez IMM, Bourhim S, and Martín Pérez SE. Artificial intelligence-based models for automated bone age assessment from posteroanterior wrist X-rays: A systematic review. Appl Sci. (2025) 15:5978. doi: 10.3390/app15115978

31. Nilsson O. Aggrecanopathies highlight the need for genetic evaluation of ISS children. Eur J Endocrinol. (2020) 183:C9–C10. doi: 10.1530/EJE-20-0420

32. Unger S, Ferreira CR, Mortier GR, Ali H, Bertola D, Calder A, et al. Nosology of genetic skeletal disorders: 2023 revision. Am J Med Genet A. (2023) 191:1164–209. doi: 10.1002/ajmg.a.63132

33. Handa A, Grigelioniene G, and Nishimura G. Skeletal dysplasia families: A stepwise approach to diagnosis. Radiographics. (2023) 43:e220067. doi: 10.1148/rg.220067

34. Rassmann S, Keller A, Skaf K, Hustinx A, Gausche R, Ibarra-Arrelano MA, et al. Deeplasia: deep learning for bone age assessment validated on skeletal dysplasias. Pediatr Radiol. (2024) 54:82–95. doi: 10.1007/s00247-023-05789-1

35. Mantovani G, Bastepe M, Monk D, de Sanctis L, Thiele S, Usardi A, et al. Diagnosis and management of pseudohypoparathyroidism and related disorders: first international Consensus Statement. Nat Rev Endocrinol. (2018) 14:476–500. doi: 10.1038/s41574-018-0042-0

36. Seki A, Jinno T, Suzuki E, Takayama S, Ogata T, and Fukami M. Skeletal deformity associated with SHOX deficiency. Clin Pediatr Endocrinol. (2014) 23:65–72. doi: 10.1297/cpe.23.65

37. Papadopoulou A and Bountouvi E. Skeletal defects and bone metabolism in Noonan, Costello and cardio-facio-cutaneous syndromes. Front Endocrinol (Lausanne). (2023) 14:1231828. doi: 10.3389/fendo.2023.1231828

38. Srilanchakon K, Supornsilchai V, Wacharasindhu S, and Savage MO. Precocious puberty: a comprehensive review of diagnosis and clinical presentation, etiology, and treatment. Asian BioMed (Res Rev News). (2025) 19:69–77. doi: 10.2478/abm-2025-0009

39. Marstrand-Joergensen MR, Jensen RB, Aksglaede L, Duno M, and Juul A. Prevalence of SHOX haploinsufficiency among short statured children. Pediatr Res. (2017) 81:335–41. doi: 10.1038/pr.2016.233

40. Shapiro S, Klein GW, Klein ML, Wallach EJ, Fen Y, Godbold JH, et al. SHOX gene variants: growth hormone/insulin-like growth factor-1 status and response to growth hormone treatment. Horm Res Paediatr. (2015) 83:26–35. doi: 10.1159/000365507

41. Barbero AIS, Valenzuela I, Fernández-Alvarez P, et al. New insights into the spectrum of RASopathies: clinical and genetic data in a cohort of 121 Spanish patients. Am J Med Genet A. (2024) 197:e63905. doi: 10.1002/ajmg.a.63905

42. Wakeling EL, Brioude F, Lokulo-Sodipe O, O’Connell SM, Salem J, Bliek J, et al. Diagnosis and management of Silver-Russell syndrome: first international consensus statement. Nat Rev Endocrinol. (2017) 13:105–24. doi: 10.1038/nrendo.2016.138

43. Blum WF, Ross JL, Zimmermann AG, Quigley CA, Child CJ, Kalifa G, et al. GH treatment to final height produces similar height gains in patients with SHOX deficiency and Turner syndrome: results of a multicenter trial. J Clin Endocrinol Metab. (2013) 98:E1383–92. doi: 10.1210/jc.2013-1222

44. Troger T, Sommer G, Lang-Muritano M, Konrad D, Kuhlmann B, Zumsteg U, et al. Characteristics of growth in children with classic congenital adrenal hyperplasia due to 21-hydroxylase deficiency during adrenarche and beyond. J Clin Endocrinol Metab. (2022) 107:e487–99. doi: 10.1210/clinem/dgab701

45. Wagner V and Northrup H. Mucopolysaccharidosis Type III. Seattle (WA): University of Washington (2019).

46. Ayodele O, Muller K, Setayeshgar S, Alexanderian D, and Yee KS. Clinical characteristics and healthcare resource utilization for patients with mucopolysaccharidosis II (MPS II) in the United States: A retrospective chart review. J Health Econ Outcomes Res. (2022) 9:117–27. doi: 10.36469/jheor.2022.33801

47. Martin R, Beck M, Eng C, Giugliani R, Harmatz P, Muñoz Rojas MV, et al. Recognition and diagnosis of mucopolysaccharidosis II (Hunter syndrome). Pediatrics. (2008) 121:e377–86. doi: 10.1542/peds.2007-1350

48. DD S, Chandola S, Jain A, Gupta N, Kabra M, and Jana M. Hand radiographs in skeletal dysplasia: A pictorial review. Indian J Radiol Imaging. (2024) 34:291–308. doi: 10.1055/s-0043-1777320

49. Lamichhane S, Sapkota A, Sapkota S, Adhikari N, Aryal S, and Adhikari P. Mucopolysaccharidosis type I Hurler-Scheie syndrome: a case report. Ann Med Surg (Lond). (2024) 86:588–93. doi: 10.1097/MS9.0000000000001557

50. Donsante S, Pievani A, Palmisano B, Finamore M, Fazio G, Corsi A, et al. Modeling skeletal dysplasia in Hurler syndrome using patient-derived bone marrow osteoprogenitor cells. JCI Insight. (2024) 9:e173449. doi: 10.1172/jci.insight.173449

51. Hwang-Wong E, Amar G, Das N, Zhang X, Aaron N, Gale K, et al. Skeletal phenotype amelioration in mucopolysaccharidosis VI requires intervention at the earliest stages of postnatal development. JCI Insight. (2023) 8:e171312. doi: 10.1172/jci.insight.171312

52. Andreou T, Ishikawa-Learmonth Y, and Bigger BW. Phenotypic characterisation of the Mucopolysaccharidosis Type I (MPSI) Idua-W392X mouse model reveals increased anxiety-related traits in female mice. Mol Genet Metab. (2023) 139:107651. doi: 10.1016/j.ymgme.2023.107651

Keywords: bone age, rare growth disorders, artificial intelligence, pediatric radiology, Deeplasia, Greulich and Pyle method

Citation: Skaf K, Fardipour M, Schmidt P, Bolmer E, Keller A, Lampe C, Jurgens J, Lindschau M, Palm K, Ruckdeschel S, Javanmardi B and Mohnike K (2026) Automated bone age assessment in rare pediatric growth disorders: a comparative study using Deeplasia. Front. Endocrinol. 17:1741927. doi: 10.3389/fendo.2026.1741927

Received: 07 November 2025; Revised: 15 December 2025; Accepted: 09 January 2026;
Published: 06 February 2026.

Edited by:

Sally Radovick, Rutgers, The State University of New Jersey, United States

Reviewed by:

Oscar Brunetto, Hospital Pedro de Elizalde, Argentina
Hsueh-Kuan Lu, National Taiwan University of Sport, Taiwan

Copyright © 2026 Skaf, Fardipour, Schmidt, Bolmer, Keller, Lampe, Jurgens, Lindschau, Palm, Ruckdeschel, Javanmardi and Mohnike. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Klaus Mohnike, Klaus.mohnike@med.ovgu.de

†These authors share first authorship
