Criteria for First-Year Growth Response to Growth Hormone Treatment in Prepubertal Children With Growth Hormone Deficiency: Do They Predict Poor Adult Height Outcome?

Objective: Several criteria for first-year growth response (FYGR) to growth hormone (GH) treatment have been proposed. We explored which FYGR criteria best predicted the final height outcome after GH treatment in prepubertal children with GH deficiency (GHD). Design and methods: Height data of 129 GHD children (83 boys) who attained adult height and had been treated with GH for at least 4 consecutive years, with at least 1 year before pubertal onset, were retrieved from the Belgian GH Registry. The FYGR parameters were: (1) increase in height (ΔHt) SDS, (2) height velocity (HV) SDS, (3) ΔHV (cm/year), (4) index of responsiveness (IoR) in KIGS prediction models, (5) first-year HV SDS based on the KIGS expected HV curve (HV KIGS SDS), (6) near final adult height (nFAH) prediction after first-year GH treatment. Poor final height outcome (PFHO) criteria were: (1) total ΔHt SDS <1.0, (2) nFAH SDS <−2.0, (3) nFAH minus midparental height SDS <−1.3. ROC curve analyses were performed to define the optimal cut-off for FYGR parameters to predict PFHO. Only ROC curves with an area under the curve (AUC) of more than 70% were further analyzed. Results: Twelve, 22, and 10% of the children had a total ΔHt SDS <1, nFAH SDS <−2, and nFAH minus midparental height SDS <−1.3, respectively. The AUCs ranged between 73 and 85%. The highest AUC was found for first-year ΔHt SDS to predict total ΔHt SDS <1, and for predicted nFAH SDS to predict nFAH SDS <−2. The currently used FYGR criteria had low specificities and sensitivities to detect PFHO. To obtain a 95% specificity, the cut-off values (and sensitivities) of the FYGR parameters were: ΔHt SDS <0.35 (40%), HV SDS <−0.85 (43%), ΔHV <1.3 cm/year (36%), IoR <−1.57 (17%), HV KIGS SDS <−0.83 (40%) to predict total ΔHt SDS <1; predicted nFAH SDS (with GH peak) <−1.94 (25%), predicted nFAH SDS (without GH peak) <−2.02 (25%) to predict nFAH SDS <−2. At these cut-offs, the number of correctly diagnosed poor final responders equals the number of false positives. Conclusion: First-year growth response criteria perform poorly as predictors of poor final height outcome after long-term GH treatment in prepubertal GHD children.


INTRODUCTION
Growth hormone deficiency (GHD) in children is mostly idiopathic and is treated with daily growth hormone (GH) injections for a mean duration of 4 to 11 years (1-8). GH treatment is therefore not only burdensome for patients and their families but also costly. In addition, not every child benefits from GH treatment: the poor responder rate in GHD has been found to be between 10 and 30% (9,10). It is therefore common practice to evaluate the response to GH therapy after 1 year to detect poor responders, in order to reassess the diagnosis, adapt the GH dose, or stop the treatment and so avoid unnecessary daily injections and expenses. The evaluation is usually done after 1 year of treatment because the first-year response is known to be an important determinant of the total treatment height outcome (11).
Several methods exist to evaluate this first-year response, such as increase in height (ΔHt) SDS, height velocity (HV), HV SDS on the population HV reference curve, and HV SDS on the predicted HV for idiopathic GHD curve (12,13). A parameter (index of responsiveness, IoR) has been introduced that compares the observed first-year HV to a predicted HV derived from prediction models (14,15). More recently, models have been proposed that predict the near final height outcome after the first treatment year (16). All these methods for evaluation of first-year growth response use arbitrary decision values that are not based on their ability to predict a final height outcome. Up to now, the value of these first-year growth response and responsiveness parameters as predictors of a poor final height outcome after long-term GH treatment in GHD patients has not been analyzed.
We therefore set out to determine the sensitivity and specificity of these first year growth response (FYGR) criteria at their proposed threshold levels to detect a poor final height outcome (PFHO), defined by different criteria. In addition, we performed ROC analyses to calculate the decision levels at a desired 95% specificity.

Materials
The auxological data and GH treatment characteristics of prepubertal children diagnosed with GHD, who were enrolled in the Registry of the BElgian Society for PEdiatric Endocrinology and Diabetology (BESPEED) since 1986, were retrieved. This registry was approved by the ethical committee of the Brussels University and the University Hospital Brussels in Belgium. The legal representatives of all subjects gave written informed consent to have their data registered in a national registry and to use their data for scientific purposes in accordance with the Declaration of Helsinki. All data are pseudonymised to comply with rigorous privacy guidelines. Only patients who had been treated exclusively with recombinant human GH on a daily regimen for at least 4 consecutive years, including at least 1 year before pubertal onset, and who had attained final adult height were included. GHD patients with and without developmental anatomical anomalies of the pituitary were included, but those with acquired GHD were excluded. Other exclusion criteria were any medication or medical condition other than GHD that can affect growth, interruption of GH treatment for more than 6 months, and smallness for gestational age. In total, 129 patients (83 males and 46 females) with GHD (81 with isolated GHD and 48 with multiple pituitary hormone deficiency) met the inclusion and exclusion criteria.

Methods
The diagnosis of GHD was made by the treating physician and peer-reviewed at the monthly meeting of BESPEED, according to the KIGS etiology classification system (17). All patients had a peak GH concentration of < 10 µg/l after glucagon and/or insulin stimulation. Pubertal onset was defined as testicular volumes ≥ 4 ml for boys and Tanner breast stage ≥ 2 in girls.
Variables retrieved from the registry were (a) status at birth: sex, birth weight and length; (b) father's and mother's height (Ht); (c) pre-treatment Ht when measured between 6 and 18 months before GH treatment; (d) patient variables at the start of the treatment period: chronological age, Ht, weight (Wt), body mass index (BMI), the highest peak GH concentration during a provocation test, the presence of other pituitary hormone deficiencies, and (e) treatment modality: average GH dose (µg/kg.day) during the first year of GH treatment.
Near final adult height (nFAH) was defined as the height attained when HV was less than 2 cm/year, calculated over a period of at least 9 months, and when chronological age was >17 years in boys and >15 years in girls. nFAH SDS was calculated in 2 different ways: (1) using the chronological age (CA), (2) using the growth reference data at age 21 years (A21). The FYGR parameters were: (1) increase in height (ΔHt) SDS, (2) height velocity (HV) (cm/year), (3) HV SDS, (4) ΔHV (cm/year), (5) index of responsiveness (IoR) in KIGS prediction models, (6) first-year HV SDS based on the KIGS expected HV curve (HV KIGS SDS), (7) near final adult height (nFAH) prediction after first-year GH treatment.
First-year gain in height (ΔHt) SDS and first-year HV (cm/year) were calculated as the increment in height between the start of GH therapy and a measurement taken at least 9 and at most 15 months later, and were subsequently scaled to 12 months. ΔHV (cm/year) was calculated as the HV during the first year of GH treatment minus the HV during the pretreatment year. The HV during the first year of GH treatment was expressed as an SDS both on the Flemish HV curve (20) and on the reference curve for the HV during the first year of GH treatment developed by Ranke et al. (15). Predicted HV was calculated using the KIGS prediction models for idiopathic GHD (14,15), if all parameters required for the mathematical algorithm were available. Differences between observed and predicted HVs were expressed as index of responsiveness (IoR), calculated as the observed HV minus the predicted HV, divided by the SD of the predicted HV of the child. The predicted nFAH was calculated after the first year of GH treatment, using the prediction models by Ranke et al. (16). For the prediction models, observed heights (height at start, height after first year, parental heights, and nFAH) were converted to SDS using reference data by Prader et al. (21), and the midparental height (MPH) SDS was calculated with the Cole formula: (father height SDS + mother height SDS)/1.61.
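The derived first-year parameters described above reduce to a few lines of arithmetic. The following is a minimal sketch, not the registry's actual code: the function names are hypothetical, and the predicted HV and its SD are assumed to be supplied by the KIGS prediction models, which are not reproduced here.

```python
def annualized_hv(ht_start_cm, ht_followup_cm, interval_months):
    """Height velocity (cm/year) from two measurements, scaled to 12 months.

    The text accepts follow-up measurements taken 9 to 15 months after start.
    """
    if not 9 <= interval_months <= 15:
        raise ValueError("interval must be between 9 and 15 months")
    return (ht_followup_cm - ht_start_cm) * 12.0 / interval_months


def delta_hv(first_year_hv, pretreatment_hv):
    """Delta HV: first-year HV minus HV during the pretreatment year (cm/year)."""
    return first_year_hv - pretreatment_hv


def index_of_responsiveness(observed_hv, predicted_hv, sd_predicted_hv):
    """IoR: (observed HV - predicted HV) / SD of the predicted HV.

    predicted_hv and sd_predicted_hv are assumed inputs from a prediction model.
    """
    return (observed_hv - predicted_hv) / sd_predicted_hv


def mph_sds(father_ht_sds, mother_ht_sds):
    """Midparental height SDS with the Cole formula quoted in the text."""
    return (father_ht_sds + mother_ht_sds) / 1.61
```

For instance, a child who grows from 100.0 to 107.5 cm in 9 months has an annualized HV of 10 cm/year; the same scaling applies to the first-year gain in height SDS.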
The long-term growth response to GH was evaluated by three different but complementary methods: (1) nFAH, expressed as a height SDS; (2) total ΔHt SDS, calculated as the nFAH SDS minus height SDS at start of GH treatment; (3) nFAH SDS minus MPH SDS, an index of achieving genetic height potential.
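As an illustration, the three outcome measures can be combined with the poor final height outcome thresholds stated in the abstract (total ΔHt SDS <1.0, nFAH SDS <−2.0, nFAH minus MPH SDS <−1.3). This is a sketch only; the function name and the returned keys are hypothetical.

```python
def poor_final_height_outcome(nfah_sds, start_ht_sds, mph_sds):
    """Flag each of the three poor final height outcome (PFHO) criteria.

    Thresholds are those given in the abstract; all inputs are SDS values.
    """
    total_delta_ht_sds = nfah_sds - start_ht_sds   # gain from start to nFAH
    nfah_minus_mph_sds = nfah_sds - mph_sds        # distance to genetic target
    return {
        "total_delta_ht_sds_below_1": total_delta_ht_sds < 1.0,
        "nfah_sds_below_minus_2": nfah_sds < -2.0,
        "nfah_minus_mph_sds_below_minus_1_3": nfah_minus_mph_sds < -1.3,
    }
```

A patient can meet one criterion without meeting the others, which is why the three definitions are analyzed separately.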

Statistical Analysis
The variables are reported as the median (25-75th percentile) and mean (±SD). A Shapiro-Wilk test was used to test for normality. Differences between groups were tested with a t-test when the distribution of data was normal, and with a Mann-Whitney U-test otherwise. ROC curve analyses were performed to examine the relationship between sensitivity and specificity for the different FYGR parameters and PFHO criteria and to determine the test cut-off values that had a 95% specificity. The minimum AUC was set at 0.7. Significance was considered at the 5% level (p < 0.05). MedCalc® and IBM SPSS Statistics 25® software were used for all statistical analyses.
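The cut-off search at a fixed specificity can be sketched in plain Python. This is not the MedCalc or SPSS procedure used in the study; it is a simplified illustration that assumes a test is "positive" when the first-year parameter falls below the cut-off, considers only observed values as candidate cut-offs, and runs on invented toy data.

```python
def sensitivity_specificity(scores, is_poor, cutoff):
    """Sensitivity and specificity when 'score < cutoff' predicts a poor outcome."""
    tp = fn = tn = fp = 0
    for score, poor in zip(scores, is_poor):
        positive = score < cutoff
        if poor and positive:
            tp += 1
        elif poor:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def cutoff_at_specificity(scores, is_poor, target=0.95):
    """Largest observed cut-off whose specificity is still >= target.

    Returns (cutoff, sensitivity, specificity), or None if no cut-off qualifies.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, is_poor, cutoff)
        if spec >= target:
            best = (cutoff, sens, spec)
    return best


def auc(scores, is_poor):
    """AUC as the probability that a poor responder scores below a good one."""
    poor = [s for s, p in zip(scores, is_poor) if p]
    good = [s for s, p in zip(scores, is_poor) if not p]
    wins = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in poor for b in good)
    return wins / (len(poor) * len(good))


# Toy data (invented): 4 poor and 20 good final responders.
poor_scores = [0.1, 0.2, 0.6, 0.9]
good_scores = [0.3, 0.7, 0.8, 0.9] + [1.0 + 0.1 * i for i in range(16)]
toy_scores = poor_scores + good_scores
toy_is_poor = [True] * len(poor_scores) + [False] * len(good_scores)

toy_result = cutoff_at_specificity(toy_scores, toy_is_poor, target=0.95)
toy_auc = auc(toy_scores, toy_is_poor)
print(toy_result, toy_auc)
```

Raising the cut-off labels more children as poor first-year responders, so sensitivity rises while specificity falls; fixing specificity at 95% therefore pins down a single operating point on each ROC curve.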

Background Characteristics
The background and auxological characteristics of 129 included GHD children (83 males, 46 females) are listed in Table 1. GH therapy was initiated at a mean age of 6.8 years, a median height SDS of −3.31 and a median height minus MPH SDS of −2.34. The mean GH dose at start was 28 µg/kg.day.

First-Year Response and Responsiveness to GH Treatment (Table 1)
After the first year of GH therapy, the median ΔHt SDS was 0.99, the mean (±SD) first-year HV was 10.2 cm/year (±2.5) or 1.91 SD (±2.23), and the mean ΔHV was 5.1 cm/year (±3.3). The mean HV SDS on the first-year GH treatment response curve by Ranke et al. was 0.31 (±0.88). The mean IoR was 0.07 (±1.13) and 0.21 (±1.13) for the formula with and without the maximum GH peak, respectively. The mean predicted nFAH SDS was −0.84 (±0.87) with, and −0.85 (±0.87) without, the maximum GH peak included.
FIGURE 1 | ROC curve analyses. (A) First-year response and responsiveness parameters, with their sensitivity and specificity to predict total ΔHt SDS <1 (CA). (B) The same parameters, to predict total ΔHt SDS <1 (A21). (C) Predicted nFAH after first-year GH treatment (prediction model by Ranke et al.), with its sensitivity and specificity to predict nFAH SDS <−2 (Prader, CA). (D) The same prediction, to predict nFAH SDS <−2 (Prader, A21). Total ΔHt SDS, gain in height SDS from start of GH treatment until near final adult height; first-year ΔHt SDS, gain in height SDS after first-year GH treatment; ΔHV, HV during first-year GH treatment minus HV during pretreatment year; growth targets for first-year GH response are those by Ranke et al. CA, SDS calculated at chronological age; A21, SDS calculated at age 21 years; SDS, standard deviation score; cm, centimeter; HV, height velocity; GH, growth hormone; IoR, index of responsiveness; nFAH, near final adult height; AUC, area under the ROC curve.
The ROC curves are shown in Figures 1A-D. Only ROC curves with an AUC ≥70% were further analyzed. Tables 2A-D show the thresholds with their sensitivity and specificity of the different tests vs. the different outcomes. The thresholds for the tests currently proposed in the literature are set in bold.
Tables 2A,B show cut-off values for first-year response and responsiveness parameters, with their sensitivity and specificity to predict total ΔHt SDS <1 (CA and A21). The first-year response criterion ΔHt SDS <0.5 had a relatively low specificity (86%) to predict a total ΔHt SDS <1. The corresponding sensitivity was 60%. The other proposed first-year response and responsiveness criteria had a specificity of 67-97%, with corresponding sensitivities of 17-78%.
To predict a total ΔHt SDS <1 (CA) with a 95% specificity (in italic), the following threshold levels were found: ΔHt < 0.35 SD; HV < 6.8 cm/year; HV < −0.85 SD for age and sex; ΔHV < 1.3 cm/year; HV < −0.83 SD for first-year GH treatment by Ranke et al.; IoR (without GH peak) < […].
Tables 2C,D show cut-off values for predicted nFAH after first-year GH treatment, with their sensitivity and specificity to predict nFAH SDS <−2.0 (Prader, CA and A21). A predicted nFAH after first-year GH treatment < −1.94 SD (model with GH peak) and < −2.02 SD (model without GH peak) predicted nFAH SDS <−2 (CA) with 95% specificity and 25% sensitivity. The nFAH SDS of the good final responders who were wrongly diagnosed as poor final responders (according to the above criteria) varied between −1.98 and −1.28.
For all FYGR parameters in relation to nFAH minus MPH SDS < −1.3, the AUCs were <70%; these were therefore not analyzed further.

Comparison of the Good and the Poor Final Height Responders
The patients with a total ΔHt SDS in the highest quartile had a significantly lower height SDS at start of GH treatment than the patients in the lowest ΔHt SDS quartile (−3.78 SD vs. −3.03 SD; p < 0.001) (Table 3). They also had a significantly higher first-year ΔHt SDS (1.50 SD vs. 0.61 SD; p < 0.001). As a result, the two groups reached a comparable height SDS after the first year of GH treatment (−2.28 SD vs. −2.41 SD; p = 0.5). The total ΔHt was 3.71 SD for the good (highest quartile) and 0.98 SD for the poor (lowest quartile) total ΔHt responders. The poor total ΔHt SDS responders had a significantly lower birth weight, shorter parents, and a less severe GHD. They started GH at an older age, with a taller height and lower BMI, and received GH for a shorter period than the good total ΔHt SDS responders.
The patients in the highest nFAH SDS quartile had a significantly higher height SDS at start than the patients in the lowest nFAH SDS quartile (−3.10 SD vs. −3.88 SD; p < 0.01) (Table 3). ΔHt SDS after the first year, at onset of puberty, and at nFAH was significantly higher in the good responders. They also had taller parents and more severe GHD.

DISCUSSION
In this study of a cohort of GH-treated GHD patients extracted from the Belgian Registry, we found that the mean nFAH was still below average and that 10-22% of the patients had a poor final height outcome. ROC analysis showed that the currently used FYGR criteria had low specificities and sensitivities to detect PFHO. Our final height outcome data in Belgian patients are comparable with the results of a Swedish (2) and a Canadian (4) study, using the same criteria for nFAH, in which idiopathic GHD children were treated with a similar GH dose for a mean period of 8.6 and 5.4 years, respectively: up to 84 and 90% obtained an nFAH SDS > −2. We previously reported in a smaller group of Belgian idiopathic GHD patients a comparable nFAH (170.4 cm in males and 158 cm in females after a mean treatment duration of 5.2 years) and a similar response rate (84% had an nFAH within normal limits) (22).
Near FAH was taken as a proxy of FAH as an outcome parameter, as many patients usually stop GH treatment and disappear from follow-up when growth slows down to less than 2 cm per year and before adult height is reached (23). To overcome this problem, nFAH SDS could be calculated at a reference age of 21 years instead of the actual chronological age. This underestimates the real Ht SDS since most adolescents will still gain a few centimeters. On the other hand, since the mean height of the reference population also increases between 16 and 21 years, nFAH SDS at the actual chronological age will overestimate the real Ht SDS. We therefore calculated nFAH SDS both with age set at 21 years (worst case scenario) and at chronological age (best case scenario), accepting that the first method will underestimate and the second will overestimate the actual FAH SDS.
This ROC analysis showed that the classically proposed threshold levels for first-year growth response and responsiveness parameters had a low sensitivity and specificity to predict a poor near final height outcome. For example, first-year ΔHt SDS <0.5 had a sensitivity of 60%. This means that 60% of the poor final responders (total ΔHt SDS < 1.0) had a poor first-year response (first-year ΔHt SDS < 0.5), and 40% (100-sensitivity) of the poor final responders had a good first-year response (first-year ΔHt SDS > 0.5). The corresponding specificity was 86%, meaning that 86% of the good final responders had a good first-year response, and 14% (100-specificity) of the good final responders had a poor first-year response. Thus, first-year ΔHt SDS < 0.5 correctly identified 60% of the poor final responders, but misdiagnosed 14% of the good final responders as poor responders. To misdiagnose as few good final responders as possible (5%), we set the specificity of the FYGR parameters at 95% and determined the corresponding test cut-off values. At these newly defined threshold values, the sensitivity to detect poor final height responders decreased considerably. Of course, every physician can choose the specificity required by the local circumstances. The FYGR threshold values that best predicted total ΔHt SDS < 1 with a 95% specificity were: ΔHt SDS < 0.35, HV SDS < −0.85, HV for first-year GH treatment SDS < −0.83, and ΔHV < 1.3 cm/year. On the other hand, predicted nFAH SDS (with GH peak) < −1.94 and predicted nFAH SDS (without GH peak) < −2.02 performed best to detect nFAH < −2 SD (Prader) with a 95% specificity. These criteria correctly identify only 25-43% (=sensitivity) of the patients with a poor final outcome (= 3.8-5.2% of the total population). At a specificity of 95%, 5% of good final responders are wrongly diagnosed as poor final responders (=4.2-4.4% of the total population). At these cut-offs, the number of correctly diagnosed poor responders equals the number of false positives, because of the relatively low prevalence of poor responders.
Several parameters, such as birth weight, midparental height, age at start, maximum GH peak in the provocation test, height at start, and IoR after the first year of GH treatment, were found to differ between patients with a good or a poor final height outcome. Not surprisingly, these parameters are also used in prediction models for nFAH, such as the model by Ranke et al. (16). However, these parameters were found to explain only 60% of the variability. An incorrect diagnosis of GHD or the presence of another growth-limiting condition at the start of GH treatment, as well as several conditions during the GH course, such as GH dose adaptations during the first year, poor compliance after the first year of GH treatment, and variability in pubertal onset, pubertal growth, and bone age progression, may all explain the poor predictability of the FAH outcome in GH-treated children.
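The observation in this discussion that true and false positives are roughly equal in number follows directly from prevalence, sensitivity, and specificity. A minimal sketch, with a hypothetical function name, using the figures reported in the abstract for total ΔHt SDS <1 (prevalence 12%) and the HV SDS cut-off (sensitivity 43%, specificity 95%):

```python
def positive_test_breakdown(prevalence, sensitivity, specificity):
    """Split positive tests into true and false positives as population fractions.

    Returns (true_positive_fraction, false_positive_fraction, ppv).
    """
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv


# Figures taken from the abstract: 12% prevalence, 43% sensitivity, 95% specificity.
tp, fp, ppv = positive_test_breakdown(0.12, 0.43, 0.95)
print(f"true positives: {tp:.1%}, false positives: {fp:.1%}, PPV: {ppv:.0%}")
```

With these inputs, about 5.2% of the whole cohort are correctly flagged and about 4.4% are falsely flagged, so only slightly more than half of the children with a positive first-year test would actually have a poor final outcome.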
This is the first study evaluating the final height predictability of the currently used first-year growth response parameters, putting them in a new long-term perspective. However, this study also has several shortcomings. First, treatment adherence and the persistence of the GHD were not assessed routinely in the studied cohort. Second, the size of the cohort was rather small, despite the national recruitment of patients.
Although FYGR criteria were found to be unsuitable for detecting poor or good final responders without too many misdiagnoses, it is still important to evaluate the first-year response to GH in order to identify poor compliance, improper administration of GH, additional health problems, poor nutrition, impaired GH sensitivity due to mutations in the GH-IGF-1 axis genes, incorrect initial diagnosis, etc.
In conclusion, the currently used first-year growth response and responsiveness parameters perform poorly as predictors of a poor final height outcome after long-term GH treatment in prepubertal GHD children, due to low sensitivities and/or specificities and the low prevalence of poor responders in this group. The FYGR parameters may perform better in indications with more poor responders or when more stringent criteria for poor near final height outcome (e.g., ΔHt SDS >1.5) are used.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethical Committee of the University of Brussels and University Hospital of Brussels. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
SS, JD, MT, ST, VB, and RR contributed to the conception, design of the study and wrote sections of the manuscript. SS and MT organized the database. SS performed the statistical analysis and wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

FUNDING
This study was supported by a research grant from the BElgian Society for PEdiatric Endocrinology and Diabetology (BESPEED).