HoNOSCA-D As a Measure of the Severity of Diagnosed Mental Disorders in Children and Adolescents—Psychometric Properties of the German Translation

The Health of the Nation Outcome Scales for Children and Adolescents (HoNOSCA), in use worldwide, is a 13-item measure assessing the biopsychosocial severity of mental health problems in children and adolescents. This article introduces the authorized German-language version of HoNOSCA, the HoNOSCA-D, and examines and discusses its psychometric properties based on a clinical sample of 1,533 children and adolescents aged 4;0 to 17;11 years. For the HoNOSCA-D total score (severity of mental health problems), internal consistency (Cronbach’s alpha) was 0.63. The discriminative power of the items ranged from 0.07 to 0.44; the average interitem correlation was 0.11. Due to this stochastic independence, calculation of a total severity index is acceptable. Factor analysis with principal axis factoring and varimax rotation yielded a four-factor structure that explained 30.62% of total variance (Kaiser–Meyer–Olkin measure of sampling adequacy = 0.684). The convergent correlations with the German-language parent report version of the Strengths and Difficulties Questionnaire were as expected and showed a medium effect size. Gender and age differences in the HoNOSCA-D total score were small. For the 13 individual items, gender and age differences were negligible to medium. The highest severity was found for schizophrenia and psychotic disorders, followed by affective disorders and social behavior disorders. Overall, validity of HoNOSCA-D was clearly supported.

(HoNOS) (4,5), which were developed in the United Kingdom in the early 1990s as a clinician-rated assessment instrument for routine use in adult mental health services. A major aim was the construction of a practicable instrument for differentiated assessment of the severity of mental illness that can be used as a routine clinical outcome measure.
Based on a review of the literature by Hunter et al. (6), HoNOSCA was developed taking account of specific needs in child and adolescent mental health, which resulted in an increase in the number of items (scales) from 12 to 13. Each of these biopsychosocial indicators flags mental health problems and can determine whether the young person will be referred to psychiatric care. Like HoNOS, HoNOSCA is a clinician-rated instrument for assessing the severity of mental illness. Hunter et al. (6) emphasized that this kind of graduated rating of severity complements a psychiatric diagnosis and is of central importance with regard to outcome and measurement of the course of illness. The 13 items, or scales, assess different areas of symptoms and functioning: Behavior, Impairments, Symptoms, and Social. Two additional items concern parents' lack of understanding or lack of information about difficulties and about therapy, services, and other help and arrangements offered in connection with the patient's mental health problems. Ratings on these two items are not a part of the total score calculated for items 1-13.

HoNOSCA Goodness Criteria
Regarding interrater reliability, previous studies found that the total score for the 13 items shows good-to-very good agreement, with intraclass correlations from 0.63 to 0.98 (3) [see also Ref. (7)]. Noteworthy is a study by Hanssen-Bauer et al. (8), which reported an intraclass correlation of raters in five countries of 0.84. For retest reliability, Harnett et al. (9) found, with a time interval of 2-3 weeks, a correlation between the total scores of r = 0.80. But there are also less favorable findings: Brann et al. (10) found a very low intraclass correlation (ICC = 0.06) for one item and only moderate intraclass correlations (ICC = 0.43) for three items. This discrepancy underlines the importance of careful rater training.
Factor analyses of the four areas of functioning suggested by Gowers et al. (3,9) yielded less clear-cut results, and no convincing evidence for this structure has been found since.
Regarding criterion-related validity of HoNOSCA, Gowers et al. (3) tested whether HoNOSCA could differentiate between different patient groups. They found evidence of this: for clients in inpatient psychiatric treatment, who made up approximately 12% of the sample of N = 1,143, the HoNOSCA total score was M = 15.51 (SD = 7.19), whereas for clients in outpatient treatment, the total score was M = 11.18 (SD = 5.30). This difference was statistically significant [t(1,141) = −8.03, p < 0.001], and the effect size was nearly medium. As to age differences in HoNOSCA total scores, in a child and adolescent mental health service sample from the Melbourne area, Brann et al. (10) found significant differences among three age groups (under 5 years: n = 34; 5-12 years: n = 142; over 12 years: n = 129) with a medium effect size (η² = 0.095). Post hoc, this significance with a medium effect size (d = 0.70) was ascribable to the difference between the 13- to 20-year age range (M = 15.21, SD = 6.66, n = 113) and the 5- to 12-year age range (M = 11.11, SD = 5.15, n = 125). In contrast, Harnett et al. (9) found no age differences.
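As a rough plausibility check on the effect sizes quoted above, the following Python sketch converts the reported t statistic into an r-type effect size and recomputes Cohen's d from the reported group means and standard deviations. The function names are illustrative, and the use of the pooled-standard-deviation form of d is an assumption, since the exact formulas used in the cited studies are not stated here.

```python
import math


def r_from_t(t: float, df: int) -> float:
    """Convert a t statistic to an r-type effect size: r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))


def cohens_d(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Inpatient vs. outpatient comparison: t = -8.03 with 1,141 degrees of freedom.
# Only the magnitude matters for r, so the sign is dropped.
r_setting = r_from_t(8.03, 1141)

# Age comparison: 13- to 20-year group vs. 5- to 12-year group.
d_age = cohens_d(15.21, 6.66, 113, 11.11, 5.15, 125)

print(f"r = {r_setting:.2f}, d = {d_age:.2f}")
```

Under these assumptions, r comes out at about 0.23, somewhat below the conventional medium threshold of 0.3 and thus in line with "nearly medium", and d comes out at about 0.69, close to the reported 0.70.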
At the level of individual HoNOSCA items, Kisely et al. (18) found that children younger than age 12 years had significantly lower scores than older children, with medium effect size, on the items non-accidental self injury and alcohol, substance/solvent misuse.
For children and adolescents aged 0-20 years (N = 173), Bilenberg (11) found that in patients with comorbidity (n1 = 77), the HoNOSCA total score was significantly higher both at baseline and at follow-up (3 months later or at the end of therapy) than in comparable patients without comorbidity (n2 = 96).
In studies with longitudinal designs, Kisely et al. (18) found significant changes over time in HoNOSCA total scores for both outpatients and inpatients. For inpatients (n1 = 30) the effect size was very large (d = 1.34; with an average measurement interval of 24.3 days); for outpatients (n2 = 123) the effect size was small (d = 0.32; with an average measurement interval of 45.5 days). Harnett et al. (9) reported similar results. Quantified in raw scores, Gowers et al. (3) found a 38% reduction in the total score; this was after an average treatment duration of 3 months and in a sample of several hundred persons. Where clinicians reported definite improvement of the mental health problem, an average change in the HoNOSCA total score of Md = 7.70 (SDd = 5.40) was found. Where clinicians reported no change, the average change was only Md = 1.60 (SDd = 4.99), and where clinicians diagnosed a worsening of the mental health problem, the average change in the HoNOSCA total score was found to be Md = −1.00 (SDd = 4.53).
Bilenberg (11) reported on the usefulness of HoNOSCA in the context of a Danish field trial; about 80% of the clinicians surveyed found HoNOSCA to be clinically feasible and easy and fast to use. Gowers

Aims of This Study
The aim of this study was to report initial results on item characteristics and validity of the German-language version of HoNOSCA, the HoNOSCA-D, based on the following analyses:
- Item and scale analyses, homogeneity, internal consistency.
- Correlations with the parent report version of the SDQ (17,19).
- Exploratory factor analysis (principal axis factoring and varimax rotation).
- Group differences for gender and age.
The first three analyses above concern the internal consistency and construct validity of HoNOSCA-D, and the last two pertain to criterion-related validity.

Materials and Methods

German Translation of HoNOSCA
The Health of the Nation Outcome Scales for Children and Adolescents, consisting of a score sheet and user guide, was translated by the first author of this article using the World Health Organization translation guidelines. HoNOSCA-D was then practice-tested at two offices of the child and adolescent psychiatric services of the Canton of St. Gallen (KJPD SG) in Switzerland for a period of 6 months. Clinicians' ideas and suggestions for changes were recorded and then reviewed by a group of clinical experts, and some of them were incorporated in a revised version. The revised version of HoNOSCA-D was then back-translated by a native speaker of English and sent to the authors of HoNOSCA (Simon Gowers) for authorization.

Setting and Design
The study was approved by the Eidgenössische Expertenkommission für das Berufsgeheimnis in der medizinischen Forschung (Swiss federal expert committee for professional confidentiality in medical research), which was also responsible for ethical approval. The data were collected between May 2010 and April 2012 in the context of a naturalistic field study at the KJPD SG. The KJPD SG is responsible for basic psychiatric services in a region with a population of approximately 500,000 persons, about one-fifth of whom are children and adolescents. The scales were part of a systematic outcome measurement and were used for both outpatients and inpatients aged 4;0 to 17;11 years. For all new patients, the case manager (psychiatrist, psychologist, or social worker) had to fill out HoNOSCA-D and decide on the diagnosis within the first two in-depth consultations. The ICD-10 diagnosis was then assigned in agreement with the senior physician. The parents of all new patients received the SDQ questionnaire, which they usually filled out on-site immediately before the first meeting. This survey is part of the usual intake procedure and is related to the diagnostic process and quality assurance. Parents were usually informed that, if they did not want to fill out an anonymized SDQ and thereby contribute to quality assurance, they had to actively opt out of the survey.
The items cover four different areas of symptoms and functioning: Behavior (items 1-4), Impairments (items 5-6), Symptoms (items 7-9), and Social (items 10-13). They are rated by the clinician on a 5-point scale of severity (0 = no problem, 1 = minor problem requiring no action, 2 = mild problem but definitely present, 3 = moderately severe problem, and 4 = severe-to-very severe problem). For each item the rating format is complemented by item-specific criteria for including and not including certain problems and behaviors in the rating. For each item the most severe known problem has to be rated, and the rater may use all available information (such as information from the referring institution, from teachers, etc.) for forming a judgment. If the rater does not know the severity of a particular item, or if the item is not applicable, the rater rates the item "9," and it is not used further. The HoNOSCA-D total score represents the summed severity of individual items 1-13. For the entire procedure (conducting, rating, and interpretation), the clinicians in this study had access to a comprehensive manual, received 1 day of training, and attended a refresher of 1-2 h every 3 months with the opportunity to ask questions. Each of these units was held by the first author of this article.
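The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring software: the function name and the example ratings are hypothetical, and the sketch assumes that an item rated 9 simply contributes nothing to the total (no prorating or imputation), which is one reading of "not used further".

```python
from typing import Sequence

NOT_KNOWN = 9                     # rating for "not known / not applicable"
VALID_SEVERITIES = (0, 1, 2, 3, 4)


def honosca_total(ratings: Sequence[int]) -> int:
    """Sum the severity ratings of items 1-13, skipping items rated 9.

    Assumption: items rated 9 are left out of the sum without any
    prorating of the remaining items.
    """
    if len(ratings) != 13:
        raise ValueError("expected ratings for exactly 13 items")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if rating == NOT_KNOWN:
            continue  # item is not used further
        if rating not in VALID_SEVERITIES:
            raise ValueError(f"item {item_number}: invalid rating {rating}")
        total += rating
    return total


# Hypothetical ratings for items 1-13; item 8 is rated 9 and therefore skipped.
example_ratings = [2, 3, 0, 0, 1, 0, 0, 9, 3, 2, 1, 2, 0]
print(honosca_total(example_ratings))  # 14
```

With 13 items rated 0 to 4, a total computed this way can range from 0 to 52.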

Data analyses
To evaluate the item and scale characteristics of HoNOSCA-D, after analysis of missing values, descriptive characteristic values were calculated and item and scale analyses carried out. The average interitem correlation (homogeneity) was calculated using Fisher's z transformation. For construct validity, we computed Pearson correlations with the parent SDQ. We also carried out a principal axis factor analysis with varimax rotation. For criterion-related validity, we used a 2 × 3 ANOVA. To minimize type I errors due to alpha inflation, we used Bonferroni correction of the level of significance. As measures of effect size (23), we used r for the correlations (small: r = 0.1; medium: r = 0.3; large: r = 0.5) and η² for the ANOVA (small: η² = 0.01; medium: η² = 0.06; large: η² = 0.14).
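Three of the steps named above (averaging correlations via Fisher's z, Bonferroni correction, and labeling r by the stated thresholds) can be illustrated with a short Python sketch. The function names, the example correlations, and the number of tests passed to the Bonferroni helper are hypothetical and are not taken from the study data; the "negligible" label for values below 0.1 is likewise an assumption.

```python
import math
from typing import Sequence


def mean_correlation_fisher_z(correlations: Sequence[float]) -> float:
    """Average correlations in Fisher z space, then transform back to r."""
    z_values = [math.atanh(r) for r in correlations]  # r -> z
    return math.tanh(sum(z_values) / len(z_values))   # mean z -> r


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def label_r(r: float) -> str:
    """Label a correlation using the thresholds 0.1, 0.3, and 0.5."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"


# Hypothetical inter-item correlations, for illustration only.
example_correlations = [0.05, 0.10, 0.35]
mean_r = mean_correlation_fisher_z(example_correlations)

# Hypothetical family of 13 tests (for example, one test per item).
adjusted_alpha = bonferroni_alpha(0.05, 13)

print(f"mean r = {mean_r:.3f} ({label_r(mean_r)}), adjusted alpha = {adjusted_alpha:.4f}")
```

Averaging in z space rather than averaging the raw correlations avoids the bias that arises because the sampling distribution of r is skewed; for small correlations such as those in this example, the two averages are nearly identical.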

Results

Sample
Table 1 shows the composition of the sample. The total HoNOSCA-D sample of N = 1,553 includes a paired subsample of n = 1,408 for which the parent SDQ is also available. The difference is due to missing data.

Item and Scale Analyses, Homogeneity
Concerning difficulty index P (Table 2), there was a wide range, starting with items with low difficulty index values: hallucinations, delusions, or abnormal perceptions; alcohol, substance/solvent misuse; and non-accidental self injury. It is noteworthy that for every item the entire range of five points was used; there were no missing values. The HoNOSCA-D total score was positively skewed (skewness = 0.82, kurtosis = 0.84); its difficulty index value was low, at 0.24. Discrimination of 7 of the 13 items was below 0.30. Internal consistency (Cronbach's α) of the total score was 0.63.
In all, only 6 of 78 intercorrelations had at least a medium effect size of more than 0.3; the other effect sizes were small. After Fisher's z transformation, the homogeneity of the scale was 0.11.
Of the correlations with the parent SDQ that had at least a medium effect size, noteworthy is the HoNOSCA-D item peer relationships, which was correlated with SDQ peer relationship problems, followed by disruptive, antisocial or aggressive behavior, which was correlated with SDQ conduct problems and hyperactivity and the SDQ total difficulties score. The HoNOSCA-D item overactivity, attention or concentration was correlated with SDQ hyperactivity, and emotional and related symptoms was correlated with SDQ emotional problems. The HoNOSCA-D total score was moderately correlated with the SDQ total difficulties score.

Exploratory Factor Analysis (Principal Axis Factor)
Principal axis factoring and varimax rotation (Table 2), with a mediocre Kaiser-Meyer-Olkin measure of sampling adequacy of 0.684, an eigenvalue criterion of 1, and an explained total variance of 30.62%, produced a four-factor solution.

HoNOSCA-D Scores and Psychiatric Diagnoses
The psychiatric diagnoses were combined in diagnostic groups, which do not consistently follow the single-digit mental disorder groups of the ICD-10, however (Table 5). The aim was to form meaningful evaluation categories in view of the HoNOSCA-D items. Table 4 shows that for each disorder there was at least one HoNOSCA-D correspondence with a severity rating of at least 2 (2 = mild problem but definitely present). Emotional and related symptoms played a leading role here; they were particularly severe for ICD-10 F50 Eating disorders and F3 Affective disorders. In second place were problems in family life and relationships. On the basis of the HoNOSCA-D total score, ICD-10 F2 Schizophrenia, schizotypal and delusional disorders, followed by F92 Mixed disorders of conduct and emotions and F3 Affective disorders, were the most severe disorders.

Group Differences in Gender and Age
As mentioned in the table note, the significance level in Table 5 is Bonferroni adjusted. Regarding age differences, item scores that differed significantly increased steadily across the three age groups, with adolescents scoring significantly highest. The exception was overactivity, attention, or concentration, on which adolescents scored significantly lower than the other two groups. The effect sizes of these main effects are negligible to small, with the exception of alcohol, substance/solvent misuse, which shows a medium effect size.

Discussion

Item and Scale Analyses
The item and scale analyses (see Table 2) showed that only 6 of 13 items had high enough discrimination for a priori justification of scale construction. The interitem correlations and homogeneity were also very low. This resulted in a relatively low internal consistency of 0.63 (Cronbach's α). Harnett et al. (9) reported an even lower internal consistency of α = 0.45. This situation is actually not sufficient to ascribe satisfactory scale characteristics to HoNOSCA-D for scale construction in the classical sense. However, the finding that the interitem correlations and the homogeneity are so low allows the construction of an index, since a high share of the items are stochastically independent. The result of this is a HoNOSCA total score in which the contributions of the individual items come into effect independently of one another. Based on 13 dimensions, various clinical features in specific domains of functioning are described. As mentioned above, Harnett et al. (9) found it diagnostically useful to capture these various psychosocial and psychiatric features in the items.
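The link between the low average interitem correlation and the modest Cronbach's α can be made concrete with a small Python sketch. It uses the standardized-alpha (Spearman-Brown type) formula α = k·r̄ / (1 + (k − 1)·r̄), which is an assumption on our part: the reported α of 0.63 was presumably computed from raw item variances and covariances, so the two values need not match exactly. The function name is illustrative.

```python
import math


def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha from the mean inter-item correlation.

    alpha = k * r / (1 + (k - 1) * r)
    """
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


n_items = 13
n_pairs = math.comb(n_items, 2)            # number of distinct item pairs
alpha = standardized_alpha(n_items, 0.11)  # reported homogeneity of 0.11

print(n_pairs, round(alpha, 2))  # 78 0.62
```

With 13 items there are 78 distinct item pairs, matching the 78 intercorrelations reported, and an average correlation of 0.11 implies a standardized α of roughly 0.62, close to the reported 0.63. In other words, the α value is about what one would expect from 13 nearly independent indicators.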

Validity
Correlations with the Parent SDQ
As mentioned above in the introduction, there are high correlations between HoNOSCA and other outcome measures, such as GPAD (12), CGAS (14), and PCS (13). The HoNOSCA total score (that is, the complexity of the mental health problem) was also found to be correlated with therapeutic progress as assessed by therapists (9). For parent ratings, using the parent SDQ, Yates et al. (13) found a medium correlation of r = 0.40. This value is only negligibly higher than the r = 0.32 (Table 3) reported in this study; both correlations are in the range of medium effect size. The correlational analysis of the 13 HoNOSCA-D items with the parent SDQ dimensions also revealed various associations that confirm that HoNOSCA-D has intact convergent validity, insofar as four dimensions of the parent SDQ, including the total difficulties score, were moderately correlated with items with related content on HoNOSCA-D. Only the SDQ dimension prosocial behavior (which, however, is a strength and is not included in the SDQ total difficulties score) had a significant negative correlation with the HoNOSCA-D items. Overall, this means that between the parent SDQ and HoNOSCA-D there is an expected but not very marked correspondence in the areas to which they apply: the HoNOSCA-D provides a perspective on 13 broad problem indicators, whereas the parent SDQ is meant for analytically distinguishing 4 to 5 latent dimensions.

Factor Analysis
Although we found a four-factor solution, our factor analysis did not confirm the areas of functioning postulated a priori by Gowers et al. (3) (Table 2); however, Gowers et al. themselves were not able to confirm that structure through factor analysis.

Psychiatric Diagnoses
Next, we evaluated the average HoNOSCA-D scores for some psychiatric diagnoses (11, 24). Bilenberg (11) found a substantial correlation between clinical severity (captured through diagnosis) and mean HoNOSCA scores. In this study, we also found that correlation, which confirms the validity of the rating.

Gender and Age Differences
This study found small gender differences. Similar to Hanssen-Bauer et al. (8), boys had significantly higher HoNOSCA-D scores than girls on the two externalizing scales disruptive, antisocial, or aggressive behavior and overactivity, attention, or concentration. However, unlike Hanssen-Bauer et al. (8), in our sample the difference in scholastic or language skills was not significant. Although not significant, there was a tendency for girls to show higher scores on internalizing scales like non-organic somatic symptoms and emotional and related symptoms. Again, this is similar to Hanssen-Bauer et al. (8), who found higher scores for emotional symptoms and non-accidental self-injury for girls. In our sample, girls in the oldest age group had significantly higher scores for self-injury than boys. A look at HoNOSCA-D scores across the three age groups (Table 5) reveals that adolescents have significantly and noticeably higher scores than the two younger age groups on six scales: self-injury, drug or alcohol misuse, abnormal thoughts or perceptions, emotional symptoms, family problems, and poor school attendance. The highest scores in adolescents for the scales self-injury, drug or alcohol misuse, abnormal thoughts or perceptions, family problems, and poor school attendance are understandable. Hanssen-Bauer et al. (8) found three of them as well: self-injury, drug or alcohol misuse, and poor school attendance. However, the highest scores for adolescents are less obvious for emotional symptoms. The middle age group (6;0-11;11) showed more difficulties with overactivity and concentration than the adolescents.
If we take into account the factors that can be responsible for the differences found for gender and age, it becomes difficult to use these variables as correlates for criterion-related validity. Harnett et al. (9) named various aspects that can cause the discrepancies, such as: different access to alcohol and drugs, practices in admittance to inpatient or outpatient treatment settings, sociodemographic characteristics of the population receiving mental health services, the available age-, gender-, and problem-specific mental health services, at what time point in the progression of the disorder the HoNOSCA was completed, etc.

Other Psychometric Characteristics
As mentioned in the Introduction section, clinicians emphasize the usefulness of HoNOSCA and its economical application (11). Also in this study HoNOSCA could be used across all disorders; clinicians saw no restrictions. Gowers et al. (3) reported a mean time to complete the scale of 8.5 min, with a range up to 18 min. Yates et al. (13) found that it took clinicians familiar with the instrument about 5 min to fill out HoNOSCA.

Interrater Reliability
The results on intraclass correlations of HoNOSCA at the level of the individual items reported in the introduction above (3) made it clear that high interrater agreement is achievable in principle. It should be borne in mind that the frequent changes in resident and intern physicians and psychologists reduce the reliability of clinical observer ratings. Therefore, training must be provided frequently, and appropriate manuals must be made available.

Conclusion
In sum, this study supports the validity of HoNOSCA-D. Beyond that, this study found no inconsistencies that would call into question already established qualities of HoNOSCA. Again, no support was found for internal consistency of HoNOSCA-D, which is not a new finding (3,6,9); the HoNOSCA total score follows a different logic than the one expressed by Cronbach's α, namely index construction using independent indicators. Because of the low covariance between the HoNOSCA-D items, the scale can cover a large area of application, which otherwise would only be possible through an extensive battery of diagnostic tests.

Ethics Statement
Ethics committee: Eidgenössische Expertenkommission für das Berufsgeheimnis in der medizinischen Forschung (Swiss federal expert committee for professional confidentiality in medical research). All incoming patients were informed about the survey and could decline to take part; to do so, they had to actively withdraw from the survey.

Author Contributions
AW: theoretical backgrounds, methods, statistical analyses, reporting, and implementing these into the article. Project leader ST: statistical analyses, reporting, and implementing these into the article. RZ: critical review and correction/upgrade of the article.