All for One and One for All? – Examining Convergent Validity and Responsiveness of the German Versions of the Tinnitus Questionnaire (TQ), Tinnitus Handicap Inventory (THI), and Tinnitus Functional Index (TFI)

Background: Measurement of tinnitus-related distress and treatment responsiveness is key in understanding, conceptualizing and addressing this often-disabling symptom. Whilst several self-report measures exist, the heterogeneity of patient populations, available translations, and treatment contexts requires ongoing psychometric replication and validation efforts.
Objective: To investigate the convergent validity and responsiveness of the German versions of the Tinnitus Questionnaire [TQ], Tinnitus Handicap Inventory [THI], and Tinnitus Functional Index [TFI] in a large German-speaking sample of patients with chronic tinnitus who completed a psychologically anchored 7-day Intensive Multimodal Treatment Programme.
Methods: Two-hundred-and-ten patients with chronic tinnitus completed all three questionnaires at baseline and post-treatment. Intraclass correlation coefficients determined the convergent validity of each questionnaire's total and subscale scores. Treatment responsiveness was investigated by [a] comparing treatment-related change in responders vs. non-responders as classified by each questionnaire's minimal clinically important difference-threshold, and [b] comparing agreement between the questionnaires' responder classifications.
Results: The total scores of all three questionnaires showed high agreement before and after therapy (TQ | THI: 0.80 [Pre], 0.83 [Post]; TQ | TFI: 0.72 [Pre], 0.78 [Post]; THI | TFI: 0.76 [Pre], 0.80 [Post]). All total scores changed significantly with treatment, yielding small effect sizes. The TQ and TFI yielded comparable (19.65 and 18.64%) and the THI higher responder rates (38.15%). The TQ | THI and TQ | TFI showed fair, and the THI | TFI moderate agreement of responder classifications. Independent of classification, responders showed significantly higher change rates than non-responders across most scores. Each questionnaire's total change score distinguished between responders and non-responders as classified by the remaining two questionnaires.
Conclusion: The total scores of all three questionnaires show high convergent validity and thus, comparability across clinical and research contexts. By contrast, subscale scores show high inconsistency. Whilst the TFI appears well suited for research purposes, the THI may be better suited to measure psychological aspects of tinnitus-related distress and their changes with accordingly focused treatment approaches.


INTRODUCTION
Subjective tinnitus is a multicausally generated symptom that denotes an auditory "phantom" perception without an external sound source. Prevalence estimates vary widely due to broad variations in study quality and construct definitions and range between 5 and 43% (McCormack et al., 2016; Biswas and Hall, 2020). The majority of people habituate to the percept (Phillips et al., 2018). However, a subset of those affected link its experience to the onset or exacerbation of psychological distress (Langguth et al., 2013), which may pose a key risk factor for symptom chronification (Wallhäusser-Franke et al., 2017) and can severely impact upon individuals' quality of life (Cima et al., 2011; Baguley et al., 2013; Ayodele et al., 2021).
In conceptualizing the interplay between tinnitus and affective symptomatology, many researchers have highlighted interdependent associations between tinnitus-related and broader psychological distress (Ahmed et al., 2017; Bhatt et al., 2017; Boecking et al., 2019). Hence, it is not surprising that psychological treatment approaches have demonstrated effectiveness across both tinnitus-specific and associated psychological domains (Hesser et al., 2011; Cima et al., 2014; Zenner et al., 2017; Landry et al., 2020). Among these treatment approaches, a 7-day Intensive Multimodal Therapy Programme has demonstrated beneficial, if small, long-term effects on tinnitus-related distress, wider emotional distress, and depressive symptoms (Seydel et al., 2010, 2015; Brueggemann et al., 2018a,b). This psychologically anchored, 7-day Intensive Multimodal Therapy Programme comprised detailed ear-nose-throat (ENT), psychosomatic and psychological diagnostics as well as psychoeducational ("counseling"), auditory, relaxation and physiotherapy-related elements whilst placing particular emphasis on cognitive-behavioral treatment components to address and alleviate emotional distress.
Amidst these psychometric evaluation efforts, replications of previous findings are essential in order to build a reliable and valid evidence base across measures, translations, patient populations and treatment approaches (Zeman et al., 2012; Fackrell et al., 2016). For example, it has been pointed out that the English and German versions of the TQ differ considerably (Fackrell et al., 2014).

Participants
The present sample consisted of N = 210 adult patients with chronic tinnitus who attended the Tinnituscentre in 2015 and provided both baseline and post-treatment data; i.e., completed the TQ, THI, and TFI on the first and last days of the therapy programme. Participants were between 18 and 77 years old (M age = 48.39 years; SD = 12.38). Forty-four percent were female. Participants were included if they were 18 years of age or older and reported experiencing chronic tinnitus for more than 3 months. Subjects were excluded if they reported significant difficulties in understanding the German language or if identifiable medical factors explained the tinnitus symptomatology. All patients signed an informed consent form agreeing for the study data to be collected and used for research purposes. The Charité Universitätsmedizin Berlin's ethics committee approved data analysis (EA4/137/20).

Tinnitus Questionnaire (German version)
The TQ (Hallam, 1996; German version: Goebel and Hiller, 1998) is a self-report measure designed to assess tinnitus-related distress. The German version consists of 52 statements that are answered on a 3-point scale (0 = not true; 1 = partly true; 2 = true). The total score sums 40 items with two items being included twice, thus yielding a score between 0 and 84. A change score of at least … points has been considered to denote reliable clinically significant improvement. In the current sample, the measure's internal consistency was excellent (α = 0.92).

Tinnitus Handicap Inventory
The THI (Newman et al., 1996, 1998; German version: Kleinjung et al., 2007) measures self-perceived tinnitus handicap severity. It consists of 25 items that are answered on a 3-point scale (0 = no; 2 = sometimes; 4 = yes) resulting in a total score between 0 and 100. It features three subscales: [1] functional (role limitations in the areas of mental, social/occupational, and physical functioning), [2] emotional (affective reactions to tinnitus), and [3] catastrophic responses (catastrophic responses to the symptoms of tinnitus; Newman et al., 1996, p. 144). A change score of at least seven points has been considered to denote reliable clinically significant improvement (Zeman et al., 2011). In the current sample, the measure's internal consistency was excellent (α = 0.93).
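The scoring rule and change threshold described above can be illustrated with a minimal Python sketch. It assumes that improvement corresponds to a decrease in the total score, so that change is computed as baseline minus post-treatment; the function and constant names are hypothetical and do not stem from the study's own analysis, which was run in SPSS.

```python
from typing import Sequence

THI_ITEM_VALUES = {0, 2, 4}  # 0 = no, 2 = sometimes, 4 = yes
THI_N_ITEMS = 25
THI_MCID = 7  # minimum improvement in points, as stated in the text


def thi_total(responses: Sequence[int]) -> int:
    """Sum 25 item responses coded 0, 2, or 4 into a 0-100 total score."""
    if len(responses) != THI_N_ITEMS:
        raise ValueError(f"expected {THI_N_ITEMS} responses, got {len(responses)}")
    if any(r not in THI_ITEM_VALUES for r in responses):
        raise ValueError("each response must be 0, 2, or 4")
    return sum(responses)


def is_responder(baseline: int, post: int, mcid: int = THI_MCID) -> bool:
    """Assumption: lower scores are better, so change = baseline - post."""
    return (baseline - post) >= mcid
```

Passing a different `mcid` would allow the same classification step to be applied to another questionnaire's threshold.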

Statistical Analysis
Where possible, statistical analyses for this paper follow the approach applied and reported by Jacquemin et al. (2019) in order to facilitate comparability of results. Importantly, however, whilst Jacquemin et al. (2019) examine the TQ and TFI's responsiveness to HD-tDCS-treatment with regard to an additionally measured external criterion (a patient-rated clinical global improvement score), such a criterion was not available in the present study. Hence, the here-reported responsiveness analyses are limited to three-way cross-comparisons.

Descriptive Analyses and Treatment-Related Changes
We examined the means and standard deviations for the tinnitus and psychological measures at baseline and post-treatment. Treatment-related change was quantified by computing dependent samples t-tests and estimating effect sizes d with 95% confidence intervals (Cohen, 1988). Here, | d | < 0.20 denotes a negligible, 0.20 ≤ | d | < 0.50 a small, 0.50 ≤ | d | < 0.80 a moderate, and | d | ≥ 0.80 a large effect size.
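As an illustration of this step, the following Python sketch computes a dependent samples t statistic and an effect size from paired scores. The source does not state which variant of d was estimated; the sketch assumes the mean difference divided by the standard deviation of the differences, and it assigns boundary values such as 0.50 to the higher band. Both choices are assumptions, as are all names used. Confidence intervals and p-values are omitted.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_change(baseline: Sequence[float], post: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, d) for baseline-minus-post differences.

    Assumption: d is the mean difference divided by the standard
    deviation of the differences; the study may have used another variant.
    """
    if len(baseline) != len(post) or len(baseline) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [b - p for b, p in zip(baseline, post)]
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t and d are undefined")
    d = statistics.fmean(diffs) / sd_diff
    t = d * math.sqrt(len(diffs))  # equals mean_diff / (sd_diff / sqrt(n))
    return t, d


def label_effect(d: float) -> str:
    """Map |d| onto the bands named in the text."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "moderate"
    return "large"
```

The t statistic would be evaluated against a t distribution with n - 1 degrees of freedom.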

Convergent Validity
Convergent and discriminant validity between the tinnitus questionnaires' total and subscale scores was examined using two-way-mixed intra-class correlation coefficients (ICC; Field, 2005). We expected high convergent validity between the tinnitus questionnaires' total scores. By contrast, expectations for the

Responsiveness
Based on the respective tinnitus questionnaires' minimal clinically significant improvement thresholds, patients were classified as responders or non-responders. To compare the questionnaires' responder classifications, we computed three sets of analyses: First, κ coefficients indexed the agreement between the different responder classifications (Altman, 1991). κ < 0.00 indicates poor, 0.00 ≤ κ ≤ 0.20 slight, 0.21 ≤ κ ≤ 0.40 fair, 0.41 ≤ κ ≤ 0.60 moderate, 0.61 ≤ κ ≤ 0.80 substantial, and κ ≥ 0.81 almost perfect agreement (Landis and Koch, 1977). Second, independent samples t-tests compared change scores between the respectively classified responder vs. non-responder subgroups across both total and subscale scores of each tinnitus questionnaire. Third, Receiver Operating Characteristic (ROC) analyses investigated whether each tinnitus questionnaire's [a] change or [b] post-treatment score effectively distinguished between responders and non-responders as classified by the two respectively remaining questionnaires. The associated "area under the curve" statistic (AUC) denotes a low (0.50 < AUC ≤ 0.70), moderate (0.70 < AUC ≤ 0.90), or high (AUC > 0.90) ability of the predictor variable to do so (Streiner and Cairney, 2007; Pintea and Moldovan, 2009). All analyses were computed using SPSS statistical software version 25 (SPSS Inc., Chicago, IL, United States).
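The first of these analyses, the κ coefficient for two binary responder classifications, can be sketched in Python as follows. This is an illustrative implementation of Cohen's κ under the assumption that each patient is coded True (responder) or False (non-responder) by each questionnaire; the function names are hypothetical. The labels follow Landis and Koch (1977), whose top band is termed "almost perfect".

```python
from typing import Sequence


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary classifications of the same patients."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty classifications of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # proportion classified as responders by measure A
    p_b = sum(b) / n  # proportion classified as responders by measure B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def label_kappa(kappa: float) -> str:
    """Map kappa onto the Landis and Koch (1977) bands."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

Because κ corrects for chance agreement, two questionnaires with very different responder rates can show high raw agreement and still yield a modest κ.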

Descriptive Analyses and Treatment-Related Changes
As a first step, the frequency distributions of the total scores of the three tinnitus questionnaires were examined at baseline (see Figure 1). Visual inspection of the associated Q-Q plots suggested that the tinnitus scores were normally distributed, and we used parametric tests for all subsequent analyses. Investigating treatment-related change, Table 1 features descriptive statistics and baseline-to-post-treatment changes for each questionnaire's total and subscale scores.
All total scores showed significant improvements with treatment. Similarly, most subscale scores changed significantly except for [TQ] "somatic complaints," and [TFI] "intrusiveness," "cognitive interference," and "auditory difficulties attributed to tinnitus." Most changes yielded small effect sizes with confidence intervals ranging from negligible ([TQ]

Convergent Validity
Intraclass correlation coefficients examined the convergent and discriminant validity between the tinnitus questionnaires' [a] total scores at baseline and post-treatment (Table 2, Panel 1) and [b] subscale scores at baseline, respectively (Panel 2).
The total scores of all tinnitus questionnaires showed moderate-to-good agreement at both baseline and post-treatment. The cognitive-emotional subscale scores of the TQ and THI showed moderate agreement. The TFI subscale scores showed poor agreement with both the TQ and THI subscale indices.
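For readers unfamiliar with the statistic, a two-way mixed intraclass correlation coefficient can be sketched in Python as below. The source does not specify whether a consistency or absolute-agreement definition, or a single- or average-measures form, was used; the sketch assumes the single-measures consistency form, often written ICC(3,1), and all names are hypothetical. Each row holds one patient's scores on the k measures being compared.

```python
from typing import Sequence


def icc_two_way_mixed(rows: Sequence[Sequence[float]]) -> float:
    """Single-measures, consistency ICC from a patients x measures table.

    Assumption: ICC(3,1) = (MSR - MSE) / (MSR + (k - 1) * MSE), where MSR is
    the between-patients mean square and MSE the residual mean square.
    """
    n = len(rows)
    k = len(rows[0]) if n else 0
    if n < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator
```

Under the consistency definition, a constant offset between two measures does not lower the coefficient, which matters when questionnaires with different score ranges are compared.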

Responsiveness
Juxtaposing total treatment-related change with that observed in the respectively specified responder subgroups, Figure 2A depicts box plots that illustrate the total scores of the TQ, THI and TFI at baseline and post-treatment (see also

Comparisons of Change Scores for Responders vs. Non-Responders According to Each Tinnitus Questionnaire's Responder Classification
Next, we compared the baseline-to-post-treatment change scores for responders vs. non-responders as classified by each questionnaire's minimal clinically important difference (MCID) thresholds (Table 3).
Cross-comparing the questionnaires' responder classifications and associated within-group change scores, responders showed higher levels of change than non-responders did on the [TQ] total and subscale scores, [THI] total and subscale scores, and [TFI] total and "control," "cognitive," "quality of life," and "emotional" subscales. On the remaining TFI subscales, responder change scores differed from non-responder change scores according to the THI's and TFI's but not the TQ's responder classifications.

ROC Analyses
ROC analyses then estimated the ability of each questionnaire's [a] baseline-to-post-treatment change and [b] post-treatment scores to distinguish between responders and non-responders as classified by the respectively remaining questionnaires (Table 4).
Results indicated that each questionnaire's total change score "moderately" distinguished between responders and non-responders as classified by the respectively remaining questionnaires. By contrast, post-treatment scores yielded only a "low" ability to do so.
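The AUC underlying these results can be read as the probability that a randomly chosen responder has a higher predictor score than a randomly chosen non-responder. The following Python sketch computes it through pairwise comparison, with ties counted as one half. It is an illustration only: it assumes that larger predictor values indicate greater improvement (for example, change scores computed as baseline minus post-treatment), and the function name is hypothetical.

```python
from typing import Sequence


def auc_pairwise(scores: Sequence[float], responder: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of the two groups.

    Assumption: higher scores should indicate responders, e.g. change
    scores computed as baseline minus post-treatment.
    """
    if len(scores) != len(responder):
        raise ValueError("scores and classifications must have equal length")
    pos = [s for s, r in zip(scores, responder) if r]
    neg = [s for s, r in zip(scores, responder) if not r]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

In a cross-comparison, `scores` would hold one questionnaire's change scores and `responder` the classification derived from another questionnaire's threshold. If post-treatment scores were used as the predictor, lower values would indicate responders, so the scores would need to be negated first.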

DISCUSSION
The present study investigated, within a single study, the convergent validity and responsiveness of [a] the German versions of [b] the TQ, THI, and TFI [c] before and after a psychologically anchored, 7-day Intensive Multimodal Therapy Programme. The questionnaires were completed by a large convenience sample of N = 210 patients with chronic tinnitus. Where possible, the present study followed the analysis outline set by Jacquemin et al. (2019), who compared the Dutch versions of the TQ and TFI before and after six sessions of HD-tDCS. Unlike that work, however, our study did not feature a patient-rated clinical global improvement criterion. Consequently, responsiveness analyses were limited to cross-comparisons of the three tinnitus questionnaires.
Across both baseline and post-treatment timepoints, the total scores of the TQ, THI, and TFI showed high convergent validity. In keeping with conclusions drawn by previous studies (Baguley et al., 2000; Jacquemin et al., 2019), all questionnaires thus measure tinnitus-related distress, and their total scores appear comparable across both research and clinical contexts, at least when examining studies from German-speaking populations.
Analogous to results reported by Jacquemin et al. (2019) for the Dutch versions of the TQ and TFI, the German versions' subscale scores showed poor agreement irrespective of similar factor labels. Unlike results from the Belgian study, the [TQ] "cognitive distress" and "emotional distress" subscales did not show agreement with the [TFI] "intrusiveness" subscale score, thus emphasizing the need to consensually define "intrusiveness" across cultural spheres, languages, and intervention approaches (Londero and Hall, 2017). In the present study, the [THI] "catastrophic" subscale showed moderate agreement with the [TQ] "cognitive distress" subscale, the [THI] "emotional" with the [TQ] "emotional distress" and "cognitive distress" subscales, and the [THI] "functional" with the [TQ] "emotional distress" and "intrusiveness" subscales, suggesting an overlap in measured constructs across these indices. However, a need for homogenization of labels emerges: despite a similarity of measured constructs, applied labels vary widely, and vice versa. Most indices showed significant change with treatment, except for the [TQ] "somatic complaints" and [TFI] "intrusiveness," "cognitive," and "auditory" subscales. Unlike results reported following the HD-tDCS intervention (Jacquemin et al., 2019), the more psychologically focused Intensive Multimodal Therapy Programme examined in the present study appeared to be associated with improvements across psychological indices as measured by the TQ, THI and some TFI indices.
Using previously defined MCIDs (Zeman et al., 2011; Meikle et al., 2012), responder vs. non-responder classifications showed fair agreement for the TQ | THI as well as the TQ | TFI, and moderate agreement for the THI | TFI. Proportionately, the TQ and TFI yielded comparable responder rates (19.65 and 18.64% respectively) whilst the THI responder classification resulted in an overall higher proportion of responders (38.15%).
Investigating change rates across a scale × questionnaire-specific responder vs. non-responder classification matrix revealed that, compared to non-responders, responders showed significantly higher changes across most indices of all three questionnaires. Exceptions comprised the [TFI] "intrusiveness," "sleep," "auditory," and "relaxation" subscales, which significantly improved according to the THI's and TFI's, but not the TQ's, responder classifications.
Finally, ROC analyses revealed that each questionnaire's change score showed a "moderate-to-high" ability to distinguish between responders and non-responders as classified by the remaining two questionnaires indicating reasonable overlaps in the identification of treatment responders between the three measures. Post-treatment scores yielded only a "low" ability to do so suggesting that [a] all questionnaires adequately measure treatment change and [b] change scores are the index of choice when wishing to quantify treatment change or compare outcome studies.
In conclusion, the present study demonstrated [a] high convergent validity for the total scores of the German versions of the TQ, THI, and TFI and [b] moderate agreement between TQ and THI subscale scores with each discriminating against TFI indices. Each questionnaire is thus suitable as an outcome measure. Baseline-to-post-treatment change scores successfully distinguished between responders and non-responders as per each questionnaire's responder classification threshold. Comparing the three measures, results of the present study indicated that [a] the TQ and THI showed higher sensitivity to change than the TFI when focusing on statistical significance, [b] the THI and TFI showed higher sensitivity to change than the TQ when comparing responders vs. non-responders as defined by the questionnaires' MCID scores, [c] the TQ and TFI yielded lower, yet comparable responder rates compared to the THI, which classified a higher proportion of patients as responders, and [d] the THI and TFI showed high agreement between responder and non-responder classifications with the former possibly featuring a higher rate of Type I errors.
In keeping with Jacquemin et al.'s (2019) conclusion, the TFI appears most suitable as an outcome measure when aiming to identify treatment responders in tinnitus-specific domains. Notwithstanding, the THI or TQ may be preferable when the featured psychological constructs form the focus of interest, perhaps in more psychologically orientated research or intervention contexts.
The present study has important limitations: First, because it did not feature a patient-rated criterion of clinical global improvement, the three questionnaires fall short of extended validity or responsiveness investigations. Second, the present two time-point design does not preclude the possibility that measurement error accounted for a proportion of the measured treatment change (Schmidt and Hunter, 1996; de Vet et al., 2006). Future prospective multi-timepoint studies will be helpful in addressing this issue. Third, MCID scores, and thereby responder classifications, are usually established using subjective estimates of clinical global improvement following a particular treatment and are thus likely to show variability across baseline symptom severity, type of intervention, or patient (sub)populations (Olsen et al., 2018; Draak et al., 2019). Fourth, it is noteworthy that the questionnaires' subscales have not been validated for the assessment of tinnitus-related distress or treatment-related change. Hence, the here-presented subscale analyses ought to be interpreted with caution. Despite these limitations, the present study extends our knowledge of the emerging psychometric literature of measures of tinnitus-related distress by comparing the convergent validity and responsiveness of the German versions of three commonly used questionnaires in the context of a psychologically anchored multimodal treatment programme.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available. As per the Ethics Committee of Charité Universitätsmedizin Berlin, the data cannot be made public without restrictions because patients' consent to do so was not obtained at the time. Nevertheless, interested researchers can contact the directorate of the Tinnitus Center Charité Universitätsmedizin Berlin with data access requests (birgit.mazurek@charite.de).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Charité Universitätsmedizin Berlin (EA4/137/20). The patients/participants provided written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
BB designed and performed the data analysis, wrote the original draft, addressed the reviewers' comments, and wrote the final version of the manuscript. BB, PB, and BM supervised data analysis. PB and BM reviewed the manuscript. PB, TK, and BM curated the datasets. BM led the project.