The voice characterisation checklist: psychometric properties of a brief clinical assessment of voices as social agents

Aim: There is growing interest in tailoring psychological interventions for distressing voices and a need for reliable tools to assess phenomenological features which might influence treatment response. This study examines the reliability and internal consistency of the Voice Characterisation Checklist (VoCC), a novel 10-item tool which assesses degree of voice characterisation, identified as relevant to a new wave of relational approaches.
Methods: The sample comprised participants experiencing distressing voices, recruited at baseline on the AVATAR2 trial between January 2021 and July 2022 (n = 170). Inter-rater reliability (IRR) and internal consistency analyses (Cronbach’s alpha) were conducted.
Results: The majority of participants reported some degree of voice personification (94%) with high endorsement of voices as distinct auditory experiences (87%) with basic attributes of gender and age (82%). While most identified a voice intention (75%) and personality (76%), attribution of mental states (35%) to the voice (‘What are they thinking?’) and a known historical relationship (36%) were less common. The internal consistency of the VoCC was acceptable (10 items, α = 0.71). IRR analysis indicated acceptable to excellent reliability at the item-level for 9/10 items and moderate agreement between raters’ global (binary) classification of more vs. less highly characterised voices, κ = 0.549 (95% CI, 0.240–0.859), p < 0.05.
Conclusion: The VoCC is a reliable and internally consistent tool for assessing voice characterisation and will be used to test whether voice characterisation moderates treatment outcome in AVATAR therapy. There is potential wider utility within clinical trials of other relational therapies as well as routine clinical practice.


Background
Voice-hearing, or auditory verbal hallucinations (AVH), are a common experience among those diagnosed with psychotic disorders (1) and there is growing interest in voice-hearing across diagnoses as well (2). While voices can occur in the general population without associated distress (3,4), for a significant number of voice-hearers, the experiences become persecutory, debilitating and persist despite interventions (5).
Voices are often described in terms of an experience of communication with a personified other (6, 7), and there has been longstanding interest in this aspect of voice phenomenology (8,9). Personification or characterisation of voices (terms we view as essentially equivalent) is common, and around 70% of voice-hearers associate their voice(s) with 'characterful qualities' (10); that is, people or person-like entities with distinct characteristics, such as gender, age, patterned emotional responses, or intentions. In a study involving people accessing early intervention in psychosis services, 40% of participants described complex voice personification (6). This was defined as the voice having more than one kind of person-like quality, including elaborate descriptions of intentional states (the voice wants/thinks/feels), agency (the voice will 'make something happen'), or identity (the voice 'comes' from somewhere or has a specific and idiosyncratic ontological status). The increased recognition of the communicative and relational aspects of voice-hearing demonstrated by such studies reflects an important evolution from early information processing accounts which centred on the misattribution of an 'auditory stimulus' to an external source [see (11) for a discussion]. While existing tools adopt a multidimensional approach to voices, including assessment of coping strategies, rating of beliefs, and acceptance or mindfulness, there are currently no validated measures assessing voice characterisation (12).
There is growing interest in developing treatments that are tailored to diverse phenomenological features of voice-hearing (13). This includes a new wave of psychological interventions which target the relationship between the person and their voice, specifically Relating Therapy (14), Talking with Voices (15), and AVATAR therapy (16). In AVATAR therapy, a novel therapeutic context allows 'face-to-face' dialogue between the person and a computerised representation of their persecutory voice. Using voice-transformation software, the therapist facilitates a dialogue between the person and the avatar in which the person develops an increased sense of power, control, and confidence within the relationship. This approach has been shown, in a fully powered trial, to reduce voice frequency and voice-related distress when compared with an active control at the end of therapy (primary endpoint), although group differences did not persist at follow-up (17). A large multi-site randomised controlled trial focused on optimisation and implementation is underway (18). While there is promising evidence of effectiveness, including emerging replication by independent research teams (19), there is a need for research into factors which might influence AVATAR therapy outcomes and which are likely to be relevant to other relational approaches.
A study published as part of the first AVATAR therapy trial investigated whether the experience of a person's dominant voice as a highly characterised social agent was associated with differences in voice engagement in both daily life and during AVATAR therapy (20). In line with study hypotheses, more highly characterised voices were associated with increased behavioural engagement with voices in daily life and, crucially, increased dialogic engagement during AVATAR dialogues. While this suggested that voice characterisation may be an important factor in engagement with AVATAR therapy, the study was not designed to test the key question as to whether this phenomenological aspect of voices might moderate treatment outcomes. To date, studies exploring voice characterisation or personification have utilised coding of phenomenology based on detailed clinical assessments (20) or qualitative interviews (6). This approach is well suited to exploration of what can be complex and nuanced voice phenomenology but presents challenges in a large clinical trial with the requirement for a comprehensive assessment battery of validated measures.
A tool capable of assessing voice characterisation in an efficient but robust manner is therefore required to examine the impact of voice characterisation on outcomes following intervention. Such a tool would also have wider utility beyond the research context, for example, as an aid to comprehensive clinical assessment of this hitherto neglected aspect of the voice-hearing experience. The AVATAR2 trial is a multi-site randomised controlled trial of AVATAR therapy in comparison to treatment as usual (18). As part of the trial design, we have developed the Voice Characterisation Checklist (VoCC) based on the framework developed in AVATAR1 (20) and aim to examine its reliability with the large sample of voice-hearers taking part in AVATAR2. This group of voice-hearers report current voice-related distress and include a wide range of pathways to care and voice-hearing experiences.

Aims
• To examine the reliability and factor structure of the Voice Characterisation Checklist (VoCC) in a sample of people who hear distressing voices.
• To report a preliminary description of the characterisation of the voice-hearing experiences in participants in the AVATAR2 clinical trial.

Recruitment
AVATAR2 is a multi-site parallel group randomised controlled trial which is due to be completed in October 2023 (18). Randomisation to AVATAR-brief (six sessions), AVATAR-extended (12 sessions) therapy or Treatment as Usual was performed on a 1:1:1 allocation basis and was stratified by voice characterisation (more vs. less highly characterised). Four United Kingdom research sites took part in the trial: King's College London, University College London, The University of Manchester and the University of Glasgow. Each research site was linked to two National Health Service (NHS) Trusts/Health Boards, where potential participants were identified and referred to the trial by their treating clinician. Self-referrals were also considered, and recruitment databases and consent for contact (C4C) initiatives were utilised where available to maximise the participant pool. The full inclusion and exclusion criteria can be found in the published protocol (18). In brief, participants were adults who had been hearing a distressing voice(s) within the context of psychosis for at least 6 months at the time of the baseline assessment.

Procedure
The Voice Characterisation Checklist (VoCC) was administered as a semi-structured interview by research assistants as part of the baseline assessment which took place face-to-face or online. To prevent rater drift across the trial, research assistants received training, passed an observed assessment, and attended weekly group supervision from clinicians in administration of this and other measures.

Voice characterisation checklist
The voice characterisation checklist was devised from a qualitative coding framework employed by Ward et al. (20) in their study of voice characterisation and avatar engagement, which was itself informed by previous phenomenological work, e.g. (10). The VoCC is administered as an interview and scored by the interviewer. The language used to refer to the voices is flexible to enhance communication and understanding, and interviewers may use a variety of terms (singular, plural, voices and others). The VoCC comprises 10 items, each scored 'Yes', 'No' or 'Do not Know', which assess key areas highlighted in the qualitative coding framework: identity, physical and psychosocial characteristics. An item is scored 'Yes' where the participant can provide information in response to the question, 'No' where they have no information to provide, and 'Do not Know' if they are unsure whether it applies to their voice. Anecdotally reported time to administer the VoCC ranged from 5 to 30 min. The range of scores is 0-10 and a score of 7+ is the threshold for a more highly characterised voice, as this ensures the voice has traits in all three categories. The VoCC is free to use and available in Figure 1.
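The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' scoring procedure: the function names `score_vocc` and `classify` are hypothetical, and the sketch assumes that 'No' and 'Do not Know' both contribute zero to the total, in line with the dichotomisation reported under Frequency of responses.

```python
from typing import Iterable

N_ITEMS = 10    # the VoCC has 10 items
THRESHOLD = 7   # a score of 7+ marks a "more highly characterised" voice

ALLOWED = {"yes", "no", "do not know"}


def score_vocc(responses: Iterable[str]) -> int:
    """Return the total score (0-10): the number of items answered 'Yes'.

    Assumption: 'No' and 'Do not Know' both count as 0.
    """
    answers = [r.strip().lower() for r in responses]
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(answers)}")
    for answer in answers:
        if answer not in ALLOWED:
            raise ValueError(f"unexpected response: {answer!r}")
    return sum(answer == "yes" for answer in answers)


def classify(score: int) -> str:
    """Apply the binary cut-off to a total score."""
    if score >= THRESHOLD:
        return "more highly characterised"
    return "less highly characterised"


# Hypothetical participant: seven 'Yes', two 'No', one 'Do not Know'.
example = ["Yes"] * 7 + ["No"] * 2 + ["Do not Know"]
example_score = score_vocc(example)
example_label = classify(example_score)
```

Keeping the threshold as a named constant makes it straightforward to explore alternative cut-offs in any later validation work.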

Statistical tests
The descriptive statistics of the included sample and the frequency of VoCC responses were reported to provide a general overview of the data. The scale's reliability was assessed through inter-rater reliability and internal consistency analysis (Cronbach's alpha). Inter-rater reliability was assessed in a sample of 33 AVATAR2 participants, who were randomly selected from the pool of participant IDs across four sites: South London (n = 8), North London (n = 8), Manchester (n = 9), and Glasgow (n = 8). A total of 13 research assistants from the four sites are represented in the scores used. The lead author (CE) acted as the expert scorer and blind rated the VoCC from audio recordings. Internal consistency was determined by assessing the correlation between items within the scale.
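As an illustration of how agreement between a research assistant and the expert scorer on the binary classification might be quantified, the sketch below computes Cohen's kappa in plain Python. It assumes that the κ reported in the abstract is Cohen's kappa, which the text does not state explicitly; the function name `cohens_kappa` and the ratings are invented for illustration and are not trial data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same cases."""
    rater_a, rater_b = list(rater_a), list(rater_b)
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of cases on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Invented ratings: 1 = more highly characterised, 0 = less highly characterised.
assistant = [1, 1, 1, 0, 0, 1, 0, 1]
expert = [1, 1, 0, 0, 0, 1, 1, 1]
kappa = cohens_kappa(assistant, expert)
```

In this invented example the raters agree on 6 of 8 cases, chance agreement is 34/64, and kappa works out to 7/15 (about 0.47).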
To determine the underlying construct or factors and assess the validity of the conceptual model, an exploratory factor analysis (EFA) was conducted on the 10 VoCC items (21). For this analysis, the iterated principal axis method, also known as principal factors, was used as the factoring estimation method. This method is a robust and efficient way of finding the few factors that account for the common variance of several variables. Oblique rotation (promax) was used to better interpret the factor loading (22). Promax allows for correlated factors, which is more realistic in many psychological studies (23).
Before conducting the factor analysis, the Bartlett test of sphericity was conducted. A value of p less than 0.05 indicates that the correlation matrix of the observed variables is not an identity matrix, and that the variables are sufficiently correlated to be suitable for factor analysis. Additionally, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was calculated to provide an overall measure of the overlap (shared variance) between the variables. A KMO value of more than 0.6 is generally considered acceptable, indicating that the sample is suitable for factor analysis (24). Statistical analyses were conducted using Stata Statistical Software: Release 17 (StataCorp LLC, College Station, TX) and the R statistical programme (2022) (25).
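For reference, the usual textbook forms of these two suitability checks are given below. The notation is introduced here for illustration and is not taken from the paper: n is the sample size, p the number of items, R the p × p correlation matrix, r_ij the pairwise correlations and a_ij the corresponding partial correlations.

```latex
% Bartlett's test of sphericity
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert,
\qquad
df = \frac{p(p-1)}{2}

% Kaiser-Meyer-Olkin measure of sampling adequacy
\mathrm{KMO} =
\frac{\sum_{i \neq j} r_{ij}^{2}}
     {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}
```

With p = 10 items, df = 10 × 9 / 2 = 45, which matches the degrees of freedom reported for the Bartlett test in the factor analysis results.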

Sample
The sample comprised participants who had completed their baseline assessment as part of the AVATAR2 trial between January 2021 and July 2022, the cut-off date for uploading the database for this study (n = 170). All participants' demographic characteristics are presented in Table 1.

Frequency of responses
The 'Unclear/Do not Know' response choice was recoded as 'Absent' to create a dichotomised variable. The frequency of dichotomised response choices for each item is presented in Table 2 and Figure 2. Overall, there were 561 Absent (33%) and 1,139 Present responses (67%). Based on the overall cut-off score of 7 or higher, of the 170 participants, 71 (41.8%) were classified as less highly characterised and 99 (58.2%) were classified as more highly characterised, a ratio of 1.4 (more/less).
An example of responses to the VoCC for more versus less highly characterised voices can be seen in Table 3. These responses were given by two participants of the AVATAR2 trial when administered the VoCC at baseline assessment; details have been altered to protect patient identity.
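The figures in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only the counts quoted in the text; the variable names are illustrative.

```python
N_PARTICIPANTS = 170
N_ITEMS = 10

present, absent = 1139, 561   # dichotomised item responses quoted in the text
more, less = 99, 71           # participants at or above / below the cut-off

total_responses = N_PARTICIPANTS * N_ITEMS

# The counts should account for every item response and every participant.
assert present + absent == total_responses
assert more + less == N_PARTICIPANTS

pct_present = round(100 * present / total_responses)   # percentage Present
pct_absent = round(100 * absent / total_responses)     # percentage Absent
pct_more = round(100 * more / N_PARTICIPANTS, 1)       # % more highly characterised
pct_less = round(100 * less / N_PARTICIPANTS, 1)       # % less highly characterised
ratio_more_to_less = round(more / less, 1)
```

The recomputed values agree with the percentages and the more/less ratio stated above.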

Statistical analysis
To evaluate the item-to-item relationship of the VoCC, a pairwise correlation analysis was conducted on the 10 binary variables (indicating the presence or absence of each characteristic). The results of this analysis are presented in Table 4. Subsequently, an exploratory factor analysis was performed on this matrix to identify underlying latent factors and patterns of association among the variables. The highest correlation observed was between the presence of Q2 and Q9 (r = 0.49), while the lowest correlation was found between the presence of Q2 and Q6 (r = −0.002).
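For two binary variables, the Pearson correlation reduces to the phi coefficient, which can be computed from the 2 × 2 table of counts. The sketch below assumes the reported r values are Pearson correlations on the 0/1 item scores (the text does not specify the coefficient used); the function name `phi_coefficient` and the data are invented for illustration.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Pearson correlation between two 0/1 variables (the phi coefficient)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(x, y))
    n11 = sum(a == 1 and b == 1 for a, b in pairs)  # both present
    n10 = sum(a == 1 and b == 0 for a, b in pairs)  # only first present
    n01 = sum(a == 0 and b == 1 for a, b in pairs)  # only second present
    n00 = sum(a == 0 and b == 0 for a, b in pairs)  # both absent
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("phi is undefined when a variable has no variance")
    return (n11 * n00 - n10 * n01) / denominator


# Invented item responses (1 = Present, 0 = Absent).
identical = phi_coefficient([1, 1, 0, 0], [1, 1, 0, 0])
unrelated = phi_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

Applying such a function to every pair of the 10 items would yield a 10 × 10 matrix of the kind summarised in Table 4.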

Factor analysis
The Bartlett sphericity test findings were acceptable (χ² = 238.9, df = 45, p < 0.0001) and KMO = 0.772 (>0.60 is desirable). Two factors had an eigenvalue of more than one and cumulatively explained about 29% of the data variance. The correlation between the two factors was 0.63 and the factor loading for each item is presented in Table 5.

Internal consistency
The α coefficient (Cronbach's α) for the 10 items of the VoCC was 0.71, which is considered acceptable within the range of 0.7-0.8. An examination of item-level correlations and Cronbach's α after removing each item revealed no significant impact on the overall α coefficient, as none of the coefficients exceeded the all-items coefficient (Table 6).
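The internal consistency analysis described above can be illustrated with a minimal Python sketch of Cronbach's α and of α recomputed after dropping each item in turn. It uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores) with sample variances; the function names and the data are hypothetical, and the exact options used in the paper's Stata analysis are not known.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; each row holds one participant's item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two participants")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([row[:i] + row[i + 1:] for row in rows])
        for i in range(k)
    ]


# Invented 0/1 responses for four participants on three items.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
alpha_consistent = cronbach_alpha(consistent)

# Two items that are uncorrelated across four participants.
uncorrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]
alpha_uncorrelated = cronbach_alpha(uncorrelated)
```

Perfectly consistent items give α = 1 and uncorrelated items give α = 0, so the reported value of 0.71 sits between these extremes.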

Discussion
This study aimed to present the VoCC as a novel brief (10 items) tool for assessing the extent to which a distressing voice is experienced as a characterised social agent. The study has demonstrated its reliability and internal consistency within a large sample of people who experience distressing voices, recruited as part of the AVATAR2 trial. The findings therefore establish the VoCC as a useful research tool, capable of reliably (and quickly) assessing voice characterisation, which we hypothesise to be a potential moderator of treatment outcome in AVATAR therapy. In addition to use in a research context, where the VoCC's brevity means it is easily integrated as part of an assessment battery, the tool has also been designed with wider utility in mind as a means of facilitating assessment of voice characterisation in routine clinical practice.
[Figure: Voice characterisation checklist (VoCC).]
Frontiers in Psychiatry 05 frontiersin.org
The descriptive data indicate that most people in the AVATAR2 sample report voices which are personified to some degree (94%) with high endorsement of voices as distinct auditory experiences (from one another and other sounds; 87%) and with associated basic attributes of gender and age (82%). Endorsement of psychosocial aspects was more varied. For example, while most people identified a basic voice intention (75%) and personality (76%), only around a third (35%) endorsed the item assessing attribution of mental states to the voice ('What are they thinking?'). A similar minority of people identified a known historical relationship with the voice (36%), although the nature of these autobiographical relationships could not be determined from the checklist alone; this is likely to be crucial within a relational intervention, where developmental trauma often plays a pivotal role. This descriptive pattern of endorsement across items was supported by the factor analysis, which confirmed two factors, one incorporating physical and identity characteristics, and the other the psychosocial characteristics. The two items focused on relationships between the voice and others (Q9 and Q10), originally conceptualised as psychosocial characteristics, loaded onto Factor I. The stronger association between these relational items and the identity and physical characteristics of the voice rather than the psychological items in Factor II should be examined in further validation of this scale. Overall, the findings are consistent with the proposition that characterisation (or personification) is a common feature of voice-hearing but also suggest the relevance of potential 'levels of agency' (27). While not designed to explore the granular complexity of voice agency, the data from the VoCC appear broadly consistent with earlier phenomenological work (6) suggesting that most voices recurred over time and had a distinct character but could not be related to a known person, a pattern termed 'internally individuated agency' (27) and reported by 75% of people in the study by Alderson-Day et al. (6). In summary, the findings presented here confirm, in a large empirical/quantitative study, that voice characterisation is a common phenomenon among distressed voice hearers, with most of this sub-sample endorsing the items regarding physical characteristics and identity. Fewer people (although still a significant minority of 30-40%) endorsed the psychosocial items around the intention and thoughts of the voice, which may reflect more general difficulties in mental state attribution (28). The threshold for more highly characterised voices in the VoCC (a score of 7 or above) requires someone to endorse items across both the physical and psychosocial categories.
This does not account for the complexity of the characteristics, but only that an awareness of both physical and psychosocial components is part of the person's experience of the voice; this is therefore a low threshold for considering a voice to be more highly characterised when compared with the thresholds devised utilising qualitative frameworks. In line with this, we found that 58.2% of people reached the threshold for more highly characterised voices in this sub-sample, compared to earlier work (20) in which 33% reported high voice characterisation, 42% medium and 25% low. Previous work (6, 20) highlights differences in voice engagement between high characterisation and low/medium characterisation, meaning that the current VoCC threshold will require further validation in future work. Nonetheless, from a clinical utility standpoint, the VoCC presented in this paper appears to be a useful tool to facilitate clinical assessment around this potentially important feature of voice-hearing (see clinical implications).

Limitations
While we have demonstrated reliability and internal consistency, validity of the VoCC was not examined because, to our knowledge, there are no validated quantitative measures which assess this specific construct. Future studies could explore convergent validity of the VoCC with coding of voice personification based on qualitative analysis, e.g. (6). It should be noted that the purpose of the VoCC is not to supplant the valuable insights delivered through qualitative work but rather to connect this important phenomenological work with the exigencies of a clinical trial and routine clinical practice. With respect to constructs which are plausibly linked to characterisation, the DAIMON measure (29) has been developed to assess the dialogical and emotional aspects of the relationship(s) between the voice-hearer and their voices, and its relationship with the VoCC could be explored in future research. While reliability of the categorisation of voices as more versus less highly characterised was acceptable overall, the least reliable question from the item-level analysis was 'does the voice have its own personality?'. While this might be viewed as a central question, assessing a sense of personality or character is arguably a more complex task compared to other items. This item may therefore be less suited to a brief 'checklist'; there was evidence that rater disagreement arose when researchers rated it using contextual information that emerged at other stages of the assessment. It was notable that the overall reliability of the measure was improved with removal of this item. Therefore, one suggested option is to streamline the VoCC to include nine items but retain this question at the end as an optional (but suggested) aid to clinical assessment.
Finally, it is important to note that participants in this study (n = 170) were recruited as part of a trial for a relational intervention for voices (AVATAR therapy), so we are not able to generalise these findings to people who hear voices more generally, both in clinical groups and people who experience voices without an associated need for care.

Future directions
The VoCC was developed as part of the AVATAR2 trial, to enable voice characterisation to be included as a moderator of treatment outcome following AVATAR therapy. The VoCC has been used to stratify randomisations according to degree of voice characterisation (adopting a binary classification of 'more highly' vs. 'less highly' characterised). The tool has been suitable for integration within a comprehensive trial baseline assessment and the findings are positive with respect to establishing reliability and internal consistency. However, linked to its use as a stratification variable, a further key test of utility of the VoCC will come in the planned analysis of moderation of treatment outcome by degree of characterisation. If the VoCC does show utility with respect to these planned moderation analyses, it would suggest opportunities for exploring its use in trials of other relational approaches to working with distressing voices. For example, the Talking with Voices approach adopts an inclusion criterion based on people experiencing voices which are (at least to some degree) dialogic in form, given the nature of the therapy which involves direct (facilitated) dialogues with the voices. This inclusion decision is based on a discussion with participants to establish whether the approach is a 'good fit' for the person. Pilot work in the Talking with Voices approach suggests that instances in which people were unable or unwilling to engage in voice dialogue were relatively uncommon (15). Nonetheless, if characterisation as assessed by VoCC is shown to moderate treatment outcome to AVATAR therapy, it would be of interest to explore whether this is also observed in other dialogical approaches.
[Figure: The histogram of VoCC overall score for 170 participants.]
In addition to use in clinical trials, the questions themselves have been reported as helpful by some participants on the AVATAR2 trial, underscoring the importance of routinely assessing the social and relational elements relevant to the person and their voices. In our view, this relates to an attitude of respectful curiosity to voice phenomenology and developmental context which is central to the AVATAR therapy approach. We recommend potential use of the VoCC in clinical practice as part of a standard voices assessment. Use of the tool delivers an important, early message that the clinician is respectfully open to considering voices as nuanced, social communicative agents within the person's life rather than just a symptom. A richer understanding of voice characterisation, including attribution of thought and intention, can facilitate the process of building understanding and meaning making. It also acts as an invitation to consider possible mirroring of current voice experiences with other relationships, autobiographical context, and the role of trauma [see also (15)]. Future work using the VoCC could also

Item | More highly characterised | Less highly characterised
Is it a person? | Yes they have got a name and everything. I have got a very distinct idea of who it is… so the leader is a guy called Bill he lives below me apparently. He lives there with his wife but now he has changed that to his partner because he is now bisexual. He threatens to beat me up constantly but he is a coward because whenever I say yes okay let us do this he will not meet up with me to do it so he is basically a loudmouth who just swears and rants and raves and he is the most unpleasant out of all of them. | I think it is… I have never asked this question to myself so I do not know. I think it might be like… it is not a person as such. I think it is more, maybe, I do not really believe in ghosts but it might be a spirit or a bad entity.
Age and gender | Yeah I would say he is about 40. | No.
Distinctive sound qualities | Well it was a Scottish guy initially and then the Scottish guy seemed to morph into Bill and now Bill sounds more Irish than Scottish. | There is no accent. It is almost like my thoughts but it is saying words and sentences.
What does the voice want? | He wants my money basically and also to punish me. The idea as well is that they will get me sectioned, somehow take my flat off me (I do not know how they will do that) then they will get a tenant and charge them rent. | I am not sure I have never asked it.

Summary
This study has, for the first time, presented a brief tool to assess degree of voice characterisation (the VoCC), which is reliable, internally consistent, and capable of being delivered as part of clinical research and practice. The VoCC meets a need for robust measures to assess constructs relevant to relational therapies. Moving forward, the key test of utility will be whether it helps us understand whether certain forms of voice-hearing are more amenable to dialogical interventions such as AVATAR therapy.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement
The studies involving human participants were reviewed and approved by NHS Health Research Authority London-Camberwell St Giles Research Ethics Committee. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.