Acoustic analysis of clients’ expression of self-compassion, self-criticism, and self-protection within emotion focused therapy video sessions

Introduction: When it comes to the non-verbal communication of emotions, it is apparent that the human voice is one of the main ways of expressing emotion and is increasingly important in psychotherapeutic dialog. There is ample research focusing on the vocal expression of emotions. However, to date the analysis of the vocal quality of clients’ in-sessional emotional experience remains largely unexplored. Moreover, there is generally a gap within the psychotherapy literature in the understanding of the vocal character of self-compassion, self-criticism, and protective anger.

Methods: In this study we investigated how clients vocally convey self-compassion, self-protection and self-criticism in Emotion Focused Therapy sessions. For this purpose we investigated 12 commercially available Emotion Focused Therapy videos that employed a two chair or empty chair dialog. Praat software was used for the acoustic analysis of the most common features: pitch (known as fundamental frequency or F0) and intensity (voice amplitude, i.e., loudness).

Results: Results showed that intensity was significantly higher for self-criticism and self-protection than for self-compassion. Regarding pitch, the findings showed no significant differences between the three states.

Discussion: More research analyzing acoustic features in a larger number of cases is required to obtain a deeper understanding of clients’ vocal expression of self-compassion, self-protection and self-criticism in Emotion Focused Therapy.


Introduction
In Emotion Focused Therapy (EFT) emotional processing is acknowledged to have significance for therapeutic change (Pascual-Leone and Greenberg, 2007; Auszra et al., 2013; Pascual-Leone, 2017; Haberman et al., 2018). According to the EFT model of change, self-protection (also known as assertive anger) and self-compassion are particularly relevant to the transformation of self-criticism (Pascual-Leone and Greenberg, 2007; Timulak, 2015; Pascual-Leone, 2017). Furthermore, it is well known that self-criticism is associated with high levels of shame and negative self-evaluation (Blatt and Zuroff, 1992; Gilbert et al., 2004; Haberman et al., 2018), and that the activation of self-compassion and self-protection is recognized as an effective response for coping with self-criticism (Pascual-Leone, 2017; Halamová et al., 2018, 2019; Halamová and Kanovský, 2019). In EFT it is further established that developing self-compassion and self-protection are substantial steps toward transforming global distress (e.g., hopelessness, self-pity, helplessness, guilt) (Timulak, 2015; Pascual-Leone, 2017). Therefore, in the sequential emotional processing model (for more information see Pascual-Leone and Greenberg, 2007; Pascual-Leone, 2017) self-compassion and self-protection are seen as primary adaptive emotions and key elements of the process of emotional change. Expressing unmet needs through self-protection, and being kind, caring and accepting of human imperfection through self-compassion, strengthens the self and decreases negative self-evaluation and self-criticism (Greenberg and Watson, 2006; Greenberg, 2015; Timulak, 2015; Singh et al., 2020). One of the major ways in which EFT therapists help clients transform maladaptive emotions into primary adaptive emotions is through two chair enactments. EFT distinguishes between two sorts of two chair dialogs: self-self dialogs with specific parts of the self (e.g., critical voice) and empty chair dialogs with imaginary significant others (Greenberg and Malcolm, 2002; Shahar et al., 2012; Goldman, 2017). In self-critical chair work the therapist facilitates a conversation between the critical part of the client and the experiencing self who receives the criticism (e.g., Shahar et al., 2012). During an empty chair enactment the therapist encourages the client to express unfinished, problematic emotional experiences to a significant other and work through them (Greenberg and Malcolm, 2002). Both two chair techniques are aimed at promoting clients' abilities to be compassionate toward their painful emotional experiences and at developing self-protection capacities. Therefore, whether these adaptive emotions are expressed toward the self in a two chair dialog or in an empty chair dialog with a significant other makes little difference. Although activation of self-criticism, self-compassion and self-protection is acknowledged as being essential for the emotional change process in EFT, how these states are expressed in EFT sessions is a grey area. Clients express these states through different types of verbal as well as nonverbal communication, including vocal quality.
When it comes to the non-verbal communication of emotions, it is apparent that the human voice is becoming increasingly important in psychotherapeutic dialog (Rice and Kerr, 1986; Tomicic and Martínez Guzmán, 2011) and is one of the main ways in which emotions are expressed. There is ample research investigating the vocal expression of emotion (Kappas et al., 1991; Bachorowski and Owren, 1995; Bachorowski, 1999; Scherer, 2003; Scherer et al., 2003, 2011; Simon-Thomas et al., 2009) and demonstrating the relationship between emotional state and the acoustic characteristics of vocalizations (Kappas et al., 1991; Bachorowski and Owren, 1995; Banse and Scherer, 1996; Scherer, 2003; Scherer et al., 2003). In the literature this is called the vocal expression of emotion (Kappas et al., 1991; Bachorowski and Owren, 1995; Banse and Scherer, 1996). Generally, research on the vocal expression of emotion lags some way behind the study of facial affect expression (Bailey et al., 2023). Until now research in the field of emotion recognition in speech has been mainly based on laboratory acted emotions, examining sustained vowels (Scherer et al., 1991; Banse and Scherer, 1996; Bachorowski and Owren, 2001) or utterances (e.g., Razak et al., 2003). Fewer studies have been published that use naturally occurring speech (e.g., Crangle et al., 2019). According to Parsa and Jamieson (2001) there are only a small number of studies investigating acoustic features in running speech, and to the best of our knowledge there has been no paper evaluating pitch and intensity in continuous speech in therapy sessions. Rochman et al. (2008) were among the first to examine the vocal quality of women expressing sadness and unresolved anger in a psychotherapy setting using a two chair dialog with an attachment figure. They demonstrated the value of computerized acoustic analysis in recognizing clients' productive expression of emotional experiences. Recent advances in machine learning permit researchers to analyze vocal affect expression automatically (Vogt et al., 2008; Rabiei and Gasparetto, 2015; Kumar and Iqbal, 2019). Computerized speech recognition allows the tracking of emotion in speech in more natural settings.
Based on the present state of the evidence, the consensus is that the vocal expression of emotion involves the following variables (Kappas et al., 1991; Bachorowski and Owren, 1995; Banse and Scherer, 1996): frequency level (fundamental frequency, F0, perceived as the pitch of a sound), amplitude (voice intensity, perceived as loudness), voice quality (e.g., breathing, hoarseness, harshness) and duration (speech rate, e.g., tempo and pausing). However, the most commonly analyzed features in emotion recognition (e.g., in Scherer et al., 2003; Magdin et al., 2019) are fundamental frequency (F0, perceived as pitch), intensity (perceived as loudness), jitter (pitch perturbation) and shimmer (intensity perturbation). Dynamic changes in pitch and intensity have been related to emotional arousal: for instance, anger is identified with a high pitch and high intensity whereas sadness is identified with a lower pitch and low intensity (Scherer et al., 2003; Rochman et al., 2008). Up until now the research has mainly focused on the analysis of basic emotions such as anger, sadness, happiness and fear (Kappas et al., 1991; Banse and Scherer, 1996; Scherer, 2003).
Consequently, there is a gap in the understanding of the vocal character of self-compassion, self-criticism and self-protection. This study was designed to identify two acoustic features, pitch and intensity, of self-compassion, self-criticism and self-protection through the computer analysis of the speech signals of clients in EFT sessions involving a two chair or empty chair dialog. First, we review the current state of the research on the vocal analysis of self-compassion, self-criticism and self-protection generally.

Vocal expression of self-criticism
Although the importance of self-critical talk is well known in research and practice, there is a gap in the literature on the acoustic analysis of self-criticism. While there is ample research on the verbal expression of self-criticism (Whelton and Henkelman, 2002; Whelton and Greenberg, 2005; Kannan and Levitt, 2013), little is known about what self-criticism sounds like in practice when articulated by clients in therapy sessions. To the best of our knowledge, there has to date been no study examining the vocal expression of self-criticism. However, in EFT theory self-criticism is considered to be a type of problematic anger and viewed as a secondary emotion (Pascual-Leone et al., 2013; Kramer and Pascual-Leone, 2015). In the EFT model secondary emotions are reactive, defensive responses to a primary emotion (Pascual-Leone et al., 2013; Herrmann et al., 2016). Thus, self-criticism (as problematic anger) is defined as a secondary emotion to the primary maladaptive emotion of shame, characterized as the expression, by highly critical people, of self-hate and contempt (Blatt and Zuroff, 1992; Whelton and Greenberg, 2005; Kannan and Levitt, 2013), that is, of anger and disgust toward oneself. According to Kramer and Pascual-Leone (2015) people who are vulnerable to maladaptive anger express more self-contempt when criticizing themselves. Moreover, their study demonstrates the importance of the tone participants use when being self-critical, as a major factor affecting the level of self-criticism. Hence, self-criticism is associated with the expression of anger toward oneself (Luyten and Blatt, 2012; Abi-Habib and Luyten, 2013; Pascual-Leone et al., 2013; Kramer and Pascual-Leone, 2015). In addition, according to the classification of affective meaning states (CAMS; Pascual-Leone and Greenberg, 2005) the anger known as "rejecting anger" is characterized by an increase in pitch and moderate to high intensity. We therefore assume that the voice quality of self-criticism is similar to that of rejecting anger. Rochman et al. (2008) were among the first to investigate the vocal quality of maladaptive anger in a therapeutic setting by analyzing the expression of anger before and after expressing sadness. Their findings are in line with previous research demonstrating an increase in pitch (Breitenstein et al., 2001; Razak et al., 2003; Yildirim et al., 2004) and intensity (Kapoor and Sagar Verma, 2019) associated with anger. Furthermore, it has been established that more highly activated emotions (e.g., anger and happiness) are linked to higher fundamental frequency (F0) (Breitenstein et al., 2001) and voice intensity (Schröder et al., 2001).

Vocal expression of self-compassion
In contrast to the situation regarding self-criticism, there has been greater interest over the last few years in examining and differentiating a broader range of emotions, including a variety of positive emotions such as compassion (Simon-Thomas et al., 2009; Kamiloğlu et al., 2020). In their review of vocal expressions of positive emotions, Kamiloğlu et al. (2020) systematically compared 108 studies investigating acoustic features across different positive emotions, highlighting differences in pitch, loudness, and speech rate. The authors classified positive emotions into emotion families such as epistemological emotions (amusement, interest, relief), savoring emotions (contentment and pleasure) and prosocial emotions (admiration). According to that study compassion is categorized as a positive emotion in connection with, for instance, kindness, contentment, pleasure and gratitude. This group of low aroused positive emotions, described as having moderate pitch and loudness and a slower speech rate, contrasts with epistemological emotions. The low arousal level of these emotions characterizes their purpose: "adaptive functions for the person experiencing them" (Kamiloğlu et al., 2020, p. 24). Furthermore, Simon-Thomas et al. (2009) indicated that compassion is communicated through vocal bursts. They investigated vocal bursts of 22 emotions. In their study "posers" were asked to express 22 emotions without using words, and judges were asked to listen to and identify the different vocal bursts. Compassion was grouped as a positive, pro-social emotion along with contentment, amusement, gratitude, love and so forth, and these emotions were identified accurately. The judges identified compassion correctly in 30% of the positive vocal bursts and in 47% of the selected, prototypical vocal bursts. In line with Kamiloğlu et al. (2020) self-compassion is defined in EFT as a primary adaptive emotion that helps clients to be caring and attentive to their needs (Pascual-Leone, 2017). Diamond et al. (2010) were among the first to examine the acoustic parameters of primary emotions in an EFT setting. Their findings are in line with prior research (Scherer et al., 2003) and show a decrease in pitch and intensity during the expression of sadness when attending to one's losses, in comparison to non-emotional speech. According to Neff (2003) and Strauss et al. (2016) self-compassion is compassion directed toward oneself. It is defined as self-kindness and the ability to turn to and feel empathy for one's own suffering, among other things. Consistent with this, Pascual-Leone and Greenberg (2005) describe self-compassion as the presence of caring, tenderness and nurturing by the self through a variety of self-compassion dialogs such as caring for the inner child or a significant other, or offering soothing in an imaginary dialog. For this reason, in this study the expressions of compassion and self-compassion are understood to be equivalent. Taken together, these studies show that compassion is identified as a low aroused adaptive positive emotion (Simon-Thomas et al., 2009; Kamiloğlu et al., 2020) marked by a moderate pitch and intensity similar to those of love and kindness (Sauter, 2017; Kamiloğlu et al., 2020).

Vocal expression of self-protection
In contrast to maladaptive anger expressed through self-criticism, arousing self-protection as primary adaptive anger is one of the key aspects of emotional change in EFT and a core factor in treatment outcomes (Greenberg and Malcolm, 2002; Pascual-Leone, 2017). Helping clients to stand up and express their unmet needs in an assertive manner strengthens their self and increases their sense of worthiness (Pascual-Leone and Greenberg, 2007; Timulak, 2015). Therefore, self-protection contrasts with rejecting anger, particularly when characterized by a sense of entitlement and positive self-affirmation (Pascual-Leone and Greenberg, 2005; Timulak, 2015; Pascual-Leone, 2017). As stated in CAMS (Pascual-Leone and Greenberg, 2005) self-protection is experienced with a moderate to high expressive arousal and a loud voice. The increased sense of agency and empowerment is expressed in the anger. Unlike when expressing rejecting anger, a client expressing self-protection "is strong, clear and well-grounded, and speaks with a sense of growing confidence" (Pascual-Leone and Greenberg, 2005, p. 57). There has been no study on the acoustic parameters of this type of anger comparing it to rejecting anger.

Aim of the study
To the best of our knowledge there has been no study on the acoustic analysis of self-compassion, self-protection and self-criticism in running speech in psychotherapy sessions. Therefore, the purpose of this study was to shed light on the following research question: What are the characteristic acoustic features of pitch and intensity in clients' expression of self-compassion, self-protection and self-criticism in real EFT therapy sessions?

Hypothesis 1
Based on previous findings and on the definition of self-criticism as a form of rejecting anger, we hypothesize that clients expressing self-criticism will have significantly higher pitch (e.g., Breitenstein et al., 2001; Razak et al., 2003; Yildirim et al., 2004) and intensity (Kapoor and Sagar Verma, 2019) in comparison to self-compassion, controlled for baseline.

Hypothesis 2
Considering the adaptive functions of self-compassion together with previous research defining compassion as a low aroused positive emotion (e.g., Kamiloğlu et al., 2020), we expect that clients' expressions of self-compassion in an EFT session will have significantly lower pitch and intensity in comparison to self-criticism and self-protection, controlled for baseline.

Research question 1
As there is no research on the vocal quality of self-protection, we want to deepen our understanding by examining the following question: How do the acoustic parameters of the pitch and intensity of self-protection differ from self-compassion and self-criticism controlled for their baseline?

Materials
For the purpose of this study we decided to examine previously recorded videos of real EFT therapy sessions that are commercially available and can be used for research purposes. The following criteria were set for selecting the videos for this analysis. The videos had to be in English, show clients expressing self-compassion, self-protection or self-criticism in a two chair or empty chair dialog in a therapy session, and the voice quality of the tapes had to be sufficiently good for computer software recognition. A total of 17 EFT sessions were reviewed and 12 were identified as valid examples based on the first author's criteria and in consultation with the second author. Four sessions were two chair dialogs with the critical self and eight were empty chair dialogs with a significant other. The therapy sessions were led by EFT experts. All the clients were female.
The EFT videos were on the following topics:

Leslie Greenberg as therapist:
• EFT over time with Marcy. Psychotherapy in six sessions (American Psychological Association, 2007a,b). Sessions 2, 3, 4, and 6 were selected for this study.
• EFT for Depression with Dione (American Psychological Association, 2007a). This is a series of two sessions with Dione, who is suffering from depression. Both sessions were chosen for the study.

Procedure
Voice segments containing self-compassion, self-criticism and self-protection were coded and extracted from each 50-min therapy session by the first author and reviewed by the second author. Both coders were certified EFT therapists trained to recognize the states of self-compassion, self-criticism and protective anger. The selected sequences were converted into .wav files with VLC (version 3.0.12, VideoLan, 2006). Extracts of the therapist's voice were trimmed using Audacity (Version 2.3.3; R Core Team, 2019), a free audio editing tool. There are several technical challenges involved in recognizing emotion in naturally occurring running speech in previously recorded videos. The most challenging aspects include the microphone and the environmental conditions. Our videos were not specially recorded for acoustic analysis and so the voice recordings were not of the required quality. Because the recordings were of poor quality, and in order to increase the accuracy of Praat's pitch determination algorithm, a profile of the background noise was obtained from the silences in each audio segment. The background noise was then reduced using the frequencies of the noise profile previously obtained. To enhance the accuracy of the analysis, outliers among the maximum and minimum pitch values were eliminated using the spectrum analysis in the Praat software. The software had mistakenly identified most of these outliers as voice pitch, although in reality most were caused by background noise or other sources and were not emitted by the client. The final audio versions were then used for the feature extraction in Praat (Version 6.1.34; Boersma and Weenink, 2018). In the end, the length of the self-criticism sequences was between 0:28 min and 01:34 min; the self-compassion sequences were between 0:25 min and 03:26 min; and the self-protection sequences were between 01:02 min and 04:35 min. Altogether there were seven clients and 06:84 min of self-criticism, ten clients and 11:76 min of self-compassion and ten clients and 15:19 min of protective anger. According to Diamond et al. (2010, p. 405) at least "three sentences of speech are necessary for emergent emotions to be sufficiently formed." All the audio segments in our study fulfilled this requirement. The baseline was extracted from 0:30 to 02:30 min of each session. We decided to start from 0:30 min as in the majority of the videos the therapists started the sessions with a short introduction about EFT and the clients had not yet begun speaking. The time window for the baseline at the beginning of the session was chosen to capture the clients' emotionally neutral speech. We followed a similar approach to Rochman et al. (2008), with the timing differing as described above. Moreover, according to previous research (Watson and Bedard, 2006; Auszra et al., 2013) clients' level of emotional processing increases throughout the session and is at its lowest at the beginning of the session before the working phase. In view of this, the segments selected for the baseline were the most suitable in terms of emotionally neutral speech. After extracting the client's voice, a minimum of 0:40 min and a maximum of 01:16 min remained for the baseline, totaling 8:45 min for all 12 sessions.
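The idea of comparing each state against a session's emotionally neutral baseline can be illustrated with a minimal Python sketch. It shows one plausible reading of "controlled for baseline", namely subtracting the baseline mean from each state's mean, and is not necessarily the computation used in this study. The function name and all frame values are hypothetical.

```python
from statistics import mean


def baseline_corrected_means(baseline, states):
    """Return each state's mean minus the session's baseline mean.

    baseline: frame values (e.g., F0 in Hz) from the neutral opening segment.
    states:   dict mapping a state label to its list of frame values.
    """
    base = mean(baseline)
    return {label: mean(values) - base for label, values in states.items()}


# Hypothetical frame values for one session (assumed, not data from the study).
baseline_f0 = [178.0, 181.0, 180.0, 179.0]
state_f0 = {
    "self-criticism": [185.0, 190.0, 187.0],
    "self-compassion": [176.0, 174.0, 178.0],
}

corrected = baseline_corrected_means(baseline_f0, state_f0)
print(corrected)
```

A positive value means the state lies above the client's own neutral level and a negative value means it lies below, which keeps between-client differences in habitual pitch or loudness from dominating the comparison.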

Measurement instrument
Praat is among the most commonly used software packages for computerized acoustic analysis (Boersma and Weenink, 2018). Developed by Boersma and Weenink, Praat is a free program for the analysis and reconstruction of acoustic speech signals and is available for all major computer platforms (Boersma, 2013b). It offers a wide range of procedures (Boersma and Weenink, 2018) relevant to speech emotion, including fundamental frequency (pitch), speech rate, pauses, voice intensity, jitter (pitch perturbations) and shimmer (loudness perturbations). As mentioned above, the vocal expression of emotion is characterized by several acoustic features (Bachorowski and Owren, 1995; Banse and Scherer, 1996). For the purpose of this study we decided to examine the most commonly used prosodic parameters that effectively characterize emotion in running speech and can be directly measured by the software (Boersma, 2013a). We decided to set the time step strategy to automatic, as recommended by Praat. That way Praat computes just enough pitch and intensity values to draw reliable pitch and intensity contours (Boersma and Weenink, 2003).

Fundamental frequency (F0)
The human voice produces sounds through vocal fold vibration and resonance. This vibration produces the sound wave of the voice (Chen, 2016). The rate of the vocal fold vibrations determines the fundamental frequency (F0) of the voice and is measured in hertz (Hz). The fundamental frequency is perceived by the listener as pitch. The higher the frequency of the vocal fold vibrations, the higher the pitch (Chen, 2016; Dasgupta, 2017). As reviewed by Razak et al. (2003) the pitch contour of a speaker is the most valuable indicator of the person's emotional state. Women's vocal folds, for instance, vibrate faster than men's. That is why female voices have a higher mean pitch than male voices (Murray and Arnott, 1993). According to the literature the average pitch range for women is 100-300 Hz (Traunmüller and Eriksson, 1994; Rochman, 2008). In their literature review on human vocal emotion, Murray and Arnott (1993) state that shouting displays an increase not only in intensity but also in pitch, while tiredness and sadness are characterized by a lower pitch compared to neutral speech. Based on an acoustic analysis of emotions in speech (Yildirim et al., 2004) the pitch range for sadness is between 66 Hz and 195 Hz and the average pitch of neutral emotion is between 49 Hz and 188 Hz. Furthermore, Yildirim et al. (2004) report that aroused emotions such as anger and happiness are linked to even higher pitch values than are sadness and emotionally neutral speech. The pitch range for anger is around 140-400 Hz (Razak et al., 2003; Kapoor and Sagar Verma, 2019) and the average pitch for happiness is over 176 Hz (Razak et al., 2003; Yildirim et al., 2004). Self-compassion is often related to sadness and calmness. These emotions are often characterized by a lower pitch, whereas self-criticism is defined as a form of rejecting anger, which is associated with higher pitch values. Given that all our clients were female and based on previous studies (Rochman, 2008), the pitch range in Praat was set at 100-300 Hz.
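To illustrate how a pitch range such as 100-300 Hz constrains the analysis, the following Python sketch estimates F0 by searching for the strongest autocorrelation peak only among periods that fall inside the range. It is a deliberately simplified illustration: Praat's own pitch algorithm is considerably more refined, and the function name, sample rate and synthetic tone are assumptions made for the example.

```python
import math


def estimate_f0(samples, sample_rate, floor_hz=100.0, ceiling_hz=300.0):
    """Estimate F0 from the autocorrelation peak inside the pitch range."""
    # Only lags (candidate periods, in samples) within the range are searched.
    min_lag = math.ceil(sample_rate / ceiling_hz)
    max_lag = math.floor(sample_rate / floor_hz)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic 200 Hz tone standing in for a voiced frame (assumed, not study data).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]
f0 = estimate_f0(tone, SAMPLE_RATE)
print(f0)
```

Because only periods inside the range are considered, a sound whose true frequency lies outside 100-300 Hz can never be reported at that frequency. It is either missed or mapped onto an in-range value, which is one reason the choice of range matters for noisy recordings.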

Intensity (amplitude)
The intensity of a sound is perceived as its loudness. The amplitude of the vibrations (i.e., the size of the oscillations of the vocal folds) affects the loudness. The loudness depends on the level of air pressure in the lungs. The higher the intensity, the louder the voice. Sound intensity is measured in decibels (dB). The standard intensity setting in Praat is from 50 dB to 100 dB. Compared with emotionally neutral speech, sadness is associated with a lower intensity (between 60 dB and 68 dB) and anger with a higher intensity (max. 85 dB) (Kapoor and Sagar Verma, 2019).
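As a rough illustration of how a decibel value relates to signal amplitude, the sketch below converts the root-mean-square (RMS) amplitude of a frame into dB relative to the conventional reference pressure of 2 × 10⁻⁵ Pa. It assumes that the samples are calibrated sound pressures in pascals, which is generally not the case for audio taken from video recordings, and the function name and values are hypothetical.

```python
import math

# Conventional reference sound pressure (threshold of hearing), in pascals.
REFERENCE_PRESSURE_PA = 2e-5


def intensity_db(samples_pa):
    """Return the intensity in dB of a frame of sound pressure samples."""
    rms = math.sqrt(sum(s * s for s in samples_pa) / len(samples_pa))
    return 20 * math.log10(rms / REFERENCE_PRESSURE_PA)


# Hypothetical constant-amplitude frames (assumed values, not study data).
quiet = [0.02] * 100
loud = [0.04] * 100
print(intensity_db(quiet), intensity_db(loud))
```

Because the scale is logarithmic, doubling the amplitude adds about 6 dB regardless of the starting level. With uncalibrated recordings the absolute values depend on the microphone and gain settings, so differences between states within a recording are more interpretable than the absolute numbers.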

Data analysis
For statistical purposes, we used R version 4.0.2 (R Core Team, 2019). Our measures were repeated within individuals. We therefore used the package "lme4" (Bates et al., 2015) to fit a multilevel model (8 respondents in total, and 3 states for 2 vocal expressions for each respondent). The two vocal expressions were pitch and intensity, each in 3 states: self-criticism, self-compassion and self-protection. We decided to fit two separate multilevel models because pitch was measured in Hz and intensity was measured in dB. The parameters of the first multilevel model were ID (variability among respondents) and Pitch (variability among the pitch states mentioned above). For the second multilevel model the parameters were ID (variability among respondents) and Intensity (variability among intensity states). We treated these as random effects and used a logistic multilevel regression model. In this study we report the conditional R² measure (the overall effect size) and the random effects variances (ID, Pitch and ID, Intensity).

Results
The baseline mean was 179.57 Hz for Pitch and 61.24 dB for Intensity. The number of observations was 115,097 for Pitch and 274,171 for Intensity. The ID variance for the first multilevel model was 3.061e+14, and the Pitch variance was 2.146e+13, which means that respondent variance was far larger than Pitch variance (individual differences between respondents were larger than differences among pitch states). R² for the first model was 0.15. The ID variance for the second multilevel model was 2.080e+13 and the Intensity variance was 9.567e+12. R² for the second model was 0.10. All Intensity states had lower means than the baseline mean. Intensity for self-criticism and self-protection was significantly higher than for self-compassion. See Figures 1, 2 for details of the vocal expressions (pitch and intensity).
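The relative size of the reported variance components can be checked with simple arithmetic. The short sketch below divides the respondent (ID) variance by the state variance for each model, using only the values reported above. The helper name is hypothetical, and the ratio is a descriptive aid rather than a statistic reported by the model.

```python
def variance_ratio(id_variance, state_variance):
    """Return how many times larger the respondent variance is."""
    return id_variance / state_variance


# Variance components as reported for the two multilevel models.
pitch_ratio = variance_ratio(3.061e14, 2.146e13)
intensity_ratio = variance_ratio(2.080e13, 9.567e12)
print(round(pitch_ratio, 2), round(intensity_ratio, 2))
```

On these figures the respondent variance is roughly 14 times the state variance for pitch but only about twice the state variance for intensity, which is consistent with state differences being detectable for intensity and not for pitch.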

Discussion
The aim of this study was to examine the pitch and intensity features of self-compassion, self-criticism and self-protection in real EFT therapy sessions. The first hypothesis was that clients expressing self-criticism would show an increase in pitch and intensity in comparison to self-compassion. Our results confirmed higher values for intensity of self-criticism compared to self-compassion, but not for higher pitch. Within the literature self-criticism is acknowledged (Kamiloğlu et al., 2020). Thus, the values are in line with previous research identifying higher intensity values for anger (Banse and Scherer, 1996), and not for compassion (Kamiloğlu et al., 2020), which is a low aroused emotion. As a high aroused emotion, anger is characterized by high levels of intensity (Banse and Scherer, 1996; Sobin and Alpert, 1999; Rochman et al., 2008; Kapoor and Sagar Verma, 2019). This is in agreement with previous findings characterizing physiological arousal with higher levels of pitch and intensity (Banse and Scherer, 1996). Thus, when clients start criticizing themselves, they become emotionally activated and their voices get louder. It is probable that, had there been more participants in the analysis, a significant difference would also have been found for the pitch, not just the intensity, of self-criticism. However, this outcome is not in line with our prediction about pitch. Self-criticism shows lower intensity values than the baseline (emotionally neutral speech). Overall, the results indicate lower values for all states in comparison to the baseline. One reason for this could be the technical environment. Our voice samples were extracted from videos that did not meet laboratory conditions. As a result, the microphone setting could not be adjusted to obtain a suitable speech recording, which, had that been possible, might have led to different values. However, though scholars agree that activated emotions such as reactive anger have higher intensity levels (e.g., Banse and Scherer, 1996; Sobin and Alpert, 1999), clients do not appear to express their anger as loudly as actors do when asked to elicit emotions (e.g., Banse and Scherer, 1996). According to Sobin and Alpert (1999) emotions expressed by actors do not fully capture the complexity of emotions. As a consequence, it is unclear to what extent actors are able to represent emotions as they would naturally be expressed by human beings. In this study, our results demonstrate that the clients did not become more activated but used emotionally neutral speech in the therapy session. Nevertheless, there were differences between the states. In line with our second prediction, self-compassion had lower intensity values than both self-protection and self-criticism, but this did not apply to pitch. Self-protection had higher intensity values than self-compassion and lower intensity values than self-criticism. However, only the difference between self-compassion and self-protection is significant. This outcome fits the definition that self-protection is a form of healthy, protective anger (Pascual-Leone and Paivio, 2013; Pascual-Leone, 2017), characterized by firmness and strength (Timulak, 2015). So far there has been no study on the vocal expression of self-protection. Our results suggest that when clients feel powerful their voices grow stronger and the intensity rises. The results on the pitch values are not in line with our assumptions. However, the differences are not statistically significant, which precludes confident interpretation. From our findings we can conclude that emotionally neutral speech has lower pitch values than self-compassion, self-protection and self-criticism. The findings for self-protection and self-criticism are in accordance with previous research showing higher pitch values for higher aroused emotions such as anger (Razak et al., 2003; Yildirim et al., 2004). Self-compassion is a low aroused emotion and earlier studies report lower values for it than for emotionally neutral speech (Razak et al., 2003; Yildirim et al., 2004). But we cannot validate this in our paper. Pitch values can differ between individuals for various reasons. All our clients were female and were speaking English in the sessions, but pitch values may also depend on, for instance, sexual orientation (Waksler, 2001), first language (Andreeva et al., 2014) or cultural background (Scherer et al., 2011). As mentioned above (see Procedure), part of the problem in demonstrating significant differences between self-compassion, self-protection and self-criticism is the challenge of measuring pitch within continuous speech in natural settings.
Another important factor that may have weakened our results is the low number of voice samples analyzed. Moreover, the clients in the selected audios represent different kinds of problematic behaviors. As Rochman et al. (2008) point out, different psychological disorders, such as depression or anxiety, can influence the way emotions are vocally expressed. It is worth mentioning that, so far, the majority of the research has focused on intensity and pitch values in acted laboratory settings. This paper is the first to examine these features in spontaneous, continuous speech in an EFT therapy session.

Limitations
Our research limitations are as follows. First, our audio material was not of a high recording quality. Therefore, we did not have the required voice quality for the Praat feature analysis, although the background noises were eliminated. Second, the segments for self-compassion, self-protection and self-criticism were trimmed from different points in the therapy. According to Nardone et al. (2021), there is a significant connection between the temporal sequence of expressed emotional arousal, the vocalization of unmet needs and treatment outcome. Similarly, Rochman et al. (2008) demonstrated that the sequence in which emotions are expressed has an impact on the acoustic articulation of the emotion. As we extracted the analyzed sequences from throughout the sessions, we could not determine how timing might have influenced the pitch and intensity values. Furthermore, as not all clients expressed self-compassion, self-protection and self-criticism, we had a different number of clients for each state. Thus, our material was not consistent. In addition, two clients were analyzed twice, as they appeared in multiple therapy sessions. Moreover, the length of the audios varied owing to the natural setting, so we had no control over how long the clients expressed each state in the session. Also, although our aim was to examine pitch and intensity features in real therapy sessions, the sessions were studio recorded, so we could not ensure that the setting was completely natural. Lastly, it is important to mention the low number of clients' voice extracts and the fact that all the clients were female and spoke English. Our findings therefore tell us little about the pitch and intensity of these states in men or in other languages.

Further research
Future research should be conducted in more appropriate technical conditions. The microphone needs to be set up so that it consistently captures the voice, and a quiet setting should be used. To ensure that emotionally neutral speech is used as the baseline, we suggest extracting clients' voices from a natural conversation before the therapy session. We also recommend analyzing the same number of voice extracts for each state and, insofar as is possible in natural settings, making sure that all three states are expressed by each client. These conditions are, of course, challenging to achieve in real therapy sessions. A larger number of clients' voice samples would also yield a higher number of values, which could lead to a more differentiated outcome. Future studies could investigate additional acoustic features such as speech rate, harmonicity and perturbations, and could examine dynamic changes in the features throughout the therapy session. Finally, researchers in this field could examine gender, culture and language differences in self-compassion, self-protection and self-criticism in real EFT therapy sessions.

Conclusion
Our research analyzing the pitch and intensity of self-compassion, self-protection and self-criticism in EFT therapy videos is a first step toward investigating acoustic features of clients in real psychotherapy sessions. While earlier research has focused on the analysis of vowels or sustained vocals, we examined the pitch and intensity of self-compassion, self-protection and self-criticism in running speech using Praat computer software. Our results demonstrate that whereas self-criticism had the highest intensity values, self-compassion had the lowest among the three states. Our findings on pitch were not significant.
to be a form of reactive anger (Pascual-Leone et al., 2013) and self-compassion a positive, low aroused emotion

FIGURE 2. Intensity during self-criticism, self-compassion and self-protection.