Mismatch Negativity to Threatening Voices Associated with Positive Symptoms in Schizophrenia

Although there is a general consensus that emotional perception is impaired in patients with schizophrenia, the extent to which the neural processing of emotional voices is altered in schizophrenia remains to be determined. This study enrolled 30 patients with chronic schizophrenia and 30 controls and measured their mismatch negativity (MMN), a component of auditory event-related potentials (ERPs). In a passive oddball paradigm, happily or angrily spoken deviant syllables (dada) were randomly presented within a train of emotionally neutral standard syllables. MMN in response to angry syllables and angry-derived non-vocal sounds was significantly decreased in individuals with schizophrenia. P3a to angry syllables showed larger amplitudes but longer latencies. Weaker MMN amplitudes were associated with more severe positive symptoms. Receiver operating characteristic analysis revealed that angry MMN, angry-derived MMN, and angry P3a could help predict whether someone had received a clinical diagnosis of schizophrenia. The findings suggest general impairments of voice perception and acoustic discrimination in patients with chronic schizophrenia. The emotional salience of voices was processed atypically at the preattentive level, and this atypical processing was associated with positive symptoms in schizophrenia.


INTRODUCTION
Schizophrenia, a chronic and disabling brain disorder, has three categories of symptoms: positive, negative, and cognitive. Hearing voices is the most common type of hallucination associated with positive symptoms. Deficits in the ability to recognize emotions from vocal expressions are treatment resistant and associated with poor outcomes (Bach et al., 2009; Leitman et al., 2010, 2011). To advance our understanding of the relationship between the symptoms of schizophrenia and the perception of emotional voices, this study used a neurophysiological approach to clarify whether emotional voice processing is impaired per se and, further, whether any impairment is associated with sensory dysfunction or attention abnormalities. The extent to which basic auditory processing contributes to impaired voice perception in schizophrenia is unclear. Some studies reported that deficits of emotional prosodic identification in individuals with schizophrenia reflect, at least in part, a relative inability to process the acoustic characteristics of prosodic stimuli (Leitman et al., 2005, 2010, 2011). These authors have argued that schizophrenia is associated with structural and functional disturbances of the primary auditory cortex (Leitman et al., 2007). However, other studies found that individuals with schizophrenia had more difficulty than controls with emotional prosody comprehension, but were equally proficient at stress prosody comprehension (Murphy and Cutting, 1990). Their performance was worse at identifying high-clarity emotional prosodic stimuli, but not at identifying low-clarity stimuli (Bach et al., 2009). Relative to healthy controls, individuals with schizophrenia showed comparable performance in discriminating terminal pitch changes, but more difficulty with internal pitch discrimination (Matsumoto et al., 2006).
Mismatch negativity (MMN) and P3a are event-related potentials (ERPs) that can be elicited by a passive oddball paradigm. MMN and P3a have been used as neurophysiological biomarkers in schizophrenia research (Javitt et al., 2008; Javitt and Sweet, 2015). MMN reflects a preattentive stage of auditory information processing. To elicit MMN, oddball stimuli may differ from standards in a number of physical dimensions, including sensory modality, frequency, duration, or intensity (Näätänen et al., 2007). The primary generators of MMN are located in the primary auditory cortex (Alho, 1995; Maess et al., 2007). A meta-analysis suggested that deficits in MMN generation are a robust feature of chronic schizophrenia, indicating abnormalities in automatic, context-dependent auditory information processing in these patients (Umbricht and Krljes, 2005). MMN reduction was associated with global impairments in everyday functioning in patients with schizophrenia (Light and Braff, 2005). MMN appeared to be reduced even at illness onset (Salisbury et al., 2002, 2007; Umbricht et al., 2006; Jahshan et al., 2012). P3a, in turn, is an ERP index of an involuntary attention switch (Escera et al., 2000). Auditory P3a was the earliest ERP abnormality to be studied in schizophrenia (Roth and Cannon, 1972). P3a was reduced in patients with chronic schizophrenia (Mathalon et al., 2000; Jeon and Polich, 2003). P3a might serve as a trait marker of genetic risk for schizophrenia (Winterer, 2000; Hall et al., 2006).
Only recently have emotional MMN and P3a been used to assess, respectively, the automaticity of emotional salience processing of voices and the involuntary attention it captures (Schirmer et al., 2005). The unexpected presence of emotionally spoken syllables embedded in a passive oddball paradigm can trigger emotional MMN and P3a. In particular, the emotional mismatch response, an infant analog of the adult emotional MMN, was identified in newborns, reflecting the emergence of emotional arousal during the first days of life (Cheng et al., 2012). Females exhibited stronger emotional MMN and P3a than males, suggesting sex hormone-mediated processing of emotional voices (Hung and Cheng, 2014). Testosterone administration could alter emotional MMN and P3a, lending support to the involvement of the amygdala in the generator sources (Chen et al., 2015). These findings support the notion that emotional MMN and P3a can probe emotional voice processing. In the same vein, emotional MMN and P3a were reduced in individuals with autism spectrum conditions, and lower angry MMN amplitudes were associated with higher levels of autistic traits (Fan and Cheng, 2014). However, to the best of our knowledge, emotional MMN and P3a have not been examined in individuals with schizophrenia.
To understand the extent to which basic auditory processing contributes to impaired emotional salience processing of voices, we presented emotionally spoken meaningless syllables (dada) and acoustically matched non-vocal sounds in a passive oddball paradigm to individuals with chronic schizophrenia and matched controls. It is worth mentioning that disrupted amygdala activity might lead to abnormal assignment of salience to ambiguous, potentially threatening stimuli, such as angry voices, in patients with schizophrenia, particularly in those with positive symptoms (Holt and Phillips, 2009). One neuroimaging study demonstrated that the amygdala was activated by angry syllables presented in a passive oddball paradigm (Schirmer et al., 2008). We thus hypothesized that, if general deficits in auditory processing existed, patients with schizophrenia would exhibit altered MMN and P3a responses to angry and happy syllables and to the corresponding non-vocal sounds. If the deficit were selective for threatening voices, individuals with schizophrenia would differ from controls in their MMN and P3a to angry syllables. In addition, to explore the relationship between neurophysiological responses and symptom severity, we conducted correlation analyses to test the extent to which emotional MMN and P3a covaried with scores on the Positive and Negative Syndrome Scale (PANSS).

Subjects
Thirty patients with schizophrenia and thirty controls were enrolled. Individuals with schizophrenia were recruited from a local hospital. Using the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision, Patient Edition, psychiatrists reconfirmed that the illness was in a non-acute and stable phase. All subjects were ethnic Chinese. The age- and handedness-matched controls were recruited from the local community and screened for major psychiatric illnesses using the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I). Table 1 lists demographic and clinical variables. Subjects with comorbid psychiatric or neurological disorders (e.g., dementia or seizures), a history of head injury, or alcohol or substance abuse or dependence were excluded. All participants exhibited normal peripheral hearing bilaterally (pure-tone average thresholds < 15 dB HL) at the time of testing. For the handedness, medications, and doses of each patient, please refer to Table 2. All subjects provided written informed consent and assent for the study, which was approved by the local ethics committee (Yang-Ming University Hospital) and conducted in accordance with the Declaration of Helsinki.

Auditory Stimuli
The stimuli fell into two categories: emotional syllables and acoustically matched non-vocal sounds. For the emotional syllables, a young female speaker from a performing arts school produced the meaningless syllable dada with three emotional prosodies (angry, happy, and neutral). For each prosody, the speaker produced the syllable more than ten times. The emotional syllables were edited to be equally long (550 ms) and equally loud (min: 57 dB; max: 62 dB; mean: 59 dB) using Cool Edit Pro 2.0 and Sound Forge 9.0. Each syllable set was rated for emotionality on a 5-point Likert scale (see Cheng et al., 2012; Fan et al., 2013; Fan and Cheng, 2014 for validation). Two emotional syllables that were consistently identified as 'extremely angry' and 'extremely happy', and one neutral syllable rated as the most emotionless, were selected as stimuli. The ratings on the Likert scale (mean ± SD) were 4.26 ± 0.85, 4.04 ± 0.91, and 2.47 ± 0.87 for the angry, happy, and neutral syllables, respectively.
To create a set of control stimuli that retained acoustic correspondence with the emotional syllables, we synthesized non-vocal sounds using Praat (Boersma, 2001) and MATLAB (The MathWorks, Inc., USA). The center of gravity of the frequency spectrum ($f_n$) of each original syllable was defined as $f_n = \int f\,|X(f)|^2\,df \big/ \int |X(f)|^2\,df$, where $X(f)$ was the Fourier spectrum of the emotional syllable. The $f_n$ of the angry, happy, and neutral syllables was 1249 Hz, 1159 Hz, and 1156 Hz, respectively. We then produced the non-vocal sounds by multiplying a sine wave at $f_n$ by two Hamming windows, each temporally centered on one of the two syllables (da, da): $s(t) = \sin(2\pi f_n t) \times w(t)$, where $w(t)$ is the Hamming-window envelope. This approach has been used to synthesize non-vocal sounds that control for the temporal envelope and core spectral element of emotional syllables (Fan et al., 2013; Chen et al., 2014; Hung and Cheng, 2014). The time course and frequency spectrum of the emotional syllables and the corresponding non-vocal sounds are illustrated in Figure 1. In addition, the non-vocal sounds had emotionality ratings on the Likert scale (2.47 ± 0.87) comparable to those of the neutral syllables (P > 0.1), as well as below-chance hits on an emotional categorization task (Chen et al., 2014), indicating the emotional neutrality of the acoustic controls.
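The centroid and synthesis steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Praat/MATLAB pipeline: it assumes power (|X(f)|²) weighting for the centroid, uses a naive DFT to stay dependency-free, and places the two Hamming windows at hypothetical times (`centers`, `width`) rather than at measured syllable boundaries.

```python
import cmath
import math


def spectral_centroid(x, fs):
    """Spectral centre of gravity f_n (Hz) of a real signal x sampled at fs Hz.

    Assumes power weighting: f_n = sum(f * |X(f)|^2) / sum(|X(f)|^2).
    A naive one-sided DFT keeps the sketch dependency-free; an FFT would be
    used in practice for 550-ms recordings.
    """
    n = len(x)
    num = den = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins
        X_k = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        power = abs(X_k) ** 2
        freq = k * fs / n
        num += freq * power
        den += power
    return num / den


def hamming(length):
    """Symmetric Hamming window: 0.54 - 0.46 cos(2*pi*i / (length - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def synth_nonvocal(fn, fs, duration=0.55, centers=(0.14, 0.41), width=0.25):
    """Sine at fn (Hz) multiplied by two Hamming windows, one per 'da'.

    centers and width (seconds) are HYPOTHETICAL placeholders; real windows
    would be aligned to each syllable of the recorded 'dada'.
    """
    n = round(duration * fs)
    win_len = round(width * fs)
    win = hamming(win_len)
    envelope = [0.0] * n
    for c in centers:
        start = round(c * fs) - win_len // 2
        for i, w in enumerate(win):
            if 0 <= start + i < n:
                envelope[start + i] += w
    return [math.sin(2 * math.pi * fn * t / fs) * envelope[t] for t in range(n)]
```

For example, `synth_nonvocal(1249, 44100)` would give an angry-derived control sound at the reported centroid, assuming a 44.1-kHz sampling rate (an assumption; the rate is not stated in the text).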

EEG Apparatus, Procedures, Recording, and Data Analysis
Before EEG recording, psychiatrists administered the PANSS (Kay et al., 1987; Phillips et al., 1991) to evaluate the symptom severity of schizophrenia. EEG was recorded in a sound-attenuated and electrically shielded room. During recording, participants watched a silent movie with subtitles while task-irrelevant stimuli were presented in oddball sequences. Instead of presenting physically identical stimuli as both standards and deviants (Schirmer et al., 2007), we followed the same approach as previous work to control the mismatch paradigm (Čeponienė et al., 2003; Chen et al., 2014). The passive oddball paradigm for emotional syllables employed happy and angry syllables as deviants (D1, D2) and neutral syllables as standards (S). The corresponding non-vocal sounds were presented in the same oddball paradigm but in separate blocks, so that the relative acoustic features among S, D1, and D2 were controlled across blocks. Each stimulus category (emotional syllables vs. non-vocal sounds) comprised two blocks, whose order was counterbalanced and randomized across participants. Each block consisted of 600 trials, of which 80% were neutral syllables or neutral-derived sounds, 10% were angry syllables or angry-derived sounds, and 10% were happy syllables or happy-derived sounds. The sequences of blocks and stimuli were quasi-randomized to avoid successive blocks, and successive deviants, of the same stimulus category. A minimum of two standards was always presented between any two deviants. The stimulus-onset asynchrony was 1200 ms, comprising a stimulus length of 550 ms and an interstimulus interval of 650 ms.
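The block structure described above can be sketched as a sequence generator. The function `make_oddball_block` and its parameters are hypothetical; the sketch enforces the 80/10/10 proportions and the minimum of two standards between deviants, but it does not implement the additional rule against successive deviants or blocks of the same category.

```python
import random


def make_oddball_block(n_trials=600, p_deviant=0.10, min_std_between=2, seed=None):
    """Return 'S', 'D1', 'D2' labels for one hypothetical passive-oddball block.

    80% standards (S) and 10% of each deviant type (D1, D2), with at least
    `min_std_between` standards between any two deviants.
    """
    rng = random.Random(seed)
    n_each = round(n_trials * p_deviant)   # 60 per deviant type
    n_std = n_trials - 2 * n_each          # 480 standards
    deviants = ["D1"] * n_each + ["D2"] * n_each
    rng.shuffle(deviants)                  # random deviant order

    # gaps[0] precedes the first deviant and gaps[-1] follows the last one;
    # every internal gap starts at the mandatory minimum number of standards.
    gaps = [0] + [min_std_between] * (len(deviants) - 1) + [0]
    spare = n_std - sum(gaps)
    if spare < 0:
        raise ValueError("too few standards to honour the spacing rule")
    for _ in range(spare):                 # scatter the remaining standards
        gaps[rng.randrange(len(gaps))] += 1

    seq = ["S"] * gaps[0]
    for dev, gap in zip(deviants, gaps[1:]):
        seq.append(dev)
        seq.extend(["S"] * gap)
    return seq


block = make_oddball_block(seed=1)
SOA_MS = 550 + 650  # stimulus length + interstimulus interval
print(len(block) * SOA_MS / 1000)  # 720.0 seconds per block
```

With a 1200-ms SOA, each 600-trial block lasts 720 s (12 min) of stimulation.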
MMN and P3a amplitudes were measured as the mean amplitude within a 50-ms window surrounding the peak at selected electrode sites. Based on prior literature (Näätänen et al., 2007, 2011), the MMN peak was defined as the largest negativity in the difference wave (deviant ERP minus standard ERP) between 150 and 350 ms after stimulus onset. Only the standards preceding the deviants were included in the analysis. The P3a peak was defined as the largest positivity between 250 and 450 ms. Three-way mixed ANOVAs were performed separately on MMN and P3a for each stimulus category (emotional syllables or non-vocal sounds), with deviant type (happy or angry) and electrode site (F3, Fz, F4, C3, Cz, or C4) as within-subject factors and group (schizophrenia or control) as the between-subject factor. The dependent variables were the mean amplitudes and peak latencies of the MMN and P3a components at the selected electrode sites. Degrees of freedom were corrected using the Greenhouse-Geisser method when sphericity was violated. Bonferroni-corrected t-tests were conducted only when preceded by significant main effects. Spearman's correlation analysis was conducted between emotional MMN or P3a and the PANSS subscales.
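The peak-based amplitude measure can be sketched as follows for a single electrode. This is a hedged illustration with hypothetical helper names; it assumes that "standards before the deviants" means the standard immediately preceding any deviant, that P3a is also measured on the difference wave, and that the 50-ms window spans ±25 ms around the peak.

```python
def difference_wave(epochs, labels, deviant, standard="S"):
    """Deviant ERP minus the ERP of standards that immediately precede a deviant.

    epochs: per-trial voltage traces (equal length, one channel).
    labels: stimulus label of each trial, aligned with epochs.
    """
    def mean_trace(indices):
        return [sum(epochs[i][t] for i in indices) / len(indices)
                for t in range(len(epochs[0]))]

    dev_idx = [i for i, lab in enumerate(labels) if lab == deviant]
    # Assumption: "standards before the deviants" = the standard just before any deviant.
    pre_std_idx = [i - 1 for i, lab in enumerate(labels)
                   if lab != standard and i > 0 and labels[i - 1] == standard]
    dev_erp, std_erp = mean_trace(dev_idx), mean_trace(pre_std_idx)
    return [d - s for d, s in zip(dev_erp, std_erp)]


def peak_window_mean(wave, times_ms, search_ms, polarity, half_width_ms=25):
    """Peak latency and mean amplitude in a ~50-ms window around the peak.

    polarity = -1 finds the most negative point (MMN, 150-350 ms);
    polarity = +1 finds the most positive point (P3a, 250-450 ms).
    """
    lo, hi = search_ms
    candidates = [i for i, t in enumerate(times_ms) if lo <= t <= hi]
    peak = max(candidates, key=lambda i: polarity * wave[i])
    peak_t = times_ms[peak]
    window = [wave[i] for i, t in enumerate(times_ms)
              if abs(t - peak_t) <= half_width_ms]
    return peak_t, sum(window) / len(window)
```

A typical call would be `peak_window_mean(diff, times, (150, 350), -1)` for MMN and `peak_window_mean(diff, times, (250, 450), +1)` for P3a, repeated per electrode (F3, Fz, F4, C3, Cz, C4) before entering the ANOVA.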
Mismatch negativity peak latencies to emotional syllables and non-vocal sounds did not reveal any effect involving the group factor, indicating that individuals with schizophrenia did not differ from controls in terms of the speed of preattentive processing.

Correlation between Emotional MMN or P3a and Symptom Severity
There were significant correlations between angry MMN amplitudes and positive symptoms (Table 5). The Holm-Bonferroni step-down procedure was used to control the family-wise error rate (FWER, p < 0.05) for multiple comparisons. Spearman's correlation analyses on the PANSS subscales indicated that angry MMN amplitudes at C3 were negatively correlated with positive symptoms (ρ = −0.52, p = 0.003) (Figure 3). No such correlation was observed for the negative symptom or general psychopathology scores. P3a was not correlated with the PANSS. In addition, neither MMN nor P3a correlated with age.
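A minimal, dependency-free sketch of the two statistical steps above: Spearman's ρ (computed as the Pearson correlation of tie-averaged ranks) and Holm's step-down rule. Computing the p-values themselves is left to a statistics library (for example `scipy.stats.spearmanr`); the helper names here are illustrative only.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: which hypotheses are rejected at FWER alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - step):  # alpha/m, alpha/(m-1), ...
            reject[idx] = True
        else:
            break  # stop at the first non-significant test
    return reject
```

For instance, with one p-value per PANSS subscale (positive, negative, general psychopathology), the smallest p-value is compared against 0.05/3, the next against 0.05/2, and so on.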

Relationship between Sensitivity and Specificity for Angry MMN
Receiver operating characteristic (ROC) analyses were conducted to measure the ability of emotional and non-vocal MMN amplitudes to differentiate between individuals with schizophrenia and controls (Figure 4). The area under the ROC curve (AUC) indicates the overall accuracy of the measure: it represents the probability that a randomly selected true-positive individual scores higher on the measure than a randomly selected true-negative individual, with 0.5 representing chance level. ROC analysis for angry MMN yielded an AUC of 0.65 (p = 0.049) over frontal electrodes. The optimal cut-off point for angry MMN was −1.89 µV, with a sensitivity of 70% and a specificity of 70%.
The AUC values for angry-derived MMN and angry P3a were 0.70 (p = 0.007) and 0.66 (p = 0.037), respectively. This indicates that angry and angry-derived MMN, as well as angry P3a, could help predict whether someone had received a clinical diagnosis of schizophrenia.
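The AUC interpretation given above (the probability that a randomly chosen patient outscores a randomly chosen control) corresponds to the Mann-Whitney statistic, sketched below. The cut-off criterion is an assumption: the text does not state how −1.89 µV was chosen, so Youden's J (sensitivity + specificity − 1) is used here only as one common option, and the function names are illustrative.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC = P(random positive scores higher than random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def youden_cutoff(pos_scores, neg_scores):
    """Threshold maximising Youden's J = sensitivity + specificity - 1.

    A case is called 'positive' when its score >= threshold.
    Returns (threshold, sensitivity, specificity).
    """
    best = None
    for thr in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s >= thr for s in pos_scores) / len(pos_scores)
        spec = sum(s < thr for s in neg_scores) / len(neg_scores)
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (thr, sens, spec)
    return best
```

Because MMN is a negative deflection and patients showed weaker (less negative) angry MMN, raw MMN amplitudes can be passed with patients as the positive class; for a measure on which patients score lower, the scores would be negated first.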

DISCUSSION
This study aimed to clarify the extent to which basic auditory processing contributes to impaired emotional prosodic detection in schizophrenia. The results indicated that abnormal assignment of salience to threatening voices was associated with positive symptoms in schizophrenia. MMN, indexing preattentive detection of the emotional salience of voices, was significantly reduced for angry syllables and angry-derived non-vocal sounds in schizophrenia. P3a, an index of selective attention control, showed larger amplitudes but longer latencies to angry syllables in schizophrenia. Weaker MMN amplitudes were associated with more severe positive symptoms on the PANSS. ROC analyses suggested that angry MMN and P3a could help predict whether someone had received a clinical diagnosis of schizophrenia.
Mismatch negativity amplitudes were decreased for angry syllables and angry-derived non-vocal sounds in chronic schizophrenia. This finding might support the proposal that basic auditory processing abnormalities contribute to affective prosody dysfunction in schizophrenia (Leitman et al., 2005, 2007, 2010, 2011). Similarly, affective prosody recognition was significantly associated with MMN amplitudes elicited by infrequent high-pitched tones in an oddball paradigm (Jahshan et al., 2013). The emotion-derived non-vocal sounds in this study may partially resemble the frequency (pitch) changes of pure tones. Studies of patients with schizophrenia have reported decreased MMN in response to pitch deviants (e.g., Javitt et al., 1993, 1998; Catts et al., 1995). As indicated by reduced MMN in response to angry syllables and angry-derived non-vocal sounds, this study demonstrates that people with chronic schizophrenia process emotional voices in an atypical fashion at the preattentive level. Emotional voice processing abnormalities might be partially driven by impaired processing of low-level acoustic parameters.
Angry P3a, an index of selective attention control, had longer latencies in patients with schizophrenia than in controls. Despite a general consensus that P3a indexes attention switching to novel stimuli associated with psychopathology, findings on increases or decreases of P3a amplitude in psychotic patients are mixed (Javitt et al., 2008). Some reports have stated that patients at risk for schizophrenia exhibit weaker P3a (Mathalon et al., 2000; Jeon and Polich, 2003), whereas another found that stronger P3a was associated with an increased risk (Winterer, 2000). Atypical P3a might reflect pathological distractibility in chronic psychiatric patients (Escera et al., 2000). Emotional voices usually attract involuntary attention (Grandjean et al., 2008).
Disturbed reciprocal fronto-limbic pathways might impair prefrontal dominance over the hyperactive limbic system, resulting in failure to inhibit irrelevant information (Weinberger, 1987). Patients with schizophrenia who experience auditory hallucinations find it more difficult to control their selective attention, particularly in the presence of emotional distractors (Alba-Ferrara et al., 2013). In this study, the P3a differentiation between angry and happy syllables, together with the absence of P3a differentiation between angry-derived and happy-derived non-vocal sounds, among patients with schizophrenia could be ascribed to an imbalance of involuntary attention switching between emotional voices and acoustic attributes. Consistent with P3 being a quantitative phenotype (Winterer et al., 2003), ROC analyses indicated that angry P3a could help predict whether someone had received a clinical diagnosis of schizophrenia.
The association between positive symptoms and angry MMN amplitudes within patients with schizophrenia supports the hypothesis that prosodic dysfunction may mediate the misattribution of auditory hallucinations (David, 1994). Hearing voices is the most common type of hallucination associated with positive symptoms of schizophrenia. Deficits of emotional prosodic perception have been proposed as critical contributors to the formation of auditory hallucinations (Leitman et al., 2005; Rossell and Boundy, 2005). Patients experiencing auditory hallucinations were less successful at recognizing prosodic cues than non-hallucinating patients (Shea et al., 2007). Hallucinating patients exhibited reduced activation in the amygdala and insula when hearing crying sounds (Kang et al., 2009). Sensory gating deficits reflect the inability to filter out extraneous noise from meaningful sensory inputs (Freedman et al., 1987). Such deficits may cause a cascade of failures, rendering the malfunctioning limbic system unable to detect the emotional salience of incoming stimuli (Anticevic et al., 2012).
Some limitations of this study must be acknowledged. First, regarding sample homogeneity, the generalizability of the results may be limited because people with acute schizophrenia were not included. Second, the MMN recording here may not be state of the art. Unlike designs that use the same stimuli as both standards and deviants to control the mismatch paradigm (Schirmer et al., 2007), the MMN effect in this study may be partly driven by physical stimulus characteristics. However, following the same rationale as previous work (Čeponienė et al., 2003), we have conducted a series of studies to verify emotional and non-vocal MMN in the strict sense of disentangling emotional salience from physical properties (Cheng et al., 2012; Hung et al., 2013; Fan and Cheng, 2014; Chen et al., 2015). Because this may not be the optimal design, future studies with larger sample sizes are warranted, in which people with acute schizophrenia are recruited and stimuli with greater acoustic correspondence are included.
This study demonstrates that patients with chronic schizophrenia exhibited reduced MMN responses to both angry syllables and angry-derived non-vocal sounds, indicating general impairments of voice perception and acoustic discrimination. The atypical processing of emotional salience at the preattentive level might be partially driven by impaired processing of low-level acoustic parameters. The failure to regulate attention toward contextually irrelevant emotional voices could be ascribed to pathological distractibility. In particular, angry MMN amplitudes were associated with the severity of positive symptoms. These findings could provide evidence for bottom-up (i.e., perceptually based) cognitive remediation approaches and indicate that emotional MMN and P3a may be potential neurophysiological endophenotypes of schizophrenia (Turetsky et al., 2007; Javitt and Sweet, 2015).

AUTHOR CONTRIBUTIONS
CC, P-YW, and YC took part in designing the study. CC and C-CL undertook the statistical analysis. C-CL and YC managed the literature search and wrote the first draft of the manuscript. All authors have contributed to and approved the manuscript.