Asymmetric Influence of Vocalic Context on Mandarin Sibilants: Evidence From ERP Studies

In the present study, we examine the interactive effect of vowels on Mandarin fricative sibilants using a passive oddball paradigm to determine whether the HEIGHT features of vowels can spread on the surface and influence preceding consonants with unspecified features. The stimuli are two pairs of Mandarin words ([sa] ∼ [ʂa] and [su] ∼ [ʂu]) contrasting in vowel HEIGHT ([LOW] vs. [HIGH]). Each word in the same pair was presented both as standard and deviant, resulting in four conditions (/standard/[deviant]: /sa/[ʂa] ∼ /ʂa/[sa] and /su/[ʂu] ∼ /ʂu/[su]). In line with the Featurally Underspecified Lexicon (FUL) model, asymmetric patterns of processing were found in the [su] ∼ [ʂu] word pair where both the MMN (mismatch negativity) and LDN (late discriminative negativity) components were more negative in /su/[ʂu] (mismatch) than in /ʂu/[su] (no mismatch), suggesting the spreading of the feature [HIGH] from the vowel [u] to [ʂ] on the surface. In the [sa] ∼ [ʂa] pair, however, symmetric negativities (for both MMN and LDN) were observed as there is no conflict between the surface feature [LOW] from [a] to [ʂ] and the underlying specified feature [LOW] of [s]. These results confirm that not all features are fully specified in the mental lexicon: features of vowels can spread on the surface and influence surrounding unspecified segments.


INTRODUCTION
To comprehend spoken language, listeners need to decode the incoming speech stream and segment it into units which map onto the phonological representations of words. However, the incoming acoustic cues for consonants and vowels can vary quite substantially due to factors such as context, speaking rate, and speaker characteristics. Nevertheless, mature listeners rarely experience any difficulty in recognizing spoken words and inferring the intended message (Marslen-Wilson, 1984; Norris et al., 1995; Lahiri and Reetz, 2002, 2010).
The speech signal varies in different contexts where the realization of a particular sound can differ within and across individual words (cf. Holt and Kluender, 2000). Furthermore, contextual modifications (contiguous sounds affecting each other, such as vowels affecting consonants or consonants affecting other consonants) can alter the pronunciation of a sound quite drastically. A familiar example is place assimilation, where the medial sequences [ng] in greengage or [np] in gunpoint are habitually articulated as [ŋg] and [mp], respectively. Here, the place of articulation of the [CORONAL] nasal [n] is affected by that of the following consonant, transforming it into a [DORSAL] [ŋ] or [LABIAL] [m] nasal. Vowels can also affect consonants, as is seen in word pairs such as face ∼ facial or commerce ∼ commercial, where the final sound [s] of the first word of each pair becomes [ʃ] in the context of the vowel [i] when suffixed with -ial [iəl]. Here the [i] is no longer pronounced; however, in other contexts, such as in dictator ∼ dictatorial, the vowel [i] does not change. In this paper, we investigate brain responses to variability in sound sequences where vowels alter neighboring consonants.
The effect of one sound on another tends not to be symmetric. For example, in the cases given above (greengage and gunpoint), the assimilation of the place of articulation is asymmetric. Although [CORONAL] [n] can change to [m] and [ŋ], the reverse is usually not the case: a [DORSAL] nasal, as in the sequence [ŋd] in kingdom, does not become *[nd], nor does the [LABIAL] nasal in sometime change to *[nt]. Thus, [CORONAL] consonants such as [n] can assimilate easily to the place of articulation of the following [LABIAL] (e.g., [p], [b]) or [DORSAL] consonants (e.g., [k], [g]) but not vice versa (cf. Cornell et al., 2013). One approach to capture this asymmetry is to assume that not all features or properties of consonants and vowels are fully specified in the lexicon (cf. the Featurally Underspecified Lexicon (FUL) model; Lahiri and Reetz, 2010; Scharinger et al., 2012; de Jonge and Boersma, 2015; Schluter et al., 2016; Højlund et al., 2019; Kotzor et al., 2020). In this model, consonants and vowels are defined by PLACE OF ARTICULATION, which includes ARTICULATOR features such as [CORONAL], [DORSAL], and [LABIAL], and HEIGHT features [HIGH] and [LOW]. Of these, [CORONAL] is assumed to be universally underspecified (see Figure 1).
Since each word has a unique phonological representation, the features extracted from the acoustic signal are used to map speech onto underlying representations. Listeners process the variable speech signal and parse it into features which are then directly mapped onto the lexicon (Lahiri and Reetz, 2010; Kotzor et al., 2020). This mapping from the features in the signal to the lexicon is based on a ternary logic: match, mismatch, and no-mismatch. The first two options are transparent: match means that the feature from the signal matches the lexicon completely, while mismatch occurs when there is a conflict. Thus, the feature [CORONAL] from the acoustic signal of [n], for instance, will mismatch with the lexically represented feature [LABIAL] of [m]. The no-mismatch condition suggests a level of tolerance and is particularly important for underspecified features such as [CORONAL]. Consequently, [LABIAL] extracted from the signal of [m] will be in a no-mismatch relationship with [n] since its place feature [CORONAL] is not specified. Thus, during speech processing, all words in the lexicon with matching and no-mismatching features are activated, but when mismatching features are encountered, words are deactivated.
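The ternary logic described above can be illustrated with a minimal sketch. It is not an implementation of the FUL model: the toy lexicon `LEXICON_PLACE`, the function names, and the simplification that any two different specified place values conflict are all assumptions made here for illustration.

```python
from enum import Enum


class Outcome(Enum):
    MATCH = "match"
    MISMATCH = "mismatch"
    NO_MISMATCH = "no-mismatch"


# Hypothetical toy lexicon: only the place feature of three nasals is modeled.
# None marks an underspecified place feature, as assumed for [CORONAL].
LEXICON_PLACE = {"n": None, "m": "LABIAL", "ŋ": "DORSAL"}


def evaluate(signal_feature, lexical_feature):
    """Compare one feature extracted from the signal with one stored feature."""
    if lexical_feature is None:
        # Nothing is stored, so nothing can conflict: tolerated.
        return Outcome.NO_MISMATCH
    if signal_feature == lexical_feature:
        return Outcome.MATCH
    # Simplifying assumption: two different specified place values conflict.
    return Outcome.MISMATCH


def activated(signal_feature):
    """Segments that stay active: everything that does not mismatch."""
    return sorted(
        seg
        for seg, stored in LEXICON_PLACE.items()
        if evaluate(signal_feature, stored) is not Outcome.MISMATCH
    )
```

Under these assumptions, a [LABIAL] signal keeps both /m/ (match) and /n/ (no-mismatch) active, whereas a [CORONAL] signal keeps only /n/ active, which reproduces the asymmetry in the text.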
There has been considerable evidence from both behavioral and neurophysiological studies for the underspecification of [CORONAL] place of articulation (Lahiri and Reetz, 2002, 2010; Eulitz and Lahiri, 2004; Cornell et al., 2011). For instance, the mismatch negativity (MMN) component has been used widely as a robust measure to examine [CORONAL] underspecification (Cornell et al., 2011). The MMN component, which usually peaks at 100-250 ms after the onset of a stimulus, signals the automatic or pre-attentive detection of an infrequent change in regular auditory stimulation (Näätänen et al., 2007). The MMN can be elicited by a deviant that violates the representation of repetitive standards established before the occurrence of that deviant, suggesting that the sensory memory trace of preceding stimuli is compared against incoming sounds (Näätänen and Winkler, 1999; Horváth et al., 2008). The amplitude and latency of the MMN component depend on the magnitude of the stimulus deviation, with larger deviance resulting in larger amplitudes and shorter latencies (Näätänen et al., 2007). In MMN studies examining coronal underspecification (e.g., Eulitz and Lahiri, 2004; Roberts et al., 2014), the representation activated by the repeated processing of standard stimuli is taken to come from a long-term memory trace and is associated with the underlying representation in the mental lexicon. In contrast, the sound percept elicited by the deviant stimulus corresponds to the surface representation, which is formed by the phonological features extracted from the acoustic signal (Cornell et al., 2011). The change detection response reflects the contrast between the underlying and surface representations.
Comparing both ARTICULATOR and TONGUE HEIGHT features, Kotzor et al. (2020) examined asymmetric ARTICULATOR features as well as symmetric HEIGHT features in vowels in words and non-words (  [aen]. This assimilation also has processing consequences (Lahiri and Marslen-Wilson, 1991). In English, this is purely allophonic, which means that the nasalization is entirely predictable and there is no real phonemic contrast between oral and nasal vowels; e.g., cad  Table 2). At first glance, the pairs appear to be straightforward; however, the underlying phonological representation of the features for these pairs depends not only on the phonemes but on the general phonological inventory of Mandarin.
The phonological feature specifications within a language are determined by the number of contrastive segments. In Mandarin, there are two sets of [CORONAL] obstruents: dental [t, tʰ, ts, tsʰ, s] and retroflex [tʂ, tʂʰ, ʂ]. There are fewer retroflex consonants than dentals in Mandarin: Duanmu (2007) states that the retroflex series is a "major characteristic of Standard Chinese (SC) speakers from Beijing" (p. 24) and that speakers of other Chinese dialects replace the retroflex with the dental; e.g., there would be no distinction between [sa] "sprinkle" and [ʂa] "stupid." To distinguish between the two types of [CORONAL] obstruents, our feature system uses the HEIGHT features [HIGH] and [LOW] (cf. Lahiri, 2018). Based on their acoustic characteristics, the retroflex consonants would be characterized as [HIGH] and the dentals as [LOW]: dental sibilants have more energy in the higher frequencies compared to retroflexes and palatals (Stevens and Blumstein, 1975; Lahiri and Reetz, 1999; Lahiri and Kennard, 2019; Kennard and Lahiri, 2020). However, Mandarin only has a two-way contrast in the voiceless sibilant fricatives [s] and [ʂ]; thus, it is only necessary to lexically specify one of these phonemes, and the other remains unspecified. Since there are more dental consonants than retroflexes and since the dentals are less likely to vary than the retroflexes, the dentals would be more likely to be specified for HEIGHT in the lexicon.
Further evidence of the specification of the HEIGHT feature is provided by the co-occurrence restriction that certain adjacent identical elements are prohibited in consonant-glide sequences (Yip, 1988; Wiese, 1997; Duanmu, 2007). As for vowels, descriptively Mandarin allows five basic vowels, where [i u y] are high vowels, [ə] is a mid vowel, and [a] is usually characterized as a low vowel. In terms of features, the mid vowel is underspecified while the high and the low vowels are specified. Mandarin syllables can only have single consonants as onsets and codas, and no clusters are permitted. Thus, since all initial consonants followed by the high vowels /i u y/ attain a secondary articulation described as glide spreading, /CuC/ becomes [CʷuC], where the consonant with its secondary articulation holds a single position in the onset. As we can see, [s] can occur with both high and non-high vowels, such as /suŋ4/ [sʷuŋ] "send" or /suən1/ [sʷən] "grandchild," where the glide formation rule turns the vowel [u] into a glide, leading to a secondary articulation [sʷ]. Since [s] is specified as [LOW] and the glides are high, the secondary articulation is allowed. Had /ʂ/ been specified for [HIGH], the sequence /ʂuu/ > [ʂʷu] "to lose" would not have been permitted because of identical height features (Table 2). Crucially, the feature [HIGH] is not specified in the language for any of the consonants, thus allowing them to take on the secondary articulations triggered by high vowels. We will examine the two sibilants [s] and [ʂ], which are differentiated in HEIGHT, in combination with two vowels also differing in HEIGHT: [u] and [a].
As we mentioned above, not only do ARTICULATOR features such as [LABIAL] or [DORSAL] spread, leading to assimilation in words such as greengage, but TONGUE HEIGHT features such as [LOW] or [HIGH] can also spread to preceding unspecified segments (Kunisaki and Fujisaki, 1977; Mann and Repp, 1980; Lahiri and Reetz, 2002). In a study by Kunisaki and Fujisaki (1977)  The symbol ↑ represents no-mismatch and represents mismatch. Along with the MMN, an additional negativity, the late discriminative negativity (LDN), was observed in our study. The LDN is a recently established component found in oddball paradigms and serves as an index of phonological discriminative abilities (Hill et al., 2004; Horváth et al., 2009; Jakoby et al., 2011; David et al., 2020). Similar to the MMN, the LDN is also an automatic response associated with higher cognitive processes and may represent the recruitment of additional cortical resources needed to extract the phonological differences between the standard and deviant stimulus and form phonological representations (Shestakova et al., 2003; Hill et al., 2004; Zachau et al., 2005; Barry et al., 2009). The LDN can be elicited by both speech and non-speech sounds, and its amplitude was found to be related to the difficulty in discriminating the stimuli (Korpilahti et al., 1995, 2001; Schulte-Körne et al., 1998). For example, Yu et al. (2017) compared the processing of Mandarin disyllabic non-words with different inter-stimulus intervals (ISIs) between Mandarin- and English-speaking groups. For both groups, robust MMNs to contrasts with either similar or contrastive lexical tones at shorter ISIs were observed. Compared to the English group, a larger LDN was only found for the Mandarin group when processing contrasts at longer ISIs, especially those with similar lexical tone. These results suggest that it is easier to discriminate the acoustic correlates of lexical tone at shorter ISIs. 
To discriminate words at longer ISIs, language-specific experience is necessary. Following the FUL model, Hestvik and Durvasula (2016) examined the underspecification-driven asymmetry in the processing of the English contrast between /d/, which is underspecified for [VOICE], and /t/, which is specified for the feature [SPREAD], using the oddball paradigm. The LDN component exhibits the same asymmetry as the MMN with a

Methodology
The present study examines the interactive effect of vowels on fricative sibilants to determine whether the TONGUE HEIGHT features of vowels can spread on the surface and influence unspecified preceding consonants. Coarticulation, which leads to feature spreading, would suggest symmetric MMNs between phonological contrasts, independent of the direction of presentation of the standard and deviant, if the features are fully specified in both standards and deviants. In contrast, asymmetric MMNs would be expected between the two directions of presentation (i.e., standard vs. deviant) of phonological contrasts where the HEIGHT feature [HIGH] is unspecified in one of the stimuli.
Since Mandarin also has a tonal contrast, it was necessary to keep the tones consistent across the stimuli. Two monosyllabic word pairs with Tone 1, [sa] ∼ [ʂa] and [su] ∼ [ʂu], were used as the standard and deviant stimuli. Mandarin is a language where one syllable corresponds to one morpheme in most cases, with each syllable comprising an optional initial consonant, an optional glide, a vowel, and an optional final consonant [n (n) or ng (ŋ)]. We already described the two voiceless sibilant fricatives in Standard Chinese (SC, or Mandarin), represented as the dental/alveolar [s] and retroflex [ʂ]. Here, the retroflex [ʂ] in Mandarin is different from the palatoalveolar [ʃ] in English in terms of the consonant position and air flow through the mouth. The palatoalveolar is pronounced with the air flow through the tongue blade and even a portion of the front part of the tongue. For the retroflex, the air flow is more limited to the tongue tip/blade region (Lin, 2007). Here, we follow Duanmu's (2007) position and treat the voiceless fricative sibilants in Mandarin as a two-way contrast: the dental/alveolar [s] and retroflex [ʂ]. Unlike the two-way contrast between fricative sibilants, the vowels in Mandarin are categorized into a three-way height distinction, including three high vowels [i y u], one mid vowel [ə], and one low vowel [a] (Duanmu, 2007). As mentioned above, both [HIGH] and [LOW] should be specified when there is a three-way height difference (Lahiri and Reetz, 2010 [sa] condition, both the PLACE and HEIGHT features are in a no-mismatch relationship with the underlying representations and hence no-mismatching patterns are found. Therefore, symmetric MMNs and LDNs are predicted for these two conditions (as shown in Figure 2).
Our second prediction (2)  If, on the other hand, we assume a phonemic representation with every feature fully specified in all sounds, all variants should mismatch to the same degree, as the spreading from [u] would not alter the specification of [LOW] in [ʂ]. In such a case, we would expect to see symmetric MMN and LDN responses for both pairs of words, regardless of the direction of presentation (i.e., which is the standard and which the deviant).  testing. All participants had normal or corrected-to-normal vision and self-reported as right-handed (a modified version of the Edinburgh Handedness Inventory was also used to assess handedness, Oldfield, 1971). No history of neurological disorders or hearing deficits was reported. The study was approved by the Central University Research Ethics Committee (CUREC) and written informed consent was acquired from subjects prior to the experiment. They were compensated for their participation.  (Cai and Brysbaert, 2010).

Stimuli
As expected, the spectrogram of the same fricative varies depending on the following vowel (Figure 3). Since Mandarin has a lexical contrast in tones, it was important to control for this as well. All syllables were chosen to have lexical Tone 1, which is usually described as a high-level tone (Duanmu, 2007). Thus, the pitch is held at a constant level.
Multiple repetitions of four syllables were recorded by a female native speaker of Mandarin in a sound-attenuated recording room using a professional-quality USB microphone (Røde NT-USB) at a sampling rate of 44.1 kHz. From these syllables, we generated four natural-sounding stimulus recordings. A representative utterance of each syllable with similar duration was selected. The recordings were extracted and segmented using the speech analysis program Praat (Boersma and Weenink, 2018).

The [a] and [u] vowels in [sa] and [su] were cross-spliced to the corresponding [ʂ] consonant in each pair such that the acoustic differences between the stimuli in each pair were restricted to the contrasting consonants. As shown in Figure 3, the vowel portions in each pair were identical.
Across pairs, stimuli were also controlled for duration (Figure 3). In the recordings, the vowel [u] was slightly longer than [a], so some trailing pulses at the end of [u] were removed. Likewise, some initial pulses of noise were removed in [su] and [ʂu] because their frication duration was slightly longer than that of [sa] and [ʂa]. Such manipulations avoided the formant transitions in order to minimize distortions of F0 and spectral features. As a result, all initial consonants were approximately 182 ms, the vowels 328 ms, and the overall duration of a syllable was about 510 ms (as shown in Figure 3). The intensities of all stimuli were equalized in Praat.

Experimental Procedure
Two pairs of words with Tone 1 were presented to participants during the experiment. Each word pair was presented in two conditions: one with an [s] consonant as the deviant and a [ʂ] consonant as the standard, and one with the direction of presentation reversed (see Table 4). The word pairs will be described respectively as /sa/[ʂa] ∼ /ʂa/[sa] and /su/[ʂu] ∼ /ʂu/[su] in the paragraphs below (/standard/[deviant]).
FIGURE 3 | Oscillograms (above), spectrograms (below, 0-7,000 Hz), and F0 tracks of the four stimuli. All the stimuli are Tone 1 syllables.
As a result of this reversed design, four oddball blocks were presented to each participant with the sequence of blocks counterbalanced among the participants. Within each block, the deviant occurred pseudo-randomly among the standards with a probability of 15%. Any two adjacent deviants were separated by at least two standards. A total of 610 stimuli, with ten continuous standard stimuli occurring at the beginning, were presented in each block. To eliminate the influence of a rhythmic pattern established by temporal characteristics of the acoustic stimuli, the ISI between standard and deviant varied randomly between 350 and 650 ms.
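The block structure described here can be sketched as follows. This is a hypothetical reconstruction, not the presentation script used in the study: the function name, the choice to apply the 15% rate to the 600 trials after the ten lead-in standards (giving 90 deviants), and the uniform ISI distribution are all assumptions.

```python
import random


def make_oddball_block(n_total=610, n_lead_in=10, p_deviant=0.15,
                       min_gap=2, isi_range_ms=(350, 650), seed=None):
    """Return (labels, isis) for one pseudo-randomized oddball block."""
    rng = random.Random(seed)
    n_free = n_total - n_lead_in              # trials after the lead-in standards
    n_dev = round(p_deviant * n_free)         # assumption: 15% of the free trials
    # Draw positions in a shortened range, then push the i-th deviant right by
    # min_gap * i; consecutive deviants end up with >= min_gap standards between.
    reduced = n_free - min_gap * (n_dev - 1)
    picks = sorted(rng.sample(range(reduced), n_dev))
    deviant_idx = {n_lead_in + p + min_gap * i for i, p in enumerate(picks)}
    labels = ["deviant" if i in deviant_idx else "standard"
              for i in range(n_total)]
    # Assumption: ISIs are drawn uniformly from the stated 350-650 ms range.
    isis = [rng.uniform(*isi_range_ms) for _ in range(n_total - 1)]
    return labels, isis
```

Calling `make_oddball_block(seed=1)` yields one reproducible block; a different seed per block and participant would give different pseudo-random orders under the same constraints.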

EEG Recordings
EEG recordings were made using a Biosemi ActiveTwo amplifier with 64 sintered Ag/AgCl pin electrodes placed in a 10-20 montage, online referenced to the mastoids. EOG activity was measured using four facial electrodes (IO1, IO2, LO1, and LO2). All electrode offsets (in an active-electrode system this is comparable to impedance) were kept below 30 mV and signals were sampled at 2,048 Hz. The audio stimuli were presented through headphones and participants watched a self-selected silent documentary during the experiment. All subjects participated in all four blocks and the order of the four blocks was counterbalanced across subjects. The total duration of the experiment was about 90 min and subjects had a short break between blocks.

Data Analysis
EEG data were analyzed offline using EEGLAB 14.1.2b. All continuous data were digitally filtered offline in the 0.3-30 Hz range using a finite impulse response (FIR) filter. Bad channels and artifacts were detected and removed automatically by the artifact subspace reconstruction (ASR) method as implemented in the Clean Raw Data plug-in. EEG data were re-referenced to the linked mastoids for all analyses except for mastoid amplitudes.
Using an independent component analysis (ICA; Delorme and Makeig, 2004), components that may represent eye blinking, lateral eye movement, muscle activity, or channel noise were detected and excluded from further analysis. Furthermore, epochs were created from −100 to 800 ms, with the time window from −100 to 0 ms used as a baseline. An additional artifact detection step was carried out so that trials were rejected if they exceeded an amplitude of 100 µV. In addition, any participant with an acceptance rate lower than 70% was excluded, which led to the exclusion of three participants from further analysis. Finally, the first ten responses of each block and the two standards after each deviant were excluded from the grand average. For the difference waves, a deviant-minus-standard calculation was carried out for each participant and condition; namely, the difference was generated by subtracting the waveform of the stimulus when it was presented as standard in one block from that of the same stimulus when it was presented as deviant in another block.
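The epoch-level steps described above (baseline correction, amplitude-based rejection, averaging, and the same-stimulus deviant-minus-standard subtraction) can be sketched for a single channel in plain Python. This is an illustration only, not the EEGLAB pipeline used in the study; the function names are hypothetical, and it is assumed here that the 100 µV criterion applies to the absolute baseline-corrected amplitude.

```python
from statistics import fmean


def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline samples (pre-stimulus window)."""
    base = fmean(epoch[:n_baseline])
    return [v - base for v in epoch]


def keep_epoch(epoch, threshold_uv=100.0):
    """Assumed criterion: keep the trial if no sample exceeds +/- threshold."""
    return max(abs(v) for v in epoch) <= threshold_uv


def average_erp(epochs, n_baseline, threshold_uv=100.0):
    """Return the averaged ERP and the proportion of accepted epochs."""
    corrected = [baseline_correct(e, n_baseline) for e in epochs]
    kept = [e for e in corrected if keep_epoch(e, threshold_uv)]
    if not kept:
        raise ValueError("no epochs survived artifact rejection")
    erp = [fmean(samples) for samples in zip(*kept)]
    return erp, len(kept) / len(epochs)


def identity_difference_wave(deviant_erp, standard_erp):
    """Deviant minus standard for the SAME stimulus, taken from two blocks."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]
```

In this sketch, the acceptance rate returned by `average_erp` is the quantity that would be compared against the 70% exclusion criterion.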

RESULTS
Based on visual inspection of the grand-average waveform, the amplitudes of the MMN and LDN were determined for each participant and condition as the mean amplitude within 140-180 ms and 320-360 ms after stimulus onset at Fz. According to previous studies, both the MMN and LDN are typically maximal over fronto-central electrode sites (Näätänen et al., 1992; Jakoby et al., 2011). Thus, the analyses were restricted to twelve fronto-central electrodes (AF3, AFz, AF4, F3, Fz, F4, FC3, FCz, FC4, C3, Cz, and C4). For each experiment, repeated-measures ANOVAs with Condition, Vowel, Laterality (left, middle, and right), and Gradient (AF-, F-, FC-, and C-line) as within-subject variables were carried out for mean amplitude and peak latency, respectively. For all analyses, degrees of freedom were adjusted according to the Greenhouse-Geisser method.
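The mean-amplitude measure can be sketched as follows. The window boundaries and the −100 ms epoch start come from the text; the function names, the inclusive treatment of the window edges, and the default sampling rate of 2,048 Hz (the recording rate, assuming no downsampling before measurement) are assumptions.

```python
from statistics import fmean

# Analysis windows (ms after stimulus onset) as stated in the text.
WINDOWS_MS = {"MMN": (140, 180), "LDN": (320, 360)}


def ms_to_sample(t_ms, epoch_start_ms=-100.0, sfreq_hz=2048.0):
    """Index of the sample closest to t_ms in an epoch starting at epoch_start_ms."""
    return round((t_ms - epoch_start_ms) * sfreq_hz / 1000.0)


def window_mean(wave, window_ms, epoch_start_ms=-100.0, sfreq_hz=2048.0):
    """Mean amplitude of a difference wave within an inclusive time window."""
    lo = ms_to_sample(window_ms[0], epoch_start_ms, sfreq_hz)
    hi = ms_to_sample(window_ms[1], epoch_start_ms, sfreq_hz)
    return fmean(wave[lo:hi + 1])
```

For one participant and condition, `window_mean(diff_wave, WINDOWS_MS["MMN"])` would give the value entered into the ANOVA for a given electrode, where `diff_wave` stands for that electrode's deviant-minus-standard waveform.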

Mismatch Negativity
Repeated-measures ANOVAs were conducted and significant main effects of Condition and Vowel were found, F₁(1, 17) = 6.99, p₁ = 0.017, ηp² = 0.29; F₂(1, 17) = 5.62, p₂ = 0.030, ηp² = 0.25. However, the interaction between Vowel and Condition was also significant, F(1, 17) = 21.39, p < 0.001, ηp² = 0.56. Post hoc analyses showed that for vowel [a], there was no significant difference between the mean amplitudes of /ʂa/[sa] and /sa/[ʂa], F(1, 17) = 2.69, p = 0.12, ηp² = 0.14. This indicates a non-significant difference in MMN amplitudes between the /ʂa/[sa] and /sa/[ʂa] conditions, as in both the feature [CORONAL] of the deviant generates a no-mismatch with the underspecified [CORONAL] feature of the standard (as shown in Figure 4). However, this difference did not reach statistical significance. For word pairs with vowel [u], the amplitude of the MMN response triggered by /su/[ʂu] was significantly more negative than that for /ʂu/[su], F(1, 17) = 33.84, p < 0.001, ηp² = 0.67. As predicted by the FUL model, the asymmetric HEIGHT pair shows a larger MMN in the mismatch condition, when the HEIGHT feature [HIGH]  F(1, 17) = 5.71, p = 0.029, ηp² = 0.25. For conditions where the initial consonant of the deviants was [ʂ], the amplitude was more negative when followed by vowel [u] than by vowel [a], F(1, 17) = 23.90, p < 0.001, ηp² = 0.58. To further investigate these patterns of activation in both directions when followed by different vowels, the wave difference between /su/[ʂu] and /ʂu/[su] was compared to that between /ʂa/[sa] and /sa/[ʂa] within the 140-180 ms time window. The results showed significant differences across all gradients, ps < 0.001, suggesting an asymmetric pattern of activation (see Figure 5).
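As a consistency check on the effect sizes reported in this section, partial eta squared can be recovered from each F value and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is not part of the original analysis; the function name and the list `REPORTED` are introduced here for illustration only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, reported partial eta squared) pairs from this section, all with df = (1, 17).
REPORTED = [
    (6.99, 0.29), (5.62, 0.25), (21.39, 0.56), (2.69, 0.14),
    (33.84, 0.67), (5.71, 0.25), (23.90, 0.58),
]

# True for each pair whose recomputed value agrees with the reported one
# to two decimal places.
CONSISTENT = [
    abs(partial_eta_squared(f, 1, 17) - eta) < 0.005 + 1e-9
    for f, eta in REPORTED
]
```

A pair for which `CONSISTENT` is false would point to a transcription error in either the F value or the effect size.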

DISCUSSION
The present study was designed to examine the interactive effect of different vowels on fricative sibilants. We compared both the MMN   Our results support the predictions of the FUL model (Lahiri and Reetz, 2002, 2010), which proposes that phonological contrasts can either match, mismatch, or stand in a no-mismatch relation depending on whether the individual phonological features are fully specified or underspecified in the underlying representation. Previous studies have argued that the influence of vocalic context on fricative sibilants is due to the coarticulation of vowel rounding and consonant place of articulation (Mann and Soli, 1991). However, phonemic coarticulation would predict symmetric MMNs between phonological contrasts, independent of the direction of presentation of the standard and deviant. Thus, only an underspecification account can explain the asymmetry found in our results, as the features of vowels spread on the surface and the lack of specification of TONGUE HEIGHT in the consonant [ʂ] leads to an asymmetric pattern depending on which stimulus is presented as standard and which as deviant (Lahiri and Reetz, 2002, 2010 (Groppe et al., 2011 [STRIDENT] are both fully specified and thus conflict equally in both directions. The results support our findings: both unspecified TONGUE HEIGHT and underspecified MANNER features can trigger asymmetric MMNs in different directions when the PLACE feature of the two consonants is kept constant. The difference is that the underspecified MANNER feature itself can trigger asymmetry, while the unspecified TONGUE HEIGHT feature needs to absorb additional features from surrounding segments. Therefore, different patterns of activation were found when followed by different vowels. However, unlike the underspecification of [CORONAL], the lack of specification of [HEIGHT] is not universally applicable to all languages. 
It is central to the FUL model that the phonological representation of each segment is feature-based and constrained by universal properties, as well as language-specific requirements (Lahiri and Reetz, 2002, 2010 (Lahiri and Kennard, 2019; Kennard and Lahiri, 2020). This rule cannot be applied to segments with a three-way contrast, for instance, the Mandarin vowels. Unlike in two-way contrasts, both features [HIGH] and [LOW] are specified for a three-way contrast. Thus, the feature [MID] does not need to be stored and can be determined as the consequence of a binary distinction between high vs. non-high and low vs. non-low (Scharinger and Lahiri, 2010). Therefore, the results found in our study might not hold in investigations of the spreading of TONGUE HEIGHT features in other languages with a different number of contrastive segments.
Since the initial logic of the experiment was built into the framework of FUL's feature model and assumptions regarding the matching algorithm, we discussed the results in that context. However, aside from the FUL model, there are other models focusing on perceptual asymmetry, such as the Natural Referent Vowel (NRV) framework (Polka and Bohn, 2003, 2011) and the Native Language Magnet (NLM) theory (Kuhl, 1991, 1992, 1993). In the NRV model, Polka and Bohn suggested that vowel perception is asymmetric with respect to the location of each vowel within a traditional articulatory or F1/F2 acoustic vowel space; namely, a change from a central vowel to a peripheral vowel (e.g., from [y] to [u]) would be much easier to discriminate than the same change in the reverse direction (e.g., from [u] to [y]). Here, the peripheral vowels serve as perceptual reference points for listeners to discriminate vowels, and listeners show a bias in favor of a "focal" vowel, resulting in asymmetric processing of the vowel pair in different directions. Directional asymmetry was also reported by Kuhl (1991, 1992): listeners' discrimination of a change from a prototypical to a non-prototypical vowel within a given category is more difficult than the same change in the reverse direction. For instance, listeners were presented with a range of synthesized [i] vowels which varied in F1/F2 and asked to rate the perceived goodness of the vowels. They consistently attached the highest goodness values to vowels within a particular region of the vowel space (Kuhl, 1991). Variants with changes to F1/F2 were synthesized on the basis of the prototype and non-prototype exemplars selected according to the ratings. Compared to a non-prototype exemplar, it is more difficult to discriminate the prototype from its variants (Kuhl, 1992). Therefore, the NLM theory argues that early linguistic experience influences perceptual patterns, such that listeners become biased toward native prototypes. 
These prototypes in turn function as perceptual magnets for other members within a category while stretching the distance between categories (Kuhl, 1992, 1993). However, neither of the two models is applicable to our study, as the difference wave was obtained by subtracting the waveform of the stimulus when presented as standard in one block from that of the same stimulus when presented as deviant in another block. In other words, there is no difference in vowel space or phonetic category between standard and deviant. The MMN component is automatically generated by change detection, and the neurons activated by standards are separate from those activated by deviants (Jacobsen et al., 2003; Näätänen et al., 2005, 2007). The repetition of stimuli, though, might lead to a refractory effect on neurons that are activated by either the standard or the deviant, but not both. Compared to the deviant, the neural response to standards is more likely to be suppressed due to its high probability of occurrence, resulting in a misestimated MMN (Jacobsen and Schröger, 2001; Jacobsen et al., 2003). Adopting physically identical stimuli allows for the generation of genuine MMN responses without contamination by physical differences between the stimuli (Jacobsen and Schröger, 2001; Jacobsen et al., 2003). Note that subtracting the waveform of the standard stimulus from that of the deviant may not completely eliminate the potential influence of the N1 on the MMN, as the amplitude of the N1 elicited by different stimuli varies. Previous studies also found that distinct acoustic properties of segments in a syllable or consonant-vowel transition can lead to a potential P1-N1-P2 complex, which may have an effect on the asymmetric activation of the MMN and LDN (Martin and Boothroyd, 1999; Miller and Zhang, 2014). Indeed, the N1 has been noted as a component which extracts phonological features (cf. Obleser et al., 2004). 
Future studies could use alternative measurements to separate the effects of MMN and N1, and investigate the influence of the transition within stimuli or vocalic cue on the ERP components (Schröger and Wolff, 1996;Miller and Zhang, 2014).
To sum up, our results provide neurophysiological evidence for the interactive effect of vowels on fricative sibilants in Mandarin. Features such as TONGUE HEIGHT spread on the surface so that unspecified sibilants are influenced by following vowels. When followed by a [HIGH] [LOW]. In addition, the LDN component has demonstrated its reliability in linguistic processing among adults and its deflection pattern is roughly consistent with that of the MMN. Future studies should consider taking this component into consideration when investigating the underspecification of segments in the mental lexicon. In conclusion, not all features are fully specified in the mental lexicon and the specification of a feature such as TONGUE HEIGHT is determined by the number of contrastive segments in a certain language.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Central University Research Ethics Committee, University of Oxford. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
YM: data collection, formal analysis, methodology, investigation, conceptualization, writing-original draft, and writing-review and editing. SK: methodology, conceptualization, and writing-review and editing. CX: data collection and writing-original draft. HW: writing-review and editing. AL: conceptualization, funding acquisition, methodology, project administration, supervision, writing-original draft, and writing-review and editing. All authors contributed to the article and approved the submitted version.