Abstract
Individuals vary in their ability to perceive suprasegmental cues, such as pitch, intensity, and duration, and to use them for linguistic and nonlinguistic judgments, including lexical stress, intonation, talker identity, and vocal emotion. For adult cochlear implant (CI) users, limitations in pitch perception significantly impair linguistic and nonlinguistic suprasegmental perception, creating barriers to effective real-world communication. While device-related factors are often emphasized in explaining variability in CI outcomes, growing evidence suggests that cognitive-linguistic factors play a critical role in shaping pitch-based suprasegmental perception. In this Perspective, we examine how cognitive-linguistic and experiential factors influence suprasegmental perception in both typically-hearing listeners and adult CI users. We argue that these listener-level differences are essential to understanding variability in CI outcomes, offering insight beyond the effects of device limitations. We propose shifting from group-level generalizations to tailored rehabilitation strategies that target individual needs. Potential approaches include segmental speech training, auditory-cognitive training, and targeted pitch perception training. By identifying malleable sources of individual variation, we aim to support more personalized strategies to improve suprasegmental perception for both typically-hearing and hearing-impaired adults.
1 Introduction
Suprasegmental features, including pitch, intensity, and temporal components of speech, extend beyond individual sounds and convey critical linguistic and nonlinguistic information. These features allow listeners to distinguish between words, infer emotion, and interpret social context. However, adults with cochlear implants (CIs) often display deficits in suprasegmental perception, posing significant challenges to effective speech communication (Colby and Orena, 2022; Karimi-Boroujeni et al., 2023). Though research has identified group-level challenges, the substantial variability in CI outcomes suggests a need for more personalized approaches.
It is by now well-established that individuals vary in their abilities to learn, perceive, and produce language (Antoniou et al., 2010; Flege et al., 1995; Flege and Fletcher, 1992; MacWhinney, 2009). Research on segmental speech perception, or listeners’ perception of phonemes, has demonstrated individual differences in how listeners process “what is said.” However, the perception of the suprasegmental aspects of speech that convey “how it is said” also varies significantly across individuals (Abu El Adas and Levi, 2022; Adank and Janse, 2010; Bent et al., 2016; Nusbaum and Magnuson, 1997). While prior research on suprasegmental speech perception has largely emphasized talker-level sources of variability, such as prosody, voice characteristics, and accent, less attention has been paid to listener-level factors that shape perception. Among suprasegmental cues, pitch perception is central to both linguistic and nonlinguistic functions and is among the most studied suprasegmental features. In listeners with typical hearing, pitch cues are conveyed through fine spectral detail and temporal periodicity (Oxenham, 2013). In contrast, CIs provide only coarse representations of pitch, largely through temporal cues (Chatterjee and Peng, 2008; Rosen, 1992). These device limitations disrupt access to key suprasegmental information, yet adult CI users show variable outcomes. This individual variability suggests that listener-level cognitive-linguistic and experiential factors interact with degraded input to influence suprasegmental perception in meaningful ways.
In this Perspective paper, we argue that individual differences in the linguistic and nonlinguistic use of pitch are central to advancing theoretical frameworks and clinical approaches for improving communication outcomes in adult CI users. We first review literature on pitch perception by listeners with typical hearing, highlighting how language background, musical training, and domain-general cognitive skills shape suprasegmental perception. We then extend these insights to post-lingually deafened adults who use CIs, exploring how device limitations interact with listener-level factors to shape suprasegmental perception. By linking findings across populations, we aim to identify shared and unique factors contributing to individual variability. These insights will advance our knowledge regarding the mechanisms underlying linguistic and nonlinguistic pitch perception and inform the development of tailored interventions to support communication outcomes.
2 Suprasegmental perception among listeners with typical hearing
2.1 Linguistic and nonlinguistic pitch perception
Pitch perception plays a critical role in both linguistic and nonlinguistic suprasegmental tasks. For listeners with typical hearing, pitch is derived from both spectral (e.g., harmonic structure) and temporal cues (e.g., periodicity in the temporal fine structure) (Moore, 2008; Oxenham, 2013). These cues support fine frequency resolution as well as the perception of pitch contours that convey suprasegmental information in speech. However, typically-hearing adults vary in how they perceive and use pitch, reflecting differences in auditory and cognitive-linguistic processes.
One source of individual variation among typically-hearing listeners may be response strategies. For example, in a “missing fundamental” task, listeners hear an artificial stimulus composed of frequencies that could be harmonics of a fundamental frequency that is itself absent from the stimulus (Terhardt, 1979). Some listeners perceive the pitch of the stimulus at the missing fundamental, whereas others perceive the pitch in terms of the component frequencies. Ladd et al. (2013) determined that this pattern reflects a combination of perception and response strategy.
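To make the stimulus construction concrete, the short sketch below synthesizes a missing-fundamental complex and shows that the waveform nonetheless repeats at the period of the absent fundamental. The sampling rate, the choice of harmonics, and the autocorrelation-based periodicity estimate are our own illustrative assumptions, not the parameters of Terhardt (1979) or Ladd et al. (2013).

```python
import numpy as np

fs = 44100                        # sampling rate (Hz); illustrative choice
f0 = 200.0                        # implied fundamental (Hz), absent from the stimulus
t = np.arange(int(0.1 * fs)) / fs

# Sum only upper harmonics (600, 800, 1000 Hz); no energy exists at 200 Hz.
x = sum(np.sin(2 * np.pi * f0 * h * t) for h in (3, 4, 5))

# A "fundamental" listener can be modeled as tracking waveform periodicity:
# the autocorrelation peaks at the period of the missing fundamental,
# whereas a "spectral" listener would report the component frequencies.
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
lo, hi = int(fs / 1000), int(fs / 50)        # search a 50-1000 Hz pitch range
period = lo + np.argmax(ac[lo:hi])
print(f"Periodicity pitch: {fs / period:.1f} Hz")  # ~200 Hz, despite no 200 Hz component
```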
Similarly, in a set of experiments designed to explore individual variation in detecting small pitch changes, Semal and Demany (2006) identified two groups of listeners. The first group had better performance when identifying the direction of pitch change than when detecting that a pitch change occurred. The second group had poorer performance at identification relative to discrimination, and their difficulties with identification occurred when asked to make absolute direction judgments as well as when making same/different judgments. This was also interpreted to reflect response biases, highlighting the interplay between auditory processing and response strategies on pitch perception.
Outside of listeners’ response strategies, musical experience is another factor influencing pitch perception. Musicians consistently outperform non-musicians in pitch-based tasks such as absolute pitch identification, tone language learning, and psychophysical pitch discrimination, likely due to enhanced auditory and cognitive processing (Arndt et al., 2020; Perrachione and Wong, 2007; Van Hedger and Nusbaum, 2018). Cognitive factors such as working memory also contribute to pitch perception. For example, individual differences in auditory working memory capacity have been linked to variability in absolute pitch performance and learning outcomes (Van Hedger et al., 2015). Thus, the musical and cognitive skills a listener brings to a pitch perception task influence performance, with listeners who have stronger baseline abilities better able to perceive pitch in more demanding listening situations. Although neither the Ease of Language Understanding model (ELU; Rönnberg et al., 2013) nor the Framework for Understanding Effortful Listening (FUEL; Pichora-Fuller et al., 2016a) specifically addresses suprasegmental perception, both offer useful perspectives for understanding individual differences in pitch-based suprasegmental tasks. The ELU model emphasizes how degraded or ambiguous input may increase reliance on cognitive resources. The FUEL framework proposes that differences in task demands (such as ambiguity in pitch cues) may impose greater listening effort, depending on the listener’s cognitive-linguistic ability and motivation.
Taken together, these findings suggest that pitch perception in typically hearing adults is shaped by the interaction of auditory processing strategies, domain-general cognitive resources, and experiential factors. Importantly, individual differences in pitch perception extend beyond basic frequency discrimination tasks and influence how listeners perceive higher-level suprasegmental features in speech. In the following sections, we examine how pitch contributes to three key domains of suprasegmental perception, specifically, lexical stress, talker identity, and emotion.
2.2 Lexical stress perception
Variability in pitch perception plays a critical role in linguistic tasks, such as lexical stress and pitch accent perception. For example, Japanese uses pitch accent, wherein the pitch pattern across the morae (syllable-like timing units) of a word remains consistent regardless of the word’s place in the sentence or the prosodic phrase. This contrasts with English, which uses phrase-level intonation (Pierrehumbert, 1980). English-speaking learners of Japanese show significant variation in pitch accent acquisition (Muradás-Taylor, 2022), with auditory working memory and baseline pitch discrimination predicting learning gains (Goss, 2020). This suggests that the cognitive load associated with pitch perception relates to listeners’ ability to utilize pitch linguistically, at least in the context of learning a new pitch system.
In English, lexical stress differentiates nouns and verbs (e.g., OB-ject vs. ob-JECT), with pitch being one of the most reliable cues, along with intensity and duration (Beckman and Edwards, 1994; Hayes, 1995). Though there is evidence that listeners track and utilize lexical stress to make lexical judgments (Cooper et al., 2002), there has been little investigation into how individual listeners vary in their use of lexical stress cues. Rather, much of the research into individual variation in lexical stress perception has focused on cross-linguistic perception, examining how native and non-native listeners integrate stress and speech segments. In one such study, native English listeners, native Mandarin listeners, and native Korean listeners were tested on the extent to which lexical stress and speech segments together shaped word recognition in English via a visual word paradigm (Connell et al., 2018). Stimuli were selected such that the target and competitor’s first syllables could match both segmentally and suprasegmentally (e.g., carpet and carton), match segmentally but not suprasegmentally (e.g., carpet and cartoon), or match neither segmentally nor suprasegmentally (e.g., parrot and parade). Mismatched segments took advantage of the fact that unstressed syllables have reduced vowels in English. As hypothesized, native English listeners were sensitive to lexical stress and were most advantaged in identifying the target when the suprasegmental and segmental information matched. Native Mandarin listeners, whose language has limited lexical stress, were also sensitive to lexical stress, but only utilized the suprasegmental cue when segmental information was unavailable. Finally, native Korean listeners, whose language does not have lexical stress, showed limited sensitivity to lexical stress, only utilizing the suprasegmental cue when it overlapped with the segmental cue. Similarly, Cantonese listeners outperform English listeners in lexical stress discrimination involving pitch, likely due to their tone language experience, but show no advantage when pitch is absent (Choi et al., 2019).
2.3 Talker identity perception
Talker identity perception relies on a combination of suprasegmental cues such as f0, speaking rate, and voice quality, alongside segmental cues. Listeners are sensitive to these talker-specific cues and are able to recognize familiar talkers (Nygaard and Pisoni, 1998; Remez et al., 2007), but individual variability in talker identification further highlights the influence of auditory and cognitive processes.
Abberton and Fourcin (1978) completed one of the earliest studies of talker identification. Using a mix of natural and synthetic tokens, they found that listeners could identify talkers based solely on prosodic information, with fundamental frequency averages and contours emerging as particularly salient cues. However, individual listeners differed in their cue weighting strategies, with some downweighting fundamental frequency information. Later studies confirmed the reliability of fundamental frequency as a cue to talker identity, showing that listeners could identify talkers even in reversed speech or high- and low-pass filtered conditions (Compton, 1963; Lass et al., 1980; Van Lancker et al., 1985).
Language familiarity also influences talker identification; listeners are more accurate at identifying talkers in their own language than in an unfamiliar language (Goggin et al., 1991; Perrachione and Wong, 2007; Stevenage et al., 2012; Thompson, 1987). Importantly, superior pitch perception appears to be a strong predictor of talker identification ability. Xie and Myers (2015) showed that tone language speakers and musicians – both groups with enhanced pitch perception – performed better in cross-linguistic talker identification tasks relative to English-speaking non-musicians. Thus, it seems that language and musical experience remain the most reliable predictors of listeners’ talker identification performance. Extrapolating from what we learned from variation in pitch perception and lexical stress perception, it is plausible that these experiential factors reduce the cognitive load of the listening task, either directly by enhancing sensitivity to suprasegmental cues or indirectly by facilitating more efficient segmental processing. In turn, reduced cognitive load allows the listener to devote more resources to suprasegmental perception.
2.4 Emotion perception
Though the semantic content (meaning conveyed by words and word order) conveys much of a talker’s emotional state, pitch information also plays a key role in emotion perception. Indeed, the emotional prosody and the semantics of the utterance can be in conflict (as in, “I’m fine,” said in a voice indicating the talker is definitely not fine), requiring the listener to utilize both segmental and suprasegmental cues to interpret the talker’s meaning (Bachorowski, 1999; Wurm et al., 2001). Globerson et al. (2013) found that listeners’ pitch perception skills accounted for approximately one-third of the variance in vocal emotion recognition, highlighting the link between pitch perception and emotion perception.
Language background likely influences emotion perception, as seen in lexical stress perception above. Cho and Dewaele (2021) found that English and Korean listeners performed similarly when judging congruent emotional prosody and semantics in English sentences. However, when prosody and semantics conflicted, English listeners were better at integrating these cues (e.g., recognizing an utterance such as, “She’s good,” with a negative prosody as potentially being sarcastic), likely due to the greater pitch variation used to convey emotion in English compared to Korean. The authors interpreted this as Korean listeners being less able to recognize that the prosody carried meaning distinct from the segmental semantics.
Age also appears to impact vocal emotion perception. Relative to younger adults, older adults have more difficulty identifying spoken emotions, partly due to age-related hearing loss, which reduces access to high-frequency acoustic cues (Amorim et al., 2021; Morgan and Ferguson, 2017). However, age-related declines in vocal emotion perception are not fully explained by age-related hearing or cognitive declines (Dupuis and Pichora-Fuller, 2015; Pichora-Fuller, 2003; Pichora-Fuller et al., 2016a; Ruffman et al., 2009). Older adults show broader difficulties with emotion perception, including interpreting emotional facial expressions (Connolly et al., 2021; Hayes et al., 2020; Vetter et al., 2020). These deficits may reflect a general decline in emotion recognition abilities with age.
Task complexity further exacerbates these challenges. In two experiments designed to explore the effect of sentence context on older adults’ emotion perception, Seddoh et al. (2020) found that older and younger adults performed largely equivalently for sentences with a simple subject-verb-object word order. However, older adults were poorer at emotion perception than younger adults for more complex sentences with non-canonical word orders. One interpretation of these findings, then, is that older adults have insufficient cognitive capacity for suprasegmental perception when the processing task is demanding.
2.5 Conclusion
Across these studies of pitch perception, lexical stress, talker identity, and emotion perception, a common finding emerges: variability stems from auditory, cognitive, and experiential factors. These findings highlight that suprasegmental perception is not simply a function of bottom-up acoustic access, but reflects the listener’s ability to extract and use pitch information in speech. Understanding how these factors drive differences in typically-hearing populations provides a framework for addressing the unique challenges faced by CI users. By identifying shared and distinct factors driving individual outcomes, we can better inform rehabilitation strategies aimed at enhancing suprasegmental perception in both linguistic and nonlinguistic contexts. In the following section, we extend this framework to adult CI users, exploring how listener-level factors shape suprasegmental perception in the context of a degraded auditory signal.
3 Suprasegmental perception in adult CI users
Most research on individual differences in speech perception outcomes among adult CI users has focused on segmental speech perception, using isolated word (e.g., CNC words, Peterson and Lehiste, 1962) and sentence recognition tests (e.g., AzBio sentence recognition test, Spahr et al., 2012). In contrast, suprasegmental perception remains relatively understudied, despite its potential to explain the significant individual variability reported in daily listening experiences and communication outcomes among CI users (Boisvert et al., 2020; Lenarz et al., 2017).
Group-level deficits in suprasegmental perception are well established among CI users, with many studies reporting poor or abnormal performance on linguistic and nonlinguistic tasks requiring robust perception of pitch cues. These include lexical tone perception (Fu et al., 2004; Wei et al., 2004), intonation (Marx et al., 2015), talker discrimination and identification (Cullington and Zeng, 2010, 2011; Hay-McCutcheon et al., 2018), voice gender identification (Fu et al., 2004, 2005; Fuller et al., 2014; Massida et al., 2011, 2013; Meister et al., 2016), and vocal emotion perception (Jiam et al., 2017; Luo et al., 2014; Richter and Chatterjee, 2021; Luo et al., 2007), among others. However, performance varies greatly across individuals and tasks, and depends in part on the specific set of cues used to make suprasegmental judgments (Everhardt et al., 2020). Device-level factors, such as electrode array design, insertion depth, channel interaction, and the quality of the electrode-neuron interface, undoubtedly impact access to pitch cues in CI users and help explain these broad group-level limitations (Limb and Roy, 2014). However, these factors alone do not account for the considerable variation observed among individuals with similar devices or audiologic profiles. A complementary approach is to examine how CI users differ in their ability to extract and use degraded pitch cues, drawing on cognitive, linguistic, and experiential resources.
In this section, we review evidence highlighting individual differences in linguistic and nonlinguistic pitch perception among CI users, focusing on how auditory, cognitive, and linguistic factors influence performance. By linking these findings to insights from typically-hearing populations, we aim to identify shared mechanisms and unique challenges that inform more individualized models of communication and rehabilitation in this population.
3.1 Linguistic and nonlinguistic pitch perception
Adult CI users experience well-documented challenges in pitch perception, which affects both linguistic and nonlinguistic suprasegmental tasks. Compared to typically-hearing adults, CI users face additional constraints due to the limited fidelity of pitch cues conveyed by the device. In particular, pitch perception in CI users is largely based on temporal envelope cues, which support pitch discrimination up to around 300 Hz. Above this range, temporal cues are less reliable, and the absence of place-based pitch cues further limits access to pitch information (Zeng, 2002). Yet despite these limitations, pitch remains a meaningful source of information, and listeners vary in how they extract and apply this information in everyday communication.
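As a rough illustration of why envelope periodicity becomes the dominant pitch cue, the sketch below runs one vocoder-style analysis channel on a synthetic amplitude-modulated tone. The filter orders, band edges, and 300 Hz smoothing cutoff are illustrative assumptions for a textbook-style channel, not any manufacturer’s actual processing strategy.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000
t = np.arange(int(0.2 * fs)) / fs

# Toy voiced-speech band: a 1000 Hz carrier amplitude-modulated at f0 = 150 Hz.
f0, fc = 150.0, 1000.0
x = 0.5 * (1 + np.sin(2 * np.pi * f0 * t)) * np.sin(2 * np.pi * fc * t)

# 1) Band-pass into one analysis channel (here 800-1200 Hz).
band = sosfiltfilt(butter(4, [800, 1200], btype="bandpass", fs=fs, output="sos"), x)

# 2) Rectify and low-pass to extract the temporal envelope. The 1000 Hz fine
#    structure (a spectral pitch cue) is discarded; only the 150 Hz envelope
#    periodicity survives. If f0 were above roughly 300 Hz, this smoothing
#    would erase the periodicity cue as well.
envelope = sosfiltfilt(butter(4, 300.0, btype="low", fs=fs, output="sos"), np.abs(band))

spec = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(len(envelope), 1 / fs)
print(f"Dominant envelope rate: {freqs[np.argmax(spec)]:.0f} Hz")  # ~150 Hz
```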
Variability in cue weighting strategies for voice gender perception illustrates the complexity of this issue. At the group level, typically-hearing listeners use both fundamental frequency and vocal tract length (VTL) cues to identify voice gender, whereas adult CI users predominantly rely on fundamental frequency (Fuller et al., 2014; Gaudrain and Başkent, 2018). The extent to which CI users rely upon VTL and/or F0 cues may also depend on the type and duration of the stimulus (e.g., word vs. sentence) as well as the social category (e.g., gender vs. age) (Meister et al., 2016; Meister et al., 2020; Schweinberger et al., 2020). Across individuals, some CI users make use of VTL cues in addition to fundamental frequency, with variability in cue weighting potentially driven by differences in auditory sensitivity. For example, while many CI users are sufficiently sensitive to fundamental frequency differences to distinguish between typical male and female voices, few demonstrate sufficient resolution to reliably use VTL cues (Fuller et al., 2014; Gaudrain and Başkent, 2018). It remains unclear whether these differences in voice cue sensitivity directly drive cue weighting, or whether listeners’ perceptual strategies also reflect adaptation to degraded input. Understanding these strategies could inform training approaches that support flexible cue use and help CI users more effectively leverage available suprasegmental information.
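Cue weighting of this kind is commonly estimated by regressing trial-by-trial categorization responses on the manipulated voice cues. The sketch below illustrates that logic on simulated listeners; the listener profiles, internal weights, and trial counts are invented for illustration and are not data from Fuller et al. (2014) or Gaudrain and Başkent (2018).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400

# Standardized cue values per trial: F0 and VTL differences from a reference voice.
f0 = rng.uniform(-1, 1, n)    # negative = male-typical F0, positive = female-typical
vtl = rng.uniform(-1, 1, n)   # negative = longer (male-typical) vocal tract

def responses(w_f0, w_vtl, noise=1.0):
    """Simulated 'female' judgments from hypothetical internal cue weights."""
    return (w_f0 * f0 + w_vtl * vtl + rng.normal(0, noise, n) > 0).astype(int)

profiles = {"balanced (TH-like)": (2.0, 2.0), "F0-dominant (CI-like)": (2.0, 0.2)}
for label, (w_f0, w_vtl) in profiles.items():
    fit = LogisticRegression().fit(np.column_stack([f0, vtl]), responses(w_f0, w_vtl))
    b_f0, b_vtl = fit.coef_[0]
    # Relative F0 weight near 0.5 = balanced cue use; near 1.0 = F0-dominant.
    print(f"{label}: relative F0 weight = {abs(b_f0) / (abs(b_f0) + abs(b_vtl)):.2f}")
```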
Residual acoustic hearing offers additional insights into variability in pitch perception. Due to the broadening of candidacy criteria, improved surgical techniques, and changes to electrode designs, many adult CI users are able to retain aidable residual low-frequency acoustic hearing, either in the non-implanted ear (“bimodal” configuration) or in the implanted ear (electric-acoustic stimulation [EAS]). Residual hearing typically enhances performance in pitch-based tasks, including lexical tone, stress, intonation, and talker identity perception (Hay-McCutcheon et al., 2018; Luo et al., 2014; Marx et al., 2015; Most et al., 2011; Zhou et al., 2020). However, this benefit varies widely across studies and individuals, with some studies finding little to no bimodal advantage (Abdeltawwab et al., 2016; Cullington and Zeng, 2010, 2011). While auditory and device-related factors, such as residual hearing thresholds, electrode configuration, and integration of acoustic and electric input, likely influence the extent of benefit from residual hearing (Payne et al., 2023), emerging evidence suggests that cognitive factors may also shape listeners’ ability to make use of acoustic and electric cues during pitch-based tasks (Hua et al., 2017; Nyirjesy et al., 2024). Thus, further research is needed to understand how residual hearing interacts with auditory and cognitive factors to shape listening outcomes.
3.2 Talker identity perception
Adult CI users often face challenges in differentiating and identifying talkers relative to their typically-hearing peers (for a review, see Colby and Orena, 2022). While group-level deficits are well established, performance varies considerably across individuals, with some CI users potentially relying more heavily on cognitive resources. For example, Tamati et al. (2024) found that, at the group level, CI users demonstrated greater sensitivity and less bias when processing lexically easy words (high frequency, low neighborhood density) compared to lexically hard words (low frequency, high neighborhood density). At the individual level, performance varied substantially, with some CI users maintaining relatively stable accuracy across lexical difficulty, while others showed pronounced declines for lexically hard words. Exploratory analyses further revealed that bilateral and bimodal CI users demonstrated greater sensitivity in discriminating same-gender talkers, particularly for lexically hard words, compared to unilateral CI users. These findings suggest that access to additional acoustic information may enhance the ability to utilize top-down linguistic cues in suprasegmental tasks.
Other studies have more directly investigated the contribution of cognitive-linguistic abilities to talker perception. Li et al. (2022) reported that talker discrimination accuracy was related to several core cognitive functions, including inhibitory control, speed of lexical access, and nonverbal reasoning. These findings provide further support for the idea that suprasegmental perception is shaped not only by auditory input but also by cognitive-linguistic ability in adult CI users. Further, they align with cognitive hearing science frameworks such as ELU and FUEL (Rönnberg et al., 2013; Pichora-Fuller et al., 2016a), which propose that listeners rely more upon cognitive-linguistic resources when listening conditions are suboptimal. While these frameworks have largely been applied to broader speech understanding outcomes, they may also be useful for understanding and explaining individual differences in talker perception, as well as other pitch-based suprasegmental linguistic and nonlinguistic tasks. Taken together, this body of work highlights the importance of considering both auditory and cognitive-linguistic factors when evaluating individual differences in talker perception among adult CI users.
3.3 Lexical stress perception
Suprasegmental cues, including pitch, intensity, and duration, not only support talker identification but also serve linguistic functions: emphasizing certain syllables to distinguish individual words and determine meaning (i.e., lexical stress), emphasizing words in sentences to convey meaning (i.e., sentence stress), and communicating grammatical information about a phrase or sentence (i.e., intonation). In English, stressed syllables are typically marked by higher pitch, greater intensity, longer duration, and fuller vowel quality, while vowels in unstressed syllables are often reduced (Bolinger, 1961; Lehiste, 1970). Typically-hearing listeners rely most heavily on vowel quality for lexical stress perception, followed by pitch, intensity, and duration (Chrabaszcz et al., 2014). In contrast, CI users generally demonstrate weak or abnormal lexical stress perception (Dincer D’Alessandro and Mancini, 2019; Everhardt et al., 2020; Morris et al., 2013; Most et al., 2011).
Recent work has begun to clarify the perceptual strategies used to perceive lexical stress by CI users. Fleming and Winn (2022) found that CI users relied more on vowel duration and intensity cues and less on pitch and vowel quality, compared to typically-hearing adults. Interestingly, CI users used both duration and pitch to a greater extent than the typically hearing adults listening to fully vocoded speech, suggesting that long-term experience with degraded signals may lead CI users to adopt more flexible cue weighting strategies. Simulations of bimodal hearing indicated low-frequency acoustic information could enhance reliance on pitch and vowel quality. Meister et al. (2011) examined sentence stress perception in adult CI users. They found that CI users were less sensitive to pitch and intensity than typically-hearing listeners, but performed similarly on duration-based tasks. Interestingly, discrimination of covarying pitch and intensity changes was more strongly correlated with identification of sentence stress in natural speech, compared to duration or pitch or intensity alone, suggesting that the use of both cues may help to compensate for degraded access to individual prosodic cues. These findings point to individual differences in auditory access and adaptive cue weighting as key contributors to variability in lexical stress and sentence stress perception among CI users.
Adult CI users similarly demonstrate relatively weak or abnormal intonation perception compared to typically-hearing peers. For example, Marx et al. (2015) found that CI users with residual hearing (thresholds better than 60 dB in the lower frequencies, 125 to 500 Hz), who were tested in their best-aided configuration, had better question/statement discrimination than CI users without any functional residual hearing, although both performed worse than typically-hearing adults. Sensitivity declined when pitch was removed, suggesting that pitch plays a role for these groups. Further, for the CI users with residual hearing, mean residual hearing level and performance on an f0 discrimination task were related to question/statement discrimination, reinforcing the critical role of pitch. Yet, it is important to note that reliance on secondary cues may increase susceptibility to additional degradations in challenging listening environments (Peng et al., 2008) and may lead to increased listening effort (Amichetti et al., 2021).
3.4 Emotion perception
Emotion perception presents another significant challenge for CI users. Luo et al. (2007) found that whereas listeners with typical hearing identified emotions with near-ceiling accuracy (89.9%) in a five-alternative forced-choice task, CI users, tested with only their implants, achieved much lower but above chance performance (44.9%). Similar findings across studies using different methodological approaches (Chatterjee et al., 2015; Jiam et al., 2017; Richter and Chatterjee, 2021) confirm that deficits in pitch perception substantially impact emotional prosody perception.
To compensate, CI users may rely on suprasegmental cues that are more reliably conveyed by the device, such as intensity and duration cues, or leverage linguistic knowledge. For example, Richter and Chatterjee (2021) showed that CI users relied more heavily on lexico-semantic cues when prosodic and semantic cues to emotion were incongruent, whereas typically-hearing listeners relied heavily on prosodic cues. Similarly, Taitelbaum-Swead et al. (2022) reported that CI users with stronger prosody-based emotion identification also showed stronger intonation perception, highlighting the shared reliance on pitch in both tasks.
Research on the perception of speaker sincerity, closely linked to emotion, offers additional insights into individual differences. Rothermich et al. (2022) showed that while CI users had difficulty identifying insincere speech, such as sarcasm or teasing, their performance improved with visual cues or verbal context. Bimodal users (CI + hearing aid) were more accurate at identifying speaker sincerity in the auditory-only condition, likely due to stronger pitch perception. Importantly, CI users who benefited more from visual cues showed more accurate understanding of the content of the conversation in the auditory-only condition. This suggests that when speech understanding is less effortful, listeners may be better able to allocate cognitive resources toward integrating multimodal cues. These findings further emphasize the importance of considering cognitive-linguistic factors in explaining variability in emotion perception among adult CI users.
3.5 Conclusion
This section illustrates how auditory and cognitive-linguistic factors jointly shape suprasegmental perception in adult CI users. Across tasks involving lexical stress, talker identity, and emotion perception, we see that individual outcomes reflect not only the degraded nature of CI input but also variability in listeners’ perceptual strategies and cognitive-linguistic skills. These findings suggest that optimal (re)habilitation strategies may need to be tailored to each listener’s unique combination of hearing configuration, auditory sensitivity, and cognitive strengths. Leveraging top-down strategies, such as linguistic content and visual cues, may help CI users mitigate challenges in pitch-based suprasegmental perception and support more effective communication in real-world settings. In the next section, we will build on these findings to consider how auditory training approaches can be designed to support suprasegmental perception.
4 Future rehabilitation targets
Before suggesting possible rehabilitation targets, we wish to emphasize that this is a Perspective article. Though we have made every effort to provide an overview of the literature regarding suprasegmental perception, we did not conduct a meta-analysis of the available data nor evaluate the quality of the literature. As a result, some of the possible rehabilitation targets may prove to be more efficacious than others, and we encourage researchers to rigorously test the efficacy of approaches to improve suprasegmental perception for listeners who use CIs.
While bottom-up auditory factors, including device-level constraints and listener-specific hearing characteristics, limit access to pitch cues, top-down cognitive factors offer a promising target for improving suprasegmental perception through training. For both typically-hearing listeners and CI users, cognitive load may increase when the talker’s language background differs from the listener’s or when the speech signal is degraded. There is evidence indicating that listeners are more successful at perceiving suprasegmental information when the listening load is lower (Xie and Myers, 2015), and increases in cognitive load are therefore thought to directly contribute to difficulties with suprasegmental speech perception. When listeners devote all their cognitive resources to accurately perceiving the segmental information, few resources remain for perceiving suprasegmental information (Ip and Cutler, 2020; Seddoh et al., 2020). To that end, rehabilitation efforts that reduce processing demands may indirectly lead to improvements in the perception of suprasegmental information.
If it is correct that CI users devote most of their cognitive resources to perceiving segments (Ip and Cutler, 2020), automatizing the perception of segments should free up resources for suprasegmental pitch perception (Pichora-Fuller et al., 2016b; Rönnberg et al., 2013). Insights from the L2 literature have shown that training can lead to improvements in L2 speech perception accuracy, though outcomes are variable, with some listeners showing larger training gains than others (for reviews, see Ingvalson et al., 2014; Ingvalson and Wong, 2016). One way to reduce post-training outcome variability is to match training paradigms to listeners’ baseline abilities (Chandrasekaran et al., 2010; Golestani and Zatorre, 2009; Ingvalson et al., 2013a). Applying this approach to speech segment training in CI users, Moberly and colleagues found that phonological awareness training (i.e., rhyme detection, phoneme matching, and sound blending) led to improved speech perception accuracy in CI users with weak phonological sensitivity (Moberly et al., 2017). Future work could apply other lessons learned from the L2 speech training literature to further automatize speech segment perception for CI users.
Another approach involves auditory-cognitive training, which combines auditory skills (e.g., gap detection, phoneme recognition) with domain-general cognitive abilities (e.g., working memory capacity). Though brain training in general has not been demonstrated to be effective at improving everyday cognitive performance (Simons et al., 2016), there is evidence that auditory-cognitive training can improve listeners’ speech perception (Ingvalson et al., 2015; Lelo de Larrea-Mancera et al., 2022; Mishra and Boddupally, 2018). These training paradigms are thought to improve processing capacity and efficiency, reducing the cognitive load necessary for speech perception and freeing up cognitive resources for other linguistic tasks, including suprasegmental pitch perception (Ingvalson and Wong, 2013). In CI users, auditory-cognitive training has been shown to improve speech perception in noise (Ingvalson et al., 2013b; Mishra et al., 2015). Such training, potentially improving speech processing efficiency and capacity, may lead to improved suprasegmental perception. Future work in this area could explicitly explore the impacts of auditory-cognitive training on suprasegmental tasks.
Finally, CI users’ pitch perception may be directly trainable, as evidenced by lexical tone training in prelingually deafened Mandarin-speaking children. Interestingly, these efforts have also borrowed from the L2 speech training literature, capitalizing on efforts to teach native English speakers lexical tone (Cooper and Wang, 2013; Wang et al., 1999, 2003). These studies have found that training paradigms that utilize multiple talkers, include lexical tone in multiple phonetic contexts, and provide feedback on children’s responses are effective in improving lexical tone perception (Zhang D. et al., 2023; Zhang H. et al., 2023; Zhang et al., 2021, 2024). While promising, these training paradigms have primarily been developed for typically-hearing L2 learners or pediatric CI users, and their efficacy in adult CI users has not been established. Differences in linguistic development, auditory plasticity, and cognitive abilities across populations highlight the need for adult-specific research. Adapting these methods for adult CI users, future studies could investigate how training programs tailored to baseline abilities impact pitch perception and whether these improvements translate to better suprasegmental processing. One exciting possibility for this avenue of research is that directly targeting and improving pitch perception could increase the automaticity of suprasegmental perception (Leitman et al., 2009), which in turn could free up cognitive resources for segmental perception and meaning comprehension, or simply alleviate listening-related fatigue.
5 Conclusion
Understanding variability in suprasegmental perception is critical for advancing both theoretical frameworks and clinical outcomes in CI users. We note that much of the suprasegmental research to date has focused on talker-level sources of variation; in this Perspective we have highlighted listener-level sources of variability, as rehabilitation is more likely to take place at the level of the individual listener. Variability in lexical stress, talker identity, and emotion perception demonstrates the interplay between auditory limitations and individual factors, such as residual hearing, language background, and cognitive abilities. These differences highlight the inadequacy of one-size-fits-all approaches in auditory rehabilitation and emphasize the importance of tailoring interventions to individual needs.
An increased focus on individual differences offers an opportunity to develop more effective rehabilitation strategies. In the final section we highlighted research suggesting segmental speech training, auditory-cognitive training, and/or pitch perception training may be effective for improving pitch-based linguistic or nonlinguistic perception. However, much of this work has been done in either typically-hearing adults learning an L2 or in pediatric CI users, meaning it remains unknown how effective these training paradigms will be for adult CI recipients, who were the focus of this paper.
A shift in our approach to rehabilitation for adult CI users is needed. Moving beyond group-level generalizations to embrace individual profiles will enable more targeted and effective interventions. The rapid advancements in artificial intelligence provide an exciting opportunity to personalize training approaches based on a given CI user’s strengths and weaknesses. To support such personalized training, we encourage others in the field to help close the gap in research regarding listener-level sources of variability and their malleability. This knowledge, combined with new technologies, has the potential to improve outcomes for both typically-hearing listeners and hearing-impaired listeners who struggle with pitch-based suprasegmental perception, and we are excited to see what the future holds.
Statements
Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.
Author contributions
TT: Conceptualization, Investigation, Writing – original draft, Writing – review & editing. EI: Conceptualization, Investigation, Writing – original draft, Writing – review & editing.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author(s) declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.
Generative AI statement
The authors declare that no Gen AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1
Abberton E. Fourcin A. J. (1978). Intonation and speaker identification. Lang. Speech21, 305–318. doi: 10.1177/002383097802100405,
2
Abdeltawwab M. M. Khater A. El-Anwar M. W. (2016). Contralateral bimodal stimulation: a way to enhance speech performance in arabic-speaking cochlear implant patients. ORL78, 126–135. doi: 10.1159/000381024,
3
Abu El Adas S. A. Levi S. V. (2022). Phonotactic and lexical factors in talker discrimination and identification. Atten. Percept. Psychophys.84, 1788–1804. doi: 10.3758/s13414-022-02485-4
4
Adank P. Janse E. (2010). Comprehension of a novel accent by young and older listeners. Psychol. Aging25, 736–740. doi: 10.1037/a0020054,
5
Amichetti N. M. Neukam J. Kinney A. J. Capach N. March S. U. Svirsky M. A. et al . (2021). Adults with cochlear implants can use prosody to determine the clausal structure of spoken sentences. J. Acoust. Soc. Am.150, 4315–4328. doi: 10.1121/10.0008899,
6
Amorim M. Anikin A. Mendes A. J. Lima C. F. Kotz S. A. Pinheiro A. P. (2021). Changes in vocal emotion recognition across the life span. Emotion21, 315–325. doi: 10.1037/emo0000692,
7
Antoniou M. Best C. T. Tyler M. D. Kroos C. (2010). Language context elicits native-like stop voicing in early bilinguals’ productions in both L1 and L2. J. Phon.38, 640–653. doi: 10.1016/j.wocn.2010.09.005,
8
Arndt C. Schlemmer K. van der Meer E. (2020). Same or different pitch? Effects of musical expertise, pitch difference, and auditory task on the pitch discrimination ability of musicians and non-musicians. Exp. Brain Res.238, 247–258. doi: 10.1007/s00221-019-05707-8,
9
Bachorowski J.-A. (1999). Vocal expression and perception of emotion. Curr. Dir. Psychol. Sci.8, 53–57. doi: 10.1111/1467-8721.00013
10
Beckman M. E. Edwards J. (1994). “Articulatory evidence for differentiating stress categories” in Phonological structure and phonetic form: papers in laboratory phonology. ed. KeatingP. A. (Cambridge: Cambridge University Press), III, 7–III, 33.
11
Bent T. Baese-Berk M. Borrie S. A. McKee M. (2016). Individual differences in the perception of regional, nonnative, and disordered speech varieties. J. Acoust. Soc. Am.140, 3775–3786. doi: 10.1121/1.4966677,
12
Boisvert I. Reis M. Au A. Cowan R. Dowell R. C. (2020). Cochlear implantation outcomes in adults: a scoping review. PLoS One15:e0232421. doi: 10.1371/journal.pone.0232421,
13
Bolinger D. L. (1961). Contrastive accent and contrastive stress. Language37, 83–96. doi: 10.2307/411252
14
Chandrasekaran B. Sampath P. D. Wong P. C. M. (2010). Individual variability in cue-weighting and lexical tone learning. J. Acoust. Soc. Am.128, 456–465. doi: 10.1121/1.3445785,
15
Chatterjee M. Peng S. C. (2008). Processing F0 with cochlear implants: modulation frequency discrimination and speech intonation recognition. Hear. Res.235, 143–156. doi: 10.1016/j.heares.2007.11.004,
16
Chatterjee M. Zion D. Deroche M. L. Burianek B. Limb C. Goren A. et al . (2015). Voice emotion recognition by cochlear-implanted children and their normally-hearing peers. Hear. Res.322, 151–162. doi: 10.1016/j.heares.2014.10.003,
17
Cho C. M. Dewaele J.-M. (2021). A crosslinguistic study of the perception of emotional intonation: influence of the pitch modulations. Stud. Second. Lang. Acquis.43, 870–895. doi: 10.1017/S0272263120000674
18
Choi W. Tong X. Samuel A. G. (2019). Better than native: tone language experience enhances English lexical stress discrimination in Cantonese-English bilingual listeners. Cognition189, 188–192. doi: 10.1016/j.cognition.2019.04.004,
19
Chrabaszcz A. Winn M. Lin C. Y. Idsardi W. J. (2014). Acoustic cues to perception of word stress by English, mandarin, and Russian speakers. J. Speech Lang. Hear. Res.57, 1468–1479. doi: 10.1044/2014_JSLHR-L-13-0279,
20
Colby S. Orena A. J. (2022). Recognizing voices through a cochlear implant: a systematic review of voice perception, talker discrimination, and talker identification. J. Speech Lang. Hear. Res.65, 3165–3194. doi: 10.1044/2022_JSLHR-21-00209,
21
Compton A. J. (1963). Effects of filtering and vocal duration upon the identification of speakers, aurally. J. Acoust. Soc. Am.35, 1748–1752. doi: 10.1121/1.1918810
22
Connell K. HĂĽls S. MartĂnez-GarcĂa M. T. Qin Z. Shin S. Yan H. et al . (2018). English learners’ use of segmental and suprasegmental cues to stress in lexical access: an eye-tracking study. Lang. Learn.68, 635–668. doi: 10.1111/lang.12288
23
Connolly H. L. Young A. W. Lewis G. J. (2021). Face perception across the adult lifespan: evidence for age-related changes independent of general intelligence. Cognit. Emot.35, 890–901. doi: 10.1080/02699931.2021.1901657,
24
Cooper N. Cutler A. Wales R. (2002). Constraints of lexical stress on lexical access in English: evidence from native and non-native listeners. Lang. Speech45, 207–228. doi: 10.1177/00238309020450030101,
25
Cooper A. Wang Y. (2013). Effects of tone training on Cantonese tone-word learning. J. Acoust. Soc. Am.134:EL133–EL139. doi: 10.1121/1.4812435,
26
Cullington H. E. Zeng F.-G. (2010). Comparison of bimodal and bilateral cochlear implant users. Cochlear Implants Int.11, 67–74. doi: 10.1179/146701010X12671177440262,
27
Cullington H. E. Zeng F.-G. (2011). Comparison of bimodal and bilateral cochlear implant users on speech recognition with competing talker, music perception, affective prosody discrimination, and talker identification. Ear Hearing32, 16–30. doi: 10.1097/AUD.0b013e3181edfbd2,
28
Dincer D’Alessandro H. Mancini P. (2019). Perception of lexical stress cued by low-frequency pitch and insights into speech perception in noise for cochlear implant users and normal hearing adults. Eur. Arch. Otorrinolaringol.276, 2673–2680. doi: 10.1007/s00405-019-05502-9,
29
Dupuis K. Pichora-Fuller M. K. (2015). Aging affects identification of vocal emotions in semantically neutral sentences. J. Speech Lang. Hear. Res.58, 1061–1076. doi: 10.1044/2015_JSLHR-H-14-0256,
30
Everhardt M. K. Sarampalis A. Coler M. Başkent D. Lowie W. (2020). Meta-analysis on the identification of linguistic and emotional prosody in cochlear implant users and vocoder simulations. Ear Hearing41, 1092–1102. doi: 10.1097/AUD.0000000000000863,
31
Flege J. E. Fletcher K. L. (1992). Talker and listener effects on degree of perceived foreign accent. J. Acoust. Soc. Am.91, 370–389. doi: 10.1121/1.402780,
32
Flege J. E. Munro M. J. MacKay I. R. A. (1995). Factors affecting strength of perceived foreign accent in a second language. J. Acoust. Soc. Am.97, 3125–3134. doi: 10.1121/1.413041,
33
Fleming J. T. Winn M. B. (2022). Strategic perceptual weighting of acoustic cues for word stress in listeners with cochlear implants, acoustic hearing, or simulated bimodal hearing. J. Acoust. Soc. Am.152, 1300–1316. doi: 10.1121/10.0013890,
34
Fu Q.-J. Chinchilla S. Galvin J. J. (2004). The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users. J. Assoc. Res. Otolaryngol.5, 253–260. doi: 10.1007/s10162-004-4046-1,
35
Fu Q.-J. Chinchilla S. Nogaki G. Galvin J. J. (2005). Voice gender identification by cochlear implant users: the role of spectral and temporal resolution. J. Acoust. Soc. Am.118, 1711–1718. doi: 10.1121/1.1985024,
36
Fuller C. D. Gaudrain E. Clarke J. N. Galvin J. J. Fu Q.-J. Free R. H. et al . (2014). Gender categorization is abnormal in cochlear implant users. J. Assoc. Res. Otolaryngol.15, 1037–1048. doi: 10.1007/s10162-014-0483-7,
37
Gaudrain E. Başkent D. (2018). Discrimination of voice pitch and vocal-tract length in cochlear implant users. Ear Hearing39, 226–237. doi: 10.1097/AUD.0000000000000480,
38
Globerson E. Amir N. Golan O. Kishon-Rabin L. Lavidor M. (2013). Psychoacoustic abilities as predictors of vocal emotion recognition. Atten. Percept. Psychophys.75, 1799–1810. doi: 10.3758/s13414-013-0518-x,
39
Goggin J. P. Thompson C. P. Strube G. Simental L. R. (1991). The role of language familiarity in voice identification. Mem. Cogn.19, 448–458. doi: 10.3758/BF03199567,
40
Golestani N. Zatorre R. J. (2009). Individual differences in the acquisition of second language phonology. Brain Lang.109, 55–67. doi: 10.1016/j.bandl.2008.01.005,
41
Goss S. (2020). Exploring variation in nonnative Japanese learners’ perception of lexical pitch accent: the roles of processing resources and learning context. Appl. Psycholinguist.41, 25–49. doi: 10.1017/S0142716419000377
42
Hayes B. (1995). Metrical stress theory: principles and case studies. Chicago, IL: University of Chicago Press.
43
Hayes G. S. McLennan S. N. Henry J. D. Phillips L. H. Terrett G. Rendell P. G. et al . (2020). Task characteristics influence facial emotion recognition age-effects: a meta-analytic review. Psychol. Aging35, 295–315. doi: 10.1037/pag0000441,
44
Hay-McCutcheon M. J. Peterson N. R. Pisoni D. B. Kirk K. I. Yang X. Parton J. (2018). Performance variability on perceptual discrimination tasks in profoundly deaf adults with cochlear implants. J. Commun. Disord.72, 122–135. doi: 10.1016/j.jcomdis.2018.01.005,
45
Hua H. Johansson B. Magnusson L. Lyxell B. Ellis R. J. (2017). Speech recognition and cognitive skills in bimodal Cochlear implant users. J. Speech Lang. Hear. Res.60, 2752–2763. doi: 10.1044/2017_JSLHR-H-16-0276,
46
Ingvalson E. M. Barr A. M. Wong P. C. M. (2013a). Poorer phonetic perceivers show greater benefit in phonetic-phonological speech learning. J. Speech Lang. Hear. Res.56, 1045–1050. doi: 10.1044/1092-4388(2012/12-0024),
47
Ingvalson E. M. Dhar S. Wong P. C. M. Liu H. (2015). Working memory training to improve speech perception in noise across languages. J. Acoust. Soc. Am.137, 3477–3486. doi: 10.1121/1.4921601,
48
Ingvalson E. M. Ettlinger M. Wong P. C. M. (2014). Bilingual speech perception and learning: a review of recent trends. Int. J. Bilingual.18, 35–47. doi: 10.1177/1367006912456586
49
Ingvalson E. M. Lee B. Fiebig P. Wong P. C. M. (2013b). The effects of short-term computerized speech-in-noise training on post-lingually deafened adult cochlear implant recipients. J. Speech Lang. Hear. Res.56, 81–88. doi: 10.1044/1092-4388(2012/11-0291),
50
Ingvalson E. M. Wong P. C. M. (2013). Training to improve language outcomes in cochlear implant recipients. Front. Psychol.4, 1–9. doi: 10.3389/fpsyg.2013.00263,
51
Ingvalson E. M. Wong P. C. M. (2016). “Auditory training: predictors of success and optimal training paradigms” in Pediatric cochlear implantation: learning and the brain. eds. YoungN. M.KirkK. I.. 1st ed (New York: Springer), 293–298.
52
Ip M. H. K. Cutler A. (2020). Universals of listening: equivalent prosodic entrainment in tone and non-tone languages. Cognition202:104311. doi: 10.1016/j.cognition.2020.104311,
53
Jiam N. T. Caldwell M. Deroche M. L. Chatterjee M. Limb C. J. (2017). Voice emotion perception and production in cochlear implant users. Hear. Res.352, 30–39. doi: 10.1016/j.heares.2017.01.006,
54
Karimi-Boroujeni M. Dajani H. R. Giguère C. (2023). Perception of prosody in hearing-impaired individuals and users of hearing assistive devices: an overview of recent advances. J. Speech Lang. Hear. Res.66, 775–789. doi: 10.1044/2022_JSLHR-22-00125,
55
Ladd D. R. Turnbull R. Browne C. Caldwell-Harris C. Ganushchak L. Swoboda K. et al . (2013). Patterns of individual differences in the perception of missing-fundamental tones. J. Exp. Psychol. Hum. Percept. Perform.39, 1386–1397. doi: 10.1037/a0031261,
56
Lass N. J. Phillips J. K. Bruchey C. A. (1980). The effect of filtered speech on speaker height and weight identification. J. Phon.8, 91–100. doi: 10.1016/S0095-4470(19)31453-6
57
Lehiste I. (1970). Suprasegmentals. Cambridge, MA: Massachusetts Inst. of Technology Press, viii, 194.
58
Leitman D. Foxe J. J. Sehatpour P. Shpaner M. Javitt D. C. (2009). Mismatch negativity to tonal contours suggests preattentive perception of prosodic content. Brain Imaging Behav.3, 284–291. doi: 10.1007/s11682-009-9070-7,
59
Lelo de Larrea-Mancera E. S. Philipp M. A. Stavropoulos T. Carrillo A. A. Cheung S. Koerner T. K. et al . (2022). Training with an auditory perceptual learning game transfers to speech in competition. J. Cogn. Enhanc.6, 47–66. doi: 10.1007/s41465-021-00224-5,
60
Lenarz T. Muller L. Czerniejewska-Wolska H. Vallés Varela H. Orús Dotú C. Durko M. et al . (2017). Patient-related benefits for adults with cochlear implantation: a multicultural longitudinal observational study. Audiol. Neurotol.22, 61–73. doi: 10.1159/000477533,
61
Li M. M. Moberly A. C. Tamati T. N. (2022). Factors affecting talker discrimination ability in adult cochlear implant users. J. Commun. Disord.99:106255. doi: 10.1016/j.jcomdis.2022.106255,
62
Limb C. J. Roy A. T. (2014). Technological, biological, and acoustical constraints to music perception in cochlear implant users. Hear. Res.308, 13–26. doi: 10.1016/j.heares.2013.04.009,
Luo X. Chang Y. Lin C.-Y. Chang R. Y. (2014). Contribution of bimodal hearing to lexical tone normalization in Mandarin-speaking cochlear implant users. Hear. Res. 312, 1–8. doi: 10.1016/j.heares.2014.02.005
Luo X. Fu Q.-J. Galvin J. J. (2007). Vocal emotion recognition by normal-hearing listeners and cochlear implant users. Trends Amplif. 11, 301–315. doi: 10.1177/1084713807305301
MacWhinney B. (2009). “A unified model of language acquisition” in Handbook of bilingualism. eds. Kroll J. F. and De Groot A. M. B. (New York, NY: Oxford University Press), 49–67.
Marx M. James C. Foxton J. Capber A. Fraysse B. Barone P. et al. (2015). Speech prosody perception in cochlear implant users with and without residual hearing. Ear Hear. 36, 239–248. doi: 10.1097/AUD.0000000000000105
Massida Z. Belin P. James C. Rouger J. Fraysse B. Barone P. et al. (2011). Voice discrimination in cochlear-implanted deaf subjects. Hear. Res. 275, 120–129. doi: 10.1016/j.heares.2010.12.010
Massida Z. Marx M. Belin P. James C. Fraysse B. Barone P. et al. (2013). Gender categorization in cochlear implant users. J. Speech Lang. Hear. Res. 56, 1389–1401. doi: 10.1044/1092-4388(2013/12-0132)
Meister H. Fuersen K. Streicher B. Lang-Roth R. Walger M. (2020). Letter to the editor concerning Skuk et al., "Parameter-specific morphing reveals contributions of timbre and fundamental frequency cues to the perception of voice gender and age in cochlear implant users". J. Speech Lang. Hear. Res. 63, 4325–4326. doi: 10.1044/2020_JSLHR-20-00563
Meister H. Fürsen K. Streicher B. Lang-Roth R. Walger M. (2016). The use of voice cues for speaker gender recognition in cochlear implant recipients. J. Speech Lang. Hear. Res. 59, 546–556. doi: 10.1044/2015_JSLHR-H-15-0128
Meister H. Landwehr M. Pyschny V. Wagner P. Walger M. (2011). The perception of sentence stress in cochlear implant recipients. Ear Hear. 32, 459–467. doi: 10.1097/AUD.0b013e3182064882
Mishra S. K. Boddupally S. P. (2018). Auditory cognitive training for pediatric cochlear implant recipients. Ear Hear. 39, 48–59. doi: 10.1097/AUD.0000000000000462
Mishra S. K. Boddupally S. P. Rayapati D. (2015). Auditory learning in children with cochlear implants. J. Speech Lang. Hear. Res. 58, 1052–1060. doi: 10.1044/2015_JSLHR-H-14-0340
Moberly A. C. Bates C. Boyce L. Vasil K. Baxter J. Ray Y. (2017). Computerized rehabilitative training in older adult cochlear implant users: a feasibility study. J. Acad. Rehabil. Audiol. 50, 13–27.
Moore B. C. (2008). The role of temporal fine structure processing in pitch perception, masking, and speech perception for normal-hearing and hearing-impaired people. J. Assoc. Res. Otolaryngol. 9, 399–406. doi: 10.1007/s10162-008-0143-x
Morgan S. D. Ferguson S. H. (2017). Judgments of emotion in clear and conversational speech by young adults with normal hearing and older adults with hearing impairment. J. Speech Lang. Hear. Res. 60, 2271–2280. doi: 10.1044/2017_JSLHR-H-16-0264
Morris D. Magnusson L. Faulkner A. Jönsson R. Juul H. (2013). Identification of vowel length, word stress, and compound words and phrases by postlingually deafened cochlear implant listeners. J. Am. Acad. Audiol. 24, 879–890. doi: 10.3766/jaaa.24.9.11
Most T. Harel T. Shpak T. Luntz M. (2011). Perception of suprasegmental speech features via bimodal stimulation: cochlear implant on one ear and hearing aid on the other. J. Speech Lang. Hear. Res. 54, 668–678. doi: 10.1044/1092-4388(2010/10-0071)
Muradás-Taylor B. (2022). Accuracy and stability in English speakers’ production of Japanese pitch accent. Lang. Speech 65, 377–403. doi: 10.1177/00238309211022376
Nusbaum H. C. Magnuson J. S. (1997). “Talker normalization: phonetic constancy as a cognitive process” in Talker variability in speech processing. eds. Johnson K. A. and Mullennix J. W. (New York, NY: Academic Press), 109–132.
Nygaard L. C. Pisoni D. B. (1998). Talker-specific learning in speech perception. Percept. Psychophys. 60, 355–376. doi: 10.3758/BF03206860
Nyirjesy S. C. Lewis J. H. Hallak D. Conroy S. Moberly A. C. Tamati T. N. (2024). Evaluating listening effort in unilateral, bimodal, and bilateral cochlear implant users. Otolaryngol. Head Neck Surg. 170, 1147–1157. doi: 10.1002/ohn.609
Oxenham A. J. (2013). Revisiting place and temporal theories of pitch. Acoust. Sci. Technol. 34, 388–396. doi: 10.1250/ast.34.388
Payne J. Au A. Dowell R. C. (2023). An overview of factors affecting bimodal and electric-acoustic stimulation (EAS) speech understanding outcomes. Hear. Res. 431:108736. doi: 10.1016/j.heares.2023.108736
Peng S.-C. Tomblin J. B. Turner C. W. (2008). Production and perception of speech intonation in pediatric cochlear implant recipients and individuals with normal hearing. Ear Hear. 29, 336–351. doi: 10.1097/AUD.0b013e318168d94d
Perrachione T. K. Wong P. C. M. (2007). Learning to recognize speakers of a non-native language: implications for the functional organization of human auditory cortex. Neuropsychologia 45, 1899–1910. doi: 10.1016/j.neuropsychologia.2006.11.015
Peterson G. E. Lehiste I. (1962). Revised CNC lists for auditory tests. J. Speech Hear. Disord. 27, 62–70. doi: 10.1044/jshd.2701.62
Pichora-Fuller M. K. (2003). Cognitive aging and auditory information processing. Int. J. Audiol. 42, 2S26–2S32.
Pichora-Fuller M. K. Dupuis K. Smith S. L. (2016a). Effects of vocal emotion on memory in younger and older adults. Exp. Aging Res. 42, 14–30. doi: 10.1080/0361073X.2016.1108734
Pichora-Fuller M. K. Kramer S. E. Eckert M. A. Edwards B. Hornsby B. W. Y. Humes L. E. et al. (2016b). Hearing impairment and cognitive energy: the framework for understanding effortful listening (FUEL). Ear Hear. 37, 5S–27S. doi: 10.1097/AUD.0000000000000312
Pierrehumbert J. B. (1980). The phonology and phonetics of English intonation [doctoral thesis]. Cambridge, MA: Massachusetts Institute of Technology.
Remez R. E. Fellowes J. M. Nagel D. S. (2007). On the perception of similarity among talkers. J. Acoust. Soc. Am. 122, 3688–3696. doi: 10.1121/1.2799903
Richter M. E. Chatterjee M. (2021). Weighting of prosodic and lexical-semantic cues for emotion identification in spectrally-degraded speech and with cochlear implants. Ear Hear. 42, 1727–1740. doi: 10.1097/AUD.0000000000001057
Rönnberg J. Lunner T. Zekveld A. Sörqvist P. Danielsson H. Lyxell B. et al. (2013). The ease of language understanding (ELU) model: theoretical, empirical, and clinical advances. Front. Syst. Neurosci. 7:31. doi: 10.3389/fnsys.2013.00031
Rosen S. (1992). Temporal information in speech: acoustic, auditory and linguistic aspects. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 336, 367–373.
Rothermich K. Dixon S. Weiner M. Capps M. Dong L. Paquette S. et al. (2022). Perception of speaker sincerity in complex social interactions by cochlear implant users. PLoS One 17:e0269652. doi: 10.1371/journal.pone.0269652
Ruffman T. Sullivan S. Dittrich W. (2009). Older adults’ recognition of bodily and auditory expressions of emotion. Psychol. Aging 24, 614–622. doi: 10.1037/a0016356
Schweinberger S. R. von Eiff C. I. Kirchen L. Oberhoffner T. Guntinas-Lichius O. Dobel C. et al. (2020). The role of stimulus type and social signal for voice perception in cochlear implant users: response to the letter by Meister et al. J. Speech Lang. Hear. Res. 63, 4327–4328. doi: 10.1044/2020_JSLHR-20-00595
Seddoh A. Blay A. Ferraro R. Swisher W. (2020). Prosodic perception in aging individuals: a focus on intonation. Curr. Psychol. 39, 1221–1233. doi: 10.1007/s12144-018-9806-x
Semal C. Demany L. (2006). Individual differences in the sensitivity to pitch direction. J. Acoust. Soc. Am. 120, 3907–3915. doi: 10.1121/1.2357708
Simons D. J. Boot W. R. Charness N. Gathercole S. E. Chabris C. F. Hambrick D. Z. et al. (2016). Do “brain-training” programs work? Psychol. Sci. Public Interest 17, 103–186. doi: 10.1177/1529100616661983
Spahr A. J. Dorman M. F. Litvak L. M. Van Wie S. Gifford R. H. Loizou P. C. et al. (2012). Development and validation of the AzBio sentence lists. Ear Hear. 33, 112–117. doi: 10.1097/AUD.0b013e31822c2549
Stevenage S. V. Clarke G. McNeill A. (2012). The “other-accent” effect in voice recognition. J. Cogn. Psychol. 24, 647–653. doi: 10.1080/20445911.2012.675321
Taitelbaum-Swead R. Icht M. Ben-David B. M. (2022). More than words: the relative roles of prosody and semantics in the perception of emotions in spoken language by postlingual cochlear implant users. Ear Hear. 43, 1378–1389. doi: 10.1097/AUD.0000000000001199
Tamati T. N. Jebens A. Başkent D. (2024). Lexical effects on talker discrimination in adult cochlear implant users. J. Acoust. Soc. Am. 155, 1631–1640. doi: 10.1121/10.0025011
Terhardt E. (1979). Calculating virtual pitch. Hear. Res. 1, 155–182. doi: 10.1016/0378-5955(79)90025-X
Thompson C. P. (1987). A language effect in voice identification. Appl. Cogn. Psychol. 1, 121–131. doi: 10.1002/acp.2350010205
Van Hedger S. C. Heald S. L. M. Koch R. Nusbaum H. C. (2015). Auditory working memory predicts individual differences in absolute pitch learning. Cognition 140, 95–110. doi: 10.1016/j.cognition.2015.03.012
Van Hedger S. C. Nusbaum H. C. (2018). Individual differences in absolute pitch performance: contributions of working memory, musical expertise, and tonal language background. Acta Psychol. 191, 251–260. doi: 10.1016/j.actpsy.2018.10.007
Van Lancker D. Kreiman J. Emmorey K. (1985). Familiar voice recognition: patterns and parameters: I. Recognition of backward voices. J. Phon. 13, 19–38.
Vetter N. C. Oosterman J. M. Mühlbach J. Wolff S. Altgassen M. (2020). The impact of emotional congruent and emotional neutral context on recognizing complex emotions in older adults. Aging Neuropsychol. Cognit. 27, 677–692. doi: 10.1080/13825585.2019.1665164
Wang Y. Jongman A. Sereno J. A. (2003). Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training. J. Acoust. Soc. Am. 113, 1033–1043. doi: 10.1121/1.1531176
Wang Y. Spence M. M. Jongman A. Sereno J. A. (1999). Training American listeners to perceive Mandarin tones. J. Acoust. Soc. Am. 106, 3649–3658. doi: 10.1121/1.428217
Wei C.-G. Cao K. Zeng F.-G. (2004). Mandarin tone recognition in cochlear-implant subjects. Hear. Res. 197, 87–95. doi: 10.1016/j.heares.2004.06.002
Wurm L. H. Vakoch D. A. Strasser M. R. Calin-Jageman R. Ross S. E. (2001). Speech perception and vocal expression of emotion. Cogn. Emot. 15, 831–852. doi: 10.1080/02699930143000086
Xie X. Myers E. (2015). The impact of musical training and tone language experience on talker identification. J. Acoust. Soc. Am. 137, 419–432. doi: 10.1121/1.4904699
Zeng F. G. (2002). Temporal pitch in electric hearing. Hear. Res. 174, 101–106. doi: 10.1016/S0378-5955(02)00644-5
Zhang D. Ke S. Anglin-Jaffe H. Yang J. (2023). Morphological awareness and DHH students’ reading-related abilities: a meta-analysis of correlations. J. Deaf Stud. Deaf Educ. 28, 333–349. doi: 10.1093/deafed/enad024
Zhang H. Dai X. Ma W. Ding H. Zhang Y. (2024). Investigating perception to production transfer in children with cochlear implants: a high variability phonetic training study. J. Speech Lang. Hear. Res. 67, 1206–1228. doi: 10.1044/2023_JSLHR-23-00573
Zhang H. Ding H. Zhang Y. (2021). High-variability phonetic training benefits lexical tone perception: an investigation on Mandarin-speaking pediatric cochlear implant users. J. Speech Lang. Hear. Res. 64, 2070–2084. doi: 10.1044/2021_JSLHR-20-00631
Zhang H. Ma W. Ding H. Zhang Y. (2023). Sustainable benefits of high variability phonetic training in Mandarin-speaking kindergarteners with cochlear implants: evidence from categorical perception of lexical tones. Ear Hear. 44, 990–1006. doi: 10.1097/AUD.0000000000001341
Zhou Q. Bi J. Song H. Gu X. Liu B. (2020). Mandarin lexical tone recognition in bimodal cochlear implant users. Int. J. Audiol. 59, 548–555. doi: 10.1080/14992027.2020.1719437
Keywords: suprasegmental perception, individual variation, cochlear implants, emotion perception, talker identification
Citation: Tamati TN and Ingvalson EM (2026) Individual variation in suprasegmental perception: insights from adults with typical hearing and cochlear implants. Front. Psychol. 16:1652000. doi: 10.3389/fpsyg.2025.1652000
Received: 23 June 2025; Accepted: 01 October 2025; Published: 06 January 2026.
Volume: 16 - 2025
Edited by: Wioletta Pawlukowska, Pomeranian Medical University, Poland
Reviewed by: Hartmut Meister, University of Cologne, Germany; Ebru Kösemihal, Near East University, Cyprus
Copyright © 2026 Tamati and Ingvalson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Terrin N. Tamati, tamati.1@osu.edu
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.