Disorders of Pitch Production in Tone Deafness

Dalla Bella, Simone; Berkowska, Magdalena; Sowinski, Jakub

doi:10.3389/fpsyg.2011.00164

REVIEW article

Front. Psychol., 14 July 2011

Sec. Auditory Cognitive Neuroscience

volume 2 - 2011 | https://doi.org/10.3389/fpsyg.2011.00164

This article is part of the Research Topic: The relationship between music and language


Simone Dalla Bella¹,²*

Magdalena Berkowska¹ and Jakub Sowiński¹

¹ Department of Cognitive Psychology, University of Finance and Management, Warsaw, Poland
² International Laboratory for Brain, Music and Sound Research, Montreal, QC, Canada

Singing is as natural as speaking for the majority of people. Yet some individuals (about 10–15%) are poor singers, typically performing or imitating pitches and melodies inaccurately. This condition, commonly referred to as “tone deafness,” has been observed both with and without deficient pitch perception. In this article we review the existing literature on normal singing, poor-pitch singing, and, briefly, the sources of this condition. Because pitch plays a prominent role in the structure of both music and speech, we also consider the possibility that speech production (or imitation) is similarly impaired in poor-pitch singers. Preliminary evidence from our laboratory suggests that pitch imitation may be selectively inaccurate in the music domain without being affected in speech. This finding points to the separability of the mechanisms subserving pitch production in music and language.

Introduction

Making music (e.g., singing and dancing) is a universal form of expression, widespread across societies and cultures (Mithen, 2006). In particular, singing is as natural as speaking for the majority of people (Dalla Bella et al., 2007; Pfordresher and Brown, 2007; Dalla Bella and Berkowska, 2009). Adult singing is accurate (although not necessarily precise; see Pfordresher et al., 2010) on both the pitch and time dimensions (Dalla Bella et al., 2007; Dalla Bella and Berkowska, 2009) and remarkably consistent both within and across individuals (e.g., Levitin, 1994; Levitin and Cook, 1996; Bergeson and Trehub, 2002). Extensive vocal training is not a sine qua non for singing in tune: most individuals sing proficiently without formal vocal training or musical tutoring. Singing emerges spontaneously during development, with vocalizations observable during the first months of life (Papoušek, 1996). This behavior, likely facilitated by the universality of maternal singing (e.g., Trehub and Trainor, 1999), is promptly imitated by infants. Eighteen-month-old children can produce recognizable songs by repeating short musical phrases (e.g., see Ostwald, 1973; Welch, 2006, for a review). Finally, far from being merely a cultural frill, singing (and vocalization more generally) is likely to have played a role during evolution. It is a common observation that people particularly enjoy singing in groups (e.g., during religious ceremonies, in the military). This participatory aspect of singing is thought to foster group bonding, which, together with sexual selection and mood regulation, is one of the reasons why music may have some adaptive value (Wallin et al., 2000; Huron, 2001; Mithen, 2006).

Although singing is widespread, there are notable exceptions. In the general population, a few individuals, referred to as “tone deaf,” have marked difficulty carrying a tune. Recent studies estimate that approximately 10–15% of the general population may be particularly inaccurate in producing pitch, singing quite far from the target pitches (in experiments requiring production of familiar melodies or imitation of single pitches, intervals, or simple melodies; Dalla Bella et al., 2007; Pfordresher and Brown, 2007; Dalla Bella and Berkowska, 2009). This estimate rises to about 55%, however, when the consistency of repeated attempts to produce a pitch (i.e., precision) is taken into account (Pfordresher et al., 2010). A few recent studies of poor-pitch singing have shown that different sources of impairment may underlie this disorder, such as perceptual deficits or deficient auditory–motor integration (see Pfordresher and Brown, 2007; Berkowska and Dalla Bella, 2009b).

In the present article we review recent findings on normal singing in adults and on poor-pitch singing, with particular attention to studies using quantitative estimates of singing proficiency (i.e., based on acoustic analyses). We mostly focus on singing proficiency in the majority of the population (i.e., non-musicians). Note that there is a rich literature on the acoustical features of professional singing (for reviews, see Sundberg, 1987, 1999). However, only a few isolated studies have examined pitch production in professional singers (Vurma and Ross, 2006; Zurbriggen et al., 2006). These studies revealed that when professional singers are asked to produce pitch intervals, they can be out of tune by 20–25 cents (a fifth to a quarter of a semitone) with respect to the equally tempered scale. This error typically goes unnoticed by expert listeners (Vurma and Ross, 2006). In addition, features such as the accuracy of the first note of the melody and the melodic contour play a role in motor planning, as shown by asking singers to prepare to produce a melody (Zurbriggen et al., 2006).

Despite the rich literature on vocal performance during development in music education (e.g., see Welch, 1979, 2006, for early studies on poor-pitch singing), this review is devoted to adult singing. We review behavioral and neuroimaging evidence in order to examine the mechanisms that are likely to malfunction in poor-pitch singers. Finally, since accurate production of pitch variations is a key process in both music and speech, we examine whether pitch production deficits in tone-deaf individuals extend to the language domain. To this end, we present some preliminary findings from our laboratory suggesting that pitch production mechanisms may be domain-specific.

Normal Singing

How Singing Works

Singing, like speaking, involves three independent physical components: respiratory, laryngeal (i.e., the vocal folds), and articulatory (i.e., the vocal tract) mechanisms. The lungs provide the air supply needed for vocalization. The vocal folds modulate the airstream coming from the lungs (i.e., by chopping it into air pulses), a process referred to as “phonation.” Finally, the vocal tract confers on each sound the spectral and temporal properties characteristic of the sung voice (e.g., Titze, 1994; Sundberg, 1999). The quality of the vocal output, which distinguishes professional from amateur singers, depends on the fine coordination of these mechanisms. The acoustical properties of the singing voice in professional singers have been the subject of a substantial body of research (for reviews, see Sundberg, 1987, 1999). For example, particular attention has been devoted to the so-called singer’s formant (i.e., partials falling in the frequency range of 2.5–3.0 kHz; Sundberg, 1987), which in professional singers is much stronger in sung vowels than in spoken vowels. The intensity of the singer’s formant, the presence of vibrato, the maximum phonational frequency range, and loudness all increase with musical experience (e.g., Brown et al., 2000; Mendes et al., 2003; Hunter et al., 2006). Until recently, however, evidence on the mechanisms underlying singing proficiency (i.e., on the pitch and time dimensions) was scant.

Singing proficiency can be assessed with a variety of tasks. A natural task, which does not require vocal training, is to have participants sing a well-known song (e.g., “Happy Birthday”) with lyrics from memory (e.g., Dalla Bella et al., 2007, 2009). A variant of the task is to ask participants to sing the same melody on a syllable (e.g., /la/); this typically results in better performance, likely due to reduced memory load (Berkowska and Dalla Bella, 2009). Another possibility is to present a model stimulus (e.g., a single pitch, an interval, or a short novel melody) that participants have to imitate using a vowel or a syllable (pitch-matching tasks). Single-pitch matching is a common task (Goetze et al., 1990; Pfordresher and Brown, 2007; Hutchins et al., 2010a) and is considered an important factor in assessing musical talent (Watts et al., 2003). In both singing from memory and imitation tasks, singing proficiency can be assessed with natural auditory feedback, but also with augmented feedback, for example by providing the correct pitch or melody as one sings (e.g., Pfordresher and Brown, 2007; Tremblay-Champoux et al., in press), or with an altered pitch or melody (e.g., Hafke, 2008; Zarate and Zatorre, 2008). In sum, there are different ways to obtain a measure of singing proficiency. Importantly, different tasks are likely to reflect the activity of partly different functional components of the song system. For example, singing from memory particularly taps the retrieval of musical information from long-term memory, whereas imitation tasks mainly target working memory and auditory–motor mapping mechanisms (e.g., when auditory feedback is augmented or altered). Thus, different tasks may serve to assess the functioning of different components of the song system.

To shed light on the functional components of the song system underlying normal pitch production in these tasks, and ultimately to account for poor-pitch singing, we present in Figure 1 a schema of the vocal sensorimotor loop (VSL; see also Berkowska and Dalla Bella, 2009b). This schema is inspired, among others, by the perceptual loop theory (Levelt, 1989), an account of self-monitoring and error correction in speech, processes that are similarly relevant in music performance. The VSL includes perceptual, motor, auditory–motor mapping, and memory components. According to the VSL, singing well-known melodies from memory relies on retrieving pitch and temporal information from long-term memory, followed by fine motor planning and implementation. The ongoing vocal production is fed back to the system (i.e., perception) and matched with the planned melody, in some cases leading to error correction, thereby affecting the planning of upcoming events. Vocal imitation of novel pitch sequences relies on short-term memory and auditory–vocal mapping, without tapping retrieval from long-term memory. The target pitches to be imitated are perceptually analyzed, stored in short-term memory, and eventually mapped onto motor gestures. As with singing from memory, self-monitoring of vocal performance is made possible by mechanisms supporting feedback analysis, auditory–motor mapping, and in some cases error correction. It is worth noting that both overt and covert pathways for pitch perception are possible. The overt pathway is involved in tasks requiring explicit judgments of pitch differences (e.g., pitch discrimination). Some participants are very inaccurate in judging pitch differences yet still sing proficiently. In this case, it is hypothesized that pitch differences are analyzed via covert mechanisms, which support proficient singing (Griffiths, 2008; Loui et al., 2008; Dalla Bella et al., 2009).

FIGURE 1

Figure 1. Schema of the vocal sensorimotor loop (VSL; see also Berkowska and Dalla Bella, 2009b). The brain regions likely associated with the functional components of the VSL are indicated in gray. SMA, supplementary motor area; ACC, anterior cingulate cortex; dPFC, dorsal prefrontal cortex; SPT, cortex of the dorsal Sylvian fissure at the parietal–temporal junction.

Measuring Singing Accuracy and Precision

Singing proficiency has mostly been assessed by asking expert musicians to rate recordings subjectively (e.g., Hébert et al., 2003; Schön et al., 2004; Racette et al., 2006; Wise and Sloboda, 2008). This method provides a general and fast assessment of vocal performance. However, raters are not always consistent in their judgments (Kinsella et al., 1988; Prior et al., 1990). In addition, human raters can hardly provide fine-grained estimates of proficiency on the pitch dimension independent of the time dimension, such as the exact deviation from the model pitch or the variability over repetitions. This is mostly because musicians tend to categorize pitch and duration information with respect to the closest musical value. An alternative that has proven successful is to compute objective measures of accuracy with acoustic methods (e.g., Murayama et al., 2004; Terao et al., 2006; Dalla Bella et al., 2007, 2009; Pfordresher et al., 2010). This method consists in segmenting the acoustic signal into notes and then computing note onsets and pitch height. This information can be used to compute measures of accuracy and precision in vocal performance (Pfordresher et al., 2010).
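Acoustic analyses of this kind usually express pitch height on a logarithmic scale. As a minimal sketch (not the analysis pipeline of the cited studies), the Python code below converts a fundamental frequency (F0) estimate in hertz into cents or semitones relative to a reference; by definition, an equal-tempered semitone spans 100 cents. The function names and the 440 Hz (A4) default reference are illustrative assumptions, and F0 extraction itself (e.g., with dedicated pitch-tracking software) is not shown.

```python
import math

# Hypothetical helpers: pitch distance on the logarithmic scale used for
# singing-accuracy measures (100 cents = 1 equal-tempered semitone).
REFERENCE_HZ = 440.0  # assumed reference frequency (A4)


def hz_to_cents(f0_hz, ref_hz=REFERENCE_HZ):
    """Signed distance of f0_hz from ref_hz, in cents."""
    if f0_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 1200.0 * math.log2(f0_hz / ref_hz)


def hz_to_semitones(f0_hz, ref_hz=REFERENCE_HZ):
    """Same distance expressed in semitones."""
    return hz_to_cents(f0_hz, ref_hz) / 100.0


if __name__ == "__main__":
    print(hz_to_cents(880.0))         # 1200.0 (one octave above A4)
    print(round(hz_to_cents(452.0)))  # 47 (a 452 Hz note is ~47 cents sharp of A4)
```

On this scale, the 20–25 cent interval errors reported for professional singers amount to a fifth to a quarter of a semitone, whereas the 1-semitone criteria used to identify poor-pitch singers correspond to 100 cents.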

Accuracy and precision can be computed separately for absolute pitch (i.e., the pitch height of individual musical notes) and for relative pitch (i.e., the difference between two successive pitches, or interval, expressed in semitones). For absolute pitch, accuracy indicates the average difference between sung and target pitches. Typically, this difference does not take into account the direction of the deviation (i.e., whether the sung pitch is on average higher or lower than the target pitch; Pfordresher et al., 2010; but see Pfordresher and Brown, 2007, for a measure of signed error). Another measure of absolute-pitch accuracy, referred to as “initial pitch deviation” (i.e., the difference between the first produced pitch and the first note of a target melody), has been used in our laboratory (Berkowska and Dalla Bella, 2009a; Dalla Bella and Berkowska, 2009). Precision in terms of absolute pitch is the consistency in repeating the target pitch (i.e., whether a repeated note deviates similarly from the target across repetitions; Pfordresher et al., 2010). Another measure of variability related to precision, referred to as “pitch stability” (Dalla Bella et al., 2007, 2009), consists in computing the deviation between two renditions of the same phrase in a melody. Similar measures of relative-pitch accuracy and precision can be computed for tasks in which participants sing from memory or imitate pitch sequences. In this case, accuracy refers to the average difference between sung intervals and the target intervals given by the notation. This measure, sometimes referred to as “pitch interval deviation,” has been adopted in a few studies as a measure of singing proficiency (e.g., Dalla Bella et al., 2007, 2009; Berkowska and Dalla Bella, 2009a; Dalla Bella and Berkowska, 2009; Pfordresher et al., 2010).
To our knowledge, precision of relative pitch (i.e., consistency in repeating the same target interval) has been examined in only one study (Pfordresher et al., 2010).
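The accuracy and precision measures described above can be stated compactly in code. The following Python sketch is one possible operationalization, not the exact procedure of Pfordresher et al. (2010) or of our laboratory: it assumes pitches already expressed in semitones (e.g., fractional MIDI note numbers derived from F0), uses the unsigned mean deviation for accuracy, and uses the sample standard deviation across repetitions of each target for precision. All function names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical helpers. All pitches are in semitones, e.g. fractional MIDI
# note numbers obtained from F0 analysis (60.0 = C4, 60.5 = a quarter tone sharp).


def _check_pairs(sung, target):
    if len(sung) != len(target) or len(sung) == 0:
        raise ValueError("sung and target must be non-empty and equally long")


def absolute_accuracy(sung, target):
    """Mean unsigned deviation of sung pitches from target pitches."""
    _check_pairs(sung, target)
    return mean(abs(s - t) for s, t in zip(sung, target))


def signed_error(sung, target):
    """Mean signed deviation: > 0 means sharp on average, < 0 flat."""
    _check_pairs(sung, target)
    return mean(s - t for s, t in zip(sung, target))


def absolute_precision(renditions_by_target):
    """Mean, over targets, of the sample SD of repeated renditions.

    renditions_by_target maps each target pitch to the pitches sung for it
    across repetitions (at least two renditions per target).
    """
    return mean(stdev(reps) for reps in renditions_by_target.values())


def intervals(pitches):
    """Successive pitch differences (signed intervals, in semitones)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def interval_deviation(sung, target):
    """Mean unsigned difference between sung and target intervals."""
    return absolute_accuracy(intervals(sung), intervals(target))


if __name__ == "__main__":
    target = [60, 62, 64, 60]          # C4 D4 E4 C4
    sung = [60.5, 62.0, 64.8, 60.3]    # one illustrative rendition
    print(round(absolute_accuracy(sung, target), 2))   # 0.4
    print(round(interval_deviation(sung, target), 2))  # 0.6
    print(round(absolute_precision({60: [60.5, 60.3], 62: [62.0, 61.6]}), 2))  # 0.21
```

Read loosely against the criterion reported by Pfordresher et al. (2010), an `absolute_precision` value above 1 semitone would indicate an imprecise singer regardless of the accuracy score; how the per-target SDs are aggregated here (a simple mean) is our assumption.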

Objective, acoustically based measures of singing accuracy and precision have the advantage of making explicit the criteria for distinguishing good from poor-pitch singers. For this reason, the measure(s) of singing proficiency must be chosen carefully. Indeed, different criteria can lead to very different estimates of the prevalence of poor-pitch singing (much higher when pitch precision rather than pitch accuracy is considered; Pfordresher et al., 2010). Most studies consider accuracy, rather than precision, in producing or imitating pitches. Individuals can be classified as poor-pitch singers based on an absolute criterion, namely when their produced pitches in a pitch-matching task depart from the target pitch by more than a semitone (e.g., Pfordresher and Brown, 2007; Pfordresher et al., 2010). In other cases, individuals are classified as poor-pitch singers relative to a control/comparison group, as is common in single-case studies of patients with brain damage (e.g., Schön et al., 2004; Satoh et al., 2007), thus adopting a variable criterion. An alternative is to treat as poor-pitch singers those individuals who are outliers in a given group, for example deviating from the group average by more than 2 SD (e.g., Dalla Bella and Berkowska, 2009). A final distinction worth mentioning is between measures of accuracy based on absolute pitch differences (i.e., the deviation of the produced pitch from the target pitch in imitation tasks; e.g., Pfordresher and Brown, 2007) and measures based on relative pitch differences (i.e., the deviation of produced intervals from the target intervals in singing from memory or imitation tasks; e.g., Dalla Bella et al., 2007). Because the criteria for defining poor-pitch singers vary, comparisons across studies are meaningful only when poor-pitch singers have been selected using similar criteria.
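To make the contrast between selection rules concrete, the Python sketch below applies the absolute 1-semitone rule and the 2 SD outlier rule to the same set of deviation scores. It is illustrative only: the function names are hypothetical, the scores are invented, the outlier rule is read one-sidedly (only unusually large deviations count) as our assumption, and the control-group comparisons used in single-case studies are not modeled.

```python
from statistics import mean, stdev

# Hypothetical helpers contrasting two selection rules for poor-pitch singers.
# `deviations` maps a participant ID to a mean pitch (or interval) deviation
# in semitones; larger values mean less accurate singing.

ABSOLUTE_THRESHOLD = 1.0  # semitones: the fixed "more than a semitone" rule


def poor_by_absolute_criterion(deviations, threshold=ABSOLUTE_THRESHOLD):
    """Flag participants whose deviation exceeds a fixed threshold."""
    return {pid: d > threshold for pid, d in deviations.items()}


def poor_by_outlier_criterion(deviations, n_sd=2.0):
    """Flag participants more than n_sd SDs above the group mean (one-sided).

    Needs at least two participants; with very small groups a single
    outlier inflates the SD and may never reach the cutoff.
    """
    values = list(deviations.values())
    cutoff = mean(values) + n_sd * stdev(values)
    return {pid: d > cutoff for pid, d in deviations.items()}


if __name__ == "__main__":
    # Illustrative (invented) scores for 11 occasional singers.
    deviations = {"p1": 0.3, "p2": 0.4, "p3": 0.5, "p4": 0.5, "p5": 0.6,
                  "p6": 0.4, "p7": 0.5, "p8": 0.6, "p9": 0.4,
                  "p10": 2.5, "p11": 1.1}
    absolute = poor_by_absolute_criterion(deviations)
    outlier = poor_by_outlier_criterion(deviations)
    print([p for p, poor in absolute.items() if poor])  # ['p10', 'p11']
    print([p for p, poor in outlier.items() if poor])   # ['p10']
```

With these invented scores, p11 (1.1 semitones) counts as a poor-pitch singer under the absolute rule but not under the outlier rule. This kind of divergence is why prevalence estimates depend on the criterion chosen.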

Singing in the General Population

Until recently, relatively little was known about singing abilities in the general population. People generally tend to underestimate their ability to carry a tune. For example, almost 60% of 1,000 university students reported that they could not accurately imitate melodies (Pfordresher and Brown, 2007). Moreover, self-declared tone-deaf individuals, about 17% of the student population, believe that they cannot sing proficiently (Cuddy et al., 2005). Yet systematic assessments of singing proficiency in the general population indicate that around 85–90% of people can sing in tune (Dalla Bella et al., 2007; Pfordresher and Brown, 2007; Dalla Bella and Berkowska, 2009; but see Pfordresher et al., 2010, for a lower estimate when considering precision instead of accuracy).

We examined singing proficiency in the general population by testing 62 occasional singers in Montreal (20 university students tested in the lab and 42 participants recruited in a public park) and comparing them with 4 professional singers (Dalla Bella et al., 2007). Participants sang the refrain of a well-known song with lyrics, and their renditions were submitted to acoustical analyses. Occasional singers were less accurate in producing pitch intervals (deviating by 0.6 semitones from the correct intervals, on average) than professional singers (0.3 semitones). At the same time, occasional singers sang faster than professionals, a phenomenon tied to lower pitch accuracy. Further tests on 15 participants indicated that slowing down the tempo typically enhances accuracy in producing pitch intervals. Yet two participants (i.e., poor-pitch singers) did not improve at the slower tempo. That the majority can carry a tune was confirmed more recently in a larger sample of occasional singers with familiar musical material, comparing production (i.e., singing from memory) with imitation (Berkowska and Dalla Bella, 2009a; Dalla Bella and Berkowska, 2009). Occasional singers were less accurate when they sang from memory than in the imitation task. Moreover, their performance was more accurate when they sang on a syllable (i.e., with lower memory load) than with lyrics.

Other studies focused on vocal imitation abilities (e.g., pitch matching). The first studies of single-pitch matching (i.e., imitation of single pitches) revealed that adults perform poorly on this task (Ternström et al., 1988; Murbe et al., 2002; Amir et al., 2003). For example, non-musicians typically deviate by 1.3 semitones on average, compared with 0.5 semitones for musicians (Ternström et al., 1988; Murry, 1990; Murry and Zwiner, 1991; Amir et al., 2003). This estimate of pitch-imitation accuracy in non-musicians may be too pessimistic, though. Low accuracy in imitating pitch does not characterize all individuals without musical training (Estis et al., 2009). Moreover, the poorer performance of non-musicians may partly result from the use of pure tones as models for imitation. When imitating synthesized voices or sung performances, non-musicians achieved higher accuracy, with pitch deviations of around 0.5 semitones or less (Pfordresher and Brown, 2007; Wise and Sloboda, 2008; see also Watts and Hall, 2008). Hence, accuracy in pitch matching depends on the acoustic features of the stimulus to be imitated (for similar results with children, see also Small and McCachern, 1983; Green, 1990). Target stimuli sharing acoustical properties (i.e., spectral and temporal features) with the singer’s own vocal production are likely to facilitate mapping onto sensorimotor representations, thus yielding higher accuracy.

Imitation of single pitches, intervals, and short novel melodies was systematically assessed by Pfordresher and Brown (2007). A large sample of university students without musical training imitated various pitch sequences (i.e., a single repeated note, a sequence including a single change of pitch, and short four-note melodies). Most participants performed the task accurately (i.e., with renditions within 1 semitone of the target pitches). In addition, occasional singers were less accurate in terms of both absolute and relative pitch when imitating short melodies than when imitating single pitches (as in Wise and Sloboda, 2008). That accuracy in pitch matching decreases with the number of elements in a sequence likely reflects working memory constraints. Note that impaired working memory is a relevant factor in defining the profile of individuals with congenital music disorders (Tillmann et al., 2009; Williamson et al., 2010b) and is likely to affect their vocal production (Dalla Bella et al., 2009; Tremblay-Champoux et al., in press). The finding that occasional singers are typically quite accurate in imitating short unfamiliar melodies was recently replicated by Pfordresher et al. (2010). Nevertheless, these authors found that the majority were imprecise (i.e., the SD of the fundamental frequency across renditions of the same pitch class or interval exceeded 1 semitone). This intriguing finding suggests that precision, rather than accuracy, may be what we take into account when subjectively judging our own performance, which may explain the very high percentage of individuals who self-report singing difficulties. To sum up, even though early studies suggested that occasional singers are quite inaccurate in imitating single pitches, recent studies have yielded more optimistic results. Nevertheless, accuracy in imitating pitch rapidly decreases with increasing sequence length and complexity.
Moreover, even though occasional singers are quite accurate in imitating pitches, they may not be very consistent over repetitions. The most recent studies devoted to singing proficiency in the general population are summarized in Table 1.

TABLE 1

Table 1. Summary of the main recent studies using acoustical measures of pitch accuracy and precision in normal individuals without musical training.

It is worth mentioning that some studies focused on the relation between accuracy in pitch-matching tasks and pitch discrimination skills. Indeed, some occasional singers may be particularly accurate in imitating pitch sequences because they monitor their own performance finely, allowing for efficient error correction. For example, pitch matching in untrained singers covaries positively with the ability to discriminate pitches (i.e., good singers are more accurate in discriminating pitches than poor singers; e.g., Watts et al., 2003, 2005). Yet other studies failed to replicate this finding (Bradshaw and McHenry, 2005; Moore et al., 2008). This situation is reminiscent of developmental studies comparing perception and performance skills in accurate singing, which have similarly yielded conflicting results (for studies showing a link between pitch perception and production, see Phillips and Aitchinson, 1997; Demorest, 2001; Demorest and Clements, 2007; for lack of replication, see Roberts and Davis, 1975; Geringer, 1983; Apfelstadt, 1984). In sum, whether (and to what extent) pitch perception and production are linked in adult occasional singers is still a matter of debate. The possibility of a dissociation between perception and action in vocal performance is addressed below, in the discussion of poor-pitch singing in tone deafness.

Neuronal Underpinnings of the Song System

Singing is supported by a complex neural network involving motor and sensory areas, as well as auditory–motor integration regions. Several neuroimaging studies have sought to uncover the neuronal underpinnings of the human song system. In this review we focus on the neuronal mechanisms underlying the main components of the VSL (see Figure 1). Motor areas (e.g., primary motor cortex), in particular the mouth region (e.g., Brown et al., 2004) and the larynx/phonation area, are recruited during singing (for adduction/abduction and tension/relaxation of the vocal folds; see Brown et al., 2008). Sensory areas, such as the superior temporal gyrus, are also engaged by vocal performance, for example when repeating a single note (Perry et al., 1999) or singing more complex melodies (Brown et al., 2004; Kleber et al., 2007). Other cortical areas systematically recruited by vocal performance are the supplementary motor area (SMA), the anterior cingulate cortex (ACC), and the insula. For example, the SMA is well known to be engaged in high-level motor control and is needed for efficient motor planning in sequence production, as in overt speech production (e.g., Turkeltaub et al., 2002). The ACC is associated with the initiation of vocalization (see Jürgens, 2002, for a review) and is activated during overt speech and singing (Perry et al., 1999; Paus, 2001). Finally, singing recruits the insula (Perry et al., 1999; Brown et al., 2004; Kleber et al., 2007; Zarate and Zatorre, 2008). This region, in particular the anterior insula, is mostly associated with articulation processes during vocalization (e.g., Dronkers, 1996); given its connections with both the ACC and auditory areas, it may play a role in integrating auditory feedback with motor output (Riecker et al., 2000; Ackermann and Riecker, 2004).

Other studies focused on the neuronal mechanisms acting as an interface between the sensory and motor systems, thus supporting sensorimotor mapping/integration (see Figure 1). Such areas include the dorsal prefrontal cortex, the inferior sensorimotor cortex, and the superior temporal gyrus and sulcus, which are active during both speaking and singing (Özdemir et al., 2006; Gunji et al., 2007; Zarate and Zatorre, 2008). Another region involved in auditory–motor integration in vocal performance is area SPT (i.e., cortex of the dorsal Sylvian fissure at the parietal–temporal junction), which is recruited by both covert speech and covert humming (Hickok et al., 2003; Pa and Hickok, 2008). Area SPT has gained particular attention as a key sensorimotor interface between speech perception, working memory, and speech production (Hickok and Poeppel, 2007; Hickok et al., 2009). To sum up, neuroimaging evidence points to a complex neuronal network supporting vocal performance in singing. Further studies are needed to clarify the involvement of this network in various vocal tasks (e.g., pitch matching and singing from memory) and its relation to observed individual differences in accuracy and precision.

Since both singing and speech involve vocalization and analysis of auditory feedback, it is reasonable to ask to what extent they rely on dedicated processes or share the same neuronal network (for a review, see Gordon et al., 2006). Brain areas underlying speaking and singing overlap significantly in non-musicians (e.g., Brown et al., 2006; Wilson et al., 2010). Nevertheless, singing appears to recruit predominantly right-hemisphere regions, whereas speech production recruits primarily areas in the left hemisphere. Covert singing of familiar tunes without lyrics is associated with greater activation in right sensorimotor cortex; in contrast, speaking an overlearned word string involves left sensorimotor cortex (Wildgruber et al., 1996; Riecker et al., 2000; Ackermann and Riecker, 2004). A similar lateralization pattern was found when speaking and singing with lyrics were contrasted (Callan et al., 2006, with covert performance; Jeffries et al., 2003, with overt performance). These findings are partly supported by brain stimulation studies using transcranial magnetic stimulation (TMS). Applying TMS over left-hemisphere regions associated with speech production (e.g., near Broca’s area) disrupts speech production. Comparable stimulation over homologous regions of the right hemisphere yielded inconsistent results across participants: it disrupted singing in only a minority of participants (Epstein et al., 1999; Lo and Fook-Chong, 2004), a finding recently replicated with subdural cortical stimulation (Suarez et al., 2010). Thus, song production generally shows more bilateral involvement than speech production.

Poor-Pitch Singing

Both brain damage and neurogenetic (i.e., congenital) disorders can disrupt the functioning of the song system, leading to poor-pitch singing. In the present review we focus selectively on poor-pitch singing in otherwise healthy participants without musical training (i.e., tone-deaf individuals). Studies on vocal amusia or oral-expressive amusia following brain damage have been reviewed elsewhere (Marin and Perry, 1999; Gordon et al., 2006; Ackermann et al., 2006; Berkowska and Dalla Bella, 2009b; Stewart et al., 2009). About 10–15% of the population is inaccurate when asked to sing a melody from memory or to imitate a pitch sequence (Dalla Bella et al., 2007; Pfordresher and Brown, 2007; Dalla Bella and Berkowska, 2009). Poor-pitch singing is often treated as a sign of a more general lack of musicality, or tone deafness, a term that has become widespread (see Sloboda et al., 2005, for a discussion). The term “tone deafness” literally suggests that poor-pitch singing may stem from a deficient perceptual system. That impoverished perception may lead to poor-pitch singing is consistent with the VSL schema, in which impaired perception would hinder self-monitoring during performance, affect auditory–motor mapping, and thereby limit error correction (but see below for the possibility of covert perception).

Poor pitch perception characterizes a condition referred to as “congenital amusia” (Peretz, 2001; Ayotte et al., 2002; Peretz et al., 2002; Peretz and Hyde, 2003). Amusics are typically unable to tell apart melodies differing by a single note, have difficulty with pitch discrimination, and consequently cannot recognize familiar tunes (Ayotte et al., 2002; Peretz et al., 2002; Hyde and Peretz, 2004). Congenital amusia is a neurogenetic disorder observed in about 4% of the general population (Kalmus and Fry, 1980; Peretz and Hyde, 2003; Peretz et al., 2007) and is associated with brain anomalies in the auditory cortex and inferior frontal cortex, as well as reduced connectivity between these areas (Hyde et al., 2006, 2007, 2011; Mandell et al., 2007; Loui et al., 2009). In a recent study we showed that poor-pitch singing and perceptual deficits are generally associated in congenital amusia (Dalla Bella et al., 2009). Eleven individuals with congenital amusia (diagnosed with the Montreal Battery of Evaluation of Amusia, MBEA; Peretz et al., 2003) sang a familiar melody from memory. Nine of them were inaccurate in producing pitch intervals when singing with lyrics. However, more than half of them could not sing more than a few notes when asked to perform the same tune without lyrics (i.e., on a syllable), a condition that was expected to improve accuracy (Berkowska and Dalla Bella, 2009a). This pattern of results may reflect weak memory traces of the musical components of songs (e.g., Dalla Bella et al., 2009). In general, the amusics who were least accurate in producing pitch intervals also had the highest pitch discrimination thresholds (i.e., the lowest sensitivity to pitch differences) in a perceptual task (Hyde and Peretz, 2004), a finding consistent with the hypothesis that perception and action are tightly coupled in vocal performance (but see below for exceptions).
Similar impairments in pitch production in amusics were observed with pitch-matching tasks (Hutchins et al., 2010a), which showed in addition that, because of their perceptual disorders, amusics do not benefit from perceptual information (e.g., additional feedback) to improve or correct their performance. In sum, congenital amusics are generally inaccurate in both singing from memory and pitch-matching tasks, a deficit associated with their impoverished pitch perception.

Dissociations between Perception and Action in Tone Deafness

That poor-pitch singing is typically associated with perceptual disorders in congenital amusia does not mean that inaccurate singing cannot occur in isolation. Indeed, deficient motor planning or inaccurate auditory–motor mapping, even with spared perception, is sufficient to produce poor-pitch singing (see the VSL schema). Accordingly, some individuals exhibit poor-pitch singing without deficient pitch perception (Bradshaw and McHenry, 2005; Dalla Bella et al., 2007; Pfordresher and Brown, 2007; Wise and Sloboda, 2008), a condition referred to as “purely vocal tone deafness” (Dalla Bella et al., 2007). For example, in a previous study we asked 15 occasional singers to sing a well-known melody at a slow tempo, a condition that should increase pitch accuracy. Thirteen sang proficiently, but 2 remained inaccurate (Dalla Bella et al., 2007), with sung intervals departing by at least 1 semitone from the notated intervals (vs. 0.3 semitones on average for the other participants). Yet these two participants were able to detect pitch and time incongruities in unfamiliar melodies, thus showing normal perception. A similar dissociation between perception and action has been found with pitch-matching and imitation tasks. In a study by Pfordresher and Brown (2007), 13% of a sample of 79 non-musicians were classified as poor-pitch singers, that is, their produced pitches departed by at least 1 semitone from the target. Despite inaccurate pitch imitation, the poor-pitch singers performed as accurately as proficient singers in a pitch discrimination task, confirming that poor-pitch singing is not merely the outcome of impoverished perception (see also Wise and Sloboda, 2008, for additional evidence of a dissociation between perception and action with pitch-matching tasks).
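
The 1-semitone criterion can be made concrete with the standard equal-tempered conversion of a frequency ratio into semitones. The conversion itself is standard; averaging absolute deviations over N notes to obtain one score per singer is a simplifying assumption here, since the exact aggregation used in each study is not described in this review. For interval-based measures, d_i would instead be the difference between a produced and a target interval, each already expressed in semitones.

```latex
d_i = 12 \log_2 \frac{f_i^{\mathrm{prod}}}{f_i^{\mathrm{target}}}, \qquad
\bar{d} = \frac{1}{N} \sum_{i=1}^{N} \lvert d_i \rvert, \qquad
\text{poor-pitch singer if } \bar{d} \ge 1 \text{ semitone.}
```

For example, producing 466.16 Hz for a 440 Hz target gives d = 12 log2(466.16/440) ≈ 1.00 semitone, right at the criterion, whereas a deviation of 0.3 semitones, the average reported above for proficient singers, corresponds to a frequency ratio of 2^(0.3/12) ≈ 1.017, i.e., an error of roughly 1.7%.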

Surprisingly, cases of spared vocal performance with deficient perception have also been described. In a study by Loui et al. (2008), congenital amusics imitated tone intervals and, in a second task, judged whether the second tone of a pair was higher or lower than the first. Both congenital amusics and controls could imitate pitch direction. Yet amusics were unable to detect pitch direction, suggesting that there may be two separate streams for auditory perception and action (Griffiths, 2008). The two streams are represented in the VSL schema as overt and covert perceptual pathways. We replicated this finding in a group of five congenital amusics who had difficulty discriminating melodies that differed in melodic contour. Despite their perceptual deficit, they produced the correct pitch direction when singing a melody from memory (Dalla Bella et al., 2009). Interestingly, this dissociation between perception and action is not confined to pitch direction. In a study assessing singing proficiency in congenital amusia, we also found two individuals who, despite severely deficient pitch perception as revealed by the MBEA, could sing with lyrics as proficiently as controls (Dalla Bella et al., 2009).

Dissociations between perception and performance in the pitch domain suggest that poor-pitch singing may stem from malfunctions in different parts of the song system. The next step is to clarify which mechanisms within the VSL are faulty in different poor-singing “phenotypes,” and whether (or to what extent) these mechanisms are music-specific or general-purpose, also underlying vocal performance in other domains such as language. A thorough description of the possible causes of poor-pitch singing is beyond the scope of this article (for reviews, see Pfordresher and Brown, 2007; Berkowska and Dalla Bella, 2009b). Here we briefly summarize the accounts most relevant to the question of domain-specificity, with reference to the VSL (see Figure 1). Poor-pitch singing resulting from perceptual deficits (i.e., in congenital amusics) can be explained by a malfunction in the (covert and overt) extraction of pitch information from the auditory input (hereafter, the “perceptual account”). The inability to extract pitch information hinders appropriate monitoring of the ongoing performance, leading to inadequate error correction and reduced accuracy. Because of impaired perceptual monitoring, congenital amusics are typically unaware of their deficit. However, the observations that production deficits can co-occur with spared perception (e.g., Dalla Bella et al., 2007; Pfordresher and Brown, 2007; Wise and Sloboda, 2008) and that perception is spared in vocal amusia resulting from brain damage (Schön et al., 2004) suggest that, in these cases, post-perceptual mechanisms are faulty. Pfordresher and Brown (2007; but see also Mandell et al., 2007) have proposed that these post-perceptual processes involve sensorimotor integration (hereafter, the “sensorimotor account”) or memory retrieval/motor planning (e.g., Pfordresher and Brown, 2007; Wise and Sloboda, 2008).

The sensorimotor account deserves particular attention. According to this account, a correct auditory representation of the vocal performance (and, for example, of feedback) is inaccurately mapped onto motor representations for phonation, which itself is spared. Inaccurate mapping may concern the reproduction of local musical features (absolute pitch and, secondarily, pitch intervals) without affecting global features (e.g., melodic contour; see Pfordresher and Brown, 2007). In addition, the relative independence of absolute and relative pitch accuracy measures reported in poor-pitch singers (Dalla Bella and Berkowska, 2009) suggests that the mapping of relative and absolute musical features onto motor movements may be selectively disrupted. That the production of absolute and relative pitch features may engage at least partly independent mechanisms is supported by the differential effects of feedback on pitch accuracy (i.e., choral singing enhances accuracy in producing intervals and contour but is detrimental to producing absolute pitch; Pfordresher and Brown, 2007). The sensorimotor account is also plausible from a neurobiological point of view: the idea that poor-pitch singing results from disrupted or underdeveloped pathways bridging perception and action is supported by recent evidence of abnormally reduced connectivity of the arcuate fasciculus (a pathway connecting temporal and frontal brain areas) in tone deafness (Loui et al., 2009).

In the following section we focus on the domain-specificity of the pitch deficits reported in poor-pitch singers, with particular attention to perceptual and sensorimotor mechanisms.

Does Inaccuracy in Pitch Production Extend to Speech in Tone Deafness?

The findings reviewed so far indicate that the label “poor-pitch singing” covers a variety of deficits, reflecting malfunctions of different components within the VSL. Do these deficits result from disrupted mechanisms that are specifically engaged in the vocal production of music, or from mechanisms that also subserve other functions, such as speech production? The question is pertinent because pitch plays a prominent role in the structure of both speech and music (for a thorough review, see Patel, 2008). Whether music and language are subserved by independent or shared neuronal networks has been examined in a number of studies, mostly in the area of perception. For example, the modular account of melody perception and recognition recently proposed by Peretz and Coltheart (2003) assumes that music and language are largely independent systems, each including a network of processes triggered selectively by music or speech material (but see increasing evidence of mechanisms shared across the two domains, e.g., Hickok et al., 2003; Koelsch et al., 2009; Williamson et al., 2010a). The separate-domain view is based on long-standing neuropsychological evidence of double dissociations between music and speech processing in patients with brain damage and, more recently, in individuals with congenital amusia (for reviews, see Dalla Bella and Peretz, 1999; Peretz, 2001; Peretz and Hyde, 2003; Peretz and Zatorre, 2005). For example, some patients are unable to recognize familiar tunes but recognize lyrics and speech normally. The opposite condition is also observed: a brain insult can disrupt the ability to recognize spoken words while leaving the ability to recognize music intact (see Peretz, 1993; Peretz and Zatorre, 2005; Stewart et al., 2009).

Nevertheless, rather than assuming either complete integration or total independence of music and language mechanisms, current theories propose a more nuanced (and probably more realistic) account. Even scholars strongly inclined toward a modular account of music and speech perception (e.g., Peretz and Coltheart, 2003) acknowledge that the representation of pitch direction is likely to be common to music and speech (i.e., melodic contour and intonation, respectively). This conclusion is supported by evidence from brain-damaged patients who display deficits in perceiving both melodic contour and speech intonation (e.g., Patel et al., 1998). Moreover, despite early claims that individuals with congenital amusia have spared pitch processing in speech perception (Ayotte et al., 2002; Peretz et al., 2002), further studies revealed impaired discrimination of speech intonation in this condition (e.g., Patel, 2008). Congenital amusics exhibit a deficit in processing fine-grained pitch differences, irrespective of the domain. Because the pitch differences underlying prosodic contrasts in speech (e.g., between questions and statements) are coarser than those in melodies, impaired pitch perception is more visible in a musical context than in speech. Deficits in both domains are detectable in congenital amusics provided that pitch variation is comparable across domains (Hutchins et al., 2010b).

Dissociations between music and speech in pitch production have also been reported in brain-damaged patients. A common observation in clinical neurology is that non-fluent aphasics have great difficulty speaking intelligibly, yet can produce recognizable songs (e.g., Assal et al., 1977; Yamadori et al., 1977; but see Hébert et al., 2003; Racette et al., 2006; see also Gordon et al., 2006, for a review). Conversely, some amusic patients cannot sing but can speak normally (e.g., Peretz et al., 1994). For example, Schön et al. (2004) reported the case of a tenor with lesions in right frontotemporoparietal regions, a pure case of vocal amusia with a specific deficit in the production of musical intervals. Interestingly, rhythm and contour were spared, as were musical perception and language abilities. In particular, perception and production of pitch variations in speech (i.e., intonation) were unimpaired; for example, the patient was able to read texts with appropriate accentuation. Note, however, that poor singing often co-occurs with linguistic deficits resulting from left-hemisphere damage (e.g., Benton, 1977). Furthermore, bilateral hemispheric involvement in sung performance is supported by evidence that lesions in either hemisphere impair singing (Kinsella et al., 1988; Prior et al., 1990; for a review of further evidence, see Berkowska and Dalla Bella, 2009b). In sum, as in perception, data from brain-damaged patients indicate that pitch production in music and language can be disrupted independently by brain injury, pointing toward different underlying mechanisms. However, the co-occurrence of singing and speech deficits, and the fact that singing involves both hemispheres, suggest that some mechanisms may be shared (e.g., the production of prosody, as observed in perception).

Studies of poor-pitch singing in tone deafness are, in principle, an important source of evidence for testing the domain-specificity of the mechanisms underlying pitch production. An intriguing question is whether the poor imitation observed in poor-pitch singers also occurs when pitch variations are embedded in a linguistic context. Since poor-pitch singing mostly results from perceptual deficits (see the perceptual account, above) or from inaccurate auditory–motor mapping (see the sensorimotor account), inaccurate pitch processing may well be observed in both speech and singing. Indeed, poor-pitch singers who do not perceive pitch accurately (i.e., congenital amusics; Dalla Bella et al., 2009) also have difficulty processing pitch in a linguistic context (e.g., Hutchins et al., 2010b). Moreover, auditory–motor integration is likely to rely on the same neuronal substrate during speech and singing (Hickok et al., 2003; Pa and Hickok, 2008). In sum, the impaired processes within the VSL that underlie poor-pitch singing may also affect pitch production in speech. This possibility is consistent with recent evidence that linguistic background (e.g., speaking a tone vs. a non-tone language) shapes both the perception and the imitation of musical pitch (Pfordresher and Brown, 2009).

In a recent study conducted in our laboratory (Dalla Bella and Berkowska, in preparation) we examined the imitation of pitch in musical and speech contexts in AZ, a tone-deaf individual. AZ is a university student with 14 years of general education and no musical training. Her performance on singing-from-memory and familiar-melody imitation tasks shows that she is highly inaccurate in pitch production. For example, she deviates from the correct pitch intervals by 1.4 semitones on average (vs. 0.4 semitones for matched controls) and makes 6.4 contour errors on average (vs. 1.5 for controls). Interestingly, her poor-pitch singing is not accompanied by major perceptual deficits. AZ perceives interval differences normally, although she shows slightly impaired perception of melodic contour on the MBEA (Peretz et al., 2003). That AZ’s perception is largely intact is confirmed by the fact that she enjoys music and, paradoxically, her own singing, which indicates that, as is often the case in tone deafness, she is not aware of her disorder. To assess whether AZ’s poor-pitch singing extends to the imitation of pitch in a speech context, we asked her to perform an interval imitation task. AZ and control participants imitated short spoken or sung fragments with words (e.g., the Polish sentence “klub gra mecz,” “the team is playing the game”). Sentences to be imitated were questions (i.e., with ascending intonation) or statements (i.e., with descending intonation); similarly, sung stimuli had an ascending or descending contour. The material was recorded by a professional singer and manipulated so that the spoken and sung fragments had the same pitch content. Accuracy of pitch imitation was computed from acoustic analyses (as in Dalla Bella et al., 2007, 2009). Accuracy in imitating relative pitch (i.e., pitch interval deviation) and absolute pitch (i.e., transposition error) for AZ and three control participants is reported in Figures 2 and 3, respectively.
Pitch interval deviation is the absolute deviation (in semitones) of the produced intervals from the intervals to be imitated. Transposition error is the absolute deviation (in semitones) of the first produced pitch from the first pitch of the stimulus. As can be seen, AZ was very inaccurate in terms of both absolute and relative pitch when imitating pitch in a musical context. Yet she was comparable to controls when imitating the same pitch intervals while repeating sentences. Note that these differences cannot be explained by differences in interval size between the speech and music material, since the pitch variations were identical in both cases. To our knowledge, this is the first report of a context-dependent (musical vs. linguistic) dissociation in pitch production in poor-pitch singers. This finding suggests that the mechanisms underlying imitation of pitch differences in music and language may be functionally separable.
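
To make the two accuracy measures concrete, the following Python sketch computes them from per-note fundamental frequencies. It is an illustration under stated assumptions, not the acoustic analysis pipeline of Dalla Bella et al. (2007, 2009): it assumes that each note has already been reduced to a single f0 value in Hz, that produced and target sequences are aligned note by note, that interval deviations are averaged over the fragment, and that no octave correction is applied to the transposition error. All function names are hypothetical.

```python
import math
from typing import Sequence


def hz_to_semitones(freq_hz: float, ref_hz: float = 440.0) -> float:
    """Signed distance of freq_hz from ref_hz in equal-tempered semitones.

    Both frequencies must be positive.
    """
    return 12.0 * math.log2(freq_hz / ref_hz)


def intervals(freqs_hz: Sequence[float]) -> list[float]:
    """Signed intervals (in semitones) between successive notes."""
    st = [hz_to_semitones(f) for f in freqs_hz]
    return [b - a for a, b in zip(st, st[1:])]


def pitch_interval_deviation(produced_hz: Sequence[float],
                             target_hz: Sequence[float]) -> float:
    """Relative-pitch accuracy: mean absolute deviation (semitones) of the
    produced intervals from the target intervals.

    Assumption: notes are aligned one-to-one and deviations are averaged.
    """
    if len(produced_hz) != len(target_hz) or len(target_hz) < 2:
        raise ValueError("need two equally long sequences of at least 2 notes")
    prod, targ = intervals(produced_hz), intervals(target_hz)
    return sum(abs(p - t) for p, t in zip(prod, targ)) / len(targ)


def transposition_error(produced_hz: Sequence[float],
                        target_hz: Sequence[float]) -> float:
    """Absolute-pitch accuracy: absolute deviation (semitones) of the first
    produced pitch from the first target pitch (no octave folding)."""
    return abs(hz_to_semitones(produced_hz[0], ref_hz=target_hz[0]))


if __name__ == "__main__":
    target = [220.0, 246.94, 277.18]     # hypothetical three-note stimulus (Hz)
    produced = [2 * f for f in target]   # same melody sung exactly one octave higher
    print(round(pitch_interval_deviation(produced, target), 6))  # → 0.0
    print(transposition_error(produced, target))                 # → 12.0
```

Because an exact octave transposition preserves every interval, the example yields an interval deviation of zero but a transposition error of 12 semitones, which illustrates how relative and absolute pitch accuracy can dissociate.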

Figure 2. Imitation of pitch intervals by a poor-pitch singer (AZ) and three matched control participants, (A) in spoken utterances and (B) in a musical context (i.e., singing). Accuracy in terms of relative pitch (pitch interval deviation) is reported.

Figure 3. Imitation of pitch intervals by a poor-pitch singer (AZ) and three matched control participants, (A) in spoken utterances and (B) in a musical context (i.e., singing). Accuracy in terms of absolute pitch (transposition error) is reported.

Conclusion

In the present article we reviewed evidence on poor-pitch singing, as compared to normal singing, and briefly outlined the causes of this condition. Poor-pitch singing may result from different sources, as indicated in the VSL schema. Particular attention was paid to perceptual mechanisms and sensorimotor mapping. Malfunctioning of these mechanisms can result in poor performance in tasks such as singing familiar melodies from memory or vocal imitation, leading to a variety of disorders. Interestingly, these processes are similarly crucial for pitch production in a language context; therefore, co-occurring pitch production deficits in music and language would be expected. Here we focused on poor-pitch singing in otherwise normal individuals (i.e., with tone deafness). Previous studies have demonstrated that pitch perception in music and language is not completely independent in tone deafness (e.g., in the case of melodic contour and speech intonation). However, we provided here preliminary evidence from a tone-deaf individual showing that the imitation of pitch intervals is very inaccurate, in terms of both absolute and relative pitch, when singing, whereas no such deficit is observed when speaking. This intriguing finding points to independent mechanisms subserving pitch imitation in music and language in the production domain. Further studies on a larger sample of tone-deaf individuals are required to examine whether these findings characterize the condition more generally and, ultimately, to clarify which mechanisms within the VSL are domain-specific.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This research was supported by a research grant from the European Commission (ITN-EBRAMUS, n. 238157), and by internal grants from the University of Finance and Management in Warsaw. We thank two anonymous reviewers for helpful comments on an earlier version of this manuscript.

References

Ackermann, H., and Riecker, A. (2004). The contribution of the insula to motor aspects of speech production: a review and a hypothesis. Brain Lang. 89, 320–328.

Ackermann, H., Wildgruber, D., and Riecker, A. (2006). “Singing in the (b)rain: cerebral correlates,” in Music, Motor Control and the Brain, eds E. Altenmüller, M. Wiesendanger, and J. Kesselring (Oxford: Oxford University Press), 205–222.

Amir, O., Amir, N., and Kishon-Rabin, L. (2003). The effect of superior auditory skills on vocal accuracy. J. Acoust. Soc. Am. 113, 1102–1108.

Apfelstadt, H. (1984). Effects of melodic perception instruction on pitch discrimination and vocal accuracy of kindergarten children. J. Res. Music Educ. 32, 15–24.