Front. Commun., 28 July 2021
Sec. Language Sciences
Volume 6 - 2021

Swallowing in Conversation

  • Centre for Advanced Studies in Language and Communication, Department of Language and Linguistic Science, University of York, York, United Kingdom

Swallowing (a complex physical process that involves closure of the mouth and nasal cavities, as well as the glottis, and the raising and lowering of the larynx) is at the boundary between speech and the body, yet almost nothing is known about how it works in conjunction with speech in spoken interaction. Research into swallowing, mostly in speech therapy, has explored the articulations required, how long it takes the bolus to pass through the mouth to the stomach, and the sounds that occur on the way. In the phonetics literature, swallowing is regularly excluded from study: in experiments, tokens with swallowing are excluded; and while swallowing is used to set up certain experiments, its effect on speech is not the object of such studies, though it is sometimes mentioned as a possible action during a stretch of silence, as in word search. Although speaking and swallowing are mutually incompatible, in conversation, swallowing has to be coordinated around the processes of speaking. It can be part of the preparations for speech; it can also occur within and after stretches of speech. While swallowing has been marked in conversation analytic transcripts in several languages, it is almost never commented on. Like sniffing, crying or laughing, swallowing occurs in the vocal tract and may accompany speech, but is not considered as part of the stream of speech. It is clearly related to drinking, which Hoey (2015, 2017, 2020b) shows is strategically placed in the sequential unfolding of talk. In the same spirit, this paper will treat swallowing as an interactional resource which is bound up with language, and which has particular affordances and demands. This paper fills a gap in our knowledge by focusing on swallowing that is embedded within, before, or after stretches of speech. It considers the phonetic, linguistic and interactional features of swallowing. It thus explores how verbal conduct is intertwined with one aspect of bodily conduct.


Swallowing—a complex physical process that involves closure of the mouth and nasal cavities, as well as the glottis, and the raising and lowering of the larynx—is at the boundary between speech and the body, yet almost nothing is known about how it works in conjunction with speech in spoken interaction.

Like sniffing, crying or laughing, swallowing occurs in the vocal tract and may accompany speech, but is considered marginal to speech (see Keevallik and Ogden, 2020, and papers therein). It is clearly central to eating and drinking, which Hoey (2015, 2017, 2020b) shows can be strategically placed in the sequential unfolding of talk. In the same spirit, this paper treats swallowing as an interactional resource which is bound up with language, and which has particular affordances and demands.

Studies of swallowing in speech therapy focus on the physical processes of swallowing, mostly in isolation, or swallowing food or drink, but not alongside or within talk. In the phonetics literature, swallowing is regularly excluded from study: in experiments, tokens with swallowing are excluded; and while swallowing is used to set up certain experiments (e.g. Faucher et al., 2019), its effect on speech is not the object of such studies, though it is sometimes mentioned as a possible action during a stretch of silence, as in word search (Ogden, 2013; Belz and Trouvain, 2019).

This study fills a gap in what is known about swallowing by considering how it works in one of its indigenous environments: talk-in-interaction. The paper draws on a variety of data, including audio and video data, primarily from the United Kingdom. The examples are tokens of swallowing where participants are not also eating or drinking, or indeed tasting, of which swallowing may be a visible and prominent element (Mondada, 2020: 149).

Background offers a brief survey of what is already known about swallowing. I describe the physical process of swallowing and its audible and visible effects, and review what is known about swallowing from studies in both Conversation Analysis and elsewhere.

A primary question of the study is where in talk people audibly (and visibly) swallow. I show the placement of swallowing relative to the online phonological and syntactic construction of a turn at talk. I show that swallows that project more talk (Swallows in the Context of Projecting More Talk) and swallows that project no more talk co-occur with different syntactic, prosodic and phonetic features. Swallowing and Affective Displays looks at examples of swallowing embedded with affective displays, including sobbing and facial and verbal displays of “trouble”.


The Physiological Process of Swallowing

Swallowing is the process of moving a ball of food or liquid (bolus) from the mouth to the esophagus and then into the stomach. This is accomplished by a complex series of voluntary and involuntary actions which are tightly coordinated with each other. Firstly, the tongue pushes the bolus to the back of the mouth. Secondly, the bolus is passed into the pharynx. At this point, the soft palate is raised, sealing off the nasal cavities and making nasal airflow (including therefore breathing) impossible; the vocal folds close, the larynx rises, and the epiglottis covers and protects the larynx (forming an epiglottal stop: Esling et al., 2019: 53), and prevents the bolus passing into the lungs. Finally, the bolus moves to the esophagus, and from there it is pushed into the stomach through muscle contractions.

The action of swallowing is incompatible with speaking, because the closures at the lips, glottis and velum mean that the vocal tract is temporarily sealed off, and the airflow required for speech is not possible. Later sections will show how swallowing affects surrounding speech, and how swallowing is placed within talk.

Sounds of Swallowing

Although speech is not possible during swallowing, the biomechanical movements of swallowing do produce a number of sounds. These sounds are generally rather quiet, or inaudible; and they have much lower amplitude than speech. In speech therapy studies they have mostly been examined by placing a stethoscope above the larynx while the participant is asked to swallow something, usually a thickened liquid; or by placing a microphone in the same location (Ferruci et al., 2013).

A study by Morinière et al. (2008), on 75 recordings of 15 individuals, identified three common acoustic components during swallowing: 1) the laryngeal ascension sound, 2) the upper-sphincter opening sound, and 3) the laryngeal release sound. The laryngeal ascension sound is rather low in intensity, so is heard as quiet. The upper-sphincter opening sound, which was found in all their recordings, is the sound of the bolus flowing through the pharynx, and corresponds to the “gulping” sound most commonly associated with swallowing. On average it lasts 185 ms in their data (approximately the duration of a long vowel in English). The laryngeal release sound, like the ascension sound, is quiet and not always present. The laryngeal ascension and release sounds are shorter (average 106 and 72 ms respectively), transient, click-like sounds.

Swallowing can take between 0.25 and 0.8 s. The average total duration of a swallow is around 0.4 s, with an average intensity of around 44 dB, which is quiet (Cichero and Murdoch, 2002). On average, the swallowing sounds of females are higher in timbre than those of males; for males, there is more variability in the timbre depending on the size of the bolus (Cichero and Murdoch, 2002: 630). The same study showed that subjective discrimination of swallowing sounds was fairly reliable: they were recognized more than 70% of the time, and when the bolus was 15 ml, they were distinct 90% of the time.

These findings mean that it is reasonable to use auditory data to detect swallowing, and that swallowing may be audible for participants in conversation.
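As a rough consistency check on the durations quoted above, the mean component durations from Morinière et al. (2008) can be summed and compared with the reported range of total swallow durations. The sketch below is illustrative only: the names are hypothetical, and summing the three means assumes that the components neither overlap nor are separated by gaps, which the source does not state.

```python
# Mean component durations in ms, as quoted above from Morinière et al. (2008).
COMPONENT_MS = {
    "laryngeal_ascension": 106,
    "upper_sphincter_opening": 185,
    "laryngeal_release": 72,
}

# Range of total swallow durations in seconds, as quoted above.
TOTAL_RANGE_S = (0.25, 0.8)


def summed_duration_s(components_ms):
    """Sum component durations (ms) and convert to seconds.

    Assumption: components are strictly sequential with no gaps.
    """
    return sum(components_ms.values()) / 1000.0


def within_range(value, bounds):
    """Return True if value lies inside the closed interval bounds."""
    low, high = bounds
    return low <= value <= high


total = summed_duration_s(COMPONENT_MS)
print(f"summed components: {total:.3f} s")
print("within reported range:", within_range(total, TOTAL_RANGE_S))
```

Under these assumptions the summed components come to a little under 0.4 s, which sits close to the average total duration reported by Cichero and Murdoch (2002).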

Although swallowing is not compatible with speaking, it affects the production of speech before and after the swallow occurs. During swallowing itself, the vocal folds are closed, so exhalation–a prerequisite for the vast majority of speech sounds–is not possible. In addition, the lips are closed and the velum raised, so neither ingressive nor egressive airflow can occur. In short, speech is physically not possible during swallowing. However, swallowing can take place before, during or after the act of speaking, and sometimes its effects are audible within speech.

The acoustic properties of speech can be affected by swallowing shortly before its onset and offset. The raising of the larynx required while swallowing shortens the vocal tract. The movement of the larynx produces changes in the voice quality; a raised larynx is associated with higher F0 (Honda, 2004; cited in Esling et al., 2019: 95). The change of the length of the vocal tract changes the natural resonances of the vocal tract. Since the movement of the larynx is rapid, these resonance changes are also rapid. The data in this paper does not allow further investigation into the acoustic effects of swallowing on speech.

Once the swallow is complete, adjustments need to be made to the vocal tract to produce speech. These adjustments include e.g. separation of the lips, and the removal of the tongue from the roof of the mouth, resulting in lipsmacks and clicks.


Example 1. Vegtalk BBC Radio 4 19.12.03 forage

Example 2. Rinder 18/01/2016 [11:50]

Example 3. RCE25 Bench 16:04 no funding

Example 4. Repair Shop [20/07/2019, 24:04] Jewellery box

Example 5. RCE25 Bench 19:11 grant

Example 6. RCE25 Bench 06:14 Lawrence Sterne’s burial place

Example 7. Repair Shop [21/04/2019, 40:01] China cup

Example 8. Repair Shop 7/8/19 [36:50] Portuguese guitar

Example 9. Repair Shop 20/07/2019 [23:13] Jewellery box

Example 10. RCE14 Colleagues 00:22:42 ELR

Example 11. Rinder 24/04/2018 [21:33]

The sounds of swallowing are illustrated in Example 1. The speaker, Sue, has projected a two-parted answer to a question from Charlie about why Britons do not forage. The swallow comes at the end of the first part of her answer, and just before the second, already projected, part.

The final [m] of “and-uhm” is relatively short, and there is an abrupt drop in volume, so it sounds cut off. Between the end of [m] and the onset of “I think” is a gap of 620 ms, during which the swallow occurs. Two transients (audible as momentary popping sounds) are visible, marked as T1 and T2 in Figure 1. T1 is the laryngeal ascension sound. T2, which is louder and whose energy is in the F2 region, is the upper sphincter opening sound, and the sound of saliva passing down the esophagus. It lasts about 100 ms. Both of these sounds are low in intensity in comparison with the speech that surrounds them. The swallow is released with a click (marked C) just after 4.8 s.


FIGURE 1. Spectrogram and waveform of a swallow + click combination from Example 1. T1 and T2: transients relating to phases of swallowing. C: click sound on release of the swallow.

This stretch of talk has a very noticeable rhythmical organization. The asterisks in Figure 1 have been placed at amplitude and f0 peaks in the signal (see Ogden and Hawkins, 2015 for a complete description of the method). These mark the approximate location of rhythmical beats. “And-uhm” has two beats; the next beat in talk comes on “I” at around 4.9 s. The swallow occurs during a silent beat, marked (*). Rhythmicity can be seen in the approximately equal intervals in time between the marked beats: i.e. the beats are isochronous, and this generates a sense of rhythmicity. Rhythmicity in turn generates coherence across the gap, projecting moments in time with which further speech events can be coordinated (cf. Ogden, 2013: 314–316, on clicks used as metronomes with the same function), and tying the talk after the swallow with the talk before it. Interestingly, the swallow is timed in such a way that the return to talk happens on beat with prior talk, so while the swallow disrupts the flow of surrounding talk, it is also fitted to aspects of the production of that talk.
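The idea of approximately equal intervals between beats can be made concrete with a small sketch. The beat times used here are hypothetical placeholders, not measurements from Example 1 (only the beat on “I” at around 4.9 s is reported above), and the 15% tolerance is an arbitrary assumption rather than a threshold taken from Ogden and Hawkins (2015).

```python
from statistics import mean


def inter_beat_intervals(beat_times):
    """Return the time differences between successive beats (in seconds)."""
    return [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]


def is_roughly_isochronous(beat_times, tolerance=0.15):
    """Check whether every inter-beat interval lies within a relative
    tolerance of the mean interval.

    The tolerance value is an illustrative assumption.
    """
    intervals = inter_beat_intervals(beat_times)
    if len(intervals) < 2:
        raise ValueError("need at least three beats to compare intervals")
    average = mean(intervals)
    return all(abs(interval - average) / average <= tolerance for interval in intervals)


# Hypothetical beat times: two beats on "and-uhm", a silent beat during
# the swallow, and the beat on "I" at around 4.9 s.
with_silent_beat = [3.4, 3.9, 4.4, 4.9]

# The same sequence with the silent beat left out.
without_silent_beat = [3.4, 3.9, 4.9]

print(is_roughly_isochronous(with_silent_beat))
print(is_roughly_isochronous(without_silent_beat))
```

With these placeholder values, the sequence counts as isochronous only when the silent beat during the swallow is included, which mirrors the point that the return to talk is timed to fall on beat.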

The swallow comes just after the second reason of two—already projected in line 1, with “two main reasons”—has been projected with “and-uhm”: it occurs at a place of “maximal grammatical control” (Schegloff, 1996: 93). The click, which occurs immediately before the second reason is presented, bears some resemblance to a “new sequence indexing click” (Wright, 2007; Wright, 2011), in that the swallow and the click are placed at a structural juncture, where the material after the swallow + click is the start of something new (in this case the second projected reason).

As we will see from later examples, swallows are quite regularly positioned within speech so as to accommodate the action of speaking, on both the syntactic and prosodic front.

Swallowing as Silence

Although swallowing may produce noises, swallows are often inaudible. Silent or inaudible swallows cannot therefore be transcribed from audio data; in addition, transcribers may decide a priori that such events are not worthy of transcription. Belz and Trouvain (2019) and Trouvain et al. (2020) note that many things labeled as “silences” in phonetic studies in fact include sounds such as in-breaths and clicks—swallows could be added to this list.

Visible Effects of Swallowing

While the sounds of swallowing are often hard to observe, visible signs of swallowing are often more accessible. The upward then downward movement of the larynx is accompanied by movements of muscles and bones in the neck. The following things can commonly be seen during swallowing:

• the lips may be tightly pressed together (cf. Peräkylä and Ruusuvuori, 2012: 77)

• tendons in the neck may be visible as the larynx is raised and lowered

• the upward and downward movement of the larynx may be seen

• there may be a forward movement of the chin, straightening out the pharynx

Some of these features are visible in Figure 2, which is taken from Example 5.


FIGURE 2. Images of swallowing from Example 5. The speaker (pictured) says “Belinda got-uhm:① (0.7 SWALLOW ②) CLICK ③ a ([ei]) (0.6) grant”. (A): taken at the end of “uhm”. Note the tightly pressed lips with the outer surfaces pressed inwards. (B): taken during the swallow. Note the visible tendons in the neck as the larynx is raised. (C): the swallow is released into a click, and the lips are opened.

The visibility of swallowing in video data is contingent on the positioning of the camera relative to the speaker, the visibility of the neck (perhaps because of clothing), and the speaker’s own physiology. Such contingencies mean that swallows may not be visibly accessible to the analyst, depending on the data recording.

Swallowing in Spoken Interaction

In the main disciplines to have considered swallowing–phonetics and speech therapy studies–swallowing is dislocated from speech, and is treated as an action by itself.

In phonetic studies, swallowing is predominantly mentioned in two speech contexts. The first one is in setting up ultrasound experiments, where swallowing liquid helps the experimenter to establish the line of the hard palate. However, this is only part of the set-up, and not an element of any study, so any data on swallowing is discarded. Secondly, swallowing is mentioned as a reason to exclude data samples from experimental study, since it is treated as a disfluency, and experiments in general require speech to be fluent.

In speech therapy studies, the main area of interest is dysphagia, where one or more aspect of swallowing is not working properly. Most of these studies are interested in the physiology of swallowing, and so they focus on what happens when a participant attempts to swallow something that has been ingested. Swallowing is therefore treated as a process by itself, separate from speech.

In Conversation Analysis, swallowing has rarely been commented on, although examples of it appear in published transcriptions in several languages. It has been mentioned in the context of crying (Hepburn, 2004; Hepburn and Potter, 2012: 200) and drinking (Hoey, 2020b); but little is said about the placement of swallowing in speech, or its effects on speech.

This paper fills a gap in our knowledge, by focusing on swallowing that is embedded within, before, or after stretches of speech. It considers the phonetic, linguistic and interactional features of swallowing. It thus explores how verbal conduct is intertwined with one aspect of bodily conduct.

The Syntactic Placement of Swallows in Talk

Swallowing has been marked in conversation analytic transcripts in several languages: e.g. English (Schegloff, 1988: 226), Estonian (Laanesoo and Keevallik, 2017: 294–5), German (Selting, 2012: 405), Italian (Rossi, 2015: 41–2), and Norwegian (Sikveland and Ogden, 2012: 176). However, it is almost never commented on. A survey of the placement of swallows in these transcripts shows that they can occur before the verbal components of a TCU (Hepburn, 2004: 260; Laanesoo and Keevallik, 2017: 294–5); in the middle of a syntactic clause (Schegloff, 1988: 226; Hepburn, 2004: 285; Sikveland and Ogden, 2012: 176; Ogden, 2013: 311); or as a standalone (Hepburn, 2004: 273). Thus swallows occur either in places which do not disrupt the syntactic structures of the talk in progress (e.g. where placed in pre-TCU position), or in positions of what Schegloff calls “maximal grammatical control” (Schegloff, 1996: 93).

One of the goals of this paper is to explore where swallows are embedded within talk, and what the affordances of swallowing in such positions are. In addition to the positions noted above, we will show examples of swallows that are produced post-completion, making them similar to some clicks (Ogden, 2013; Ogden, 2020), sniffs (Hoey, 2020a) or sighs (Hoey, 2014).

Swallowing and Displays of Emotional Affect

As well as being a somatic necessity, swallowing is associated with heightened affective states and crying or sobbing. The spontaneous swallowing rate has been shown to increase with emotional arousal (Fonagy and Calloway, 1985; Ritz and Thöns, 2006). In an experimental setting, Cuevas et al. (1995) found that heart rate, limb movement, sweat production and swallowing all increased in conditions of heightened emotional arousal, whereas they all dropped in a low arousal condition.

Roach et al. (1998: 87) treat “gulping” (which we take as a form of a loud, audible, swallow) as a reflex:

… an involuntary indication of genuine emotional stress. Extreme emotional states produce altered patterns in respiration, the endocrine system, and the metabolism in general, which may result in audible changes to speech.

There exists the possibility that such reflexes are not always involuntary, but may be consciously used to convey a particular emotional state. Scherer (1985) makes this distinction in his discussion of unconscious “push-effects” versus conscious “pull-effects”.

There seem to be no empirical studies exploring how swallowing is connected to displays of affective states in natural speech. If experimental findings translate to everyday settings, we would expect swallowing to be more frequent in affective displays. Hepburn (2004) is one of the few CA studies which mentions swallowing explicitly, in the context of crying.

If swallowing can be recruited as part of a display of an affective state, as a “pull-effect”, then we would expect to find that there are orderly practices for embedding it within language, alongside other linguistic practices around the display of emotion. While this paper does not contain enough data to provide an unequivocal analysis of the association between swallowing and displays of emotional affective states, it does contain cases where swallowing prefigures such a display, or avoids one.

Data and Methods

Sources of Data

The language of the data is British English. The examples presented in this paper come from three main sources:

(1) Rossi Corpus of English (RCE). RCE was recorded in York in 2011. It consists of conversations between colleagues and friends in a natural setting. Most of the data comes from RCE14, Colleagues (two British speakers, one male, one female), and RCE25, Bench (two female speakers, one North American, the other British), because these two recordings provide clear visual access to the participants’ necks, so that swallowing is visible. The RCE data includes high quality audio files, which make closer acoustic analysis possible. Altogether, RCE14 and RCE25 amount to 56 min of data, and they yielded 14 clear examples of swallowing.

This data was complemented by publicly available sources of data which contain other kinds of social interactions. These are from edited, but unscripted, British reality TV shows:

(2) Repair Shop. Repair Shop is a British TV program where people bring in objects that are broken, to get them mended. They present their items and tell a brief story about their sentimental value. They return to the repair shop to collect these items some time later. The collection draws especially from the return visit, where the repaired and restored items are revealed. This is often a moment for a display or outpouring of emotion. In total, 12 episodes were inspected (a total of 8 h 45 min), with 35 objects repaired and a total of eight swallowing episodes on the return of repaired items. The data is British English.

(3) Judge Rinder. Judge Rinder is a British TV program mimicking a small claims court. While it has entertainment value, it often puts the plaintiffs and defendants in emotionally charged positions. Two episodes yield three examples of swallowing; the data is British English.

The figures provided in this list should be treated with caution: given the limitations of both audibility and visibility of swallowing, they certainly do not capture all instances of swallowing, and it is not possible to draw robust conclusions about the frequency of swallowing from this data.

None of these sources allow for control over factors important to traditional sociolinguistics, such as gender, age or origin of the speaker. As with other “liminal” phenomena within speech (Dingemanse, 2020; Keevallik and Ogden, 2020), it is possible that there is individual variation in the frequency with which such items are produced. For swallowing, any variation may not be consistent for a given individual, for physiological reasons, such as temporarily having a dry mouth, or crying.

Data for Repair Shop and Judge Rinder were collected from broadcasts available via Box of Broadcasts. Ethical approval was granted by the ethics committee of the Department of Language and Linguistic Science at the University of York in accordance with the University’s ethical framework.

Selection Criteria

Like breathing, swallowing is a somatic function which mostly goes unnoticed. Not all in- or exhalations are audible; and not every swallow is audible or visible either. Therefore the focus of this paper is moments in talk-in-interaction where swallowing is either noticeably (which is not to say deliberately) visible or audible, or both. This means that there are many instances of swallowing in the data sources which are not (and cannot be) included in this collection. This is an inevitable consequence of the fact that swallowing is only sometimes perceptible to an observer. While it means that the analysis is not exhaustive and does not account for all occasions on which people swallow in interaction, the resulting situation is comparable with that of breathing in conversation: the in- or out-breaths that can be observed are the ones which are transcribed, and are available for analysis. It is a reasonable assumption that swallows which cannot be observed are predominantly vegetative.


Transcripts mark accentuation and intonation following the GAT conventions for English (Couper-Kuhlen and Barth-Weingarten, 2011). Swallowing and other physical activities are presented between double parentheses, with the duration, where available, presented first. Concurrent bodily activities are shown with a “+”.
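For readers who wish to process transcripts of this kind computationally, the notation for bodily activities can be read off with a regular expression. The sketch below is a minimal illustration under stated assumptions: the exact surface form of the annotations (for instance `((0.7 swallow + head tilts down))`) is a hypothetical rendering of the conventions described above, the function and field names are invented for the example, and the sample line is constructed rather than quoted from the data.

```python
import re

# Matches "((...))" with an optional leading duration in seconds.
# Assumption: a leading number followed by whitespace is always a duration.
ANNOTATION = re.compile(r"\(\(\s*(?:(\d+(?:\.\d+)?)\s+)?([^()]+?)\s*\)\)")


def parse_annotations(line):
    """Extract bodily-activity annotations from one transcript line.

    Returns a list of dicts holding the duration (float or None) and the
    list of concurrent activities, which are separated by "+".
    """
    results = []
    for match in ANNOTATION.finditer(line):
        duration, body = match.groups()
        activities = [part.strip() for part in body.split("+") if part.strip()]
        results.append(
            {
                "duration_s": float(duration) if duration else None,
                "activities": activities,
            }
        )
    return results


# Constructed example line, not taken from the corpus.
example = "got-uhm: ((0.7 swallow + head tilts down)) ((click)) a grant"

for annotation in parse_annotations(example):
    print(annotation)
```

A limitation of this sketch is that an annotation beginning with a count rather than a duration (for example a hypothetical `((2 swallows))`) would be misread as having a duration of 2 s, so real transcripts would need checking by hand.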


The data were analyzed using the methods of Conversation Analysis and Interactional Linguistics (see e.g. Clift, 2016; Couper-Kuhlen and Selting, 2017). The main task of this paper, as in Ogden (2020), is to establish what the more general principles are by which such events are understood by participants, such as the sequential and rhythmical positioning already seen in Example 1. For this reason, individual pieces of data were considered with respect to aspects of their linguistic design, sequential positioning, and participants’ orientations to swallowing. Both visual and audible information were taken into account in the analysis in the case of video data.

Swallows in the Context of Projecting More Talk

Swallows can occur where more talk is projected through syntactic, prosodic and turn organizational structures. In these cases, they are placed at points in the emerging talk that suggest a sensitivity to syntactic and prosodic structures, and to the progressivity of talk.

In Example 2, talk is projected through the sequential organization of an adjacency pair. Judge Rinder (JR) is questioning a young man (YM) about his education. In this example, YM does a swallow in pre-beginning position after JR’s first pair part.

The Judge’s question at line 3 presupposes that YM left school with qualifications. The first part of YM’s answer in line 5 implies that he left without qualifications, thus indirectly rejecting the presupposition of the question. The second part of the answer in line 7 mentions a BA, not the kind of qualification obtainable at school; so in the end the answer does refer to qualifications, but not the kind targeted by the Judge’s question. YM’s answer overall, then, is a complex one, which among other things has to deal with a problem in the presuppositions of the question.

This complex answer is preceded in pre-beginning position by a number of audible and visible articulations: he turns his head and opens his mouth to breathe in overlap with JR’s question; this results in a percussive with an in-breath (.thh), and is followed by a hesitation particle (“uhm”). These index incipient speakership, and thus display an orientation to the relevance of talk. There is then a swallow that is released into a click (arguably the most audibly salient part of the swallow from the participants’ perspective), then another hesitation particle and a self-repair. So in this case the swallow is part of a cluster of objects in pre-beginning position (Schegloff, 1996) which serve to delay the verbal part of the answer, a typical feature of turns with dispreferred formats (Pomerantz, 1984; Sacks, 1987; for a more phonetically grounded account, see Kendrick and Torreira, 2015). The swallow itself is not audible, and so could be transcribed as a silence; but it is clear from visual evidence and auditory evidence through the click that there is a swallow.

Swallows in this context are part of a family of practices like in-breaths, clicks and changes in body posture: they index “preparing the vocal tract for speech”, so displaying an orientation to the relevance of speaking now, while simultaneously delaying but projecting talk.

In the next example, a swallow is placed between two clauses. Here, a subordinate clause initiated with when is first extended with two conjunctions, then the speaker produces a swallow (line 9), released into some lip smack noises, before the main clause (line 10).

In this example, the swallow is placed at a syntactic and prosodic boundary between two clauses within a multi-clause sentence. The “when” clause, extended with two “and” conjunctions, projects a main clause which has not yet been produced. The first clause at line 7 sets the scene for the story projected at lines 4–6. It is extended with two subsequent clauses in line 8, which extend the “when” clause again. So the ends of the clauses in lines 7 and 8 project more talk syntactically and pragmatically, and there is no TRP in these places. B does not make any move to come in during the gap where A swallows at line 9. The syntactic positioning of this swallow is different from the one in Example 1, as it occurs between two sentential clauses; it is closer syntactically to Example 2, where a swallow was placed at a high-level syntactic boundary.

It is hard to ascribe an action to the swallow in this case. If swallowing is a somatic requirement, then timing it so that it falls at a clause boundary means that it is less exposed in the interaction than if embedded within a lower-level constituent such as between “we” and “went” or “went” and “out”. This seems to be such a place: the coparticipant does not treat this as a TRP, and the current speaker, A, treats this as a suspension of her talk which is resolved by the syntactically fitted clause at line 10.

In Example 4, a swallow appears embedded within a TCU, at a major phrase boundary. Will has repaired a jewellery box which he is returning to Karen. This box belonged to Karen’s grandmother, but Karen did not know the box’s origin. Will has just opened the box before he explains to Karen that he discovered a scrap of paper in the box which they take as confirmation of the origin of the box.

In this case, the swallow is positioned within a sentence, at the boundary between a fronted prepositional phrase and the rest of the sentence. Although this is a major phrase boundary, the sentence itself is incomplete.

The two fronted adverbial phrases “interestingly enough” and “on the inside” are produced as separate intonational phrases, each with a final fall-rise intonation contour, which is commonly used to project more talk. The repetition of the contour facilitates the hearing of these two phrases as belonging to the same larger hierarchical unit, while at the same time projecting the rest of the sentence. Thus the placement of the swallow here displays an orientation to the unfolding syntactic and prosodic units: it is located at major boundaries where continued talk is projected through prosodic and syntactic structures, and Karen makes no move to come in at this point.

The swallow is positioned before material that completes the sentence, “there’s some old newspaper”. This turns out to be the key “news item” in Will’s turn in line 1: he goes on to explain how this discovery of the newspaper is what enabled him to establish the provenance and date of the jewellery box. This turns out to be news which receives a strongly positive assessment from Karen (line 17). As we will see in later examples, swallowing is frequently placed before talk which reveals something that is given an affective value by the participants.

Example 5 is an example of swallowing during a word search, where the swallow is positioned within a syntactic phrase and not at a major phrase boundary. A and B are sitting next to each other on a bench. They have been talking about how someone they both know has failed to get a research grant. The extract starts with B’s contrasting story in response, about how Belinda has been awarded a prestigious research grant. The swallow appears in a word search initiated with “uhm” and ended with a click before the searched-for word—see Wright (2005) for further details of similar practices.

In line 2, B is part-way through a TCU when she signals suspension of her talk with “uhm”. “Uhm” often indexes upcoming problems in production (Jefferson, 1974; Fox Tree and Clark, 1997), and as in other cases noted by Wright (2005) it marks the onset of a word search stretch.

The [t] of “got” is released with aspiration. “Got-uhm” has two syllables of equal metrical weight, and mid level tones. Wright (2005: 191) notes that this is a common intonational feature of pre word search stretches, and that it is a device for projecting an upcoming focal accent. It matches many of the features described in Local (2004) for “and-uhm” (see also Example 1). The talk is suspended at a point where the syntactic structure is also incomplete: the verb “got” requires a noun phrase as an object. Thus the syntactic and phonetic design serve to suspend the progressivity of the talk while simultaneously projecting certain features.

After the [m], B presses her lips tightly together (a more extreme articulation than for [m]; see Figure 2), then swallows. As she swallows, her head and her gaze direction tilt downwards. The swallow is released into a click, and the indefinite article that follows this is in full form (reminiscent of Jefferson’s 1974 observations on the full form of “the”, [ði], as part of an error correction device). During the silence that follows this, the articulations are visibly prepared for “grant”–in particular, the lips can be seen to be rounded in anticipation of [r]. (It is interesting to note that Wright, 2011: 220, on the basis of audio data, notes other cases where speakers produce tight bilabial closures which are held for quite a while before being released into percussives and/or clicks, often with an in-breath.)

B’s gaze up to this point is away to the distance. However, she blinks and turns her head toward A as she reaches “from the…”, and her gaze is to A as she says “Paul Mellon Center”. So B’s gaze behavior during the part of the turn where the click is produced suggests that she is still working on the production of her turn.

Swallows in word searches are one feature among others: hesitation particles, suspended prosodic and syntactic features, a click on release of the swallow. Wright’s (2005) observations on audio data match these observations very closely: she notes that features like these (including audible glottal closure, which must be present for swallowing) serve to retain the turn, and a co-participant does not generally come in. As noted earlier, many swallows are inaudible, and it is very likely that swallowing is a more common feature of word searches than can be gleaned from transcriptions, where they are probably under-represented, especially in audio-only data.

Examples 2–5 show that swallows can be placed at a point where talk by the same speaker is projected. In pre-beginning position (as in Example 2), there are other features of delayed but incipient speakership, usually before the swallow. A swallow in pre-turn position may function as a preparation for speaking: if audible or visible, it may be considered as clearing the vocal tract of unwanted liquid before speaking is possible. It may thus come to index incipient speakership.

Where the swallows are located at syntactic and prosodic boundaries, these boundaries have syntactic, prosodic or sequence-organizational features that project more talk. These features appear before the swallow, making the silence during the swallow less susceptible to incoming talk from a co-participant. Although the progressivity of talk in these cases is temporarily halted, its completion is projected. It is noticeable that most of these swallows have an audible release, with clicks and lip smacks quite common. These sounds have been shown to project further talk (Ogden, 2013; Paschen, 2019; Pinto and Vigil, 2019; Kosmala, 2020).

Co-participants do not treat the gaps in talk that result from swallowing as TRPs.

All these features suggest that speech and swallowing are planned together: swallowing is not merely a somatic feature, independent of speech; but is rather intertwined with it. Swallows seem to come at a point after which further talk by the same speaker has already been projected.

Swallows in the Context of Projecting no More Talk

Swallowing also occurs in the context of projecting no more talk by the same speaker, thereby yielding the turn space. Many of these cases feature tightly closed lips, without subsequent lip smacks or clicks (an audible sign of release). Such swallows occur at points of syntactic and/or prosodic completion, including turn-final position. In these cases, swallowing serves as a non-verbal extension of a prosodically and syntactically complete TCU, similar to other post-completion expansions such as sighs (Hoey, 2014), clicks (Ogden, 2020) or sniffs (Hoey, 2020a), or a change of facial expression (Kaukomaa et al., 2015). According to Schegloff (1996: 90), minimal post-expansions bring a TCU to a close and offer a speaker the opportunity to display “retroactive alignment toward it, or the consequences of it”. Swallows seem to index again that the just-finished TCU is in fact complete.

Example 6 illustrates this well, where a sequence-closing third is followed by a swallow (line 21), and then a new sequence of action is initiated.

A initiates an adjacency pair in line 1. There is a rather complex and non-aligned sequence in response, but “I’m not entirely certain … if he’s still there” in lines 13–19 provides a lexically and syntactically fitted answer from B, and is identifiable as the second pair part to line 1. A’s “yeah” in line 21 is a sequence closing third (Schegloff, 2007). It is followed by a swallow which is not accompanied by any click, lipsmack or in-breath, i.e. there are no signs that this swallow prefaces further talk immediately. Then there is a lapse during which B drinks, and both A and B look away from each other. Hoey (2020b: 110 ff.) shows that drinking can be used “as a display of the speaker’s commitment to unit completion”, and in this case it is an alternative to expanding the sequence. At line 25 A initiates a new topic. Thus A’s swallow at line 9, and B’s drinking at line 10, serve to underscore the closure of the question-answer sequence which is started at line 1 and verbally finished at line 21: the swallow is a physical action done on completion of a sequence-closing turn, and is one of the non-verbal features that mark the closing of the sequence.

In Example 7, Valerie is having a prize cup returned to her. Her dad had won it as a young man, and it is the only such item she has left of his athletics career. For her the value of the repair to the cup makes up for not being able to “indulge him” while he was alive (line 23).

Valerie’s turn, lines 16–25, is complex. It starts with a recollection of an earlier interaction with Brenton, and launches a longer sequence where she contrasts her current feelings with her feelings earlier. In line 23, she contrasts her relationship with her mother with the one with her father, and introduces a sense of regret about her relationship to her father. At lines 24–25, she starts to describe how her feelings have changed. In just the place where she might verbalize her feelings (“it makes me feel … ”), there is a gap, and an in-breath initiated by an opening of her lips (.pth): this perturbation in the progress of the TCU already hints that she has trouble putting her feelings into words; it is clear from her face that she is starting to cry.

The TCU at lines 24–25 is syntactically and prosodically complete, though fragmented. It ends with her sobbing as she speaks, and at the end of the TCU she closes her lips tightly, and swallows.

Brenton treats this TCU (and with it, the longer telling started at line 17) as complete by producing a summary assessment at line 27 which Valerie’s brother acknowledges at line 30. The tight lips and swallow at line 26 seem to display Valerie’s inability to say more while displaying (but not verbalizing) in post-completion position her emotional investment in the repair she has had done: the swallow comes in the context of what for her is an emotional event. Brenton orients to Valerie’s display of strong emotions by going to hug her (lines 28–30).

In this case, then, swallowing is treated as marking the ending of a longer turn, which is a telling about strong and complex emotions, which are not easily verbalized by the speaker and which are interwoven with sobs. We consider the affective work of swallows more in the next section.

Given that swallowing requires complete lip closure and is incompatible with speech, post-completion swallows indexically reinforce the completion of a turn. In Examples 6–7, swallows present the talk in the prior turn as finished: the TCUs are complete syntactic and prosodic units, and they present complete recognisable actions which are treated as such by the participants.

In this section, I have shown that the positioning of swallows displays sensitivity to ongoing sequential, syntactic and prosodic units. In the next section, I will show how swallowing contributes to the display of affect within turns: that is, swallowing can laminate turns at talk to display something ostensibly about the speaker’s inner state.

Swallowing and Affective Displays

In some of the examples considered already, swallows are present in turns where a speaker displays an affective stance. Example 2, “neglected young man” is not merely an answer that challenges the presuppositions of the question; in challenging the presupposition of the judge’s question–that normally one leaves school with qualifications–the young man also publicly admits failure to a person in authority, before explaining a success. In Example 7, China Cup, Valerie talks about her satisfaction in making up for something they had not been able to do for her father before he died. There are elements of pleasure, gratitude and sadness in her response to the repaired cup.

In the examples considered in this section, I look more closely at some of the affective displays in the context of the swallowing. Common to several of these examples is a temporary display of being “lost for words”. Other co-occurring features are facial expressions that display trouble; and lexical choices that tend toward extreme case formulations (Pomerantz, 1986). There are also instances of sobbing or crying, which both generate fluid in the vocal tract. This fluid needs to be removed from the vocal tract in order for speech to be possible; so swallowing commonly occurs in this environment (cf. Hepburn, 2004).

In several of the cases we will see, the swallow comes before the display of affect, and so can be seen as a kind of projection device. This is reminiscent of the “guttural” sounds observed by Jefferson (2010), which she analyses as sometimes “laugh-premonitory” (Jefferson, 2010: 1478). Swallows, in a similar way, may be understood as connected to sobbing or crying, though of course the kinds of laryngeal and pharyngeal constrictions that Jefferson described as “guttural” and associated with laughter are compatible with speaking (Chafe, 2007; Esling, 2007), while swallowing is not.

We start with an example with a swallow in pre-turn position. In Example 8, Michael is collecting a Portuguese guitar that had belonged to his grandmother. When he brought the guitar in, he told how his grandfather had serenaded his grandmother with this guitar; and he described his grandmother as his “hero”, “best friend”, and the guitar was one of her “treasures”.

At line 9 Michael sees the repaired guitar. Initially he produces two assessments of it (“complete” and “shiny”), which are coproduced with smiles ($). At line 12, his smile changes to a frown. He then produces a number of syntactic frames for assessments, all of which have perturbations in the production, and there is no assessment term in the slot where one term could be placed (lines 12, 14, 18 and 20): he displays difficulties in verbalizing how he feels.

At lines 21–22, David invites Michael to reminisce about the guitar’s connection to his grandmother. This reminiscence is already projected as an emotionally charged one with the word “treasure” to refer to the guitar–the term that Michael himself used when bringing the guitar in and describing his affection for his grandmother, and her relationship to the guitar. This turn is framed as an assessment where the speaker has lower epistemic authority than the recipient, thus making a response from Michael relevant. Michael’s response at line 23 is initiated with his lips visibly closed and pressed tight together, nodding–an embodied and immediate confirming response–and then a swallow, which displays a temporary inability to talk, and serves to delay the verbal part of his response. His “yeah” is produced quietly, and low in his pitch range, a contrast with his prior talk, perhaps marking that this talk is on a different footing from earlier talk.

As we saw in Example 7, at a moment where an affective display has been made relevant, Michael displays a temporary inability to verbalize, which is also congruent with his earlier difficulties (cf. Wilkinson and Kitzinger, 2006, who consider some cases where people are “lost for words”). Michael’s turn at lines 25–31 is an account of his lack of knowledge of precise details. In the turn, he uses a strongly valenced term, “heartbreaking” to express regret; he reminisces about how his grandmother related to the guitar (“she glowed”); and he expresses his gratitude for the repair.

In this example, a swallow comes in response to an invitation to share an emotionally charged memory. While the detail of Michael’s affective stance is unspoken, the swallow seems to be one device, in pre-turn position, that projects something about the quality of the upcoming talk.

In Example 9, the swallow is postpositioned. Karen has returned to collect a wooden jewellery box that has been repaired. The box has some inlaid birds, which are fragile. When the box was first brought in for repair, Will expressed worries that he would not be able to clean the box without damaging the birds: so there is a risk that the repair has not been successful. This is alluded to in lines 9–13.

At line 5, Will projects a news delivery (Freese and Maynard, 1998; Maynard and Freese, 2012), the first part of which comes in line 6. The revealing of the repaired box is being delayed, so lines 5–6 could be heard as a prefatory account for disappointing news, given the warning when the box was brought in that cleaning it might damage the birds. Karen’s “yeah(p)” at line 7 acknowledges this preface to news, in a lexically minimal way; with no lexical material, this turn has a provisional character in response to the projected news (Freese and Maynard, 1998: 209). It also lacks many of the features identified by Freese and Maynard (1998) as associated with the receipt of “good” news, such as high amplitude and high pitch register. The post-positioned swallow, with the tightly closed lips, displays that Karen has no more to say (see Raymond, 2010 for discussion of “nope” with similarly minimal features and noticeable bilabial closure). While it gives the go-ahead for Will’s next turn, the minimal design of this turn seems to mark her readiness to receive news that might not be good, i.e. treating Will’s pre at line 6 as a preface to potentially bad news. Will’s next turns also orient to the potential for a bad outcome through his description of his careful cleaning process (lines 8–14).

In fact, when Will reveals his work at line 15, it turns out to be treated as “good” news (lines 16–17, 19–20), and is receipted with dynamic intonation contours, a wider pitch span, and strong lexical formulations (“flabbergasted”, “absolutely lovely”, “fantastic”).

So in this case, a post-positioned swallow with tightly closed lips indexes “nothing more to say”; and, in conjunction with the minimality of the turn and its absence of high-pitch, high-register intonation, it displays an orientation to the possibility that Will’s projected news delivery will be “bad” news.

Swallowing as Part of a Display of Trouble

Example 10 contains an example of a swallow which is embedded within a longer turn that displays trouble. Anne and John are discussing what Anne can do with a chapter she has written.

The sequence begins with Anne making a pre-request (line 1). This is followed by an account for the upcoming request at lines 2–3, which ends with the name of the journal she plans to send the paper to. John does not respond to this pre-sequence. Anne follows it at line 4 with a swallow, along with other physical, visible evidence of “trouble”: scrunched up eyes (Figure 3) and a hand that moves into a clenched position.


FIGURE 3. (A): End of line 3, “ELR”. (B): Swallow at line 4. (C): Line 6: sides of the mouth turned down, neck tightened, displaying “trouble”.

As in other examples, the swallow is placed after a syntactic and prosodic boundary, in this case after a point of syntactic and prosodic completion. There are no obvious signs of trouble in the talk-so-far, though there are a few possible candidates. First, a request for help may in itself be a sign of trouble, something that the requester cannot do for themself. Secondly, by identifying the journal, Anne might be drawing on shared knowledge about the challenges of a successful submission; but that is not explicit.

The next verbal part of her turn, lines 5–6, identifies her trouble (“very confused”) and explains what is causing her difficulty, and is followed in line 6 by another facial expression that displays trouble (Figure 3). John’s offer at line 7 orients to Anne’s verbal account and visual display of trouble. Anne then orients to the possible imposition his offer will cause him (lines 8, 9, 11).

So the swallow at line 4, along with other physical displays, is part of a gestalt that embodies and projects a trouble which is later verbalized, and brings it to the surface of the interaction.

In this case the swallow, along with other physical features of production, laminates the evolving action of making a request, displaying “trouble” or “difficulty” with something she needs help with. The physical display and verbal account of trouble contribute to recruiting John’s offer in response (line 7) (Kendrick and Drew, 2016). The swallow and accompanying facial expression, and the facial expression in lines 4 and 6 (Figure 3), form a gestalt that displays “trouble” in a way that is much less obvious from the linguistic design of Anne’s turn. Thus the swallow, with its accompanying facial expression, and then the facial expression at line 8 contribute to the addition of a sequentially relevant affective dimension to the formulation of the ongoing action. As in other cases, the position of the swallow is sensitive to the unfolding syntactic and prosodic structures, and to the actions that they implement.

Swallowing and Crying

It has been claimed that swallowing commonly co-occurs with crying (Hepburn, 2004: 286). This is perhaps unsurprising, since crying generates fluids that need to be removed from the vocal tract, and swallowing does this. Crying is a sign of a heightened emotional state; so swallowing can be part of such a display. In Example 7, Valerie’s swallowing comes before she sobs, but sometimes crying and swallowing are concurrent.

Example 11 illustrates one such case. Here, a young man has used a large sum of his mother’s money to have his back tattooed with an image she finds obscene. This image has just been shown to the court, and the mother has just wiped a tear from her eye.

The judge first enquires about the mother’s emotions (line 1). This is done so as to present the young man’s behavior as blameworthy (line 3), i.e. siding with the mother’s stance toward her son. In response to this question, the mother describes her feelings using the strong terms “fuming” and “disgusted” (lines 4–5), and the grave, unforgivable nature of what he has done (line 7).

At line 8, she starts another TCU with “he knew”, but then her speech becomes indistinct as she begins to cry. Unlike many cases of swallowing, where the swallow seems to be carefully placed so as not to disrupt the syntax, the crying here is embedded within an ongoing turn, which continues alongside the crying. It thus seems to be a spontaneous outpouring of emotion (cf. Wilkinson and Kitzinger, 2006), or at least performed as such.

At lines 11–13 the judge acknowledges her assessment by recycling her extreme case formulation (“wrecked his body”), and with his question at line 13 provides her with an opportunity to focus on her feelings. She makes a summary assessment (“devastated”, line 14), which is followed by a swallow.

The judge treats this swallow at line 14 as a sign that the TCU is complete. He initiates a next action at line 15, with a new first pair part on the effect of the young man’s actions, and the sum of money.

M’s post-completion swallow comes in the context of strong emotions identified verbally and displayed physically throughout the sequence through crying. While the crying co-occurs with speech in line 8, the swallow is post-positioned after a prosodically, pragmatically and syntactically complete TCU in line 14. It occurs at what turns out to be the termination of the question sequence and the progression to the next. Thus this swallow handles both matters of sequential organization and affective display.

Examples in this section and elsewhere in the paper show swallows as a part of displays of affective stance. Experimental findings that the rate of swallowing increases with heightened emotional arousal cannot be verified through this data, but the data support the finding that swallowing occurs in such environments. What conversational data adds is an understanding of the complex of linguistic and bodily resources available to participants in such displays; and CA more particularly shows that bodily actions like swallowing are precisely and delicately timed with other ongoing activities in interaction. Swallowing is by no means the only resource for laminating an ongoing activity with an affective stance; but because of its association with sobbing and crying, it is reasonable to claim that swallowing can index the same kinds of emotional states as sobbing and crying.


In this paper, I have considered the positioning of swallows in talk. I have focused on three main aspects: swallows in the context of projecting more talk; swallows in the context of projecting no more talk; and the association of swallows with affective displays.

Like sighs (Hoey, 2014), sniffs (Hoey, 2020a), and clicks (Wright, 2011; Ogden, 2013; Ogden, 2020; Li, 2020; Pinto and Vigil, 2020), swallows are placed in ongoing talk in a way that displays sensitivity to emerging syntactic and phonological structures. This placement suggests at the very least that linguistic and somatic functions are planned in parallel: swallows do not occur randomly distributed in speech, but are rather precisely placed with respect to the linguistic and turn constructional units of organization.

Many cases of swallowing in talk are inaudible, or barely audible. It seems very likely that some “silences” are in fact occasions on which participants swallow: silence does not necessarily mean inactivity, as we know from multimodal studies of interaction.

While the sounds of swallowing are low in amplitude, swallows can be made audible by the events just before and after the occurrence of the swallow.

I showed that it is common for swallows that occur in a context where more talk is projected to be released with audible clicks. A stretch of talk like that shown in Example 1 (“and-uhm ((0.62 SWALLOW CLICK)) I think”) is a specialized kind of “closure piece” (Kelly and Local, 1986): an intonation contour is suspended at the onset of the piece; the lips are closed for [m] in “uhm” and simultaneously to produce the swallow. Whereas Kelly and Local’s “closure pieces” have silence at their center, these stretches of talk have a swallow in the portion where talk is suspended: so while there might be silence, there is physical activity which temporarily makes speech impossible. The closure for the swallow is released with a click when the talk is resumed.

Swallows are frequently released into lip smacks or clicks, which have been shown elsewhere to project further talk. Arguably, because clicks and lip smacks are more audible than swallows (which are often also difficult to see), prior research has underplayed or ignored some swallows, focusing on the auditorily salient clicks instead. Rather than think of such stretches as (silence + click), it is probably more accurate in many cases to treat them as (swallow + release), where the release may be noisy. Some clicks, then, may be best understood as the audible release features of a swallow.

On the other hand, the inaudibly released bilabial closures in “yeap ((SWALLOW))” and “nope ((SWALLOW))” serve to mark no continued talk by the speaker: these cases have phonetic features of turn-finality (Local and Walker, 2012) and that includes the absence of an audible release to the closure required for a swallow. So the phonetic and prosodic details of talk around swallowing–before, during and after–make a significant contribution to the progressivity or suspension of talk.

Swallowing removes liquid from the vocal tract. Since a clear vocal tract is a precondition for speaking, swallows form a natural class with other visible or audible preparations for talking, and can be used as a practice to delay the onset of talk, while simultaneously displaying an orientation to the relevance of talk. Seeing swallows and other preparations for speaking (like taking an in-breath, adjusting the body posture, or the audible separation of articulators) as a natural class that displays an orientation to the relevance of talk while not talking (yet) gives an explanation for their positioning in pre-beginning position, and provides co-participants with a way to understand one another’s behavior and adjust their own conduct accordingly.

In the absence of instrumental data, or imaging, it is not possible to speculate on what is happening inside a speaker’s vocal tract, e.g. whether it is dry, or how saliva builds up. A more thorough-going phonetic and physiological study would be needed to answer this question. Nonetheless, the point remains that the audible and/or visible removal of fluid from the vocal tract by swallowing seems to be one way to index incipient speakership.

These observations point to the kinds of resources and practices participants in interaction have to make sense of a bodily activity which may be somatic in origin, but which may come to be implicated in other kinds of communicative practice. They also highlight the importance of observing the phonetic details not just of swallowing per se, but of the surrounding talk, and relating these observations to more general knowledge about the phonetic features of talk.

Swallowing can often be seen: tightly closed lips, the rise and fall of the larynx and accompanying facial expressions have all been noted in the data in this paper.

Closed lips–normally visible even when the rise and fall of the larynx during swallowing is not–can be used to make visible that the speaker is unavailable to speak or (when positioned after the end of a turn) has nothing more to say. This basic feature of swallowing provides coparticipants with a visual cue as to what is going on in the current speaker’s vocal tract. It was also shown that the lips are not just closed, but often tightly closed in a posture that is not used for the production of bilabial speech sounds like [m], [b], or [p].

The rise and fall of the larynx, and straightening of the pharynx, are (like the sounds of swallowing itself) not necessarily available: the swallow might be too fast, or there might be clothing that obscures sight of the swallower’s neck, or the camera angle might not allow it. However, where this is visible, it can form part of the audible/visible gestalt of swallowing. The visible cues of swallowing can thus index unavailability to speak.

Facial expressions are sometimes used alongside swallowing (as in Example 10) to laminate the unfolding talk with a visible affective display along the lines of Peräkylä and Ruusuvuori (2012). Experimental findings show that the rate of swallowing increases with emotional arousal (Fonagy and Calloway, 1985; Cuevas et al., 1995). In these cases, swallows seem to form a gestalt with other bodily actions. The absence and unavailability of speech coupled with other bodily conduct accompanying swallowing is a resource that participants can use to display trouble without verbalizing it.

In short: the semiotic affordances of the audible and visible aspects of swallows can be exploited in speech: the incompatibility of speaking with swallowing, visibly tightly closed lips, and aspects of the release of swallows such as clicks, all have indexical value in speech.

When it comes to the placement of swallows relative to syntactic structures, there is a close relation between possible syntactic completion points and issues of projection, which are also intimately bound up with prosodic design. I present simplified versions of the data here, and use square brackets with labels, XP[….]XP, to surround syntactic phrasal units: noun phrase (NP), verb phrase (VP), prepositional phrase (PP), adjectival phrase (AP).

Firstly, swallows occur in pre-turn position, before the onset of lexical material:

Example 2: .thh uhm SWALLOW CLICK uhm S[I didn’t- I didn’t do very well in school] S

Example 8: SWALLOW RespToken[yeah] RespToken

Secondly, swallows occur on the completion of talk:

Example 7: S[We’ve done something for dad as well]S SWALLOW

Example 11: AP[Devastated over it]AP SWALLOW

Example 9: RespToken[Yeah]RespToken SWALLOW

In both these positions, the swallow does not interrupt the progress of the current unit, and it is positioned after the syntactic phrase boundary; and the current unit is recognizable as a complete TCU.

In other cases, swallows are embedded within TCUs. In principle, swallows could occur anywhere, but they always occur between words (and in this data never in the middle of a word). This alone displays that “word” is treated as an indivisible unit by the person who swallows.

Swallows may be positioned within a phrasal constituent, such as within a verb phrase (VP):

Example 5: S[NP[Belinda]NP VP[V[got]V -uhm SWALLOW NP[a (0.6) grant]NP]VP]S

Taking a rather classical approach, the swallow here is positioned between the verb (V) “got” and the noun phrase (NP) that it requires as an object to make a verb phrase (VP), which is an obligatory element of a sentence (S) in English. So here the swallow is located at a point of syntactic incompletion: in the middle of a VP. The presence of “uhm” indicates the suspension of the ongoing VP; and the intonation is suspended at this point too.

In Example 4, the swallow is placed between a fronted prepositional phrase and the subject and complement of the sentence. This is not at a point of syntactic completion (and not at a TRP), but at the boundary of a prepositional phrase (PP), and before one of the obligatory elements of a sentence:

Example 4: S[AdvP[Interestingly enough]AdvP PP[on the inside]PP SWALLOW NP[there]NP VP[’s some old newspaper … ]VP]S

Other examples like these, with different kinds of syntactic units but all of the general form XP (to generalize over NP, VP, AP, etc), are also found in examples in the literature:

Schegloff (1988: 226): S[NP[A member of your own staff, Mr Craig Fuller]NP SWALLOW VP[has testified … ]VP]S

Rossi (2015: 41–42):

S[NP[Io e la Lidia]NP SWALLOW VP[abbiamo prima raccolto i soldi]VP]S

S[NP[Lidia and I]NP SWALLOW VP[collected the money first]VP]S

In all these cases, the syntax projects more to come, and the talk contains other features that project that further talk. In cases like Example 5, where the swallow comes within a VP and after “uhm”, the intonation contour is suspended, whereas in examples like Example 4, where the swallow comes after a PP boundary, the intonation contour (a fall-rise) is complete, but together with the syntactic incompleteness serves to project further talk.

This sketch of the syntactic positioning of swallows suggests that swallowing is sensitive at least to words, and also to syntactic constituents at a higher level than the word. It is also clear that syntax and prosody work in parallel, since matters of unit construction and unit completion are, for participants, complex and emergent. Further work and more data are needed to explain exactly how this syntactic phrasing maps to intonation phrases and boundaries, and how together they serve to project more talk to come.

In some cases, swallowing is a practice that physically displays not just unavailability to speak but perhaps an inability to speak. Some of the examples of swallowing in this paper occur in the context of displays of sobbing or crying. Because of its association with crying, swallowing can be recruited as part of a display of a heightened affective stance, and sometimes of a speaker's inability to find the right words: swallowing can be one way to display being “lost for words”. In other cases, swallows occur in, or are associated with, turns accompanied by strong lexical formulations. There remains much to do to understand how and on what occasions swallowing works in such displays, and more ecologically valid data is needed.

In their distribution, swallows bear some resemblance to other sounds and actions like sniffs, sighs and clicks, which use some or all of the vocal tract. This paper shows that swallows are similarly liminal events, and that language and speech are intertwined with such events in orderly ways in everyday interaction, providing participants with non-verbal semiotic resources.

Data Availability Statement

The data analyzed in this study is subject to the following licenses/restrictions: Some data used are taken from public broadcasts; other data has been taken from recordings made for the purposes of research. The author can be contacted about access to data. Requests to access these datasets should be directed to

Ethics Statement

The studies involving human participants were reviewed and approved by the Ethics committee, Department of Language and Linguistic Science, University of York. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author Contributions

RO is responsible for the entirety of the manuscript.


Funding

This work was conducted without specific grant support. Publication fees have been paid by the Department of Language and Linguistic Science at the University of York.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


Acknowledgments

I would like to thank the reviewers and the editors for their constructive feedback and advice on the paper. I am grateful to colleagues in the Centre for Advanced Studies in Language and Communication (Paul Drew, Kobin Kendrick, John Local, Merran Toerien and especially Marina Cantarutti) and to the Department of Language and Linguistic Science at the University of York for their support with this paper.


References

Belz, M., and Trouvain, J. (2019). “Are ‘Silent’ Pauses Always Silent?,” in International Congress of Phonetic Sciences ICPhS 2019, 5-9 August 2019, Melbourne, Australia: 2744–2748.

Chafe, W. (2007). The phonetics of laughter - a linguistic approach. In Interdisciplinary Workshop on The Phonetics of Laughter, Saarbrücken. Available at:

Cichero, J. A. Y., and Murdoch, B. E. (2002). Acoustic Signature of the normal Swallow: Characterization by Age, Gender, and Bolus Volume. Ann. Otol. Rhinol. Laryngol. 111 (7), 623–632. doi:10.1177/000348940211100710

Clift, R. (2016). Conversation Analysis. Cambridge: Cambridge University Press. doi:10.1017/9781139022767

Couper-Kuhlen, E., and Barth-Weingarten, D. (2011). A System for Transcribing Talk-In-Interaction: GAT 2. Gespraechsforschung 12 (12), 1–51. Available at:

Couper-Kuhlen, E., and Selting, M. (2017). Interactional Linguistics: An Introduction to Language in Social Interaction. Cambridge: Cambridge University Press. doi:10.1017/9781139507318

Cuevas, J. L., Cook, E. W., Richter, J. E., McCutcheon, M., and Taub, E. (1995). Spontaneous Swallowing Rate and Emotional State. Dig. Dis. Sci. 40 (2), 282–286. doi:10.1007/BF02065410

Dingemanse, M. (2020). Between Sound and Speech: Liminal Signs in Interaction. Res. Lang. Soc. Interaction 53 (1), 188–196. doi:10.1080/08351813.2020.1712967

Esling, J. H., Moisik, S. R., Brenner, A., and Crevier-Buchman, L. (2019). Voice Quality. The Laryngeal Articulator Model. Cambridge: Cambridge University Press. doi:10.1017/9781108696555

Esling, J. H. (2007). “States of the Larynx in Laughter,” in Interdisciplinary Workshop on The Phonetics of Laughter, Saarbrücken, 4-5 August 2007. Retrieved from:

Faucher, G., Karimi, E., Ménard, L., and Laporte, C. (2019). “Automatic Palate Delineation in Ultrasound Videos,” in Proceedings of the 19th International Congress of Phonetic Sciences. Editors S. Calhoun, P. Escudero, M. Tabain, and P. Warren (Melbourne), 422–426. Retrieved from: (Accessed August 5-9, 2019).

Ferrucci, J. L., Mangilli, L. D., Sassi, F. C., Limongi, S. C. O., and Andrade, C. R. F. d. (2013). Sons da deglutição na prática fonoaudiológica: análise crítica da literatura. Einstein (São Paulo) 11 (4), 535–539. doi:10.1590/S1679-45082013000400024

Fonagy, P., and Calloway, S. P. (1986). The Effect of Emotional Arousal on Spontaneous Swallowing Rates. J. Psychosomatic Res. 30, 183–188. doi:10.1016/0022-3999(86)90048-6

Fox Tree, J. E., and Clark, H. H. (1997). Pronouncing “The” as “Thee” to Signal Problems in Speaking. Cognition 62, 151–167. doi:10.1016/S0010-0277(96)00781-0

Freese, J., and Maynard, D. W. (1998). Prosodic Features of Bad News and Good News in Conversation. Lang. Soc. 27 (02), 195–219. doi:10.1017/S0047404500019850

Hepburn, A. (2004). Crying: Notes on Description, Transcription, and Interaction. Res. Lang. Soc. Interact. 37 (3), 251–290. doi:10.1207/s15327973rlsi3703

Hepburn, A., and Potter, J. (2012). “Crying and Crying Responses,” in Emotion in Interaction. Editors A. Peräkylä, and M.-L. Sorjonen (Oxford: Oxford University Press), 195–210. doi:10.1093/acprof:oso/9780199730735.003.0009

Hoey, E. M. (2017). Lapse Organization in Interaction. Max Planck Institute for Psycholinguistics. Retrieved from:

Hoey, E. M. (2015). Lapses: How People Arrive at, and Deal with, Discontinuities in Talk. Res. Lang. Soc. Interact. 48 (4), 430–453. doi:10.1080/08351813.2015.1090116

Hoey, E. M. (2014). Sighing in Interaction: Somatic, Semiotic, and Social. Res. Lang. Soc. Interact. 47 (2), 175–200. doi:10.1080/08351813.2014.900229

Hoey, E. M. (2020a). Waiting to Inhale: On Sniffing in Conversation. Res. Lang. Soc. Interact. 53 (1), 118–139. doi:10.1080/08351813.2020.1712962

Hoey, E. M. (2020b). When Conversation Lapses. The Public Accountability of Silent Copresence. Oxford: Oxford University Press. doi:10.1093/oso/9780190947651.001.0001

Honda, K. (2004). Physiological factors causing tonal characteristics of speech: from global to local prosody. Speech Prosody. Available at: (Accessed March 23‐26, 2004).

Jefferson, G. (1974). Error Correction as an Interactional Resource. Lang. Soc. 3, 181–199. doi:10.1017/s0047404500004334

Jefferson, G. (2010). Sometimes a Frog in Your Throat Is Just a Frog in Your Throat: Gutturals as (Sometimes) Laughter-Implicative. J. Pragmatics 42 (6), 1476–1484. doi:10.1016/j.pragma.2010.01.012

Kaukomaa, T., Peräkylä, A., and Ruusuvuori, J. (2015). How Listeners Use Facial Expression to Shift the Emotional Stance of the Speaker's Utterance. Res. Lang. Soc. Interact. 48 (3), 319–341. doi:10.1080/08351813.2015.1058607

Keevallik, L., and Ogden, R. (2020). Sounds on the Margins of Language at the Heart of Interaction. Res. Lang. Soc. Interact. 53 (1), 1–18. doi:10.1080/08351813.2020.1712961

Kendrick, K. H., and Drew, P. (2016). Recruitment: Offers, Requests, and the Organization of Assistance in Interaction. Res. Lang. Soc. Interact. 49 (1), 1–19. doi:10.1080/08351813.2016.1126436

Kendrick, K. H., and Torreira, F. (2015). The Timing and Construction of Preference: A Quantitative Study. Discourse Process. 52, 255–289. doi:10.1080/0163853X.2014.955997

Kosmala, L. (2020). On the distribution of clicks and inbreaths in class presentations and spontaneous conversations: blending vocal and kinetic activities. In Laughter and Other Non-Verbal Vocalisations Workshop, 76–79.

Laanesoo, K., and Keevallik, L. (2017). Noticing Breaches with Nonpolar Interrogatives: Estonian Kes (“Who”) Ascribing Responsibility for Problematic Conduct. Res. Lang. Soc. Interact. 50 (2), 286–306. doi:10.1080/08351813.2017.1340721

Li, X. (2020). Click-Initiated Self-Repair in Changing the Sequential Trajectory of Actions-In-Progress. Res. Lang. Soc. Interact. 53 (1), 90–117. doi:10.1080/08351813.2020.1712959

Local, J. (2004). “Getting Back to Prior Talk,” in Sound Patterns in Interaction: Cross-Linguistic Studies from Conversation. Editors E. Couper-Kuhlen, and C. E. Ford (Amsterdam: John Benjamins), 377–400. doi:10.1075/tsl.62.18loc

Local, J., and Kelly, J. (1986). Projection and ‘Silences’: Notes on Phonetic and Conversational Structure. Hum. Stud. 9 (2–3), 185–204. doi:10.1007/BF00148126

Local, J., and Walker, G. (2012). How Phonetic Features Project More Talk. J. Int. Phonetic Assoc. 42 (03), 255–280. doi:10.1017/S0025100312000187

Maynard, D. W., and Freese, J. (2012). “Good News, Bad News, and Affect,” in Emotion in Interaction. Editors A. Peräkylä, and M.-L. Sorjonen (Oxford, New York: Oxford University Press), 92–111. doi:10.1093/acprof:oso/9780199730735.003.0005

Morinière, S., Boiron, M., Alison, D., Makris, P., and Beutter, P. (2008). Origin of the Sound Components during Pharyngeal Swallowing in normal Subjects. Dysphagia 23 (3), 267–273. doi:10.1007/s00455-007-9134-z

Ogden, R. (2020). Audibly Not Saying Something with Clicks. Res. Lang. Soc. Interact. 53 (1), 66–89. doi:10.1080/08351813.2020.1712960

Ogden, R., and Hawkins, S. (2015). Entrainment As a Basis for Co-Ordinated Actions in Speech. In Proceedings of ICPhS XVIII. Glasgow. Available at:

Ogden, R. (2013). Clicks and Percussives in English Conversation. J. Int. Phonetic Assoc. 43 (03), 299–320. doi:10.1017/S0025100313000224

Paschen, L. (2019). “On Clicks in Russian Everyday Communication,” in Urban Voices: The Sociolinguistics, Grammar and Pragmatics of Spoken Russian. Editors N. Thielemann, and N. Richter (Vienna: Peter Lang), 237–257.

Peräkylä, A., and Ruusuvuori, J. (2012). “Facial Expression and Interactional Regulation of Emotion,” in Emotion in Interaction. Editors A. Peräkylä, and M.-L. Sorjonen (Oxford: Oxford University Press), 64–91. doi:10.1093/acprof:oso/9780199730735.003.0004

Pinto, D., and Vigil, D. (2019). Searches and Clicks in Peninsular Spanish. Prag 29 (1), 83–106. doi:10.1075/

Pinto, D., and Vigil, D. (2020). Spanish Clicks in Discourse Marker Combinations. J. Pragmatics 159, 1–11. doi:10.1016/j.pragma.2020.01.009

Pomerantz, A. (1984). “Agreeing and Disagreeing with Assessments: Some Features of Preferred/dispreferred Turn Shapes,” in Structures of Social Action: Studies in Conversation Analysis. Editors J. M. Atkinson, and J. Heritage (Cambridge: Cambridge University Press), 57–101.

Pomerantz, A. (1986). Extreme Case Formulations: A Way of Legitimizing Claims. Hum. Stud. 9, 219–229. doi:10.1007/bf00148128

Raymond, G. (2010). “Prosodic Variation in Responses,” in Prosody in Interaction. Editors D. Barth-Weingarten, E. Reber, and M. Selting (Amsterdam: Benjamins), 109–130. doi:10.1075/sidag.23.12ray

Ritz, T., and Thöns, M. (2006). Affective Modulation of Swallowing Rates: Unpleasantness or Arousal?. J. Psychosomatic Res. 61, 829–833. doi:10.1016/j.jpsychores.2006.05.008

Roach, P., Stibbard, R., Osborne, J., Arnfield, S., and Setter, J. (1998). Transcription of Prosodic and Paralinguistic Features of Emotional Speech. J. Int. Phonetic Assoc. 28 (1–2), 83–94. doi:10.1017/S0025100300006277

Rossi, G. (2015). The Request System in Italian Interaction. Max Planck Institute for Psycholinguistics. Retrieved from:

Sacks, H. (1987). “On the Preferences for Agreement and Contiguity in Sequences in Conversation,” in Talk and Social Organisation. Editors G. Button, and J. R. E. Lee (Clevedon: Multilingual Matters), 54–69.

Schegloff, E. A. (1988). From Interview to Confrontation: Observations of the Bush/Rather Encounter. Res. Lang. Soc. Interact. 22 (1–4), 215–240. doi:10.1080/08351818809389304

Schegloff, E. A. (2007). Sequence Organization in Interaction. A Primer in Conversation Analysis. Cambridge: Cambridge University Press. doi:10.1017/cbo9780511791208

Schegloff, E. A. (1996). “Turn Organization: One Intersection of Grammar and Interaction,” in Interaction and Grammar. Editors E. Ochs, E. A. Schegloff, and S. A. Thompson (Cambridge: Cambridge University Press), 52–133. doi:10.1017/cbo9780511620874.002

Scherer, K. R. (1985). “Vocal affect signalling: a comparative approach,” in Advances in the Study of Behavior. Editors J. Rosenblatt, C. Beer, M.-C. Busnel, and P.J.B. Slater 15, 189–244.

Selting, M. (2012). Complaint Stories and Subsequent Complaint Stories with Affect Displays. J. Pragmatics 44 (4), 387–415. doi:10.1016/j.pragma.2012.01.005

Sikveland, R. O., and Ogden, R. (2012). Holding Gestures across Turns. Gest 12 (2), 166–199. doi:10.1075/gest.12.2.03sik

Trouvain, J., Werner, R., and Möbius, B. (2020). “An Acoustic Analysis of Inbreath Noises in Read and Spontaneous Speech,” in 10th International Conference on Speech Prosody 2020, 789–793. doi:10.21437/speechprosody.2020-161

Wilkinson, S., and Kitzinger, C. (2006). Surprise as an Interactional Achievement: Reaction Tokens in Conversation. Soc. Psychol. Q. 69 (2), 150–182. doi:10.1177/019027250606900203

Wright, M. (2007). “Clicks as Markers of New Sequences in English Conversation,” in International Congress of the Phonetic Sciences XVI, 6-10 August, Saarbrücken, 1069–1072. Retrieved from:

Wright, M. (2005). Studies of the Phonetics-Interaction Interface: Clicks and Interactional Structures in English Conversation. Doctoral dissertation. University of York.

Wright, M. (2011). The Phonetics-Interaction Interface in the Initiation of Closings in Everyday English Telephone Calls. J. Pragmatics 43 (4), 1080–1099. doi:10.1016/j.pragma.2010.09.004

Keywords: swallowing, conversation, non-verbal communication, emotion, phonetics, conversation analysis, talk in interaction

Citation: Ogden R (2021) Swallowing in Conversation. Front. Commun. 6:657190. doi: 10.3389/fcomm.2021.657190

Received: 22 January 2021; Accepted: 24 June 2021;
Published: 28 July 2021.

Edited by:

Simona Pekarek Doehler, Université de Neuchâtel, Switzerland

Reviewed by:

Elisabeth Reber, Heidelberg University, Germany
Sally Wiggins, Linköping University, Sweden

Copyright © 2021 Ogden. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Richard Ogden,