<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Hum. Neurosci.</journal-id>
<journal-title>Frontiers in Human Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Hum. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5161</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnhum.2024.1399316</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Human Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Quick speech motor correction in the absence of auditory feedback</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Bourhis</surname> <given-names>Morgane</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2676434/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Perrier</surname> <given-names>Pascal</given-names></name>
<uri xlink:href="https://loop.frontiersin.org/people/143355/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
<role content-type="https://credit.niso.org/contributor-roles/supervision/"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Savariaux</surname> <given-names>Christophe</given-names></name>
<uri xlink:href="https://loop.frontiersin.org/people/142888/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Ito</surname> <given-names>Takayuki</given-names></name>
<uri xlink:href="https://loop.frontiersin.org/people/112797/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
<role content-type="https://credit.niso.org/contributor-roles/funding-acquisition/"/>
<role content-type="https://credit.niso.org/contributor-roles/supervision/"/>
</contrib>
</contrib-group>
<aff><institution>Univ. Grenoble Alpes, CNRS, Grenoble-INP, GIPSA-Lab</institution>, <addr-line>Grenoble</addr-line>, <country>France</country></aff>
<author-notes>
<fn fn-type="edited-by" id="fn0001">
<p>Edited by: Douglas M. Shiller, Montreal University, Canada</p>
</fn>
<fn fn-type="edited-by" id="fn0002">
<p>Reviewed by: Kwang S. Kim, Purdue University, United States</p>
<p>Donald Derrick, University of Canterbury, New Zealand</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Morgane Bourhis, <email>morgane.bourhis@grenoble-inp.fr</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>06</day>
<month>06</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>18</volume>
<elocation-id>1399316</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>03</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>05</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2024 Bourhis, Perrier, Savariaux and Ito.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Bourhis, Perrier, Savariaux and Ito</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>A quick correction mechanism of the tongue was previously observed experimentally during speech posture stabilization in response to a sudden tongue stretch perturbation. Given its relatively short latency (&#x003C; 150&#x2009;ms), the response could be driven by somatosensory feedback alone. The current study assessed this hypothesis by examining whether this response is induced in the absence of auditory feedback. We compared the response under two auditory conditions: with normal versus masked auditory feedback. Eleven participants were tested. They were asked to whisper the vowel /e/ for a few seconds. The tongue was stretched horizontally with step patterns of force (1&#x2009;N for 1&#x2009;s) using a robotic device. The articulatory positions were recorded using electromagnetic articulography simultaneously with the produced sound. The tongue perturbation was randomly and unpredictably applied in one-fifth of the trials. The two auditory conditions were tested in random order. A quick compensatory response was induced, similar to that in the previous study. We found that the amplitudes of the compensatory responses were not significantly different between the two auditory conditions, either for the tongue displacement or for the produced sounds. These results suggest that the observed quick correction mechanism is primarily based on somatosensory feedback. This correction mechanism could be learned in such a way as to maintain the auditory goal on the sole basis of somatosensory feedback.</p>
</abstract>
<kwd-group>
<kwd>speech motor control</kwd>
<kwd>tongue afferents</kwd>
<kwd>speech perturbation</kwd>
<kwd>somatosensory feedback</kwd>
<kwd>compensatory response</kwd>
<kwd>noise masking</kwd>
</kwd-group>
<contract-num rid="cn1">ANR-21-CE28-0022</contract-num>
<contract-num rid="cn2">R01-DC017439</contract-num>
<contract-sponsor id="cn1">Agence Nationale de la Recherche<named-content content-type="fundref-id">10.13039/501100001665</named-content></contract-sponsor>
<contract-sponsor id="cn2">National Institute on Deafness and Other Communication Disorders<named-content content-type="fundref-id">10.13039/100000055</named-content></contract-sponsor>
<counts>
<fig-count count="5"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="27"/>
<page-count count="9"/>
<word-count count="7202"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Speech and Language</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec1">
<label>1</label>
<title>Introduction</title>
<p>Speech production can be assumed to be auditory in nature since the goal is to produce phonemic-relevant acoustic signals. This view is strongly supported by the great difficulty hearing-impaired individuals have in learning to speak without hearing aids (<xref ref-type="bibr" rid="ref8">Gold, 1980</xref>; <xref ref-type="bibr" rid="ref15">Kral et al., 2019</xref>). The importance of auditory inputs in speech motor control has also been demonstrated in the experimental paradigm of speech motor learning with altered auditory feedback. When speakers receive auditory feedback with an alteration of the phonemic-relevant acoustic characteristics, they adapt their speech according to this auditory alteration (<xref ref-type="bibr" rid="ref10">Houde and Jordan, 1998</xref>; <xref ref-type="bibr" rid="ref13">Jones and Munhall, 2005</xref>; <xref ref-type="bibr" rid="ref21">Purcell and Munhall, 2006</xref>; <xref ref-type="bibr" rid="ref22">Rochet-Capellan and Ostry, 2011</xref>). Somatosensory inputs, which contain kinesthetic information, are known to be important in human motor control. During speech production, speakers adapt to mechanical disturbances that affect somatosensory feedback even in situations in which the disturbance does not induce any auditory error in the speech sounds produced (<xref ref-type="bibr" rid="ref25">Tremblay et al., 2003</xref>). It is thus important to know how somatosensory and auditory inputs interact in the speech production process.</p>
<p>In our previous study (<xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>), the tongue showed a quick compensatory response when the tongue posture was suddenly disturbed by an external force during steady vowel production. The tongue was first pulled forward up to a maximum deviation, then moved back to compensate. This response movement had two phases. We considered the first phase to be the consequence of the passive elasticity of tongue tissue. In the second phase, the velocity of the backward movement increased. The latency of the onset of this second phase was about 130&#x2009;ms after the onset of the perturbation. We considered the second phase of the response to tongue stretch to be the outcome of the influence of neural sensory feedback. In this context, the crucial question was to clarify which sensory feedback &#x2013; somatosensory and/or auditory &#x2013; was involved in this phase of the response. An EMG study of the tongue muscles involved in the control of the front part of the tongue, mainly the Anterior Genioglossus, was carried out on another set of participants for the same tongue stretch under similar experimental conditions (<xref ref-type="bibr" rid="ref11">Ito et al., 2024</xref>). A significant increase in EMG magnitude was observed in response to the tongue stretch, with a latency of around 50&#x2009;ms. Computer simulations with a simplified linear mass-spring-damper model, which included a delay between the EMG signal and the force produced, showed that the latency of the EMG response (50&#x2009;ms) is compatible with the latency of the onset of the second phase of the kinematic response. All these elements support our hypothesis that the observed kinematic response starting 130&#x2009;ms after the onset of the stretch could primarily be due to somatosensory feedback. However, given that this latency of 130&#x2009;ms has also been observed in responses to perturbations of the auditory feedback (<xref ref-type="bibr" rid="ref26">Xu et al., 2004</xref>; <xref ref-type="bibr" rid="ref3">Cai et al., 2011</xref>), we cannot fully rule out a potential influence of the auditory feedback in the observed second phase of the response to tongue stretch.</p>
<p>To address this question, we examined whether the compensatory response of the tongue could be induced in the absence of auditory feedback. We carried out the test using the same tongue perturbation as in our previous study (<xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>). To mask auditory feedback, a pink noise was presented during the speech production task. To maximize the effect of auditory masking, participants were asked to whisper the vowel and not to voice it. We compared the compensatory responses both in the articulatory and the acoustic domains between two auditory conditions, i.e., normal and masked auditory conditions. In line with our hypothesis of the predominant role of somatosensory feedback in the generation of the response to the tongue stretch perturbation, we expected that the compensatory response would be induced similarly and consistently regardless of auditory conditions. Two phases are thus expected in the response, and the latter phase, which is hypothesized to be driven by somatosensory feedback alone, is the focus of the data analysis.</p>
</sec>
<sec sec-type="materials|methods" id="sec2">
<label>2</label>
<title>Materials and methods</title>
<sec id="sec3">
<label>2.1</label>
<title>Participants</title>
<p>Twelve na&#x00EF;ve young adults (5 females, 20&#x2013;40 y.o.) participated in the experiment. They were all native French speakers and had no known speech or hearing impairments. They also had no history of profound injury that could impair somatosensation in the orofacial region. Participants signed the consent form approved by the local ethics committee, CERGA (Comit&#x00E9; d&#x2019;&#x00E9;thique pour la recherche, Grenoble-Alpes) (CERGA-AvisConsultatif-2021-18). One participant was excluded for not following the instructions. Eleven participants were included in the current analysis.</p>
</sec>
<sec id="sec4">
<label>2.2</label>
<title>Movement and sound data acquisition</title>
<p>Electromagnetic Articulography (EMA, Wave, Northern Digital Inc.) was used to record articulatory movements in synchrony with the recording of the produced sounds. For the production of a front vowel such as /e/, the articulatory movement is primarily characterized by the tongue position in the mid-sagittal plane. Hence, EMA sensors were attached to the tongue in the mid-sagittal plane (<xref ref-type="fig" rid="fig1">Figure 1A</xref>): tongue tip (TT), tongue blade (TB), and tongue dorsum (TD). We planned to set the distance between two neighboring sensors on the tongue at 1&#x2009;cm. However, this target distance was slightly adjusted for each participant depending on the size of the tongue. The resulting distances were 1.1&#x2009;&#x00B1;&#x2009;0.68&#x2009;cm between TT and TB, and 1.1&#x2009;&#x00B1;&#x2009;0.85&#x2009;cm between TB and TD. Additional sensors were attached to the upper and lower lips (UL and LL), and to the jaw (J), to record potential movements of articulators other than the tongue, which might affect vowel acoustics after the application of the tongue stretch. For head-movement correction in the off-line analysis, four reference sensors were attached to the upper incisor in the mid-sagittal plane, to the nasion, and to the left and right mastoids. The participant&#x2019;s head was held in place with a head holder. After each recording, the palate contour in the mid-sagittal plane was recorded by tracing the surface of the palate using an EMA sensor attached to the experimenter&#x2019;s finger. The sensor data were recorded at a 200&#x2009;Hz sampling rate, and the speech sound produced was recorded synchronously at a 22.05&#x2009;kHz sampling rate.</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p><bold>(A)</bold> Lateral view of the experimental setup (adapted from <xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>). In the mid-sagittal view of the head (center of the panel), gray dots represent the positions of the EMA sensors: TT, tongue tip; TB, tongue blade; TD, tongue dorsum; J, jaw; UL, upper lip, and LL, lower lip and ref., reference markers on the nasion, left and right mastoids, and upper incisor. <bold>(B)</bold> Time sequence of each repetition of the speech task with tongue stretch perturbation (PTB) and auditory masking.</p>
</caption>
<graphic xlink:href="fnhum-18-1399316-g001.tif"/>
</fig>
</sec>
<sec id="sec5">
<label>2.3</label>
<title>Speech task and auditory masking</title>
<p>The speech task consisted of the sustained production of the whispered vowel /e/ for about 3.5&#x2009;s. Vowel /e/ was selected on the basis of the results of our previous study (<xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>), in which systematic and large compensatory responses to tongue stretch were observed for this vowel. Since our previous study also showed no reliable difference between voiced and whispered conditions, we selected a whispered production in order to increase the likelihood that the masking noise actually masks the auditory feedback. For this masking, we used a pink noise, which was presented through earphones (Natus Tip 300) at 80&#x2009;dB SPL. In every trial, the speech task started with the mouth closed and ended by closing the mouth.</p>
</sec>
<sec id="sec6">
<label>2.4</label>
<title>Tongue stretch perturbation</title>
<p>For the tongue stretch perturbation, we used the same experimental setup as in <xref ref-type="bibr" rid="ref12">Ito et al. (2020)</xref> (<xref ref-type="fig" rid="fig1">Figure 1A</xref>). A small robotic device (Phantom Premium 1.0, Geomagic) was placed in front of the participant and connected to the tongue surface with a thin thread. The thread had two small anchors, which were attached to the tongue surface on both lateral sides of the tongue blade sensor (TB). The distance between each anchor and the TB sensor was set to 1&#x2009;cm. This target distance was adjusted for each participant depending on the size of the tongue. The tongue stretch was applied by pulling the tongue forward with a 1&#x2009;N force for 1&#x2009;s. The force was applied as a step function, with rise and fall phases of 5&#x2009;ms to avoid mechanical noise in the robot.</p>
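The force command described above is thus a trapezoid: a 5 ms linear rise, a 1 s plateau at 1 N, and a 5 ms linear fall. The following sketch is illustrative only (not the authors' control code); the 1 kHz command rate, the zero-force padding, and the function name are assumptions:

```python
import numpy as np

def force_command(fs=1000, peak_n=1.0, plateau_s=1.0, ramp_s=0.005,
                  pre_s=0.1, post_s=0.1):
    """Trapezoidal force profile: 5 ms linear rise, 1 s plateau at 1 N,
    5 ms linear fall, padded with zero-force segments before and after."""
    n = int(round((pre_s + ramp_s + plateau_s + ramp_s + post_s) * fs))
    t = np.arange(n) / fs
    f = np.zeros(n)
    t1, t2, t3 = pre_s, pre_s + ramp_s, pre_s + ramp_s + plateau_s
    rise = (t >= t1) & (t < t2)
    f[rise] = peak_n * (t[rise] - t1) / ramp_s            # linear rise
    f[(t >= t2) & (t < t3)] = peak_n                      # 1 N plateau
    fall = (t >= t3) & (t < t3 + ramp_s)
    f[fall] = peak_n * (1.0 - (t[fall] - t3) / ramp_s)    # linear fall
    return t, f
```

The brief ramps correspond to the article's rationale of avoiding the mechanical noise an instantaneous force step would produce in the robot.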
</sec>
<sec id="sec7">
<label>2.5</label>
<title>Experimental procedure</title>
<p>The time sequence of each trial is represented in <xref ref-type="fig" rid="fig1">Figure 1B</xref>. Each trial was triggered manually by the experimenter after visually checking that the participants were ready with their mouths closed and with their throats cleared if necessary. The participants carried out the speech task in response to a visual cue (green circle) presented on a monitor. Under the auditory masking condition, the noise was presented from the end of the previous trial, to accustom the participants to the noise before the speech task, and lasted until the end of the current trial. The onset of the tongue stretch perturbation occurred between 1&#x2009;s and 1.5&#x2009;s after the presentation of the visual cue that launched the speech task.</p>
<p>The recording was divided into three sessions. Each session included 5 voiced trials and 50 whispered trials. The participants were asked to speak the vowel /e/ aloud in the first 5 trials and to whisper it in the following 50 trials. The 5 voiced trials were used to make sure that the participants actually produced vowel /e/ (and not vowel /&#x025B;/ or /&#x0153;/). Hence, we did not apply any perturbation during these voiced productions. In the 50 whispered trials, both auditory conditions were tested (25 trials each) in randomized order. The tongue stretch perturbation was applied in one-fifth of the trials, selected pseudo-randomly, with the constraints that it never occurred in two consecutive trials and that it was applied in the same number of trials in both auditory conditions (5 trials each per session). In total, 165 trials were carried out (150 repetitions of the whispered speech task and 15 trials with voicing). Fifteen perturbed trials were recorded in each auditory condition. Despite these precautions, one participant did not correctly sustain the vowel /e/ during the main task and was excluded from the analysis.</p>
<p>Before the main recording, we also carried out a practice session to ensure that the pronunciation of vowel /e/ was correct and not influenced by a regional accent. The participants practiced whispering the vowel with and without the auditory masking. We also asked the participants to confirm that the level of the masking noise actually masked their whispered sounds properly.</p>
</sec>
<sec id="sec8">
<label>2.6</label>
<title>Data preprocessing</title>
<p>We only analyzed the perturbed trials in the articulatory and acoustic domains. For each trial, time zero was aligned with the onset of the perturbation, and the analysis was applied to the time interval from 1&#x2009;s before the perturbation onset to 1.5&#x2009;s after the perturbation onset.</p>
<p>The articulatory movement data were first preprocessed by correcting the movement of the head with the reference sensors. Considering the inter-individual variability in articulatory positioning for vowel /e/ production, we evaluated the relative changes from baseline. Baseline was defined as the average position computed over the 50&#x2009;ms preceding the perturbation onset, for each sensor and each participant. Since the recorded tongue movements include the influence of jaw movement, we subtracted the position of the jaw sensor (J) from the recorded positions of the tongue sensors (TT, TB and TD). Finally, an average movement signal was computed for all the perturbed trials and all the participants for each sensor and for each auditory condition.</p>
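As an illustration of this preprocessing step, the jaw subtraction and baseline referencing can be sketched as below. This is a minimal sketch, not the authors' code; the array layout, the function name, and treating head-movement correction as already applied are assumptions:

```python
import numpy as np

def relative_displacement(tongue_xy, jaw_xy, pert_idx, fs=200):
    """Jaw-corrected tongue-sensor displacement relative to baseline.

    tongue_xy, jaw_xy: (n_samples, 2) arrays of head-corrected x/y positions.
    pert_idx: sample index of the perturbation onset (time zero).
    Baseline is the mean over the 50 ms preceding the onset."""
    # Subtract the jaw sensor position to remove the jaw's contribution.
    rel = np.asarray(tongue_xy, float) - np.asarray(jaw_xy, float)
    n_base = int(round(0.050 * fs))              # 10 samples at 200 Hz
    baseline = rel[pert_idx - n_base:pert_idx].mean(axis=0)
    return rel - baseline
```

Applied per sensor (TT, TB, TD), per trial, and per participant, averaging the outputs then yields the grand-averaged movement signals described in the text.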
<p>For sound data, the first three formants (F1, F2 and F3) were extracted over the same time interval as for the articulatory data using LPC analysis (<xref ref-type="bibr" rid="ref9001">Rabiner and Schafer, 1978</xref>). In this extraction, the acoustic signals were first downsampled to 10&#x2009;kHz in order to focus on the frequency range ([0, 5&#x2009;kHz]) of the first four formants in adult speakers. Sliding Hanning windows of 25&#x2009;ms with a shift of 2&#x2009;ms were used, and an LPC analysis of order 12 was carried out for each window. Four possible formant frequencies were thus extracted at a sample rate of 500&#x2009;Hz, and the time variations of the first three formants were then computed on the basis of these four frequencies, using basic smoothness and continuity principles. As with the movement data, time zero was aligned with the perturbation onset; the variation of each formant over time was computed for each participant relative to baseline, which was computed as the average value over the 50&#x2009;ms preceding the perturbation. Finally, an average variation over time was computed for all the perturbed trials and all the participants for each formant and for each auditory condition.</p>
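The per-frame formant estimation can be sketched with an autocorrelation-method LPC as below. This is illustrative only (the function name and numerical details are assumptions, not the authors' analysis code), and the continuity-based tracking of F1&#8211;F3 across frames is omitted:

```python
import numpy as np

def lpc_formants(frame, order=12, fs=10000, n_formants=3):
    """Estimate formant (resonance) frequencies of one frame via LPC.

    Windows the frame (25 ms Hanning in the article), fits an order-12
    all-pole model by the autocorrelation method, and converts the complex
    pole angles to frequencies in Hz, lowest first."""
    x = np.asarray(frame, float) * np.hanning(len(frame))
    # Autocorrelation at lags 0..order, then the normal equations R a = r.
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)              # tiny ridge for stability
    a = np.linalg.solve(R, r[1:order + 1])        # prediction coefficients
    # Roots of A(z) = 1 - sum_k a_k z^-k give the pole locations.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]             # one of each conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    freqs = freqs[(freqs > 90) & (freqs < fs / 2 - 90)]  # drop edge poles
    return freqs[:n_formants]
```

An LPC order of 12 at a 10 kHz sampling rate leaves up to six resonance candidates per frame, from which the article's smoothness and continuity principles select the formant tracks.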
</sec>
<sec id="sec9">
<label>2.7</label>
<title>Data analysis</title>
<p>In line with our previous findings (<xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>), we expected the response induced by the tongue stretch perturbation to consist of two phases according to the response latency. We interpreted the early phase as the result of the passive mechanical characteristics of tongue tissues, and the later phase as the response induced by neural sensory feedback. In the current study, we focused only on the later phase of response and examined whether the magnitude and timing of the response differ across the two auditory conditions.</p>
<p>Using the displacement, velocity and acceleration along the horizontal direction, we determined relevant time points of interest: the onset of displacement in response to tongue stretch; the time of the maximum displacement in response to tongue stretch; and the onset and offset of the compensatory response induced by neural sensory feedback (see Results for details). The horizontal displacements did not significantly differ between the two auditory conditions, as shown in <xref ref-type="fig" rid="fig2">Figure 2B</xref>. The data were therefore averaged across the two auditory conditions, and this grand average was used to determine these time points. All the detected time points are represented by vertical dashed lines in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p>
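This landmark detection on the averaged horizontal displacement can be sketched as below (illustrative only; the function name and the numerical-differentiation details are assumptions, with forward displacement taken as positive):

```python
import numpy as np

def response_landmarks(disp, fs=200):
    """Locate the time points of interest on the averaged horizontal
    displacement (time zero at perturbation onset, forward = positive):
    maximum displacement (end of 'Ini'), peak backward acceleration
    (onset of 'Comp'), and the following velocity zero-crossing
    (offset of 'Comp'). Times are returned in seconds."""
    disp = np.asarray(disp, float)
    vel = np.gradient(disp) * fs
    acc = np.gradient(vel) * fs
    i_max = int(np.argmax(disp))            # maximum stretch-induced displacement
    # Most negative acceleration after the maximum: backward-velocity increase.
    i_on = i_max + int(np.argmin(acc[i_max:]))
    # First sample after onset where the backward velocity returns to zero.
    crossed = np.where(vel[i_on:] >= 0.0)[0]
    i_off = i_on + int(crossed[0]) if crossed.size else len(disp) - 1
    return i_max / fs, i_on / fs, i_off / fs
```

On real trials the derivatives would typically be smoothed before landmark picking; that step is left out here for brevity.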
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p><bold>(A)</bold> Temporal pattern of the grand-averaged responses to tongue stretch perturbation. The top panel represents the horizontal displacement of TB sensor and the bottom three panels represent the first, second and third formants (F1, F2, and F3). The shaded area represents the standard error across participants. Time 0 corresponds to the onset of the perturbation. <italic>Ini</italic> represents the initial response directly induced by the perturbation and <italic>Comp</italic> represents the period of the compensatory response of interest induced by neural sensory feedback. The four vertical dotted lines represent the onsets and offsets in <italic>Ini</italic> and <italic>Comp</italic> periods. <bold>(B)</bold> Magnified views of the grand-averaged displacements of TB sensor (<bold>A</bold>, top panel), and their first (velocity) and second (acceleration) derivatives. The solid lines correspond to the data averaged within each of the two auditory conditions. The dashed lines represent the data averaged across two auditory conditions. The vertical dashed lines and the periods <italic>Ini</italic> and <italic>Comp</italic> are the same as in <bold>A</bold>.</p>
</caption>
<graphic xlink:href="fnhum-18-1399316-g002.tif"/>
</fig>
<p>We measured the amplitudes in the sagittal plane of the initial displacement (from the onset of displacement to the time of maximum displacement, a period called &#x201C;<italic>Ini</italic>&#x201D; henceforth, <xref ref-type="fig" rid="fig2">Figure 2</xref>) and of the displacement during the compensatory response (a period called &#x201C;<italic>Comp</italic>&#x201D; henceforth, <xref ref-type="fig" rid="fig2">Figure 2</xref>). For the sound data, we compared the formant frequencies F1, F2 and F3 at two time points: the time of maximal displacement in response to tongue stretch, and the offset of the compensatory response. For each measure, in the articulatory and acoustic domains, a one-way repeated measures ANOVA was applied to compare the two auditory conditions.</p>
<p>In line with our previous study (<xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>), we expected the compensatory movement measured by the TB sensor in the mid-sagittal plane to take the shortest path to the original tongue contour, rather than returning to the exact original position before tongue stretch perturbation. We verified this behavior graphically by estimating the original tongue contour in the sagittal plane as the concatenation of two segments going from TT sensor to TB sensor and from TB sensor to TD sensor. To do this, we averaged the raw sensors&#x2019; positions over the 50&#x2009;ms preceding the stretch onset.</p>
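The distance from a perturbed TB position to the estimated original tongue contour (the TT&#8211;TB and TB&#8211;TD segments) can be computed as a minimum point-to-segment distance. A minimal sketch of this geometric check, with an assumed function name (the article's verification was graphical, not numerical):

```python
import numpy as np

def distance_to_contour(point, contour_pts):
    """Shortest distance from a 2D point to the piecewise-linear contour
    through contour_pts (here: baseline TT -> TB -> TD sensor positions)."""
    p = np.asarray(point, float)
    best = np.inf
    for a, b in zip(contour_pts[:-1], contour_pts[1:]):
        a = np.asarray(a, float)
        b = np.asarray(b, float)
        ab = b - a
        # Projection parameter clamped to the segment [a, b].
        s = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        best = min(best, float(np.linalg.norm(p - (a + s * ab))))
    return best
```

A return along the shortest path to the contour, rather than to the pre-perturbation TB position, corresponds to this distance shrinking faster than the distance to the original point.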
<p>We also assessed whether auditory masking affected the baseline articulatory posture for the production of the vowel /e/, bearing in mind that auditory masking might modify articulatory movement and posture &#x2013; a phenomenon known as the Lombard effect (<xref ref-type="bibr" rid="ref18">Luo et al., 2018</xref>). The typical manifestation of the Lombard effect is an increase in the energy of the produced sound, particularly marked for vowels, mostly without the speaker&#x2019;s awareness. We assessed possible consequences of this effect in terms of articulatory posture. We compared the baseline articulatory posture of the tongue, including TT, TB and TD, between the two auditory conditions using repeated measures ANOVA.</p>
</sec>
</sec>
<sec sec-type="results" id="sec10">
<label>3</label>
<title>Results</title>
<sec id="sec11">
<label>3.1</label>
<title>Assessing possible influence of Lombard effect on articulatory postures</title>
<p>We first verified whether the auditory masking affected the baseline articulatory posture for the production of the task utterance in our experiment. We compared the positions of the three tongue sensors (TT, TB and TD) in both auditory conditions. <xref ref-type="fig" rid="fig3">Figure 3</xref> shows a representative example from one participant: the baseline articulatory positions in the two auditory conditions. Repeated-measures ANOVA showed no reliable difference between the two auditory conditions (<italic>F</italic>(1,50)&#x2009;=&#x2009;0.025, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.8), and no significant interaction effect between sensors and auditory conditions (<italic>F</italic>(2,50)&#x2009;=&#x2009;0.016, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.9). This indicates that auditory masking did not affect the basic achievement of the articulatory position for vowel /e/.</p>
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Baseline articulatory posture during the production of vowel /e/ for a representative participant in the mid-sagittal plane. The posture is obtained by taking the average across the 50&#x2009;ms before the perturbation onset and across all the perturbed trials. The black line represents the recorded palate contour. TT, TB, and TD correspond to the tongue tip, blade and dorsum sensors, UL and LL, to the upper and lower lips sensors, and J, to the jaw sensor (see also <xref ref-type="fig" rid="fig1">Figure 1</xref>).</p>
</caption>
<graphic xlink:href="fnhum-18-1399316-g003.tif"/>
</fig>
</sec>
<sec id="sec12">
<label>3.2</label>
<title>Compensatory responses to tongue stretch perturbation</title>
<p><xref ref-type="fig" rid="fig2">Figure 2A</xref> shows the horizontal displacement of the TB sensor (top panel) and the corresponding F1, F2 and F3 changes (bottom three panels) observed in response to tongue stretch over time. As in our previous study (<xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>), the compensatory response had consequences in the articulatory and acoustic domains alike.</p>
<p>The tongue stretch perturbation first induced a fast forward displacement of the tongue, as characterized by the TB sensor, up to a maximum (first period, called &#x201C;<italic>Ini</italic>,&#x201D; in <xref ref-type="fig" rid="fig2">Figure 2</xref>). In a second period of the response, the amplitude of displacement decreased as the result of the combination of passive and compensatory effects. As in our previous study (<xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>), we identified three time points of interest to characterize these two periods in the response. The time of <italic>maximum displacement</italic> characterizes the end of the first period, which is purely due to passive effects. For the second period we relied on the velocity and acceleration profiles, and identified a compensatory response that we consider to be induced by neural sensory feedback. <xref ref-type="fig" rid="fig2">Figure 2B</xref> shows a magnified view of the displacement, velocity and acceleration of the TB sensor. We observed an increase in the velocity, which we characterized in time by <italic>the peak of acceleration</italic>, which corresponds to an inflection point in the displacement. This second time point marks the onset of the compensatory response due to neural sensory feedback. The offset of this compensatory response is characterized by the subsequent <italic>velocity zero-crossing</italic>, which is the third time point of interest. These three points of interest are indicated in both panels of <xref ref-type="fig" rid="fig2">Figure 2</xref> by vertical dashed lines. The compensatory response due to neural sensory feedback (&#x201C;<italic>Comp</italic>&#x201D;) is the focus of our analysis.</p>
<p>When the articulatory responses in the two auditory conditions were compared, we observed that the averaged values were mostly similar across participants for all variables. This can be seen in the top panel of <xref ref-type="fig" rid="fig2">Figure 2A</xref>, where the shaded areas represent standard errors across participants for both auditory conditions. The clear overlap of the shaded areas between auditory conditions suggests no significant difference, specifically in the <italic>Ini</italic> and <italic>Comp</italic> periods. This is quantitatively assessed below with repeated measures ANOVA.</p>
<p>The averaged displacement of the TB sensor in the mid-sagittal plane for each auditory condition, from the onset of the perturbation to the offset of the compensatory response induced by neural sensory feedback, is presented in <xref ref-type="fig" rid="fig4">Figure 4A</xref>. As observed in our previous study (<xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>), the tongue was first displaced horizontally in the forward direction due to the horizontal force applied to the tongue, and then the compensatory response occurred. Importantly, as in our previous study, the compensatory movement did not follow the path back to the position before the tongue stretch perturbation. Instead, it moved downward so as to take the shortest path to a posture that preserved the original tongue contour (as estimated by the dotted lines in <xref ref-type="fig" rid="fig4">Figure 4A</xref>) in the alveo-palatal region, in which the constriction of the vocal tract occurs during the production of vowel /e/. In line with our observations in the horizontal direction, the trajectories in the mid-sagittal plane are similar for both auditory conditions. <xref ref-type="fig" rid="fig4">Figure 4B</xref> shows the amplitude of the movement response in the mid-sagittal plane for both periods <italic>Ini</italic> and <italic>Comp</italic>. The repeated measures ANOVA did not reveal any significant difference between the two auditory conditions in the amplitudes of either period of the response (<italic>F</italic>(1,10)&#x2009;=&#x2009;0.518, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.4 for <italic>Ini</italic> and <italic>F</italic>(1,10)&#x2009;=&#x2009;0.191, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.6 for <italic>Comp</italic>). The results indicate that the auditory condition did not significantly affect the amplitudes of the initial changes or of the compensatory response.</p>
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p><bold>(A)</bold> Grand-averaged displacements of TB sensor in the mid-sagittal plane from the onset of the perturbation (Time 0) to the offset of the compensatory response induced by neural sensory feedback, for each auditory condition. The dashed lines represent the estimated original tongue contours in each auditory condition. <bold>(B)</bold> Amplitude of the articulatory displacements of TB sensor during the <italic>Ini</italic> and <italic>Comp</italic> periods in each auditory condition. Error bars represent the standard errors across the participants.</p>
</caption>
<graphic xlink:href="fnhum-18-1399316-g004.tif"/>
</fig>
<p>The articulatory changes observed in response to the tongue stretch perturbation resulted in acoustic changes, as revealed by the variations in the F1, F2 and F3 values. The rapid deflection of the tongue observed in the first period of the response induced a rapid decrease of F1 and F2, and a rapid increase of F3. The main effect associated with the decrease of the tongue deflection in the second period of the response was observed on F3, which clearly decreased and tended to return to its value before the perturbation onset. Over the same period, F1 essentially stabilized, with a slight non-significant trend to return to its original value, while F2 continued to decrease, but at a lower rate than in the first part of the response.</p>
<p>The averaged amplitudes of the normalized formant variations during the <italic>Ini</italic> and <italic>Comp</italic> periods, together with the standard errors, are presented in <xref ref-type="fig" rid="fig5">Figure 5</xref>. At first glance, the figure confirms that significant formant changes were induced in the first period of the response to the perturbation (<italic>Ini</italic>), that only F3 shows significant compensation effects in both auditory conditions during the compensatory response due to neural sensory feedback (<italic>Comp</italic>), and that F1 does not show a clear trend toward compensation for the effect of the perturbation. These observations are quantitatively confirmed by the statistical analysis: a two-way repeated measures ANOVA showed that only F3 underwent a significant change between the onset and offset of the compensatory response <italic>Comp</italic> [<italic>F</italic>(1,30)&#x2009;=&#x2009;11.021, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.003]. There was no interaction effect with the auditory condition for any of the formants [F1: <italic>F</italic>(1,30)&#x2009;=&#x2009;0.358, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.5; F2: <italic>F</italic>(1,30)&#x2009;=&#x2009;0.088, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.7; and F3: <italic>F</italic>(1,30)&#x2009;=&#x2009;0.001, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.9]. These changes were consistent in both auditory conditions. A one-way repeated measures ANOVA on the amplitude of formant variations during the <italic>Ini</italic> period showed no significant effect of auditory condition for any of the three formants [F1: <italic>F</italic>(1,10)&#x2009;=&#x2009;0.276, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.6; F2: <italic>F</italic>(1,10)&#x2009;=&#x2009;0.021, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.8; F3: <italic>F</italic>(1,10)&#x2009;=&#x2009;0.006, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.9]. Similarly, the two-way repeated measures ANOVA on the onset and offset times of the <italic>Comp</italic> period showed no significant difference between the two auditory conditions for any of the three formants [F1: <italic>F</italic>(1,30)&#x2009;=&#x2009;0.196, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.6; F2: <italic>F</italic>(1,30)&#x2009;=&#x2009;0.312, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.5; and F3: <italic>F</italic>(1,30)&#x2009;=&#x2009;1.395, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.2]. The absence of a difference between auditory conditions indicates that auditory masking did not affect the compensation in response to the tongue perturbation.</p>
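<p>A methodological aside: with only two auditory conditions, a one-way repeated measures ANOVA on a condition factor is equivalent to a paired t-test across participants, with <italic>F</italic>(1, n&#x2212;1)&#x2009;=&#x2009;<italic>t</italic><sup>2</sup>. The sketch below illustrates this equivalence on synthetic data; the participant count matches the design above, but the amplitudes and effect sizes are illustrative, not the study's measurements.</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 11  # participants, as in the design above

# Hypothetical per-participant response amplitudes (arbitrary units),
# with no true condition effect, mimicking a null result.
normal = rng.normal(2.0, 0.5, n)
masked = normal + rng.normal(0.0, 0.3, n)

# Paired t-test across participants.
t, p = stats.ttest_rel(normal, masked)

# For a two-level within-participant factor, the repeated measures
# ANOVA statistic F(1, n-1) equals t squared.
F = t**2
print(F, p)
```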
<fig position="float" id="fig5">
<label>Figure 5</label>
<caption>
<p>Amplitude of frequency change in F1, F2 and F3 during the <italic>Ini</italic> (left) and <italic>Comp</italic> (right) periods in each auditory condition. The error bars represent standard errors across the participants.</p>
</caption>
<graphic xlink:href="fnhum-18-1399316-g005.tif"/>
</fig>
<p>Note that a large increase of F1 was observed only after the <italic>Comp</italic> period in the normal auditory condition. Although this may possibly be due to compensation based on auditory feedback, we did not pursue it further, since it was beyond the scope of the current hypothesis.</p>
</sec>
</sec>
<sec sec-type="discussion" id="sec13">
<label>4</label>
<title>Discussion</title>
<p>Our main finding in the current study is that the response to the tongue stretch perturbation did not significantly differ between the normal and masked auditory conditions in either the articulatory or the acoustic domain. The results indicate that this response does not depend on auditory feedback and instead relies mainly on a somatosensory basis. The compensatory mechanism driven by somatosensory inputs may be acquired during the development of the speech production system, in order to integrate the cognitive requirements of speech communication. Auditory feedback is certainly important for the acquisition and development of speech production. However, the acoustic characteristics of speech sounds that are phonetically relevant for speech communication could be maintained in healthy adults by somatosensory inputs alone, without auditory inputs, thanks to learned associations between the somatosensory and auditory characteristics of speech sounds.</p>
<p>In the current experiment, it is noteworthy that the participants faithfully replicated the quick compensatory response to the tongue stretch perturbation that we observed during vowel production in our previous study (<xref ref-type="bibr" rid="ref12">Ito et al., 2020</xref>). More specifically, we observed the two distinct phases of the compensatory response, which we consider to be influenced mostly by the passive mechanical characteristics of the tongue and by neural sensory feedback, respectively. We also confirmed another important finding of our previous study: when the tongue perturbation was applied in the forward direction during vowel production, the response was induced to maintain the original tongue contour in the alveo-palatal region where the constriction of the vocal tract occurs, rather than to return to the exact original position of the tongue.</p>
<p>The tongue stretch perturbation also changed the produced vowel sounds, as evidenced by the F1, F2 and F3 variations. F3 recovered as a result of the articulatory compensatory response, but F1 and F2 did not. This observation differs slightly from the results of our previous study, in which the compensatory effect was significant on both F1 and F3 (the F2 variation was similar). For the French vowel /e/, formant F1 (the Helmholtz resonance of the &#x201C;back cavity + constriction&#x201D; resonator of the vocal tract) is mainly influenced by changes in the area of the constriction. F2 and F3 are mostly influenced by the position of the constriction along the antero-posterior direction (<xref ref-type="bibr" rid="ref1">Apostol et al., 2004</xref>). F2 largely depends on the length of the back cavity of the vocal tract (half-wavelength resonance), which is quite large since /e/ is a front vowel, while F3 depends on the short length of the front cavity (quarter-wavelength resonance). Hence, the impact of the forward displacement of the tongue induced by the horizontal force applied by the robotic device can be interpreted as follows. In the majority of participants, the forward displacement brings the tongue surface closer to the alveolar region of the palate, which reduces the constriction area and lowers F1. In addition, the anteroposterior displacement of the constriction increases the length of the back cavity, which lowers F2, and decreases the length of the front cavity, which in turn increases F3. Given the difference in the lengths of the front and back cavities, F3 is more sensitive to the front/back movement of the constriction (see the nomograms in <xref ref-type="bibr" rid="ref6">Fant, 1960</xref>). The effect of the articulatory change on F1 certainly varies among our participants depending on the shape of the alveolar part of the palate, which can be more or less curved along the front/back direction (see for example <xref ref-type="bibr" rid="ref2">Brunner et al., 2009</xref>). 
Phonetically, for the French vowel /e/, a decrease in F1 associated with an increase in F3 endangers the correct perception of the vowel, which could be identified as an /i/. In French, the spectral center of gravity between F2 and F3 is the most important cue for perceptually separating the vowel /i/ from its neighboring vowels /e/ and /y/ (<xref ref-type="bibr" rid="ref23">Schwartz and Escudier, 1987</xref>). The difference in F1 compensation between the current and the previous studies may be due to a difference in the effect of the tongue-stretch perturbation. The change in F1 induced by the tongue perturbation in the current study is indeed smaller than the one observed in our previous study. This can be related to differences in the experimental setup and procedure. Regarding the setup, the large interparticipant variability of the tongue structure prevented us from precisely controlling the sites of the recording sensors and anchors, and the direction of the tongue-stretch perturbation. Regarding the procedure, the use of whispered speech may have affected the amplitude of the initial change due to the tongue-stretch perturbation, because of differences in tongue postures. Several studies have indeed suggested that small differences exist between the vocal tract configurations associated with the same vowels produced in whispered versus voiced conditions (see for example <xref ref-type="bibr" rid="ref14">Kallail and Emanuel, 1984</xref>). In addition, the whispered source, because it is a pressure source located at the back end of the vocal tract, has been shown to excite the low resonance frequencies of the vocal tract less efficiently than the voiced source (see <xref ref-type="bibr" rid="ref24">Stevens, 1998</xref>). This can affect both the sensitivity of the participants to changes in F1, which in turn alters their compensatory strategies, and the accuracy of LPC-based F1 measurements. 
Hence, our finding that the largest effect of the compensatory response is on F3, rather than on F2, which is less sensitive to front/back articulatory variations, or on F1, which is phonetically less critical, supports our idea that the compensatory response efficiently preserves the phonetically most relevant acoustic characteristics of the sound. This statement is all the more important since our current results show that this mechanism operates without any influence of auditory feedback. It supports our hypothesis of a somatosensory-based neural feedback mechanism, tuned throughout the process of speech acquisition and development to preserve the auditory characteristics of the sound.</p>
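<p>The cavity-resonance argument can be made concrete with a simple tube model: the back cavity behaves as a half-wavelength resonator (F2) and the short front cavity as a quarter-wavelength resonator (F3). The sketch below uses this standard approximation; the cavity lengths and the size of the constriction shift are illustrative assumptions, not values measured in the study.</p>

```python
# Sketch: why F3 is more sensitive than F2 to a front/back shift of the
# constriction for a front vowel such as /e/. Lengths are illustrative.
C = 35000.0  # approximate speed of sound in the vocal tract, cm/s

def half_wave(length_cm):
    """Back-cavity resonance (tube closed at both ends), Hz."""
    return C / (2.0 * length_cm)

def quarter_wave(length_cm):
    """Front-cavity resonance (tube closed at one end), Hz."""
    return C / (4.0 * length_cm)

L_back, L_front = 12.0, 4.0  # assumed cavity lengths (cm)
shift = 0.2  # constriction moved 0.2 cm forward

# A forward shift lengthens the back cavity (F2 falls) and shortens the
# front cavity (F3 rises).
dF2 = half_wave(L_back + shift) - half_wave(L_back)
dF3 = quarter_wave(L_front - shift) - quarter_wave(L_front)
print(round(dF2), round(dF3))
```

<p>Even with these rough lengths, the same 2 mm shift of the constriction moves F3 several times more than F2, which is consistent with F3 being the formant most sensitive to the front/back position of the constriction.</p>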
<p>The use of auditory masking with loud pink noise to cancel the influence of auditory feedback on the compensatory response could have articulatory consequences due to the Lombard effect (<xref ref-type="bibr" rid="ref18">Luo et al., 2018</xref>), which could have substantially altered the participants&#x2019; articulatory strategies and thus their articulatory responses to the perturbation. However, this was clearly not the case in our experiment, since the comparison of the participants&#x2019; postures in the normal and masked auditory conditions did not reveal any significant difference in the region of interest. The Lombard effect on articulatory movement was therefore negligible in our participants.</p>
<p>The role of somatosensory inputs in speech production, independent of acoustic feedback, has been demonstrated by <xref ref-type="bibr" rid="ref20">Patri et al. (2020)</xref>, <xref ref-type="bibr" rid="ref25">Tremblay et al. (2003)</xref> and <xref ref-type="bibr" rid="ref19">Nasir and Ostry (2008)</xref>. Jaw perturbation during speaking induced somatosensory adaptation even in the absence of auditory error (<xref ref-type="bibr" rid="ref25">Tremblay et al., 2003</xref>). <xref ref-type="bibr" rid="ref19">Nasir and Ostry (2008)</xref> also showed that cochlear-implanted participants adapted to such a jaw perturbation even when their implants were turned off. These studies showed that somatosensory error is corrected in speech motor adaptation independently of auditory error. Here we additionally showed somatosensory-based online feedback control. In contrast to the studies cited above, the current tongue stretch perturbation was found to change both somatosensory and auditory feedback. These two errors were presumably compensated through the correction of the somatosensory error alone. This suggests that the speech-specific auditory goal can be achieved by somatosensory-based control.</p>
<p>This reliance on somatosensory feedback could reflect its shorter latency compared with auditory feedback, which involves a longer neural loop. The current reflex was induced with a latency of 135&#x2009;ms. Similar latencies have been found for auditory-based compensation in response to auditory feedback perturbations. <xref ref-type="bibr" rid="ref3">Cai et al. (2011)</xref> showed that, when formants were altered online during the production of a sequence of vowels, auditory compensation was induced with a latency of around 160&#x2009;ms. In addition, studies using pitch-shift perturbation showed latencies of around 120&#x2009;ms for the compensation during the production of disyllabic sequences (<xref ref-type="bibr" rid="ref26">Xu et al., 2004</xref>) and of multi-syllabic nonsense words (<xref ref-type="bibr" rid="ref5">Donath et al., 2002</xref>). However, the speech tasks used in those studies involved dynamic speech and were thus different from the task used in the current study, namely static speech consisting of sustaining a vowel for a few seconds. In similar static speech tasks, auditory compensations in response to altered auditory feedback have been shown to involve longer latencies. <xref ref-type="bibr" rid="ref21">Purcell and Munhall (2006)</xref> found a latency longer than 460&#x2009;ms when a formant was perturbed during sustained vowel production. <xref ref-type="bibr" rid="ref17">Larson et al. (2000)</xref> also found a latency longer than 200&#x2009;ms in response to a pitch-shift perturbation occurring during the production of a steady &#x201C;ah&#x201D; sound. Hence, consistent experimental results suggest that, for the online control of sustained vowel production, the contribution of auditory feedback can involve latencies longer than 200&#x2009;ms after the perturbation. In this context, somatosensory compensation is important for correcting the earlier phases of the speech sound.</p>
<p>The importance of auditory feedback in speech production is well established. For individuals with congenital deafness, it is difficult (or mostly impossible) to learn to speak without being equipped with hearing devices. Post-lingually deafened individuals also show degradations in their speaking performance as their deafness progresses (<xref ref-type="bibr" rid="ref4">Cowie et al., 1982</xref>). This has also been demonstrated in the experimental paradigm of speech motor adaptation with altered auditory feedback (<xref ref-type="bibr" rid="ref10">Houde and Jordan, 1998</xref>; <xref ref-type="bibr" rid="ref21">Purcell and Munhall, 2006</xref>; <xref ref-type="bibr" rid="ref9">Guenther and Hickok, 2015</xref>). When a somatosensory perturbation is applied simultaneously with a perturbation of the auditory feedback, this key role of audition in speech motor control may be affected. <xref ref-type="bibr" rid="ref16">Lametti et al. (2012)</xref> showed in such an experiment that 21% of their participants did not compensate for the auditory perturbation and focused on the somatosensory one. On the other hand, <xref ref-type="bibr" rid="ref7">Feng et al. (2011)</xref> found that error monitoring in the auditory domain plays a dominant role compared with error monitoring in the somatosensory domain. Beyond these considerations, in the case of online feedback control, as mentioned above, the latency of auditory compensation may be too long for efficient and stable control. Thus, it could be expected that auditory feedback plays a role in correction loops with a long delay, while the stability of the control is ensured via another channel. This potential long-latency contribution of auditory feedback can be seen in the current dataset, in which we observed different recoveries of F1 between the normal and masked auditory conditions in a period later than 300&#x2009;ms after the perturbation. Although we did not pursue this in the current study, somatosensory and auditory feedback likely contribute to online control over different periods. It would be interesting to clarify this point by altering the auditory feedback online to increase or reduce the auditory error shift that is naturally induced by the tongue stretch perturbation.</p>
</sec>
<sec sec-type="data-availability" id="sec14">
<title>Data availability statement</title>
<p>Due to the ethical agreement, the data will be shared upon request in the context of a non-disclosure agreement (<email>takayuki.ito@grenoble-inp.fr</email>).</p>
</sec>
<sec sec-type="ethics-statement" id="sec15">
<title>Ethics statement</title>
<p>The studies involving humans were approved by the Comit&#x00E9; d&#x2019;&#x00C9;thique pour la Recherche Grenoble Alpes (CERGA). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.</p>
</sec>
<sec sec-type="author-contributions" id="sec16">
<title>Author contributions</title>
<p>MB: Writing &#x2013; original draft, Writing &#x2013; review &#x0026; editing. PP: Writing &#x2013; review &#x0026; editing, Supervision. CS: Writing &#x2013; review &#x0026; editing. TI: Writing &#x2013; review &#x0026; editing, Funding acquisition, Supervision.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="sec17">
<title>Funding</title>
<p>The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by grants from the Agence Nationale de la Recherche (ANR-21-CE28-0022, PI. TI) and the National Institute on Deafness and Other Communication Disorders (grant R01-DC017439).</p>
</sec>
<ack>
<p>We thank Silvain Gerber for his help in discussions on the statistical analysis.</p>
</ack>
<sec sec-type="COI-statement" id="sec18">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="sec19">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="sec20">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fnhum.2024.1399316/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fnhum.2024.1399316/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Apostol</surname> <given-names>L.</given-names></name> <name><surname>Perrier</surname> <given-names>P.</given-names></name> <name><surname>Bailly</surname> <given-names>G.</given-names></name></person-group> (<year>2004</year>). <article-title>A model of acoustic interspeaker variability based on the concept of formant&#x2013;cavity affiliation</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>115</volume>, <fpage>337</fpage>&#x2013;<lpage>351</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.1631946</pub-id>, PMID: <pub-id pub-id-type="pmid">14759026</pub-id></citation>
</ref>
<ref id="ref2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brunner</surname> <given-names>J.</given-names></name> <name><surname>Fuchs</surname> <given-names>S.</given-names></name> <name><surname>Perrier</surname> <given-names>P.</given-names></name></person-group> (<year>2009</year>). <article-title>On the relationship between palate shape and articulatory behavior</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>125</volume>, <fpage>3936</fpage>&#x2013;<lpage>3949</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.3125313</pub-id>, PMID: <pub-id pub-id-type="pmid">19507976</pub-id></citation>
</ref>
<ref id="ref3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname> <given-names>S.</given-names></name> <name><surname>Ghosh</surname> <given-names>S. S.</given-names></name> <name><surname>Guenther</surname> <given-names>F. H.</given-names></name> <name><surname>Perkell</surname> <given-names>J. S.</given-names></name></person-group> (<year>2011</year>). <article-title>Focal manipulations of formant trajectories reveal a role of auditory feedback in the online control of both within-syllable and between-syllable speech timing</article-title>. <source>J. Neurosci.</source> <volume>31</volume>, <fpage>16483</fpage>&#x2013;<lpage>16490</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3653-11.2011</pub-id>, PMID: <pub-id pub-id-type="pmid">22072698</pub-id></citation>
</ref>
<ref id="ref4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cowie</surname> <given-names>R.</given-names></name> <name><surname>Douglas-Cowie</surname> <given-names>E.</given-names></name> <name><surname>Kerr</surname> <given-names>A. G.</given-names></name></person-group> (<year>1982</year>). <article-title>A study of speech deterioration in post-lingually deafened adults</article-title>. <source>J. Laryngol. Otol.</source> <volume>96</volume>, <fpage>101</fpage>&#x2013;<lpage>112</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S002221510009229X</pub-id>, PMID: <pub-id pub-id-type="pmid">7057081</pub-id></citation>
</ref>
<ref id="ref5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Donath</surname> <given-names>T. M.</given-names></name> <name><surname>Natke</surname> <given-names>U.</given-names></name> <name><surname>Kalveram</surname> <given-names>K. T.</given-names></name></person-group> (<year>2002</year>). <article-title>Effects of frequency-shifted auditory feedback on voice F0 contours in syllables</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>111</volume>, <fpage>357</fpage>&#x2013;<lpage>366</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.1424870</pub-id>, PMID: <pub-id pub-id-type="pmid">11831808</pub-id></citation>
</ref>
<ref id="ref6">
<citation citation-type="book"><person-group person-group-type="author">
<name><surname>Fant</surname> <given-names>G.</given-names></name>
</person-group> (<year>1960</year>). <source>Acoustic theory of speech production</source>. <publisher-loc>The Hague, The Netherlands</publisher-loc>: <publisher-name>Mouton</publisher-name>.</citation>
</ref>
<ref id="ref7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Feng</surname> <given-names>Y.</given-names></name> <name><surname>Gracco</surname> <given-names>V. L.</given-names></name> <name><surname>Max</surname> <given-names>L.</given-names></name></person-group> (<year>2011</year>). <article-title>Integration of auditory and somatosensory error signals in the neural control of speech movements</article-title>. <source>J. Neurophysiol.</source> <volume>106</volume>, <fpage>667</fpage>&#x2013;<lpage>679</lpage>. doi: <pub-id pub-id-type="doi">10.1152/jn.00638.2010</pub-id>, PMID: <pub-id pub-id-type="pmid">21562187</pub-id></citation>
</ref>
<ref id="ref8">
<citation citation-type="journal"><person-group person-group-type="author">
<name><surname>Gold</surname> <given-names>T.</given-names></name>
</person-group> (<year>1980</year>). <article-title>Speech production in hearing-impaired children</article-title>. <source>J. Commun. Disord.</source> <volume>13</volume>, <fpage>397</fpage>&#x2013;<lpage>418</lpage>. doi: <pub-id pub-id-type="doi">10.1016/0021-9924(80)90042-8</pub-id></citation>
</ref>
<ref id="ref9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Guenther</surname> <given-names>F. H.</given-names></name> <name><surname>Hickok</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>Role of the auditory system in speech production</article-title>&#x201D; in <source>Handbook of clinical neurology</source>, vol. <volume>129</volume> (<publisher-loc>Amsterdam, The Netherlands</publisher-loc>: <publisher-name>Elsevier</publisher-name>), <fpage>161</fpage>&#x2013;<lpage>175</lpage>.</citation>
</ref>
<ref id="ref10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Houde</surname> <given-names>J. F.</given-names></name> <name><surname>Jordan</surname> <given-names>M. I.</given-names></name></person-group> (<year>1998</year>). <article-title>Sensorimotor adaptation in speech production</article-title>. <source>Science</source> <volume>279</volume>, <fpage>1213</fpage>&#x2013;<lpage>1216</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.279.5354.1213</pub-id></citation>
</ref>
<ref id="ref11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ito</surname> <given-names>T.</given-names></name> <name><surname>Bouguerra</surname> <given-names>M.</given-names></name> <name><surname>Bourhis</surname> <given-names>M.</given-names></name> <name><surname>Perrier</surname> <given-names>P.</given-names></name></person-group> (<year>2024</year>). <article-title>Tongue reflex for speech posture control</article-title>. <source>Sci. Rep.</source> <volume>14</volume>:<fpage>6386</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-024-56813-9</pub-id>, PMID: <pub-id pub-id-type="pmid">38493261</pub-id></citation>
</ref>
<ref id="ref12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ito</surname> <given-names>T.</given-names></name> <name><surname>Szabados</surname> <given-names>A.</given-names></name> <name><surname>Caillet</surname> <given-names>J.-L.</given-names></name> <name><surname>Perrier</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>Quick compensatory mechanisms for tongue posture stabilization during speech production</article-title>. <source>J. Neurophysiol.</source> <volume>123</volume>, <fpage>2491</fpage>&#x2013;<lpage>2503</lpage>. doi: <pub-id pub-id-type="doi">10.1152/jn.00756.2019</pub-id>, PMID: <pub-id pub-id-type="pmid">32432505</pub-id></citation>
</ref>
<ref id="ref13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>J. A.</given-names></name> <name><surname>Munhall</surname> <given-names>K. G.</given-names></name></person-group> (<year>2005</year>). <article-title>Remapping auditory-motor representations in voice production</article-title>. <source>Curr. Biol.</source> <volume>15</volume>, <fpage>1768</fpage>&#x2013;<lpage>1772</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cub.2005.08.063</pub-id></citation>
</ref>
<ref id="ref14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kallail</surname> <given-names>K. J.</given-names></name> <name><surname>Emanuel</surname> <given-names>F. W.</given-names></name></person-group> (<year>1984</year>). <article-title>Formant-frequency differences between isolated whispered and phonated vowel samples produced by adult female subjects</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>27</volume>, <fpage>245</fpage>&#x2013;<lpage>251</lpage>. doi: <pub-id pub-id-type="doi">10.1044/jshr.2702.251</pub-id>, PMID: <pub-id pub-id-type="pmid">6738036</pub-id></citation>
</ref>
<ref id="ref15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kral</surname> <given-names>A.</given-names></name> <name><surname>Dorman</surname> <given-names>M. F.</given-names></name> <name><surname>Wilson</surname> <given-names>B. S.</given-names></name></person-group> (<year>2019</year>). <article-title>Neuronal development of hearing and language: Cochlear implants and critical periods</article-title>. <source>Annu. Rev. Neurosci.</source> <volume>42</volume>, <fpage>47</fpage>&#x2013;<lpage>65</lpage>. doi: <pub-id pub-id-type="doi">10.1146/annurev-neuro-080317-061513</pub-id></citation>
</ref>
<ref id="ref16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lametti</surname> <given-names>D. R.</given-names></name> <name><surname>Nasir</surname> <given-names>S. M.</given-names></name> <name><surname>Ostry</surname> <given-names>D. J.</given-names></name></person-group> (<year>2012</year>). <article-title>Sensory preference in speech production revealed by simultaneous alteration of auditory and somatosensory feedback</article-title>. <source>J. Neurosci.</source> <volume>32</volume>, <fpage>9351</fpage>&#x2013;<lpage>9358</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.0404-12.2012</pub-id>, PMID: <pub-id pub-id-type="pmid">22764242</pub-id></citation>
</ref>
<ref id="ref17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Larson</surname> <given-names>C. R.</given-names></name> <name><surname>Burnett</surname> <given-names>T. A.</given-names></name> <name><surname>Kiran</surname> <given-names>S.</given-names></name> <name><surname>Hain</surname> <given-names>T. C.</given-names></name></person-group> (<year>2000</year>). <article-title>Effects of pitch-shift velocity on voice F0 responses</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>107</volume>, <fpage>559</fpage>&#x2013;<lpage>564</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.428323</pub-id>, PMID: <pub-id pub-id-type="pmid">10641664</pub-id></citation>
</ref>
<ref id="ref18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>J.</given-names></name> <name><surname>Hage</surname> <given-names>S. R.</given-names></name> <name><surname>Moss</surname> <given-names>C. F.</given-names></name></person-group> (<year>2018</year>). <article-title>The Lombard effect: from acoustics to neural mechanisms</article-title>. <source>Trends Neurosci.</source> <volume>41</volume>, <fpage>938</fpage>&#x2013;<lpage>949</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tins.2018.07.011</pub-id>, PMID: <pub-id pub-id-type="pmid">30115413</pub-id></citation>
</ref>
<ref id="ref19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nasir</surname> <given-names>S. M.</given-names></name> <name><surname>Ostry</surname> <given-names>D. J.</given-names></name></person-group> (<year>2008</year>). <article-title>Speech motor learning in profoundly deaf adults</article-title>. <source>Nat. Neurosci.</source> <volume>11</volume>, <fpage>1217</fpage>&#x2013;<lpage>1222</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nn.2193</pub-id>, PMID: <pub-id pub-id-type="pmid">18794839</pub-id></citation>
</ref>
<ref id="ref20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patri</surname> <given-names>J.-F.</given-names></name> <name><surname>Ostry</surname> <given-names>D. J.</given-names></name> <name><surname>Diard</surname> <given-names>J.</given-names></name> <name><surname>Schwartz</surname> <given-names>J.-L.</given-names></name> <name><surname>Trudeau-Fisette</surname> <given-names>P.</given-names></name> <name><surname>Savariaux</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Speakers are able to categorize vowels based on tongue somatosensation</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>117</volume>, <fpage>6255</fpage>&#x2013;<lpage>6263</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1911142117</pub-id>, PMID: <pub-id pub-id-type="pmid">32123070</pub-id></citation>
</ref>
<ref id="ref21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Purcell</surname> <given-names>D. W.</given-names></name> <name><surname>Munhall</surname> <given-names>K. G.</given-names></name></person-group> (<year>2006</year>). <article-title>Compensation following real-time manipulation of formants in isolated vowels</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>119</volume>, <fpage>2288</fpage>&#x2013;<lpage>2297</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.2173514</pub-id>, PMID: <pub-id pub-id-type="pmid">16642842</pub-id></citation>
</ref>
<ref id="ref9001">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rabiner</surname> <given-names>L. R.</given-names></name> <name><surname>Schafer</surname> <given-names>R. W.</given-names></name></person-group> (<year>1978</year>). <source>Digital processing of speech signals</source>. <publisher-loc>Englewood Cliffs, NJ</publisher-loc>: <publisher-name>Prentice-Hall</publisher-name>.</citation>
</ref>
<ref id="ref22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rochet-Capellan</surname> <given-names>A.</given-names></name> <name><surname>Ostry</surname> <given-names>D. J.</given-names></name></person-group> (<year>2011</year>). <article-title>Simultaneous acquisition of multiple auditory-motor transformations in speech</article-title>. <source>J. Neurosci.</source> <volume>31</volume>, <fpage>2657</fpage>&#x2013;<lpage>2662</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.6020-10.2011</pub-id>, PMID: <pub-id pub-id-type="pmid">21325534</pub-id></citation>
</ref>
<ref id="ref23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Schwartz</surname> <given-names>J. L.</given-names></name> <name><surname>Escudier</surname> <given-names>P.</given-names></name></person-group> (<year>1987</year>). &#x201C;<article-title>Does the human auditory system include large scale spectral integration?</article-title>&#x201D; in <source>The psychophysics of speech perception</source>. ed. <person-group person-group-type="editor">
<name><surname>Schouten</surname> <given-names>M. E. H.</given-names></name>
</person-group> (<publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Springer Netherlands</publisher-name>), <fpage>284</fpage>&#x2013;<lpage>292</lpage>.</citation>
</ref>
<ref id="ref24">
<citation citation-type="book"><person-group person-group-type="author">
<name><surname>Stevens</surname> <given-names>K. N.</given-names></name>
</person-group> (<year>1998</year>). <source>Acoustic phonetics</source>. <publisher-loc>Cambridge, Massachusetts</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="ref25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tremblay</surname> <given-names>S.</given-names></name> <name><surname>Shiller</surname> <given-names>D. M.</given-names></name> <name><surname>Ostry</surname> <given-names>D. J.</given-names></name></person-group> (<year>2003</year>). <article-title>Somatosensory basis of speech production</article-title>. <source>Nature</source> <volume>423</volume>, <fpage>866</fpage>&#x2013;<lpage>869</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nature01710</pub-id></citation>
</ref>
<ref id="ref26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Larson</surname> <given-names>C. R.</given-names></name> <name><surname>Bauer</surname> <given-names>J. J.</given-names></name> <name><surname>Hain</surname> <given-names>T. C.</given-names></name></person-group> (<year>2004</year>). <article-title>Compensation for pitch-shifted auditory feedback during the production of Mandarin tone sequences</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>116</volume>, <fpage>1168</fpage>&#x2013;<lpage>1178</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.1763952</pub-id>, PMID: <pub-id pub-id-type="pmid">15376682</pub-id></citation>
</ref>
</ref-list>
</back>
</article>