<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Hum. Neurosci.</journal-id>
<journal-title>Frontiers in Human Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Hum. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5161</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnhum.2023.893785</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Human Neuroscience</subject>
<subj-group>
<subject>Hypothesis and Theory</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Learning and change in a dual lexicon model of speech production</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Davis</surname> <given-names>Maya</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/805996/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Redford</surname> <given-names>Melissa A.</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/614952/overview"/>
</contrib>
</contrib-group>
<aff><institution>Department of Linguistics, University of Oregon</institution>, <addr-line>Eugene, OR</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: John Houde, University of California, San Francisco, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Pascal Perrier, UMR5216 Grenoble Images Parole Signal Automatique (GIPSA-lab), France; Connor Mayer, University of California, Irvine, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Melissa A. Redford &#x02709; <email>redford&#x00040;uoregon.edu</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Speech and Language, a section of the journal Frontiers in Human Neuroscience</p></fn></author-notes>
<pub-date pub-type="epub">
<day>15</day>
<month>02</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>17</volume>
<elocation-id>893785</elocation-id>
<history>
<date date-type="received">
<day>10</day>
<month>03</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>01</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Davis and Redford.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Davis and Redford</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>Speech motor processes and phonological forms influence one another because speech and language are acquired and used together. This hypothesis underpins the Computational Core (CC) model, which provides a framework for understanding the limitations of perceptually-driven changes to production. The model assumes a lexicon of motor and perceptual wordforms linked to concepts and whole-word production based on these forms. Motor wordforms are built up with speech practice. Perceptual wordforms encode ambient language patterns in detail. Speech production is the integration of the two forms. Integration results in an output trajectory through perceptual-motor space that guides articulation. Assuming successful communication of the intended concept, the output trajectory is incorporated into the existing motor wordform for that concept. Novel word production exploits existing motor wordforms to define a perceptually-acceptable path through motor space that is further modified by the perceptual wordform during integration. Simulation results show that, by preserving a distinction between motor and perceptual wordforms in the lexicon, the CC model can account for practice-based changes in the production of known words and for the effect of expressive vocabulary size on production accuracy of novel words.</p></abstract>
<kwd-group>
<kwd>computational model</kwd>
<kwd>development</kwd>
<kwd>exemplar theory</kwd>
<kwd>schema theory</kwd>
<kwd>speech motor plan</kwd>
</kwd-group>
<contract-sponsor id="cn001">Eunice Kennedy Shriver National Institute of Child Health and Human Development<named-content content-type="fundref-id">10.13039/100009633</named-content></contract-sponsor>
<counts>
<fig-count count="11"/>
<table-count count="0"/>
<equation-count count="14"/>
<ref-count count="63"/>
<page-count count="18"/>
<word-count count="14329"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>How do we produce an unfamiliar word that we have just heard? One answer is that we hear and encode the word as a sequence of phonemes; when the sequence is activated for production, the phonetic aspect is filled in, syllable structure is imposed, and the corresponding motor programs are selected and executed (Levelt, <xref ref-type="bibr" rid="B26">1989</xref>; Levelt et al., <xref ref-type="bibr" rid="B25">1999</xref>; Guenther, <xref ref-type="bibr" rid="B17">2016</xref>). But, if our production of the unfamiliar word is inaccurate, how exactly do we improve on it over time? The Computational Core (CC) model presented in this paper was built to address this question and others that arise from the developmental problem of learning and change in production: learning and change that occur across the lifespan.</p>
<p>One approach to the problem of learning and change in production is to assume both perceptual representations linked to phonemes and online control over execution (e.g., Houde and Nagarajan, <xref ref-type="bibr" rid="B21">2011</xref>; Parrell et al., <xref ref-type="bibr" rid="B43">2019</xref>). Under these assumptions, predictive control can be used to adjust a planned articulation that will miss the acoustic goal linked to a phoneme (Niziolek et al., <xref ref-type="bibr" rid="B42">2013</xref>). But what if the unfamiliar word that a speaker attempts makes use of familiar phonemes linked to unfamiliar sounds arranged according to an unfamiliar timing pattern? The standard approach to this problem, encountered in adult second language learning, is to assume perceptual learning at the level of the acoustic categories that define speech motor goals (Flege, <xref ref-type="bibr" rid="B12">1995</xref>; Samuel and Kraljic, <xref ref-type="bibr" rid="B52">2009</xref>; Holt and Lotto, <xref ref-type="bibr" rid="B20">2010</xref>; Flege and Bohn, <xref ref-type="bibr" rid="B13">2021</xref>). Such learning could induce change in production based on online control. Yet, studies on second language acquisition indicate that accurate perceptual learning does not result in production accuracy (Nagle and Baese-Berk, <xref ref-type="bibr" rid="B37">2022</xref>), especially if the newly learned acoustic category cannot be mapped onto a speaker&#x00027;s prior production experience (Nielsen, <xref ref-type="bibr" rid="B40">2011</xref>; Nagle, <xref ref-type="bibr" rid="B36">2018</xref>). Despite learning, changes in production accuracy are constrained.</p>
<p>Also, even if an unfamiliar sound can be attained based on perceptual learning, how is an unfamiliar timing pattern achieved? Native-like production of relative timing patterns within a word is acquired early by first language speakers, but not nearly as easily, if ever, by adult second language speakers (e.g., Redford and Oh, <xref ref-type="bibr" rid="B50">2017</xref>). The question of how relative timing patterns are acquired is especially difficult to address within a framework where word production and perception are mediated by phonemes. An alternative approach is to assume that learning is instead mediated by wordform representations. For example, the detailed acoustic-perceptual wordform representations of exemplar-based theories (Johnson, <xref ref-type="bibr" rid="B23">1997</xref>, <xref ref-type="bibr" rid="B24">2006</xref>; Pierrehumbert, <xref ref-type="bibr" rid="B45">2002</xref>; Smith and Hawkins, <xref ref-type="bibr" rid="B57">2012</xref>) necessarily include time-varying information about acoustic goals that could be referenced during execution. Predictive control could be used to adjust planned articulations accordingly, which would result in changes to production. But, if accurate production of unfamiliar words with unfamiliar sounds and timing patterns can be attained simply with reference to whole-word perceptual representations, then why is the correlation between perception and production in second language acquisition so far from perfect? Put another way: What constrains production during learning? Relatedly, why does production accuracy, measured against perceptual input, appear to plateau in adult second language speakers?</p>
<p>The typical explanation for constrained production accuracy in second language speech is that unfamiliar words are not directly read off from perceptual representations; rather, they are filtered through a speaker&#x00027;s phonology (Major, <xref ref-type="bibr" rid="B29">1998</xref>, <xref ref-type="bibr" rid="B30">2001</xref>). In exemplar-based theories, the phonology is language-specific knowledge about phonemes, phonotactics, and other suprasegmental patterns abstracted from across the perceptual wordforms of the lexicon (Bybee, <xref ref-type="bibr" rid="B6">2002</xref>; Pierrehumbert, <xref ref-type="bibr" rid="B46">2003</xref>). When these abstractions are stored (&#x0201C;labeled&#x0201D;) separately from the lexicon, an exemplar-based model of production makes assumptions similar to phoneme-driven models of production (see, e.g., Pierrehumbert, <xref ref-type="bibr" rid="B44">2001</xref>; Wedel, <xref ref-type="bibr" rid="B63">2006</xref>); that is, it assumes acoustic goals linked to phonemes and so it assumes phoneme-guided production. Given that time-varying information must also be learned and implemented by the motor system to effect change in production, this type of model is unsatisfactory. The CC model presents a word-based alternative to the phoneme-driven model of production. The goal of the model is to account for perceptually-driven learning and change in production and for the constraints on said change.</p>
<p>The CC model addresses learning and change from a developmental perspective. This perspective is adopted because (a) the problem of learning and change is especially acute in early language development, and (b) the adult&#x00027;s production system emerges from the child&#x00027;s and so should be derived from it. The latter reason constitutes a working hypothesis that has led us to propose a developmentally sensitive theory of speech production (Redford, <xref ref-type="bibr" rid="B48">2015</xref>, <xref ref-type="bibr" rid="B49">2019</xref>)&#x02014;a framework for understanding the evolution of speech production across the lifespan. The CC model details an important piece of the theory: the idea that speech motor processes and phonological forms influence one another because speech and language are acquired together. The model instantiation of this idea captures language-specific limits on perceptually-driven motor learning and change in production.</p></sec>
<sec id="s2">
<title>Background to the CC model</title>
<p>The CC model assumes a dual lexicon. More specifically, it assumes a lexicon composed of separate perceptual and motor wordforms that are jointly linked to shared concepts. The CC model also assumes whole-word production. These assumptions are motivated by our developmental perspective. Both extend specific ideas from child phonology to provide the basis for a developmentally sensitive account of adult production.</p>
<p>The shapes of children&#x00027;s first words deviate markedly from adult wordforms. Work in child phonology shows that these deviations are idiosyncratic. For example, one child will say [b&#x00251;b&#x00251;] for <italic>bottle</italic> (Velleman, <xref ref-type="bibr" rid="B58">1998</xref>; cited in Velleman and Vihman, <xref ref-type="bibr" rid="B59">2002</xref>, p. 20) while another says [b&#x00251;di] (Vihman, <xref ref-type="bibr" rid="B61">2014</xref>, p. 80) and a third says [p&#x00251;pm:] (Jaeger, <xref ref-type="bibr" rid="B22">1997</xref>; Vihman and Croft, <xref ref-type="bibr" rid="B62">2007</xref>, p. 702). The idiosyncratic productions of single words are associated with child-specific systematicities across multiple words. For example, the 18-month-old who says [p&#x00251;pm:] for &#x0201C;bottle&#x0201D; replaces voiced stops with voiceless ones in &#x0201C;baby&#x0201D; and &#x0201C;byebye,&#x0201D; rendering these as [peipi] and [(p&#x00259;)pa:i], respectively; she also produces word-final nasals in other words where they are not required (e.g., [k&#x0028C;k&#x0014B;] for &#x0201C;cracker&#x0201D; and [tak&#x0014B;] for &#x0201C;doggie&#x0201D;; see Table 9 in Vihman and Croft, <xref ref-type="bibr" rid="B62">2007</xref>, p. 702). In general, children&#x00027;s deviations from adult-like wordforms are interpreted to suggest strong motor constraints on first word production (Menn, <xref ref-type="bibr" rid="B33">1983</xref>; Nittrouer et al., <xref ref-type="bibr" rid="B41">1989</xref>; McCune and Vihman, <xref ref-type="bibr" rid="B31">2001</xref>; Davis et al., <xref ref-type="bibr" rid="B7">2002</xref>). Ferguson and Farwell (<xref ref-type="bibr" rid="B11">1975</xref>) proposed that individual children overcome these constraints by applying their favored sound patterns to best approximate whole word targets, resulting in systematic patterns of individual difference in production.
McCune and Vihman (<xref ref-type="bibr" rid="B31">2001</xref>) went further to specify that a child&#x00027;s favored patterns are selected from among their vocal motor schemes that are established with vocal-motor practice during the pre-speech period. Redford (<xref ref-type="bibr" rid="B48">2015</xref>) combined this idea with the ideas of generalized motor programs from schema theory (see Schmidt, <xref ref-type="bibr" rid="B53">1975</xref>, <xref ref-type="bibr" rid="B54">2003</xref>) and gestural scores from Articulatory Phonology (Browman and Goldstein, <xref ref-type="bibr" rid="B3">1986</xref>, <xref ref-type="bibr" rid="B2">1992</xref>) to propose that, even beyond the first word period, the child continues to rely on established motor representations to guide production and that this reliance continues on through adulthood.</p>
<p>In Redford (<xref ref-type="bibr" rid="B48">2015</xref>), the motor representations that guide production were defined as temporally-structured memories built up from motor traces associated with the successful communication of concepts. They are established when communication of a new concept is first attempted. Of course, this first attempt requires that the child also have stored a perceptual representation of the wordform that denotes a concept. This representation serves as the goal for production. Its presence in the lexicon allows for developmental change in the direction of the adult form (Redford, <xref ref-type="bibr" rid="B49">2019</xref>). But the hypothesis of whole-word production raises the problem of how to explain the emergence of segment-like control over speech articulation. Davis and Redford (<xref ref-type="bibr" rid="B8">2019</xref>) proposed the Core model to address this problem. In brief, Core demonstrated that segment-like control could emerge under the assumption of whole-word production with practice-based structuring of the perceptual-motor map. This specific solution to the problem entailed formalizing a number of concepts that are also central to the CC model. <xref ref-type="fig" rid="F1">Figure 1</xref> itemizes and illustrates these concepts for quick reference. More complete descriptions of the concepts follow.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Informal definitions of Core concepts are provided (see text for detail). The illustrations to the right of the definitions depict several of the concepts. The top right panel depicts 2-dimensional motor <bold>(left)</bold> and perceptual <bold>(right)</bold> spaces that have already been structured by the trajectory crossings that occur with vocal-motor exploration and speech practice. Junctures are represented as dots, clusters as groups of identically colored dots. Each cluster of a particular color in motor space corresponds to one of the same color in perceptual space. Links between the motor and perceptual spaces are assumed but not shown. The bottom right panel depicts a silhouette <bold>(left)</bold> and an exemplar <bold>(right)</bold> in relation to the motor and perceptual spaces, respectively. The depiction of the silhouette highlights the idea that it describes a broad path through motor space. The depiction of an exemplar highlights its status as a specific trajectory through perceptual space. The distinct layouts of clusters in the simplified motor and perceptual spaces illustrate that these spaces have different topologies.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0001.tif"/>
</fig>
<sec>
<title>Core concepts</title>
<p>The CC model assumes that motor wordforms are established with reference to perceptual wordforms and that, once established, the motor and perceptual forms are integrated during production (Redford, <xref ref-type="bibr" rid="B49">2019</xref>). We first formalized this hypothesis in the Core model (Davis and Redford, <xref ref-type="bibr" rid="B8">2019</xref>). In so doing, we defined a lexicon of perceptual and motor wordforms with respect to a <italic>perceptual space</italic> and a <italic>motor space</italic>.</p>
<p>The perceptual space is the set of all possible instantaneous sounds, along with a distance metric and subsequent topology. The motor space is the set of all possible articulatory configurations, along with a distance metric and subsequent topology. The perceptual and motor spaces are grounded in the acoustic and articulatory dimensions of speech. This grounding is assumed but not defined in the CC model. In Davis and Redford (<xref ref-type="bibr" rid="B8">2019</xref>) the dimensions were as follows. A point in perceptual space was represented by coordinates measuring sound periodicity, Bark-transformed formant values, the spectral center of gravity, the width of the spectral peak, and the time derivatives of the formant and other spectral measures, as well as the time derivative of amplitude. A point in motor space was represented by coordinates measuring glottal width, the cross-sectional areas of 8 regions of the vocal tract from lips to larynx, the time derivatives of each of the cross-sectional areas, velum height, the time derivative of velum height, and the direction and force of the opening/closing movement of the jaw. Euclidean distance metrics were used to calculate the relationship between points in these spaces.</p>
<p>The perceptual wordform, defined with respect to perceptual space, is called an <italic>exemplar</italic>. The label indicates our embrace of exemplar-based accounts of phonology, sociolinguistic knowledge, and perceptual learning. None of these topics are explicitly addressed here. Instead, the exemplar is merely a precise whole-word perceptual representation. It is a function that takes a moment in time as an input and gives as an output a point in perceptual space. Such a function describes a trajectory through perceptual space; it is called an exemplar only when linked to a concept.</p>
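<p>The definition of an exemplar as a function from time to a point in perceptual space can be made concrete with a minimal sketch. The sketch is illustrative only and is not taken from the model&#x00027;s source code: the names <monospace>Trajectory</monospace>, <monospace>Exemplar</monospace>, and <monospace>euclidean</monospace> are hypothetical, the trajectory is assumed to be stored as time-stamped samples, and linear interpolation between samples is an assumption made here for simplicity.</p>
<preformat><![CDATA[
```python
import math
from bisect import bisect_right
from dataclasses import dataclass


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


class Trajectory:
    """A function from time to a point in a space.

    Assumption of this sketch: the trajectory is stored as samples and
    evaluated by linear interpolation; times outside the sampled
    interval are clamped to the first or last sample.
    """

    def __init__(self, times, points):
        if len(times) != len(points) or len(times) < 2:
            raise ValueError("need at least two (time, point) samples")
        if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
            raise ValueError("times must be strictly increasing")
        self.times = list(times)
        self.points = [tuple(p) for p in points]

    def __call__(self, t):
        if t <= self.times[0]:
            return self.points[0]
        if t >= self.times[-1]:
            return self.points[-1]
        # Index of the sample at or just before time t.
        i = bisect_right(self.times, t) - 1
        t0, t1 = self.times[i], self.times[i + 1]
        w = (t - t0) / (t1 - t0)
        return tuple((1 - w) * a + w * b
                     for a, b in zip(self.points[i], self.points[i + 1]))


@dataclass
class Exemplar:
    """A perceptual trajectory counts as an exemplar once linked to a concept."""
    concept: str
    trajectory: Trajectory
```
]]></preformat>
<p>In this sketch, a two-dimensional space stands in for the higher-dimensional perceptual space described above; any number of coordinates per point would work in the same way.</p>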
<p>The motor wordform, defined with respect to motor space, is called a <italic>silhouette</italic>. It is a temporally-structured memory of the movements needed to achieve a wordform that communicates a concept. It is built up over time whenever its concept is successfully communicated. It is most analogous to the idea of a generalized motor program (GMP) for skilled action (Schmidt, <xref ref-type="bibr" rid="B53">1975</xref>, <xref ref-type="bibr" rid="B54">2003</xref>), except that it is a more specific representation than the GMP. Unlike a GMP, a silhouette is effector-dependent: it is defined along dimensions determined by possible movements of the speech articulators.</p>
<p>In first-word production, exemplars are purely exogenous representations. Silhouettes are endogenous representations that begin to emerge when the infant first successfully communicates a concept <italic>C</italic> by targeting the exemplar, <italic>e</italic><sub><italic>C</italic></sub>. The silhouette for the concept, <sc>SIL</sc><sub><italic>C</italic></sub>, is a function that takes a point in time as an input, and gives as an output a region in motor space that describes a general vocal tract configuration to be targeted by the motor system at that time. As with the exemplar, the subscript <italic>C</italic> denotes the silhouette&#x00027;s link to the concept <italic>C</italic>. Each time <italic>C</italic> is successfully communicated, <sc>SIL</sc><sub><italic>C</italic></sub> expands to include a trace of the motor trajectory, <italic>m</italic>, that was executed. More specifically, for each time <italic>t</italic>, the region <sc>SIL</sc><sub><italic>C</italic></sub>(<italic>t</italic>) expands the smallest amount possible such that (1) the new region also includes <italic>m</italic>(<italic>t</italic>) (as well as the old region) and (2) the new region is convex. In the CC model, new and old regions are also weighted over time with the addition of new traces representing successful communication of <italic>C</italic>, which effectively skews the silhouette in the direction of the most frequently used motor trajectories. An illustration of motor silhouette expansion is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. Silhouette weighting is not shown; it is instead described at length later in this paper.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Each panel shows the silhouette expanded from the previous panel to include an additional motor trajectory, whose path is shown in red. The regions of the silhouette at four time steps are drawn&#x02014;the region at the first time is shown in green, at the second time in blue, at the third time in purple, and at the fourth time in pink&#x02014;but theoretically infinitely many regions exist along the whole length of the silhouette. Silhouette expansion is a continual process. Motor trajectory traces are added whenever communication succeeds.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0002.tif"/>
</fig>
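<p>The expansion rule for silhouettes (the smallest convex region that contains both the old region and the new trace point) amounts to taking a convex hull at each time step. The following minimal sketch illustrates this under simplifying assumptions that are ours and not part of the model specification: motor space is reduced to two dimensions, time is discretized, each region is stored as a list of hull vertices, and silhouette weighting is omitted. The function names are hypothetical.</p>
<preformat><![CDATA[
```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the 2-D convex hull (Andrew's monotone chain),
    in counter-clockwise order; collinear points are dropped."""
    pts = sorted(set(tuple(p) for p in points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def expand_silhouette(silhouette, motor_trace):
    """Return a new silhouette that includes one more motor trace.

    silhouette:  list of regions, one per time step; each region is a
                 list of hull vertices (an empty list before any trace).
    motor_trace: list of 2-D points m(t), one per time step.
    Assumption: both are sampled at the same discrete time steps.
    """
    if len(silhouette) != len(motor_trace):
        raise ValueError("silhouette and trace must cover the same time steps")
    return [convex_hull(list(region) + [tuple(m)])
            for region, m in zip(silhouette, motor_trace)]
```
]]></preformat>
<p>Under these assumptions, a trace point that already lies inside a region leaves that region unchanged, whereas a point outside it extends the hull just far enough to include the point.</p>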
<p>First word production is the effective communication of a novel concept <italic>C</italic> that has been learned along with <italic>e</italic><sub><italic>C</italic></sub> from the ambient language. The infant first achieves communication of <italic>C</italic> through a matching and selection process that leverages motor trajectories established through babbling and other vocal-motor exploration. Because the trajectories in motor space are self-produced, they are automatically linked to perceptual trajectories in perceptual space. The linked motor and perceptual trajectories make up the <italic>perceptual-motor map</italic> that is exploited during the matching and selection process used to attempt a new word. This process computes the distance in perceptual space between an exemplar and the perceptual aspect of established motor trajectories through motor space. The computation allows for the combination of multiple established trajectories, one after another in time, to best approximate the intended exemplar. Along the way, the matching and selection process structures the perceptual-motor map by creating <italic>junctures</italic>, which are motor points at which the speaker shifts from one established trajectory to another nearby one.</p>
<p>Even during the initial stages of vocal-motor exploration, very specific regions of motor space are passed over multiple times in a variety of trajectories (e.g., the [&#x00251;] region in babbled utterances &#x0201C;b&#x00251;b&#x00251;&#x0201D; and &#x0201C;d&#x00251;d&#x00251;&#x0201D;). In Davis and Redford (<xref ref-type="bibr" rid="B8">2019</xref>), we proposed that frequently traversed regions in motor space become populated with junctures through the matching and selection process during the first word stage of development. The specific suggestion was that children create junctures when they combine chunks of previously experienced perceptually-linked motor trajectories in their first word attempts. For example, a child will first link the perceptual and motor spaces of speech during the pre-linguistic period, including with trajectories such as &#x0201C;b&#x00251;b&#x00251;&#x0201D; and &#x0201C;d&#x00251;d&#x00251;&#x0201D; produced during the babbling phase. When this child first attempts the word &#x0201C;bottle&#x0201D; they may seek to match its perceptual form by leveraging the &#x0201C;b&#x00251;b&#x00251;&#x0201D; or &#x0201C;d&#x00251;d&#x00251;&#x0201D; trajectory. They may even combine these trajectories to produce &#x0201C;b&#x00251;d&#x00251;&#x0201D; by following the (motor) path for &#x0201C;b&#x00251;b&#x00251;&#x0201D; and then transitioning to the path for &#x0201C;d&#x00251;d&#x00251;&#x0201D; where the two trajectories (nearly) meet in the [&#x00251;] region of motor space. If the resulting &#x0201C;b&#x00251;d&#x00251;&#x0201D; trajectory contributes to communicative success (e.g., receiving the requested bottle), then the motor trace of the &#x0201C;b&#x00251;d&#x00251;&#x0201D; trajectory is stored with a link to the concept &#x0201C;bottle.&#x0201D; This trace provides the first outline for the silhouette associated with that concept (see <xref ref-type="fig" rid="F2">Figure 2</xref>).</p>
<p>As junctures proliferate with vocal-motor practice and vocabulary expansion, they are grouped together based on their proximity to one another in motor space. These groupings are <italic>clusters</italic>. A cluster designates a specific region in motor space that is crossed over and over again while achieving similar sounds within various words. Over developmental time, clusters begin to serve as perceptual-motor units of control. They can be targeted quasi-independently because they designate regions within motor space that many trajectories go through, allowing the speaker to target the region from many other locations within the space. At a higher level of abstraction, clusters represent turning points in motor trajectories. These turning points can be conceived of as linguistically-significant vocal tract constrictions&#x02014;something similar to &#x0201C;gestures&#x0201D; in Articulatory Phonology (Browman and Goldstein, <xref ref-type="bibr" rid="B3">1986</xref>, <xref ref-type="bibr" rid="B2">1992</xref>), albeit with context-dependent timing that is defined by the trajectory leading into and out of the turning point. In perceptual space, clusters represent a quasi-static acoustic goal associated with a particular articulatory configuration&#x02014;such as the sound that we might associate with a segment (e.g., [&#x00251;]) or with a critical feature (e.g., the silence of stop closure). Although it is possible to associate clusters with gestural or featural descriptions of the phonology, we stress that they are simply units of speech motor control. Clusters only exist at the level of the perceptual-motor map. They do not necessarily create meaning contrasts. They emerge from and remain embedded in a well-defined perceptual-motor context.</p>
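<p>The grouping of junctures into clusters by proximity in motor space can be illustrated with a short sketch. The text above does not specify a grouping procedure, so the sketch assumes one for illustration: junctures are linked whenever they lie within a fixed distance of each other, and a cluster is any set of junctures connected by such links. The function name and the <monospace>radius</monospace> parameter are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def group_junctures(junctures, radius):
    """Group junctures (points in motor space) into clusters.

    Assumption of this sketch: two junctures belong to the same cluster
    if they are connected by a chain of junctures, each within `radius`
    (Euclidean distance) of the next. This is single-linkage grouping,
    implemented with a small union-find structure.
    """
    parent = list(range(len(junctures)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(junctures)):
        for j in range(i + 1, len(junctures)):
            if math.dist(junctures[i], junctures[j]) <= radius:
                parent[find(i)] = find(j)

    clusters = {}
    for i, point in enumerate(junctures):
        clusters.setdefault(find(i), []).append(tuple(point))
    return list(clusters.values())
```
]]></preformat>
<p>With this assumed procedure, junctures created at nearby points, such as those where several babbled trajectories cross the same vowel region, fall into one cluster, while junctures in distant regions of motor space form separate clusters.</p>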
<p>Having introduced the Core concepts of perceptual and motor spaces, exemplars, silhouettes, the perceptual-motor map, junctures, and clusters, we are ready to describe the CC model. This model picks up after the first-word stage where the mathematical Core model leaves off.</p></sec></sec>
<sec id="s3">
<title>Architecture of the CC model</title>
<p>In Davis and Redford (<xref ref-type="bibr" rid="B8">2019</xref>), we modeled the first-word stage of spoken language development and its structuring effects on the perceptual-motor map. In this paper, we model word production at a later stage in development, a stage when the perceptual-motor map has already been structured with speech practice and so is already discretized into clusters. This new focus entails making explicit the relationship between wordform representations and the perceptual-motor map. This relationship is critical to the perceptual-motor integration of wordforms that is at the heart of speech production in the theory.</p>
<p>The silhouette and exemplar activate clusters in motor and perceptual space, respectively. In the CC model, sequential information is preserved by the silhouette with the time-varying activation of clusters in motor space.<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> By contrast, the exemplar activates all its clusters at the same time in perceptual space. The time-varying activation of clusters in motor space is consistent with the ecological&#x02013;dynamic hypothesis that phonological representations incorporate time-varying (i.e., dynamic) information (Fowler, <xref ref-type="bibr" rid="B15">1980</xref>; Browman and Goldstein, <xref ref-type="bibr" rid="B3">1986</xref>, <xref ref-type="bibr" rid="B2">1992</xref>). The simultaneous activation of clusters in perceptual space is consistent with the structural hypothesis that paradigmatic relations are more important than syntagmatic ones when acoustic-auditory categories serve as speech motor goals (Diehl and Lindblom, <xref ref-type="bibr" rid="B9">2004</xref>; Flemming, <xref ref-type="bibr" rid="B14">2004</xref>). Very importantly, the different activation patterns ensure unique motor and perceptual contributions to wordform integration. The silhouette-driven activation pattern highlights context-dependent constraints on articulation. The exemplar-driven activation pattern highlights the goal of attaining (more) context-independent sounds in articulation. The different activation patterns and their specific consequences are inspired by Lindblom&#x00027;s (<xref ref-type="bibr" rid="B27">1990</xref>) H&#x00026;H theory of production. Lindblom proposes that speakers have two modes of production, a hypo mode and a hyper mode, that serve as ends of a speaking style continuum. The hypo mode results in highly coarticulated speech. The hyper mode results in more context-independent attainment of acoustic goals. 
The CC model reflects these extreme modes in its different activation patterns of motor and perceptual space.<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref></p>
<p>The silhouette and exemplar are integrated with cluster activation. More specifically, the activation pattern across clusters in motor space and the activation pattern across clusters in perceptual space are combined and used to determine a trajectory through the perceptual-motor map that guides speech movement. Look-ahead and look-back windows specify the extent to which information about the combined activation pattern in the future and/or past is incorporated into the current activation pattern. At any given time, the integration process thus results in the differential activation of multiple clusters. As clusters represent perceptual-motor units that are both spatial targets and perceptual goals, the simultaneous activation of several of these means that articulation represents a compromise between competing targets/goals.</p>
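<p>The general logic of integration can be illustrated with a deliberately simplified sketch. It should not be read as the model&#x00027;s formal definition. The sketch assumes, for illustration only, that motor and perceptual activations are combined as a weighted sum, that the look-back and look-ahead windows average the combined activation over neighboring time steps, and that the output point at each time step is the activation-weighted mean of cluster centers. All names and parameters (<monospace>integrate</monospace>, <monospace>motor_weight</monospace>, <monospace>look_back</monospace>, <monospace>look_ahead</monospace>) are hypothetical.</p>
<preformat><![CDATA[
```python
def integrate(motor_act, perceptual_act, centers,
              look_back=1, look_ahead=1, motor_weight=0.5):
    """Compute an output trajectory from cluster activations.

    motor_act:      list over time steps; each entry lists one activation
                    per cluster (time-varying, driven by the silhouette).
    perceptual_act: one activation per cluster (the same at every time
                    step, driven by the exemplar).
    centers:        one representative point per cluster.
    Assumptions (ours, for illustration): weighted-sum combination,
    window averaging, and an activation-weighted mean as the output.
    """
    n_steps = len(motor_act)
    n_clusters = len(centers)
    dim = len(centers[0])

    # Combine the two activation patterns at every time step.
    combined = [
        [motor_weight * motor_act[t][k]
         + (1 - motor_weight) * perceptual_act[k]
         for k in range(n_clusters)]
        for t in range(n_steps)
    ]

    trajectory = []
    for t in range(n_steps):
        # Average the combined activation over the look-back/look-ahead window.
        start = max(0, t - look_back)
        stop = min(n_steps, t + look_ahead + 1)
        window = combined[start:stop]
        act = [sum(step[k] for step in window) / len(window)
               for k in range(n_clusters)]
        total = sum(act)
        if total == 0:
            raise ValueError("no cluster is active at time step %d" % t)
        # The output point is a compromise between all active clusters.
        point = tuple(
            sum(act[k] * centers[k][d] for k in range(n_clusters)) / total
            for d in range(dim)
        )
        trajectory.append(point)
    return trajectory
```
]]></preformat>
<p>In this sketch, widening the look-ahead window pulls the current output point toward upcoming targets, which is one simple way to picture articulation as a compromise between competing targets/goals.</p>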
<p>Overall, the CC model makes a claim about real-time speech motor planning and execution. Speech motor control is not modeled, but the planning process remains compatible with current models (e.g., Houde and Nagarajan, <xref ref-type="bibr" rid="B21">2011</xref>; Guenther, <xref ref-type="bibr" rid="B17">2016</xref>; Parrell et al., <xref ref-type="bibr" rid="B43">2019</xref>). In what follows, the production process from cluster activation to perceptual-motor integration to the computation of the (perceptual-)motor output trajectory is formally described. Readers interested in further detail are referred to the source code, which is available on GitHub (<ext-link ext-link-type="uri" xlink:href="https://github.com/mayaekd/core">https://github.com/mayaekd/core</ext-link>).</p>
<sec>
<title>Cluster activation</title>
<p>Let <italic>C</italic> be a word-sized concept. The speech plan for <italic>C</italic> is the activation pattern of clusters in the perceptual-motor map that results from the selection of the silhouette that corresponds to <italic>C</italic>, <sc>SIL</sc><sub><italic>C</italic></sub>, and an exemplar, <italic>e</italic><sub><italic>C</italic></sub>, chosen from among the set of exemplars associated with <italic>C</italic>. The perceptual-motor map itself contains many clusters: <sc>CLUSTER</sc><sub>1</sub>, <sc>CLUSTER</sc><sub>2</sub>, &#x02026;&#x000A0;, <sc>CLUSTER</sc><sub><italic>n</italic></sub>. Each of these is made up of some number of junctures; assume <sc>CLUSTER</sc><sub><italic>i</italic></sub> is made up of <italic>m</italic><sub><italic>i</italic></sub> junctures, <sc>JUNCTURE</sc><sub><italic>i</italic>,1</sub>, <sc>JUNCTURE</sc><sub><italic>i</italic>,2</sub>, &#x02026;&#x000A0;, <sc>JUNCTURE</sc><sub><italic>i</italic>,<italic>m</italic><sub><italic>i</italic></sub></sub>. The silhouette, <sc>SIL</sc><sub><italic>C</italic></sub>, activates clusters in motor space while the exemplar, <italic>e</italic><sub><italic>C</italic></sub>, activates clusters in perceptual space. For the reasons explained in the preceding section, the activation of clusters in motor space varies across time; the activation of clusters in perceptual space is simultaneous. The details of the activation patterns are as follows.</p>
<sec>
<title>Activation in motor space</title>
<p>First, the silhouette activates the region in motor space corresponding to the first time step of the time interval on which it is defined. At the next time step, it activates the next corresponding region. At the one after that, the next region is activated, and so on until the path through motor space associated with the entire silhouette has been traversed.</p>
<p>When a region in motor space is activated, the activation immediately spreads across junctures that are inside that region or within a certain distance of that region. Juncture activation spreads evenly within the bounds of each cluster. This means that clusters are activated as units within motor space. Clusters that are further away from the region that is highlighted by a silhouette at a particular time step will be less activated than those that are closer to the region or are in the region itself, as depicted in <xref ref-type="fig" rid="F3">Figure 3</xref>. More precisely, the motor activation at time <italic>t</italic> of <sc>CLUSTER</sc><sub><italic>i</italic></sub> is defined to be the average of the motor activation of every juncture in that cluster:</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>The activation process in motor space is shown. The region of the silhouette at a particular time step activates junctures that are overlapping with the region or less than a certain distance away from it (distances shown by purple lines, <bold>left</bold>). Activation spreads evenly within a cluster. Activation levels are determined by the distances of the junctures to the silhouette region. Activation strength of clusters is depicted by the relative transparency-opacity of the clusters <bold>(right)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0003.tif"/>
</fig>
<disp-formula id="E1"><mml:math id="M1"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">C</mml:mtext></mml:mstyle><mml:mtext>LUSTER</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" 
mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">J</mml:mtext></mml:mstyle><mml:mtext>UNCTURE</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where the motor activation of <sc>JUNCTURE</sc><sub><italic>i,j</italic></sub> is defined to be the highest when <sc>JUNCTURE</sc><sub><italic>i,j</italic></sub> is contained in <sc>SIL</sc><sub><italic>C</italic></sub>(<italic>t</italic>) and to fall off linearly as the distance between <sc>JUNCTURE</sc><sub><italic>i,j</italic></sub> and <sc>SIL</sc><sub><italic>C</italic></sub>(<italic>t</italic>) increases, bottoming out at zero:</p>
<disp-formula id="E2"><mml:math id="M2"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">J</mml:mtext></mml:mstyle><mml:mtext>UNCTURE</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">H</mml:mtext></mml:mstyle><mml:mtext>IGHEST</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">D</mml:mtext></mml:mstyle><mml:mtext>ROP</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">O</mml:mtext></mml:mstyle><mml:mtext>FF</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" 
mathvariant="normal">S</mml:mtext></mml:mstyle><mml:mtext>LOPE</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mo>&#x000D7;</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">D</mml:mtext></mml:mstyle><mml:mtext>ISTANCE</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mtext class="textsc" mathvariant="normal">SIL</mml:mtext></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">J</mml:mtext></mml:mstyle><mml:mtext>UNCTURE</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We generally set</p>
<disp-formula id="E3"><mml:math id="M3"><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">H</mml:mtext></mml:mstyle><mml:mtext>IGHEST</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></disp-formula>
<p>and</p>
<disp-formula id="E4"><mml:math id="M4"><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">D</mml:mtext></mml:mstyle><mml:mtext>ROP</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">O</mml:mtext></mml:mstyle><mml:mtext>FF</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">S</mml:mtext></mml:mstyle><mml:mtext>LOPE</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
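<p>As an illustration only, the two motor-activation definitions above can be sketched in a few lines of Python. The sketch is not taken from the model&#x00027;s source code. It assumes that motor space is Euclidean, that junctures are coordinate tuples, and that the silhouette region at a time step is a ball with a center and a radius (the function <italic>distance_to_region</italic> and this ball-shaped region are hypothetical conveniences). The clamp at zero implements the statement that activation bottoms out at zero.</p>
<preformat>
```python
import math

# Parameter values reported in the text as the usual settings.
HIGHEST_ACTIVATION_MOTOR = 1.0
DROP_OFF_SLOPE_MOTOR = 0.1


def distance_to_region(point, center, radius):
    """Distance from a point to a ball-shaped region; 0 if the point is inside.

    The ball is an assumption of this sketch, not the model's region shape.
    """
    return max(0.0, math.dist(point, center) - radius)


def juncture_motor_activation(juncture, center, radius):
    """Linear fall-off with distance from the region, clamped at zero."""
    d = distance_to_region(juncture, center, radius)
    return max(0.0, HIGHEST_ACTIVATION_MOTOR - DROP_OFF_SLOPE_MOTOR * d)


def cluster_motor_activation(junctures, center, radius):
    """Average of the motor activations of every juncture in one cluster."""
    values = [juncture_motor_activation(j, center, radius) for j in junctures]
    return sum(values) / len(values)
```
</preformat>
<p>Under these assumptions, a cluster with one juncture inside the region and one juncture five distance units outside it receives the average of 1 and 0.5, that is, a motor activation of 0.75 at that time step.</p>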
<p>Although we refer here to the motor activations of individual junctures, these should be thought of as an initial, theoretical state of the cluster that changes quickly once activation spreads within the cluster.</p></sec>
<sec>
<title>Activation in perceptual space</title>
<p>Although the exemplar is also a function on a time interval, its points activate nearby junctures in perceptual space all at once when the exemplar is selected. As with juncture activation in motor space, activation spreads outwards from points along the exemplar trajectory and decreases in strength with distance from the trajectory; here, however, the activation of each juncture is averaged across the points in the exemplar. Again, activation spreads so that all junctures within a particular cluster receive the same activation. For an exemplar consisting of points <italic>p</italic><sub>1</sub>, &#x02026;, <italic>p</italic><sub><italic>r</italic></sub>, and a cluster <sc>CLUSTER</sc><sub><italic>i</italic></sub> consisting of junctures {<sc>JUNCTURE</sc><sub><italic>i</italic>,1</sub>, &#x02026;, <sc>JUNCTURE</sc><sub><italic>i</italic>,<italic>m</italic><sub><italic>i</italic></sub></sub>}, we can write</p>
<disp-formula id="E5"><mml:math id="M5"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">E</mml:mtext></mml:mstyle><mml:mtext>XEMPLAR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">C</mml:mtext></mml:mstyle><mml:msub><mml:mtext class="textsc" mathvariant="normal">LUSTER</mml:mtext><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mi>m</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>m</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:munderover><mml:mtext>&#x000A0;</mml:mtext></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">E</mml:mtext></mml:mstyle><mml:mtext>XEMPLAR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">J</mml:mtext></mml:mstyle><mml:msub><mml:mtext>UNCTURE</mml:mtext><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where</p>
<disp-formula id="E6"><mml:math id="M6"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">E</mml:mtext></mml:mstyle><mml:mtext>XEMPLAR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">J</mml:mtext></mml:mstyle><mml:msub><mml:mtext>UNCTURE</mml:mtext><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>r</mml:mi></mml:mfrac><mml:munderover><mml:mstyle mathsize='140%' displaystyle='true'><mml:mo>&#x02211;</mml:mo></mml:mstyle><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>r</mml:mi></mml:munderover ></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">H</mml:mtext></mml:mstyle><mml:mtext>IGHEST</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" 
mathvariant="normal">P</mml:mtext></mml:mstyle><mml:mtext>ERCEPTUAL</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">D</mml:mtext></mml:mstyle><mml:mtext>ROP</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">O</mml:mtext></mml:mstyle><mml:mtext>FF</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">S</mml:mtext></mml:mstyle><mml:mtext>LOPE</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">P</mml:mtext></mml:mstyle><mml:mtext>ERCEPTUAL</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x000D7;</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">D</mml:mtext></mml:mstyle><mml:mtext>ISTANCE</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mstyle class="text" mathsize="15pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">J</mml:mtext></mml:mstyle><mml:msub><mml:mtext>UNCTURE</mml:mtext><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
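<p>The perceptual case can be sketched in the same way. The Python fragment below is a minimal illustration, not the authors&#x00027; implementation. It assumes Euclidean distance between an exemplar point and a juncture&#x00027;s perceptual location, and it clamps each term at zero by analogy with the motor case (the clamp is an assumption; it is not written out in the equation above). The default parameter values of 1 and 0.1 are placeholders that mirror the motor-case settings.</p>
<preformat>
```python
import math


def juncture_exemplar_activation(juncture, exemplar_points,
                                 highest=1.0, slope=0.1):
    """Average, over all exemplar points, of a linear fall-off with distance.

    Each term is clamped at zero (an assumption carried over from the
    motor case, where activation is said to bottom out at zero).
    """
    terms = [max(0.0, highest - slope * math.dist(p, juncture))
             for p in exemplar_points]
    return sum(terms) / len(terms)


def cluster_exemplar_activation(junctures, exemplar_points,
                                highest=1.0, slope=0.1):
    """Average of the exemplar activations of every juncture in one cluster."""
    values = [juncture_exemplar_activation(j, exemplar_points, highest, slope)
              for j in junctures]
    return sum(values) / len(values)
```
</preformat>
<p>Because every exemplar point contributes at once, the result carries no time index: a juncture that coincides with one of two exemplar points and lies ten distance units from the other receives an activation of 0.5 under these placeholder values.</p>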
<p>As in the motor case, we generally set</p>
<disp-formula id="E7"><mml:math id="M7"><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">H</mml:mtext></mml:mstyle><mml:mtext>IGHEST</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">P</mml:mtext></mml:mstyle><mml:mtext>ERCEPTUAL</mml:mtext><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></disp-formula>
<p>and</p>
<disp-formula id="E8"><mml:math id="M8"><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">D</mml:mtext></mml:mstyle><mml:mtext>ROP</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">O</mml:mtext></mml:mstyle><mml:mtext>FF</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">S</mml:mtext></mml:mstyle><mml:mtext>LOPE</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">P</mml:mtext></mml:mstyle><mml:mtext>ERCEPTUAL</mml:mtext><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></disp-formula></sec></sec>
<sec>
<title>Perceptual-motor integration</title>
<p>The silhouette and exemplar are integrated as follows to produce speech output. First, the combined activation pattern across the motor and perceptual spaces is computed. This pattern consists of activation that varies by time and by cluster, and is determined by the following equation for the activation at time <italic>t</italic> of cluster <sc>CLUSTER</sc><sub><italic>i</italic></sub>:</p>
<disp-formula id="E9"><mml:math id="M9"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:msub><mml:mtext>CTIVATION</mml:mtext><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">C</mml:mtext></mml:mstyle><mml:msub><mml:mtext>LUSTER</mml:mtext><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext></mml:mrow><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">C</mml:mtext></mml:mstyle><mml:mtext>LUSTER</mml:mtext></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mrow><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x000D7;</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">E</mml:mtext></mml:mstyle><mml:mtext>XEMPLAR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" 
mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">C</mml:mtext></mml:mstyle><mml:mtext>LUSTER</mml:mtext></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We take the geometric mean (multiplicative mean) of the two activations rather than the arithmetic mean (additive mean) in order to determine the combined activation of a cluster in a way that ensures the correct sequencing of articulatory movements. The geometric mean functions as an AND gate rather than as an OR gate on activation: if the activation of a cluster in either motor or perceptual space is zero, then the combined activation of that cluster is zero. Multiple clusters may compete to influence articulation, but competing clusters should all be within some limited distance of the region specified by the silhouette at that moment in time. If they are not, they should not influence articulation at all. Although the same constraint applies to both spaces, the constraint from motor space is more important. By ensuring that zero activation of a cluster in motor space cannot be overridden by some activation of the cluster in perceptual space, we ensure that activation from parts of the exemplar trajectory not relevant to the current time does not have an overwhelming influence on the output trajectory at that time.</p>
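<p>The gating behavior of the geometric mean can be checked with a short Python sketch. The function names are hypothetical and the activation values are made up for illustration; only the square root of the product comes from the equation above.</p>
<preformat>
```python
import math


def combined_activation(motor_activation, exemplar_activation):
    """Geometric mean of the motor and exemplar activations of one cluster."""
    return math.sqrt(motor_activation * exemplar_activation)


def arithmetic_mean(motor_activation, exemplar_activation):
    """Additive alternative, shown only for comparison."""
    return (motor_activation + exemplar_activation) / 2
```
</preformat>
<p>For a cluster with a motor activation of 0 and an exemplar activation of 0.8, the geometric mean is 0, so the cluster has no influence at that time step, whereas the arithmetic mean would be 0.4 and would let the perceptually activated cluster pull on the output trajectory.</p>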
<p>The activation values of the clusters vary over time. When activation is computed for a specific time <italic>t</italic>, this yields a set of values <italic>a</italic><sub><italic>i</italic></sub>(<italic>t</italic>), for <italic>i</italic> &#x0003D; 1, &#x02026;, <italic>n</italic>, where <italic>a</italic><sub><italic>i</italic></sub>(<italic>t</italic>) is the activation of <sc>CLUSTER</sc><sub><italic>i</italic></sub>. The CC model assumes that the motor system works out a compromise among the various clusters. In the model, the estimated outcome of this compromise at time <italic>t</italic> is computed as the weighted average of cluster locations in motor space, with the weights being the activations of the clusters at time <italic>t</italic>. That is, the estimated motor coordinate list, <sc>ESTMOTOR</sc>(<italic>t</italic>), is defined as:</p>
<disp-formula id="E10"><mml:math id="M10"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">E</mml:mtext></mml:mstyle><mml:mtext>ST</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">C</mml:mtext></mml:mstyle><mml:mtext>ENTER</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">C</mml:mtext></mml:mstyle><mml:mtext>LUSTER</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mstyle 
displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <sc>MOTORCENTER</sc>(<sc>CLUSTER</sc><sub><italic>i</italic></sub>) is the motoric center of <sc>CLUSTER</sc><sub><italic>i</italic></sub>, which could be defined in multiple ways, but which we choose to define as the average of all the junctures&#x00027; motor locations.</p>
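<p>A minimal Python sketch of this weighted average follows. It is illustrative rather than a transcription of the model&#x00027;s code: junctures and centers are assumed to be coordinate tuples, and the function names are hypothetical. The text does not say what happens when every cluster has zero activation; returning <italic>None</italic> in that case is an assumption of the sketch.</p>
<preformat>
```python
def motor_center(junctures):
    """Average of the motor locations of the junctures in one cluster."""
    dims = len(junctures[0])
    count = len(junctures)
    return tuple(sum(j[d] for j in junctures) / count for d in range(dims))


def est_motor(activations, centers):
    """Activation-weighted average of the cluster centers at one time step.

    Returns None when all activations are zero (an assumption of this
    sketch; that case is not defined in the text).
    """
    total = sum(activations)
    if total == 0:
        return None
    dims = len(centers[0])
    return tuple(
        sum(a * c[d] for a, c in zip(activations, centers)) / total
        for d in range(dims)
    )
```
</preformat>
<p>For example, two clusters with activations 1 and 3 and centers (0, 0) and (4, 8) yield an estimated motor location of (3, 6), three quarters of the way toward the more strongly activated cluster.</p>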
<p>When computed for each time step determined by the silhouette, the result of integration is an output trajectory through motor space that reflects the influences from perceptual space due to the exemplar. <xref ref-type="fig" rid="F4">Figure 4</xref> provides an example of the integration process over 11 time steps (<italic>t</italic> &#x0003D; 11). The combined motor and perceptual activation pattern is shown in motor space, where relative activation is depicted by the relative opacity of the clusters. The trajectory (whose direction is light green to light blue) moves through motor space over time, mainly within the path described by the silhouette. This silhouette path is shown by the region in motor space (the royal blue octagon) that is highlighted at each time step. The full output trajectory for the selected silhouette&#x02013;exemplar pair is shown at time step 11 in motor space. It is also shown in perceptual space along with the exemplar trajectory. It is represented as a discontinuous trajectory in perceptual space to illustrate that this space has a different topology than motor space and because true discontinuities exist in perceptual space but never in motor space.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>The perceptual-motor integration of wordforms results in an output trajectory through the perceptual and motor spaces, which are linked <italic>via</italic> clusters. Activation strength due to the silhouette and exemplar is depicted by the relative transparency/opacity of the clusters. These clusters are shown here in motor space for a silhouette that is 11 time steps in length. The region of the silhouette (blue octagon) is shown at each time step in this space. The resulting trajectory incorporates directional information (line shading from green to cyan blue). The trajectory is also shown in perceptual space, with the exemplar trajectory (dots shading from blue to pink). It is discontinuous in perceptual space because the topology of this space is different from that of motor space.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0004.tif"/>
</fig>
<p>Finally, recall that not every path through motor space is physically possible because the dimensions of this space are not (usually) independent of one another (e.g., the cross-sectional areas of 8 regions of the vocal tract from lips to larynx and the time derivatives of each of these cross-sectional areas). That said, the CC model assumes a perceptual-motor map that has been structured by experience. Under this assumption, a large number of paths exist between clusters. The path that the motor system chooses to follow is estimated as the activation-weighted combination of cluster locations. The output trajectory that results could be predicted internally or it could be the trace of movement that has happened. Either way, the output trajectory is a result of cluster activations that are commands to the motor system; it is not itself a control structure.</p></sec></sec>
<sec id="s4">
<title>Learning and change in production</title>
<p>In the Core/CC model framework, an activated exemplar represents the perceptual goal of speech production. The jointly-activated silhouette constrains goal achievement by biasing movement toward familiar paths through motor space. In first and second language acquisition, these familiar paths are likely to diverge very substantially from the perceptual goal. Over time, path divergence narrows and production accuracy improves. This happens in one of two ways: (1) <italic>via</italic> change in the structure of the perceptual-motor map; (2) <italic>via</italic> change in the shape of existing silhouettes. The Core model addressed the former type of learning; the CC model captures the latter.</p>
<sec>
<title>Practice-driven change</title>
<p>Recall that silhouettes are only established after the perceptual-motor map is at least partially structured through prelinguistic speech practice. First word production is based on the perceptual matching and selection process that was described under the Core Concepts section. This process gives rise to the first silhouettes. Once enough silhouettes have been established, speech production is fast and automatic because it is largely driven by silhouette&#x02013;exemplar pairs that are activated when concepts are selected for communication. The repository of concepts with associated silhouette&#x02013;exemplar pairs is the expressive vocabulary. It is about half the size of the speaker&#x00027;s overall vocabulary (Brysbaert et al., <xref ref-type="bibr" rid="B4">2016</xref>). The other half is the receptive-only vocabulary. It includes only concept-associated exemplars that the speaker may choose to target at some point.</p>
<p>Production that is guided by the expressive vocabulary will entrench structure at the level of the perceptual-motor map because it constrains production to established motor paths. Accordingly, it will also slow the rate at which speech production patterns change. Some deviation from established paths is possible with the expansion of a silhouette due to random noise.<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref> But, in general, the perceptual-motor integration of wordforms greatly reduces the exploration of new regions in motor space. Also, it is only with a return to a matching and selection process that new junctures and clusters can be generated (see Core Concepts). This means that practice-based changes to speech are initially more likely to occur at the level of wordform representation than at the level of the perceptual-motor map once an expressive vocabulary of a certain size is established. In the CC model, changes to the wordform occur because practice results in silhouettes with weighted regions. These weighted regions encode frequency information and shift the silhouette in the direction of frequently used output trajectories that meet with communicative success. The details of the weighting algorithm are as follows.</p>
<sec>
<title>Weighted silhouettes</title>
<p>Recall that the silhouette highlights time-varying regions of motor space. The highlighted region is computed as the convex hull of the points associated with previously experienced trajectories (see Davis and Redford, <xref ref-type="bibr" rid="B8">2019</xref>; Sections 2.5.2, 2.5.3). In the CC model, the convex hull is partitioned into simplices (<italic>n</italic>-dimensional &#x0201C;triangles&#x0201D;), each of which is assigned a weight. This means that, at each time, the highlighted region in motor space, returned by the function that is the silhouette, is a weighted homogenous simplicial complex. More specifically, let <sc>SIL</sc><sub><italic>C,n</italic></sub> be the silhouette for concept <italic>C</italic> at a particular time in development, denoted by <italic>n</italic>. Assume the current silhouette is <italic>T</italic> (relative) time units long, and let <italic>k</italic> be a sufficiently large number. Then <sc>SIL</sc><sub><italic>C,n</italic></sub> is defined to be a function with domain [0, <italic>T</italic>] that takes an input of a particular time and gives an output of the weighted region corresponding to that time in the form of a weighted simplicial complex. That is, <sc>SIL</sc><sub><italic>C,n</italic></sub>(<italic>t</italic>) &#x0003D; (<italic>R</italic><sub>1</sub>, &#x02026;, <italic>R</italic><sub><italic>k</italic></sub>, <italic>v</italic><sub>1</sub>, &#x02026;, <italic>v</italic><sub><italic>k</italic></sub>), where each <italic>R</italic><sub><italic>i</italic></sub> is a simplex, and <italic>v</italic><sub><italic>i</italic></sub> is the weight of that simplex, and the following are satisfied:</p>
<list list-type="order">
<list-item><p><inline-formula><mml:math id="M11"><mml:msubsup><mml:mrow><mml:mo>&#x022C3;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> is a homogenous simplicial complex, where <inline-formula><mml:math id="M12"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> is the simplicial complex consisting of <italic>R</italic><sub><italic>i</italic></sub> and all of its faces; and</p></list-item>
<list-item><p>The union of the simplices, <inline-formula><mml:math id="M13"><mml:msubsup><mml:mrow><mml:mo>&#x022C3;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, is convex.</p></list-item>
</list>
<p>As before, the silhouette is built recursively by expanding it over time to include motor trajectories that have been successfully used to communicate a selected concept (see <xref ref-type="fig" rid="F2">Figure 2</xref>). But now that the regions specified by a silhouette are weighted, a new motor trajectory will either add weight to the regions that it passes through (see Case 1) or affect the overall shape of the silhouette (see Case 2). The two cases are briefly described here.</p>
<p>Assume the speaker uses <sc>SIL</sc><sub><italic>C,n</italic></sub> to successfully communicate <italic>C</italic> using the motor trajectory <italic>M</italic>. Then the next iteration of the silhouette, <sc>SIL</sc><sub><italic>C,n</italic>&#x0002B;1</sub>, will be defined at time <italic>t</italic> in the following way:</p>
<p><bold>Case 1</bold>. If <italic>M</italic>(<italic>t</italic>) is a point that is already in one of the simplices in <sc>SIL</sc><sub><italic>C,n</italic></sub>(<italic>t</italic>), then <sc>SIL</sc><sub><italic>C,n</italic>&#x0002B;1</sub>(<italic>t</italic>) is the same as <sc>SIL</sc><sub><italic>C,n</italic></sub>(<italic>t</italic>) except with the weight of the simplex (subregion) containing <italic>M</italic>(<italic>t</italic>) increased by one. Similarly, if <italic>M</italic>(<italic>t</italic>) is contained in multiple simplices&#x02014;that is, if it lies on a shared boundary&#x02014;then <sc>SIL</sc><sub><italic>C,n</italic>&#x0002B;1</sub>(<italic>t</italic>) is the same as <sc>SIL</sc><sub><italic>C,n</italic></sub>(<italic>t</italic>) but with all the simplices containing <italic>M</italic>(<italic>t</italic>) having their weight increased by one. This case is illustrated in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Both the upper and lower diagrams show how regions of a silhouette are reweighted as motor traces are absorbed by the wordform. In a given row, the leftmost panel shows the initial weighted region, <sc>SIL</sc><sub><italic>C,n</italic></sub>(<italic>t</italic>); the middle panel shows the point, <italic>M</italic>(<italic>t</italic>), that will be added; the rightmost panel shows the resulting region, <sc>SIL</sc><sub><italic>C,n</italic>&#x0002B;1</sub>(<italic>t</italic>), with new weights. The numbers indicate the weights of the simplices. The upper diagram shows a simplicial 1-complex and the lower diagram shows a simplicial 2-complex.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0005.tif"/>
</fig>
<p><bold>Case 2</bold>. On the other hand, if <italic>M</italic>(<italic>t</italic>) is totally outside <sc>SIL</sc><sub><italic>C,n</italic></sub>(<italic>t</italic>), then <sc>SIL</sc><sub><italic>C,n</italic>&#x0002B;1</sub>(<italic>t</italic>) is created by adding a minimal number of simplices to <sc>SIL</sc><sub><italic>C,n</italic></sub>(<italic>t</italic>) to create a homogenous simplicial complex in which <italic>M</italic>(<italic>t</italic>) is now contained, with the weights of the new simplices being 1. Examples of this case are illustrated in <xref ref-type="fig" rid="F6">Figure 6</xref>.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Each row of panels shows how a silhouette region, <sc>SIL</sc><sub><italic>C,n</italic></sub>(<italic>t</italic>), changes with the inclusion of an additional point, <italic>M</italic>(<italic>t</italic>). In a given row, the leftmost panel shows the initial region, <sc>SIL</sc><sub><italic>C,n</italic></sub>(<italic>t</italic>), with the numbers indicating the weights; the middle panel shows the point to be added, <italic>M</italic>(<italic>t</italic>); the rightmost panel shows the new resulting shape and weighting of the region, <sc>SIL</sc><sub><italic>C,n</italic>&#x0002B;1</sub>(<italic>t</italic>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0006.tif"/>
</fig>
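<p>The two cases can be sketched in Python for the simplest setting, a simplicial 1-complex like the one in the upper diagram of Figure 5. This is a minimal illustration under stated assumptions, not the simulation code: the region at a fixed time <italic>t</italic> is represented as a list of intervals with integer weights, and the name <monospace>absorb_point</monospace> is hypothetical.</p>

```python
from typing import List, Tuple

# Assumed representation: a 1-D weighted region is a list of
# (low, high, weight) intervals (1-simplices) that share endpoints
# and together cover one convex interval.
Region = List[Tuple[float, float, int]]


def absorb_point(region: Region, m: float) -> Region:
    """Return the region at iteration n + 1 after absorbing the point m = M(t)."""
    low = min(a for a, _, _ in region)
    high = max(b for _, b, _ in region)
    if low > m:
        # Case 2: m lies below the region, so one new simplex of weight 1 is added.
        return [(m, low, 1)] + list(region)
    if m > high:
        # Case 2: m lies above the region.
        return list(region) + [(high, m, 1)]
    # Case 1: add one to every simplex containing m
    # (two simplices when m sits on a shared boundary).
    return [(a, b, w + 1) if m >= a and b >= m else (a, b, w)
            for a, b, w in region]


region = [(0.0, 1.0, 2), (1.0, 2.0, 1)]
inside = absorb_point(region, 0.5)    # Case 1, interior point
shared = absorb_point(region, 1.0)    # Case 1, shared boundary
outside = absorb_point(region, 3.0)   # Case 2, point outside the region
```

<p>In one dimension the minimal extension in Case 2 is a single interval from the nearest endpoint to the new point. In higher dimensions the extension requires re-triangulating the enlarged convex hull, which this sketch does not attempt.</p>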
<p>The integration of a weighted silhouette, <sc>SIL</sc><sub><italic>C</italic></sub>, and an exemplar, <italic>e</italic><sub><italic>C</italic></sub>, will be similar to the integration described in the previous section but must take into account the weighting. The only thing that changes is how we compute the motor activation of a juncture. Suppose <sc>SIL</sc><sub><italic>C</italic></sub>(<italic>t</italic>) &#x0003D; (<italic>R</italic><sub>1</sub>, &#x02026;, <italic>R</italic><sub><italic>k</italic></sub>, <italic>v</italic><sub>1</sub>, &#x02026;, <italic>v</italic><sub><italic>k</italic></sub>). Then we define the weighted motor activation of <sc>JUNCTURE</sc><sub><italic>i, j</italic></sub> to be the weighted average of the activations that come from each region:</p>
<disp-formula id="E11"><mml:math id="M14"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:msub><mml:mtext>CTIVATION</mml:mtext><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">J</mml:mtext></mml:mstyle><mml:msub><mml:mtext>UNCTURE</mml:mtext><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:msubsup><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mstyle></mml:mrow></mml:mfrac><mml:mo>&#x000D7;</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x000D7;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">H</mml:mtext></mml:mstyle><mml:mtext>IGHEST</mml:mtext><mml:mstyle class="text" 
mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">A</mml:mtext></mml:mstyle><mml:mtext>CTIVATION</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">D</mml:mtext></mml:mstyle><mml:mtext>ROP</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">O</mml:mtext></mml:mstyle><mml:mtext>FF</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">S</mml:mtext></mml:mstyle><mml:mtext>LOPE</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>OTOR</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x000D7;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">D</mml:mtext></mml:mstyle><mml:mtext>ISTANCE</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mstyle class="text" mathsize="12.5pt" 
mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">J</mml:mtext></mml:mstyle><mml:msub><mml:mtext>UNCTURE</mml:mtext><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></sec>
<sec>
<title>The effect of practice on accuracy</title>
<p>To examine the effect of practice on learning and change in the model, we can use the silhouette at iteration <italic>n</italic> to produce an output trajectory that is absorbed as a motor trace into the silhouette; the new silhouette is then used for production at iteration <italic>n</italic> &#x0002B; 1. When we do this repeatedly (= practice), learning occurs with changes to the silhouette. <xref ref-type="fig" rid="F7">Figure 7</xref> shows what this change looks like, step-by-step, in a 3-dimensional space. The space represents the topology of clusters in both motor and perceptual space since these were identical in the simulation to facilitate the visualization of silhouette movement toward the exemplar in perceptual-motor space.</p>
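<p>Each production in this loop relies on the weighted motor activation defined at the end of the previous section. The following Python sketch computes that weighted average for a single juncture. It assumes a 1-D motor space in which regions are intervals and a juncture is a point; the parameter values for the highest activation and the drop-off slope are placeholders chosen for illustration, and the function names are hypothetical.</p>

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def distance(region: Interval, juncture: float) -> float:
    """Distance from a 1-D region to a juncture; zero when the juncture is inside."""
    low, high = region
    return max(low - juncture, juncture - high, 0.0)


def weighted_motor_activation(regions: List[Interval], weights: List[float],
                              juncture: float, highest: float = 1.0,
                              slope: float = 0.5) -> float:
    """Weighted average over regions of (highest - slope * distance)."""
    total = sum(weights)
    weighted_sum = sum(
        v * (highest - slope * distance(r, juncture))
        for r, v in zip(regions, weights)
    )
    return weighted_sum / total


regions = [(0.0, 1.0), (1.0, 2.0)]
weights = [3, 1]
activation = weighted_motor_activation(regions, weights, juncture=0.5)
```

<p>With these placeholder values the juncture lies inside the heavily weighted region, so its activation (0.9375) is closer to the maximum than it would be with equal weights (0.875). The sketch follows the equation as written and does not clip negative activations.</p>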
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Silhouette change over time with each iteration of practice in a 3D perceptual-motor space where the perceptual and motor spaces have identical layouts. The upper-left panel shows the starting silhouette (blue triangular shapes), the exemplar trajectory (blue to pink dots), and the output trajectory (red to orange dots). Reading from left-to-right and then top-to-bottom, the figure illustrates how the silhouette changes in shape as it incorporates the output trajectory from each prior production.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0007.tif"/>
</fig>
<p>Imagine that the <italic>z</italic>-axis in <xref ref-type="fig" rid="F7">Figure 7</xref> represents a close&#x02013;open vocal tract dimension in motor space and the aperiodic&#x02013;periodic sound dimension in perceptual space, which do roughly correspond to one another. This would mean that activation of clusters near the <italic>x</italic>&#x02013;<italic>y</italic> plane would result in consonant-like articulations and that activation of clusters that are further above the <italic>x</italic>&#x02013;<italic>y</italic> plane would result in vowel-like articulations. The silhouette, exemplar, and output paths in <xref ref-type="fig" rid="F7">Figure 7</xref> all travel from clusters near the <italic>x</italic>&#x02013;<italic>y</italic> plane toward those furthest from this plane and then back again, a path that describes a CVC-shaped word. The upper-left panel shows a starting silhouette (blue triangular shapes) that might be an early representation of this word in that it is both far away from the exemplar trajectory (blue to pink dots) and is itself built up from only a few motor trajectories. With each of the 6 iterations of practice shown, the silhouette&#x00027;s path expands and changes shape: its weight gets distributed more toward the exemplar.</p>
<p>Practice-based changes to the silhouette mean that, with time, the output trajectory will draw nearer to those clusters that are especially activated by the exemplar. This effect of practice is more easily visualized in 2-dimensional space than in 3-dimensional space. <xref ref-type="fig" rid="F8">Figure 8</xref> therefore displays the results of a simulation in 2D space where, similar to <xref ref-type="fig" rid="F7">Figure 7</xref>, clusters are separated to model vowel- vs. consonant-like articulations and the motor and perceptual spaces have identical layouts. With this in mind, the exemplar trajectory shown in purple in the figure again describes a CVC trajectory. The silhouette in blue highlights a path that diverges from this trajectory. The output trajectory, which is linearly interpolated in red, is shown as a dotted line after the first time the exemplar and silhouette are integrated; it is shown as a dashed line after 50 iterations of the simulation and as a solid line after 200 iterations. Overall, the figure illustrates the expansion of the output trajectory in the direction of the larger exemplar trajectory with changes to the silhouette resulting from speech practice.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Change in output trajectory over time with iterations of practice in a 2D perceptual-motor space where the perceptual and motor spaces have identical layouts. <bold>(Upper)</bold> Silhouettes are shown as blue squares <bold>(left)</bold> or blue polygons of varying opacity <bold>(center and right)</bold> to indicate weighting; the exemplar trajectory is traced in purple; the output trajectory in red. <bold>(Lower)</bold> The output trajectory is depicted after 1 iteration (dotted line), 50 iterations (dashed line), and 200 iterations (solid line). Reading from <bold>(left-to-right)</bold>, the output trajectory is shown to change shape to better approximate the exemplar trajectory over time.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0008.tif"/>
</fig>
<p>Intriguingly, the simulation result shown in <xref ref-type="fig" rid="F8">Figure 8</xref> indicates a period of relatively rapid change in production followed by a longer period of very marginal change. This unanticipated result is qualitatively similar to well-described patterns of early gains followed by plateaus in the motor learning literature (Adams, <xref ref-type="bibr" rid="B1">1987</xref>; Newell et al., <xref ref-type="bibr" rid="B39">2001</xref>). It also suggests that unsupervised speech practice is unlikely to drive substantial changes to production after a certain point. This is probably a good thing. After all, the persistent effect of &#x0201C;accent&#x0201D; in highly proficient second language speakers would be hard to account for in the model if sheer practice were sufficient for a speaker to match exogenously derived exemplars. Still, the result also suggests that other mechanisms besides practice are needed to describe the steep and relatively prolonged increase in speech production accuracy that is observed during the first 3 years of childhood. One possibility, not modeled here, is that feedback from listeners shapes learning, especially in children&#x00027;s speech when utterances are too short to present much in the way of context for the listener. This possibility is already an assumption of the overarching theory. Recall that motor traces are only absorbed into the silhouette if communication is successful (Redford, <xref ref-type="bibr" rid="B49">2019</xref>). Another possibility is that the production process can be perturbed to facilitate learning in such a way that merits, say, a return to the (slow) matching and selection process. If the speaker returns to the process of finding best perceptual matches between established motor trajectories and novel exemplars, new junctures may be created where different established trajectories come near each other in motor space. 
The creation of new junctures may change the shape of existing clusters or establish new ones, thus changing the overall structure of the perceptual-motor map in the direction of new ambient language input. Alternatively, the speaker may focus on the acoustic-perceptual shape of the word, resulting in the up-weighting of contributions from the exemplar to overall cluster activation patterns during the integration process, with consequences for the shape of the output trajectory. The theory allows for all of these alternatives.</p></sec></sec>
<sec>
<title>Novel word production</title>
<p>Although it is necessary to account for changes to known word production in a developmentally sensitive theory of production, it is not sufficient. This is especially true under the assumption of whole-word production as this assumption begets the problem of novel word production. Since we hypothesize that the default production strategy is silhouette&#x02013;exemplar integration once an expressive vocabulary is established, the CC model adopts a silhouette-based approach to novel word production. Although the approach is motivated by the model architecture, it also allows us to capture an empirical finding from the literature on nonword repetition: the effect of vocabulary size on production accuracy in children&#x00027;s speech and in adult second language speech.</p>
<p>Not surprisingly, older children repeat nonwords more accurately than younger children and adults with more exposure to a second language repeat nonwords in the target language more accurately than those with less exposure. But accuracy also varies independently from age and experience with vocabulary size: children with smaller vocabularies repeat nonwords less accurately than children with larger vocabularies (e.g., Metsala, <xref ref-type="bibr" rid="B34">1999</xref>; Verhagen et al., <xref ref-type="bibr" rid="B60">2022</xref>); college-aged adults with smaller second language vocabularies produce less native-like renditions of nonwords than those with larger vocabularies (Bundgaard-Nielsen et al., <xref ref-type="bibr" rid="B5">2012</xref>). Importantly, it is a child&#x00027;s expressive vocabulary size that correlates with production accuracy; not their overall vocabulary size (Edwards et al., <xref ref-type="bibr" rid="B10">2004</xref>; Munson et al., <xref ref-type="bibr" rid="B35">2005</xref>). In addition to vocabulary size, the production accuracy of novel words, or nonwords, varies with properties of the given nonword, including its &#x0201C;wordlikeness&#x0201D; and the relative frequency of its phonological patterning (e.g., Edwards et al., <xref ref-type="bibr" rid="B10">2004</xref>; Guion et al., <xref ref-type="bibr" rid="B19">2004</xref>; Munson et al., <xref ref-type="bibr" rid="B35">2005</xref>; Redford and Oh, <xref ref-type="bibr" rid="B51">2016</xref>). In brief, nonwords that obey the phonotactics of the (target) language and/or contain high frequency phonotactic patterns are repeated more accurately than those that are less &#x0201C;wordlike&#x0201D; with respect to phonotactics and/or contain less frequent patterns. 
The latter findings suggest that nonword production relies on existing wordform representations (Edwards et al., <xref ref-type="bibr" rid="B10">2004</xref>; Guion et al., <xref ref-type="bibr" rid="B19">2004</xref>; Redford and Oh, <xref ref-type="bibr" rid="B51">2016</xref>).<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> The CC model implements this hypothesis. When there is no silhouette for a given word, the speaker leverages the silhouettes that do exist to generate an archi-silhouette, or an A-silhouette, to provide the time-varying information needed to guide production. The A-silhouette is built by pulling together silhouettes from the nearest phonological neighbors of the targeted novel word form. In the psycholinguistic literature, phonological neighbors are wordforms that differ from one another by one phoneme (Luce and Pisoni, <xref ref-type="bibr" rid="B28">1998</xref>). In the CC model, they are based on similarity in perceptual space, which is defined using the distance metric on that space. The algorithm for building an A-silhouette is described next.</p>
<sec>
<title>Building an A-silhouette</title>
<p>Recall that the CC model has a function that measures distances between points in perceptual space. Let <italic>d</italic><sub><sc>PERC</sc></sub> be a function that measures the distance between perceptual trajectories (see Davis and Redford, <xref ref-type="bibr" rid="B8">2019</xref>). The function operates by (1) aligning trajectories in perceptual space so their endpoints line up, using linear interpolation if necessary to fill in points, so that every point in one trajectory corresponds to one in the other, (2) finding the distances between corresponding points, and then (3) taking the average of these distances.</p>
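<p>The three steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in the model: trajectories are lists of coordinate tuples, alignment is done by resampling both trajectories to the length of the longer one on a normalized time axis, and the pointwise distance is Euclidean. The names <monospace>resample</monospace> and <monospace>d_perc</monospace> are hypothetical.</p>

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def resample(traj: List[Point], n: int) -> List[Point]:
    """Linearly interpolate a trajectory to n points, keeping its endpoints."""
    if len(traj) == 1:
        return [tuple(traj[0])] * n
    out = []
    for s in range(n):
        pos = s * (len(traj) - 1) / (n - 1)   # fractional index into traj
        i = min(int(pos), len(traj) - 2)      # left neighbour of pos
        frac = pos - i
        out.append(tuple(a + frac * (b - a)
                         for a, b in zip(traj[i], traj[i + 1])))
    return out


def d_perc(traj_a: List[Point], traj_b: List[Point]) -> float:
    """Average distance between corresponding points of two aligned trajectories."""
    n = max(len(traj_a), len(traj_b), 2)
    a, b = resample(traj_a, n), resample(traj_b, n)          # step 1: align
    dists = [math.dist(p, q) for p, q in zip(a, b)]          # step 2: pointwise
    return sum(dists) / n                                    # step 3: average


short = [(0.0, 0.0), (2.0, 0.0)]
longer = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
gap = d_perc(short, longer)
```

<p>In this example the two-point trajectory is filled in at its midpoint, every corresponding pair of points is one unit apart, and the resulting distance is 1.0.</p>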
<p>Now, suppose the speaker is attempting a new word <italic>W</italic> with exemplar <italic>E</italic>. Let <italic>k</italic> be a parameter with a fixed value representing the number of similar words from which to build an A-silhouette for <italic>W</italic>. For each word <italic>w</italic><sub><italic>i</italic></sub> (<italic>i</italic> &#x0003D; 1, 2, 3, &#x02026;&#x000A0;) in the expressive lexicon, let <italic>e</italic><sub><italic>i</italic></sub> be its corresponding exemplar and let <sc>SIL</sc><sub><italic>i</italic></sub> be its corresponding silhouette. Assume that the expressive words are already ordered by perceptual closeness to <italic>W</italic>, measured as the distance between exemplars; that is, <italic>d</italic><sub><sc>PERC</sc></sub>(<italic>e</italic><sub>1</sub>, <italic>E</italic>) &#x02264; <italic>d</italic><sub><sc>PERC</sc></sub>(<italic>e</italic><sub>2</sub>, <italic>E</italic>) &#x02264; <italic>d</italic><sub><sc>PERC</sc></sub>(<italic>e</italic><sub>3</sub>, <italic>E</italic>) &#x02264; &#x02026;. Then <italic>w</italic><sub>1</sub>, <italic>w</italic><sub>2</sub>, &#x02026;, <italic>w</italic><sub><italic>k</italic></sub> are the <italic>k</italic> perceptually closest words to <italic>W</italic> in the expressive lexicon, and their silhouettes, <sc>SIL</sc><sub>1</sub>,<sc>SIL</sc><sub>2</sub>, &#x02026;,<sc>SIL</sc><sub><italic>k</italic></sub>, are chosen to build the A-silhouette.</p>
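<p>The selection step amounts to a <italic>k</italic>-nearest-neighbor search over the expressive lexicon. The Python sketch below is illustrative only: the lexicon is assumed to map each word to an (exemplar, silhouette) pair, silhouettes are stand-in labels, and a simple mean absolute difference stands in for the perceptual trajectory distance. All names are hypothetical.</p>

```python
from typing import Callable, Dict, List, Tuple

Exemplar = List[float]


def mean_abs_diff(a: Exemplar, b: Exemplar) -> float:
    """Stand-in distance for equal-length 1-D trajectories (an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def nearest_silhouettes(lexicon: Dict[str, Tuple[Exemplar, str]],
                        target: Exemplar, k: int,
                        dist: Callable[[Exemplar, Exemplar], float]
                        ) -> List[Tuple[str, str]]:
    """Return (word, silhouette) pairs for the k words whose exemplars are closest to target."""
    ranked = sorted(lexicon.items(),
                    key=lambda item: dist(item[1][0], target))
    return [(word, sil) for word, (_, sil) in ranked[:k]]


lexicon = {
    "w_a": ([0.0, 1.0, 0.0], "SIL_a"),
    "w_b": ([0.0, 3.0, 0.0], "SIL_b"),
    "w_c": ([0.0, 1.5, 0.0], "SIL_c"),
}
chosen = nearest_silhouettes(lexicon, [0.0, 1.2, 0.0], k=2, dist=mean_abs_diff)
```

<p>Only words in the expressive lexicon carry silhouettes, so a larger expressive vocabulary gives the search more candidates near the target exemplar.</p>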
<p>We assume that the chosen silhouettes have already been modified so that they are aligned with each other in time. The A-silhouette is a silhouette <sc>ASIL</sc> such that at each time <italic>t</italic>, <sc>ASIL</sc> is defined as a combination of <sc>SIL</sc><sub><italic>i</italic></sub>(<italic>t</italic>) for <italic>i</italic> &#x0003D; 1, 2, &#x02026;, <italic>k</italic>. More specifically, fix <italic>t</italic> and let <sc>SIL</sc><sub><italic>i</italic></sub>(<italic>t</italic>) &#x0003D; (<italic>R</italic><sub><italic>i</italic>,1</sub>, <italic>R</italic><sub><italic>i</italic>,2</sub>, &#x02026;, <italic>R</italic><sub><italic>i</italic>,<italic>n</italic><sub><italic>i</italic></sub></sub>, <italic>v</italic><sub><italic>i</italic>,1</sub>, <italic>v</italic><sub><italic>i</italic>,2</sub>, &#x02026;, <italic>v</italic><sub><italic>i</italic>,<italic>n</italic><sub><italic>i</italic></sub></sub>) where <italic>R</italic><sub><italic>i</italic>,1</sub>, <italic>R</italic><sub><italic>i</italic>,2</sub>, &#x02026;, <italic>R</italic><sub><italic>i</italic>,<italic>n</italic><sub><italic>i</italic></sub></sub> are the <italic>n</italic><sub><italic>i</italic></sub> subregions making up <sc>SIL</sc><sub><italic>i</italic></sub>(<italic>t</italic>) and <italic>v</italic><sub><italic>i</italic>,1</sub>, <italic>v</italic><sub><italic>i</italic>,2</sub>, &#x02026;, <italic>v</italic><sub><italic>i</italic>,<italic>n</italic><sub><italic>i</italic></sub></sub> are their respective weights. The weights are scaled so that the maximum weight at time <italic>t</italic> is the same for each silhouette. That is, let <sc>MAXWEIGHT</sc><sub><italic>i</italic></sub> &#x0003D; max(<italic>v</italic><sub><italic>i, j</italic></sub>)<sub><italic>j</italic> &#x0003D; 1, 2, &#x02026;, <italic>n</italic><sub><italic>i</italic></sub></sub>, meaning <sc>MAXWEIGHT</sc><sub><italic>i</italic></sub> is the maximum weight of the regions in the <italic>i</italic>th silhouette (at time <italic>t</italic>). 
Then we use <inline-formula><mml:math id="M15"><mml:msubsup><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> to denote the scaled version of <italic>v</italic><sub><italic>i, j</italic></sub>, and we define <inline-formula><mml:math id="M16"><mml:msubsup><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:mo class="qopname">max</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>AX</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">W</mml:mtext></mml:mstyle><mml:mtext>EIGHT</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo class="qopname">&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" mathvariant="normal">M</mml:mtext></mml:mstyle><mml:mtext>AX</mml:mtext><mml:mstyle class="text" mathsize="12.5pt" mathcolor="black"><mml:mtext class="textsc" 
mathvariant="normal">W</mml:mtext></mml:mstyle><mml:mtext>EIGHT</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></inline-formula>. That is, for each region, we take the original weight, multiply it by the maximum weight of all the regions in all the silhouettes, and then divide that by the maximum weight of the regions in that silhouette. Finally, the regions from all the silhouettes at time <italic>t</italic> are combined using the new weights. The combination process is demonstrated first with an example. The general process is given afterwards.</p>
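<p>Before turning to that example, the scaling step on its own can be sketched in Python. The function name <monospace>scale_weights</monospace> is hypothetical, and the input weights are the ones used in the example that follows; the sketch covers only the rescaling, not the combination of overlapping regions.</p>

```python
from typing import List


def scale_weights(weights_by_silhouette: List[List[float]]) -> List[List[float]]:
    """Rescale each silhouette's weights so that all silhouettes share the same maximum."""
    # MAXWEIGHT_i: the largest weight within silhouette i (at the fixed time t).
    max_weights = [max(ws) for ws in weights_by_silhouette]
    # The largest weight across all k silhouettes.
    global_max = max(max_weights)
    return [[v * global_max / m for v in ws]
            for ws, m in zip(weights_by_silhouette, max_weights)]


# Weights for SIL_1, SIL_2, and SIL_3 at one time step.
scaled = scale_weights([[3, 4], [5, 8], [2, 1]])
```

<p>Here the per-silhouette maxima are 4, 8, and 2, so the scaled weights are (6, 8), (5, 8), and (8, 4). Each silhouette now peaks at 8, which keeps a frequently practiced word from dominating the combination simply because its raw counts are larger.</p>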
<p>Suppose we have 3 aligned silhouettes, <sc>SIL</sc><sub>1</sub>,<sc>SIL</sc><sub>2</sub>, <sc>SIL</sc><sub>3</sub>, and suppose that at time 2, each silhouette consists of two regions, <italic>R</italic><sub>1,1</sub> and <italic>R</italic><sub>1,2</sub>; <italic>R</italic><sub>2,1</sub> and <italic>R</italic><sub>2,2</sub>; and <italic>R</italic><sub>3,1</sub> and <italic>R</italic><sub>3,2</sub>, respectively, where they overlap as shown in <xref ref-type="fig" rid="F9">Figure 9</xref>. Suppose these regions have respective weights <italic>v</italic><sub>1,1</sub> &#x0003D; 3 and <italic>v</italic><sub>1,2</sub> &#x0003D; 4; <italic>v</italic><sub>2,1</sub> &#x0003D; 5 and <italic>v</italic><sub>2,2</sub> &#x0003D; 8; and <italic>v</italic><sub>3,1</sub> &#x0003D; 2 and <italic>v</italic><sub>3,2</sub> &#x0003D; 1. That is,</p>
<disp-formula id="E12"><mml:math id="M17"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext class="textsc" mathvariant="normal">SIL</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>4</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:mtext class="textrm" mathvariant="normal">where&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">&#x000A0;and&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">&#x000A0;are the pink triangles</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext class="textsc" mathvariant="normal">SIL</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>=</mml:mo><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mn>8</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:mtext class="textrm" mathvariant="normal">where&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">&#x000A0;and&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">&#x000A0;are the purple triangles</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext class="textsc" mathvariant="normal">SIL</mml:mtext></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:mtext class="textrm" 
mathvariant="normal">where&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">&#x000A0;and&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">&#x000A0;are the blue triangles</mml:mtext></mml:mtd></mml:mtr><mml:mtr></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Then scaling the weights as described above yields a maximum weight of 8 within each silhouette; that is,</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p><bold>(Left)</bold> Regions corresponding to three motor silhouettes at a particular time; each silhouette at this time has two subregions, with weights labeled. <bold>(Center)</bold> The regions with scaled weights. <bold>(Right)</bold> The weighted combination of the regions.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0009.tif"/>
</fig>
<disp-formula id="E13"><mml:math id="M18"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>6</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>8</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>8</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>8</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>4</mml:mn><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
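<p>The scaling step can be checked with a few lines of code. The following is a minimal sketch, not part of the model implementation; the function name <italic>scale_weights</italic> and the list-of-lists layout are illustrative assumptions. It applies the rule stated above (original weight, times the maximum weight over all silhouettes, divided by the maximum weight within the region's own silhouette) to the weights of the three example silhouettes.</p>

```python
def scale_weights(silhouette_weights):
    """Rescale weights so that every silhouette's largest weight equals
    the largest weight found in any silhouette.

    silhouette_weights: one list of region weights per silhouette
    (hypothetical layout chosen for this illustration).
    """
    global_max = max(max(ws) for ws in silhouette_weights)
    return [[w * global_max / max(ws) for w in ws] for ws in silhouette_weights]


# Weights of SIL_1(2), SIL_2(2), SIL_3(2) from the worked example.
weights = [[3, 4], [5, 8], [2, 1]]
scaled = scale_weights(weights)
print(scaled)  # [[6.0, 8.0], [5.0, 8.0], [8.0, 4.0]]
```

<p>After scaling, the largest weight in each silhouette is 8, so no silhouette dominates the combination simply because its raw weights were larger.</p>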
<p>Then we will define the combination of these regions, <sc>ASIL</sc>(2), to be the weighted region shown in red. That is, <sc>ASIL</sc>(2) &#x0003D; (<italic>T</italic><sub>1</sub>, <italic>T</italic><sub>2</sub>, <italic>T</italic><sub>3</sub>, <italic>T</italic><sub>4</sub>, <italic>T</italic><sub>5</sub>, <italic>T</italic><sub>6</sub>, <italic>T</italic><sub>7</sub>, <italic>T</italic><sub>8</sub>, <italic>T</italic><sub>9</sub>, <italic>T</italic><sub>10</sub>, <italic>T</italic><sub>11</sub>, <italic>T</italic><sub>12</sub>, <italic>T</italic><sub>13</sub>, <italic>T</italic><sub>14</sub>, <italic>T</italic><sub>15</sub>, 8, 4, 1, 10, 20, 10, 9, 5, 6, 16, 16, 8, 1, 8, 1), where <italic>T</italic><sub><italic>i</italic></sub> are the red triangles shown in <xref ref-type="fig" rid="F9">Figure 9</xref>.</p>
<p>Returning to the general case where the selected silhouettes are <sc>SIL</sc><sub>1</sub>, <sc>SIL</sc><sub>2</sub>, &#x02026;, <sc>SIL</sc><sub><italic>k</italic></sub>, we define <sc>ASIL</sc>(<italic>t</italic>) &#x0003D; (<italic>T</italic><sub>1</sub>, <italic>T</italic><sub>2</sub>, &#x02026;, <italic>T</italic><sub><italic>n</italic></sub>, <italic>v</italic><sub>1</sub>, <italic>v</italic><sub>2</sub>, &#x02026;, <italic>v</italic><sub><italic>n</italic></sub>) where <italic>T</italic><sub>1</sub>, <italic>T</italic><sub>2</sub>, &#x02026;, <italic>T</italic><sub><italic>n</italic></sub> is a triangulation of the convex hull of all the regions making up all the <sc>SIL</sc><sub><italic>i</italic></sub>(<italic>t</italic>). For each <italic>i</italic>, the weight <italic>v</italic><sub><italic>i</italic></sub> of the region <italic>T</italic><sub><italic>i</italic></sub> is defined as follows: either (1) <italic>v</italic><sub><italic>i</italic></sub> is equal to the sum of the scaled weights of all the original regions that <italic>T</italic><sub><italic>i</italic></sub> lies inside, or (2) <italic>v</italic><sub><italic>i</italic></sub> &#x0003D; 1 if <italic>T</italic><sub><italic>i</italic></sub> lies in none of the original regions but is still part of the convex hull.</p>
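<p>The weighting rule can be illustrated in one dimension, where regions are intervals and the convex hull is the span from the smallest to the largest endpoint. This is a simplified sketch under stated assumptions, not the triangulation used in the model: the function name <italic>combine_intervals</italic>, the interval endpoints, and the reduction to 1D are all hypothetical. Each cell between consecutive endpoints receives the sum of the scaled weights of the intervals that contain it, or a weight of 1 if no interval contains it.</p>

```python
def combine_intervals(regions):
    """Combine weighted 1D regions into non-overlapping weighted cells.

    regions: list of (lo, hi, scaled_weight) tuples with positive weights.
    Returns a list of (lo, hi, weight) cells covering the hull [min lo, max hi].
    """
    # Every region endpoint is a cell boundary, so each cell lies either
    # entirely inside or entirely outside any given region.
    points = sorted({p for lo, hi, _ in regions for p in (lo, hi)})
    cells = []
    for a, b in zip(points, points[1:]):
        total = sum(w for lo, hi, w in regions if a >= lo and hi >= b)
        # Cells inside the hull but outside every region get weight 1.
        cells.append((a, b, total if total > 0 else 1))
    return cells


# Hypothetical intervals with already-scaled weights.
cells = combine_intervals([(0, 2, 6), (1, 3, 8), (5, 6, 4)])
print(cells)  # [(0, 1, 6), (1, 2, 14), (2, 3, 8), (3, 5, 1), (5, 6, 4)]
```

<p>The cell from 1 to 2 lies inside two intervals and so carries the summed weight 14, whereas the gap from 3 to 5 lies inside the hull but inside no interval and so carries the default weight 1.</p>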
<p>That is, <sc>ASIL</sc>(<italic>t</italic>) &#x0003D; (<italic>T</italic><sub>1</sub>, <italic>T</italic><sub>2</sub>, &#x02026;, <italic>T</italic><sub><italic>n</italic></sub>, <italic>v</italic><sub>1</sub>, <italic>v</italic><sub>2</sub>, &#x02026;, <italic>v</italic><sub><italic>n</italic></sub>) such that</p>
<list list-type="order">
<list-item><p><italic>T</italic><sub>1</sub>&#x0222A;<italic>T</italic><sub>2</sub>&#x0222A;&#x022EF;&#x0222A;<italic>T</italic><sub><italic>n</italic></sub> &#x0003D; <sc>CONVHULL</sc>(<italic>R</italic><sub>1,1</sub>, <italic>R</italic><sub>1,2</sub>, &#x02026;, <italic>R</italic><sub>1,<italic>n</italic><sub>1</sub></sub>, <italic>R</italic><sub>2,1</sub>, <italic>R</italic><sub>2,2</sub>, &#x02026;, <italic>R</italic><sub>2,<italic>n</italic><sub>2</sub></sub>, &#x02026;, <italic>R</italic><sub><italic>k</italic>, 1</sub>, <italic>R</italic><sub><italic>k</italic>, 2</sub>, &#x02026;, <italic>R</italic><sub><italic>k</italic>,<italic>n</italic><sub><italic>k</italic></sub></sub>)</p></list-item>
<list-item><p>Each <italic>T</italic><sub><italic>i</italic></sub> is a simplex (an &#x0201C;<italic>n</italic>-dimensional triangle&#x0201D;)</p></list-item>
<list-item><p>The regions do not overlap each other more than at a boundary: interior(<italic>T</italic><sub><italic>i</italic></sub>)&#x02229;interior(<italic>T</italic><sub><italic>j</italic></sub>) &#x0003D; &#x02205; for all 1 &#x02264; <italic>i</italic> &#x0003C; <italic>j</italic> &#x02264; <italic>n</italic></p></list-item>
<list-item><p>For every set <italic>A</italic> &#x0003D; {<italic>R</italic><sub><italic>i</italic><sub>1</sub>, <italic>j</italic><sub>1</sub></sub>, &#x02026;, <italic>R</italic><sub><italic>i</italic><sub><italic>m</italic></sub>, <italic>j</italic><sub><italic>m</italic></sub></sub>}, either &#x022C2;<sub><italic>a</italic>&#x02208;<italic>A</italic></sub><italic>a</italic> &#x0003D; &#x02205; or &#x022C2;<sub><italic>a</italic>&#x02208;<italic>A</italic></sub><italic>a</italic> &#x0003D; <italic>T</italic><sub><italic>k</italic><sub>1</sub></sub>&#x0222A;<italic>T</italic><sub><italic>k</italic><sub>2</sub></sub>&#x0222A;&#x022EF;&#x0222A;<italic>T</italic><sub><italic>k</italic><sub><italic>s</italic></sub></sub> for some <italic>k</italic><sub>1</sub>, <italic>k</italic><sub>2</sub>, &#x02026;, <italic>k</italic><sub><italic>s</italic></sub> &#x02208; {1, 2, &#x02026;, <italic>n</italic>}</p></list-item>
<list-item><p><inline-formula><mml:math id="M19"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>&#x02113;</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;containing&#x000A0;</mml:mtext><mml:msub><mml:mi>T</mml:mi><mml:mi>&#x02113;</mml:mi></mml:msub></mml:mrow></mml:munder><mml:mrow><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>&#x02032;</mml:mo></mml:msubsup></mml:mrow></mml:mstyle></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mtext>if&#x000A0;at&#x000A0;least&#x000A0;one&#x000A0;</mml:mtext><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;contains&#x000A0;</mml:mtext><mml:msub><mml:mi>T</mml:mi><mml:mi>&#x02113;</mml:mi></mml:msub><mml:mtext>,&#x000A0;i.e.,&#x000A0;if&#x000A0;this&#x000A0;sum&#x000A0;is&#x000A0;nonzero</mml:mtext></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mtext>otherwise</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula></p></list-item>
</list></sec>
<sec>
<title>The effect of vocabulary size on accuracy</title>
<p>According to the process outlined above, exemplars of words that belong only to the receptive vocabulary are attempted by combining the silhouettes of perceptually similar words that belong to the expressive vocabulary. But how good is this combined form? To what extent will it allow for a path through motor space that overlaps with the clusters activated by the novel exemplar in perceptual space? In this section, we demonstrate that the answer to these questions depends on the size of the expressive vocabulary. More specifically, we show that the goodness of the A-silhouette depends on the goodness of the perceptual matches to the novel wordform. The goodness of the perceptual matches in turn depends on the size of the speaker&#x00027;s expressive vocabulary, <italic>V</italic>, in relation to the larger vocabulary, <italic>L</italic>.</p>
<p>The larger vocabulary, <italic>L</italic>, is a theoretical construct that represents the set of words in a language over which the phonology is defined. The size of <italic>L</italic> depends on what exactly it represents. <italic>L</italic> could represent the size of a dictionary vocabulary or the size of an adult&#x00027;s overall vocabulary (10,000 words to 200,000 words) or the expressive vocabulary only, that is, half of the overall vocabulary size (Brysbaert et al., <xref ref-type="bibr" rid="B4">2016</xref>). Alternatively, <italic>L</italic> could represent the total number of words required for normal everyday communication. We estimate that number here as 2,500 words. This number is based on Nation and Waring&#x00027;s (<xref ref-type="bibr" rid="B38">1997</xref>) synthesis of research findings on the relationship between vocabulary size and second language acquisition for pedagogical purposes. Nation and Waring suggest that &#x0201C;a vocabulary size of 2,000&#x02013;3,000 words provides a very good basis for language use.&#x0201D; This suggestion is based on the vocabulary size needed to achieve over 90% coverage of English texts aimed at young adult readers (e.g., 2,600 words result in 96% text coverage and a density of 1 unknown word occurring every 25 words). Insofar as young adults are perfectly good speakers of their native language, a vocabulary of roughly 2,500 wordforms should adequately cover the phonological space of a language. It therefore provides a good basis for <italic>L</italic>.</p>
<p>Given that the words in <italic>L</italic> describe the phonological space for a particular language, it is clear that a subset <italic>V</italic> of <italic>L</italic> may fail to do so. And, if it fails to do so, then the A-silhouettes that are built up from wordforms in <italic>V</italic> are unlikely to reliably provide accurate information regarding the best path to take through motor space in order to approximate an exemplar that represents a novel word target. In particular, suppose <italic>W</italic> is the novel word, and suppose the A-silhouette is going to be built from the <italic>k</italic> words in <italic>V</italic> that are perceptually closest to <italic>W</italic>. What is the probability that these <italic>k</italic> words from <italic>V</italic> are actually some of the closest words to <italic>W</italic> in all of <italic>L</italic>? To make it more concrete, let <italic>k</italic> &#x0003D; 3 and let &#x0201C;best&#x0201D; be a synonym for &#x0201C;perceptually closest to <italic>W</italic>.&#x0201D; We can ask:</p>
<list list-type="bullet">
<list-item><p>What is the probability that the 3 best words in <italic>L</italic> are contained in <italic>V</italic> (and thus are also the 3 best words in <italic>V</italic>)?</p></list-item>
<list-item><p>What is the probability that 3 of the 4 best words in <italic>L</italic> are contained in <italic>V</italic>?</p></list-item>
<list-item><p>What is the probability that 3 of the 5 best words in <italic>L</italic> are contained in <italic>V</italic>?</p></list-item>
</list>
<p>More generally:</p>
<list list-type="bullet">
<list-item><p>What is the probability that 3 of the 3 &#x0002B; <italic>r</italic> best words in <italic>L</italic> are contained in <italic>V</italic>?</p></list-item>
</list>
<p>And even more generally:</p>
<list list-type="bullet">
<list-item><p>What is the probability that <italic>k</italic> of the <italic>k</italic> &#x0002B; <italic>r</italic> best words in <italic>L</italic> are contained in <italic>V</italic>?</p></list-item>
</list>
<p>Naturally, this probability increases as the size of <italic>V</italic> increases. In particular, if <italic>n</italic> is the number of words in <italic>L</italic> and <italic>m</italic> is the number of words in <italic>V</italic>, the probability that at least <italic>k</italic> of the <italic>k</italic> &#x0002B; <italic>r</italic> best words in <italic>L</italic> are contained in <italic>V</italic>, i.e., that the <italic>k</italic> best words in <italic>V</italic> are a subset of the <italic>k</italic> &#x0002B; <italic>r</italic> best words in all of <italic>L</italic>, is:</p>
<disp-formula id="E14"><mml:math id="M20"><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mfrac><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>r</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>!</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>-</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>!</mml:mo></mml:mrow></mml:mfrac><mml:mo>&#x000D7;</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mi>r</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>-</mml:mo><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>!</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mi>r</mml:mi><mml:mo>-</mml:mo><mml:mi>m</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>!</mml:mo></mml:mrow></mml:mfrac><mml:mo>&#x000D7;</mml:mo><mml:mfrac><mml:mrow><mml:mi>m</mml:mi><mml:mo>!</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>!</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>!</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>(assuming <italic>k</italic> &#x02264; <italic>m</italic> and <italic>k</italic> &#x0002B; <italic>r</italic> &#x02264; <italic>n</italic>). This is illustrated in <xref ref-type="fig" rid="F10">Figure 10</xref> for an <italic>L</italic> of size 2,500, and various values of <italic>k</italic> and <italic>p</italic> (&#x0003D; <italic>k</italic> &#x0002B; <italic>r</italic>). As the size of <italic>V</italic> increases (as we move right on the <italic>x</italic>-axis), the probability that the speaker&#x00027;s expressive vocabulary includes enough of the larger vocabulary&#x00027;s perceptually closest words to <italic>W</italic> also increases. This increase differs somewhat depending on the value of <italic>k</italic>, which, recall, is the number of words that are chosen to create the A-silhouette, and the value of <italic>p</italic> (&#x0003D; <italic>k</italic> &#x0002B; <italic>r</italic>), which is the number of words in <italic>L</italic> that are perceptually &#x0201C;close enough&#x0201D; to the novel word that any subset of <italic>k</italic> of those words could be used to create a very good A-silhouette for guiding production. The data here suggest that if <italic>V</italic> contains 500 words, which is approximately the size of a typically-developing 3-year-old&#x00027;s expressive vocabulary (Shipley and McAfee, <xref ref-type="bibr" rid="B55">2019</xref>),<xref ref-type="fn" rid="fn0005"><sup>5</sup></xref> then it has a good chance (about 70%) of containing at least 1 of the 5 best words in <italic>L</italic>, but a poor chance (about 15%) of containing at least 3 of the 7 best words. This observation raises the question of how many closest perceptual wordforms are needed to generate an A-silhouette that will yield a good approximation of the novel word target. 
The data in the figure suggest that if in general any 3 of the closest 6 words to a goal word will yield a good A-silhouette, then good A-silhouettes can be reliably generated when <italic>V</italic> is 70% of <italic>L</italic>, or 1,750 words.</p>
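<p>The probability above is a hypergeometric tail and can be evaluated directly. The following is a minimal sketch, assuming that <italic>V</italic> is a uniformly random subset of <italic>L</italic>; the function name <italic>prob_at_least_k</italic> is hypothetical. Writing the number of best words that fall in <italic>V</italic> as <italic>k</italic> &#x0002B; <italic>i</italic>, the sum over that count is a re-indexing of the sum over <italic>i</italic> in the formula, with <italic>p</italic> &#x0003D; <italic>k</italic> &#x0002B; <italic>r</italic>.</p>

```python
from math import comb


def prob_at_least_k(n, m, k, p):
    """Probability that a random m-word subset V of an n-word vocabulary L
    contains at least k of the p words in L that are closest to the target.

    Assumes p is no larger than n and that V is drawn uniformly at random.
    """
    total = 0
    # hits = k + i in the notation of the text; hits cannot exceed p or m.
    for hits in range(k, min(p, m) + 1):
        total += comb(p, hits) * comb(n - p, m - hits)
    return total / comb(n, m)


# Cases discussed in the text, with L of size 2,500.
print(round(prob_at_least_k(2500, 500, 1, 5), 2))
print(round(prob_at_least_k(2500, 500, 3, 7), 2))
print(round(prob_at_least_k(2500, 1750, 3, 6), 2))
```

<p>The three printed values correspond to the cases described above: at least 1 of the 5 best words and at least 3 of the 7 best words for a 500-word vocabulary, and at least 3 of the 6 best words for a 1,750-word vocabulary.</p>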
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>The probability that an expressive vocabulary of size <italic>m</italic>, drawn randomly from a larger vocabulary size of 2,500 words, contains at least <italic>k</italic> of the <italic>p</italic> words in <italic>L</italic> that are the closest perceptual matches to an exemplar that represents some novel word.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0010.tif"/>
</fig>
<p>The predicted effect of an A-silhouette that is built up from a subset of &#x0201C;close enough&#x0201D; silhouettes is an output trajectory that approximates the exemplar of the novel word that is being attempted. Poorer A-silhouettes result in less accurate output trajectories. To test this prediction, and so the effect of vocabulary size on production accuracy, we simulated novel CVCV word production given different expressive vocabulary sizes and an all-CVCV language of 1,296 words. The language was built up from paths through a 2D motor space and a 2D perceptual space. The spaces had 6 clusters deemed consonantal articulations and 6 clusters deemed vocalic articulations. These groups of 6 were separated from one another in the <italic>y</italic> direction in motor space. The transformation from motor space to perceptual space was one that maintained this consonant-vowel separation, but shuffled the clusters in the <italic>x</italic> direction to render different topologies for the two spaces.<xref ref-type="fn" rid="fn0006"><sup>6</sup></xref> The 1,296 wordforms were all the possible paths going from center-of-cluster to center-of-cluster in a CVCV-like pattern (1,296 &#x0003D; 6 consonants &#x000D7; 6 vowels &#x000D7; 6 consonants &#x000D7; 6 vowels). The silhouettes consisted of 7 uniformly-weighted square regions, with regions 1, 3, 5, 7 centered on the appropriate CVCV clusters, and regions 2, 4, 6 falling evenly between them. The exemplar paired with a silhouette was built by taking the motor trajectory going through the center of the silhouette and finding the corresponding perceptual trajectory based on the transformation between the spaces.</p>
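<p>The construction of the toy language can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code itself: the cluster coordinates, the permutation <italic>PERM</italic> used to shuffle the <italic>x</italic> positions, and all function names are hypothetical, since the text specifies only that consonant and vowel clusters are separated in <italic>y</italic> and shuffled in <italic>x</italic> between the two spaces.</p>

```python
from itertools import product

N = 6
# Hypothetical layout: consonant clusters on y = 0, vowel clusters on y = 1.
motor_c = {f"C{i}": (float(i), 0.0) for i in range(N)}
motor_v = {f"V{i}": (float(i), 1.0) for i in range(N)}
# Hypothetical shuffle of x positions between motor and perceptual space.
PERM = [3, 0, 4, 1, 5, 2]


def to_perceptual(point):
    """Map a motor cluster center to perceptual space: keep y, permute x."""
    x, y = point
    return (float(PERM[int(x)]), y)


def motor_path(word):
    """Center-of-cluster path through motor space for a CVCV word."""
    c1, v1, c2, v2 = word
    return [motor_c[c1], motor_v[v1], motor_c[c2], motor_v[v2]]


def perceptual_path(word):
    """Corresponding path through perceptual space."""
    return [to_perceptual(p) for p in motor_path(word)]


# All CVCV combinations: 6 x 6 x 6 x 6 wordforms.
language = list(product(motor_c, motor_v, motor_c, motor_v))
print(len(language))  # 1296
print(perceptual_path(("C0", "V1", "C2", "V3")))
# [(3.0, 0.0), (0.0, 1.0), (4.0, 0.0), (1.0, 1.0)]
```

<p>Because the permutation changes which clusters are neighbors in <italic>x</italic>, two words that are close in motor space need not be close in perceptual space, which is the property the shuffled transformation is meant to provide.</p>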
<p>In the simulation, the novel word was an exemplar randomly selected from the language. The initial expressive vocabulary consisted of 5 silhouette&#x02013;exemplar pairs randomly selected from the 1,296-word language (minus the novel word). An A-silhouette was built from the 3 words in the expressive vocabulary that were perceptually closest to the novel word. An output trajectory was computed based on the integration of the A-silhouette and the novel word exemplar. The distance in perceptual space between the output trajectory and the novel word exemplar trajectory was calculated to measure the accuracy of the output trajectory. The initial vocabulary was then increased to 10 words by adding an additional 5 random CVCV words to the expressive vocabulary. A new A-silhouette was made, again using the 3 closest words, an output trajectory computed, and the distance in space from the exemplar calculated. The expressive vocabulary was next increased to 20, then 40, and so on for a range of sizes up to 1,200. For each vocabulary size, the output trajectory based on A-silhouette&#x02013;exemplar integration was found and the distance from the novel word exemplar calculated.</p>
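<p>The selection step of the simulation, choosing the 3 words in a growing expressive vocabulary that are perceptually closest to the novel word, can be sketched as below. Several details are assumptions made for illustration only: the distance between two words is taken to be the summed Euclidean distance between corresponding points of their perceptual paths, the permutation <italic>PERM</italic> and the random seed are arbitrary, and only the first few vocabulary sizes named in the text are used. The integration of the A-silhouette with the exemplar is not reproduced here.</p>

```python
import math
import random
from itertools import product

PERM = [3, 0, 4, 1, 5, 2]  # hypothetical x-shuffle for perceptual space


def perceptual_path(word):
    """word = (c1, v1, c2, v2) as cluster indices 0..5.
    Consonants sit at y = 0 and vowels at y = 1 (assumed layout)."""
    return [(float(PERM[idx]), float(pos % 2)) for pos, idx in enumerate(word)]


def distance(word_a, word_b):
    """Assumed metric: summed point-wise Euclidean distance between paths."""
    pairs = zip(perceptual_path(word_a), perceptual_path(word_b))
    return sum(math.dist(p, q) for p, q in pairs)


def closest_words(novel, vocabulary, k=3):
    """Return the k vocabulary words perceptually closest to the novel word."""
    return sorted(vocabulary, key=lambda w: distance(novel, w))[:k]


rng = random.Random(0)
language = list(product(range(6), repeat=4))
novel = rng.choice(language)
pool = [w for w in language if w != novel]  # the novel word is excluded
rng.shuffle(pool)

nearest_distances = []
for size in (5, 10, 20, 40):
    vocabulary = pool[:size]  # each vocabulary extends the previous one
    best = closest_words(novel, vocabulary)
    nearest_distances.append(distance(novel, best[0]))
    print(size, [round(distance(novel, w), 2) for w in best])
```

<p>Because each vocabulary contains the previous one, the distance from the novel word to its closest match can only stay the same or shrink as the vocabulary grows, which is the property that drives the accuracy gains examined in the simulation.</p>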
<p>The entire simulation was run 20 times with different randomly-selected novel words and expressive vocabularies. <xref ref-type="fig" rid="F11">Figure 11</xref> shows the mean distance between output and exemplar trajectory as a function of vocabulary size for the 20 runs. The data indicate increasing production accuracy with increasing vocabulary size. The increase is steeper early on and more gradual later on. The pattern qualitatively matches the very robust increases in production accuracy seen during the earliest stages of speech acquisition followed by slower gains but continuing improvement.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>The average distance of the output trajectory from a novel word exemplar given integration based on an A-silhouette built from the 3 perceptually-closest silhouettes to the novel word exemplar. Distances are shown as a function of vocabulary size (log-transformed). Dotted lines show &#x000B1;1 standard deviation around the mean for the simulation, which was run 20 times.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnhum-17-893785-g0011.tif"/>
</fig></sec></sec></sec>
<sec id="s5">
<title>Summary and conclusion</title>
<p>The CC model captures the observation that speech develops with language use to address the problem of learning and change in production. The child&#x00027;s first words represent both a first attempt at speech and a first attempt to communicate using language. Control over speech action evolves in this communicative context with speech practice. And we engage in a whole lot of practice. The estimate from voice recordings of college-aged adults is that we speak about 16,000 words a day (Mehl et al., <xref ref-type="bibr" rid="B32">2007</xref>). This kind of practice must have implications for speech production. In our theory it does.</p>
<p>The theory assumes a dual lexicon and whole-word speech production. The motor wordforms (silhouettes) in the lexicon are endogenous representations built up with speech practice. The perceptual wordforms (exemplars) are exogenous representations that reflect ambient language patterns. Speech production is the integration of these forms in the perceptual-motor map. The perceptual-motor map is discretized with vocal-motor practice, including speech practice, into language-specific clusters that represent units of speech motor control. The perceptual aspect of these units can be related to sound categories or to perceptual features; the motor aspect to vocal tract constrictions similar in some respects to the &#x0201C;gestures&#x0201D; of Articulatory Phonology except that they do not necessarily code meaning contrast. They are units that represent both acoustic-auditory goals and spatial targets for the speech motor system.</p>
<p>When a word is selected for output from the expressive vocabulary, its silhouette and exemplar activate clusters in motor and perceptual space. The silhouette contributes time-varying information about movement through motor space within a window of activation that allows contextual effects to emerge (i.e., syntagmatic relations). The exemplar provides static information about the acoustic-auditory goals to be achieved for successful communication (i.e., paradigmatic relations). Perceptual-motor integration of the forms results in an output trajectory that traces speech movement due to the integration process. If the speech movement described by an output trajectory results in successful communication, then its trace is absorbed into the silhouette for the concept intended and communicated. By this mechanism, the silhouette for a word is shifted in the direction of the exemplar(s) of a word. This is the practice-based mechanism for motor learning and change in the model. Simulation results suggest that practice has a large initial effect on production accuracy, and that this effect plateaus relatively quickly, or is at least reduced to only a very marginal effect over time. Overall, the pattern recalls the power law function of motor learning (see Newell et al., <xref ref-type="bibr" rid="B39">2001</xref>).</p>
<p>Learning and change in the model also occur with novel word production. In a system where silhouette&#x02013;exemplar integration is the dominant mode of production, the accurate rendition of a novel word requires a silhouette-like form to achieve the targeted exemplar. The new silhouette, an A-silhouette, is created by combining existing silhouettes, which are selected based on the closeness of their perceptual counterparts to the novel-word exemplar. The algorithm for combining existing silhouettes to generate an A-silhouette relies on the model-internal fact that the expressive lexicon is structured according to the perceptual and motor spaces within which the dual wordforms reside. The receptive-only lexicon is also structured by the perceptual space within which single wordforms reside alongside their dual wordform neighbors. Although merely a logical consequence of the CC model architecture, the phonetically-structured lexicon of our theory parallels the well-established psycholinguistic hypothesis of a phonologically organized lexicon (Pisoni et al., <xref ref-type="bibr" rid="B47">1985</xref>; Luce and Pisoni, <xref ref-type="bibr" rid="B28">1998</xref>).</p>
<p>The integration of an A-silhouette and an exemplar associated with a novel word results in an output trajectory. The extent to which this output trajectory is similar to the exemplar varies naturally with vocabulary size. Smaller vocabularies do not regularly allow for the same quality of perceptual matches as larger vocabularies and so the A-silhouettes that are created based on a small vocabulary result in poorer production accuracy than those created based on larger vocabularies. This implication of the model is consistent with the effect of vocabulary size on nonword repetition accuracy in children&#x00027;s speech and in adult second language speech.</p>
<sec>
<title>Why core?</title>
<p>The CC model provides an intellectual framework within which to understand developmental changes in speech production. For this reason, it also provides a framework for understanding the emergence of individual differences in speech production, including differences due to developmental disorder. The model perspective is that these differences are the result of developmental trajectories that are themselves defined by iterative processes that may compound over time the effects of small differences in initial parameter settings.</p>
<p>No existing linguistic or psycholinguistic theory of speech production that we know of has been advanced with the particular aim of explaining change in a manner that naturally gives rise to different outcomes. To the best of our knowledge, every instantiated theory that handles adult spoken language production assumes (more or less) current descriptions of adult speech behavior as its starting and ending point. Such theories are teleological in this way. For this reason, individual differences are often treated as specific deviations from normativity rather than as the product of differing initial conditions and constraints on development. The teleological frame is, in part, the legacy of Saussure and his emphasis on the synchronic over the diachronic. It is also, in part, the legacy of Chomsky and his emphasis on what is universal and so what might be innate. Collective knowledge about speech and language has grown enormously under these legacies. Our goal is to reframe some of this existing knowledge within an emergentist framework to better understand individual differences and to encourage new avenues of empirical research.</p></sec>
<sec>
<title>Future directions</title>
<p>The Core/CC model framework emphasizes the role of variability in learning and change. Recall that speakers can only target previously experienced paths through motor space, even when attempting a new perceptual goal (sound or word). Under this hypothesis, noise in the periphery due to immature motor control provides an important learning benefit, not least of which is better and more thorough exploration of the motor space than would otherwise be possible; and it is through exploration that junctures proliferate in the perceptual-motor map in the first place. Clusters, the units of speech motor control, are created from these junctures. Clusters allow speakers to achieve language-specific acoustic-auditory goals. The proliferation of junctures in motor space is a prerequisite for doing so. The highly variable speech movements of children&#x00027;s speech compared to adults&#x00027; speech may therefore be what allows children to acquire native-like speech sound articulation in a second language, something that adult learners are purportedly unable to do. The prediction is then for an increase in perceived accentedness in speech with age of acquisition, but one that tracks more specifically with age-related changes in the variability of speech movements. Age-of-acquisition effects are, of course, well-described in studies of second language speech; in fact, the age of 5&#x02013;7 years has been suggested as a cut-off for native-like acquisition of a second language speech category (e.g., Guion, <xref ref-type="bibr" rid="B18">2003</xref>). The explanation for why this might be has nonetheless remained elusive. Our prediction suggests that the cut-off is causally tied to the rapid leveling off of articulatory variability during development (see, e.g., Smith and Zelaznik, <xref ref-type="bibr" rid="B56">2004</xref>). Also, note that, just as children&#x00027;s speech continues to exhibit greater variability than adult speech until age 12&#x02013;14 years, so too the age-of-acquisition effect on second language speech is graded: there is not an abrupt cut-off in native-like attainment of a second language at age 5 or 7 years across all individuals. Future research on second language acquisition could investigate the extent to which greater variability in the realization of sounds at one stage in development predicts more accurate (= target-like) attainment of these sounds at a later stage.</p>
<p>The Core/CC model framework also predicts a relatively abrupt transition from a period of exceptionally high variability in the production of novel words to a period of relative stability in word production. This transition corresponds to a change in strategy: from matching and selecting existing motor trajectories to create best perceptual approximations of novel exemplar trajectories, to relying on an expressive vocabulary and so on the integration of perceptual and motor wordforms. Consistent with this, Vihman (<xref ref-type="bibr" rid="B61">2014</xref>) describes a shift in word production around 2 years of age that she attributes to a shift away from a strategy of schema-based production and toward template-based word production. Our A-silhouettes might be considered templates in that they are not word-specific, but rather an amalgam of similar sounding words. Vihman (<xref ref-type="bibr" rid="B61">2014</xref>) also notes that schema-based and template-based production strategies may co-exist for some time during development, and that some children never really exhibit a phase that can truly be described as templatic. The CC model suggests that the path toward understanding these individual differences is through more careful study of the relationship between expressive vocabulary and phonological development during the young preschool years. Such study should consider not just the size of the expressive vocabulary, but also the details of its phonological structure in perceptual and motor spaces.</p>
<p>In the CC model, the extent to which A-silhouettes allow for matching exogenous wordform representations varies with the size and structure of the expressive vocabulary. As already noted, this pattern is consistent with the effect of vocabulary size on nonword repetition accuracy in children&#x00027;s speech. But a detailed consideration of this relationship leads us now to wonder about an inflection point in development when production is no longer driven by the integration of the specific perceptual and motor wordforms that are stored together in an expressive vocabulary. Rather, it could be driven by the integration of perceptual wordforms and A-silhouettes. What this might mean is a question for future research. But, to give that research some structure, let us consider the problem in a little more detail.</p>
<p>Under the simplifying assumption that an expressive vocabulary is some random subset of the words in a language, it is clear that an A-silhouette will provide guidance as good as that of a more specific motor wordform once the expressive vocabulary reaches a certain size. The question then becomes: What is that size? The answer depends in part on the number of words needed to adequately describe the language. In our simulations, the language vocabulary was 2,500 words. This number was chosen on the grounds that between 2,000 and 3,000 words is adequate for everyday communication in English. We presume that this means that a specific set of 2,500 words adequately describes the phonology of English. But the number 2,500 was also chosen with young children&#x00027;s speech patterns in mind. In particular, 30% of 2,500 words is 750 words, which is a good approximation of a 3-year-old&#x00027;s expressive vocabulary size. And, since we know that 3-year-old speech is different from adult speech, it was convenient to consider the potential shape of A-silhouettes in this context. But the reader will have also noted that 2,500 words falls well short of the average expressive vocabulary size of a typical adult. In fact, the lower bound estimate of an average adult&#x00027;s expressive vocabulary size is 10,000 words, and 30% of 10,000 words (3,000 words) is even larger than our language vocabulary estimate. Given this, by the logic of our own model, 10,000 distinct silhouettes are clearly not required to produce 10,000 words. This observation suggests several paths for future research, including a version of the prior suggestion: more careful studies of the structure and size of developing expressive vocabularies are needed to better understand the relationship between the accuracy with which a novel word can be produced and the size of the expressive vocabulary.</p>
<p>Finally, the developmental perspective adopted here motivates our view that perceptual experience and motor practice interact and build on each other through time; together, they provide the foundation for an individualized account of spoken language patterns. The Core/CC model framework assumes the evolution of speech perception and of perceptual wordform representations, but addresses only the effects of motor practice on change. This limitation argues for future research that aims to understand, in precise terms, how much of developmental change in the sound patterns of speech is due to perception and how much is due to production. It will also be important to determine how exactly to tell the difference between the two. The Core model framework suggests, consistent with much other theory, that perceptually-driven changes should be in the direction of increasing contrasts, and that motor-driven changes should be in the timing domain. But timing differences also give rise to contrast. This is, in fact, the foundational insight on which Articulatory Phonology was built (i.e., language-specific gestural coordination). So, again, under the now well-articulated assumption of a dual lexicon, future research will need to detail the separate and interacting contributions from perceptual learning and speech motor learning to understand the emergence and evolution of individualized speech patterns.</p></sec></sec>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>Publicly available data for this study can be found at: <ext-link ext-link-type="uri" xlink:href="https://github.com/mayaekd/core">https://github.com/mayaekd/core</ext-link>.</p></sec>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>The research reported here was fully collaborative. Both authors contributed to the writing and approved the submitted version of the manuscript.</p></sec>
</body>
<back>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>This research was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) under grant R01HD087452 (PI: MR).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Author disclaimer</title>
<p>The content is solely the authors&#x00027; responsibility and does not necessarily reflect the views of NICHD.</p>
</sec>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>Time is modeled discretely for computational reasons, but the concept is one of a continuous unfolding process (see Davis and Redford, <xref ref-type="bibr" rid="B8">2019</xref>).</p></fn>
<fn id="fn0002"><p><sup>2</sup>Style-shifting is not addressed in this paper, but can be modeled within CC as the greater weighting of either the motor or perceptual activation pattern during integration. A reviewer points out that style could also be modeled in other ways within the model, including by the selection of specific formal or casual exemplars of words or by changing the size of the look-back and look-ahead windows of integration. This is also true. The main point here is that the distinct motor and perceptual activation patterns in the CC model are meant to incorporate the tension between &#x0201C;ease&#x0201D; and &#x0201C;distinctiveness&#x0201D; that is at the heart of Lindblom&#x00027;s H&#x00026;H theory of production.</p></fn>
<fn id="fn0003"><p><sup>3</sup>Recall that the silhouette incorporates motor traces of words that were successfully communicated. This allows for the influence of the periphery (i.e., articulation) on representation. The periphery introduces noise into the representation in any number of ways, including by virtue of poorly established &#x0201C;functional synergies&#x0201D; (see, e.g., Smith and Zelaznik, <xref ref-type="bibr" rid="B56">2004</xref>).</p></fn>
<fn id="fn0004"><p><sup>4</sup>For a substantially different interpretation of these findings see Gathercole (<xref ref-type="bibr" rid="B16">2006</xref>).</p></fn>
<fn id="fn0005"><p><sup>5</sup>This assumes an expressive vocabulary that is half the size of the overall vocabulary, which Shipley and McAfee (<xref ref-type="bibr" rid="B55">2019</xref>) place at about 1,000 words for a typically-developing 3-year-old.</p></fn>
<fn id="fn0006"><p><sup>6</sup>Specifically, the clusters were 4 &#x000D7; 4 squares of 16 junctures, with the horizontal distance between two adjacent junctures within a cluster being 1 and the horizontal distance between two adjacent clusters being 2. The vertical distance between adjacent junctures within a cluster was 1 and the vertical distance between the bottom row of clusters and the top row of clusters was 15. Let us designate the bottom-row clusters as &#x0201C;consonants&#x0201D; and the top as &#x0201C;vowels.&#x0201D; The transformation between motor and perceptual space can then be described as follows: If in motor space, the consonants from left to right were <italic>C</italic><sub>1</sub>, <italic>C</italic><sub>2</sub>, <italic>C</italic><sub>3</sub>, <italic>C</italic><sub>4</sub>, <italic>C</italic><sub>5</sub>, <italic>C</italic><sub>6</sub>, then in perceptual space they were <italic>C</italic><sub>3</sub>, <italic>C</italic><sub>4</sub>, <italic>C</italic><sub>1</sub>, <italic>C</italic><sub>2</sub>, <italic>C</italic><sub>5</sub>, <italic>C</italic><sub>6</sub>; if in motor space, the vowels from left to right were <italic>V</italic><sub>1</sub>, <italic>V</italic><sub>2</sub>, <italic>V</italic><sub>3</sub>, <italic>V</italic><sub>4</sub>, <italic>V</italic><sub>5</sub>, <italic>V</italic><sub>6</sub>, then in perceptual space they were <italic>V</italic><sub>3</sub>, <italic>V</italic><sub>4</sub>, <italic>V</italic><sub>5</sub>, <italic>V</italic><sub>6</sub>, <italic>V</italic><sub>1</sub>, <italic>V</italic><sub>2</sub>.</p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adams</surname> <given-names>J. A.</given-names></name></person-group> (<year>1987</year>). <article-title>Historical review and appraisal of research on the learning, retention, and transfer of human motor skills</article-title>. <source>Psychol. Bull</source>. <volume>101</volume>, <fpage>41</fpage>&#x02013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1037/0033-2909.101.1.41</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Browman</surname> <given-names>C. P.</given-names></name> <name><surname>Goldstein</surname> <given-names>L.</given-names></name></person-group> (<year>1992</year>). <article-title>Articulatory phonology: an overview</article-title>. <source>Phonetica</source> <volume>49</volume>, <fpage>155</fpage>&#x02013;<lpage>180</lpage>. <pub-id pub-id-type="doi">10.1159/000261913</pub-id><pub-id pub-id-type="pmid">1488456</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Browman</surname> <given-names>C. P.</given-names></name> <name><surname>Goldstein</surname> <given-names>L. M.</given-names></name></person-group> (<year>1986</year>). <article-title>Towards an articulatory phonology</article-title>. <source>Phonology</source> <volume>3</volume>, <fpage>219</fpage>&#x02013;<lpage>252</lpage>. <pub-id pub-id-type="doi">10.1017/S0952675700000658</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brysbaert</surname> <given-names>M.</given-names></name> <name><surname>Stevens</surname> <given-names>M.</given-names></name> <name><surname>Mandera</surname> <given-names>P.</given-names></name> <name><surname>Keuleers</surname> <given-names>E.</given-names></name></person-group> (<year>2016</year>). <article-title>How many words do we know? Practical estimates of vocabulary size dependent on word definition, the degree of language input and the participant&#x00027;s age</article-title>. <source>Front. Psychol</source>. <volume>7</volume>, <fpage>1116</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2016.01116</pub-id><pub-id pub-id-type="pmid">27524974</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bundgaard-Nielsen</surname> <given-names>R. L.</given-names></name> <name><surname>Best</surname> <given-names>C. T.</given-names></name> <name><surname>Kroos</surname> <given-names>C.</given-names></name> <name><surname>Tyler</surname> <given-names>M. D.</given-names></name></person-group> (<year>2012</year>). <article-title>Second language learners&#x00027; vocabulary expansion is associated with improved second language vowel intelligibility</article-title>. <source>Appl. Psycholinguist</source>. <volume>33</volume>, <fpage>643</fpage>&#x02013;<lpage>664</lpage>. <pub-id pub-id-type="doi">10.1017/S0142716411000518</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bybee</surname> <given-names>J.</given-names></name></person-group> (<year>2002</year>). <article-title>Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change</article-title>. <source>Lang. Var. Change</source> <volume>14</volume>, <fpage>261</fpage>&#x02013;<lpage>290</lpage>. <pub-id pub-id-type="doi">10.1017/S0954394502143018</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davis</surname> <given-names>B. L.</given-names></name> <name><surname>MacNeilage</surname> <given-names>P. F.</given-names></name> <name><surname>Matyear</surname> <given-names>C. L.</given-names></name></person-group> (<year>2002</year>). <article-title>Acquisition of serial complexity in speech production: a comparison of phonetic and phonological approaches to first word production</article-title>. <source>Phonetica</source> <volume>59</volume>, <fpage>75</fpage>&#x02013;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1159/000066065</pub-id><pub-id pub-id-type="pmid">12232462</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davis</surname> <given-names>M.</given-names></name> <name><surname>Redford</surname> <given-names>M. A.</given-names></name></person-group> (<year>2019</year>). <article-title>The emergence of discrete perceptual-motor units in a production model that assumes holistic phonological representations</article-title>. <source>Front. Psychol</source>. <volume>10</volume>, <fpage>2121</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2019.02121</pub-id><pub-id pub-id-type="pmid">31620055</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Diehl</surname> <given-names>R. L.</given-names></name> <name><surname>Lindblom</surname> <given-names>B.</given-names></name></person-group> (<year>2004</year>). <article-title>Explaining the structure of feature and phoneme inventories: the role of auditory distinctiveness,</article-title> in <source>Speech Processing in the Auditory System</source>, eds <person-group person-group-type="editor"><name><surname>Greenberg</surname> <given-names>S.</given-names></name> <name><surname>Ainsworth</surname> <given-names>W. A.</given-names></name> <name><surname>Popper</surname> <given-names>A. N.</given-names></name> <name><surname>Fay</surname> <given-names>R. R.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>101</fpage>&#x02013;<lpage>162</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Edwards</surname> <given-names>J.</given-names></name> <name><surname>Beckman</surname> <given-names>M. E.</given-names></name> <name><surname>Munson</surname> <given-names>B.</given-names></name></person-group> (<year>2004</year>). <article-title>The interaction between vocabulary size and phonotactic probability effects on children&#x00027;s production accuracy and fluency in nonword repetition</article-title>. <source>J. Speech Lang. Hear. Res</source>. <volume>47</volume>, <fpage>421</fpage>&#x02013;<lpage>436</lpage>. <pub-id pub-id-type="doi">10.1044/1092-4388(2004/034)</pub-id><pub-id pub-id-type="pmid">15157141</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ferguson</surname> <given-names>C. A.</given-names></name> <name><surname>Farwell</surname> <given-names>C. B.</given-names></name></person-group> (<year>1975</year>). <article-title>Words and sounds in early language acquisition</article-title>. <source>Language</source> <volume>51</volume>, <fpage>419</fpage>&#x02013;<lpage>439</lpage>. <pub-id pub-id-type="doi">10.2307/412864</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Flege</surname> <given-names>J. E.</given-names></name></person-group> (<year>1995</year>). <article-title>Second language speech learning: theory, findings, and problems,</article-title> in <source>Speech Perception and Linguistic Experience: Theoretical and Methodological Issues</source>, ed <person-group person-group-type="editor"><name><surname>Strange</surname> <given-names>W.</given-names></name></person-group> (<publisher-loc>Timonium, MD</publisher-loc>: <publisher-name>York Press</publisher-name>), <fpage>233</fpage>&#x02013;<lpage>277</lpage>.<pub-id pub-id-type="pmid">30843410</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Flege</surname> <given-names>J. E.</given-names></name> <name><surname>Bohn</surname> <given-names>O. S.</given-names></name></person-group> (<year>2021</year>). <article-title>The revised speech learning model (SLM-r),</article-title> in <source>Second Language Speech Learning: Theoretical and Empirical Progress</source>, ed <person-group person-group-type="editor"><name><surname>Wayland</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>3</fpage>&#x02013;<lpage>83</lpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Flemming</surname> <given-names>E.</given-names></name></person-group> (<year>2004</year>). <article-title>Contrast and perceptual distinctiveness,</article-title> in <source>Phonetically Based Phonology</source>, eds <person-group person-group-type="editor"><name><surname>Hayes</surname> <given-names>B.</given-names></name> <name><surname>Kirchner</surname> <given-names>R.</given-names></name> <name><surname>Steriade</surname> <given-names>D.</given-names></name></person-group> (<publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>232</fpage>&#x02013;<lpage>276</lpage>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fowler</surname> <given-names>C. A.</given-names></name></person-group> (<year>1980</year>). <article-title>Coarticulation and theories of extrinsic timing</article-title>. <source>J. Phon</source>. <volume>8</volume>, <fpage>113</fpage>&#x02013;<lpage>133</lpage>. <pub-id pub-id-type="doi">10.1016/S0095-4470(19)31446-9</pub-id><pub-id pub-id-type="pmid">9509581</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gathercole</surname> <given-names>S. E.</given-names></name></person-group> (<year>2006</year>). <article-title>Nonword repetition and word learning: the nature of the relationship</article-title>. <source>Appl. Psycholinguist</source>. <volume>27</volume>, <fpage>513</fpage>&#x02013;<lpage>543</lpage>. <pub-id pub-id-type="doi">10.1017/S0142716406060383</pub-id><pub-id pub-id-type="pmid">15257665</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Guenther</surname> <given-names>F. H.</given-names></name></person-group> (<year>2016</year>). <source>Neural Control of Speech</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guion</surname> <given-names>S. G.</given-names></name></person-group> (<year>2003</year>). <article-title>The vowel systems of Quichua-Spanish bilinguals</article-title>. <source>Phonetica</source> <volume>60</volume>, <fpage>98</fpage>&#x02013;<lpage>128</lpage>. <pub-id pub-id-type="doi">10.1159/000071449</pub-id><pub-id pub-id-type="pmid">12853715</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guion</surname> <given-names>S. G.</given-names></name> <name><surname>Harada</surname> <given-names>T.</given-names></name> <name><surname>Clark</surname> <given-names>J. J.</given-names></name></person-group> (<year>2004</year>). <article-title>Early and late Spanish-English bilinguals&#x00027; acquisition of English word stress patterns</article-title>. <source>Biling Lang Cogn</source> <volume>7</volume>, <fpage>207</fpage>&#x02013;<lpage>226</lpage>. <pub-id pub-id-type="doi">10.1017/S1366728904001592</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holt</surname> <given-names>L. L.</given-names></name> <name><surname>Lotto</surname> <given-names>A. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Speech perception as categorization</article-title>. <source>Attent. Percept. Psychophys</source>. <volume>72</volume>, <fpage>1218</fpage>&#x02013;<lpage>1227</lpage>. <pub-id pub-id-type="doi">10.3758/APP.72.5.1218</pub-id><pub-id pub-id-type="pmid">20601702</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Houde</surname> <given-names>J. F.</given-names></name> <name><surname>Nagarajan</surname> <given-names>S. S.</given-names></name></person-group> (<year>2011</year>). <article-title>Speech production as state feedback control</article-title>. <source>Front. Hum. Neurosci</source>. <volume>5</volume>, <fpage>82</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2011.00082</pub-id><pub-id pub-id-type="pmid">22046152</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jaeger</surname> <given-names>J. J.</given-names></name></person-group> (<year>1997</year>). <article-title>How to say &#x02018;Grandma&#x00027;: The problem of developing phonological representations</article-title>. <source>First Lang</source>. <volume>17</volume>, <fpage>1</fpage>&#x02013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1177/014272379701705101</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>K.</given-names></name></person-group> (<year>1997</year>). <article-title>The auditory/perceptual basis for speech segmentation</article-title>. <source>Work. Papers Linguist</source>. <volume>50</volume>, <fpage>101</fpage>&#x02013;<lpage>113</lpage>.</citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>K.</given-names></name></person-group> (<year>2006</year>). <article-title>Resonance in an exemplar-based lexicon: the emergence of social identity and phonology</article-title>. <source>J. Phon</source>. <volume>34</volume>, <fpage>485</fpage>&#x02013;<lpage>499</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2005.08.004</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Levelt</surname> <given-names>W. J.</given-names></name> <name><surname>Roelofs</surname> <given-names>A.</given-names></name> <name><surname>Meyer</surname> <given-names>A. S.</given-names></name></person-group> (<year>1999</year>). <article-title>A theory of lexical access in speech production</article-title>. <source>Behav. Brain Sci</source>. <volume>22</volume>, <fpage>1</fpage>&#x02013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1017/S0140525X99001776</pub-id><pub-id pub-id-type="pmid">11301520</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Levelt</surname> <given-names>W. J. M.</given-names></name></person-group> (<year>1989</year>). <source>Speaking: From Intention to Articulation</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>The MIT Press</publisher-name>.</citation>
</ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lindblom</surname> <given-names>B.</given-names></name></person-group> (<year>1990</year>). <article-title>Explaining phonetic variation: a sketch of the H&#x00026;H theory,</article-title> in <source>Speech Production and Speech Modelling</source>, eds <person-group person-group-type="editor"><name><surname>Hardcastle</surname> <given-names>W. J.</given-names></name> <name><surname>Marchal</surname> <given-names>A.</given-names></name></person-group> (<publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>403</fpage>&#x02013;<lpage>439</lpage>.</citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luce</surname> <given-names>P. A.</given-names></name> <name><surname>Pisoni</surname> <given-names>D. B.</given-names></name></person-group> (<year>1998</year>). <article-title>Recognizing spoken words: the neighborhood activation model</article-title>. <source>Ear. Hear</source>. <volume>19</volume>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1097/00003446-199802000-00001</pub-id><pub-id pub-id-type="pmid">9504270</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Major</surname> <given-names>R. C.</given-names></name></person-group> (<year>1998</year>). <article-title>Interlanguage phonetics and phonology: an introduction</article-title>. <source>Stud. Second Lang. Acquisit</source>. <volume>20</volume>, <fpage>131</fpage>&#x02013;<lpage>137</lpage>. <pub-id pub-id-type="doi">10.1017/S0272263198002010</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Major</surname> <given-names>R. C.</given-names></name></person-group> (<year>2001</year>). <source>Foreign Accent: The Ontogeny and Phylogeny of Second Language Phonology</source>. <publisher-loc>London</publisher-loc>: <publisher-name>Routledge</publisher-name>.</citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCune</surname> <given-names>L.</given-names></name> <name><surname>Vihman</surname> <given-names>M. M.</given-names></name></person-group> (<year>2001</year>). <article-title>Early phonetic and lexical development: a productivity approach</article-title>. <source>J. Speech Lang. Hear. Res</source>. <volume>44</volume>, <fpage>670</fpage>&#x02013;<lpage>684</lpage>. <pub-id pub-id-type="doi">10.1044/1092-4388(2001/054)</pub-id><pub-id pub-id-type="pmid">11407570</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mehl</surname> <given-names>M. R.</given-names></name> <name><surname>Vazire</surname> <given-names>S.</given-names></name> <name><surname>Ram&#x000ED;rez-Esparza</surname> <given-names>N.</given-names></name> <name><surname>Slatcher</surname> <given-names>R. B.</given-names></name> <name><surname>Pennebaker</surname> <given-names>J. W.</given-names></name></person-group> (<year>2007</year>). <article-title>Are women really more talkative than men?</article-title> <source>Science</source> <volume>317</volume>, <fpage>82</fpage>&#x02013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1126/science.1139940</pub-id><pub-id pub-id-type="pmid">17615349</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Menn</surname> <given-names>L.</given-names></name></person-group> (<year>1983</year>). <article-title>Development of articulatory, phonetic, and phonological capabilities,</article-title> in <source>Language Production, Vol. 2</source>, ed <person-group person-group-type="editor"><name><surname>Butterworth</surname> <given-names>B.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Academic Press</publisher-name>), <fpage>3</fpage>&#x02013;<lpage>50</lpage>.<pub-id pub-id-type="pmid">19181392</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Metsala</surname> <given-names>J. L.</given-names></name></person-group> (<year>1999</year>). <article-title>Young children&#x00027;s phonological awareness and nonword repetition as a function of vocabulary development</article-title>. <source>J. Educ. Psychol</source>. <volume>91</volume>, <fpage>3</fpage>. <pub-id pub-id-type="doi">10.1037/0022-0663.91.1.3</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Munson</surname> <given-names>B.</given-names></name> <name><surname>Kurtz</surname> <given-names>B. A.</given-names></name> <name><surname>Windsor</surname> <given-names>J.</given-names></name></person-group> (<year>2005</year>). <article-title>The influence of vocabulary size, phonotactic probability, and wordlikeness on nonword repetitions of children with and without specific language impairment</article-title>. <source>J. Speech Lang. Hear. Res</source>. <volume>48</volume>, <fpage>1033</fpage>&#x02013;<lpage>1047</lpage>. <pub-id pub-id-type="doi">10.1044/1092-4388(2005/072)</pub-id><pub-id pub-id-type="pmid">16411794</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nagle</surname> <given-names>C. L.</given-names></name></person-group> (<year>2018</year>). <article-title>Examining the temporal structure of the perception-production link in second language acquisition: a longitudinal study</article-title>. <source>Lang. Learn</source>. <volume>68</volume>, <fpage>234</fpage>&#x02013;<lpage>270</lpage>. <pub-id pub-id-type="doi">10.1111/lang.12275</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nagle</surname> <given-names>C. L.</given-names></name> <name><surname>Baese-Berk</surname> <given-names>M. M.</given-names></name></person-group> (<year>2022</year>). <article-title>Advancing the state of the art in L2 speech perception-production research: revisiting theoretical assumptions and methodological practices</article-title>. <source>Stud. Second Lang. Acquisit</source>. <volume>44</volume>, <fpage>580</fpage>&#x02013;<lpage>605</lpage>. <pub-id pub-id-type="doi">10.1017/S0272263121000371</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nation</surname> <given-names>P.</given-names></name> <name><surname>Waring</surname> <given-names>R.</given-names></name></person-group> (<year>1997</year>). <article-title>Vocabulary size, text coverage and word lists</article-title>. <source>Vocabulary</source> <volume>14</volume>, <fpage>6</fpage>&#x02013;<lpage>19</lpage>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Newell</surname> <given-names>K. M.</given-names></name> <name><surname>Liu</surname> <given-names>Y. T.</given-names></name> <name><surname>Mayer-Kress</surname> <given-names>G.</given-names></name></person-group> (<year>2001</year>). <article-title>Time scales in motor learning and development</article-title>. <source>Psychol. Rev</source>. <volume>108</volume>, <fpage>57</fpage>&#x02013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.108.1.57</pub-id><pub-id pub-id-type="pmid">11212633</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nielsen</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>Specificity and abstractness of VOT imitation</article-title>. <source>J. Phon</source>. <volume>39</volume>, <fpage>132</fpage>&#x02013;<lpage>142</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2010.12.007</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nittrouer</surname> <given-names>S.</given-names></name> <name><surname>Studdert-Kennedy</surname> <given-names>M.</given-names></name> <name><surname>McGowan</surname> <given-names>R. S.</given-names></name></person-group> (<year>1989</year>). <article-title>The emergence of phonetic segments: evidence from the spectral structure of fricative-vowel syllables spoken by children and adults</article-title>. <source>J. Speech Lang. Hear. Res</source>. <volume>32</volume>, <fpage>120</fpage>&#x02013;<lpage>132</lpage>. <pub-id pub-id-type="doi">10.1044/jshr.3201.120</pub-id><pub-id pub-id-type="pmid">2704187</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Niziolek</surname> <given-names>C. A.</given-names></name> <name><surname>Nagarajan</surname> <given-names>S. S.</given-names></name> <name><surname>Houde</surname> <given-names>J. F.</given-names></name></person-group> (<year>2013</year>). <article-title>What does motor efference copy represent? Evidence from speech production</article-title>. <source>J. Neurosci</source>. <volume>33</volume>, <fpage>16110</fpage>&#x02013;<lpage>16116</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.2137-13.2013</pub-id><pub-id pub-id-type="pmid">24107944</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parrell</surname> <given-names>B.</given-names></name> <name><surname>Lammert</surname> <given-names>A. C.</given-names></name> <name><surname>Ciccarelli</surname> <given-names>G.</given-names></name> <name><surname>Quatieri</surname> <given-names>T. F.</given-names></name></person-group> (<year>2019</year>). <article-title>Current models of speech motor control: a control-theoretic overview of architectures and properties</article-title>. <source>J. Acoust. Soc. Am</source>. <volume>145</volume>, <fpage>1456</fpage>&#x02013;<lpage>1481</lpage>. <pub-id pub-id-type="doi">10.1121/1.5092807</pub-id><pub-id pub-id-type="pmid">31067944</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Pierrehumbert</surname> <given-names>J.</given-names></name></person-group> (<year>2001</year>). <article-title>Exemplar dynamics: word frequency, lenition and contrast,</article-title> in <source>Frequency and the Emergence of Linguistic Structure</source>, eds <person-group person-group-type="editor"><name><surname>Bybee</surname> <given-names>J. L.</given-names></name> <name><surname>Hopper</surname> <given-names>P. J.</given-names></name></person-group> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>John Benjamins Publishing Company</publisher-name>), <fpage>137</fpage>&#x02013;<lpage>157</lpage>.</citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pierrehumbert</surname> <given-names>J.</given-names></name></person-group> (<year>2002</year>). <article-title>Word-specific phonetics</article-title>. <source>Lab. Phonol</source>. <volume>7</volume>, <fpage>101</fpage>&#x02013;<lpage>140</lpage>. <pub-id pub-id-type="doi">10.1515/9783110197105.101</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pierrehumbert</surname> <given-names>J. B.</given-names></name></person-group> (<year>2003</year>). <article-title>Phonetic diversity, statistical learning, and acquisition of phonology</article-title>. <source>Lang. Speech</source> <volume>46</volume>, <fpage>115</fpage>&#x02013;<lpage>154</lpage>. <pub-id pub-id-type="doi">10.1177/00238309030460020501</pub-id><pub-id pub-id-type="pmid">14748442</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pisoni</surname> <given-names>D. B.</given-names></name> <name><surname>Nusbaum</surname> <given-names>H. C.</given-names></name> <name><surname>Luce</surname> <given-names>P. A.</given-names></name> <name><surname>Slowiaczek</surname> <given-names>L. M.</given-names></name></person-group> (<year>1985</year>). <article-title>Speech perception, word recognition and the structure of the lexicon</article-title>. <source>Speech Commun</source>. <volume>4</volume>, <fpage>75</fpage>&#x02013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1016/0167-6393(85)90037-8</pub-id><pub-id pub-id-type="pmid">23226910</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Redford</surname> <given-names>M. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Unifying speech and language in a developmentally sensitive model of production</article-title>. <source>J. Phon</source>. <volume>53</volume>, <fpage>141</fpage>&#x02013;<lpage>152</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2015.06.006</pub-id><pub-id pub-id-type="pmid">26688597</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Redford</surname> <given-names>M. A.</given-names></name></person-group> (<year>2019</year>). <article-title>Speech production from a developmental perspective</article-title>. <source>J. Speech Lang. Hear. Res</source>. <volume>62</volume>, <fpage>2946</fpage>&#x02013;<lpage>2962</lpage>. <pub-id pub-id-type="doi">10.1044/2019_JSLHR-S-CSMC7-18-0130</pub-id><pub-id pub-id-type="pmid">31465709</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Redford</surname> <given-names>M. A.</given-names></name> <name><surname>Oh</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>The representation and execution of articulatory timing in first and second language acquisition</article-title>. <source>J. Phon</source>. <volume>63</volume>, <fpage>127</fpage>&#x02013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2017.01.004</pub-id><pub-id pub-id-type="pmid">28947839</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Redford</surname> <given-names>M. A.</given-names></name> <name><surname>Oh</surname> <given-names>G. E.</given-names></name></person-group> (<year>2016</year>). <article-title>Children&#x00027;s abstraction and generalization of English lexical stress patterns</article-title>. <source>J. Child Lang</source>. <volume>43</volume>, <fpage>338</fpage>&#x02013;<lpage>365</lpage>. <pub-id pub-id-type="doi">10.1017/S0305000915000215</pub-id><pub-id pub-id-type="pmid">26027880</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Samuel</surname> <given-names>A. G.</given-names></name> <name><surname>Kraljic</surname> <given-names>T.</given-names></name></person-group> (<year>2009</year>). <article-title>Perceptual learning for speech</article-title>. <source>Attent. Percept. Psychophys</source>. <volume>71</volume>, <fpage>1207</fpage>&#x02013;<lpage>1218</lpage>. <pub-id pub-id-type="doi">10.3758/APP.71.6.1207</pub-id><pub-id pub-id-type="pmid">19633336</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schmidt</surname> <given-names>R. A.</given-names></name></person-group> (<year>1975</year>). <article-title>A schema theory of discrete motor skill learning</article-title>. <source>Psychol. Rev</source>. <volume>82</volume>, <fpage>225</fpage>. <pub-id pub-id-type="doi">10.1037/h0076770</pub-id><pub-id pub-id-type="pmid">7330441</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schmidt</surname> <given-names>R. A.</given-names></name></person-group> (<year>2003</year>). <article-title>Motor schema theory after 27 years: reflections and implications for a new theory</article-title>. <source>Res. Q. Exerc. Sport</source>. <volume>74</volume>, <fpage>366</fpage>&#x02013;<lpage>375</lpage>. <pub-id pub-id-type="doi">10.1080/02701367.2003.10609106</pub-id><pub-id pub-id-type="pmid">14768837</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shipley</surname> <given-names>K. G.</given-names></name> <name><surname>McAfee</surname> <given-names>J. G.</given-names></name></person-group> (<year>2019</year>). <source>Assessment in Speech-Language Pathology: A Resource Manual</source>. <publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>Plural Publishing</publisher-name>.</citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>A.</given-names></name> <name><surname>Zelaznik</surname> <given-names>H. N.</given-names></name></person-group> (<year>2004</year>). <article-title>Development of functional synergies for speech motor coordination in childhood and adolescence</article-title>. <source>Dev. Psychobiol</source>. <volume>45</volume>, <fpage>22</fpage>&#x02013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1002/dev.20009</pub-id><pub-id pub-id-type="pmid">15229873</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>R.</given-names></name> <name><surname>Hawkins</surname> <given-names>S.</given-names></name></person-group> (<year>2012</year>). <article-title>Production and perception of speaker-specific phonetic detail at word boundaries</article-title>. <source>J. Phon</source>. <volume>40</volume>, <fpage>213</fpage>&#x02013;<lpage>233</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2011.11.003</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Velleman</surname> <given-names>S.</given-names></name></person-group> (<year>1998</year>). <source>Making Phonology Functional: What do I do First?</source> <publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Butterworth-Heinemann</publisher-name>.</citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Velleman</surname> <given-names>S. L.</given-names></name> <name><surname>Vihman</surname> <given-names>M. M.</given-names></name></person-group> (<year>2002</year>). <article-title>Whole-word phonology and templates: trap, bootstrap, or some of each?</article-title> <source>Lang. Speech Hear. Serv. Sch</source>. <volume>33</volume>, <fpage>9</fpage>&#x02013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1044/0161-1461(2002/002)</pub-id><pub-id pub-id-type="pmid">27764418</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Verhagen</surname> <given-names>J.</given-names></name> <name><surname>Van Stiphout</surname> <given-names>M.</given-names></name> <name><surname>Blom</surname> <given-names>E.</given-names></name></person-group> (<year>2022</year>). <article-title>Determinants of early lexical acquisition: effects of word- and child-level factors on Dutch children&#x00027;s acquisition of words</article-title>. <source>J. Child Lang</source>. <volume>49</volume>, <fpage>1</fpage>&#x02013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1017/S0305000921000635</pub-id></citation>
</ref>
<ref id="B61">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vihman</surname> <given-names>M. M.</given-names></name></person-group> (<year>2014</year>). <source>Phonological Development: The First Two Years, 2nd Edn</source>. <publisher-loc>Malden, MA</publisher-loc>: <publisher-name>Wiley-Blackwell</publisher-name>.</citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vihman</surname> <given-names>M. M.</given-names></name> <name><surname>Croft</surname> <given-names>W.</given-names></name></person-group> (<year>2007</year>). <article-title>Phonological development: toward a &#x02018;radical&#x02019; templatic phonology</article-title>. <source>Linguistics</source> <volume>45</volume>, <fpage>683</fpage>&#x02013;<lpage>725</lpage>. <pub-id pub-id-type="doi">10.1515/LING.2007.021</pub-id></citation>
</ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wedel</surname> <given-names>A. B.</given-names></name></person-group> (<year>2006</year>). <article-title>Exemplar models, evolution and language change</article-title>. <source>Linguist. Rev</source>. <volume>23</volume>, <fpage>247</fpage>&#x02013;<lpage>274</lpage>. <pub-id pub-id-type="doi">10.1515/TLR.2006.010</pub-id><pub-id pub-id-type="pmid">25813345</pub-id></citation></ref>
</ref-list>
</back>
</article> 
