Language Usage and Second Language Morphosyntax: Effects of Availability, Reliability, and Formulaicity

Guo, Rundi; Ellis, Nick C.

doi:10.3389/fpsyg.2021.582259

ORIGINAL RESEARCH article

Front. Psychol., 29 April 2021

Sec. Psychology of Language

Volume 12 - 2021 | https://doi.org/10.3389/fpsyg.2021.582259

Language Learning Laboratory, Department of Psychology, University of Michigan, Ann Arbor, MI, United States

Abstract

A large body of psycholinguistic research demonstrates that both language processing and language acquisition are sensitive to the distributions of linguistic constructions in usage. Here we investigate how statistical distributions at different linguistic levels – morphological and lexical (Experiments 1 and 2), and phrasal (Experiment 2) – contribute to the ease with which morphosyntax is processed and produced by second language learners. We analyze Chinese ESL learners’ knowledge of four English inflectional morphemes: -ed, -ing, and third-person -s on verbs, and plural -s on nouns. In Elicited Imitation Tasks, participants listened to length- and difficulty-matched sentences each containing one target morpheme and typed the whole sentence as accurately as they could after a short delay. Experiment 1 investigated lexical and morphemic levels, testing the hypotheses that a morpheme is expected to be more easily processed when it is (1) highly available (i.e., occurring in frequent word-forms), and (2) highly reliable (i.e., occurring in lemma words that are consistently conjugated in the form containing this morpheme). Thirty sentences were made for each morpheme, divided into three Availability-Reliability Distribution (ARD) groups on the basis of corpus analysis in the Corpus of Contemporary American English (COCA; Davies, 2008-): 10 target words high in availability, 10 high in reliability, and 10 low in both reliability and availability. Responses were scored on whether the target morpheme was accurately reproduced given the provision of the correct lemma. A generalized linear mixed-effects logit model (GLMM) revealed fixed effects of morpheme type, availability, and reliability on the accuracy of morpheme provision. There were no effects of lemma frequency. 
Experiment 2 successfully replicated these results and extended the investigation to explore phrasal formulaicity by manipulating the frequency of the four-word strings in which the morpheme was embedded. GLMMs replicated the effects of word-form availability and reliability and additionally revealed independent phrase-superiority effects where morphemes were better reproduced in contexts of higher string-frequency. Taken together, these findings demonstrate that morpheme acquisition reflects the distributional properties of learners’ experience and the mappings therein between lexis, morphology, phraseology, and semantics. These conclusions support an emergentist view of the statistical symbolic learning of morphology where language acquisition involves the satisfaction of competing constraints across multiple grain-sizes of units.

Introduction

Novice and intermediate learners of English as a second language (ESL) are far from consistent in their production of inflectional morphemes, such as regular past-tense -ed, or third person singular present-tense -s. Jia and Fuse (2007) show that the acquisition of a morpheme such as the third-person singular -s can take 5 years or more to go from 0 to 80% provision in obligatory contexts for ESL children. Five years of English usage involves many thousands of receptive experiences of high frequency functional morphemes, and many thousands of contexts requiring their productive use, yet provision is variable. This suggests that the system is learned incrementally, and that regularities/generalization/productivity emerge from the combined experience of usage. But is it the case that, for any given morpheme, some exemplars are more easily recognized in the input and produced earlier in acquisition? If so, what are these exemplars that are more likely to be preferentially processed? And why these bellwethers? Are they special in their distributional statistics, for example, in terms of their frequency, or their form-function contingency, or their formulaicity? These are the questions that motivate our research here. How does second language (L2) morphological ability depend upon usage?

Usage-based theories hold that domain-general cognitive mechanisms drive the learning of linguistic constructions and the emergence of generalizations (e.g., Goldberg, 2006; Beckner et al., 2009; MacWhinney and O’Grady, 2014; Wulff and Ellis, 2018). They propose that acquisition is modulated by factors affecting attention and memory, such as exemplar type- and token-frequency, contingency of form-function mapping, salience of form and of function, paradigm complexity, neighborhood effects and the proportion of friends to enemies¹ in quasi-regular domains, etc. (e.g., Marchman, 1997; MacWhinney, 2001; Bybee, 2006; Ellis, 2006a, b; Seidenberg and Plaut, 2014). For the case of morphology, we might ask then, in the 5 years during which L2 learners are learning to produce third-person singular -s, do experiences of particular -s inflected verbs play a role in the acquisition of the system more than others? Likewise, for the even more extended period during which L2 learners are learning to produce regular past-tense -ed, are particular -ed inflected verbs more potent exemplars than others? And so on. From studies of children (Brown, 1973; Braine et al., 1990; Ambridge et al., 2015; Finley, 2018) and of adults (e.g., Seidenberg and Plaut, 2014; Pollatsek et al., 2015), there is good reason to suspect that distributional factors affect L1 and L2 morpheme acquisition and processing.

Linguistic constructions vary in frequency and they distribute across usage in complex probabilistic patterns. Psycholinguistic research has established several important aspects of these distribution patterns. The most studied parameter is availability, which concerns how often a language learner experiences a given form in their usage history. Availability is estimated as the normalized token frequency of a specific word-form in representative corpora. For example, the availability of the word-form depends is the overall probability of encountering the word-form depends in English usage, i.e., P(depends). The effects of availability on the development and processing of L1 and L2 have been well established. For example, words high in frequency are named faster (Forster and Chambers, 1973; Seidenberg and McClelland, 1989), judged faster in lexical decision tasks (Yap and Balota, 2014), fixated for shorter durations in reading (McDonald and Shillcock, 2003), recognized more easily in speech (Luce, 1986), and spelled more accurately (Barry and Seymour, 1988). More generally, language learners are sensitive to the frequency of linguistic cues across a wide range of linguistic domains and levels of representation, including phonology and phonotactics, lexis, reading and spelling, morphosyntax, sentence comprehension, etc. (Bybee and Hopper, 2001; Ellis, 2002; Bod et al., 2003).

For the particular case of morphology, words inflected in a form that is high in token frequency are produced earlier and more accurately in that form than in their other forms, and earlier and more accurately than other words whose inflected forms are low in token frequency. Such token frequency effects of word-forms have been reported in the acquisition of L1 (Marchman, 1997; Ambridge et al., 2015; Räsänen et al., 2016), L2 (Larsen-Freeman, 1976; Goldschneider and DeKeyser, 2001; Jia and Fuse, 2007), and artificial grammars (Braine et al., 1990; Finley, 2015). Notably, the frequency of word lemmas plays a lesser role in the accurate retrieval of inflected word-forms than the token frequency of the inflected word-form itself, a key finding that has important implications for emergentist approaches which posit chunk-based learning from usage, construction grammar, and linguistic structure as processing history.

Another important distribution parameter is reliability, i.e., how likely it is that a linguistic cue reliably co-occurs either with another construction, or with a particular interpretation. Measuring reliability entails the statistical estimation of some form of contingency (MacWhinney, 2001; Ellis, 2006a; Gries and Ellis, 2015). In the context of morpheme acquisition, reliability can be understood as the relative frequency of different word-forms of a lemma. For example, the reliability of the lemma [depend] occurring in its -s morpheme inflected form depends can be calculated as the number of occurrences of the word-form depends divided by the number of occurrences of all possible word-forms of the lemma [depend] such as depend, depending, and depended, i.e., P(depends | [depend]). To the native ear, depends might well sound more natural than depended, perhaps due to the fact that in an English-speaking environment, when the word depend is used, it is most often conjugated in its third-person singular form. In other words, the high reliability of depends might facilitate its processing in this form, regardless of the overall frequency of occurrence of depends in the entire environment. As a result, depend might become implicitly more associated with the morpheme -s and with the present than with the other tenses. Psychological research into animal and human learning alike demonstrates profound and ubiquitous impacts of contingency in the learning of cue-outcome associations (Shanks, 1995).
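
Availability and reliability, as defined above, can both be computed directly from corpus counts. The following is a minimal sketch, not the authors' analysis script: the function names are hypothetical, the counts for kidding and talks are those reported for the sample items in Table 2, and the corpus size of 520 million words is an approximation based on the description of COCA in the Materials section.

```python
CORPUS_SIZE = 520_000_000  # assumption: approximate size of COCA in word tokens


def availability_per_million(wordform_count, corpus_size=CORPUS_SIZE):
    """Normalized token frequency of a word-form: P(word-form), scaled per million words."""
    return wordform_count / corpus_size * 1_000_000


def reliability(wordform_count, lemma_count):
    """P(word-form | lemma): share of the lemma's tokens occurring in this inflected form."""
    if lemma_count <= 0:
        raise ValueError("lemma_count must be positive")
    return wordform_count / lemma_count


# (word-form frequency, lemma frequency) as reported in Table 2.
items = {"kidding": (5734, 6410), "talks": (7125, 304560)}

for form, (wordform_freq, lemma_freq) in items.items():
    print(
        f"{form}: availability = {availability_per_million(wordform_freq):.2f} per million, "
        f"reliability = {reliability(wordform_freq, lemma_freq):.2f}"
    )
```

On these counts, talks is the slightly more available word-form, whereas kidding is far more reliable for its lemma, which illustrates how the two measures can dissociate.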

The relative frequency of different morphological forms of the same word has been found to predict usage, language change, accuracy, and error patterns in language processing and acquisition (Bybee, 1985; Hay, 2001; Matthews and Theakston, 2006; Sugaya and Shirai, 2009; Tatsumi et al., 2018). Hay’s (2001) study of relative frequency in derivational morphology, which follows proposals on the structure of paradigms (groups of inflectionally related words with a common lexical stem) in Bybee (1985, Chapter 3), demonstrates that the more frequent member of a paradigm is more accessible and less compositional. Paradigms consist of words of different frequency/accessibility levels: the high-frequency words are dominant, and the others are dependent upon them. In studies of language change, paradigms are more likely to be re-made on the basis of the highest frequency form (Bybee, 1985). In studies of L1 acquisition, when acquiring irregular plural forms, English-speaking children tend to erroneously produce phrases like *two mouse much more frequently than phrases like *two tooth, likely because mouse is a much more reliable form for the lemma [mouse] than tooth is for [tooth]: the word-form mouse occurs seven times more than the word-form mice, whereas the word-form tooth occurs only one sixth as often as does teeth (Matthews and Theakston, 2006).

The third distribution parameter to be considered here is formulaicity, i.e., the frequency of the multi-word strings in which a morpheme-inflected word-form is embedded². Consider how you might more naturally end the phrase “you’ve got to be …” with “kidding” than with “playing.” According to the Corpus of Contemporary American English (Davies, 2008-), the multi-word string “you’ve got to be kidding” is more frequently used than “you’ve got to be playing”; i.e., P(“you’ve got to be kidding”) > P(“you’ve got to be playing”). Note that this is an effect of string frequency rather than of word frequency, since P(“kidding”) < P(“playing”).

High-frequency phrases, idioms, and formulaic sequences (Sinclair, 1995; Wray, 2002) are processed more fluently than matched low-frequency strings. For example, in phrasal decision tasks (Bod, 2001; Jiang and Nekrasova, 2007; Arnon and Snider, 2010), high frequency phrasal constituents or short sentences (e.g., don’t have to worry; I like it) are judged to be grammatical phrases faster than less frequent controls composed of frequency-matched component words (don’t have to wait; I keep it). Formulaicity effects have likewise been demonstrated in L1 acquisition by Bannard and Matthews (2008) who showed that 2–3-year-old English speaking children were quicker and more accurate at repeating frequently occurring multi-word strings (e.g., “sit in your chair”) sampled from a large child-directed speech corpus, compared to matched infrequent strings (“sit in your truck”). High-frequency slot-and-frame patterns (Braine, 1976) or frames (Mintz, 2003) can strongly constrain the nature of the slot-filler, e.g., a frame like “to __ it” is highly predictive of verbs³. Such distributional information can be potent in the acquisition of both the grammatical and the semantic properties of the slot-filler (Elman, 1990; Redington et al., 1998). Mintz et al. (2014) compared training situations in which target words (such as lowfa) occurred surrounded by two-word frames (such as swetch_klide) that frequently co-occurred, against situations in which target words occurred in simpler bigram contexts (such as swetch lowfa or lowfa klide), where only an immediately adjacent word provides the context for categorization. They found that learners categorized words together when they occurred in similar frame contexts, but not when they occurred in similar bigram contexts. 
In a study of L1 English-speaking two-and-a-half-year-olds, Childers and Tomasello (2001) found that a nonce verb was better acquired, so as to be subsequently used creatively in a transitive utterance, when it was surrounded by pronouns than when surrounded by proper nouns or names, suggesting that the child’s transitive schema may start out with pronouns in pre-/post-verbal positions (i.e., pronoun V pronoun) rather than being fully general. In other words, frequent formulaic frames can positively promote the processing and productivity of their subcomponent words.

Together, these studies demonstrate that the three distributional factors of availability, reliability, and formulaicity pervasively affect language acquisition and processing. In the current study, we are concerned with their roles in L2 morphology, and whether particular exemplars are more easily recognized in the input and correctly produced because of their privileged distributions in the language.

We examine L2 knowledge of four common English inflectional morphemes (the regular past-tense ending -ed, the progressive marker -ing, the third-person singular present-tense ending -s, and the nominal plural marker -s). We target ESL learners whose native language is Mandarin Chinese because this population has been shown to experience greater challenges in acquiring L2 inflections due to the fact that Mandarin Chinese has minimal verb-tense and noun morphology (Yeh et al., 2015). None of the four English morphemes included in our study has a direct morphological equivalence in Mandarin Chinese, although some of them can be expressed with non-inflectional grammatical cues, e.g., certain classifiers that can express plurality, certain aspectual markers (e.g., V-le) that arguably possess properties of a tense marker (Ross, 1995), and lexical cues such as numbers and adverbs. We aim to assess how much ESL morpheme processing and production depends upon their English usage distributions.

Experiment 1 investigates distributions at lexical and morphological levels. Experiment 2 extends the study to include the effects of the distributions of larger phraseological constructions on the processing of embedded morphemes.

Experiment 1

Experiment 1 investigates the effects of availability and reliability of the word-forms containing target morphemes on the production accuracy of the target morphemes. We hypothesize that a morpheme is more easily processed (1) when it occurs in a word-form that is highly frequent in usage (i.e., highly available), and (2) when it is attached to a word that is more consistently conjugated in the form containing this morpheme compared to other forms of the same word lemma (i.e., highly reliable).

Many studies of morpheme acquisition, following Brown (1973), assess spontaneous production of target morphemes in obligatory contexts, i.e., where the morpheme would be obligatory in a native-speaking adult’s speech either because of the pragmatic context of discourse (e.g., describing something that happened in the past calls for use of past-tense verbs) or the syntactic structure of the utterance (e.g., “Yesterday, I walk__ to the store” requires the regular past-tense morpheme -ed). Here, instead, we use an Elicited Imitation Task (EIT) with morphemes in non-obligatory contexts of the sort used by Marchman (1997) who investigated the production of the past-tense -ed morpheme in L1 English-speaking children.

Elicited Imitation Tasks

Elicited Imitation Tasks have been widely used in studies of L2 processing and have been shown to have high validity and reliability (Ortega et al., 2002; Erlam, 2006; Gaillard and Tremblay, 2016). In one version of EIT, the participant hears a sentence and is asked to repeat the exact sentence after a short delay. Unlike production in uncontrolled spontaneous speech (e.g., Jia and Fuse, 2007), EIT allows controlled examination of morpheme production in contexts that are matched in important respects, thus isolating the effects of the properties of the morphemes themselves from those of their context (Erlam, 2006). The use of predetermined sentence stimuli allows the control of important potential confounds such as the presence of adverbial tense cues, the frequency of the strings of words that contain the morpheme, grammatical complexity, memory load, etc.

For present purposes, we modified the EIT design to require participants to ‘repeat’ the sentence by typing the written form rather than speaking the oral form. This modification circumvents accent-induced transcription ambiguity, facilitates data collection and analysis, and is less threatening to our Chinese ESL participants who reportedly experience considerable discrepancy between their proficiency in spoken and written forms of English as a result of classroom pedagogical practices in China, which commonly deemphasize oral English instruction (Ren, 2011).

A Process Analysis of EIT as It Relates to Morpheme Production

Each of the 120 randomized trials of our EIT involved listening to a single sentence out of context and then, after a short delay during which the participant rates it for how sensible it is, repeating it verbatim, in all of its parts, as accurately as possible. What processes might be involved in the successful repetition of a target morpheme in such a task? The following sketch is informed by proposals in usage-based linguistics, construction grammar, the psycholinguistics of sentence processing, predictive processing, and first and second language acquisition [see, particularly, Christiansen and Chater (2016) on “Chunk-and-Pass” processing, and, more generally for review of language emergence, MacWhinney and O’Grady, 2014]:

The perception and comprehension encoding stages in EIT involve three parts: (1) taking in word-forms into an auditory/lexical buffer, (2) linking lexical items syntactically, and (3) constructing a meaningful interpretation of the sentence. Based on the psycholinguistic research which has shown a variety of frequency effects in the perception and processing of words, morphemes, multi-word chunks, and syntactic constructions, we propose that the initial recognition and preservation of the correct form of target words is likely influenced by the forces of availability, reliability, and formulaicity in terms of storage in an auditory/lexical buffer. Then, the language system rapidly integrates all available incoming information, interactively satisfying multiple constraints as quickly as possible, to update the current interpretation of what has been said so far. Relevant cues include sentence-internal information about lexical and structural biases, as well as extra-sentential cues from the referential and pragmatic context (although the decontextualized nature of EIT denies many of these usual additional influences). As the incoming auditory information is chunked, it is rapidly integrated with contextual information to recognize words and morphemes, which are in turn chunked into larger multiword units. Incremental identification of incoming units is influenced by the sequential probabilities of what has been processed to date: the next word in a well-entrenched word sequence is more easily identified, as is an incoming morpheme that is highly predicted in its context. In parsing and interpreting the target morphemes, there are potential influences of syntactic integrity, e.g., auxiliary [be] impacting particularly progressive -ing, and of contextual support where context could influence the encoding of the past -ed. 
The encoding of third person present -s and plural -s on subjects should also be under the influence of syntactic integrity, although in English, agreement processing is generally less obligatory than processing for tense and aspect (MacWhinney, 1997, 2001). The final stage of EIT, (4) production, is also expected to be sensitive to frequency effects and sequential probabilities at word, morpheme, and particularly phrasal levels: a well-entrenched formulaic phrase will support provision of its component morphemes whether they are analyzed or not. A relevant process analysis of the imitative written production of a recently heard message might look quite like that for speaking (e.g., Levelt, 1989) – something fast, skilled, and automatic that builds upon highly specialized mechanisms dedicated to performing specific subroutines, such as retrieving appropriate words, generating morpho-syntactic structure, computing the phonological target shape of syllables, words, phrases and whole utterances, accessing their orthographic codes, and creating and executing motor programs for skilled typing. In such imitative redintegration, we might expect probabilistic effects to be at their strongest. Formulaic language is more common in speech than in writing (Erman and Warren, 2000); and the observation that memorized clauses and clause-sequences form a high proportion of the fluent stretches of speech heard in everyday conversation led Pawley and Syder (1983) to propose that it is this use of memorized language that underpins fluency.

Method

Participants

Participants were Chinese native speakers (n = 22) who were international students at a major university in the United States. They were either sampled from the Subject Pool of the Psychology Department and participated for course credit (n = 1) or recruited through recruitment posters around the campus and paid $15 for their participation (n = 21). The sample was unintentionally female-dominant (n = 18). Participants were between 18 and 28 years old (Mean = 22, SD = 2.64). All but one of them had lived in an English-immersion environment for some time⁴. Excluding this participant, the length of residence in an English-speaking country was between 6 and 84 months (Mean = 31.33, SD = 21.69). All participants had a high-level English proficiency sufficient to permit them to follow the English instructions and complete the language task entirely in English. Their proficiency in English was assessed by self-ratings and self-reported TOEFL scores. One participant was excluded from analysis due to excessive missing data. A summary of participant characteristics is reported in Section 1.1 of Supplementary Data Sheet 1.

Materials

Elicited imitation task

We followed the EIT design features outlined by Ortega et al. (2002). All sentences were within the recommended syllable length of 10–17 syllables to ensure optimal difficulty; the words that contain the target morpheme were placed in the middle of sentences, with filler words at the very beginning and end of each sentence to reduce primacy and recency effects. So, in the stimulus sentence “Late Wednesday evening I thanked him for the lovely flowers,” the target morpheme is the -ed in thanked in the middle of the sentence, the controlled four-word context was I thanked him for, the primacy filler was a randomly selected three-word phrase Late Wednesday evening, and the recency filler was the lovely flowers. To reduce the impact of phonological rehearsal in short-term memory, a 3–5 s distraction task was set up between the stimulus and response for each sentence, during which the participants had to judge whether the sentence seemed sensible to them, thus reducing the opportunity for rehearsal. This semantic judgment task helped to ensure that participants actively engaged in semantic processing of the sentences rather than simply trying to encode and retain their acoustic forms. The following section describes the procedure for how the sentences were created.

Item development

The study targeted four of the most studied inflectional morphemes in English verbs and nouns: the regular past-tense ending -ed, the progressive marker -ing, the third-person singular present-tense ending -s, and nominal plural marker -s. Thirty sentences were made for each morpheme, which were divided into three Availability-Reliability Distribution (ARD) groups on the basis of corpus analysis in the Corpus of Contemporary American English (COCA) (Davies, 2008)⁵. COCA is widely used and frequently updated and contains over 520 million words (with 20 million words added each year from 1990 to 2015) coming from 220 thousand text sources that are equally divided among different genres of American English such as spoken, news and magazines, academic texts, fiction, etc., making it the only large and balanced corpus of English used in the United States.

The ARD groups for each morpheme were determined on the basis of their carrier words. The three groups are: (1) Top 10 Availability, (2) Top 10 Reliability, and (3) Bottom 10 Reliability. We first assessed the lemma frequency and the inflected word-form frequency by conducting searches for the top 1000 most frequent content verbs ([vv*]) and the top 1000 content nouns ([nn*]) and recording the frequency counts in Excel. The Top 10 Availability group consists of the top 10 most frequent inflected word-forms exemplifying each of the four target morphemes. The search commands were as follows: regular past-tense verbs (-ed: *ed.[vv*d]), third-person singular present-tense verbs (-s: *s.[vv*z]), progressive verbs (-ing: [vv*g]), and regular plural nouns (-s: *s.[*nn2*])⁶. Morpheme reliability was operationalized as the proportion of times that the lemma occurred with that specific morpheme by dividing the word-form frequency (i.e., the frequency of the word-form inflected with the target morpheme) by the lemma frequency (i.e., frequency of all possible word-forms of the lemma). For each morpheme, we ranked the items by reliability and took the top 10 of these to form the Top 10 Reliability group, unless the item had already been included in the Top 10 Availability group. Lastly, the Bottom 10 Reliability group were the items lowest in reliability of expression of the embedded morpheme. It was formed from the bottom results of the proportion rankings that were also relatively low in word-form frequency. Where there was room for choice between exemplars, we favored the alternative with the highest lemma frequency. We also tried to match the Top 10 Reliability and Bottom 10 Reliability items for word-form frequency. The frequency and reliability characteristics of the stimulus sample are summarized in Table 1. 
Figure 1 illustrates the distribution of the stimuli belonging to each group for each morpheme within the top 1000 frequent content verbs or nouns accordingly. We show the Top 10 Availability group in blue and illustrate with the leading exemplar (e.g., students for plural -s), the Top 10 Reliability group in green (participants), and the Bottom 10 Reliability group in red (gods).
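
The grouping procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' actual procedure: the function name `assign_ard_groups` and the tuple layout are hypothetical, and the manual steps mentioned in the text (restricting the bottom group to items relatively low in word-form frequency, preferring higher lemma frequency where there was a choice, and matching the two reliability groups for word-form frequency) are omitted.

```python
def assign_ard_groups(items, n=10):
    """Split the candidate carrier words for one morpheme into three ARD groups.

    items -- list of (word_form, wordform_freq, lemma_freq) tuples
    Returns (top_availability, top_reliability, bottom_reliability) as lists
    of word-forms. Hypothetical helper, simplified from the description.
    """
    freq = {word: wf for word, wf, _ in items}
    rel = {word: wf / lf for word, wf, lf in items}  # word-form / lemma frequency

    # Top availability: the n most frequent inflected word-forms.
    top_avail = sorted(freq, key=freq.get, reverse=True)[:n]

    # Top reliability: highest proportions, skipping words already selected.
    taken = set(top_avail)
    by_rel = sorted(rel, key=rel.get, reverse=True)
    top_rel = [w for w in by_rel if w not in taken][:n]

    # Bottom reliability: lowest proportions among the remaining words.
    taken.update(top_rel)
    bottom_rel = [w for w in reversed(by_rel) if w not in taken][:n]
    return top_avail, top_rel, bottom_rel


# Toy input that mixes three Table 2 items purely to exercise the function.
toy = [("looked", 121996, 652141), ("kidding", 5734, 6410), ("talks", 7125, 304560)]
print(assign_ard_groups(toy, n=1))
```

With `n=1`, the toy input places looked in the availability group, kidding in the high-reliability group, and talks in the low-reliability group.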

TABLE 1

Group and morpheme	Lemma frequency (Mean)	Inflected word-form frequency (Mean)	Reliability¹ (Proportion) (Mean)
Top 10 Availability	744806	121345	0.32

Past-tense -ed	372980	96811	0.29
Third-person -s	884702	86873	0.13
Progressive -ing	1462384	139036	0.20
Plural -s	259157	162659	0.66

Top 10 Reliability	28908	20042	0.64

Past-tense -ed	16206	10967	0.68
Third-person -s	29185	11750	0.45
Progressive -ing	12426	6597	0.56
Plural -s	57813	50856	0.88

Bottom 10 Reliability	815824	16912	0.03

Past-tense -ed	64125	1615	0.03
Third-person -s	97507	1929	0.02
Progressive -ing	2904224	53365	0.02
Plural -s	197441	10741	0.06

The mean lemma frequency, inflected word-form frequency (availability), and word-form:lemma proportion (reliability) of the carrier words in the stimulus sample by ARD group and by morpheme.

¹Reliability = Word-form frequency/Lemma frequency.
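
As a quick arithmetic check on Table 1, each group row appears to be the unweighted mean of its four morpheme rows, as one would expect with ten items per morpheme. The sketch below recomputes the group means from the values in the table; the variable and function names are illustrative only.

```python
# Each entry: (reported group row, [four morpheme rows]), copied from Table 1.
# Columns are (lemma frequency, word-form frequency, reliability).
TABLE1 = {
    "Top 10 Availability": ((744806, 121345, 0.32), [
        (372980, 96811, 0.29),    # past-tense -ed
        (884702, 86873, 0.13),    # third-person -s
        (1462384, 139036, 0.20),  # progressive -ing
        (259157, 162659, 0.66),   # plural -s
    ]),
    "Top 10 Reliability": ((28908, 20042, 0.64), [
        (16206, 10967, 0.68),
        (29185, 11750, 0.45),
        (12426, 6597, 0.56),
        (57813, 50856, 0.88),
    ]),
    "Bottom 10 Reliability": ((815824, 16912, 0.03), [
        (64125, 1615, 0.03),
        (97507, 1929, 0.02),
        (2904224, 53365, 0.02),
        (197441, 10741, 0.06),
    ]),
}


def column_means(rows):
    """Unweighted mean of each column across the morpheme rows."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


for group, (reported, rows) in TABLE1.items():
    lemma, wordform, rel = column_means(rows)
    print(f"{group}: lemma={lemma:.1f}, word-form={wordform:.1f}, "
          f"reliability={rel:.3f}; reported={reported}")
```

Note that the reliability column is a mean of per-item proportions, so it need not equal the ratio of the mean word-form frequency to the mean lemma frequency in the same row.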

FIGURE 1

To build the sentence contexts for these carrier words, we first conducted n-gram searches for possible three-word strings with the target word in the middle (e.g., [* wanted *]) and then selected the top frequent results for each. These results (e.g., [you wanted to]) were then fed into searches for possible four-word strings with an extra slot at the end (i.e., [you wanted to *]). Then, a three-word random time filler phrase (e.g., On Wednesday morning…) was put at the beginning of each sentence. We only included tense-neutral time phrases (those that do not provide any lexical time cue) so that the target inflectional morpheme would be the only indicator of the tense, i.e., the morphemes would be in non-obligatory contexts. The sentences were then completed to a length of 14–17 syllables by adding a random possible phrase at the end that would make the whole sentence grammatical, logical, and relatively sensible. All filler words were checked with a lexical range breakdown using the computer program VocabProfile from the Compleat Lexical Tutor website (Cobb, accessed 8/2017) to ensure that they fell roughly in the same frequency band, mostly the top 1000 frequent words. All the finalized sentences were manually checked by two native Chinese speakers to make sure they were sufficiently comprehensible for an average Chinese ESL learner. All sentences were recorded in Audacity⁷. Each of them was spoken twice by a male native speaker of American English and was evaluated by another native speaker to select the best version. Sample sentences can be found in Table 2. The complete stimulus set, including the full list of words with their lemma frequency, word-form frequency, and the calculated reliability along with their carrier sentences, is available in Supplementary Data Sheet 2.

TABLE 2

Morpheme frequency group	Target morpheme	Sentence	Word-form frequency	Lemma frequency	Proportion (reliability)
Top 10 availability	-ed	On Monday afternoon he *looked* at me for a long moment	121996	652141	0.19
Top 10 reliability	-ing	Late Thursday evening I was *kidding* about his hair and beard	5734	6410	0.89
Bottom 10 reliability	3sg -s	On Friday afternoon he *talks* about the latest developments	7125	304560	0.02
Bottom 10 reliability	Plural -s	On Wednesday she hears about the *gods* of the new religion	6156	125937	0.05

Sample sentences for selected morpheme in each group for Experiment 1.

Target word in each sentence is bolded.

Procedure

The EIT task was administered individually through a PsychoPy program (PsychoPy, RRID:SCR_006571, Peirce, 2007) running on an iMac computer equipped with headphones located in an experimental booth. The total duration of the experiment was approximately 80 min. After providing informed consent, participants received brief oral instructions from the experimenter. The program began with an instruction screen that explained how each sentence would be presented and what their task was, followed by a practice session of five sentences. Participants proceeded to the experimental trials if no further questions arose. The experimenter remained available to aid them as needed.

All participants listened to the 120 sentences. The presentation sequence was individually randomized. On each trial, they first heard a spoken sentence, such as “Late Wednesday evening I thanked him for the lovely flowers.” Immediately after the audio ended, the screen displayed instructions for the participant to judge how much sense the sentence made to them by rating it on a sliding scale of 1–7 using the mouse. Once this rating had been completed, the participant was asked to type out the complete and correct sentence to the best of their ability. Participants decided when the next trial should start by pressing the spacebar. They were notified at the midpoint of the experiment and allowed to take a short break if desired. Their reproduction of the sentences was recorded in csv files.

After the experiment, participants completed a 5-min language history questionnaire (Supplementary Data Sheet 1, 2.2) adapted from Lim and Godfroid (2015). This included questions on general demographics such as gender and age, as well as language background including previous and current exposure and usage of English, English proficiency test scores, self-rated general proficiency in English, and self-rated proficiency on different aspects of using English (reading, writing, speaking, and listening).

Elicited imitation responses were scored for accuracy of production of the one target word that contained the target morpheme in each sentence, in the following steps. First, using a string search command in Excel, we screened whether each response contained an exact match of the target word (marked as 1) or not (marked as 0). “Exact matches” were also automatically marked 1 for “correct lemma” and 1 for “correct morpheme.” Second, we manually checked responses marked as 0 for “exact match,” looking for typos and spelling errors, as well as irregularities in the inflectional paradigms of certain words, to decide whether a reasonable attempt at the target word was present. Where the attempted target word reasonably resembled any form of the lemma (e.g., *glansed for glanced), the response was given a “correct lemma” score of 1. Likewise, if its ending reasonably resembled the target morpheme (e.g., *lookign for looking), or if its form resembled the tense or number indicated by the target morpheme (e.g., *drooling for drilling), it was given a “correct morpheme” score of 1. Finally, using Excel commands again, we identified whether the correct morpheme was present in the target word given the presence of a correct lemma (“correct morpheme given correct lemma”). To sum up, each typed response was marked as 1 (for “correct morpheme given correct lemma”), 0 (for “incorrect or absent morpheme given correct lemma”), or N/A (for cases where the lemma was absent). The lemma-absent cases constituted 8.91% of the responses and were excluded from further analysis. The scoring method is illustrated in Table 3 with two examples of each scenario.

TABLE 3

	Correct morpheme given correct lemma	Score: 1
Examples:	(1a) Late Wednesday evening, I *thanked* him for the lovely flowers.
	(2a) On Thursday, she knew about the *fathers* of the fat children.

	Incorrect or absent morpheme given correct lemma	Score: 0
Examples:	(1b) Late Wednesday evening, I *thanks* him for the lovely flowers.
	(2b) On Thursday, she knew about the *father* of the fat children.

	Incorrect or absent lemma	Score: N/A (excluded)
Examples:	(1c) Late Wednesday evening, I *think* him for the lovely flowers.
	(2c) On Thursday, she knew about the *mothers* of the fat children.

Scoring method with sample sentences.

Target word in each sentence is bolded.
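The scoring logic described above and illustrated in Table 3 can be summarized in a short sketch. The function `score_response` is hypothetical: the original screening was done with Excel string searches, and the whole-word matching used here is an assumption. The manual judgments for non-exact responses cannot be automated, so they are passed in as the arguments `lemma_ok` and `morpheme_ok`.

```python
import re


def score_response(response, target, lemma_ok=None, morpheme_ok=None):
    """Return 1, 0, or None (N/A) for one typed response.

    An exact match of the target word counts as correct lemma and
    correct morpheme. Otherwise the manual judgments decide:
    no acceptable lemma -> None (excluded); acceptable lemma -> 1 or 0
    depending on whether the morpheme was judged correct.
    """
    words = re.findall(r"[a-z']+", response.lower())
    if target.lower() in words:
        return 1
    if not lemma_ok:
        return None
    return 1 if morpheme_ok else 0


s1a = score_response("Late Wednesday evening, I thanked him for the lovely flowers.", "thanked")
s1b = score_response("Late Wednesday evening, I thanks him for the lovely flowers.", "thanked",
                     lemma_ok=True, morpheme_ok=False)
s1c = score_response("Late Wednesday evening, I think him for the lovely flowers.", "thanked",
                     lemma_ok=False)
print(s1a, s1b, s1c)
```

Responses scored `None` correspond to the lemma-absent cases that were excluded from analysis.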

Results

The accuracy scores of sentences for each morpheme in each ARD group are shown in Figure 2. To examine the effects of the two distributional factors, availability and reliability, on the production accuracy of the morphemes, we used generalized linear mixed-effect models using the “lme4” package (R package: lme4, RRID:SCR_015654, version 1.1-13, Bates et al., 2015) in R (version 3.3.3, R Core Team, 2017). The models were fit by maximum likelihood (Laplace Approximation), with random effects specified for subjects and items. Because the four target morphemes are inherently stratified in frequency in the corpus, e.g., past-tense -ed verbs are generally used more frequently than third-person -s verbs, the distributional factors are correlated with morpheme type. To reduce multicollinearity and to account for the between-morpheme differences, we first ran a mixed-effect model with morpheme type as the only fixed-effect predictor to serve as a baseline model which parses out the differences between the four morphemes. From there, we built up the model by incrementally specifying other predictors one at a time to identify the unique contributions of each. To determine which subject-level random effects to include, specifically whether or not to include the random slopes for subjects for each fixed-effect predictor, we ran two versions for each model, one with only random intercepts, and one further adding random slopes. We report the model with random intercepts for subjects unless adding random slopes significantly improves model fit, in which case we report the latter. The preliminary steps (testing morpheme type alone; morpheme type + morpheme reliability; and morpheme type + morpheme availability) are detailed in Supplementary Data Sheet 1, 1.2. We describe here the complete Model 1 involving all three fixed effects.

FIGURE 2

Model 1: Morpheme Type + Morpheme Reliability + Morpheme Availability

Model 1, which included morpheme type, reliability, and availability (i.e., log word-form frequency) as fixed-effect predictors and random intercepts for subjects and items, is detailed in Table 4. Stimulus sentence length in syllables was included as a fixed predictor to control for any stimulus length effects. Each participant’s stimulus sense rating for each sentence was also included as a potential predictor.

TABLE 4

Model 1: no interactions.

		Fixed effects				Random effects

						By subject	By item

Parameters		Estimate	SE	z	p	SD	SD
Intercept		0.145	3.424	0.04	0.966	1.04	1.24
Morpheme type¹	Plural -s	–1.480	0.424	–3.49	0.000***
	Third-person -s	–0.347	0.391	–0.88	0.376
	Progressive -ing	0.963	0.435	2.21	0.027*
Morpheme reliability		1.937	0.536	3.61	0.000***
Morpheme availability²		0.420	0.202	2.08	0.037*
Sentence length (syllables)		0.035	0.207	0.17	0.866
Sense rating		–0.009	0.059	–0.16	0.874

¹Past-tense -ed is the reference level. ²Word-form frequency was logarithmically transformed. Model formula: accuracy ~ morpheme + reliability + availability + length + sense + (1 | subject) + (1 | item). ***p < 0.001; **p < 0.01; *p < 0.05; .p < 0.1.

Model 1b: Interactions between morpheme type and reliability.

		Fixed effects				Random effects

						By subject	By item

Parameters		Estimate	SE	z	p	SD	SD

Intercept		3.491	2.940	1.19	0.235	1.10	1.11
Morpheme type¹	Plural -s	–2.667	0.593	–4.50	0.000***
	Third-person -s	–1.257	0.536	–2.35	0.019*
	Progressive -ing	0.534	0.594	0.90	0.368
Morpheme reliability		0.554	1.026	–0.54	0.589
Sentence length (syllables)		–0.028	0.192	–0.14	0.886
MorphemePlural-s: Reliability		3.902	1.231	3.17	0.002**
MorphemePres-s: Reliability		2.952	1.638	1.80	0.072.
MorphemeProg-ing: Reliability		1.626	1.647	0.99	0.323

¹Past-tense -ed is the reference level. Model formula: accuracy ~ morpheme * reliability + length + (1 | subject) + (1 | item). ***p < 0.001; **p < 0.01; *p < 0.05; .p < 0.1.

Model 1c: interactions between morpheme type and availability.

		Fixed effects				Random effects

						By subject	By item

Parameters		Estimate	SE	z	p	SD	SD

Intercept		1.765	3.546	0.50	0.619	1.10	1.15
Morpheme type¹	Plural -s	–9.827	2.673	–3.68	0.000***
	Third-person -s	–2.660	1.869	–1.42	0.155
	Progressive -ing	1.779	2.246	0.79	0.428
Morpheme availability²		0.221	0.355	0.62	0.534
Sentence length (syllables)		0.017	0.200	0.08	0.934
MorphemePlural-s: Availability		1.943	0.604	3.22	0.001**
MorphemePres-s: Availability		0.517	0.458	1.13	0.259
MorphemeProg-ing: Availability		–0.201	0.527	–0.38	0.703

Experiment 1 results from the mixed effects model including fixed effects of morpheme type, morpheme reliability (proportion), morpheme availability (log word-form frequency), stimulus sentence length, stimulus sense rating, and random effects of subject and item.

¹Past-tense -ed is the reference level. ²Word-form frequency was logarithmically transformed. Model formula: accuracy ~ morpheme * availability + length + (1 | subject) + (1 | item). ***p < 0.001; **p < 0.01; *p < 0.05; .p < 0.1.

There were effects of morpheme type: -ing had significantly higher accuracy than -ed (estimate = 0.963, SE = 0.435, z = 2.21, p = 0.027); plural -s had significantly lower accuracy than -ed (estimate = –1.480, SE = 0.424, z = –3.49, p = 0.000). The difference between the third person present tense -s and -ed was not significant (estimate = –0.347, SE = 0.392, z = –0.88, p = 0.37). The effect of morpheme reliability was highly significant (estimate = 1.937, SE = 0.537, z = 3.61, p = 0.000). Additionally, availability had significant but smaller effects (estimate = 0.419, SE = 0.201, z = 2.08, p = 0.037). Stimulus sentence length was non-significant (estimate = 0.035, SE = 0.207, z = 0.17, p = 0.866). Stimulus sense rating was also non-significant (estimate = –0.009, SE = 0.059, z = –0.16, p = 0.874).
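Because the model is a logit model, the estimates above are on the log-odds scale. The sketch below, which uses a hypothetical helper `wald_summary`, shows how a Wald z, a two-sided normal p-value, and an odds ratio follow from an estimate and its standard error. The inputs are the rounded values printed in Table 4, so the results can differ slightly in the last digit from the reported ones. Note that reliability is a proportion, so a one-unit change spans its whole range from 0 to 1.

```python
import math


def wald_summary(estimate, se):
    """Wald z, two-sided normal p-value, and odds ratio for a logit coefficient."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return {"z": z, "p": p, "odds_ratio": math.exp(estimate)}


rel = wald_summary(1.937, 0.536)    # morpheme reliability, Table 4
avail = wald_summary(0.420, 0.202)  # morpheme availability, Table 4
for name, res in (("reliability", rel), ("availability", avail)):
    print(name, {key: round(value, 3) for key, value in res.items()})
```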

Analysis of Deviance using Type III Wald chi-square tests showed that morpheme type, reliability, and availability were all significant predictors of accuracy [morpheme type: χ²(df = 3) = 31.595, p = 0.000; reliability: χ²(df = 1) = 13.015, p = 0.000; availability: χ²(df = 1) = 4.332, p = 0.04], confirming their individual unique contributions to production accuracy. Stimulus sentence length was not a significant predictor [χ²(df = 1) = 0.055, p = 0.814], nor was stimulus sense rating [χ²(df = 1) = 0.025, p = 0.874]. Figure 3 separately plots the effects of morpheme type (3a), reliability (3b), and availability (3c).
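For the single-degree-of-freedom terms, the p-value of a chi-square statistic can be recovered with the standard library alone, since a chi-square variable with one degree of freedom is a squared standard normal. The sketch below uses a hypothetical helper `chi2_p_df1` and the statistics reported above. It does not cover morpheme type, which has three degrees of freedom.

```python
import math


def chi2_p_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Chi-square statistics (df = 1) as reported in the text.
reported = {"reliability": 13.015, "availability": 4.332, "length": 0.055, "sense": 0.025}
p_values = {term: chi2_p_df1(stat) for term, stat in reported.items()}
for term, p in p_values.items():
    print(f"{term}: p = {p:.3f}")
```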

FIGURE 3

Model 1b (mid panel of Table 4) investigated the interaction between morpheme type and reliability. Past-tense -ed was the reference level for type. Allowing for the interaction removes any overall effect of reliability (estimate = 0.554, SE = 1.026, z = –0.54, p = 0.589). However, there remains a significant effect of reliability on Plural -s (estimate = 3.902, SE = 1.231, z = 3.17, p = 0.002) and a marginal one on third person present tense -s (estimate = 2.952, SE = 1.638, z = 1.80, p = 0.072).

Model 1c (lower panel of Table 4) investigated the interaction between morpheme type and availability, again with past-tense -ed as the reference level. Allowing for the interaction removes any overall effect of availability (estimate = 0.221, SE = 0.355, z = 0.62, p = 0.534), although there remains a substantial effect of availability upon plural -s (estimate = 1.943, SE = 0.604, z = 3.22, p = 0.001).
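One way to read the Model 1c coefficients is to convert them into a separate availability slope for each morpheme. This sketch assumes R's default treatment (dummy) coding, which is not stated explicitly here but is consistent with -ed being the reference level. Under that assumption, the availability coefficient is the slope for -ed and each interaction term is the difference between a morpheme's slope and the -ed slope. The variable names are hypothetical.

```python
# Coefficients copied from the Model 1c panel of Table 4 (log-odds scale).
baseline_slope = 0.221  # availability slope for the reference level, past-tense -ed
interaction = {
    "plural -s": 1.943,
    "third-person -s": 0.517,
    "progressive -ing": -0.201,
}

# Under treatment coding, slope for a morpheme = reference slope + interaction term.
slopes = {"past-tense -ed": baseline_slope}
slopes.update({m: round(baseline_slope + diff, 3) for m, diff in interaction.items()})
for morpheme, slope in slopes.items():
    print(f"{morpheme}: {slope:+.3f}")
```

On this reading, the implied availability slope is steepest for plural -s and close to zero for progressive -ing.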

Exploring Log Lemma Frequency

To examine any effects of lemma frequency (rather than the frequency of the inflected form) alongside morpheme type, we ran a model which included morpheme type and log lemma frequency as fixed-effect predictors and random intercepts for subjects and items. Morpheme type showed consistent effects: -ing had significantly higher accuracy than -ed (estimate = 1.031, SE = 0.481, z = 2.14, p = 0.03); Plural -s had significantly lower accuracy than -ed (estimate = –0.875, SE = 0.431, z = –2.03, p = 0.04). However, the effect of log lemma frequency was negligible (estimate = –0.007, SE = 0.223, z = –0.03, p = 0.98). Analysis of Deviance using Type III Wald chi-square tests confirmed the morpheme effect [χ²(df = 3) = 21.231, p = 0.000], and revealed that lemma frequency was not a significant predictor of accuracy when morpheme type was taken into account [χ²(df = 1) = 0.001, p = 0.97].

Post hoc Explorations of Effects of n-Gram Frequency

As a post hoc analysis to explore potential effects of phrasal frequency, we investigated whether log frequency of the three-word string (e.g., you wanted to, see section “Item Development”) explained significant additional variance alongside morpheme type. Morpheme type showed consistent effects: -ing had significantly higher accuracy than -ed (estimate = 1.131, SE = 0.445, z = 2.54, p = 0.01); Plural -s had significantly lower accuracy than -ed (estimate = –0.980, SE = 0.401, z = –2.44, p = 0.02). The effect of log 3-gram frequency was also highly significant (estimate = 0.526, SE = 0.165, z = 3.18, p = 0.001). Analysis of Deviance using Type III Wald chi-square tests showed that in addition to morpheme type, log 3-gram frequency was also significantly predictive of accuracy [morpheme type: χ²(df = 3) = 26.66, p = 0.000; log 3-gram frequency: χ²(df = 1) = 10.14, p = 0.001].

To test whether availability (i.e., log word-form frequency) and log 3-gram frequency had independent effects, and which contributed more, we fit models that included both as predictors. However, because log word-form frequency and log 3-gram frequency were inherently highly correlated (r = 0.810), they competed for the same variance and neither reached significance: availability (estimate = 0.025, SE = 0.345, z = 0.07, p = 0.941), log 3-gram frequency (estimate = 0.425, SE = 0.274, z = 1.55, p = 0.122). This is investigated further in Experiment 2.
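The cost of this correlation can be gauged with a variance inflation factor (VIF). The sketch below is a simplification: it treats the two frequency measures as the only predictors, ignores morpheme type and the random effects, and borrows the linear-model definition of the VIF, so the result is only a rough guide for the mixed-effects logit model. The function name is hypothetical.

```python
def vif_two_predictors(r):
    """Variance inflation factor when two predictors correlate at r."""
    return 1.0 / (1.0 - r ** 2)


vif = vif_two_predictors(0.810)  # correlation reported in the text
se_inflation = vif ** 0.5        # factor by which standard errors grow
print(f"VIF = {vif:.2f}, SE inflation = {se_inflation:.2f}")
```

Under these assumptions, a correlation of 0.810 inflates the sampling variance of each coefficient nearly threefold, which makes it hard for either predictor to reach significance on its own.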

Results Summary

In sum, these analyses revealed independent effects on production accuracy of morpheme availability and reliability. The interactions of these factors with morpheme type revealed a significant effect of reliability on plural -s and a marginal effect on third person present tense -s, and a significant effect of availability on plural -s. In contrast to availability of the inflected form, there were no effects of log lemma frequency. Neither sentence length nor sense rating had any effect on morpheme provision. Post hoc exploratory analyses showed that the frequency of the three-word strings also positively predicted accurate provision of the embedded morpheme. However, we had not planned this analysis and had not systematically manipulated the 3-gram frequency in the stimulus materials or controlled for the inherently high correlation between the frequency of the three-word string and the frequency of the word-form inside the string. More careful controls of string frequency are therefore needed to confirm this tentative conclusion.

Discussion of Experiment 1

As predicted, both availability and reliability of the morphemes were positively associated with morpheme production accuracy in the EIT. A morpheme (e.g., plural -s) in a word-form (e.g., participant-s) is more easily recognized and produced when the word-form is high in token frequency and when it is the more reliable form of the lemma ([PARTICIPANT]). The effects of reliability were numerically greater than those of availability.

The participants showed a greater sensitivity to the distribution of the morphologically complex surface forms of the words than to the distribution of the underlying lemmas. This finding supports those of Bybee (1985) and Hay (2001) on the importance of relative frequency in derivational morphology described in the introduction. Similar patterns were also observed in Sereno and Jongman’s (1997) lexical decision task, in which words were presented in the singular (e.g., car) or in the plural (cars) to native speakers. Differences in reaction times were predicted only by the frequency of the specific surface form presented, whether singular or plural, and not by the total frequency of both forms (i.e., the lemma frequency). Sereno and Jongman took this as evidence against rule-based processing models of inflectional morphology.

The rank order of accuracy for the target morphemes (-ing > -ed > third person present tense -s > plural -s) was generally consistent with the common order reported in prior SLA morpheme studies (Krashen et al., 1977; Goldschneider and DeKeyser, 2001), with the exception of the plural -s, which was previously reported to be among the earliest to be acquired and processed by L1 and L2 learners of English (Brown, 1973; Krashen et al., 1977). Due to the limited sample size, we refrain from further interpreting this pattern unless it is replicated in Experiment 2. Note also that our stimuli involve a systematically factored selection of 30 exemplars of each type rather than a representative random sample as used in previous studies, and this might have led to the deviation from the common order.

Post hoc exploratory analyses involving the three-word string suggested that frequency beyond the lexical level could also have affected the production accuracy of the embedded morpheme. As previously discussed, facilitation effects of string frequency (formulaicity) have been observed in the processing of phrasal expressions and non-phrasal “lexical bundles” (Arnon and Snider, 2010; Tremblay et al., 2011), and high frequency frames can facilitate the acquisition and processing of individual component words (Childers and Tomasello, 2001). However, formulaicity research has primarily focused on the facilitation effects on the processing and acquisition of lexical items (Ellis, 2012b; Siyanova-Chanturia and Pellicer-Sanchez, 2018) rather than morphology. The demonstration of effects of formulaicity upon L2 morpheme processing requires more formal control and investigation in a design with greater power than the post-hoc explorations we report here – hence Experiment 2.

Experiment 2

Here we aimed to replicate Experiment 1’s findings on the pattern of morpheme acquisition, and the facilitation effects of morpheme availability and reliability, with a larger sample of participants, with improved stimulus materials, and with a new speaker for the stimulus recordings. Importantly, Experiment 2 also investigated how frequency at the phrasal level, i.e., the frequency of the four-word strings that contained the target morpheme, affects morpheme processing and production. To achieve this, we included the same morphemes as those in Experiment 1 but embedded them in high- and low-frequency four-word strings in the sentences for elicited imitation. Motivated by existing literature on formulaicity and the preliminary results from Experiment 1, we predicted that besides the frequency of the word-forms inflected with the target morpheme, the frequency of the four-word strings in which the morpheme-carrying word-form is embedded would also positively predict morpheme production accuracy in the elicited imitation of sentences.