Paradigms in the Mental Lexicon: Evidence From German

Previous research showed that the mental lexicon is organized morphologically, but the evidence was limited to words that differ only in subphonemic detail. We investigated whether word forms that are related through morphology but have a different stem vowel affect each other's processing. We focused on two issues in two auditory lexical decision experiments. The first is whether the number of morphologically related word forms with the same stem vowel matters. The second is whether the source of similarity matters. Word recognition experiments have shown that word forms that are phonologically embedded and related through inflection speed up each other's recognition, suggesting that the word forms are represented within one unit in the mental lexicon. Research has further shown that words that are related through derivation, but that are phonologically different, are affected in a different way than words that are related through inflection. We conducted two experiments to investigate this further. We used three subtypes of one inflectional class of German nouns, which allowed us to study different word forms with a phonological difference while keeping the morphological relations among the word forms constant. All of these nouns have a plural form that ends in schwa ([-ə]). They differ in the distribution of front and back vowels in the singular, plural and diminutive. This allows us to investigate whether word forms with different phonemes are processed differently with regard to (a) the number of word forms that share a vowel, and (b) the source of the similarity among the word forms: is the processing of word forms related through inflection different from the processing of word forms related through derivation? We found that nonces that are based on word forms with a fronted vowel are mistaken for words when they resemble words in the word family, but not when they are unrelated to words in the word family. This shows that morphological effects in auditory word recognition studies are also found when the word forms differ in a full phoneme. We argue that this can be captured with a network representation, instantiated as a frame.

that reaction times to simplex words are modulated by the frequency of whole complex words, and not by the summed frequency of their individual morphemes. This is true even in agglutinative languages (Lehtonen et al., 2007; Moscoso del Prado Martín et al., 2004; Vannest et al., 2002). This shows that network models are correct in assuming that the mental lexicon is a network of connected nodes; words that share phonological form and meaning through shared morphology are activated simultaneously. But it also shows that complex words are stored as wholes.

Another argument against the centrality of stems in the network model comes from instances of paradigm leveling: members of a paradigm are often adjusted to each other (leveled) in order to make them more similar. An example of such leveling is found in Dutch. In Dutch, [n] is normally not pronounced after [ə]. The infinitive of lopen 'to walk' is pronounced [lopə]. Only under very formal circumstances is it pronounced [lopən] (Booij, 1995). The first person singular present tense is ik loop, pronounced [ɪk lop], and often analyzed as the stem form. However, if an infinitive ends in the sequence [ənə], as in oefenen [ufənə] 'to practice', the first person singular present tense is ik oefen [ɪk ufən], and not *[ɪk ufə] (Koefoed, 1979). Even though this process is correctly described as blocking of [n]-deletion at the end of a verbal stem (Booij, 1995), this description does not explain why the blocking occurs. In nouns there is no such blocking. This can be seen by comparing suffixation of the agentive -aar in ler-aar [lerar] 'teacher', from the verb leren [lerə] 'to teach', with molen-aar [molənar] 'miller', from the noun molen [molə] 'mill'. The agentive suffix appears after the stem form, and the form [molənar] shows that the final [n] is part of the stem [molən]. In the singular, however, this [n] is deleted; [n]-deletion is not blocked in nominal stems.

This raises the question why [n]-deletion affects only nominal stems, and not verbal stems. To answer this question, we propose that the blocking of [n]-deletion in verbs is a case of paradigm leveling; as far as we know, this has not been proposed before. The verbal paradigm of [ufənə] has the plural forms wij, jullie, zij [ufənə], and the [n] after the first [ə] is therefore preserved in the first person singular [ɪk ufən]. The paradigm of nouns such as [molə] does not contain forms with a final [n]. In short, this argument reinforces the case against a central role of stems in the representation of paradigms.

In addition to providing an argument against the centrality of stems, paradigm leveling also highlights the fact that paradigms have structure and should not be represented as a list. In Dutch paradigm leveling, as we have seen above, plural verbal forms asymmetrically affect the singular forms. Such asymmetrical relations have also been observed for morphological features that make up a paradigm (Blevins, 2016; Haspelmath and Sims, 2010; Seyfarth et al., 2014). In German nouns, for example, it has been observed that in some inflectional classes there is a dependency between genitive forms and plural forms, but the reverse [...] (Eisenberg, 2004; Thieroff and Vogel, 2009). Morphological properties sometimes depend on phonological properties (see also Neef, 1998). For example, if a plural ends in [ə], its singular ends in a closed syllable. This is true for words such as Bart 'beard' Bärte 'beard-PL', Boot 'boat' Boote 'boat-PL' and Fest 'party, celebration' Feste 'party, celebration-PL'. The reverse, again, is not always true. Singulars such as Mensch or Staat have a plural that ends in -en: Menschen and Staaten.

In short, the paradigm-as-list model of Ernestus and Baayen (2007b) is insufficient because paradigms are not lists, and the network model of Schriefers et al. (1992) is insufficient because paradigmatic effects go beyond shared stems. A representation of a paradigm needs to capture the dependencies among its word forms. This, then, raises the question of how paradigms can be represented.

Frame representations allow us to capture the dependency effects mentioned above (Barsalou, 1992; Gamerschlag et al., 2013; Löbner, 2014; Petersen and Osswald, 2014). In a frame, the properties of a central node are represented as attribute-value structures. Attributes are functions that return a value. We will now analyze inflectional classes of German nouns as sets of (recursive) attribute-value pairs.

We propose to represent the inflectional classes of German nouns (Eisenberg, 2004; Köpcke, 1988; Thieroff and Vogel, 2009) as frames. The central node of each class is the category noun, and its attributes and their values are the morphological and phonological properties that define an inflectional class. Providing a full overview of all inflectional classes is beyond the scope of this paper. Instead, we provide frame representations of the class of nouns whose plural ends in schwa; these nouns will also be at the heart of our experiments. The frame representations of these nouns are illustrated in figures 1, 2, 3 and 4. Each frame represents one subclass of nouns. The central node (the referential node) is indicated by a double circle, the attributes are set in small caps and their values in italics.

Paradigms in the mental lexicon
The paradigm of the nouns illustrated in figure 1 [...] diminutives shows that they are lexically different from their base. In an overview of the typology of the meaning of diminutives, Jurafsky (1996) finds that, in addition to denoting smallness, diminutives can also denote affection, pejorative meanings or even contempt. This also holds for German diminutives. The word form spelled Bärtchen may refer to a small beard, either to indicate its smallness or to express a measure of contempt. The word form Frauchen, in contrast, can only refer to a woman who owns a pet (usually a dog), irrespective of the size of the woman. The word form Brötchen, as a further example, can only refer to a roll, no matter what its size, and never to a small loaf of bread. As these meanings are partly lexicalized, they must be stored in the mental lexicon.

The change in meaning associated with derived forms, as with diminutives, is analyzed as a shift of the referent from one node to another (Andreou, 2018; Kawaletz and Plag, 2015). This is illustrated in figure 5. The referent of the noun has shifted to the node that contains the value of the attribute HAS-SIZE.
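The referential shift can be made concrete with a small data-structure sketch. The Python below is an illustration under stated assumptions rather than the representation in figures 1 to 5: the class `Node`, the helper names, and the attribute labels SINGULAR, PLURAL and FORM are hypothetical; only HAS-SIZE is taken from the text. A frame is modeled as a node whose attributes map to further nodes, and the referent is identified by a path of attributes starting at the central node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A frame node: a value plus attributes that lead to other nodes."""
    value: str
    attributes: dict = field(default_factory=dict)  # attribute name -> Node


def build_noun_frame(singular, plural, diminutive):
    """Build a toy frame for one noun (attribute labels are illustrative)."""
    size = Node("small", {"FORM": Node(diminutive)})
    return Node("noun", {
        "SINGULAR": Node(singular),
        "PLURAL": Node(plural),
        "HAS-SIZE": size,
    })


def follow(node, path):
    """Apply a sequence of attributes, each acting as a function on a node."""
    for attribute in path:
        node = node.attributes[attribute]
    return node


frame = build_noun_frame("Bart", "Bärte", "Bärtchen")

# Singular and plural share the central node as their referential node.
inflected_referent = follow(frame, [])
# For the diminutive, the referent has shifted to the value of HAS-SIZE.
diminutive_referent = follow(frame, ["HAS-SIZE"])

print(inflected_referent is frame)       # True
print(diminutive_referent is frame)      # False
print(follow(frame, ["PLURAL"]).value)   # Bärte
```

In this sketch the difference between inflection and derivation is nothing more than the path that picks out the referent: the empty path for singular and plural, and a one-step path for the diminutive.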

In figures 1, 2, 3 and 4 a branch with attribute-values for size was omitted to avoid cluttering the representation. The frames in these figures do include such a branch; the crucial difference with the representation of a diminutive as in figure 5 is the referential node, indicated with a double circle. The [...]

To further investigate the role of morphology in word recognition and to test the predictions of our proposed frame representations, we will study the response latencies in a particular type of German noun [...] Bötchen [bøtçən] 'little boat'. We will refer to this group of nouns as Type 2 nouns (see figures 1 and 4).

The nouns in the third subgroup have a front vowel in all three word forms: Fest [fɛst] 'party, celebration', Feste [fɛstə] 'parties, celebrations' and Festchen [fɛstçən] 'little party, little celebration'. We will refer to this group of nouns as Type 3 nouns (see figures 1 and 4).

Table 1. The noun types of the inflectional class in our study. V = Vowel, f = front, b = back.

This class of nouns allows us to address two questions that have arisen from the research summarized [...] we will use words that are morphologically related and differ by a phoneme, rather than in subphonemic duration only. We expect that the recognition of nonces of type 1 (see table 1 above) is affected by their relation to existing word forms that are morphologically related, despite their phonological difference from an existing word. The more easily a nonce is mistaken for a word, the more errors participants will make and the more their response latencies will be affected.

The second set of expectations relates to the structure of the representations of inflected and derived words in the mental lexicon. If these are stored in the mental lexicon as the specific frame proposed in figure 5, in which diminutives have a different referential node than plain nouns, we expect that diminutives exert less influence on singular and plural nouns than singular and plural nouns exert on each other; singular and plural nouns share a referential node. This difference in referential nodes will affect both the accuracy and the response latencies.

We ran two auditory lexical decision tasks and measured the accuracy and response latency to words and nonces. Our method is slightly different from the one used in Ernestus and Baayen (2007b). We did not tell our participants to accept a nonce if it occurs in a word; rather, we asked them to judge whether an item is a word or not.

EXPERIMENT 1
In the first lexical decision test we investigated whether the accuracy and speed with which a nonce with a [...]

The material consisted of 90 German words (they are listed in Appendix III). All material was recorded in the carrier sentence Ich habe X gesagt. 'I said X.' to ensure that the words have comparable prosodies. The words were excised from the sentences with Praat (Boersma and Weenink, 2018).

We used thirty Type 1 words: monosyllabic words with a back stem vowel in the singular (e.g. Bart [bɑɐt] 'beard') and a front vowel in the plural (Bärte [bɛɐtə]) and the diminutive (Bärtchen [bɛɐtçən]). We created thirty nonces by giving the singular a front vowel (e.g. [bɛɐt]). The nonce has the same vowel as two allomorphs in the paradigm of Bart: the plural allomorph and the diminutive allomorph. Apart from the value of the [back] feature, nothing in the word was changed, in order to preserve its syllable structure.

We further used thirty Type 2 words: monosyllabic words with a back vowel in the singular (e.g. Boot [bot] 'boat') and the plural (e.g. Boote [botə]) and a front vowel in the diminutive (e.g. Bötchen [bøtçən]). We created thirty nonces by giving the singular a front vowel (e.g. [bøt]). This nonce has the same vowel as the diminutive.
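The construction of the nonces can be sketched as a substitution on the stem vowel. The Python below is a minimal illustration, not the procedure with which the stimuli were actually prepared: the function name, the list-of-phonemes encoding and the vowel table are assumptions. The pairs ɑ/ɛ and o/ø follow the Bart and Boot examples; the pair u/y is an assumed addition.

```python
# Back vowel -> front counterpart. The first two pairs follow the Bart and
# Boot examples in the text; u -> y is an assumed addition.
FRONT_COUNTERPART = {"ɑ": "ɛ", "o": "ø", "u": "y"}


def make_nonce(phonemes):
    """Return a copy of `phonemes` in which the first back vowel is fronted.

    Only that vowel changes, so the number of segments (and with it the
    syllable structure) is preserved.
    """
    result = list(phonemes)
    for index, segment in enumerate(result):
        if segment in FRONT_COUNTERPART:
            result[index] = FRONT_COUNTERPART[segment]
            return result
    raise ValueError("no back vowel to front")


bart = ["b", "ɑ", "ɐ", "t"]   # Type 1 singular
boot = ["b", "o", "t"]        # Type 2 singular

print("".join(make_nonce(bart)))  # bɛɐt
print("".join(make_nonce(boot)))  # bøt
```

A singular that has no back vowel raises an error in this sketch instead of being returned unchanged, which makes such items easy to spot.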

The last group of thirty words were Type 3 words. They were also monosyllabic and had either front vowels in the singular, plural and diminutive stem or a back vowel in the singular.

In addition we selected 180 existing monosyllabic words as fillers and 180 nonces based on these fillers.

The total number of items was therefore 540. They are all listed in Appendix III. As fillers we used monosyllabic nouns with front vowels from the same inflectional class as the words.

We estimated the effect of frequency on our results, but found no significant differences in frequency among the types of words in our experiments. We therefore provide the details in Appendix I. We also estimated the neighborhood density of the words in our experiment. Here, too, we found no significant differences among the word types, and we provide the details in Appendix II.

We created two lists, A and B, to prevent a sequence of a word and a related filler in the experiment. Half [...]

The experiment was programmed with PsyScope (Cohen et al., 1993) and was carried out in a quiet room at the University of Düsseldorf. The stimulus material was presented over headphones.

The experiment started with 16 practice trials, half of which consisted of words and the other half of pseudo-words that obeyed the phonotactics of German. In the experiment there were 90 words and 90 nonces that we derived from the existing words. In addition we used 180 fillers, again 90 words and 90 pseudo-words.

After this, the experimental items were presented in random order for each participant. Each trial started with a silence of 500 ms, followed by a tone of 500 ms. Then, after a silence of 450 ms, an item was presented and the participants had to decide as quickly as possible whether this was a word or not. The [...]

We first consider the accuracy of the participants' responses to words in order to establish that they understood the task, that is, that they correctly accepted words and did not incorrectly reject them. The raw results are summarized in table 2. The counts in table 2 show that the words of all types were correctly accepted in more than 93% of the cases. [...]

We expected that nonces of type 1 are more likely to be mistaken for a word, because they resemble two existing word forms in the paradigm. We expected that nonces of type 2 are, in comparison to type 1 [...] in the paradigm, they should be easiest to recognize as nonces.

The results for the nonces in table 4 show that nonces of type 1 were incorrectly accepted in 14% of the cases, proportionally more than type 2 and type 3 nonces. [...] Nonces of type 1 were more often mistaken for real words than nonces of type 2 or 3. This analysis, then, confirms that nonces of type 1 are more difficult to reject than nonces of type 2 or 3, as expected. An analysis in which type 2 was designated as the intercept (not shown here) showed that the accuracy of type 2 and type 3 nonces is not statistically different.

We will now present the results of a mixed effects model of the log-transformed reaction times of the correctly judged words in experiment 1. [...]

We will end the presentation of the results of experiment 1 with a mixed effects model of the reaction times to the incorrectly identified nonces in experiment 1. The participants erroneously thought that these were words, and in that case the paradigm may have been activated and may have influenced the reaction times. The number of items over which this analysis was run was very small, though, as the participants made relatively few mistakes.

The results of a linear mixed effects model with the logarithm of the reaction time as dependent variable, Type (type 1, type 2, type 3) as fixed factor, and random slopes for Items and Participants are presented in table 7. [...] appears to be a tendency to react a bit more slowly to type 2 and type 3 nonces.

We expected that nonces of type 1 were more likely to be mistaken for words, because there is enough support for their assumption within the word family of type 1. This expectation turned out to be correct. It was [...] there is either support from a derived word form in the word family (type 2) or no support for the nonce (type 3), and therefore more uncertainty on the part of the participants. The data from the reaction time analysis of nonce items are less conclusive. The participants were so good at rejecting nonce words that we had few data on which to base our analysis. The tendency in the data, though, is that nonces of type 1 are reacted to more slowly than type 2 and 3 nonces (see table 7).

In short, experiment 1 showed that there is evidence for a role of morphological information in word recognition that goes beyond small subphonemic differences among the parts of word forms in a word family (Ernestus and Baayen, 2007b; Schriefers et al., 1992). This evidence is given by a reduced accuracy for nonces that are supported by many forms in the word family. This support provides the participants with mistaken certainty that they are, in fact, dealing with a word. The analysis of the words provides additional support for this interpretation. Type 1 words are processed fastest (see table 3) and most accurately (see table 6) of the types in our experiment. The singular of type 1 activates the associated inflected and derived word forms and thus makes it more likely for a participant to mistakenly think that a nonce form of type 1 is an existing word.

A different interpretation cannot be ruled out without further evidence. As experiment 1 showed no difference between nonces of type 2 and type 3, it may also be that the source of support caused our results, rather than the amount of support. In this interpretation, type 1 nonces are reacted to differently because they are similar to an inflected form in the word family, whereas the nonces of type 2 are related to a derived word and type 3 nonces are not related to any word in the word family.

A second experiment, in which the amount of support for nonces is kept constant, will be able to distinguish these two interpretations.

EXPERIMENT 2
The second experiment was a lexical decision experiment as well. Its aim was to investigate whether the source of similarity among word forms in a word family is relevant. Are nonces processed differently if they resemble an inflected word form than if they resemble a derived word form? If they are, we expect differences in accuracy and response latencies among the nonces of different types, correlating with the source of support for a nonce.

In addition we selected as fillers 180 existing bisyllabic plural words from the same inflectional class as the words, and 180 nonces based on these fillers. The total number of items was therefore 540. They are all listed in the Appendix (section III).

The procedure for experiment 2 was identical to experiment 1.

We first consider the accuracy of the participants. This establishes that the participants understood the task. The data in [...]

The data in table 10 were analyzed in a logistic mixed effects model with accuracy as dependent variable, Type as fixed effect, and random slopes for items and participants. The analysis confirms that nonces of type 1 and 2 are judged equally accurately, whereas nonces of type 3 are judged with greater accuracy, as is illustrated in table 11.

We expected that the source of support mattered and that nonces that are supported by an inflected form are treated differently from nonces that have support from a diminutive. It turns out, though, that nonces of type 1 and type 2 are both mistaken for words to the same extent, but differently from type 3.

[...] logarithm of the reaction time as dependent variable and Type (type 1, type 2, type 3) is presented in table 12. Items and Participants were given random slopes. Words of type 2 are reacted to more slowly than words of type 1, and words of type 3 are reacted to a bit more slowly, but not significantly so, than words of type 1. An analysis in which the fixed factor was releveled so as to make type 2 the intercept (not shown here) showed that the difference between type 2 and type 3 words is not significant. The reaction time data, too, show that type 1 and type 2 are different from type 3 words.

We also analyzed the accuracy data of incorrectly accepted nonces, which we presented in table 10.

The participants erroneously thought that these were words, and in that case the paradigm may have been activated and may have influenced the reaction times.

The results of a linear mixed effects model with the logarithm of the reaction time as dependent variable, Type (type 1, type 2, type 3) as fixed factor, and random slopes for Items and Participants are presented in table 13. The analysis shows that the reaction times to items of type 2 and 3 are slightly, but significantly, faster than reaction times to items of type 1. [...] analysis indicate that having support from an inflected form in the word family makes the reaction times slower than having support from a derived form or no support at all.

In combination with the analysis of accuracy, the data indicate that participants are [...] the accuracy of their judgements is not affected by the source of support for a nonce (table 11) [...] than inflected forms affect derived forms (singulars and plurals as opposed to diminutives).

We expected that, if morphology plays a role in word recognition, the nonces with support from word forms in the word family would be more likely to be mistaken for a word. As a consequence, such a nonce would be more likely to be erroneously accepted as a word (type 1 nonces in experiment 1). Moreover, we expected that the source of support would affect the reaction times and the accuracy of judgements of nonces, since we hypothesize that not all word forms in a word family affect each other to the same extent.

These expectations were borne out. Participants were more likely to mistake a nonce for a word if the phonological makeup of a nonce was supported by two word forms in the word family (see tables 4 and 7).

However, as the participants made relatively few mistakes, the reaction time data do not allow us to draw a firm conclusion, even though the tendency in the data hints at a faster decision in case a nonce is supported by two forms in the word family. We extend the results of Schriefers et al. (1992) and Ernestus and Baayen (2007b) by showing that even word forms that differ in one phoneme affect each other's response latencies, provided they are morphologically related.

In a second lexical decision experiment we assessed whether a derived item exerts less influence on an inflected item than inflected items exert on each other. We expected that a nonce that resembles an inflected form would be more likely to be mistaken for a word than a nonce that resembles a derived form (type 1 and 2 nonces in experiment 2). Moreover, we expected that the response latencies of incorrect reactions to a nonce that resembles an inflected form would differ from the response latencies of incorrect reactions to a nonce that resembles a derived form (type 1 and 2 nonces in experiment 2).

The expectations were partially borne out. Nonces that are similar to an inflected word are mistaken for a word as often as nonces that are similar to a derived word. This shows that derived words do indeed influence inflected words and that inflected words influence each other, but not that the strength of the influence is determined by the source of the influence. However, the response latencies show that a nonce that has support from an inflected form (nonces of type 1) takes longer to be erroneously accepted as a word than a nonce that has support from a derived form (type 2) or a nonce that has no support (type 3).

In combination, the results show that morphologically related word forms that differ in a vowel phoneme affect each other, and that the influence of word forms in a paradigm is not equal: inflected word forms exert a stronger influence on each other than a derived word form exerts on an inflected word form. In short, the results of experiments 1 and 2 together suggest that the frame representation proposed in figure 5 is on the right track.

These results are reflected in the frame representations (see figures 1, 2, 3, 4 and 5): inflected forms share a central node and influence each other more strongly. The influence of derived words on inflected words is smaller because they do not share a central node with inflected word forms.

Ernestus and Baayen (2007b) showed that both inflected words and derived words influence each other, but their items were almost identical and differed only in subphonemic detail. This, it may turn out, is a crucial difference with our study. In order for derived forms to exert a greater influence on inflected forms, it may be necessary for them to resemble the inflected words not only semantically, but also phonologically and phonetically. This would also extend to the results of Schriefers et al. (1992).

Our results support network models in which word forms are organized according to morphological affiliation, and phonological and semantic similarity. We have made the morphological organization more specific to include the difference between inflection and derivation as a difference between the referential node within a concept. In processing, this difference is reflected by the fact that the influence of inflected words on each other is stronger than the influence of derived forms on inflected forms. Moreover, we have provided an argument to further incorporate word families in models of word recognition.

CONCLUSION
Our experiments provided further evidence that the mental lexicon is organized along morphological lines.

Much evidence in the literature shows that derived word forms themselves form networks of related derived [...] network. This provides evidence for a network of paradigmatic relations, which we represented as a frame in figures 1, 2, 3, 4 and 5. Inflectionally related forms share a referential node, while in derived words the referential node is a different one.

CONFLICT OF INTEREST STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We did not find frequency information for all words; in fact, for 21% of our data we did not find frequency information (we did not find frequency information for 32% of Type 1 nouns, 19% of Type 2 nouns and 13% of Type 3 nouns).

With these caveats in mind, we calculated a regression model with Number (singular or plural) and Type (Type 1, Type 2 and Type 3) as predictors of the frequency per million. As can be seen in [...]

In short, the frequencies of the three types of nouns in our experiments are comparable, and any effect that we may find is attributable to factors other than (or, perhaps more accurately, in addition to a similar) frequency effect.

APPENDIX II. NEIGHBORHOOD DENSITY
An inhibitory effect is found among words that are phonologically or phonetically similar, and which do not stand in a morphological relationship to each other. The similarity among words can be measured in several ways (Gahl and Strand, 2016), but often it is done in terms of phonemes. Words that differ by one phoneme are called neighbors (Luce, 1985; Gahl and Strand, 2016). For example, the words sling and fling are neighbors. Response latencies to words with many neighbors are slower than response latencies to words with few neighbors (Luce, 1985; Luce and Pisoni, 1998; Pisoni et al., 1985). To ensure that the effects we found can indeed be ascribed to morphology and not to an effect of neighborhood density, we calculated the neighborhood density of our items. We created a data set of German word forms of nouns, verbs and adjectives by extracting 355,625 nouns from the CELEX corpus (Baayen et al., 1995; Shaoul and Tomaschek, 2013). We then created a list that contained all words that we used in our experiments: all singulars, plurals and diminutives. We then used the data set to calculate, for each word in our experiment, how many neighbors it had, using Hall et al. (2015). For each word in our experiment we counted as a neighbor each word in the data set that differed by one phoneme from the experimental word (Vitevitch and Luce, 1999; see footnote 3). For example, we found that Krug 'mug' has 4 neighbors: Krugs 'mug GEN', trug 'bear PST', klug 'smart' and Krieg 'war'. We then used the density in a regression analysis. The density of plurals and singulars is higher than the density of diminutives, but other than that the densities are comparable. It is therefore unlikely that differences in neighborhood density among our words have contributed much to our results.
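The neighbor count described in this appendix can be sketched as follows. This is a minimal Python illustration, not the software used in the study: the function names, the toy lexicon and its phonemic transcriptions are assumptions made for the example. Two forms count as neighbors if they differ by exactly one phoneme substitution, insertion or deletion; insertion and deletion are included because Krugs is listed as a neighbor of Krug.

```python
def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by exactly one substitution,
    insertion or deletion."""
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one phoneme from the longer form must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def neighborhood_density(word, lexicon):
    """Count the entries in `lexicon` that are neighbors of `word`."""
    target = lexicon[word]
    return sum(is_neighbor(target, form)
               for other, form in lexicon.items() if other != word)


# Toy lexicon with assumed transcriptions (not taken from CELEX).
LEXICON = {
    "Krug":  ("k", "ʁ", "uː", "k"),
    "Krugs": ("k", "ʁ", "uː", "k", "s"),
    "trug":  ("t", "ʁ", "uː", "k"),
    "klug":  ("k", "l", "uː", "k"),
    "Krieg": ("k", "ʁ", "iː", "k"),
    "Boot":  ("b", "oː", "t"),
    "Fest":  ("f", "ɛ", "s", "t"),
}

print(neighborhood_density("Krug", LEXICON))  # 4
```

Representing each form as a tuple of phonemes rather than as a string keeps long vowels such as uː as single segments, so that a change in vowel length or quality counts as one difference.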

APPENDIX III. MATERIAL
Footnote 3: There are other methods of establishing neighborhoods (Gahl and Strand, 2016); we tried them, but our results remained the same.