Semantic Richness Effects in Syntactic Classification: The Role of Feedback

Words with richer semantic representations are recognized faster across a range of lexical processing tasks. The most influential account of this finding is based on the idea that semantic richness effects are mediated by feedback from semantic-level to lower-level representations. In an earlier lexical decision study, Yap et al. (2015) tested this claim by examining the joint effects of stimulus quality and four semantic richness dimensions (imageability, number of features, semantic neighborhood density, semantic diversity). The results of that study showed that the joint effects of stimulus quality and richness were generally additive, consistent with the idea that semantic feedback does not typically reach the earliest levels of representation in lexical decision. The present study extends this earlier work by investigating the joint effects of stimulus quality and a comparable set of four semantic richness dimensions on syntactic classification performance (is this a noun or a verb?), a task that places relatively more emphasis on semantic processing. Additive effects of stimulus quality and richness were found for two of the four targeted dimensions (concreteness, number of features), while semantic neighborhood density and semantic diversity did not appear to influence syntactic classification response times. These findings provide further evidence against the view that semantic information reaches early letter-level processes.


INTRODUCTION
In order to understand the mechanisms and processes that support reading, researchers have examined the effects of a myriad of word properties on lexical processing performance (see Yap and Balota, 2015, for a review). However, although the ultimate goal of reading is comprehension, the visual word recognition literature has traditionally been dominated by studies that consider the influence of orthographic (e.g., bigram frequency, word length, frequency of occurrence, orthographic neighborhood density), phonological (e.g., regularity, consistency), and morphological (e.g., morphological family size, derivational and inflectional entropy) characteristics on tasks such as lexical decision (i.e., discriminating between words and nonwords such as FLIRP) and speeded pronunciation (i.e., reading letter strings aloud). In addition to these variables, there is increasing evidence that semantic richness (i.e., the extent to which words are associated with relatively more semantic information) is also an important predictor of word recognition performance (see Pexman, 2012, for a review).
Across standard lexical processing paradigms, including lexical decision, speeded pronunciation, perceptual identification (i.e., identifying visually degraded stimuli), and semantic decision (e.g., classifying words as animate or inanimate), it is now well established that semantically rich words are generally recognized more quickly and accurately (see, e.g., Yap et al., 2012). We should point out here that semantic richness should not be considered a unitary construct; it is instead most appropriately reflected by a number of dimensions that map onto distinct theoretical perspectives.
These dimensions include, but are not limited to, the number of semantic features participants associate with a word's referent (e.g., COW's features include has four legs, eats grass, produces milk; McRae et al., 2005), its semantic neighborhood density (Shaoul and Westbury, 2010), its number of senses (Miller, 1990; Hoffman et al., 2013), the number of distinct first associates elicited by the word in free association (Nelson et al., 1998), imageability or concreteness (the extent to which the word evokes mental imagery; Cortese and Fugett, 2004; Brysbaert et al., 2014), body-object interaction (the extent to which a human body can interact with the word's referent), sensory experience ratings (the extent to which a word evokes a sensory or perceptual experience; Juhasz and Yap, 2013), modality-specific perceptual strength (the extent to which a word's referent is experienced through the five senses; Lynott and Connell, 2009; Connell and Lynott, 2014), and emotional valence (i.e., whether a word is positive, negative, or neutral; Yap and Seow, 2014). While investigators typically focus on one semantic richness variable at a time, there have been attempts to characterize the relative predictive power of different dimensions. For instance, Yap et al. (2012) compared the influence of number of features, number of senses, semantic neighborhood density, imageability, and body-object interaction across multiple word recognition tasks. While every variable produced significant effects in at least one task, only the effects of imageability and number of features were reliable (or borderline reliable) across all tasks, suggesting that imaginal and featural aspects may be weighted relatively more heavily in a word's semantic representation.

RICHNESS EFFECTS THROUGH SEMANTIC FEEDBACK
Collectively, the foregoing findings converge on the idea that the word recognition system has access to a word's meaning before a word is fully identified (Balota, 1990). The theoretical framework most commonly used to explain this finding is an embellished version of the interactive activation and competition (IAC) model of letter perception (McClelland and Rumelhart, 1981). The IAC model contains processing nodes that are organized at three levels (features, letters, words) and is both interactive (i.e., activation can flow bidirectionally between levels) and cascaded (i.e., as soon as processing at a level begins, it sends activation to the next level). Cascaded processing contrasts with thresholded processing, in which a later process begins only after an earlier process is complete. By augmenting the standard IAC model with meaning-level representations, Balota (1990; see also Balota et al., 1991) suggested that semantic influences in word recognition can be accommodated by feedback from semantic-level to lexical-level (i.e., word-level) representations. Specifically, semantically richer words (e.g., words with many semantic features) generate more semantic-level activity, thereby producing stronger feedback to lexical-level units. If one further assumes that lexical decision and speeded pronunciation responses are driven by lexical-level orthographic and phonological activity, respectively, the semantic feedback received by lexical-level units will consequently speed up lexical decision and pronunciation times (Hino and Lupker, 1996; Pexman et al., 2002).
Although studies have explored feedback from semantic- to lexical-level representations and from phonological to orthographic representations (Pexman et al., 2001), the extent to which semantic richness effects are mediated by word-to-letter feedback is less well understood. The top-down influence of word- on letter-level representations is an integral assumption of McClelland and Rumelhart's (1981) IAC model, and it remains a fundamental aspect of the field's most influential word recognition models, including the dual-route cascaded (DRC) model (Coltheart et al., 2001), the multiple read-out model (Grainger and Jacobs, 1996), the bimodal interactive activation framework (Grainger et al., 2005), and the CDP+ and CDP++ models (Perry et al., 2007, 2010). On a related note, the interaction between semantic priming and target degradation (i.e., stronger semantic priming when targets are visually degraded; e.g., Balota et al., 2008) has also been explained in terms of semantic feedback to letter-level representations by way of lexical-level representations (McNamara, 2005).
To test the assumption that meaning-level information reaches the letter level, Yap et al. (2015) used the lexical decision task to investigate the joint effects of stimulus quality (clear vs. degraded) and four richness dimensions (imageability, number of features, semantic neighborhood density, and ambiguity), which map onto distinct and influential theoretical perspectives (see Pexman, 2012, for more discussion). Presenting words in a degraded manner slows down early feature- and letter-level processing (Blais et al., 2011), and an interaction between stimulus quality and another factor (e.g., semantic richness) indicates that the factor exerts an influence on an early processing locus (see Sternberg, 1998, for more discussion of additive factors logic). If semantic richness effects partly reflect semantic feedback that pre-activates letter-level representations, the most straightforward prediction is that the deleterious impact of visual degradation should be smaller for words that are semantically richer. Interestingly, Yap et al. (2015) did not observe this pattern. Instead, they found robust additive effects of stimulus quality and richness (i.e., two main effects and no interaction) for the targeted dimensions. In other words, degradation effects were equivalent in magnitude for words that were high and low in semantic richness. In light of these findings, Yap et al. (2015) suggested that semantic feedback does not appear to reach earlier levels of representation in lexical decision. Additionally, accommodating the additive effects of stimulus quality and richness seems to require a more complex theoretical account wherein activation is thresholded at the letter level but cascaded from the lexical level onwards (Besner and Roberts, 2003; Reynolds and Besner, 2004).
Yap et al. (2015) also proposed that their findings are consistent with a flexible lexical processor (Balota and Yap, 2006) that adaptively modulates the processing dynamics of early word recognition processes (i.e., whether letter-level processing is cascaded or thresholded) in response to task contexts and demands. In lexical decision, the ultimate goal of the participant is to discriminate between familiar/meaningful real words and unfamiliar/meaningless nonwords, making familiarity an important dimension for word-nonword discrimination (Balota and Chumbley, 1984). Stimulus degradation may undermine such familiarity-based information, and thresholding the letter output helps to recover the familiarity signal by perceptually normalizing degraded stimuli.

THE PRESENT STUDY
The results from Yap et al. (2015) show quite clearly that the effects of stimulus quality and richness are additive in lexical decision. However, as discussed above, this pattern may be idiosyncratic to lexical decision, given the task's emphasis on familiarity-based information. The first goal of the present study was to explore whether the additive effects of stimulus quality and richness generalize to a syntactic classification task (is this word a noun or a verb?), a task that demands more extensive consideration of the word's meaning (see Sidhu et al., 2014, for more discussion of task demands). The richness dimensions of interest are similar to those studied in Yap et al. (2015) and include concreteness, number of features, semantic neighborhood density, and ambiguity. Experiment 1 examines the effects of concreteness and number of features, while Experiment 2 examines the effects of semantic neighborhood density and ambiguity.
Unlike lexical decision, which is primarily driven by the familiarity of the orthographic code (Balota and Chumbley, 1984), syntactic classification reflects the ease with which semantic coding can be completed (Hino et al., 2006). If letter-level thresholding is indeed a flexible and adaptive consequence of lexical decision's heavy reliance on familiarity-based information, then such thresholding (and its attendant additive effects) may be absent in the syntactic classification task, which places less emphasis on orthographic familiarity. Instead, one might predict an interaction between stimulus quality and semantic richness in syntactic classification, with smaller degradation effects for richer targets. We should also point out that uninflected verbs were used as stimuli in the present study, so "verbness" cannot be assessed simply by a superficial check for diagnostic morphemes or suffixes. Instead, participants need to judge whether a word's meaning denotes an action or an entity, which likely requires more semantic processing than standard lexical decision.
More importantly perhaps, there is evidence that the nature of semantic richness effects varies across tasks. For example, there is a theoretically intriguing dissociation in the literature, whereby ambiguous words are associated with a processing advantage in lexical decision (Borowsky and Masson, 1996) but a processing disadvantage in semantic decision (Piercey and Joordens, 2000). Multiple meanings produce greater feedback from semantic- to lexical-level representations, which is helpful in lexical decision. However, in a task that relies more heavily on the semantic code, multiple meanings can slow responses down due to one-to-many mappings from orthography to semantics (Borowsky and Masson, 1996), greater competition between different meanings (Grainger et al., 2001), or competition between the activated meanings and the required response (Pexman et al., 2004). Thus far, task dissociations have been studied at the level of main effects. For example, Hino et al. (2002) examined how the main effect of semantic ambiguity varied across three lexical processing tasks (lexical decision, speeded naming, and semantic categorization). Our second goal is to determine whether similar dissociations are observable for the joint effects of stimulus quality and the different semantic richness dimensions.
In order to characterize our effects in a more fine-grained manner, we will examine our data both at the level of mean response times (RTs) and at the level of RT distributional characteristics (Balota and Yap, 2011). Specifically, empirical RT distributions will be fitted to the ex-Gaussian function (Heathcote et al., 1991), a convolution of a normal and an exponential distribution. Such an analysis yields three parameter estimates: µ and σ (mean and standard deviation of the normal distribution) and τ (mean of the exponential distribution). Along with quantile plots, which provide a graphical representation of distributional effects, ex-Gaussian analysis helps determine the extent to which semantic richness effects in syntactic classification are reflected by distributional shifting (µ) and/or an increase in the tail of the distribution (τ). More relevant to the present study, there is evidence that spurious additivity in means can be driven by opposing interactive effects in the underlying distribution. For example, Yap et al. (2008) found additive effects of stimulus quality and word frequency at the level of the mean that were due to the combination of an overadditive interaction in µ (reflecting modal RTs) and an underadditive interaction in τ (reflecting the slowest RTs). The distributional analyses will therefore help us to rule out such trade-offs in our data. More broadly, the present analyses help extend our earlier work by providing complementary insights into the influence of semantic richness in a task that places greater weight on semantic processing.
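As a minimal sketch of the ex-Gaussian described above (not the QMPE fitting procedure used in the study), the Python below evaluates the density, uses the fact that the ex-Gaussian mean is µ + τ, and shows with hypothetical cell values how an overadditive interaction in µ and an underadditive interaction in τ can cancel in the means. The function names and all numbers are illustrative assumptions, not data from this study.

```python
import math


def exgauss_pdf(x, mu, sigma, tau):
    """Ex-Gaussian density: a Normal(mu, sigma) variable plus an independent
    exponential variable with mean tau."""
    z = (x - mu) / sigma - sigma / tau
    normal_cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z
    return math.exp((mu - x) / tau + sigma ** 2 / (2.0 * tau ** 2)) * normal_cdf / tau


def exgauss_mean(mu, tau):
    """Mean of the ex-Gaussian: the normal mean plus the exponential mean."""
    return mu + tau


def interaction_contrast(cells):
    """Degradation effect for low-richness words minus degradation effect for
    high-richness words: 0 = additive, > 0 = overadditive, < 0 = underadditive."""
    low = cells[("degraded", "low")] - cells[("clear", "low")]
    high = cells[("degraded", "high")] - cells[("clear", "high")]
    return low - high


# Hypothetical parameter values in ms (illustrative only).
mu = {("clear", "high"): 550, ("clear", "low"): 580,
      ("degraded", "high"): 610, ("degraded", "low"): 660}
tau = {("clear", "high"): 200, ("clear", "low"): 240,
       ("degraded", "high"): 280, ("degraded", "low"): 300}
mean_rt = {cell: exgauss_mean(mu[cell], tau[cell]) for cell in mu}

print(interaction_contrast(mu), interaction_contrast(tau), interaction_contrast(mean_rt))
# prints: 20 -20 0
```

Because the mean is simply µ + τ, any interaction contrast on the means equals the sum of the µ and τ contrasts, which is why a null interaction in mean RTs can hide opposing distributional effects.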

EXPERIMENT 1
Method

Participants
Thirty-two undergraduates from the University of Calgary participated for partial course credit. Participants reported in a pre-screening survey that their first language was English; they also had normal or corrected-to-normal vision.

Design
Two 2 × 2 designs were incorporated within the same experiment, with non-overlapping items used to examine the effects of each variable. Specifically, we examined stimulus quality (clear or degraded) × concreteness (high or low) and stimulus quality × number of features (high or low). All variables were manipulated within-participants and the dependent variables were RTs and accuracy rates.

Stimuli
A total of 240 nouns were selected, with 120 words (60 high and 60 low) each for concreteness and number of features. To determine whether a word is a noun, we examined its part of speech in the English Lexicon Project (http://elexicon.wustl.edu) and selected words that were coded only as NN (i.e., noun); we avoided words (e.g., CAN) that can be used as both a noun and a verb. Concreteness ratings were based on the norms collected by Brysbaert et al. (2014). Number of features values were taken from McRae et al. (2005). Word sets in each of the experimental conditions were matched on number of letters, number of syllables, number of morphemes, orthographic neighborhood size, and log-transformed subtitle-based contextual diversity (Brysbaert and New, 2009; see Table 1 for descriptive statistics). Additionally, words in the two levels of concreteness were matched on semantic diversity and semantic neighborhood size¹, while words in the two levels of number of features were matched on concreteness, semantic diversity, and semantic neighborhood size. Using the Match program (Van Casteren and Davis, 2007), an additional 240 verbs (120 for each semantic richness dimension) were selected from the English Lexicon Project to serve as distracters; these were matched as closely as possible to the nouns on number of letters, number of syllables, orthographic neighborhood size, and frequency. While there was no significant difference (ps > 0.2) between nouns and verbs on number of letters, orthographic neighborhood size, and number of syllables, nouns (M = 2.21) were slightly higher in frequency than verbs (M = 2.08). We should also note that verbs and nouns were not explicitly matched on semantic properties (e.g., concreteness); this will be further addressed in the General Discussion.

Procedure
Computers running E-Prime software (Schneider et al., 2001) were used to present stimuli and collect data. Participants were tested individually in sound-attenuated cubicles and were positioned about 60 cm from the monitor. They were instructed to decide whether each word was a noun or a verb by pressing the appropriate button (slash key for nouns, Z key for verbs). Participants were encouraged to respond quickly but not at the expense of accuracy. The 20 practice trials were followed by six experimental blocks of 80 trials each, with breaks between blocks. The order in which stimuli were presented was randomized anew for each participant. Stimuli were presented in uppercase 18-point Courier New, and each trial comprised the following events: (a) a fixation point (+) at the center of the monitor for 400 ms, (b) a blank screen for 200 ms, and (c) the target. The target remained on the screen for 4000 ms or until a response was made. If a response was incorrect, a 170 ms tone was presented simultaneously with the word "Incorrect", which was displayed slightly below the fixation point for 450 ms. The same degradation procedure used in Yap et al. (2015) was adopted: half the targets were degraded by rapidly alternating the letter string with a randomly generated mask of the same length. For example, the mask @$#&% was presented for 14 ms, followed by a five-letter target word for 28 ms, and the two alternated rapidly until a response was detected. Mask patterns were constant within a trial and were generated from random permutations of the following symbols: &@?!$*%#?. Across participants, targets were counterbalanced across degraded and clear conditions.

Note to Table 1: Frequency, log10-transformed subtitle contextual diversity (Brysbaert and New, 2009); Orthographic neighborhood size, number of words that can be formed by substituting a single letter in the target word (Coltheart et al., 1977); ARC, semantic neighborhood density (Shaoul and Westbury, 2010).
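The alternating-mask degradation procedure described in the Procedure section can be sketched as follows. This is an illustrative Python sketch, not the E-Prime script used in the study: the symbol pool (the listed symbols with the repeated "?" dropped), the random seed, and the helper names `make_mask` and `alternation_schedule` are assumptions. Only the 14 ms mask, 28 ms target, and 4000 ms timeout come from the text.

```python
import random

# Assumed symbol pool: the symbols listed in the Procedure, with the repeated "?" dropped.
SYMBOLS = "&@?!$*%#"
MASK_MS, TARGET_MS, TIMEOUT_MS = 14, 28, 4000  # durations stated in the text


def make_mask(word, rng):
    """Mask of the same length as the word, built from random permutations of SYMBOLS."""
    chars = []
    while len(chars) < len(word):
        chars.extend(rng.sample(SYMBOLS, len(SYMBOLS)))  # one random permutation
    return "".join(chars[:len(word)])


def alternation_schedule(word, mask, timeout_ms=TIMEOUT_MS):
    """(onset_ms, stimulus, duration_ms) frames: the same mask and the target alternate
    until the timeout; in the experiment a response would end the trial earlier."""
    frames, t = [], 0
    while t < timeout_ms:
        for stimulus, duration in ((mask, MASK_MS), (word, TARGET_MS)):
            if t >= timeout_ms:
                break
            frames.append((t, stimulus, duration))
            t += duration
    return frames


rng = random.Random(2015)  # arbitrary seed for reproducibility
mask = make_mask("HOUSE", rng)
frames = alternation_schedule("HOUSE", mask)
print(mask, frames[:3])
```

Generating the mask once and reusing it for every mask frame mirrors the statement that mask patterns were constant within a trial.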

Results and Discussion
Trials with response errors (7.9% of trials) were first excluded from the analyses. Noun responses faster than 200 ms or slower than 3000 ms (0.8% of responses) were then eliminated, after which a mean and standard deviation were computed for each participant as a function of stimulus quality. RTs more than 2.5 SDs from each participant's mean were excluded, removing a further 2.0% of the responses. Estimates for the ex-Gaussian parameters (µ, σ, τ) were obtained using the quantile maximum likelihood estimation (QMLE) procedure in the QMPE program (Version 2.18; Cousineau et al., 2004). All fits converged successfully within 250 iterations. The mean RTs, accuracy rates, and ex-Gaussian parameters are presented in Table 2. Using the lme4 package (Bates et al., 2015), RT effects were analyzed using linear mixed effects (LME) models, while accuracy effects were analyzed using generalized linear mixed models (GLMMs); p-values for fixed effects were obtained using the lmerTest package (Kuznetsova et al., 2016). The main and interactive effects of stimulus quality and semantic richness were treated as fixed effects. Effect coding was used: clear and degraded words were coded as −0.5 and 0.5, respectively, and words high and low on semantic richness were coded as −0.5 and 0.5, respectively. Random intercepts for participants and targets, along with by-participant and by-target random slopes for stimulus quality, were included in each model. Where the models could converge, the by-participant random slope for the relevant semantic richness variable was also included.

Table 3 presents the results for the joint effects of stimulus quality with concreteness. For RTs, the main effects of concreteness (p < 0.001) and stimulus quality (p < 0.001) were both significant. RTs were faster for high-concreteness (M = 864 ms) than for low-concreteness (M = 1029 ms) nouns, and faster for clear (M = 877 ms) than for degraded (M = 1015 ms) nouns.
The stimulus quality × concreteness interaction was not significant. Comparing the additive model (two main effects) to the interactive model (two main effects and an interaction) did not reveal a significant difference in their likelihood, χ²(1) = 0.327, ns. For accuracy, only the main effect of concreteness (p < 0.001) was significant; accuracy was higher for high-concreteness (M = 0.95) than for low-concreteness (M = 0.86) nouns.
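The trimming sequence described at the start of this section (errors first, then absolute cutoffs, then a 2.5 SD criterion computed per participant and stimulus-quality condition) can be sketched as a small Python function. The trial-record layout (dictionaries with hypothetical `participant`, `quality`, `category`, `rt`, and `correct` keys), the use of the sample standard deviation, and the handling of single-trial cells are assumptions; the mixed-effects models themselves were fitted with lme4 in R and are not reproduced here.

```python
from statistics import mean, stdev


def trim_rts(trials, low_ms=200, high_ms=3000, sd_cut=2.5):
    """Return the trials that survive the three trimming steps, applied in order."""
    # Step 1: keep correct responses to nouns only.
    kept = [t for t in trials if t["correct"] and t["category"] == "noun"]
    # Step 2: absolute cutoffs (RTs below 200 ms or above 3000 ms are dropped).
    kept = [t for t in kept if low_ms <= t["rt"] <= high_ms]

    # Step 3: mean and SD per participant x stimulus-quality cell,
    # computed on the trials that survived steps 1 and 2.
    cells = {}
    for t in kept:
        cells.setdefault((t["participant"], t["quality"]), []).append(t["rt"])
    cell_stats = {key: (mean(rts), stdev(rts))
                  for key, rts in cells.items() if len(rts) >= 2}

    result = []
    for t in kept:
        stats = cell_stats.get((t["participant"], t["quality"]))
        # Cells with fewer than two trials (no SD) are left untrimmed in this sketch.
        if stats is None or abs(t["rt"] - stats[0]) <= sd_cut * stats[1]:
            result.append(t)
    return result


def make_trial(rt, correct=True, category="noun"):
    """Hypothetical trial record: participant 1, clear condition."""
    return {"participant": 1, "quality": "clear", "category": category,
            "rt": rt, "correct": correct}


trials = [make_trial(rt) for rt in [780, 790, 800, 810, 820] * 4]
trials += [make_trial(2900),                  # inside 3000 ms, caught by the SD step
           make_trial(150),                   # removed by the 200 ms cutoff
           make_trial(800, correct=False),    # error trial
           make_trial(800, category="verb")]  # verb trial, not analyzed
kept = trim_rts(trials)
print(len(kept), max(t["rt"] for t in kept))
# prints: 20 820
```

Applying the absolute cutoffs before computing the cell means keeps extreme values such as the 150 ms trial from distorting the SD used in the final step.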
To illustrate these effects graphically, the mean quantiles (0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85) for the different experimental conditions are plotted in Figure 1. In the top two panels of the figure, the empirical quantiles are represented by data points and error bars, while the theoretical quantiles for the best-fitting ex-Gaussian distribution are represented by lines. The bottom panel of the figure represents concreteness effects as a function of stimulus quality. In general, the empirical data were well captured by the ex-Gaussian parameters; empirical and theoretical quantiles generally did not diverge by more than one standard error.

Table 4 presents the results for the joint effects of stimulus quality with number of features. For RTs, the main effects of number of features (p = 0.019) and stimulus quality (p < 0.001) were both significant. RTs were faster for nouns with more features (M = 838 ms) than for nouns with fewer features (M = 879 ms); they were also faster for clear (M = 789 ms) than for degraded (M = 928 ms) nouns. The stimulus quality × number of features interaction was not significant. Comparing the additive model to the interactive model did not reveal a significant difference in their likelihood, χ²(1) = 0.758, ns. For accuracy, none of the effects were statistically significant.
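The model comparisons above are likelihood-ratio tests: twice the difference in log-likelihood between the interactive and additive models is referred to a χ² distribution with 1 degree of freedom (the one extra parameter is the interaction). A minimal Python sketch, assuming only the reported statistics, recovers the corresponding p-values from the identity P(χ²₁ > x) = erfc(√(x/2)); the helper names are hypothetical.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic: twice the log-likelihood gained by the extra term."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_df1_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is a squared standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if stat < 0:
        raise ValueError("a likelihood-ratio statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))


# Statistics reported above for the stimulus quality x richness interaction terms.
reported = {"concreteness": 0.327, "number of features": 0.758}
for dimension, stat in reported.items():
    print(f"{dimension}: chi2(1) = {stat}, p = {chi2_df1_pvalue(stat):.2f}")
```

Both p-values are well above 0.05, consistent with the "ns" entries in the text.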

Summary
In Experiment 1, reliable additive effects of stimulus quality and semantic richness were observed in RTs. That is, responses were faster for clear nouns and for semantically rich nouns, but richness effects were not statistically different in magnitude for clear and degraded nouns. The supplementary distributional analyses indicated that the stimulus quality × semantic richness interaction was not significant for any ex-Gaussian parameter, confirming that the additive patterns in mean RTs were not qualified by trade-offs between distributional parameters. There are a couple of other noteworthy observations. In Yap et al.'s (2015) lexical decision study, richness effects were generally mediated by a combination of distributional shifting (µ) and an increase in the tail of the distribution (τ). In the present study, while this pattern was indeed observed for concreteness effects, the influence of number of features was predominantly reflected in τ. Interestingly and unexpectedly, the main effect of concreteness (M = 165 ms) was much larger than the main effect of number of features (M = 41 ms); we will comment on this further in the General Discussion.
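One common way to obtain mean quantiles like those plotted in Figure 1 is to compute the quantiles separately for each participant in each condition and then average them across participants (often called vincentizing). The text does not spell out the exact quantile estimator, so the Python sketch below assumes the "inclusive" linear-interpolation method of the standard library's `statistics.quantiles`; the function names and toy RTs are hypothetical.

```python
from statistics import mean, quantiles

# Quantile probabilities plotted in Figure 1.
PROBS = (0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85)


def participant_quantiles(rts):
    """Quantiles of one participant's RTs in one condition.

    quantiles(..., n=20) returns the 19 cut points at 0.05, 0.10, ..., 0.95,
    so probability p sits at index round(p * 20) - 1.
    """
    cuts = quantiles(rts, n=20, method="inclusive")
    return [cuts[round(p * 20) - 1] for p in PROBS]


def mean_quantiles(rts_by_participant):
    """Average each quantile across participants (one RT list per participant)."""
    per_participant = [participant_quantiles(rts) for rts in rts_by_participant]
    return [mean(values) for values in zip(*per_participant)]


# Toy RTs (ms) for two hypothetical participants in one condition.
toy = [list(range(500, 1001, 5)), list(range(600, 1101, 5))]
print(mean_quantiles(toy))
# prints: [625.0, 675.0, 725.0, 775.0, 825.0, 875.0, 925.0, 975.0]
```

Averaging quantiles rather than pooling all RTs keeps each participant's distribution shape intact, which is what makes the plotted quantiles comparable with the ex-Gaussian fits.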

EXPERIMENT 2
Method

Participants
Thirty-two undergraduates from the University of Calgary participated for partial course credit. Participants reported in a pre-screening survey that their first language was English; they also had normal or corrected-to-normal vision. Participants who had taken part in Experiment 1 were not allowed to participate in Experiment 2.

Design
As in Experiment 1, two 2 × 2 designs were incorporated within the experiment: Stimulus Quality × Semantic Neighborhood Density (dense or sparse) and Stimulus Quality × Ambiguity (high or low). All variables were manipulated within participants, and the dependent variables were RTs and accuracy rates.

Stimuli
A total of 240 nouns were selected, with 120 nouns (60 high and 60 low) each for semantic neighborhood density and ambiguity. Semantic neighborhood density was operationally defined by average radius of co-occurrence (ARC; Shaoul and Westbury, 2010), which refers to the mean of the distance between the target word and all neighbors within a pre-specified threshold; higher ARC values indicate denser neighborhoods. Ambiguity was operationally defined by Hoffman et al.'s (2013) recently developed semantic diversity measure, which estimates semantic ambiguity by tracking the variability in the contextual usage of words; words with higher values on semantic diversity are more ambiguous.
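To make the two operational definitions above concrete, the Python sketch below computes toy versions of both measures from word and context vectors. It is a hedged illustration rather than the ARC or semantic diversity implementations themselves: it assumes that the "distance" in ARC is a cosine similarity (so larger values mean closer, denser neighborhoods, as the text states), that a word with no neighbors inside the threshold gets an ARC of 0.0, and that semantic diversity is the negative log of the mean pairwise cosine between the contexts in which a word occurs. The vectors, threshold, and function names are made up.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def arc_like_density(target_vec, vocabulary, threshold):
    """Mean similarity between the target and every word whose similarity exceeds
    `threshold` (assumed ARC-style measure; higher = denser neighborhood)."""
    neighbors = [s for s in (cosine(target_vec, vec) for vec in vocabulary.values())
                 if s > threshold]
    # Assumption: a word with no neighbors inside the threshold gets 0.0.
    return sum(neighbors) / len(neighbors) if neighbors else 0.0


def semantic_diversity_like(context_vecs):
    """Negative log of the mean pairwise cosine between the contexts a word occurs in
    (assumed semantic-diversity-style measure; higher = more diverse/ambiguous)."""
    if len(context_vecs) < 2:
        raise ValueError("at least two contexts are needed")
    sims = [cosine(u, v) for u, v in combinations(context_vecs, 2)]
    mean_sim = sum(sims) / len(sims)
    if mean_sim <= 0:
        raise ValueError("mean context similarity must be positive to take its log")
    return -math.log(mean_sim)


# Made-up 2-D vectors, purely for illustration.
vocabulary = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (0.6, 0.8)}
print(arc_like_density((1.0, 0.0), vocabulary, threshold=0.5))
print(semantic_diversity_like([(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]))
```

A word whose contexts all point in the same direction gets a diversity of 0, and the value grows as its contexts become less similar to one another.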
Experimental conditions were matched on the same control lexical variables described in Experiment 1 (see Table 5 for descriptive statistics). Additionally, words in the two levels of semantic neighborhood density were matched on concreteness and semantic diversity, while words in the two levels of semantic diversity were matched on concreteness and semantic neighborhood size. As in Experiment 1, the Match program (Van Casteren and Davis, 2007) was used to select an additional 240 distracter verbs that were matched as closely as possible to the nouns on number of letters, number of syllables, orthographic neighborhood size, and frequency. There was no significant difference (ps > 0.38) between nouns and verbs on number of letters, orthographic neighborhood size, and number of syllables; however, nouns (M = 2.17) were marginally higher in frequency than verbs (M = 2.08), p = 0.05.

Note to Table 5: Orthographic neighborhood size, number of words that can be formed by substituting a single letter in the target word (Coltheart et al., 1977); Frequency, log10-transformed subtitle contextual diversity (Brysbaert and New, 2009); Concreteness, concreteness ratings (Brysbaert et al., 2014); ARC, average radius of co-occurrence, a measure of semantic neighborhood density (Shaoul and Westbury, 2010).

Procedure
Same as Experiment 1.

Results and Discussion
As in Experiment 1, trials with response errors (10.6% of trials) were first excluded from the analyses. Noun responses faster than 200 ms or slower than 3000 ms (0.7% of responses) were then eliminated, after which a mean and standard deviation were computed for each participant as a function of stimulus quality. RTs more than 2.5 SDs from each participant's mean were excluded, removing a further 2.1% of the responses. The mean RTs, accuracy rates, and ex-Gaussian parameters are presented in Table 6. For semantic neighborhood density, only the main effect of stimulus quality (p < 0.001) was significant in RTs; RTs were faster for clear (M = 830 ms) than for degraded (M = 957 ms) nouns. Comparing the model with only a main effect of stimulus quality to the additive model did not reveal a significant difference in their likelihood, χ²(1) = 1.018, ns. For accuracy, none of the effects were statistically significant.

Turning to the ex-Gaussian parameters, for µ, only the main effect of stimulus quality was significant, Fₚ(1, 31) = 26.91, p < 0.001, MSE = 3476.21, ηₚ² = 0.46; µ was greater for degraded nouns (M = 639 ms) than for clear nouns (M = 585 ms). For σ, none of the effects were significant. Finally, for τ, only the main effect of stimulus quality was significant, Fₚ(1, 31) = 21.57, p < 0.001, MSE = 8871.92, ηₚ² = 0.41; τ was greater for degraded nouns (M = 323 ms) than for clear nouns (M = 245 ms). These effects are graphically represented in Figure 3.

Table 8 presents the results for the joint effects of stimulus quality with semantic diversity. For RTs, only the main effect of stimulus quality (p < 0.001) was significant; RTs were faster for clear (M = 836 ms) than for degraded (M = 992 ms) nouns. Comparing the model with only a main effect of stimulus quality to the additive model did not reveal a significant difference in their likelihood, χ²
Turning to the ex-Gaussian parameters, for µ, only the main effect of stimulus quality was significant, Fₚ(1, 31) =
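The effect sizes reported for the semantic neighborhood density items can be recovered from the F ratios and degrees of freedom alone, because ηₚ² = SS_effect / (SS_effect + SS_error) = (F × df_effect) / (F × df_effect + df_error). A short Python check using only the values reported for µ and τ (the function name is hypothetical):

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio:
    SS_effect / (SS_effect + SS_error) = F * df_effect / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported for the main effect of stimulus quality (semantic neighborhood density items).
eta_mu = partial_eta_squared(26.91, 1, 31)   # effect on mu
eta_tau = partial_eta_squared(21.57, 1, 31)  # effect on tau
print(round(eta_mu, 2), round(eta_tau, 2))
# prints: 0.46 0.41
```

Both values match the reported ηₚ² = 0.46 and 0.41, which is a quick way to check transcribed F statistics for consistency.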

Summary
Compared to Experiment 1, semantic richness effects in Experiment 2 were far less robust. Specifically, semantic neighborhood density had no effect on RTs or accuracy rates, while the influence of semantic diversity was restricted to accuracy rates. Importantly, as in the previous experiment, there was no evidence that these effects were qualified by stimulus quality, either in the mean RTs or in the underlying RT distributional characteristics. To assess the robustness of the null findings in Experiment 2, we conducted supplementary analyses of the effects of concreteness, semantic neighborhood density, and semantic diversity, using newly available megastudy data from the Calgary semantic decision project (Pexman et al., 2016). In this megastudy, participants were required to classify words as concrete or abstract; in total, semantic decision RTs and accuracy rates were collected for 5000 concrete and 5000 abstract words from 321 participants. For present purposes, we conducted a simultaneous multiple regression analysis for the 2451 concrete words associated with an accuracy rate of at least 70%. The predictors included control lexical variables (number of letters, number of syllables, number of morphemes, orthographic neighborhood size, word frequency), along with concreteness, semantic neighborhood density, and semantic diversity². For RTs, the effects of semantic neighborhood density (p = 0.13) and semantic diversity (t < 1) were not significant. However, there was an effect of concreteness, with faster responses to more concrete words (β = −0.51, p < 0.001, sr² = 0.24). Turning to accuracy rates, the effects of all three variables were significant or approached significance. More concrete words (β = 0.51, p < 0.001, sr² = 0.24) and words in dense neighborhoods (β = 0.05, p = 0.015, sr² = 0.002) were responded to more accurately, while ambiguous words (β = −0.04, p = 0.052, sr² = 0.001) were responded to less accurately.
These regression analyses, although based on an independent abstract/concrete semantic decision dataset, are broadly consistent with the key findings in Experiment 2.

Combined Analyses
We conducted an additional analysis in which RT data from both experiments were combined, in order to statistically compare the magnitude of richness effects across dimensions. To do this, all words high on richness (i.e., high concreteness, high number of features, high neighborhood density, high diversity) were coded as −0.5, while words low on richness were coded as 0.5. As before, clear and degraded words were coded as −0.5 and 0.5, respectively. Table 9 presents the results for this combined analysis. The main effects of stimulus quality (p < 0.001) and semantic richness (p < 0.001) were significant, but there was no interaction. Following this, we created six contrast codes corresponding to the six possible pairwise comparisons between the four dimensions (C1: concreteness vs. number of features; C2: concreteness vs. semantic neighborhood density; C3: concreteness vs. semantic diversity; C4: number of features vs. semantic neighborhood density; C5: number of features vs. semantic diversity; C6: semantic neighborhood density vs. semantic diversity). For each contrast, we tested a model in which the joint effects of stimulus quality, richness, and the respective contrast code were examined. Richness interacted significantly with C1 (p < 0.001), C2 (p < 0.001), and C3 (p < 0.001), but not with the other contrast codes. In other words, the effects of concreteness were significantly larger than the effects of the other three variables, but there was no significant difference between the effects of number of features, semantic neighborhood density, and semantic diversity. Thus, although number of features produced a significant effect in the individual analyses whereas semantic neighborhood density and semantic diversity did not, this distinction did not hold up in the combined analysis.
That being said, our study was designed to separately test the joint effects of stimulus quality with different semantic richness dimensions, and most likely lacks the statistical power to adequately compare the magnitude of different semantic richness effects. This question can be explored more systematically in future research based on more powerful designs.

GENERAL DISCUSSION
In the present study, we examined the joint effects of stimulus quality and four semantic richness dimensions (concreteness, number of features, semantic neighborhood density, semantic diversity) in verb/noun syntactic classification. Our primary objective was to determine whether the additive effects of stimulus quality and semantic richness previously reported in lexical decision generalize to a different binary decision task that is not familiarity-based and that places more emphasis on semantic processing. With respect to this basic question at least, our results are clear-cut. There was no evidence for an interaction between stimulus quality and any of the targeted richness dimensions, either in mean RTs or in RT distributional characteristics. In other words, the additive effects of stimulus quality and semantic richness cannot be fully attributed to the specific demands of lexical decision. That being said, the study also yielded a number of more surprising and less straightforward findings. For example, semantic diversity (a measure of ambiguity) had no effect on RTs, but ambiguous words were associated with lower accuracy rates. In addition, concreteness effects were atypically large compared to the effects of number of features and semantic neighborhood density. These findings are discussed in more detail below.

Semantic Richness Effects: The Role of Feedback
As mentioned in the Introduction, the semantic feedback account has been a popular framework for accommodating semantic richness effects. Although rarely made explicit, an underlying assumption of this account is that meaning-level activation also reaches the letter level by way of orthographic and phonological representations. Indeed, this assumption continues to inform influential computational models of visual word recognition (e.g., DRC, multiple read-out, CDP+) that incorporate the interactive activation model (McClelland and Rumelhart, 1981) as a cornerstone. Complementing the additive effects of stimulus quality and richness previously observed in lexical decision, the additive patterns reported in the present study provide further evidence against the view that feedback from semantics reaches earlier levels of representation in visual word recognition. The notion that letter-level processing is not modulated by semantic information also fits well with recent findings from the semantic priming literature. Specifically, there is a well-known overadditive interaction between stimulus quality and semantic priming: priming effects are larger for degraded than for clear targets, so that degradation effects are smaller for related (e.g., cat-DOG) than for unrelated (e.g., hat-DOG) prime-target pairs (Meyer et al., 1975). One explanation for this interaction is that the prime word (e.g., CAT) activates related words (e.g., DOG) through spreading activation, and that feedback then prospectively pre-activates the lexical- and letter-level representations of those related words, attenuating the deleterious impact of degradation. This account is undermined by Thomas et al. (2012), who examined the stimulus quality × priming interaction for different types of prime-target pairs.
Forward asymmetric pairs (e.g., keg-BEER) have a prime-to-target association but no target-to-prime association, backward asymmetric pairs (e.g., small-SHRINK) have a target-to-prime association but no prime-to-target association, and symmetric pairs (e.g., cat-DOG) are related in both directions. The key finding was that the stimulus quality × priming interaction was reliable only for pairs with a target-to-prime association (i.e., symmetric and backward asymmetric pairs), suggesting that the interaction was driven by a retrospective strategic process that depends on a relationship from the target to the prime. For our purposes, the crucial point is that these results cannot be reconciled with a prospective semantic feedback mechanism, which would predict an interaction for pairs with a prime-to-target association (i.e., symmetric and forward asymmetric pairs).
As discussed in the Introduction, additivity in computational models can be achieved by thresholding the output of the letter level (Besner and Roberts, 2003; Reynolds and Besner, 2004). But why would letter-level processing be thresholded in the syntactic classification task? We do not have a definitive answer, but suggest that the results may reflect a flexible lexical processor that is responsive to task context and demands, and that modulates processing pathways to optimize task performance (Balota et al., 1999; Balota and Yap, 2006; Tousignant and Pexman, 2012). As mentioned in the Introduction, uninflected verb stimuli were used in the present study, so inflectional suffixes (e.g., -ed, -ing) could not cue grammatical class. Therefore, to respond correctly in syntactic classification, participants must precisely identify a specific lexical representation and determine whether its meaning denotes an action or an entity.
Letter-level thresholding, which allows degraded stimuli to be normalized to match perceptually clear stimuli, can reduce the likelihood that a degraded letter string activates the meaning of an incorrect candidate. Such an error would be costly and particularly difficult to recover from in syntactic classification. Our account is conceptually inspired by O'Malley and colleagues' proposal that letter-level thresholding operates in speeded pronunciation when participants have to name both words and nonwords. By "cleaning up" the stimulus, thresholding minimizes the possibility of lexical capture, whereby a degraded nonword activates a similar word strongly enough that readers mistakenly produce the word instead of the nonword.
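The thresholding idea can be illustrated with a deliberately simple discrete-time toy model in Python. It is not the Besner and Roberts (2003) or Reynolds and Besner (2004) implementation, and all function names, parameter names, and values below are hypothetical. In `thresholded_rt`, the letter stage must reach a threshold before passing a normalized, full-strength output to a later stage whose speed depends on a richness gain; degradation (a lower letter rate) therefore adds a fixed number of steps regardless of richness, so the stimulus quality × richness interaction contrast is zero by construction. In `cascaded_rt`, the later stage instead accumulates continuously from partial letter activation. This variant is purely feedforward (it contains no feedback loop) and is included only to show that, when later processing consumes unfinished letter-level output, the cost of degradation can depend on the later stage's rate.

```python
def thresholded_rt(letter_rate, semantic_gain, threshold=10, criterion=40):
    """Toy RT (in steps) when letter-level output is thresholded."""
    steps, letter = 0, 0
    # Letter stage: accumulate until the threshold is reached; only then pass output on.
    while letter < threshold:
        letter += letter_rate
        steps += 1
    # Later stage: receives a 'cleaned-up' input of full strength (= threshold),
    # so its duration depends only on semantic_gain, not on stimulus quality.
    evidence = 0
    while evidence < criterion:
        evidence += semantic_gain * threshold
        steps += 1
    return steps


def cascaded_rt(letter_rate, semantic_gain, ceiling=10, criterion=40):
    """Toy RT (in steps) when the later stage reads partial letter activation."""
    steps, letter, evidence = 0, 0, 0
    while evidence < criterion:
        steps += 1
        letter = min(ceiling, letter + letter_rate)  # letter activation still rising
        evidence += semantic_gain * letter           # later stage uses it immediately
    return steps


def interaction_contrast(rt_fn, clear=2, degraded=1, high=2, low=1):
    """Degradation cost for low-richness items minus that for high-richness items."""
    cost_low = rt_fn(degraded, low) - rt_fn(clear, low)
    cost_high = rt_fn(degraded, high) - rt_fn(clear, high)
    return cost_low - cost_high


print(interaction_contrast(thresholded_rt))  # 0 -> additive
print(interaction_contrast(cascaded_rt))     # 1 -> overadditive
```

With these illustrative defaults, degradation costs 5 steps at both richness levels in the thresholded version, but 3 steps for low-richness and 2 steps for high-richness items in the cascaded version. The cascaded result depends on the chosen parameters; the thresholded result is additive for any positive rates.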

Task-Specificity of Semantic Richness Effects
The present study was designed to extend our earlier lexical decision study by examining the joint effects of the same variables in syntactic classification. There were some noteworthy differences between the results of the two studies. Specifically, in lexical decision, all four semantic richness dimensions (imageability, number of features, semantic neighborhood density, semantic diversity) produced robust effects. In contrast, in syntactic classification RTs, semantic neighborhood density and semantic diversity effects were not reliable. Such between-task dissociations have been reported before. To tease apart task-general from task-specific processing, Yap et al. (2012) evaluated the influence of five semantic richness dimensions (imageability, body-object interaction, ambiguity, semantic neighborhood density, number of features) in five lexical processing tasks (lexical decision, go/no-go lexical decision, speeded pronunciation, progressive demasking, concrete/abstract semantic decision). Importantly, they found that the effects of ambiguity and semantic neighborhood density were not significant in the semantic decision task; Pexman et al. (2008) also failed to find semantic neighborhood density effects in semantic categorization. These null findings are further corroborated by our item-level regression analyses of the recently published Calgary semantic decision project data (Pexman et al., 2016), which yielded the same pattern of results. In summary, although semantic richness is multidimensional, the effects of some dimensions (e.g., number of features, concreteness/imageability) appear more stable and generalizable across tasks than those of others (e.g., semantic neighborhood density, semantic diversity), which show greater task-specificity. This suggests that particular facets of a word's semantic representation carry more weight in lexical-semantic processing.
A few additional aspects of these findings are instructive. First, semantic neighborhood density effects seem to be variable in tasks that place more weight on semantic processing; they appear in some studies but not in others. It has been suggested (e.g., Mirman and Magnuson, 2006) that there may be a trade-off between close neighbors, which slow processing, and distant neighbors, which speed it; such opposing effects could cancel out, producing diminished or null net effects.
Second, it is reassuring that the analyses of the Calgary megastudy data revealed no effect of semantic diversity on RTs but an inhibitory effect on accuracy rates (i.e., ambiguous words were responded to less accurately). Yap et al. (2011) also found that ambiguous words were less accurately classified in concrete/abstract semantic decision. Taken together, these trends mirror the findings of the present study and further support the idea that the facilitation afforded by ambiguity is specific to lexical decision (e.g., Piercey and Joordens, 2000). As discussed earlier, an ambiguity disadvantage is typically reported in tasks that emphasize semantic processing. The modeling work of Hoffman and Woollams (2015) suggests that the high contextual variability associated with semantically diverse words leads to noisy, underspecified semantic representations, which could impede semantic coding. It is unclear why the inhibitory effect of semantic diversity tends to emerge in accuracy rates rather than in RTs; this issue merits future research.
Finally, the semipartial correlations in the supplementary regression analyses show that the proportion of unique variance accounted for by concreteness is far greater than that of the other richness dimensions. This was also reported by Pexman et al. (2016), and is consistent with the unusually large concreteness effect observed in Experiment 1. In the Pexman et al. (2016) megastudy, participants had to discriminate between concrete and abstract words, and concrete words, by definition, had much higher concreteness ratings than abstract words. This encourages participants to rely on the concreteness dimension to drive the concrete/abstract binary decision, thereby exaggerating the size of concreteness effects. This reasoning is analogous to participants relying on familiarity-based information in lexical decision, which inflates the size of frequency effects (Balota and Chumbley, 1984). Although Experiment 1 used a verb/noun rather than a concrete/abstract decision, concreteness ratings were higher for nouns (M = 4.41) than for verbs (M = 2.81). As a result, the concreteness effect was likely inflated by an emphasis on concreteness as a discrimination dimension. In future work, it would be methodologically preferable to match semantic richness properties more tightly across the two response categories of a semantic decision task. Nonetheless, the important point here is that concreteness effects, large as they were, were not moderated by stimulus quality in the present study. Even in an experimental setting that places such a premium on a particular semantic richness dimension, there is no evidence that this semantic information reaches early letter-level processes.
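For reference, the squared semipartial correlation of a predictor X_j with RT (Y) is the proportion of variance uniquely attributable to that predictor, that is, the drop in R² when X_j is removed from the full regression model. Equivalently, it is the squared correlation between Y and the part of X_j that is not predicted by the other predictors. The notation below is ours, not taken from the supplementary analyses:

```latex
sr_j^2 = R^2_{\text{full}} - R^2_{\text{full} \setminus X_j}
       = \operatorname{corr}\!\left(Y,\; X_j - \hat{X}_{j \mid \text{others}}\right)^2
```

Because each sr_j² reflects only unique variance, the values for correlated richness dimensions need not sum to the full-model R².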

LIMITATIONS AND CONCLUDING REMARKS
We acknowledge that the present results may partly reflect the specific demands of the syntactic classification task we adopted. We used the broad categories of verb and noun to maximize the number of items that could be presented under the same task demands (see Pexman et al., 2016). However, there is evidence that the decision selected for a semantic task can moderate observed effects (Tousignant and Pexman, 2012; see Pexman et al., 2016, for more discussion). That being said, all other things being equal, researchers (e.g., Jared and Seidenberg, 1991) have recommended using broader rather than narrower categories. We are also encouraged by the degree of convergence between the results of Pexman et al. (2016), which used a concrete/abstract decision, and those of the present study, which used a verb/noun decision.
Certain methodological aspects of the present work could also be tightened in future research. While the two levels of semantic richness for each dimension were well-matched on lexical and semantic characteristics, the automated procedure (Van Casteren and Davis, 2007) we used matched nouns and verbs on lexical, but not semantic, variables. Nouns and verbs did not differ significantly in number of letters, number of syllables, or orthographic neighborhood size, but nouns were slightly higher in frequency than verbs. Furthermore, as already discussed, concreteness ratings were higher for nouns than for verbs in Experiment 1, which is likely to have increased reliance on concreteness for noun/verb discrimination. However, while verb/noun differences may modulate the emphasis placed on particular word dimensions in driving binary decisions, they do not qualify the joint effects of stimulus quality and richness, because the counterbalancing procedure ensured that the same items were rotated through the clear and degraded conditions and thus served as their own controls.
Additionally, while the present study focused on four particular semantic richness dimensions (concreteness, number of features, semantic neighborhood density, semantic diversity) in order to facilitate comparisons with our previous lexical decision study, other richness dimensions of a more embodied nature (e.g., body-object interaction, sensory experience ratings, perceptual strength, emotional valence) remain unstudied in this paradigm and should be examined in future investigations.
Along with our earlier lexical decision study, the present work reinforces the claim that one central aspect of the interactive activation framework, namely interactive activation between letter- and lexical-level representations, does not appear to be compatible with how semantic richness effects unfold in visual word recognition. In both lexical decision and syntactic classification, we have observed additive effects of stimulus quality and richness, indicating that the additive pattern cannot simply be explained by lexical decision's emphasis on familiarity-based information. The present results may reflect a flexible lexical processor that can strategically engage thresholded early processing to optimize task performance. Specifically, we have suggested that thresholding reduces the likelihood that degraded words incorrectly activate the semantics of another word, but this is speculative and needs to be verified empirically in future investigations.
In sum, the present findings further constrain our understanding of the interplay between semantic processing and semantic feedback mechanisms. Our results are consistent with other findings in the semantic richness literature in showing that there are multiple dimensions of semantic richness and that these can have different effects both within and between tasks. More broadly, this study adds to a growing literature showing that lexical semantics is multidimensional, variable, dynamic, and context-sensitive (Pexman et al., 2013).

AUTHOR CONTRIBUTIONS
MY and PP jointly conceptualized the study, and MY designed the experiments. The data were collected and analyzed by PP and MY, respectively. MY wrote the initial draft of the manuscript, and PP edited and commented on it.