What drives response time and accuracy in image naming? Moderators in the relationship between number of phonological neighbors and image naming performance

Hashimoto, Naomi; Heuer, Sabine; Cho, Chi C.

doi:10.3389/flang.2025.1625213

ORIGINAL RESEARCH article

Front. Lang. Sci., 29 October 2025

Sec. Language Processing

Volume 4 - 2025 | https://doi.org/10.3389/flang.2025.1625213


Naomi Hashimoto¹*

Sabine Heuer²

Chi C. Cho³

¹Communication Sciences & Disorders Program, Eastern Michigan University, Ypsilanti, MI, United States
²Program of Communication Sciences & Disorders, University of Wisconsin-Milwaukee, Milwaukee, WI, United States
³Zilber College of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, United States

Insights into phonological activation patterns during lexical retrieval have been gained from simple image naming studies and from studies using the picture-word interference paradigm (PWIP). Simple image naming studies allow for the manipulation of phonological variables, such as the number of phonological neighbors (NPN). PWIP studies allow for the manipulation of the relationship between a target and a distractor, so that the effects of lexical co-activation can be examined. PWIP studies have reported a phonological facilitation effect when phonologically related stimuli are introduced during certain time-frames. We conducted a series of experiments in young, neurotypical adults using images that were validated across a number of measures known to affect naming performance. A simple image naming experiment (Experiment 1) was followed by two PWIP experiments. In these, images were paired with phonologically related or unrelated distractors, and the stimulus onset asynchrony (SOA) was set at +300 ms (Experiment 2) or +150 ms (Experiment 3). Across all experiments, we found that the effect of NPN was modulated by other variables, such as age of acquisition and image familiarity. While a main effect of distractor type was obtained in the PWIP experiments, there was no interaction between NPN and distractor type. The findings highlight the complex nature of NPN and the subtle influences that NPN has on the picture naming process.

Introduction

Naming an object is an essential act of speech production that may appear relatively simple but actually requires several different processes (Dell, 1986; Indefrey and Levelt, 2004; Levelt et al., 1999; Rapp and Goldrick, 2000). First, the speaker must visually recognize the object to be named, such as a picture of a dog. Next, during the semantic processing stage, the speaker must retrieve semantic information, which might include features such as “furry”, “canine”, and so forth. Then, during phonological encoding, the phonological form of the target, [dɔg], is retrieved. The speaker must spell out this form into individual phonemes ([d], [ɔ], [g]), prepare a motor plan for articulation, and, finally, pronounce the word. One of the key insights of speech production research is that words are not activated in isolation. Rather, in the course of producing a particular target word, other words in the lexicon are activated too (e.g., Meyer and Schvaneveldt, 1971; Neely, 1977; Oberle and James, 2013). At the phonological stages of naming, activation of the word form corresponding to the intended meaning results in co-activation of similar word forms; that is, activation spreads from the target to related words.

The picture-word interference paradigm (PWIP) is a well-established tool for studying co-activation patterns of lexical items during single word production (see Arrigoni et al., 2025; Korko et al., 2024, for recent reviews). In the PWIP, participants name a picture while a distractor word is presented visually or auditorily. They are told to ignore the word and concentrate on naming the image.

The PWIP has primarily been used with young adults, in chronometric studies (e.g., Bürki and Madec, 2022; Damian and Martin, 1999; Rayner and Springer, 1986; Schriefers et al., 1990; Starreveld and La Heij, 1995, 1996) and in neuromapping studies (e.g., Abel et al., 2009, 2012; De Zubicaray and McMahon, 2009; Diaz et al., 2014; Rizio et al., 2017; Sakreida et al., 2019). Other populations have also been examined, including:
- older adults (Taylor and Burke, 2002);
- bilinguals (Roelofs et al., 2016; Sá-Leite et al., 2021);
- adults and children with language impairments (Hashimoto and Thompson, 2010; Seiger-Gardner and Schwartz, 2008).

PWIP effects have also been reported across different languages, including Chinese (Bi et al., 2009; Qu et al., 2021), Dutch (Meyer, 1991; Meyer and Schriefers, 1991; Starreveld, 2000), English (De Zubicaray et al., 2002; Lupker, 1982), German (Jescheniak and Schriefers, 2001), and Italian (Pisoni et al., 2017).

An important advantage of the PWIP is that it allows us to examine the time course of language production processes, a key aspect of the cognitive architecture of the language production system. This is accomplished by manipulating when the word is presented relative to the image. This manipulation, known as the stimulus onset asynchrony (SOA), makes it possible to track the effects that occur over the course of the word retrieval process. The type of distractor word paired with the image elicits different effects. Naming response times are slower when the word is semantically (categorically) related to the picture (e.g., web – NET) than when it is unrelated to the picture (e.g., rabbit – NET).
This effect is known as the semantic interference effect (see Arrigoni et al., 2025; Bürki et al., 2020; Korko et al., 2024, for recent reviews). Of relevance to this study is another effect, the phonological facilitation effect (PFE). In the PFE, participants name pictures faster when phonologically related segment-image pairs (e.g., /nε-/ – NET) are presented than when phonologically unrelated segment-image pairs (e.g., /pi/ – NET) are presented. Typically, PFEs occur when the word is presented after the image, that is, at positive SOAs (see Indefrey, 2011; Strijkers and Costa, 2011, for reviews).

The emergence of PFEs at late, positive SOAs is taken as an indication that the phonological segments facilitated the preparation of the target name at the right time. If the phonological segments were presented at much earlier, negative SOAs (e.g., SOA = −300 ms), there would be no facilitation. By the time the target's phonological encoding began, the activation of the segments would have decayed.

Thus, if the distractor occurs at a late SOA, such as +300 ms, lexical selection is already complete. The activated phonemes may then facilitate the final stage of production, namely articulatory-motor planning. Alternatively, if the distractor occurs at an earlier SOA, such as +150 ms, the activated phonemes could facilitate phonological encoding itself. Both SOA conditions should therefore produce PFEs. In both cases, the distractor's phonemes are active while phonological or post-phonological processing of the target is under way. This leads to stronger activation of the target word with related than with unrelated distractors (see Indefrey, 2011; Strijkers and Costa, 2011, for reviews).

The manipulation of SOAs in the PWIP therefore allows us to examine how phonological processes unfold over time and, potentially, to pinpoint the time-frame in which PFEs arise.

Phonologically based distractors used in picture-word interference studies typically share phonological segments with the image name. For example, an image such as net would be paired with the segment /nε-/. The distractor /nε-/ would activate a cohort of begin-related (onset-related) words (e.g., neck, nectar, nest), including the target, net, because all of these words share the distractor's component phonemes.

The PFE is well established when initial segments overlap between the distractor word and the target picture name (e.g., Bi et al., 2009; De Zubicaray et al., 2002; Jescheniak and Schriefers, 2001; Lupker, 1982; Meyer, 1991; Meyer and Schriefers, 1991; Pisoni et al., 2017; Qu et al., 2021; Starreveld, 2000). However, no studies to date have examined this effect using other kinds of similar word forms, such as phonological neighbors.

One metric for characterizing word form similarity is the number of phonological neighbors a target has. A phonological neighbor is a word that differs from the target by the substitution (and, in some definitions, the addition or deletion) of a single phoneme (Luce and Pisoni, 1998). For example, the word net has neighbors such as pet, not, and neck, while the word judge has neighbors such as fudge and jut. Words differ in the number of neighbors they have. Thus, net has a relatively high phonological neighborhood density (23 neighbors), whereas judge has a relatively low phonological neighborhood density (four neighbors).
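The following Python sketch makes the neighbor definition concrete. It counts neighbors under the substitution-only definition and under the substitution-plus-addition/deletion definition. It is illustrative only, with three assumptions:
- the transcriptions use ARPAbet-style symbols;
- the eight-word lexicon is a hypothetical toy;
- the helper names (`is_neighbor`, `neighbors`) are our own.

The resulting counts will not reproduce the CLEARPOND values cited above, which are computed over a full lexicon.

```python
# Hypothetical toy lexicon with ARPAbet-style transcriptions (not CLEARPOND data).
Phonemes = tuple[str, ...]

LEXICON: dict[str, Phonemes] = {
    "net": ("N", "EH", "T"),
    "pet": ("P", "EH", "T"),
    "not": ("N", "AA", "T"),
    "neck": ("N", "EH", "K"),
    "nest": ("N", "EH", "S", "T"),
    "judge": ("JH", "AH", "JH"),
    "fudge": ("F", "AH", "JH"),
    "jut": ("JH", "AH", "T"),
}


def is_neighbor(a: Phonemes, b: Phonemes, allow_add_delete: bool = True) -> bool:
    """True if b differs from a by one phoneme substitution or, optionally,
    by the addition or deletion of a single phoneme."""
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if not allow_add_delete or abs(len(a) - len(b)) != 1:
        return False
    # Addition/deletion: dropping one phoneme from the longer form yields the shorter.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighbors(word: str, lexicon: dict[str, Phonemes], allow_add_delete: bool = True) -> list[str]:
    """Alphabetically sorted phonological neighbors of `word` within `lexicon`."""
    target = lexicon[word]
    return sorted(w for w, p in lexicon.items() if is_neighbor(target, p, allow_add_delete))


print(neighbors("net", LEXICON))                          # ['neck', 'nest', 'not', 'pet']
print(neighbors("net", LEXICON, allow_add_delete=False))  # ['neck', 'not', 'pet']
```

Including additions and deletions adds nest to the neighbors of net. Neighborhood counts can therefore differ depending on which definition a database adopts.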

Studies that have explored phonological neighborhood density effects in English picture naming paradigms have reported inconsistent results regarding the presence and direction of an effect. Reported patterns include:
- facilitative effects of phonological neighborhood density on both naming response times (RTs) and naming accuracy (Newman and Bernstein Ratner, 2007);
- facilitative effects on naming RTs but not on naming accuracy (Vitevitch, 2002);
- inhibitory effects on both naming RTs and naming accuracy (Vitevitch and Stamer, 2006);
- inhibitory effects on naming accuracy only (Newman and German, 2005).

Additionally, some studies have reported no effects (Gordon and Kurczek, 2014; Vitevitch et al., 2004). The effects of phonological neighborhood density on word retrieval in language production therefore remain unclear. The PWIP may help resolve some of these conflicting results. It allows us to simulate the word selection process while manipulating the relationship between the picture name and phonologically similar distractors.

The range of effects reported for phonological neighborhood density in simple image naming tasks also highlights the complexity of phonological processes. These processes interact with other variables, such as image agreement, name agreement, age of acquisition (AoA), lexical frequency, and conceptual familiarity.

Image agreement refers to how well the mental image of a concept matches the presented image. It is thought to index an early stage of the picture naming process, specifically the pre-linguistic object recognition stage (Alario et al., 2004; Perret and Bonin, 2019). Image agreement has been reported as a significant predictor of picture naming response time: pictures with higher agreement ratings are named faster than those with lower ratings (Alario et al., 2004; Barry et al., 1997; Bonin et al., 2002).

A related variable, name agreement, refers to the degree to which participants agree on the name of an image. It is measured by the number of different names elicited for a given image. This construct is localized either at the pre-linguistic object recognition stage or at the post-semantic stage. The first applies when an alternative name represents an error, the second when it represents an alternate correct name (Barry et al., 1997; Vitkovitch and Tyrrell, 1995). Name agreement is a robust predictor of naming performance: images with high name agreement are named more quickly and accurately than images with low name agreement (e.g., Alario et al., 2004; Barry et al., 1997; Dell'Acqua et al., 2000; Lachman et al., 1974; Paivio et al., 1989; Snodgrass and Yuditsky, 1996; Vitkovitch and Tyrrell, 1995).

Another important variable is AoA, which refers to the age at which a particular concept has been acquired. The AoA effect refers to the finding that earlier acquired words are named more quickly (Carroll and White, 1973).
Subsequent reviews (Elsherif et al., 2023; Juhasz, 2005; Perret and Bonin, 2019) have consistently reported robust AoA effects in picture naming studies; specifically, naming RTs are significantly faster for earlier acquired words than for later acquired ones. While several theories exist to explain the AoA effect, an integrated account proposes a hybrid of these theories (Elsherif et al., 2023). According to this account, the AoA effect arises because early acquired concepts have richer representations and more connections with other concepts in the network than later acquired concepts. Later acquired concepts must be fitted into a network that already contains well-established early acquired concepts, so they are less strongly consolidated. The AoA effect is more pronounced in tasks that involve an arbitrary mapping between semantics and phonology, such as picture naming.

Another frequently mentioned variable is conceptual familiarity, which refers to the degree to which a depicted concept (picture or drawing) is familiar to a participant. This variable influences the semantic processing stages, reflecting the ease with which a conceptual representation is accessed. The conceptual familiarity effect refers to faster naming RTs for concepts with higher familiarity ratings. However, evidence for familiarity effects is mixed: some studies report faster naming RTs for highly familiar concepts, while others have found no significant effects (see Alario et al., 2004, for a review).

Finally, lexical frequency indicates how frequently a word is used in a given language. The word frequency effect refers to the finding that naming RTs are faster for more frequent names.
While a recent Bayesian meta-analysis (Perret and Bonin, 2019) found that the effects of lexical frequency were inconclusive, the effect has typically been found to be both reliable and replicable (e.g., Alario et al., 2004; Bates et al., 2003) and has broad support as an important determinant of naming response times and naming accuracy. When found, the lexical frequency effect is assumed to influence the phonological processing stages of naming (Barry et al., 1997).

The impact of these variables on naming performance has long been recognized when creating sets of pictures in English (Snodgrass and Vanderwart, 1980) and other languages (Duñabeitia et al., 2018). More recently, a Bayesian meta-analysis (Perret and Bonin, 2019) revealed that image agreement, name agreement, imageability, age of acquisition, and conceptual familiarity all had strong influences on naming response times. Moreover, subsequent studies have found interaction effects between phonological neighborhood density and phonological frequency (Hameau et al., 2021) as well as between name agreement, age of acquisition, and phonological neighborhood density (Karimi and Diaz, 2020).

Another possible source of the variability in results is a simple but often overlooked issue: image standardization. Some studies have used images from a single standardized source (Hameau et al., 2021; Pisoni et al., 2017). Others have not described their images (Laganaro et al., 2013), have combined images from more than one source (Chan and Vitevitch, 2009; Middleton and Schwartz, 2010; Newman and Bernstein Ratner, 2007; Vitevitch, 2002; Vitevitch et al., 2004), or have included a mixture of line drawings and photographs (Newman and Bernstein Ratner, 2007).

Some studies did conduct off-line analyses of factors such as name agreement or image complexity (Chan and Vitevitch, 2009; Laganaro et al., 2013; Pisoni et al., 2017; Vitevitch et al., 2004). However, these analyses do not necessarily capture the full range of visual variables that can affect image naming. This is particularly important for studies that rely on reaction time data, in which influences can be subtle and more difficult to detect than with a binary measure such as naming accuracy.

Current study

The aim of the study was to examine effects of the number of phonological neighbors (NPN)¹, including the PFE associated with NPN, on picture naming accuracy and RTs in young neurotypical adults.

The first experiment examined the effects of NPN in a simple naming task. We believe some of the mixed results reported for simple naming paradigms are due to a lack of image standardization. As a step toward resolving them, we created a new set of images from a single source. Numerous factors are known to influence naming performance, and controlling all of them during item selection would have severely constrained the list of stimuli. Our strategy was therefore to obtain ratings for these factors and to account for them in the statistical analysis.

We hypothesized a significant main effect of NPN: better naming performance for images whose names have more phonological neighbors (denser neighborhoods) than for images whose names have fewer neighbors (sparser neighborhoods). Interaction effects were also expected between NPN and other variables known to influence naming RTs and naming accuracy. Specifically, earlier acquired words, highly familiar items, and words with higher name and image agreement should produce faster naming RTs and higher naming accuracy (Karimi and Diaz, 2020; Perret and Bonin, 2019).

The second and third experiments were PWIP studies in which we manipulated distractor word type, with distractors presented at an SOA of +300 ms (Experiment 2) or +150 ms (Experiment 3). The PWIP was used because it allowed us to manipulate distractor word type and, consequently, to examine the time course of PFEs during lexical retrieval. No study to date has manipulated phonological neighbors in the context of the PWIP. We therefore chose SOAs that have consistently elicited PFEs with begin-related phonological segments (e.g., Bi et al., 2009; De Zubicaray et al., 2002; Jescheniak and Schriefers, 2001; Lupker, 1982; Meyer, 1991; Meyer and Schriefers, 1991; Pisoni et al., 2017; Qu et al., 2021; Starreveld, 2000).
Consistent with the existing literature on PFEs, we hypothesized that PFEs would also emerge when NPN was manipulated. Two SOAs were chosen to ensure that the time-frame of phonological activation was adequately covered. Our hypotheses for these experiments were as follows:
- a main effect of NPN, with better naming performance (faster naming RTs and higher naming accuracy) for images with denser neighborhoods than for images with sparser neighborhoods;
- a significant main effect of distractor type, with phonologically related distractors leading to significantly faster naming RTs than unrelated distractors;
- an NPN × distractor type interaction, whereby higher NPN images paired with related distractors would be named significantly faster and more accurately than higher NPN images paired with unrelated distractors, or lower NPN images paired with either related or unrelated distractors.

Method

Word stimuli

We selected 96 monosyllabic words for use across all experiments. NPN values for the words were obtained from the CLEARPOND database (Marian et al., 2012).

Although all words were monosyllabic, they differed in number of phonemes. Phoneme complexity, indexed as the number of phonemes in each word, was therefore also calculated. Length effects are thought to reflect phonological processes (Barry et al., 1997; Perret and Bonin, 2019). Some studies have reported that a greater number of phonemes leads to longer naming RTs in young adults, while others have reported no effects (see Alario et al., 2004, for a review).

Word frequency was also included. A recent Bayesian meta-analysis found its effects inconclusive (Perret and Bonin, 2019), but word frequency has broad support as a reliable determinant of naming RTs and naming accuracy (e.g., Alario et al., 2004; Bates et al., 2003). Frequency values were taken from a language-specific corpus (Brysbaert and New, 2009).

Image stimuli

Three graphic design artists were directed to create 96 black-and-white line drawings, one for each target word, with each artist creating 32 images. The following parameters were predetermined and applied equally across all images: size; line weight; (absence of) color; orientation; viewpoint; depth cues and shading; luminance; and visual complexity. Size, orientation, and level of detail were then edited so that all images had the same level of visual complexity and consistency, as judged by one of the authors and the artists. Images were edited until consensus was achieved. The physical characteristics of the images were thus held consistent across the set. Figure 1 provides an example.

Figure 1. An example of an image used across the three experiments: a black-and-white line drawing of a fishing net with diamond-patterned netting and a long, straight handle extending to the left.

AoA, name agreement, image agreement, familiarity, and visual complexity all have robust effects on picture naming (see Perret and Bonin, 2019, for a review). We therefore included AoA ratings (Kuperman et al., 2012) and obtained normative ratings for the other variables. This was a particularly important step given that we were using newly created images. The participants, procedures, instructions, and results of the normative data collection are detailed in Supplementary material – Validation of Images. Table 1 summarizes the ratings across these variables. Supplementary material – Image Stimuli lists the images used in Experiment 1 and the image-distractor word pairings used in Experiments 2 and 3.

Table 1. Descriptive statistics: normative ratings of images.

Experiment 1 – image naming

Experiment 1 was a simple naming experiment. Its primary purposes were to test a newly developed set of standardized images and to replicate previous studies using phonological neighbors as a variable. On each trial, participants saw a single image and named it as quickly as possible. The variable of interest was the NPN of the image label, which ranged from high (e.g., net, 24 neighbors) to low (e.g., judge, 4 neighbors).

Participants

Forty-eight participants (age: M = 21.91 years, SD = 2.17) provided informed consent to take part in the study (project number 18.051, granted 11.03.2017). They met the following inclusion criteria:
- (a) age between 18 and 35 years;
- (b) English as a native or primary language;
- (c) normal or corrected-to-normal vision (with contacts or glasses);
- (d) adequate hearing acuity by self-report.

Exclusion criteria included past or current language or cognitive impairments.

Procedure

Experiment 1 consisted of two phases. During the familiarization phase, participants sat in front of a computer screen and viewed all images while the experimenter named each one. This procedure ensured that participants recognized each image and knew its label. During the testing phase, images were presented in random order, and participants were asked to name each image as accurately and as quickly as possible. Each participant saw all 96 images. E-Prime 2.0 was used to present the stimuli. Chronos, a voice-activated response recording device (Psychology Software Tools), was used to record participants' naming RTs. All responses were provided in English. The entire experimental session took approximately 15–25 min.

Data analyses

The outcome of interest for each trial was the time needed to correctly name the image. This is a "time-to-event" outcome, which calls for a specific class of statistical methods known as survival analysis (Altman and Bland, 1998). In our experiments, we were interested in both accuracy and naming RTs.

Because participants could make errors in naming an image, simply taking the response time as the outcome would have been inappropriate: a fast RT could accompany an erroneous response. Conversely, excluding erroneous responses and using only the RTs from correct responses would have reduced the sample size and potentially introduced selection bias.

We therefore used a survival analysis method, Cox's proportional hazards (PH) model, to analyze the time needed to correctly name an image. Hazard ratios (HRs) were estimated from the model to summarize differences between groups in the hazard (instantaneous rate) of the event of interest, i.e., correct naming of the image (Harre et al., 1988; Machin et al., 2006). In addition, Kaplan-Meier curves were created to illustrate the probability of producing a correct response over time. Greater separation between curves indicates a greater difference in the probability of correctly naming the image (Rich et al., 2010).
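The Kaplan-Meier (product-limit) estimator mentioned above can be sketched in a few lines of Python. The sketch rests on three assumptions:
- A trial that ends in an erroneous response is right-censored at its RT. It counts as "still at risk" up to that time but never contributes a correct-response event. The text does not spell out its censoring scheme, so this is an illustrative assumption.
- The function names are hypothetical.
- The cumulative proportion correct shown in Kaplan-Meier-style plots is 1 − S(t), where S(t) is the estimated probability of not yet having produced a correct response by time t.

```python
from collections import Counter


def kaplan_meier(rts_ms, correct):
    """Kaplan-Meier estimate of S(t) = P(no correct response yet at time t).

    rts_ms:  response time of every trial (ms)
    correct: True if the trial ended in a correct naming response (the event);
             False if it ended in an error, treated here as censored at its RT.
    Returns (t, S(t)) pairs at each distinct time with at least one correct response.
    """
    at_risk = len(rts_ms)
    events = Counter(t for t, ok in zip(rts_ms, correct) if ok)
    leaving = Counter(rts_ms)  # events and censored trials both leave the risk set
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            surv *= 1 - d / at_risk  # product-limit step at this event time
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


def cumulative_percent_correct(curve):
    """Convert S(t) into cumulative percentage of correct responses, 100 * (1 - S(t))."""
    return [(t, 100 * (1 - s)) for t, s in curve]


# Five toy trials: three correct responses and two errors (censored).
curve = kaplan_meier([600, 700, 700, 800, 900], [True, True, False, True, False])
for t, pct in cumulative_percent_correct(curve):
    print(t, round(pct, 1))  # 600 20.0 / 700 40.0 / 800 70.0
```

In this toy example, the error at 700 ms shrinks the risk set without counting as a correct response. As a result, the cumulative percentage correct reaches 70% by 800 ms even though only three of the five trials were correct.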

A total of 4,688 trials were included in the analyses. Data were excluded because of equipment errors (3.69%) and RTs shorter than 500 ms (1.13%), for a total of 4.82% of data excluded.
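The exclusion step can be written as a small filter. In this hypothetical sketch, equipment errors are checked first and the 500 ms cutoff is applied only to the remaining trials. Each excluded trial is therefore counted under exactly one reason, and both percentages share the same denominator. Under that assumption the per-reason percentages add up to the total (3.69% + 1.13% = 4.82%), apart from rounding in the last digit. The `Trial` fields are assumptions, not the authors' data format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    rt_ms: Optional[float]  # hypothetical field: None if no RT was recorded
    equipment_error: bool   # hypothetical flag, e.g., a voice-key failure


def apply_exclusions(trials: List[Trial], min_rt_ms: float = 500.0):
    """Drop equipment errors first, then RTs below `min_rt_ms`.

    Percentages are reported relative to all trials (assumes at least one trial).
    """
    n = len(trials)
    is_equipment = lambda t: t.equipment_error or t.rt_ms is None
    equipment = [t for t in trials if is_equipment(t)]
    remaining = [t for t in trials if not is_equipment(t)]
    too_fast = [t for t in remaining if t.rt_ms < min_rt_ms]
    kept = [t for t in remaining if t.rt_ms >= min_rt_ms]
    summary = {
        "equipment_pct": 100 * len(equipment) / n,
        "fast_pct": 100 * len(too_fast) / n,
        "total_excluded_pct": 100 * (len(equipment) + len(too_fast)) / n,
    }
    return kept, summary


trials = [Trial(None, True), Trial(450.0, False)] + [Trial(800.0, False) for _ in range(8)]
kept, summary = apply_exclusions(trials)
print(len(kept), summary)
```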

The independent factors considered for Experiment 1 were:
- NPN (range: 3–39);
- word frequency (natural-log transformed; range: 0.77–4.97);
- phoneme complexity, indexed as number of phonemes (range: 2–5);
- image familiarity, indexed as ratings from the normative sample (range: 2.59–5.0);
- AoA (range: 2.5–12.5).

Data were collected at two sites, and site was included as a control variable. To test the hypotheses for Experiment 1, the model included the main effects of all factors and the two-way interactions between NPN and each of the other factors. All analyses were completed using SAS 9.4 (SAS Institute, Cary, NC).
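One way to write the Experiment 1 model is shown below, as a sketch under stated assumptions rather than the authors' exact SAS specification. Here h_ij(t) is the hazard of a correct response at time t after image onset for participant i naming item j. The baseline hazard h_0(t) is left unspecified, as in any Cox model. Freq, Phon, Fam, and AoA denote log frequency, number of phonemes, image familiarity, and AoA, and X_{k,j} is the value of factor k for item j.

The text does not state how repeated measures within participants and items were handled (e.g., robust variance estimation or frailty terms), so that part is omitted here. Because NPN interacts with the other factors, the hazard ratio for a one-neighbor increase depends on their values. An HR below 1 means slower correct naming. For example, longer naming times for higher-NPN, low-familiarity images correspond to HR_NPN < 1 at low familiarity values.

```latex
\begin{aligned}
h_{ij}(t) &= h_0(t)\,\exp\Big(\beta_1\,\mathrm{NPN}_j + \beta_2\,\mathrm{Freq}_j + \beta_3\,\mathrm{Phon}_j + \beta_4\,\mathrm{Fam}_j + \beta_5\,\mathrm{AoA}_j \\
&\qquad\quad + \gamma\,\mathrm{Site}_i + \sum_{k \in \{\mathrm{Freq},\,\mathrm{Phon},\,\mathrm{Fam},\,\mathrm{AoA}\}} \delta_k\,\mathrm{NPN}_j\,X_{k,j}\Big) \\
\mathrm{HR}_{\mathrm{NPN}}(X_j) &= \frac{h_{ij}(t \mid \mathrm{NPN}_j + 1)}{h_{ij}(t \mid \mathrm{NPN}_j)} = \exp\Big(\beta_1 + \sum_{k} \delta_k\,X_{k,j}\Big)
\end{aligned}
```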

Results

Overall, 91.88% of images were correctly named (RT: M = 1,019.03 ms, SD = 579.11 ms); see Table 2 for descriptive statistics.

A significant NPN × Image Familiarity interaction was found (p < 0.0001). The time needed to correctly name an image generally decreased with greater image familiarity, but the effect of NPN on naming speed depended on familiarity. For images with high familiarity ratings (i.e., ratings greater than 4), there was no significant difference across NPN values in the time needed to correctly name an image. However, images with relatively low familiarity ratings (i.e., ratings less than 4) took significantly longer to be correctly named if their names had higher NPN. See Figure 2 for an illustration of the NPN × Image Familiarity interaction.

The results also revealed a significant NPN × AoA interaction (p = 0.0278). Generally, images whose concepts were acquired later in childhood took significantly longer to be correctly named. The interaction indicated that NPN had minimal effects for words acquired in early childhood. For words acquired in later childhood, correct naming took significantly longer when NPN was higher. See Figure 3 for the NPN × AoA interaction.

Finally, there was a significant effect of phoneme complexity (p < 0.0001): words with fewer phonemes were correctly named in significantly less time.

Table 2. Descriptive statistics: response time (RT) in milliseconds and accuracy (%) for Experiments 1, 2, and 3.

Figure 2. Cumulative percentage of correct responses (y-axis) as a function of response time in milliseconds (x-axis), shown in separate panels for image familiarity ratings from low (3) to high (5). Within each panel, curves represent image names with 4, 14, 24, and 34 phonological neighbors (NPN). In general, the time to correctly name an image decreased with increasing image familiarity. For highly familiar images, there was no significant NPN effect on response time. The less familiar the images, the greater the differences in response time between images of varying NPN, with significantly longer response times for image names with higher NPN (p < 0.0001).

Figure 3. Cumulative percentage of correct responses (y-axis) as a function of response time in milliseconds (x-axis), shown in separate panels for the age of acquisition (AoA) of the image name: 3 (early), 6, 9, and 12 (late). Within each panel, curves represent image names with 4, 14, 24, and 34 phonological neighbors (NPN). In general, the time to correctly name an image increased with increasing AoA. For image names with early AoA, there was no significant NPN effect on response time. For image names with late AoA, response times were significantly longer for names with higher NPN (p = 0.0278). Moreover, the later the AoA, the greater the impact of NPN.

Experiments 2 and 3: naming with an auditory distractor at SOAs of +300 and +150 ms

The visual stimuli were the same images used in Experiment 1. However, each image was accompanied by a spoken distractor presented 300 ms (Experiment 2) or 150 ms (Experiment 3) after image onset. The distractor was either phonologically related to the target (e.g., neck – net) or unrelated to it (e.g., wedge – net).

Participants

Forty-one college-age participants (age: M = 21.85 years, SD = 3.22) were recruited for Experiment 2, and 46 college-age participants (age: M = 21.85 years, SD = 3.22) were recruited for Experiment 3. They met the same inclusion criteria described for Experiment 1 and provided informed consent to take part in the study (project number UHSRC-FY-19-20-39, granted 08.21.19). None had participated in any of the other experiments.

Stimuli

In addition to the same 96 images, we selected 192 distractor words (96 target images × 2 distractor types). For each target word, we selected two monosyllabic distractor words:
- one that was phonologically and semantically unrelated to the target (e.g., wedge for target net);
- one that was a phonological neighbor of the target (e.g., neck for target net).

The 96 phonologically related distractors were evenly distributed among C1 substitutions (e.g., shop for target mop), vowel substitutions (e.g., bolt for target belt), and C2 substitutions (e.g., neck for target net). Additionally, both related and unrelated distractors were chosen to avoid any semantic relationship with their targets (e.g., categorical, subordinate, superordinate, associative, or coordinate). See Table 3 for descriptive data by distractor type.
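The balance of substitution types can be checked programmatically. The Python sketch below assumes that C1 and C2 refer to consonant positions before and after the vowel (onset and coda). This is our reading rather than a definition given in the text. The sketch also uses ARPAbet-style transcriptions and hypothetical function names. With the full set of 96 related distractors, an even distribution would correspond to 32 pairs of each type.

```python
from collections import Counter

# ARPAbet vowel symbols (stress digits omitted), used to locate the syllable nucleus.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def substitution_type(target, distractor):
    """Classify a one-phoneme substitution in a monosyllable as 'C1' (before the
    vowel), 'V' (the vowel), or 'C2' (after the vowel); return None otherwise."""
    if len(target) != len(distractor):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(target, distractor)) if a != b]
    if len(diffs) != 1:
        return None
    vowel_idx = next((i for i, p in enumerate(target) if p in VOWELS), None)
    if vowel_idx is None:
        return None
    i = diffs[0]
    if i == vowel_idx:
        return "V"
    return "C1" if i < vowel_idx else "C2"


def counts_by_type(pairs):
    """Tally substitution types over (target, distractor) pairs."""
    return Counter(substitution_type(t, d) for t, d in pairs)


pairs = [
    (("M", "AA", "P"), ("SH", "AA", "P")),           # mop - shop
    (("B", "EH", "L", "T"), ("B", "OW", "L", "T")),  # belt - bolt
    (("N", "EH", "T"), ("N", "EH", "K")),            # net - neck
]
print(counts_by_type(pairs))
```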

Table 3. Descriptive statistics: distractor words.

Lists

Two lists were developed. In List A, half of the 96 target words were paired with their unrelated distractor and the other half with their phonologically related distractor. List B reversed these pairings, so that across the two lists each target appeared with both distractor types.
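The counterbalancing can be sketched as follows. The sketch assumes the standard design: each list contains all 96 targets, half with related and half with unrelated distractors. The two lists swap these pairings, so that every target appears with each distractor type across participants. The random split, the seed, and the function name are illustrative assumptions.

```python
import random


def make_counterbalanced_lists(targets, seed=0):
    """Return two dicts mapping each target to a distractor type.

    Targets are randomly split into halves. List A pairs the first half with
    unrelated and the second half with related distractors; List B reverses this.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(targets)
    rng.shuffle(shuffled)
    first_half = set(shuffled[: len(shuffled) // 2])
    list_a = {t: "unrelated" if t in first_half else "related" for t in targets}
    list_b = {t: "related" if t in first_half else "unrelated" for t in targets}
    return list_a, list_b


list_a, list_b = make_counterbalanced_lists(["net", "judge", "mop", "belt"])
for target in list_a:
    print(target, list_a[target], list_b[target])
```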

Procedures

Participants were randomly assigned to either List A or List B. During the familiarization phase, participants sat in front of a computer screen and viewed all images while the experimenter named each one. During the testing phase, images were paired with auditory distractors. Each trial consisted of the following sequence: first, a fixation point was displayed in the center of the screen; second, the target image was presented; third, the auditory distractor was presented 300 ms (Experiment 2) or 150 ms (Experiment 3) after image onset. Participants were asked to name each image as accurately and as quickly as possible. E-Prime 2.0 was used to present the stimuli, and Chronos (Psychology Software Tools) was used to record participants' responses and RTs. All responses were provided in English. The entire experimental session took approximately 15–25 min.

Data analyses for Experiments 2 and 3

The analyses for Experiments 2 and 3 were conducted using a Cox's PH model similar to that used in Experiment 1. However, to address the specific hypotheses for Experiments 2 and 3, the main effect of NPN was examined along with Distractor Type (Neighbor/Unrelated). The model also explicitly tested the NPN × Distractor Type and NPN × AoA interaction effects.
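To make the model structure concrete, the Python sketch below computes the Cox partial log-likelihood for trials in which a correct naming response is the event and an incorrect response is treated as censored at its observed RT. It is a minimal illustration, not the authors' code: the covariate coding (Distractor Type as 1 = neighbor, 0 = unrelated), the helper names `design_row` and `cox_partial_loglik`, and the Breslow handling of tied RTs are assumptions. Fitting the model means finding the coefficients that maximize this quantity; statistical packages do this numerically and often default to Efron's method for ties.

```python
import math


def design_row(npn, neighbor, aoa):
    """Hypothetical covariate row: main effects plus the two interactions
    tested in Experiments 2 and 3 (NPN x Distractor Type, NPN x AoA).
    neighbor: 1 for a phonologically related distractor, 0 for unrelated."""
    return [npn, neighbor, aoa, npn * neighbor, npn * aoa]


def cox_partial_loglik(times, events, rows, beta):
    """Cox partial log-likelihood with Breslow handling of ties.

    times  -- response times (ms)
    events -- 1 if the image was named correctly (event), 0 if the
              response was incorrect (censored at its RT)
    rows   -- covariate rows, one per trial
    beta   -- coefficients, one per covariate
    """
    # Linear predictor for each trial.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    loglik = 0.0
    for i, (t_i, correct) in enumerate(zip(times, events)):
        if not correct:
            continue  # censored trials enter only through the risk sets
        # Risk set: every trial not yet resolved just before t_i.
        denom = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += eta[i] - math.log(denom)
    return loglik


# Toy data: three trials, the middle one answered incorrectly.
rt = [650, 820, 1100]
correct = [1, 0, 1]
rows = [design_row(4, 1, 3), design_row(14, 0, 6), design_row(24, 1, 9)]
print(cox_partial_loglik(rt, correct, rows, [0.0] * 5))  # equals -log(3)
```

Because the hazard here is the rate of correct responses, a positive coefficient corresponds to faster correct naming and a negative one to slower correct naming.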

A total of 3,937 trials were collected in Experiment 2. Trials were excluded because of equipment error (17.96%) or RTs shorter than 500 ms (1.75%), resulting in the exclusion of 19.72% of the data. A total of 4,416 trials were collected in Experiment 3. Trials were excluded because of equipment error (1.15%) or RTs shorter than 500 ms (2.58%), resulting in the exclusion of 3.74% of the data.
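The exclusion step can be sketched as follows. The sketch assumes a hypothetical per-trial record with an `rt` field (ms) and an `equipment_error` flag, and it applies the equipment-error criterion before the 500 ms cutoff so that no trial is counted twice; the source does not state the order in which the criteria were applied. Each percentage is rounded separately when reported, so the components may differ from the total in the last decimal place.

```python
def apply_exclusions(trials, min_rt=500):
    """Drop trials with equipment errors or RTs below min_rt (ms).

    trials -- list of dicts with keys 'rt' (ms, or None if not recorded)
              and 'equipment_error' (bool); these field names are
              hypothetical.
    Returns the retained trials and the percentage excluded per criterion.
    """
    n = len(trials)
    n_equipment = sum(1 for t in trials if t["equipment_error"])
    # Only trials without equipment errors are checked against the cutoff.
    n_fast = sum(1 for t in trials
                 if not t["equipment_error"] and t["rt"] < min_rt)
    kept = [t for t in trials
            if not t["equipment_error"] and t["rt"] >= min_rt]

    def pct(k):
        return 100.0 * k / n

    return kept, {
        "equipment_error": pct(n_equipment),
        "rt_below_cutoff": pct(n_fast),
        "total": pct(n - len(kept)),
    }


trials = [
    {"rt": None, "equipment_error": True},
    {"rt": 430, "equipment_error": False},
    {"rt": 910, "equipment_error": False},
    {"rt": 1250, "equipment_error": False},
]
kept, report = apply_exclusions(trials)
print(len(kept), report)
```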

Results – Experiment 2

Overall, 91.46% of images in Experiment 2 were correctly named (M = 889.12 ms; SD = 259.55 ms). See Table 2 for the descriptive statistics. The NPN × Distractor Type interaction was not significant (p = 0.4586). However, a significant main effect of Distractor Type was obtained (p = 0.0211): a higher percentage of images paired with related distractors (92.53%) were correctly named than images paired with unrelated distractors (90.30%), and images paired with related distractors were also correctly named significantly faster than images paired with unrelated distractors. A significant NPN × Image Familiarity interaction (p = 0.0002) was also obtained, replicating the results described for Experiment 1: images with lower familiarity ratings and high NPN required significantly longer times to be named. See Figure 4 for a graphical illustration of this interaction effect. The NPN × AoA interaction was also significant, albeit marginally (p = 0.0487), and replicated the effects described for Experiment 1: NPN had minimal effects on words acquired in early childhood, whereas for words acquired in later childhood, correctly naming an image took significantly longer when NPN was higher. See Figure 5 for a graphical illustration of this interaction effect. Additionally, significant frequency (p = 0.0393) and phoneme complexity (p = 0.0030) effects were obtained: more frequent words and words with fewer phonemes were correctly named faster.
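A worked example may help in reading the interaction effects above. In a Cox model whose linear predictor contains b_npn·NPN + b_mod·M + b_interaction·NPN·M for a moderator M (image familiarity or AoA), the log hazard ratio for one additional neighbor at a fixed value of M is b_npn + b_interaction·M, so the size of the NPN effect depends on the moderator. Because the hazard is the rate of correct responses, a hazard ratio below 1 means slower correct naming. The coefficients in this Python sketch are invented for illustration; they are not the study's estimates.

```python
import math


def npn_hazard_ratio(delta_npn, moderator, b_npn, b_interaction):
    """Hazard ratio for a change of delta_npn neighbors at a fixed
    moderator value (e.g., familiarity rating or AoA), in a Cox model with
    linear predictor b_npn*NPN + b_mod*M + b_interaction*NPN*M.
    The moderator's main effect b_mod cancels out because M is held fixed."""
    return math.exp(delta_npn * (b_npn + b_interaction * moderator))


# Hypothetical coefficients (not the study's estimates), chosen so that
# the NPN effect vanishes at the highest familiarity rating.
B_NPN, B_NPN_X_FAM = -0.04, 0.008
for fam in (3, 4, 5):
    hr = npn_hazard_ratio(10, fam, B_NPN, B_NPN_X_FAM)
    print(f"familiarity={fam}: HR for +10 neighbors = {hr:.3f}")
# prints HRs of about 0.852, 0.923, and 1.000
```

For AoA, the reported pattern (little NPN effect for early-acquired words, slower correct naming at higher NPN for later-acquired words) would correspond to an NPN effect near zero at low AoA that becomes increasingly negative as AoA increases, that is, a negative interaction coefficient.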

Figure 4

Cumulative correct response curves across three panels display reaction time in milliseconds versus percent correct responses. Panels compare image familiarity from 3 to 5. Each panel shows curves for different numbers of phonological neighbors: 4 (blue), 14 (red), 24 (green), and 34 (black). Response times were significantly longer for image names with higher NPN, particularly at lower familiarity.

Figure 4. The panel headings indicate image familiarity ratings from low (3) to high (5). The x-axis indicates response time in milliseconds for each panel. The y-axis indicates the cumulative percentage of correct responses. The curves within each panel indicate the response times for images with varying numbers of phonological neighbors (NPN). In general, the time to correctly name an image decreased with increasing image familiarity. For highly familiar images, there was no significant NPN effect on response time. The less familiar the images, the greater the differences in response time among images of varying NPN, with significantly longer response times for image names with higher NPN (p = 0.0002).
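The curves in Figures 4–7 appear to be predicted from the fitted model at selected NPN values (4, 14, 24, and 34). A simpler, nonparametric counterpart for a group of trials is the Kaplan–Meier estimate of the cumulative percentage correct, 100·(1 − S(t)), with incorrect responses censored at their RT. The Python sketch below illustrates that estimator under these assumptions; it is not a reproduction of the published figures, and the function name is hypothetical.

```python
def cumulative_correct_curve(times, events):
    """Kaplan-Meier estimate of the cumulative percentage of images named
    correctly by time t, i.e. 100 * (1 - S(t)).

    times  -- response times (ms)
    events -- 1 for a correct response (event), 0 for an incorrect
              response (censored at its RT)
    Returns (t, cumulative_percent) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0  # S(t): estimated proportion not yet named correctly
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_correct = n_censored = 0
        # Group all trials that share this response time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_correct += 1
            else:
                n_censored += 1
            i += 1
        if n_correct:
            surv *= 1.0 - n_correct / n_at_risk
            curve.append((t, 100.0 * (1.0 - surv)))
        # Trials resolved at t (either way) leave the risk set afterwards.
        n_at_risk -= n_correct + n_censored
    return curve


rts = [600, 700, 700, 800]
correct = [1, 1, 0, 1]
for t, pct in cumulative_correct_curve(rts, correct):
    print(f"{t} ms: {pct:.1f}% correct")
```

Because incorrect responses are treated as censored rather than as a competing outcome, 1 − S(t) can exceed the raw proportion correct: in this toy example the curve reaches 100% although one of the four responses was incorrect.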

Figure 5

Cumulative correct response percentage graphs are plotted against reaction time in milliseconds, segmented by age of acquisition: 3, 6, 9, and 12. Each panel shows lines for four numbers of phonological neighbors: 4, 14, 24, and 34, represented in blue, red, green, and black, respectively. Cumulative correct responses rise with reaction time, and the curves for different numbers of phonological neighbors separate more at later acquisition ages.

Figure 5. The panel headings indicate the age of acquisition (AoA) of the image name, from early-acquired (3) to later-acquired (12). The x-axis indicates response time in milliseconds for each panel. The y-axis indicates the cumulative percentage of correct responses. The curves within each panel indicate the response times for images with varying numbers of phonological neighbors (NPN). In general, the time to correctly name an image increased with increasing AoA. For images with early AoA, there was no significant NPN effect on response time. For image names with late AoA, NPN had a significant effect on response time, with significantly longer response times for image names with higher NPN (p = 0.0487).

Results – Experiment 3

Overall, 93.56% of images in Experiment 3 were correctly named (M = 1,294.94 ms; SD = 810.76 ms). See Table 2 for the descriptive statistics. Findings from Experiment 3 were similar to those of Experiment 2. The NPN × Distractor Type interaction was not significant (p = 0.5002), but a significant main effect of Distractor Type was found (p = 0.0006). Specifically, a higher percentage of images paired with related distractors (94.86%) were correctly named than images paired with unrelated distractors (92.27%), and images paired with related distractors were correctly named significantly faster than images paired with unrelated distractors. A significant NPN × Image Familiarity interaction (p = 0.0002) and a significant NPN × AoA interaction (p = 0.0015) were again found. See Figures 6 and 7 for graphical illustrations of these effects. Finally, a significant phoneme complexity effect was again found (p < 0.0001).

Figure 6

Line graphs show the cumulative correct response percentage over reaction time in milliseconds across three panels with different image familiarity levels: 3, 4, and 5. Each graph presents lines for 4, 14, 24, and 34 phonological neighbors, shown in blue, red, green, and black. Reaction times vary across familiarity levels, with significantly longer response times for image names with higher NPN at lower familiarity.

Figure 6. The panel headings indicate image familiarity ratings from low (3) to high (5). The x-axis indicates response time in milliseconds for each panel. The y-axis indicates the cumulative percentage of correct responses. The curves within each panel indicate the response times for images with varying numbers of phonological neighbors (NPN). In general, the time to correctly name an image decreased with increasing image familiarity. For highly familiar images, there was no significant NPN effect on response time. The less familiar the images, the greater the differences in response time among images of varying NPN, with significantly longer response times for image names with higher NPN (p = 0.0002).

Figure 7

Four line graphs compare cumulative correct response percentages against reaction time in milliseconds for acquisition ages 3, 6, 9, and 12. Each graph has lines representing four numbers of phonological neighbors: 4, 14, 24, and 34. Curves shift toward longer reaction times at later acquisition ages, and the separation between curves for different numbers of phonological neighbors increases with acquisition age.

Figure 7. The panel headings indicate the age of acquisition (AoA) of the image name, from early-acquired (3) to later-acquired (12). The x-axis indicates response time in milliseconds for each panel. The y-axis indicates the cumulative percentage of correct responses. The curves within each panel indicate the response times for images with varying numbers of phonological neighbors (NPN). In general, the time to correctly name an image increased with increasing AoA. For images with early AoA, there was no significant NPN effect on response time. For image names with late AoA, NPN had a significant effect on response time, with significantly longer response times for image names with higher NPN (p = 0.0015). Moreover, the later the AoA, the greater the impact of NPN.

Discussion

The aim of the study was to examine the phonological facilitation effects (PFEs) associated with PWIPs when NPN was manipulated during picture naming in young neurotypical adults. The first experiment was a simple naming task, whereas the second and third experiments used a PWIP in which distractor words, either phonologically related or unrelated to the image name, were presented 300 ms (Experiment 2) or 150 ms (Experiment 3) after image onset.

In both PWIP experiments, we observed a PFE, which confirmed our prediction that images paired with phonologically related distractors would be named significantly faster than images paired with unrelated distractors. Our finding of whole-word facilitation adds to the PWIP literature that reports PFEs for phonological segments (e.g., Bi et al., 2009; De Zubicaray et al., 2002; Jescheniak and Schriefers, 2001; Lupker, 1982; Meyer, 1991; Meyer and Schriefers, 1991; Pisoni et al., 2017; Qu et al., 2021; Starreveld, 2000). The significant distractor type effect suggests that words co-activate other phonologically related words, thus allowing for stronger convergence onto the target image name. The PFE therefore appears to be a robust phenomenon that occurs not only at the phonological segment level but also at the lexical (word) level.

The hypothesized interaction between NPN and distractor type was not, however, found in Experiments 2 and 3. Studies using English picture naming paradigms have reported facilitative effects of phonological neighborhood density on naming RTs and/or naming accuracy (Newman and Bernstein Ratner, 2007; Vitevitch, 2002), inhibitory effects on naming RTs and/or naming accuracy (Newman and German, 2005; Vitevitch and Stamer, 2006), or no effects (Gordon and Kurczek, 2014; Vitevitch et al., 2004). These mixed results have been attributed to variables known to influence naming performance (e.g., AoA, name agreement, lexical frequency) that had not been carefully controlled (e.g., Hameau et al., 2021; Karimi and Diaz, 2020; Perret and Bonin, 2019). Because those variables were carefully considered in this study, the reasons for the absence of a significant main effect of NPN and of a significant NPN × distractor type interaction remain unclear. These findings suggest that NPN exerts subtle yet complex influences on the picture naming process.

The emergence of an NPN effect only in the context of other variables is exemplified by the significant interaction effects we obtained across the three experiments, a pattern also reported by others (e.g., Karimi and Diaz, 2020). In our study, the time to correctly name an image was shorter for highly familiar items than for less familiar items (as indexed by familiarity ratings), presumably because highly familiar items activate a strong semantic network structure (i.e., a highly interconnected network with strong representations) relative to less familiar items (Alario et al., 2004; Ellis and Morrison, 1998; Snodgrass and Yuditsky, 1996). The more interesting finding, however, was the significant interaction between NPN and familiarity: the time needed to correctly name highly familiar images was not influenced by NPN, whereas the time needed to correctly name less familiar images was significantly influenced by NPN, with longer RTs for images with higher NPN. In other words, an NPN effect appeared only for images with low familiarity ratings, and the effect was in the opposite direction to that hypothesized. It thus appears that highly familiar target images possessed stable semantic network activation, such that co-activation of the image's phonological neighbors did not affect naming performance. In contrast, for less familiar images, semantic network activation was not as strong or as stable as for highly familiar images; co-activation of many phonological neighbors may therefore have created competitive processes that impaired naming performance.

Across both PWIP experiments, we also observed a robust AoA effect. According to the integrated theory (see Elsherif et al., 2023, for review), AoA effects occur because early-acquired concepts establish stronger, richer connections with other concepts in the network than later-acquired concepts. These AoA effects are especially apparent in tasks such as picture naming, because the arbitrary mappings between semantics and phonology produce greater naming latencies than in tasks such as word naming or lexical decision. In the current study, images whose concepts were acquired later in childhood took longer to be correctly named than images whose concepts were acquired earlier in childhood, consistent with the AoA literature (see Elsherif et al., 2023; Juhasz, 2005; Perret and Bonin, 2019, for reviews). The significant interaction between NPN and AoA was in line with our prediction that NPN could interact with other variables such as AoA (Karimi and Diaz, 2020). The time needed to name images whose concepts were acquired early in childhood was not affected by the image's NPN, whereas images whose concepts were acquired later in childhood took longer to be named correctly when the image's NPN was higher. This inhibitory effect of NPN for later-acquired concepts was also observed in a meta-analysis of simple image naming studies by Karimi and Diaz (2020), who, like us, did not observe a main effect of phonological neighborhood density but did observe a significant interaction with AoA. In our study specifically, the longer naming RTs for later-acquired concepts may have been due not only to a weaker network structure but also to competitive processes induced by the activation of a greater number of phonological neighbors. Thus, two factors, network structure and competition, may have negatively affected naming performance for later-acquired concepts whose images had higher NPN. In contrast, there was no NPN effect for early-acquired concepts, presumably because the stronger semantic and phonological network structures characteristic of early-acquired concepts were stable enough not to be influenced by the co-activation of the image's phonological neighbors when a related distractor was presented.

To summarize, our results highlight the dynamic interplay between NPN and other variables, such as AoA and image familiarity, in image naming (Hameau et al., 2021; Karimi and Diaz, 2020). More specifically, NPN inhibited word retrieval for later-acquired and less familiar stimuli but had no effect on earlier-acquired and highly familiar stimuli. Thus, the manipulation of NPN in naming studies must take into account the influence of factors such as AoA and familiarity on picture naming RTs and accuracy. These results also align with the phonological neighborhood density literature, which reports a range of effects in English picture naming paradigms (see Hameau et al., 2021, for a review).

The phoneme complexity effect observed across all three experiments indicated that words with fewer phonemes were named faster. These findings are in keeping with studies reporting that more phonemes in a word lead to longer naming RTs (Snodgrass and Yuditsky, 1996). A frequency effect, found only in Experiment 2, indicated that high-frequency words were named faster, consistent with several previous studies (e.g., Alario et al., 2004; Bates et al., 2003). However, the fact that this effect was found in only one of three experiments attests to its tenuous status, consistent with reviews examining both lexical frequency and AoA effects in picture naming, which typically find significant AoA effects without a corresponding lexical frequency effect (e.g., Elsherif et al., 2023; Juhasz, 2005; Perret and Bonin, 2019).

The other aspect of the study involved a manipulation of SOA. The results of Experiments 2 and 3 differed only marginally. We observed a significant frequency effect at SOA +300 ms but not at SOA +150 ms. Further, the significance level of the NPN × AoA interaction differed between the +300 ms SOA (p = 0.0487) and the +150 ms SOA (p = 0.0015), possibly indicating subtle differences in the underlying activation processes at the two time points. Nevertheless, the overall findings were replicated at both SOAs, indicating consistent activation patterns within a time frame in which these processes can be reliably observed, at least in young, language-normal adults. Further, the interaction effects of NPN with familiarity and AoA suggested a degree of interactivity between the semantic and phonological processing levels at both SOAs (it should be noted, however, that the specific nature and direction of activation were not explored in this study).

Another finding with methodological implications relates to image selection. Variables known to affect naming RTs and naming accuracy (e.g., name agreement, familiarity, visual complexity) are image-set specific and should be carefully controlled; otherwise, they can exert unknown influences on participants' performance. Because we accounted for these variables in our study, we consider the significant interactions between NPN, AoA, and image familiarity to be valid. These interactions highlight the complex dynamics of the underlying cognitive-linguistic processes that affect naming RTs and accuracy, even when the variables involved are not the target of experimental manipulation. Researchers and clinicians should therefore consult image norms when characterizing their visual stimuli in order to control for potential confounds in research and clinical applications.

Our study has several limitations. First, although the Cox's PH model used in this study assumed that responses to different images were independent, this was probably not the case, since each participant provided responses to a number of the same images. Nevertheless, this analysis was still preferred over an analysis of only correct responses (i.e., linear regression), since the observed RTs for incorrect responses were included as censored data. The two approaches also differ in interpretation. In linear regression, results are interpreted as the relationship between the independent variables and the dependent variable, which in our study would be correct RTs; the focus is on how well the set of independent variables explains the variation in correct RTs. In survival analysis, by contrast, results are interpreted in terms of the probability of responding correctly (as opposed to incorrectly) across the range of RTs; the focus is on the timing of events (correct/incorrect responses) and the effect of the independent variables on the probability of a correct response. Furthermore, both linear regression and survival analysis identify associations and do not address causation; they therefore cannot explain why participants succeed. Second, by not imposing a time limit on responses, we may have captured responses that reflected cognitive processes other than the intended automatic, rapid naming (e.g., participants might have focused more on accuracy than on speed in our experimental paradigm). A final limitation was our decision to remove responses faster than 500 ms as outliers. While this choice was based on review studies (Indefrey, 2011; Strijkers and Costa, 2011) that provide the average timeline for verbal production, eliminating these outliers may also have removed some valid responses.

A direction for further research is the application of this novel image naming paradigm to populations with prominent naming difficulties, since the PWIP mirrors typical clinical practice: cues are provided to a person with word-finding difficulties after the picture has been presented, once it is apparent that some form of clinician support is needed to facilitate naming. Applying the paradigm to individuals with word retrieval difficulties resulting from neurological conditions (e.g., aphasia, TBI, dementia) or aging (e.g., older adults) may thus provide further insight into the role of NPN in naming performance.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Eastern Michigan University Institutional Review Board and the University of Wisconsin-Milwaukee Institutional Review Board. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions

NH: Conceptualization, Investigation, Methodology, Writing – original draft, Writing – review & editing. SH: Conceptualization, Investigation, Methodology, Writing – original draft, Writing – review & editing. CC: Formal analysis, Methodology, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/flang.2025.1625213/full#supplementary-material

Footnotes

1. We chose to use the term "number of phonological neighbors" (NPN) because this variable was treated as a continuum and not artificially divided into low, medium, or high phonological neighborhood density (PND) items.

References

Abel, S., Dressel, K., Bitzer, R., Kümmerer, D., Mader, I., Weiller, C., et al. (2009). The separation of processing stages in a lexical interference fMRI-paradigm. NeuroImage 44, 1113–1124. doi: 10.1016/j.neuroimage.10.018