Role descriptions induce gender mismatch effects in eye movements during reading

The present eye-tracking study investigates the effect of gender typicality on the resolution of anaphoric personal pronouns in English. Participants read descriptions of a person performing a typically male, typically female or gender-neutral occupational activity. The description was followed by an anaphoric reference (he or she) which revealed the referent's gender. The first experiment presented roles that were highly typical for men (e.g., blacksmith) or for women (e.g., beautician); the second experiment presented role descriptions with a moderate degree of gender typicality (e.g., psychologist, lawyer). Results revealed a gender mismatch effect in early and late measures in the first experiment and in early measures in the second experiment. Moreover, eye-movement data for highly typical roles correlated with explicit typicality ratings. The results are discussed from a cross-linguistic perspective, comparing natural gender languages and grammatical gender languages. An interpretation of the cognitive representation of typicality beliefs is proposed.


INTRODUCTION
In talking about human beings, gender information can be transmitted in different ways, e.g., via grammatical gender cues and gender-typical lexemes. Grammatical gender is marked, for example, by morphological elements that may express the gender of the referent, such as the suffix -in in German (e.g., Lehrer-in, 'teacher' [feminine]). The gender typicality of lexemes results from the likelihood that personal nouns refer to men or to women. Thus, the noun nurse has female typicality and surgeon male typicality, because they are likely to be associated with a female or a male referent, respectively, as shown in typicality ratings (cf. Kennison and Trofe, 2003). The purpose of the present paper is to analyze the effect of gender typicality on the resolution of a pronominal anaphor when gender typicality is conveyed by a description of a role rather than by a role noun antecedent. Specifically, we investigate a socio-psychological concept, expectations about gender roles, with the help of a psycholinguistic tool, the paradigm of anaphor resolution during sentence reading. Our approach makes use of verbal descriptions and allows for comparing a natural gender language with a grammatical gender language, as outlined in detail below. The present study deals with English, a language which does not possess a grammatical gender system (a "natural gender language"; see Hellinger and Bußmann, 2001). Since most professional roles lie in the range of moderate stereotypicality, we explore the effects of roles with both high and moderate degrees of gender typicality. Previous studies, however, have mainly focused on the gender typicality effect of strongly stereotyped roles. For example, in a reading time study employing role nouns, Kennison and Trofe (2003) presented gender-typical roles as antecedents and personal pronouns as anaphors. The gender mismatch condition (e.g., The executive... She...) prompted longer reading times in the spillover region following the pronoun compared to the matching condition. The results indicated that the role nouns triggered gender-typical representations of the referent, which either agreed or disagreed with the following pronominal anaphor. Garnham et al. (2002) conducted a reading study employing both role nouns and short expressions referring to gender-typical habits or biological characteristics (e.g., wearing a bikini; giving birth). The study showed that a mismatch between the two pieces of information produced longer reading times, even when the presentation order of the two pieces of information was reversed, suggesting that gender inferences were made elaboratively and not only when the inference was necessary for a coherent interpretation of the text.
In a reaction time study, Oakhill et al. (2005) asked participants to judge whether pairs composed of gender-stereotypical and gender-definitional role nouns (e.g., surgeon-sister) could apply to the same person. Results showed that the activation of stereotypical information was automatic and difficult to suppress, even with instructions encouraging participants to explicitly reconsider the stereotypical representations of the roles. Pyykkönen et al. (2010) explored the effect of gender stereotypes on spoken language processing in Finnish, another language without a grammatical gender system, using the visual-world paradigm. Participants heard stories presenting a gender-typical role noun, together with pictures of male or female characters. Results showed that the spoken role nouns activated gender stereotypes, even when this activation was not needed to establish greater discourse coherence.
Most psycholinguistic studies investigating gender typicality effects on anaphor resolution in English (e.g., for eye-tracking methodology, Sturt, 2003; Duffy and Keir, 2004; Kreiner et al., 2008; for ERP methodology, Osterhout and Mobley, 1995; Osterhout et al., 1997) used reflexive pronouns (himself/herself) to reveal referential gender. The results of these studies document a consistent mismatch effect on the anaphor region or the subsequent region, caused by conflicts between the gender typicality of role noun antecedents and the following anaphors.
To summarize the main findings of studies on natural gender languages, one can state that incongruence between the gender typicality of the antecedent role nouns and the anaphor gender triggers a slowdown in resolution, for both personal and reflexive pronouns.
In grammatical gender languages, in contrast to natural gender languages, role nouns carry additional grammatical gender cues, which also affect the representation of referential gender. As a consequence, the effects of grammatical gender and gender typicality usually appear in interaction, and the specific contribution of each factor can be difficult to disentangle. Esaulova et al. (2014), for example, analyzed anaphor resolution after role nouns carrying both grammatical gender cues and gender typicality in an eye-tracking study on German (e.g., Oft hatte der Elektriker/die Elektrikerin gute Einfälle, regelmäßig plante er/sie neue Projekte. "Often had the electrician[MASC]/electrician[FEM] good ideas, regularly planned he/she new projects."). When the grammatical gender and the gender typicality of the role noun mismatched, a mismatch effect emerged not only on the anaphor region but also on the role noun region. Because the antecedent carried grammatical gender marking (either masculine or feminine), the effect of the noun's gender typicality on anaphor resolution resulted from the combined processing of grammatical gender cues and typicality (see also Gygax et al., 2008; Irmen and Schumann, 2011).
A series of experiments conducted by Jäger et al. (2015) analyzed the online processing of reflexives in German and of pronominal possessives in Swedish by means of self-paced reading and eye-tracking. The study focused on grammatical gender, conveyed through gender marking on role nouns (in German) or proper names (in Swedish). The materials presented an antecedent and a distractor, which could match or mismatch in gender (masculine/feminine). In contrast to previous studies, these experiments showed no evidence for an online similarity-interference effect triggered by gender overlap between antecedent and distractor. Only offline response accuracy to the comprehension questions in the self-paced reading experiment indicated that similarity-based interference might have produced misretrievals of the distractors. These results suggest that the previously reported interference effects in reflexive processing may arise at the stage of retrieval rather than at the encoding stage.
The interplay of grammatical gender and gender typicality was further explored in a reading study on another grammatical gender language, Italian: Cacciari et al. (2011) investigated the resolution of personal pronouns in interaction with gender typicality. In the first part of each item, gender typicality was established through a context describing a typically male, female, or neutral setting, for example "During the last Grand Prix of Formula One a terrible car accident provoked a crash close to the stands" (typically male context) or "Within the couple, scenes of jealousy were frequent but this time they came to blows and they got close to tragedy" (typically female context). In the second part of the item, an epicene noun (a noun with a fixed grammatical gender that can nonetheless refer to either a male or a female referent, e.g., vittima, 'victim' [feminine]) or a bigender role noun (a noun that can function as either a feminine or a masculine noun, e.g., assistente, 'assistant') was introduced as the antecedent of an anaphoric pronoun. The anaphor could match or mismatch the typical context and/or the grammatical gender of the epicene. Results showed that for bigender role nouns, which have no defined grammatical gender, gender typicality was essential to trigger the mismatch effect; when the antecedent was an epicene, however, the grammatical gender of the role noun, although purely formal, affected the resolution of the anaphor and interfered with the typicality effect.
The reviewed literature shows that role nouns are a useful tool for conveying and investigating gender typicality. However, role nouns can preclude a direct comparison of natural gender languages and grammatical gender languages, because in grammatical gender languages personal role nouns are usually marked for grammatical gender and therefore carry an additional cue to referential gender, whereas in natural gender languages most role nouns are not morphologically marked. This leads to different processes in the resolution of anaphors with role noun antecedents: in grammatical gender languages, readers are presented with both grammatical information and information from gender typicality, while in natural gender languages they mostly receive only gender typicality cues. The complex interaction between grammatical cues and gender typicality is a challenge for investigating effects of gender typicality, since the grammatical gender of role nouns may compete with gender typicality cues in the representation of referent gender. To overcome this issue, the present study employs a paradigm that replaces role nouns with corresponding role descriptions, conveying the gender typicality of a role without presenting the role noun itself. In a study by Reali et al. (2015), this description-based paradigm was developed to study the effect of gender typicality on anaphor resolution in a grammatical gender language while excluding grammatical gender cues from the antecedents. This research raised a further question, namely how the cognitive processes occurring in a "naturalized" grammatical gender language (i.e., a grammatical gender language without grammatical gender cues) compare with those in a natural gender language. Even in the absence of grammatical gender cues in the materials, speakers of a grammatical gender language may process gender typicality cues differently from speakers of a language without grammatical gender.
Evidence from studies with bilinguals suggests that readers may activate different cognitive representations of referent gender according to the language of the task they are engaged in, shifting gender representations when switching from a natural gender language to a grammatical gender language and vice versa (see Sato et al., 2013). Starting from these considerations, the present study analyzes the processing of gender typicality in a natural gender language and compares the resolution process with previous studies conducted on a grammatical gender language (cf. Reali et al., 2015).
Another research question concerns the degree of gender typicality of the items. Earlier studies employing the anaphor resolution paradigm usually relied on highly typical roles and thus excluded the majority of social and professional roles, which do not occupy extreme positions on the gender typicality scale. Therefore, the second experiment of the present paper focuses on effects triggered by roles with lower degrees of gender typicality and examines whether role descriptions with moderate degrees of gender typicality can elicit expectations about the referent's gender, thus disrupting reading when a mismatching pronoun is encountered.
The present research employs eye-tracking, which provides high spatial and temporal resolution for mapping the process of anaphor resolution during reading.

EXPERIMENT 1
The aim of Experiment 1 was to analyze the effect of gender typicality on pronominal anaphor resolution with a description-based paradigm. Specifically, the paradigm employed descriptions of gender-typical occupational roles instead of role nouns to convey gender typicality. The absence of role nouns allows us to compare the processing of gender typicality cues in natural gender and grammatical gender languages.

Method
Participants
Thirty-one students (17 women and 14 men) from the University of Sussex, UK, participated in the study. Participants were native speakers of English with normal or corrected-to-normal vision (mean age = 21 years, SD = 3.9). They received monetary compensation or course credit for their participation. Ethical approval for the study was granted by the University of Sussex's Research Ethics Committee, and all participants provided written informed consent before taking part in the study.

Design and Hypothesis
The experiment was designed to test the interaction between the gender typicality of the occupational role (typicality: male, female, or neutral) and the gender of the anaphoric reference (pronoun: masculine or feminine). In accord with the German study (Reali et al., 2015) and earlier research using gender-typical role nouns, we expected a mismatch between gender-typical role description and anaphor gender to evoke longer fixation times and more frequent regressions compared to the matching and neutral conditions.

Materials
Materials were created to convey gender-typical information associated with different occupational activities without employing role nouns. The experimental sentences are based on the materials of a study conducted in German (Reali et al., 2015). In that study, a list of roles was first selected from published collections of gender typicality ratings of role nouns for different languages (Kennison and Trofe, 2003; Irmen, 2007; Gabriel et al., 2008). Participants (30 women, 20 men, mean age = 23.1, SD = 4.1, students from the University of Heidelberg, Germany) then estimated the extent to which a specific professional role (e.g., primary school teacher) was held by men and/or women, using a 7-point scale with the anchor points 1 = only men, 7 = only women, and 4 = same number of women and men. Items (N = 77) were categorized as follows: male ≤ 2.5, neutral 3.5–4.5, female ≥ 5.5. The same sample provided, through a written computer-based production task, a description of each role, on which the experimental items were based. These descriptions were then presented, in a paper-based questionnaire, to a new sample of participants (N = 40, students from the University of Heidelberg), who had to guess the role noun corresponding to each description. The purpose of this sub-test was to check the correspondence between the role representation conveyed by the descriptions and the corresponding role nouns. Descriptions with less than 80% description-noun correspondence were discarded. This selection yielded 12 female, 12 male, and 12 neutral descriptions, which constituted the final set of 36 experimental items for the eye-tracking study. The last sample of participants also rated the typicality of the final descriptions; these ratings correlated strongly with the role noun ratings (r = 0.995, p < 0.001).
The differences between the three typicality conditions, calculated on the description typicality ratings (M(male) = 1.87, SD = 0.42; M(female) = 5.98, SD = 0.37; M(neutral) = 4.17, SD = 0.37), were statistically significant: male-female, t(22) = −30.23, p < 0.001; male-neutral, t(22) = −20.24, p < 0.001; female-neutral, t(22) = −18.99, p < 0.001. The entire pre-test procedure was conducted at the University of Heidelberg, Germany (see Reali et al., 2015). The resulting experimental materials were translated and adapted for the present eye-tracking study.
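The rating-based selection of items can be summarized as a small classification rule. The sketch below is our illustration, not code from the original study: the function names are hypothetical, and treating ratings that fall between the bands (e.g., 3.0 or 5.0) as unclassified is our reading of the cut-offs given above.

```python
from typing import Optional


def typicality_category(mean_rating: float) -> Optional[str]:
    """Classify a mean rating on the 1 (only men) to 7 (only women) scale.

    Cut-offs follow the text: male <= 2.5, neutral 3.5-4.5, female >= 5.5.
    Ratings in the gaps between bands return None (assumption: such roles
    were not used as experimental items).
    """
    if mean_rating <= 2.5:
        return "male"
    if 3.5 <= mean_rating <= 4.5:
        return "neutral"
    if mean_rating >= 5.5:
        return "female"
    return None


def keep_description(n_correct: int, n_raters: int, threshold: float = 0.80) -> bool:
    """Keep a description if at least 80% of raters guessed the intended role noun."""
    return n_correct / n_raters >= threshold


# Usage with the condition means reported for the final descriptions.
for m in (1.87, 5.98, 4.17):
    print(m, typicality_category(m))
```

With 40 raters, for instance, a description would survive only if at least 32 of them named the intended role noun.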
Each experimental item consisted of a first part describing an occupation ("context") and a second part containing a pronominal anaphor ("target sentence"). The personal pronoun (he/she) referred back to the person presented in the preceding context, who had been introduced with initials, as in examples (1) (male typicality) and (2) (female typicality):
(1) K. L. installs power lines and cables, checks electricity voltage. In this field he/she has a lot of experience.
(2) L. K. teaches at a primary school, instructs children in reading. At work he/she wears thick glasses.
The gender neutrality of the target sentences had been ensured through a rating pre-test. To keep the anaphoric pronoun in a comparable position across items, all target sentences had a fixed linguistic structure, with the anaphor positioned between an initial adverbial expression and the verb. In addition to the experimental items, we presented 50 filler items containing descriptions of non-professional roles (e.g., moviegoer) and anaphoric expressions referring back to an inanimate object, to avoid drawing attention to the topic of gender. Finally, we presented 24 content-related questions (e.g., "Is the lab coat green?") to promote attentive reading, for a total of 110 trials (36 experimental items, 50 fillers, and 24 questions).

Procedure
Eye movements were monitored with a video-based head-mounted eye-tracker (EyeLink II, sampling rate 250 Hz, average accuracy 0.5°). Materials were presented with the software EyeTrack on a 21-inch CRT computer screen, with an active screen size of 40 × 30 cm and a resolution of 1024 × 768 pixels. Participants were seated 70 cm from the screen, at which distance 3 characters subtended approximately 1° of visual angle. A chinrest was used to minimize head movements. Reading was binocular, but only the dominant eye was tracked. The dominant eye was determined with the Miles test: participants fixated a point through a small opening and drew the opening toward their eyes while keeping the point fixated, so that at a close distance the opening ended up in front of the dominant eye.
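The reported viewing geometry can be checked with the standard visual-angle formula, θ = 2·arctan(s / 2d), for an object of size s viewed at distance d. The sketch assumes square pixels and that the 40 cm active width maps onto the 1024 horizontal pixels; the function and variable names are ours.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse of visual_angle_deg: on-screen size (cm) subtending angle_deg."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Values from the Procedure section; square pixels are an assumption.
SCREEN_WIDTH_CM, SCREEN_WIDTH_PX, DISTANCE_CM = 40.0, 1024, 70.0

cm_per_px = SCREEN_WIDTH_CM / SCREEN_WIDTH_PX          # physical width of one pixel
one_degree_cm = size_for_angle_cm(1.0, DISTANCE_CM)    # screen distance spanning 1 degree
char_width_px = one_degree_cm / 3 / cm_per_px          # implied width if 3 chars = 1 degree

print(f"{one_degree_cm:.2f} cm per degree, {char_width_px:.1f} px per character")
```

Under these assumptions, 1° corresponds to about 1.22 cm on the screen, so three characters per degree implies characters roughly 10 pixels wide, and a 49-character line spans about 510 pixels, roughly half the screen width.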
The experiment began after a calibration procedure which was performed on a nine-point grid.
The presentation of each item started with a small rectangle indicating the position of the first word of the item. The item appeared when the rectangle was fixated accurately. Whenever the fixation on the rectangle was judged to be inaccurate, recalibration was carried out.
To familiarize participants with the task, the experiment started with four practice trials, one of which was followed by a comprehension question. Then the experimental and filler items were presented. Items were displayed in a monospaced 22-point Lucida Console font, in black characters on a light gray background, and consisted of three lines with a maximum of 49 characters each. The first two lines contained the role description; the third line presented the target sentence with the anaphoric reference. Experimental items were presented in randomized order across participants. After reading an item, participants pressed a button on a keypad to call up the next item or a question. Two buttons of the keypad were used for answering the comprehension questions.
As a follow-up procedure, participants completed a questionnaire in which they rated the gender typicality of the job descriptions presented in the eye-tracking session on a 7-point Likert scale. The whole session lasted approximately 30–45 min.

Data Analysis
In order to investigate the effect of the priming context on the target sentence, we analyzed fixation times and regression patterns on different regions of the target sentences. The target sentence was divided into four regions of analysis: adverb region, anaphor region, spillover region, and final region. The segmentation into regions of analysis is shown in Table 1.
In order to reflect the processing of the text from early to late stages, data were analyzed for the following eye-tracking measures: first fixation time, first pass time, regression path time, total time, and probabilities of regressions into and out of a region. First fixation time is the duration of the first fixation in a given region. First pass time is the time from first entering a region of interest from the left until leaving it either to the right (i.e., moving forward in the sentence) or to the left. Regression path time is the time from first entering a region until leaving it to the right, including the time spent on regressions launched from this region. Total time is the total amount of time spent in a region, including re-reading, but not including time spent elsewhere during regressions from this region. Regressions into and out of a region consist, respectively, of the proportion of backward movements into a specific region and of the proportion of cases in which the eyes left the region to the left after a first-pass fixation of the region (cf. Sturt, 2003; Boland, 2004). In general, longer fixation times and a higher probability of regressions indicate greater difficulty in processing the respective region. Initial stages of data analysis were carried out using the software EyeDoctor and EyeDry provided by the Department of Psychology at the University of Massachusetts Amherst. Short fixations (below 70 ms) were merged with neighboring fixations within three characters. Following Reali et al. (2015), we removed fixations below 70 ms and above 600 ms (4.1% of the data), as they can be assumed not to reflect regular information acquisition during reading. The remaining data were log-transformed to meet the normality assumption of the subsequent analyses.
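To make the measure definitions and the cleaning step concrete, the sketch below computes them for a single trial from fixations given as (character position, duration) pairs, with the four regions of analysis (adverb, anaphor, spillover, final) given as character spans. It is an illustration under stated assumptions, not a reimplementation of EyeDoctor or EyeDry: the merge rule (fold a short fixation into the nearer neighbour within three characters, preferring the preceding one on ties) and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Fixation = Tuple[int, int]  # (character position in the target sentence, duration in ms)
Region = Tuple[int, int]    # half-open character span [start, end)


def clean(fixations: List[Fixation], short: int = 70, long: int = 600,
          radius: int = 3) -> List[Fixation]:
    """Merge fixations shorter than `short` ms into the nearest neighbouring
    fixation within `radius` characters, then drop fixations outside
    [short, long] ms. The tie-breaking rule (previous neighbour first) is assumed."""
    fix = [list(f) for f in fixations]
    i = 0
    while i < len(fix):
        pos, dur = fix[i]
        if dur < short:
            near = [j for j in (i - 1, i + 1)
                    if 0 <= j < len(fix) and abs(fix[j][0] - pos) <= radius]
            if near:
                j = min(near, key=lambda k: abs(fix[k][0] - pos))
                fix[j][1] += dur  # fold the short fixation's duration into its neighbour
                del fix[i]
                continue          # re-examine whatever now sits at index i
        i += 1
    return [(p, d) for p, d in fix if short <= d <= long]


def region_of(pos: int, regions: Sequence[Region]) -> Optional[int]:
    for k, (start, end) in enumerate(regions):
        if start <= pos < end:
            return k
    return None


def measures(fixations: List[Fixation], regions: Sequence[Region], r: int) -> dict:
    """First fixation, first pass, regression path and total time for region r,
    plus whether the first pass ended with a regression out of r."""
    seq = [(region_of(p, regions), d) for p, d in fixations]
    seq = [(reg, d) for reg, d in seq if reg is not None]  # ignore off-region fixations
    out = {"first_fixation": None, "first_pass": None, "regression_path": None,
           "regression_out": None, "total": sum(d for reg, d in seq if reg == r)}
    first = next((i for i, (reg, _) in enumerate(seq) if reg == r), None)
    # First-pass measures require entering r from the left, i.e. r was not skipped.
    if first is None or any(reg > r for reg, _ in seq[:first]):
        return out
    i, first_pass = first, 0
    while i < len(seq) and seq[i][0] == r:
        first_pass += seq[i][1]
        i += 1
    go_past, j = 0, first
    while j < len(seq) and seq[j][0] <= r:  # until the eyes move past r to the right
        go_past += seq[j][1]
        j += 1
    out.update(first_fixation=seq[first][1], first_pass=first_pass,
               regression_path=go_past,
               regression_out=i < len(seq) and seq[i][0] < r)
    return out
```

Region-level values for the statistical analyses would then come from applying `measures` to every trial and log-transforming the durations; proportions of regressions into a region, which require counting entries from the right, are left out of this sketch.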
No significant difference emerged in the distribution of missing data across typicality conditions for any region or fixation duration measure [M(male) = 74.00; M(female) = 74.19; M(neutral) = 69.06; F(2, 45) = 0.86, ns]. Analyses were based on linear mixed-effects models, fitted with the lmer function from the lme4 package (Bates et al., 2014) in R (R Core Team, 2012, version 2.15.2). We included random intercepts for participants and items in our models (see Baayen et al., 2008). As fixed effects, we selected the experimental factors that were assumed to influence target sentence processing: gender typicality of the priming sentence (male, female, or neutral) and the pronoun of the target sentence (masculine, feminine). In addition, we included region length (number of characters in each region of analysis) and participant gender as fixed effects, since these factors could affect reading; region length was included only for fixation duration measures, not for regression measures. The full model for fixation duration measures was: model <- lmer(fixation_time ~ typicality * pronoun * participant_gender * region_length + (1 | participants) + (1 | items)).
To systematically identify the best-fitting model for each measure and region, we employed the step function of the lmerTest package (Kuznetsova et al., 2013), which was developed to automate and standardize the model-building process. Starting from a fully specified model, step performs a backward elimination of random and fixed effects that are not warranted by the data through iterative model comparisons, based on likelihood ratio tests and step-wise removal of non-significant fixed-effect terms. Significant effects of pronoun, typicality, and their interaction were further explored through contrast analyses. Pairwise comparisons contrasted masculine and feminine pronouns within each typicality condition (male-he vs. male-she; female-he vs. female-she; neutral-he vs. neutral-she).
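The logic of backward elimination through nested-model likelihood ratio tests can be sketched as follows. This is a generic illustration, not lmerTest's implementation: `fit` is a hypothetical callback that refits the model with a given set of fixed-effect terms and returns its log-likelihood and number of parameters, and the elimination of random effects is not modelled. To respect the model hierarchy, a main effect can only be dropped once no remaining interaction contains it.

```python
import math
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Fit = Callable[[FrozenSet[str]], Tuple[float, int]]  # terms -> (log-likelihood, n_params)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:  # even df: closed-form Poisson sum
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # odd df: normal tail plus a finite series
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        p += term
    return p


def lrt_p(ll_full: float, ll_reduced: float, df_diff: int) -> float:
    """p-value of a likelihood ratio test between two nested models."""
    return chi2_sf(2.0 * (ll_full - ll_reduced), df_diff)


def backward_eliminate(terms: Iterable[str], fit: Fit, alpha: float = 0.05) -> Set[str]:
    """Repeatedly drop the least significant removable term until all remaining
    removable terms have p < alpha. Interactions are written as "a:b"."""
    kept = set(terms)
    while True:
        ll_full, k_full = fit(frozenset(kept))
        # a term is removable only if no remaining higher-order term contains it
        removable = [t for t in kept
                     if not any(set(t.split(":")) < set(u.split(":")) for u in kept)]
        if not removable:
            return kept
        tests = []
        for t in removable:
            ll_red, k_red = fit(frozenset(kept - {t}))
            tests.append((lrt_p(ll_full, ll_red, k_full - k_red), t))
        p, worst = max(tests)  # the term whose removal costs least
        if p < alpha:
            return kept
        kept.discard(worst)
```

With real mixed models, the fits compared in a likelihood ratio test of fixed effects should be estimated by maximum likelihood rather than REML.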

Eye-tracking Results
The final models for each measure and region (including all significant random effects, fixed effects, and interactions) are reported in the Supplementary Material (Table S1). Means and standard deviations of fixation durations and percentages of regressions are reported in Table 2. Details of the statistical results are reported in Table 3. Below, organized by measure, we report the eye-tracking measures showing statistically significant fixed effects of typicality, pronoun, or typicality × pronoun (p < 0.05), together with the corresponding significant or marginally significant (p < 0.1) results of the contrast analyses.

First pass time
The first reliable interaction between typicality and pronoun was detected in first pass time on the region immediately following the pronoun (spillover). Contrast analyses revealed that the effect was statistically significant only when the priming sentence was typically female, with congruent trials read faster than incongruent ones.

Regressions out of a region
The interaction between typicality and pronoun emerged in the proportion of regressions out of the last region of the target sentence. Contrast analyses showed a significant effect for the neutral condition, with fewer regressions after a masculine than after a feminine pronoun: M(neutral-he) = 8.1, M(neutral-she) = 13.2, t(947) = −2.26, p = 0.02; M(male-he) = 8.9, M(male-she) = 11.7, ns; M(female-he) = 14.8, M(female-she) = 11.2, ns.

Total fixation time
The interaction between typicality and pronoun emerged on the spillover region. Pairwise comparisons revealed a significant effect for the female condition, but not for the male and neutral conditions, with shorter fixation times on congruent than on incongruent trials: M(female-she) = 380, M(female-he) = 427, t(998) = 2.14, p = 0.03; M(male-he) = 363, M(male-she) = 355, ns; M(neutral-he) = 437, M(neutral-she) = 437, ns. Furthermore, a main effect of participant gender emerged on the pronoun region. Contrasts revealed a tendency for female participants to read faster: M(men) = 355, M(women) = 316, t(30) = 1.86, p = 0.073.

Gender Typicality Ratings and Eye Movements
Typicality ratings for Experiment 1 are reported in the Supplementary Material (Table S2). Typicality ratings were based on the data collected in a previous study (see Materials section), from a sample that did not participate in the eye-tracking experiment. To investigate whether eye movements reflected the strength of gender expectations, we conducted a by-item linear regression analysis with typicality ratings as predictors of eye movements. We selected the regions of analysis where the gender mismatch effect emerged. Since pairwise comparisons revealed an asymmetry between the male and female conditions, we conducted separate analyses for the two anaphoric pronouns. Results revealed that typicality ratings predicted first pass times after a masculine anaphor (β = 0.35, p < 0.05). As the rating scale had the poles 1 = male and 7 = female, the positive β coefficient in the masculine pronoun condition means that lower (i.e., more male-typical) ratings predicted shorter fixations after the pronoun he. This result indicates that fixation times on a region where the mismatch effect emerged reflected the degree of gender typicality expressed in the explicit typicality ratings of the respective items.
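A by-item regression of this kind can be sketched as an ordinary least-squares fit with a single predictor. The sketch assumes one value per item, for example the item's mean log first pass time after he on the relevant region paired with the item's typicality rating; this aggregation and the function name are our assumptions, not a description of the original analysis script. With one predictor, the standardized coefficient β equals Pearson's r between ratings and fixation times.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = a + b * x.

    Returns (intercept a, raw slope b, standardized slope beta). With a single
    predictor, beta equals Pearson's r between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    beta = b * pstdev(x) / pstdev(y)  # rescale slope to standard-deviation units
    return a, b, beta
```

A positive β for the masculine pronoun would correspond to the pattern described above: items rated closer to the male pole (lower values) go with shorter fixation times after he.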

Follow-up Typicality Ratings
Follow-up typicality ratings were collected from participants immediately after they completed the eye-tracking experiment. The follow-up ratings correlated highly with the pre-test ratings (r = 0.966, p < 0.001). However, they were shifted toward neutrality: typically male and, in particular, typically female occupations received less extreme ratings than in the pre-test.

Discussion
The study analyzed the effect of gender typicality cues on the resolution of a pronominal anaphor. As antecedents, the commonly used role nouns were replaced with role descriptions that contained only gender typicality cues to referent gender. The experiment was conducted in English, a language which does not possess a grammatical gender system. A main effect of pronoun emerged in regression path time on the pronoun and spillover regions, with shorter times for the feminine than for the masculine pronoun. This effect may suggest a generally greater difficulty in integrating a male than a female referent. However, the effect was limited to this single measure and therefore appears to be an isolated finding rather than a systematic pattern.
The interaction between the gender typicality of the description and pronoun gender is the focus of the study, and it emerged in measures representing different stages of processing. A mismatch effect between the two factors occurred reliably in a measure of early processing on the region following the anaphoric pronoun. Moreover, the interaction was also detected in a measure of an intermediate processing stage (i.e., regressions from the last region of the target sentence, when participants went back to re-check previously read text) and in one measure of late processing, namely the total time spent on the spillover region. Furthermore, correlational analyses with gender typicality ratings showed that the degree of typicality of the items predicted the early fixation times in which the mismatch effect emerged, supporting the validity of the description paradigm as a tool to investigate gender typicality.
The location of the early mismatch effect is consistent with data from reading studies in English that employed role nouns as antecedents and personal pronouns as anaphors (Kennison and Trofe, 2003). The effect appears to be delayed in location and time relative to studies employing reflexive pronouns to trigger the mismatch (e.g., Sturt, 2003). However, these effects cannot be compared directly because of relevant differences in the sentence structures and paradigms used in the studies.
The present data can now be compared to a parallel study on German in which grammatical gender cues were avoided in the materials (Reali et al., 2015). Interestingly, in the German study the mismatch effect occurred earlier (in first fixations), on the pronoun region. Furthermore, in the German experiment the mismatch effect surfaced in two further measures (regressions in and total time) on the pronoun region itself. A possible explanation for the difference from the present findings concerns the presence or absence of grammatical gender in the two languages. The description-based paradigm kept the texts free of morphological gender cues in both languages. However, processing gender typicality cues may activate grammatical gender in a language with a grammatical gender system and thus facilitate the assignment of referent gender in the direction suggested by gender typicality. This would explain why reference resolution appears to be faster in the grammatical gender language. Previous eye-tracking studies using plural role nouns as antecedents may also support the interpretation that grammatical gender cues make gender typicality cues more salient and speed up any resulting gender mismatch effect. For example, in an eye-tracking experiment with German material, Irmen (2007) employed a noun phrase as anaphor ("these men/these women"). When the antecedents were masculine generics, the typicality mismatch effect appeared in first pass reading on the first word of the anaphoric phrase itself ("these"). In contrast, when the antecedents were gender-unmarked role nouns (e.g., Alleinerziehende, single parents), the typicality mismatch effect fully emerged only in later measures on the spillover region.
A further point of discussion is the asymmetry between the male and female conditions revealed by the pairwise comparisons of the mismatch effect. Specifically, the gender mismatch was reliable only for the female condition, which impaired sentence processing when followed by a masculine pronoun. This asymmetry was reliable in both early and late stages of processing, on the spillover region of the target sentence. It may indicate that readers find it difficult to integrate a male referent into the representation of a typically female occupation, whereas reconciling a female referent with a typically male professional role apparently required less cognitive effort. Moreover, regressions launched from the last region indicate that the neutral condition may be integrated more easily with a masculine than with a feminine anaphoric pronoun. This finding may represent a wrap-up effect emerging at the end of the sentence, after all the information available in the text had been collected. In this case, it may reflect a generally easier integration of a masculine than of a feminine referent when no specific gender cue is available, as in the neutral context.
Finally, the follow-up typicality ratings, collected immediately after the eye-tracking session, were less extreme than the pre-test ratings for the male and particularly for the female condition. This finding is surprising, since it was female typicality that triggered the significant mismatch effect. In other words, in the online measure participants found it particularly difficult to associate the representation of a male referent with a female occupation, while the explicit ratings show that the female roles were judged as partially suitable for men as well. We believe that participants may have been primed with counter-stereotypical representations of the roles through their recent exposure to the eye-tracking stimuli. Although the present experiment was not designed to detect such a priming effect, the effect is plausible after a task in which participants had to integrate a stereotypical gender context with a gender-incongruent referent. As the eye-movement data show, this task may have been particularly surprising, and consequently more salient, in the female condition, thus priming a more equal representation of the gender distribution in the typical occupational roles in the subsequent offline ratings.

EXPERIMENT 2
Experiment 1 investigated the effect of typicality using highly gender-typical items. However, selecting such items excluded occupational roles in the range between gender-typical and neutral (see the Materials section for details). The second experiment therefore addresses the following research question: Do occupational roles that are judged as slightly typical, but not as gender-neutral, affect the process of anaphor resolution? In other words, do readers develop a probabilistic expectation of referent gender when reading a description of roles with low gender typicality, such as psychologist or lawyer, which were rated as only slightly female and slightly male, respectively, in the off-line measures?

Method
Participants
Twenty-nine students (17 women and 12 men) from the University of Sussex, UK, participated in the study. Participants were native English speakers with normal or corrected-to-normal vision (mean age = 21 years, SD = 2.4). None of them had participated in Experiment 1. They received monetary compensation or course credit for their participation. All participants provided written informed consent before taking part in the study.

Design and Hypothesis
The experiment was designed to test the interaction between the gender typicality of the occupational role (typicality: slightly male, slightly female, or neutral) and the gender of the anaphoric reference (pronoun: masculine or feminine). If stimuli with moderate degrees of gender typicality can elicit expectations about the referent's gender, the reading process should be disrupted when a mismatching pronoun is presented. This disruption would result in longer fixation times and a higher probability of regressions. No effect is expected for neutral priming stimuli.

Materials
Item structure was identical to that used in Experiment 1. In Experiment 2, the priming context consisted of slightly male, slightly female, or neutral occupational roles. The roles were selected on the basis of the role noun pretest (see Materials section, Experiment 1). We selected items with role noun typicality ratings between 2.5 and 3.5 (slightly male), 4.5 and 5.5 (slightly female), and 3.5 and 4.5 (neutral) on a 7-point Likert scale for gender typicality, where 1 represented the male pole and 7 the female pole (M s.male = 2.99, SD = 0.16, M s.female = 4.98, SD = 0.31, M neutral = 4.04, SD = 0.14). Examples (3) and (4) show a slightly male and a slightly female experimental item, respectively:
(3) C. H. earned a degree in law after many years of study. Nowadays he/she does mostly paperwork.
(4) H. C. receives calls from many customers at the call-center. Regularly he/she takes short breaks.
Participants were presented with 12 slightly male, 12 slightly female, and 12 neutral role descriptions. In addition, we randomly presented 50 filler sentences (the same items as in Experiment 1), and 24 content-related questions to promote attentive reading.
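The rating-based selection rule described above can be sketched as a small classification function. This is a hypothetical illustration, not the authors' procedure: the function name classify_role is invented here, and because the reported ranges share their endpoints (3.5 and 4.5), the sketch assumes that both boundary values belong to the neutral bin.

```python
from typing import Optional


def classify_role(mean_rating: float) -> Optional[str]:
    """Assign a role noun to an Experiment 2 typicality condition.

    Hypothetical helper. Ratings are on the 7-point scale used in the
    pretest (1 = typically male, 7 = typically female).
    ASSUMPTION: the shared endpoints 3.5 and 4.5 are assigned to "neutral".
    """
    if 2.5 <= mean_rating < 3.5:
        return "slightly male"
    if 3.5 <= mean_rating <= 4.5:
        return "neutral"
    if 4.5 < mean_rating <= 5.5:
        return "slightly female"
    # Outside 2.5-5.5: not eligible as an Experiment 2 item.
    return None


# Condition means reported for the selected items.
for rating in (2.99, 4.04, 4.98):
    print(rating, classify_role(rating))
```

Applied to the reported condition means (2.99, 4.04, 4.98), the function returns the intended condition for each; any item rated outside the 2.5 to 5.5 range returns None and would not be eligible under this rule.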

Procedure and Analysis
The experimental procedure with eye-tracking recordings and the analyses were identical to those in Experiment 1. The distribution of missing data did not differ significantly across typicality conditions for any region or fixation duration measure [M s.male = 42.00; M s.female = 35.00; M neutral = 46.88, F (2, 45) = 1.01, ns]. The mixed-effects models included participants and items as random effects. As fixed effects we included typicality (slightly male, slightly female, neutral), pronoun (masculine, feminine), region length (in fixation duration measures), and participant gender: model <- lmer(fixation_time ~ typicality * pronoun * participant_gender * region_length + (1 | participants) + (1 | items)).
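In R's model-formula notation, the * operator expands to all main effects plus every interaction among the crossed terms, so the full fixed-effects structure above (before model reduction) is larger than it may appear. The Python sketch below enumerates the implied terms and counts the fixed-effect coefficients. The per-factor column counts are assumptions for illustration: they presume a full-rank contrast coding (such as R's default treatment coding) and region length entered as a single numeric covariate; the paper does not state the coding scheme.

```python
from itertools import combinations
from math import prod

# Fixed effects of the full model; the value is the number of design-matrix
# columns each contributes under full-rank coding (levels - 1 for a factor,
# 1 for a numeric covariate). ASSUMPTION: coding scheme not stated in paper.
FACTORS = {
    "typicality": 2,          # slightly male, slightly female, neutral (3 levels)
    "pronoun": 1,             # masculine, feminine
    "participant_gender": 1,  # women, men
    "region_length": 1,       # numeric covariate
}


def expand_star(factors):
    """Terms implied by a * b * c * d: all main effects and interactions."""
    names = list(factors)
    terms = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            terms.append(":".join(combo))  # R writes interactions as a:b
    return terms


def n_coefficients(factors):
    """Number of fixed-effect coefficients, excluding the intercept."""
    return sum(
        prod(factors[name] for name in term.split(":"))
        for term in expand_star(factors)
    )


terms = expand_star(FACTORS)
print(len(terms))               # 15 terms
print(n_coefficients(FACTORS))  # 23 coefficients (plus the intercept)
```

With four crossed terms the formula implies 2^4 - 1 = 15 fixed-effect terms; because typicality has three levels, these correspond to 23 coefficients in addition to the intercept, which is why the final models were reduced to the significant effects.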

Eye-tracking Results
The final models for each measure and region (including all significant random effects, fixed effects, and interactions) are reported in the Supplementary Material (Table S1). Means and standard deviations of fixation durations and percentages of regressions are reported in Table 4. Details of the statistical results are reported in Table 5. Below we report the eye-tracking measures that showed statistically significant fixed effects of typicality, pronoun, or typicality × pronoun (p < 0.05), together with the corresponding significant or marginally significant (p < 0.1) results of the contrast analyses, separately for each measure. The contrast analyses compared, within each typicality condition, trials followed by the masculine versus the feminine pronoun (slightly male-he vs. slightly male-she; slightly female-he vs. slightly female-she; neutral-he vs. neutral-she).

First fixation time
A main effect of typicality emerged on the second region of the target sentence. Pairwise comparisons between all the factor levels showed no reliable difference, M s.male = 191, M s.female = 186, M neutral = 186, ns.

First pass time
The

Regressions into a region
The interaction between typicality and pronoun emerged in regressions into the first region of the target sentence. Contrast analyses showed a significant effect for the slightly female priming condition, where congruent trials showed fewer regressions than incongruent ones, M s.femaleSHE = 1.6, M s.femaleHE = 2.5, t (978) = 2.48, p = 0.01. The effect was also significant for the slightly male condition, with fewer regressions for congruent than for incongruent trials, M s.maleHE = 2.4, M s.maleSHE = 3.5, t (978) = −2.14, p = 0.03. No effect was found for the neutral priming condition, M neutralHE = 2.1, M neutralSHE = 2.3, ns.
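The direction of these contrasts can be checked with simple arithmetic on the cell means just reported: for each gender-typical condition, the mismatch cost is the incongruent mean minus the congruent mean. The sketch below only recomputes these raw differences in percentage points; it does not reproduce the reported t-tests, which were derived from the mixed-effects models, and the names REGRESSIONS_IN, CONGRUENT, and mismatch_effect are illustrative.

```python
# Mean percentages of regressions into the first target-sentence region,
# as reported for Experiment 2 (condition -> pronoun -> mean %).
REGRESSIONS_IN = {
    "slightly female": {"he": 2.5, "she": 1.6},
    "slightly male": {"he": 2.4, "she": 3.5},
    "neutral": {"he": 2.1, "she": 2.3},
}

# Pronoun matching each role's gender typicality (none for neutral roles).
CONGRUENT = {"slightly female": "she", "slightly male": "he"}


def mismatch_effect(condition):
    """Incongruent minus congruent mean, in percentage points.

    Neutral roles have no congruent pronoun, so the she-minus-he
    difference is returned for them instead.
    """
    means = REGRESSIONS_IN[condition]
    if condition in CONGRUENT:
        match = CONGRUENT[condition]
        mismatch = "he" if match == "she" else "she"
        return round(means[mismatch] - means[match], 1)
    return round(means["she"] - means["he"], 1)


for cond in REGRESSIONS_IN:
    print(cond, mismatch_effect(cond))
```

Both gender-typical conditions yield positive differences (0.9 points for slightly female, 1.1 points for slightly male), i.e., more regressions after the mismatching pronoun, whereas the neutral condition shows only a 0.2-point she-minus-he difference, which was not significant.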

Regressions out
Regressions out of the last region showed a main effect of typicality. Pairwise comparisons revealed a smaller proportion of regressions for the neutral condition than for the slightly male condition, M s.male = 14.1, M neutral = 7.2, t (33) = −2.58, p = 0.01, as well as a tendency toward fewer regressions for the neutral condition than for the slightly female condition, M s.female = 11.2, M neutral = 7.2, t (33) = −1.75, p = 0.09. The probability of regressions did not differ between the slightly female and slightly male conditions, M s.female = 11.2, M s.male = 14.1, ns.

Total fixation time
A main effect of participant gender emerged on the pronoun region. Contrasts revealed no significant difference, M men = 363, M women = 355, ns.

Gender Typicality Ratings
Typicality ratings for Experiment 2 are reported in the Supplementary Material (Table S3). Follow-up typicality ratings correlated with the pretest ratings of the role nouns. The mismatch effect found in eye movements did not correlate with the explicit typicality ratings (all β ≤ 0.07).

Discussion
Experiment 2 documents an effect of slightly gender-typical roles on the resolution of mismatching anaphoric personal pronouns, manifest at an early to intermediate stage of sentence processing. As in Experiment 1, gender typicality cues were conveyed through sentences describing a professional activity. In this experiment, however, the occupations had been rated as only slightly typical for men or women, or as neutral. Still, slightly typical contexts triggered the mismatch effect, whereas neutral priming trials did not. When description typicality and pronoun gender mismatched, readers regressed to the beginning of the target sentence in order to re-check information and eventually resolve the gender conflict. The description paradigm proved sensitive, showing that even low degrees of typicality may impair the resolution process; it may thus be considered an adequate tool for investigating gender typicality even when gender cues are too subtle to be categorized as "stereotypical." Unlike in Experiment 1, in Experiment 2 the mismatch effect emerged in both gender priming contexts. This may be explained by the fact that the second experiment presented slightly typical contexts, which may not create the specific integration difficulty observed in Experiment 1 for male referents in highly stereotypical female roles. In other words, in the second study both gender priming conditions impaired reading, whereas in the neutral priming condition integration with the pronoun did not prove problematic.

GENERAL DISCUSSION
The study presented a paradigm for investigating the effect of gender typicality on pronominal anaphor resolution without relying on role nouns as antecedents. Gender typicality was prompted through descriptions of occupational roles. The results showed that gender typicality was conveyed effectively and that it affected anaphor resolution both when the gender typicality of the priming context was high (Experiment 1) and when it was low (Experiment 2). Incongruence between the gender typicality of the description and the gender of the pronoun produced a mismatch cost, which was located mainly on the pronoun region and its immediate spillover for fixation duration measures, and at the beginning and end of the target sentence for regression measures. While in Experiment 1 the explicit ratings predicted eye movements, no such correlation was found in Experiment 2.
Taken together, these results offer insight into the representational format of gender typicality beliefs. First, the results suggest that the cognitive process of correcting for and integrating the initial mismatching gender representation exhibited a different time course in the two experiments: a more complex repair strategy involving early and late stages of processing was applied in the case of highly typical items, whereas less typical items only affected an early to intermediate stage of sentence processing.
Second, the results suggest that the effect of gender-typical role descriptions can have two different cognitive sources: gender typicality and gender stereotypes. Gender typicality refers to the cognitive representation of the proportion of men and women in certain occupational roles and can be measured through explicit ratings. Gender stereotypes are cognitive representations that associate an occupational role with a specific gender; they may be implicit, i.e., not directly measurable through typicality ratings, but can be captured with indirect methods such as eye movements during reading. The cognitive dissociation between these two factors is evident in the results of Experiment 2, where items possessed a low degree of gender typicality. Based on explicit ratings, the roles (e.g., manager, politician) were not classified as gender-typical, yet they still triggered a mismatch effect in the eye-tracking measures, which we attribute to an automatic association of the professional role with a gender stereotype. We therefore conclude that the concept of gender typicality could actually be split into two cognitive components: an explicit one, which can be recorded through classical typicality ratings and corresponds to beliefs about the distribution of men and women in a specific field, and an automatic one, which is revealed by indirect methods and is stored in readers' long-term memory together with the semantics of the respective role. Furthermore, a cross-linguistic comparison with studies on grammatical gender languages suggests that the presence or absence of a grammatical gender system in the investigated language may play a key role in the processing of gender typicality cues, even when morphological or grammatical gender cues are not present in the text but are only cognitively available to the reader. More specifically, we argue that a grammatical gender system may make gender typicality cues more salient than they are in a natural gender language.
This is, however, open to debate [cf. Irmen and Rossberg, 2004; Gygax et al., 2008, on the relation between gender typicality and grammatical gender]. In a study employing a picture categorization paradigm in Italian and Spanish, Cubelli et al. (2011) show that grammatical gender is automatically activated even when its retrieval is not required to accomplish the task. This may suggest that gender information is already available in the cognitive representation of a reader whose language has a grammatical gender system, even when no morphological markings are required for comprehension or present in the stimuli, and that it may trigger faster processing of the gender mismatch.
Finally, a cross-linguistic comparison of the present study with studies on grammatical gender languages reveals a similar asymmetry in the distribution of the gender mismatch effect, which had previously been reported only for languages with a grammatical gender system (in Italian, Cacciari and Padovani, 2007; in German, Irmen et al., 2010). Specifically, pairwise contrasts in Experiment 1 revealed a significant effect for the masculine pronoun following an incongruent female context, but no effect for the feminine pronoun following an incongruent male context. In a study with event-related potentials, Siyanova-Chanturia et al. (2012) document an N400-like effect only for the masculine pronoun preceded by an incongruent, typically female role noun (e.g., insegnante-lui, teacher-he). The N400 is assumed to reflect a violation of semantic expectations, which is also at the basis of the gender mismatch asymmetry effect in eye movements. Our findings in English support the cross-linguistic evidence that gender stereotypes may affect the processing of masculine and feminine anaphors differently. Socio-psychological theories of expectations related to gender roles may be required to explain this effect, as it may not be due solely to the features of a particular gender system. However, further comparative studies and replications are necessary to determine the exact role of the gender system of a reader's language in the interpretation of gender-typical cues and its interaction with the process of anaphor resolution.

ACKNOWLEDGMENTS
The research was supported by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 237907. We are grateful to Alan Garnham and Jane Oakhill for their support at the time of data collection at the University of Sussex, UK. We would also like to thank David Tobinski for his help and valuable suggestions.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01607