On Coding Non-Contiguous Letter Combinations

Starting from the hypothesis that printed word identification initially involves the parallel mapping of visual features onto location-specific letter identities, we analyze the type of information that would be involved in optimally mapping this location-specific orthographic code onto a location-invariant lexical code. We assume that some intermediate level of coding exists between individual letters and whole words, and that this involves the representation of letter combinations. We then investigate the nature of this intermediate level of coding given the constraints of optimality. This intermediate level of coding is expected to compress data while retaining as much information as possible about word identity. Information conveyed by letters is a function of how much they constrain word identity and how visible they are. Optimization of this coding is a combination of minimizing resources (using the most compact representations) and maximizing information. We show that in a large proportion of cases, non-contiguous letter sequences contain more information than contiguous sequences, while at the same time requiring less precise coding. Moreover, we found that the best predictor of human performance in orthographic priming experiments was within-word ranking of conditional probabilities, rather than average conditional probabilities. We conclude that from an optimality perspective, readers learn to select certain contiguous and non-contiguous letter combinations as information that provides the best cue to word identity.

Introduction
When trying to understand the functioning of cognitive architectures, important insights can be obtained by studying characteristics of the task and the constraints they place on the processing system. Rational Analysis (Anderson, 1990; Oaksford and Chater, 1998a,b) aims at studying such characteristics and constraints, taking the standpoint that the brain optimizes processing to perform near optimality given limited cognitive resources. A number of mechanisms, such as heuristics and biases, can be interpreted as optimizing performance under minimal resources. For instance, the inferotemporal neurons of monkeys trained to recognize wireframe objects generalized their responses to stimuli rotated by 180° around the vertical axis (Logothetis and Pauls, 1995). Similar biases and tendencies to spontaneously generalize to symmetrical or mirror stimuli can be found across the cognitive system (e.g., Freyd and Tversky, 1984; Chater and Vitányi, 2003; Feldman, 2003; Chater and Brown, 2008). For instance, children learning letter identities typically go through a mirror stage in which they can write indifferently in both directions (Walsh and Butler, 1996). Along these lines, Dehaene et al. (2010) have recently presented behavioral and fMRI data showing that mirror-invariance is present for pictures in a masked mirror priming paradigm, while this was not the case for words, under the assumption that once a certain level of reading experience is attained, mirror writing and/or reading is no longer an automatic spontaneous process.
An important function of the process of learning to perform a particular cognitive activity is to build compact and general representations of stimuli that condense important information while discarding their irrelevant and idiosyncratic details (generalizing to symmetrical or mirror stimuli is often, but not always, useful in this respect). In the domain of reading, because letter size, font, and case are usually irrelevant, the cognitive system learns to devise abstract representations of letters devoid of size, font, and case information (Petit et al., 2006; Chauncey et al., 2008; Grainger, 2008; see also Carreiras et al., 2007). However, a key question driving current research on printed word perception concerns the nature of the orthographic code that enables location-invariant recognition of printed words. This is the focus of the present study.
In the present analysis, we need to make three assumptions in order to get started. The first is that visual word recognition in languages that use an alphabetical script is essentially letter-based. That is, we assume that the printed word is recognized via its constituent letters as opposed to some more holistic word-shape representation (e.g., Pelli et al., 2003; see Grainger, 2008, for a summary of the arguments). The second assumption is that the simple visual features that make up the different letters of the alphabet are coded retinotopically in visual cortex (see Tootell et al., 1998, for a review of retinotopic coding in human visual cortex). This suggests that the mapping of such features onto a string of letters must retain a certain amount of retinotopicity (Dehaene et al., 2005; Dehaene, 2007). On the other hand, although there is a processing cost associated with shifts in word position relative to eye fixation (e.g., Stevens and Grainger, 2003), printed words can be recognized at different locations. Finally, we assume that letter information is input to the cognitive system in a parallel fashion during an eye fixation (Slowiaczek and Rayner, 1987). With letter data available in parallel, the cognitive system can easily extract or code for letter combinations. So, what is the nature of the orthographic code that enables location-invariant recognition of printed words?
Today, there is a general consensus that the literate brain implements some form of word-centered, location-independent orthographic coding such that letter identities are coded for their position in the word independently of their position on the retina, at least for words that require a single fixation for processing. This consensus also extends to the principle that such within-word position coding of letter identities is flexible and approximate; in other words, letter identities are not rigidly allocated to a given position (Whitney, 2001; Grainger and van Heuven, 2003; Gomez et al., 2008; Davis, 2010). Key evidence in favor of such approximate, flexible coding has been obtained using the masked priming paradigm. For a given proportion of letters shared by prime and target, priming effects are not affected by small changes of letter order (transposed-letter priming: Perea and Lupker, 2004; Schoonbaert and Grainger, 2004), or by changes in length-dependent letter position (relative-position priming: Peressotti and Grainger, 1999; Grainger et al., 2006). Furthermore, using a combination of masked priming and EEG recordings, Dufau et al. (2008), Carreiras et al. (2009), and Grainger and Holcomb (2009) have studied the time course of such flexible orthographic processing. They found that, by the time the ERP components were found to be sensitive to flexible orthographic priming, they were no longer sensitive to whether or not prime and target stimuli were spatially aligned.
In the present work we start from the further assumption that flexible orthographic processing is achieved by coding for letter combinations (Whitney and Berndt, 1999;Whitney, 2001;Grainger and van Heuven, 2003;Dehaene et al., 2005). Our study does not aim to pit this particular type of coding scheme against alternative coding schemes that do not use letter combinations, such as the coding schemes implemented in the SOLAR model (Davis, 2010) or the overlap model (Gomez et al., 2008), although this is indeed an important goal for present and future research (see Davis, 2010, for a recent comparison of these different theoretical approaches). Given the popularity of the general notion of n-gram coding (especially bigrams and trigrams), and the almost exclusive use of contiguous n-grams before the pioneering work of Mozer (1987) and Whitney and Berndt (1999), it seemed important to analyze the arguments behind this theoretical shift to including non-contiguous letter combinations in n-gram coding. We present and analyze these arguments within the specific framework of Grainger and van Heuven's (2003) parallel open-bigram model, and although some of the arguments are shared with Whitney's (2001) serial open-bigram model (the SERIOL model), a detailed analysis of this particular approach is beyond the scope of the present work (see Whitney, 2008;Whitney and Cornelissen, 2008, for more information about this specific approach). Instead, we focus on two distinct perspectives within Grainger and van Heuven's parallel open-bigram approach, which we now describe.
The Grainger and van Heuven (2003) proposal is illustrated in Figure 1. In this model, location-invariant orthographic coding occurs at what is referred to as the "relative-position map." The relative-position map abstracts away from absolute letter position and, instead, focuses on relationships between letters. The first stage of processing in this model is the parallel mapping of visual features onto letter identities at a specific horizontal location with respect to eye fixation (the alphabetic array). This location-specific coding of letter identities is then transformed into a location-invariant prelexical orthographic code (the relative-position map) before matching this information with whole-word orthographic representations in long-term memory.
The key distinguishing feature of this specific approach is that flexible orthographic coding is achieved by coding for ordered combinations of contiguous and non-contiguous letters (see Mozer, 1987;Whitney, 2001, for earlier proposals along the same lines). Grainger and van Heuven (2003) proposed open-bigram coding as one means of achieving the gain in flexibility that is necessary in order to capture empirical findings that standard bigram coding schemes cannot account for. As already pointed out by several authors (see Grainger, 2008, for a review), standard bigram and trigram coding schemes cannot capture some basic empirical phenomena, such as the transposed-letter and relative-position priming effects mentioned above. In the present theoretical note we provide an exploration of Grainger and van Heuven's open-bigram coding scheme using the general framework of rational analyses of human information processing. More specifically, we examine the utility of coding for non-contiguous as well as contiguous letter combinations within this general approach to orthographic processing, over and above the fact that this confers the flexibility that is necessary to account for key empirical results.
When we begin to think about how coding for non-contiguous letters might emerge in a system that starts with parallel location-specific letter encoding, two different perspectives emerge. On the one hand, it has been proposed that the coding of non-contiguous letter combinations arises because of noisy position coding in location-specific (retinotopic) letter detectors (e.g., Dehaene et al., 2005). Within this particular perspective, the system aims to code for contiguous letter combinations, and ends up coding for non-contiguous elements as well, because of positional errors (see Grainger et al., 2006, for a specific implementation of
letter combinations is easier (i.e., less complex) in that less precise information about letter positions is required compared with a system that codes for contiguous letter combinations. Put differently, it is clear that relative-position information (A is before B) is more reliable when A and B are farther apart. In other words, non-contiguous letter combinations might provide a more robust relative-position code than contiguous letter combinations.
In order to test this proposal, in what follows we first describe the corpuses of words on which our analyses were based. Second, we provide empirical data concerning letter-in-string visibility. Third, we describe how conditional probabilities are computed under this visibility constraint. We then provide mathematical analyses of the information carried by different types of letter combinations, providing a measure of a bigram's informativeness with respect to a given word, and a ranking of bigrams within a word in terms of their relative informativeness. Finally, we provide empirical tests of these measures by examining to what extent they can capture key patterns of data obtained in orthographic priming experiments.

Materials and Methods
Corpuses and Word Lengths
The current study covers three languages: French, English, and Spanish. In all three languages, analyses are based on lemma forms of words composed of letters only (i.e., lemmas that included hyphens, spaces, etc., were excluded) and of lengths between five and seven letters. Words between five and seven letters were selected because visibility data were available for such lengths (see section below), or could be interpolated (see footnote 2). As a French lexicon, we used the Lexique (version 3.55) database (New et al., 2004). It contains 125,653 words (including 46,942 lemmas) along with information such as phonology, lexical category, and lexical neighborhood. After selection based on the criteria above, 11,811 words were chosen. As an English lexicon, we used the Celex database (Baayen et al., 1993). It contains 50,591 lemma forms. After selection based on the criteria above, 18,275 words were selected. As a Spanish lexicon, we used the BuscaPalabras database (Davis and Perea, 2005) which is a subset of the Spanish lexicon called LEXESP (Sebastián-Gallés et al., 2000). It contains 31,491 lemmas. After selection based on the criteria above, 11,394 words were selected.

Letter-in-String Visibility
An important aspect of our analyses is the use of realistic letter-in-string visibility constraints in the computation of conditional probabilities, along with a comparison with a control condition in which all letters are equally visible. In fact, empirical evidence suggests that not all letters are equally visible when an eye fixation is made on a word. It is possible that two constraints play a role. First, visual acuity is best at the center of the fovea, and degrades as eccentricity increases. Thus, the letter at fixation is more visible. Furthermore, outer letters of words are more visible, which may be explained by the fact that crowding is reduced for outer letters compared to inner letters (e.g., Tydgat and Grainger, 2009; Grainger et al., 2010).
Empirical data on letter-in-string visibility used here are from Stevens and Grainger (2003). For the present theoretical note, we assume that fixations occur on the central position
this mechanism referred to as the "overlap open-bigram model"). Therefore, within this perspective, there is no need to justify the coding of non-contiguous letter combinations; they simply arise by accident, and accidentally confer the additional flexibility that allows the coding scheme to capture key data patterns. Here we take a very different perspective, proposing on the contrary that the coding of non-contiguous letter combinations is deliberate, and not the result of inaccurate location-specific letter coding. In other words, non-contiguous letter combinations are coded because they are beneficial with respect to the overall goal of mapping letters onto meaning, and not because the system is not accurate enough to determine the precise location of letters. Within this particular perspective it is therefore important to ask why the coding of non-contiguous letter combinations might have developed during reading acquisition in order to optimize orthographic processing. To answer this question, we first perform analyses of the mathematical dependencies and regularities in real corpuses of words, and the constraints they place on the process of lexical identification. We then apply these analyses to explain some experimental results. These analyses also take into consideration visual constraints on the quality of the information being processed. Before presenting these analyses, we first describe the overall methodology employed in the present work.

Optimal Letter Combinations for Word Recognition
Here we apply a rational analysis to Grainger and van Heuven's (2003) model of orthographic processing, in order to understand why a biological system learning to read a language with an alphabetical orthography would code for non-contiguous letter combinations. Expressed in terms of Anderson and Milson's (1989) methodology for deriving rational analyses of cognitive processes, we define one key goal of the word recognition system as the mapping of location-specific letter representations onto location-invariant whole-word orthographic representations. This, of course, represents one (critical) sub-goal of the overall task of mapping visual information onto meaning. A key characteristic of the environment that constrains the optimization of this mapping is letter-in-string visibility (see section below). Also, inputs available to the reading system vary in informativeness or diagnosticity. The probability of encountering a unit (letter or bigram) is a function of its frequency. Rare (low probability) events are more informative than frequent (i.e., predictable) events (cf. information theory: Shannon, 1948). We therefore propose three reasons why a biological system learning to read a language with an alphabetical orthography would deliberately code for non-contiguous letter combinations. The first has to do with letter visibility: the most visible letters (the letter at fixation, and the first and last letters of a word) are non-contiguous (Stevens and Grainger, 2003; see section below for details). These letters are more easily and reliably identified. The second concerns the positions of letters in a word that are the most informative with respect to lexical identity. We investigate whether a system that tries to minimize letter-level processing while maximizing information would be well advised to code for non-contiguous letter combinations.
The third possible reason is that coding for non-contiguous
1 It should be noted that the use of diagnostic information is thought to be only one part of the complete set of processes involved in skilled word reading (Grainger and Ziegler, 2011).
2 We considered other word lengths to be unwarranted because they would have required an extrapolation of the available visibility data.
Extending Stevens and Grainger's (2003) method applied to single letter frequencies, we approximated the conditional probability of a word given a bigram. We compare two visibility conditions: perfect visibility and realistic visibility.
Under perfect visibility, both letters are optimally visible and the equation is as follows:
Under realistic visibility, the visibility constraints are taken into account, as follows:
where vis_1 and vis_2 are the visibilities of the letters constituting the bigram, as they appear in the target word. For example, for the word SILENCE, bigram SL contains letter S in position 1, which has a visibility vis_1 of 0.786, and letter L in position 3, which has a visibility vis_2 of 0.708 (see Figure 2).
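The equations themselves are not reproduced in the text above, so the following Python sketch is only one plausible reading, not the authors' stated formula. It assumes that p(word | bigram) is approximated by one over the number of lexicon words containing the ordered letter pair (each word counted once, with no frequency weighting), and that realistic visibility scales this value by the product vis_1 × vis_2. The function names and the four-word toy lexicon are hypothetical; only the visibility values 0.786 and 0.708 come from the SILENCE example.

```python
def contains_ordered_pair(word, first, second):
    """True if `first` occurs somewhere before `second` in `word`."""
    i = word.find(first)
    return i != -1 and second in word[i + 1:]


def p_word_given_bigram(word, bigram, lexicon, vis1=1.0, vis2=1.0):
    """Assumed approximation: visibility-weighted 1 / (number of words with the bigram).

    vis1 = vis2 = 1.0 corresponds to the perfect visibility condition.
    """
    first, second = bigram
    if not contains_ordered_pair(word, first, second):
        return 0.0
    n_words = sum(contains_ordered_pair(w, first, second) for w in lexicon)
    return (vis1 * vis2) / n_words


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = ["silence", "salient", "science", "table"]

perfect = p_word_given_bigram("silence", "sl", toy_lexicon)
realistic = p_word_given_bigram("silence", "sl", toy_lexicon, vis1=0.786, vis2=0.708)
print(perfect, realistic)
```

Under these assumptions, a bigram shared by few words and made of highly visible letters provides the strongest evidence for the target word.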

Results
Analysis 1: Information Values of Letter Combinations - Minimal Substrings to Identify Words
For a reading system aiming at the compression of data while maximizing the information retained, is it beneficial to code for non-contiguous letter combinations? To answer this question, we identified the shortest ordered (but not necessarily contiguous) letter subset that uniquely identifies words. These ordered letter subsets can be seen as compressing input data by dropping letters that are not critical to the identification of the target word, defining in a sense the optimal abbreviation of a word. For example, the word "fatigue" can be uniquely identified by ordered letter substrings "ftge" and "atge." The analysis consists of counting how many adjacent letter pairs of the substring are formed of letters that are contiguous, and how many of letters that are non-contiguous, in the base word. For example, the sequence "atge" from the base word "fatigue" contains one contiguous letter pair ("at"), and two non-contiguous pairs ("tg" and "ge"). When multiple ordered letter subsets of the same length existed (for example, there were two sequences of four letters for the word "fatigue"), all these sequences were considered in the analysis. Note that, for Analysis 1 only, we further selected words that were not strict letter subsets of other words (e.g., "table" is a strict letter subset of "tableau") because, for such words, any substring (including the word itself) also identifies words composed of supersets of letters (see footnote 3). Of the total words reported in the Section "Corpuses and Word Lengths," 85.5% of the French words (N = 10,129) were selected for analysis under this criterion, as well as 74.9% of the English words (N = 13,696) and 87.9% of the Spanish words (N = 10,012).
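The search for minimal identifying substrings described above can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the authors' implementation: the function names and the three-word toy lexicon are hypothetical, and a real lexicon of over 10,000 words would call for a more efficient search.

```python
from itertools import combinations


def is_subsequence(sub, word):
    """True if the letters of `sub` appear in `word` in the same order."""
    remaining = iter(word)
    return all(ch in remaining for ch in sub)


def minimal_identifying_subsequences(word, lexicon):
    """Return all shortest ordered letter subsets of `word` found in no other word.

    Each result is (substring, positions in `word`). An empty list is returned
    for words that are strict letter subsets of another word in the lexicon.
    """
    others = [w for w in lexicon if w != word]
    for k in range(1, len(word) + 1):
        hits = []
        for idx in combinations(range(len(word)), k):
            sub = "".join(word[i] for i in idx)
            if not any(is_subsequence(sub, w) for w in others):
                hits.append((sub, idx))
        if hits:
            return hits
    return []


def count_pair_types(idx):
    """Count adjacent substring pairs that are (contiguous, non-contiguous) in the base word."""
    contiguous = sum(1 for a, b in zip(idx, idx[1:]) if b - a == 1)
    return contiguous, len(idx) - 1 - contiguous


# "atge" occupies positions 1, 2, 4, 6 of "fatigue" (0-indexed).
print(count_pair_types((1, 2, 4, 6)))

# Hypothetical toy lexicon, for illustration only.
print(minimal_identifying_subsequences("table", ["table", "cable", "tulip"]))
```

The first call reproduces the "atge" example from the text: one contiguous pair ("at") and two non-contiguous pairs ("tg" and "ge").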
(e.g., letter position number four in a string of seven letters), which is a useful first approximation of actual distributions of fixation positions (Rayner, 1979). Stevens and Grainger (2003) presented uppercase letter targets embedded in strings of uppercase Xs (e.g., XXTXXXX) forming strings of either five or seven letters. These stimuli were presented briefly and were preceded and followed by a pattern masking stimulus. Participants first focused on a central fixation point and then had to identify the letter in the string that was not an X. Figure 2 presents the visibility graph for seven-letter strings. Data also exists for five-letter strings, and, for our analysis, visibility values for six letters were interpolated from empirical values for five and seven letters. It should be noted that similar patterns of letter visibility have been found when strings are composed of a random series of consonants, and post-cueing is used to indicate the location for letter identification (e.g., Tydgat and Grainger, 2009).

Computation of Conditional Probabilities
Analyses 2 and 3 examine the information carried by different ordered pairs of letters (i.e., bigrams) that can be contiguous or not in the target string. For example, for the word TABLE, bigrams TA, AB, BL, and LE are contiguous, whereas TB, TL, TE, AL, AE, and BE are non-contiguous. Unless specified otherwise, the analyses consisted of comparing contiguous and non-contiguous bigrams.
Note (Figure 2): this recognition curve is based on data collected in French.
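The split of a word's ordered letter pairs into contiguous and non-contiguous bigrams, as in the TABLE example, can be illustrated with a short Python sketch. The function name `open_bigrams` is hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def open_bigrams(word):
    """Return (contiguous, non_contiguous) lists of ordered letter pairs of `word`."""
    contiguous, non_contiguous = [], []
    for i, j in combinations(range(len(word)), 2):
        pair = word[i] + word[j]
        # Letters are contiguous when their positions differ by exactly one.
        if j - i == 1:
            contiguous.append(pair)
        else:
            non_contiguous.append(pair)
    return contiguous, non_contiguous


contig, non_contig = open_bigrams("TABLE")
print(contig)      # ['TA', 'AB', 'BL', 'LE']
print(non_contig)  # ['TB', 'TL', 'TE', 'AL', 'AE', 'BE']
```

A word of n letters yields n - 1 contiguous bigrams and (n - 1)(n - 2)/2 non-contiguous ones, so non-contiguous pairs outnumber contiguous pairs for every word length considered here (5-7 letters).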

3 The subset-superset relationship described is only for selected words of 5-7 letters, and not for all words in the language.
for the word, and therefore the more informative that bigram is. We wanted to determine if the most informative bigram for a word tended to be contiguous or not.
We computed the conditional probability of words given bigrams, p(word | bigram), and we examined the distribution of the positions occupied by the letters that composed the most informative bigrams. For this analysis, we computed conditional probabilities based on all words, but we selected only the words without any repeated letters as test words. In this way we could unequivocally assign a letter in a bigram to a precise position in the word, and therefore know whether the bigram in question is contiguous or not. The number of words included in the test set was 5838 in French, 8412 in English, and 4750 in Spanish. The proportions of letters at each position in words of a given length (5-7 letters) that appear in the most informative bigram of words are provided for French, English, and Spanish (see Table 1 for the perfect visibility condition, and Table 2 for the realistic visibility condition). The results show that the first letter of words occurs more often in the most informative bigram than any other letter position, while the last letter occurs the least often. More important for the present purposes is that, after initial letters, it is letters located near the center or the left-of-center of words that occur most often in the most informative bigram, such that the non-contiguous bigrams 13 and 14 are most often the most informative bigram in words of lengths 5-7 letters. This result follows logically from the fact that letter-frequency typically shows a serial position function with the
We found that the mean percentage of non-contiguous letter pairs in minimal substrings was 48.4, 47.9, and 47.8% for the words selected from the French, English, and Spanish corpuses, respectively. While consonants constitute 55.2, 62.8, and 53.0% of letters in the French, English, and Spanish corpuses, they formed respectively 59.2, 65.4, and 57.4% of letters in the minimal substrings.
Given that consonants occur less frequently than vowels, they carry more information, and so their over-representation is indeed reflected in greater information carried by these elements. Analysis 3a will further explore informativeness of consonants and vowels. Thus, a system that (a) computes letter combinations and (b) tries to extract the minimal set of letters that uniquely identifies a given word, should pay attention to non-contiguous letters. In other words, the system would be performing sub-optimally if only contiguous letter combinations were taken into consideration.

Analysis 2: Conditional Probabilities
The second analysis involved comparing how informative non-contiguous bigrams are compared with contiguous bigrams. The higher the conditional probability, the more evidence a bigram is
As a step common to both analyses, we first computed the conditional probability of each word in some realistic corpus given its constituent bigrams (as in Analysis 2), and then ranked bigrams by decreasing conditional probability (see example below taken from an experiment in Spanish). Table 3 presents an example for the target word DOPAJE.
For the first analysis, we simply took the average of the conditional probabilities of the target word given the bigrams in the prime. In the example above, we obtained conditional probabilities of 0.0067 for PJ, 0.0053 for DJ, and 0.0035 for DP, and thus an average of (0.0067 + 0.0053 + 0.0035)/3 = 0.0052.
For the second analysis based on within-word ranking, we sought what ranks these prime bigrams had in the target word, and computed informativeness as the inverse of rank (1/rank). We computed an overall informativeness measure for the prime by averaging the three individual values of inverse ranks. The inverse of ranks measure is indicative of priming: the lower the rank, the more informative the prime. In the example above, PJ has a rank of 1, DJ has a rank of 2, and DP has a rank of 4, which yields an informativeness measure of 1/3 * (1/1 + 1/2 + 1/4) = 0.58.
lowest value for initial letters, followed by positions 3 and 4 (e.g., Grainger and Jacobs, 1993), such that it is a combination of the positions with the lowest letter frequencies that carry the most information. Overall, this analysis shows that the most informative bigrams in words are non-contiguous bigrams (on average 78% in French, 83% in English, and 78% in Spanish), and adding a visibility weighting does not change this pattern much.
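The two prime informativeness measures described for the prime DPJ and the target DOPAJE (the average conditional probability and the mean inverse within-word rank) can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the conditional probabilities and ranks are taken as given from the worked example in the text rather than recomputed from a corpus.

```python
def average_conditional_probability(probabilities):
    """Mean of p(target | bigram) over the bigrams of the prime."""
    return sum(probabilities) / len(probabilities)


def inverse_rank_informativeness(ranks):
    """Mean of 1/rank over the bigrams of the prime (rank 1 = most informative)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Values quoted in the text for prime DPJ and target DOPAJE.
probabilities = {"PJ": 0.0067, "DJ": 0.0053, "DP": 0.0035}
ranks = {"PJ": 1, "DJ": 2, "DP": 4}

avg = average_conditional_probability(list(probabilities.values()))
inv = inverse_rank_informativeness(list(ranks.values()))
print(round(avg, 4))  # 0.0052
print(round(inv, 2))  # 0.58
```

The rank-based measure depends only on the ordering of bigrams within the target word, so it is insensitive to the absolute size of the conditional probabilities.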

Analysis 3: Orthographic Priming
In the following analyses, we investigated how conditional probabilities of letter combinations could explain some key patterns of orthographic priming effects, and in particular (a) effects of consonant vs. vowel status of letters shared by primes and targets, and (b) effects of position of letters shared by prime and target. Note that these effects arise over and above the effects driven by amount of orthographic overlap between primes and targets. The hypothesis guiding our investigation is that the more informative the primes are with respect to target identity (as measured by conditional probabilities), the larger the priming effects will be. We performed two analyses based on bigram conditional probabilities: the first one using the average of conditional probabilities and the second one using the within-word ranking of these probabilities. The conditional probabilities are calculated for primes and targets tested in experiments with human participants, and the values compared with the priming effects found in the experiments. altogether summed 17.16% of letter occurrences in Spanish (mean frequency in percentage of occurrences: 1.91%). The letters that formed the high-frequency consonant subset were [t,n,c,r,s,l], summing 35.68% of letter occurrences (mean frequency in percentage of occurrences: 5.95%). The vowels subset accounted for 45.63% of the letter-frequency distribution (mean frequency in percentage of occurrences: 9.13%). These figures are reflected in the conditional probabilities computed on the basis of letters (see Figure 3): conditional probabilities of high-frequency consonants and of vowels are relatively well-matched when calculated at the level of individual letters. 
In their study, Duñabeitia and Carreiras (2011) showed that masked relative-position priming is found for primes consisting of consonants of high or low frequency, but that vowel relative-position masked primes did not represent any processing benefit as compared to the corresponding controls. According to what was also shown in other experiments reported in that study (see also Carreiras et al., 2009), vowels did not yield significant relative-position priming effects, while both high-and low-frequency consonants led to reliable priming effects that were comparable in magnitude. Hence, the authors concluded that frequency of the letters is not the underlying factor that determines whether relative-position priming is found. Rather, Duñabeitia and Carreiras (2011) hypothesized that the consonant-vowel processing difference relies on the main role of each of these letters. While the role of consonants is related to lexical access by constraining the lexicon, the main role of vowels is related to the identification of properties of the syntactic structure and the rhythmic class (see also Bonatti et al., 2005;Mehler et al., 2006;Pons and Toro, 2010). Across many languages, consonants are more numerous than vowels, and consequently consonant combinations are by default less frequent than vowel combinations, leading to a higher lexical constraint imposed by consonants as compared to vowels. In the present theoretical note, we focused on the key contrast between high-frequency consonant and vowel primes. Empirical results in human participants are presented in Figure 4. We tested the hypothesis that, despite the fact that the materials used by

Analysis 3a: letter type (consonants vs. vowels)
Consonants and vowels are known to play different roles in language processing, as shown for example in the processing of continuous speech (Bonatti et al., 2005). Differences between consonants and vowels might stem from the different lexical constraints they impose; for instance, vowels and consonants may activate different numbers of lexical candidates (Carreiras et al., 2009). Consonants tend to be more indicative of word identity than vowels are, as we have seen in Analysis 1. This advantage for consonants can be attributed, at least in part, to differences in frequencies: because vowels are more frequent, they are less informative. However, Duñabeitia and Carreiras (2011) have recently shown that even when vowels and consonants are matched for frequency and various measures of lexical constraint, consonant primes are still more effective than vowel primes. They concluded that consonants and vowels have distinct properties over and above frequency.
More specifically, Duñabeitia and Carreiras (2011) tested relative-position priming using the masked priming paradigm with a lexical decision task. Relative-position priming constitutes a specific form of subset priming in which a masked prime is made of some of the letters of the target word, preserving their relative ordering within the string (e.g., CSN from CASINO; see Grainger et al., 2006). In one of their experiments, targets were six-letter Spanish words, while the three-letter primes were either related to the target (e.g., prime = DPJ and target = DOPAJE) or unrelated (e.g., prime = MBZ and target = DOPAJE). The latter condition acted as a control to establish baseline response times for the targets (see Figure 4). Critically, the three letters of the related primes could be formed uniquely of vowels, high-frequency consonants, or low-frequency consonants (according to the number of appearances of each letter in the Spanish lexicon). The three critical letters that constituted the related primes were in all cases the first, third, and fifth letters of the targets. In further detail, the letters selected for the low-frequency consonant set were [b,d,j,m,v,z,g,f,p], which

two levels (realistic and perfect) and letter type as an independent factor with two levels (vowels and frequent consonants). Both measures of informativeness were log-transformed prior to the analyses. For the ranks-based measure of informativeness, we found a main effect of letter type [F(1,94) = 111, p < 0.001], a main effect of visibility [F(1,94) = 29, p < 0.001], and an interaction [F(1,94) = 16, p < 0.001]. A similar pattern was observed for the average conditional probability measure of informativeness: a main effect of letter type [F(1,94) = 16, p < 0.001], a main effect of visibility [F(1,94) = 5348, p < 0.001], and an interaction [F(1,94) = 27, p < 0.001].
The pattern of results was consistent for both realistic and perfect visibilities: informativeness was higher for consonant primes than vowel primes when calculated using both the rank-based measure and the average conditional probability measure.
The explanation of Duñabeitia and Carreiras's data, which was based on the lexical constraint imposed by relative-position consonantal masked primes as compared to the constraint imposed by vowel primes, can be effectively modeled in terms of the informativeness of the bigrams that formed the primes. Although frequencies of letters were effectively matched as measured by their frequency of occurrence in the Spanish corpus, high-frequency consonants still had an advantage over vowels when considered at the level of letter combinations. This effect may be partly due to the within-word ranks of the positions of the letters that form the bigrams, as evidenced by the control condition. Recall that the primes, composed of letters 1, 3, and 5, corresponded to two categories (letter 1 is the first letter of the word, whereas letters 3 and 5 are inner letters). Since most of the letters of a word are inner letters, it is possible that the first letter of the word drives the difference: at the first position, consonants are more often among the most informative letters of the word than vowels are. Thus, this analysis based on combinatorial probabilities of open-bigrams provides computational confirmation of the idea suggested by Bonatti et al. (2005) and Duñabeitia and Carreiras (2011), among others, who argued for a higher lexical information value of consonant combinations as opposed to vowel combinations.

Analysis 3b: effect of the central letter on priming
We next investigated if bigram conditional probabilities could explain some experimental results on relative-position priming. In an experiment involving masked priming of seven-letter French

Duñabeitia and Carreiras (2011) were well-matched in terms of individual letter frequencies (as well as for lexical frequency and number of orthographic neighbors of the target words), the two critical prime conditions (high-frequency consonants and vowels) were not equivalent when taking into account the information carried by letter combinations. Our hypothesis was that observed results could be explained, at least in part, by the fact that informativeness of high-frequency consonants was significantly greater than informativeness of vowels when calculated at the level of letter combinations.
Using the primes and targets of the Duñabeitia and Carreiras (2011) study, we computed average conditional probabilities of the target words given the open-bigrams (contiguous and non-contiguous) of their respective three-letter primes. These informativeness values are shown in Figure 5. As we can see, the pattern of informativeness measures is consistent with human response times: more information in high-frequency consonant primes results in greater priming effects.
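To make the computation described above concrete, the following is a minimal sketch of how the average conditional probability of a target given the open-bigrams of its prime could be obtained. It rests on several assumptions that are not taken from the study: the four-word lexicon and its uniform frequencies are hypothetical, P(word | bigram) is taken as the word's frequency divided by the summed frequency of all words containing that bigram, and letter visibility is ignored (the "perfect visibility" case). The function names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def open_bigrams(letters):
    """Ordered letter pairs of a string, contiguous and non-contiguous."""
    return {a + b for a, b in combinations(letters, 2)}


def bigram_totals(lexicon):
    """Summed frequency of all words containing each open-bigram."""
    totals = defaultdict(float)
    for word, freq in lexicon.items():
        for bigram in open_bigrams(word):
            totals[bigram] += freq
    return totals


def conditional_probability(word, bigram, lexicon, totals):
    """P(word | bigram) under the assumption stated above (no visibility weighting)."""
    if bigram not in open_bigrams(word):
        return 0.0
    return lexicon[word] / totals[bigram]


def average_informativeness(prime, target, lexicon):
    """Mean of P(target | bigram) over the open-bigrams of the prime."""
    totals = bigram_totals(lexicon)
    bigrams = open_bigrams(prime)
    return sum(conditional_probability(target, b, lexicon, totals)
               for b in bigrams) / len(bigrams)


# Hypothetical toy lexicon with uniform frequencies (not data from the study).
toy_lexicon = {"casino": 1.0, "cousin": 1.0, "carton": 1.0, "basin": 1.0}

# Prime "csn" yields the open-bigrams cs, cn, and sn.
print(average_informativeness("csn", "casino", toy_lexicon))
```

In this toy lexicon, the bigram cs occurs in two words and cn and sn in three words each, so the three conditional probabilities are 1/2, 1/3, and 1/3. With a realistic corpus, word frequencies and visibility estimates would replace the uniform values assumed here.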
We next tested the results of the two informativeness measures (inverse of rank measure and average conditional probability) using two-way mixed ANOVAs with visibility as a repeated factor with two levels (realistic and perfect) and the presence of the central letter as an independent factor with two levels (with and without). Both measures of informativeness were log-transformed prior to the analyses.
The analysis of the inverse ranking measure revealed a marginal main effect of the presence of the central letter [F(1,298) = 3.8, p = 0.052], a strong effect of visibility [F(1,298) = 21, p < 0.001], and an interaction [F(1,298) = 50, p < 0.001]. Rank-based informativeness was higher for primes that included the central letter compared to those that did not, and the interaction shows that the advantage of primes with the central letter was larger in the realistic visibility condition than in the perfect visibility condition. Guided by the interaction, we verified that the presence of the central letter indeed had a significant impact in the realistic visibility condition only [F(1,299) = 6.6, p = 0.011]. Thus, results with realistic visibility are consistent with experimental results on priming in humans: given an equal number of letters as primes, those that contain the central letter are more effective.
For the informativeness measure based on average conditional probabilities, the ANOVA revealed a strong effect of visibility [F(1,298) > 10,000, p < 0.001], an interaction [F(1,298) = 180, p < 0.001], but no effect of the presence of the central letter

words using five-letter primes, Granier and Grainger (unpublished data) investigated the effect of primes on response times in a lexical decision task. Primes were selected such that they either included the central letter of the target word or did not. Granier and Grainger found better priming for primes that contained the central letter compared to those that did not. Our hypothesis is that better priming can be explained by larger informativeness when primes include the central letter. More specifically, Granier and Grainger (unpublished data) tested sixty 7-letter target words in two sub-experiments each containing four prime conditions. For one sub-experiment the four priming conditions were: 13457, 12457, 13467, ddddd; and for the other sub-experiment: 13457, 12367, 12567, ddddd. Numbers indicate the positions (out of seven letters) in targets of the letters selected to form the primes, and "d" stands for a letter that is not in the target. For example, for the word SILENCE, the prime 12367 would be formed by the letters SILCE. Prime-target pairings were counterbalanced across four lists associated with four independent groups of participants in each sub-experiment. Fifty-two participants were tested in the first sub-experiment, and 56 in the other sub-experiment. A standard masked priming procedure was used with the lexical decision task and 50 ms prime durations (see Grainger et al., 2006, Experiment 1, for a description of the procedure that was used). The results revealed a dissociation between priming effects obtained with primes containing the central letter of the target (primes 13457, 12457, 13467) compared with primes that did not contain the central letter of the target (primes 12367, 12567).
Priming effects measured against the unrelated prime condition (ddddd) were significant for the former and non-significant for the latter.
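The position-code notation used for these prime conditions can be illustrated with a short sketch. The helper names below are hypothetical and are not part of the original materials; the sketch assumes single-digit position codes, which suffices for the seven-letter targets described here.

```python
def make_prime(target, positions):
    """Build a relative-position prime from 1-based position codes such as "12367"."""
    return "".join(target[int(p) - 1] for p in positions)


def contains_central_letter(target, positions):
    """True if the position code includes the central position (4 for a 7-letter word)."""
    central = len(target) // 2 + 1
    return str(central) in positions


related_conditions = ["13457", "12457", "13467", "12367", "12567"]

print(make_prime("SILENCE", "12367"))  # SILCE, as in the example given in the text

for code in related_conditions:
    print(code, make_prime("SILENCE", code), contains_central_letter("SILENCE", code))
```

Grouping the conditions by the value returned by `contains_central_letter` reproduces the split used in the analysis: 13457, 12457, and 13467 contain the central letter, whereas 12367 and 12567 do not.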
In order to test our account of this central letter effect, we computed average conditional probabilities of target words given the open-bigrams (contiguous and non-contiguous) of their respective five-letter primes. Because the present study focuses on the presence of the central letter, data were aggregated over conditions containing the central letter (that is, primes 13457, 12457, 13467) and conditions that did not (primes 12367, 12567). Empirical results in human participants are shown in Figure 6, and results of the informativeness analysis in Figure 7.
Next, for the bigram-based measure, we performed two-way mixed ANOVAs of the two measures of informativeness (inverse of within-word ranking, and global conditional probability) with

about half of the letters in the resulting subset are non-contiguous. Second, the most informative pair of letters in a word is a non-contiguous combination in 83% of 5-7 letter words (having no letter repetition) in English, and 78% in French and Spanish. In the second part of the paper, we saw that the superiority of the central letter in relative-position priming can be explained by rational analyses based on maximizing conditional probabilities of words given bigrams. Also, we reanalyzed proposed evidence that consonants and vowels play a different role during word recognition. We re-examined the finding that consonants form better primes in masked relative-position priming even when matched for letter frequencies with vowels. When considering consonants and vowels not individually, but as letter combinations (bigrams in the present analyses), we can account for the superior priming effect of consonants over vowels using a parsimonious explanation based on informativeness, without having to appeal to qualitatively distinct representations for these two types of letter. This is in line with the proposal of Mehler and colleagues (Nespor et al., 2003; Bonatti et al., 2005, 2007; Mehler et al., 2006; Toro et al., 2008; Pons and Toro, 2010) that consonants are lexically more constraining than vowels, and therefore more important for word identification (see also Carreiras et al., 2009).
These results suggest that an optimal or rational agent learning to read corpora of real words should deliberately code for non-contiguous letters based on informational content, and given visibility constraints. Thus, the optimality or rationality argument is sufficient to explain the coding of non-contiguous letters. However, this does not preclude some contribution of biological constraints in readers, such as noise and inaccuracies, in such encoding.
In the present work we compared two measures of bigram informativeness with respect to their ability to account for orthographic priming data. Both measures used the conditional probability of bigrams in a given prime stimulus, with one measure taking the average of the conditional probabilities for a given prime, and the other taking the average rank value of each bigram in the prime stimulus among the complete set of bigrams in the target stimulus. Only the rank values successfully predicted the complete set of empirical findings, and rank values were generally more sensitive than conditional probabilities. So, given a prime stimulus TBL for the target TABLE, it would appear that priming effects depend more on how constraining the letters shared by prime and target are with respect to the other bigrams in the target word (TA, TE, AB, AL, AE, BE, LE) than on how the prime's bigrams (TB, TL, BL) constrain target word identity (i.e., their average conditional probabilities). This important finding suggests that during the process of learning to read, the most informative bigrams are "selected" or at least given priority over less informative bigrams, and the manner in which this priority is accorded has a long-lasting influence on the speed with which this information is processed. Furthermore, this finding explains why, in prior unpublished research from our laboratory, manipulations of lexical constraint failed to modulate the size of orthographic priming effects, since the manipulation of lexical constraint was based on average conditional probabilities.
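The contrast between the two measures can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the analyses: the four-word lexicon and its uniform frequencies are hypothetical, visibility is ignored, P(target | bigram) is taken as the target's frequency divided by the summed frequency of words containing the bigram, tied bigrams share the best rank, and the rank-based measure is taken as the mean inverse rank of the prime's bigrams among all bigrams of the target.

```python
from collections import defaultdict
from itertools import combinations


def open_bigrams(letters):
    """Ordered letter pairs of a string, contiguous and non-contiguous."""
    return {a + b for a, b in combinations(letters, 2)}


def target_bigram_probs(target, lexicon):
    """P(target | bigram) for every open-bigram of the target (assumed definition)."""
    totals = defaultdict(float)
    for word, freq in lexicon.items():
        for bigram in open_bigrams(word):
            totals[bigram] += freq
    return {b: lexicon[target] / totals[b] for b in open_bigrams(target)}


def average_probability(prime, target, lexicon):
    """Mean conditional probability over the bigrams shared by prime and target."""
    probs = target_bigram_probs(target, lexicon)
    shared = open_bigrams(prime) & set(probs)
    if not shared:  # unrelated prime: no shared bigrams
        return 0.0
    return sum(probs[b] for b in shared) / len(shared)


def mean_inverse_rank(prime, target, lexicon):
    """Mean of 1/rank, ranking each shared bigram among all bigrams of the target."""
    probs = target_bigram_probs(target, lexicon)
    shared = open_bigrams(prime) & set(probs)
    if not shared:
        return 0.0

    def rank(bigram):
        # Rank 1 is the most informative bigram; ties share the best rank (assumption).
        return 1 + sum(p > probs[bigram] for p in probs.values())

    return sum(1.0 / rank(b) for b in shared) / len(shared)


# Hypothetical toy lexicon with uniform frequencies (not data from the study).
toy_lexicon = {"table": 1.0, "tale": 1.0, "cable": 1.0, "tube": 1.0}

for prime in ("tbl", "tae"):
    print(prime,
          average_probability(prime, "table", toy_lexicon),
          mean_inverse_rank(prime, "table", toy_lexicon))
```

In this toy case, the prime tbl shares only top-ranked bigrams (tb, tl, bl) with table, whereas tae shares one top-ranked bigram (ta) and two lower-ranked ones (te, ae). The rank-based measure therefore separates the two primes more sharply (1 versus 4/9) than the average conditional probability does (1/2 versus 7/18).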
The finding that the informativeness ranking of open-bigrams was a better predictor of word identity than the average informativeness of open-bigrams can be linked to the work on diagnosticity

[F(1,298) < 1]. In sum, average conditional probabilities of bigrams do not explain the effect of letter centrality, but the inverse ranking measure does. The within-word ranking measure of informativeness appears to be more sensitive than the conditional probability measure, an effect that was already observed in the size of the effects reported in Figure 5. This suggests that the relative informativeness of each bigram in the prime stimulus with respect to the other bigrams of the target word might be a better measure than the average measure of informativeness of letter combinations in the entire corpus (average conditional probability). In other words, the stimulus-specific relative information is perhaps more critical than the absolute measure of informativeness.

Discussion
In this theoretical note, we investigated the nature of the orthographic code that subserves the parallel mapping of location-specific letter identities onto location-invariant lexical representations. We hypothesized that some form of intermediate orthographic code is necessary, and that this code involves subsets of letter identities and information about their relative positions. We then examined the kind of intermediate orthographic code that would enable skilled readers to map letters onto words as efficiently as possible by optimizing utilization of the available information while minimizing resources. More precisely, we asked whether the constraints of optimality would lead to the coding of non-contiguous letter combinations as well as contiguous ones.
We examined two kinds of constraints that an optimal reader should take into consideration when processing orthographic information. The first involved variations in letter visibility across the different letters of a word during a single fixation. The second kind of constraint concerned the varying amount of information carried by the different letters in the word. We hypothesized that a system that aims to optimize orthographic processing, that is, to map letter identities onto whole-word orthographic representations as efficiently as possible, will adapt to such variations in visibility and informativeness. More specifically, we hypothesized that this optimization would involve coding of non-contiguous letter combinations, under two assumptions: (i) that letter combinations need to be computed, and (ii) that optimal letter combinations are not necessarily contiguous. Furthermore, we pointed out that in principle it should be easier to code for non-contiguous letter combinations, since order information is more reliable (at the cost of being less precise) in this case than when letters are contiguous (see footnote 5).
Our corpus analyses suggest that, indeed, optimal readers should code for non-contiguous letters. First, when selecting an ordered subset of letters which are critical to the identification of a word (that is, dropping non-essential letters that bear little information),

5 In the more general framework for visual word recognition developed by Grainger and Holcomb (2009) and Grainger and Ziegler (2011; see also Grainger and Dufau, 2011), the coding of non-contiguous letter combinations is used in one of the two routes of a dual-route approach to orthographic processing. As argued in the present work, this particular route is thought to optimize processing by selecting the most informative letter combinations in order to get from print to meaning as efficiently as possible. The other route is thought to optimize processing by performing data compression via the chunking of frequently co-occurring contiguous letters such as multi-letter graphemes and affixes. This route is therefore thought to be involved in getting from print to meaning via intermediate phonological and morphological representations.

transparent relation between spelling and sound show a less pronounced advantage for the final position in the string (e.g., Ktori and Pitchford, 2008, for Greek). Therefore future work will need to examine whether such variations in letter visibility across different languages, combined with a measure of informativeness such as applied in the present work, could be usefully applied to examine possible differences in orthographic priming effects as a function of the language under study.
Summing up, our analyses of the information carried by orthographic prime stimuli in different orthographic priming conditions (matched in terms of the number of letters shared by prime and target) revealed one measure that captured the empirical data patterns. This is the rank of the conditional probabilities of the bigrams shared by prime and target among the complete set of target bigrams. This key finding suggests that further exploration of the role of letter combinations, and in particular non-contiguous letter combinations, in optimizing the extraction of meaning from print during skilled reading, is in order.

Acknowledgments

and visual object recognition (Gosselin and Schyns, 2001). In this work, participants are typically asked to classify visual images as belonging to one of a possible set of object identities, and one examines how classification performance varies as a function of the nature of the information that is made available using the so-called "bubbles" technique. For example, Fiset et al. (2008), using this technique, showed that on average only 32% of the printed area of uppercase and 24% of lowercase letters was used by observers to identify letters. Furthermore, their analysis revealed that terminations were by far the most diagnostic piece of information for letter identification, with intersections and horizontal lines providing further significant sources of information for uppercase letters. For example, the letter W was mainly distinguished from other letters by the presence of two terminations, one in the upper left corner and the other in the upper right corner, an example of a highly informative non-contiguous feature combination.
Finally, taking into consideration estimates of letter visibility had a significant impact on our ability to capture the pattern of priming effects when letter position varied (outer and central letter manipulations), but not when letter type varied (consonant vs. vowel manipulation). This is a logical result in that only changes in letter position cause a change in letter visibility, and the position of letters was matched for the consonant and vowel primes (across different targets) in the work of Duñabeitia and Carreiras. However, we acknowledge here that our letter visibility values were obtained with French native speakers, and there is evidence that such letter visibility functions can differ as a function of language. More precisely, there is evidence that languages with a more