What you learn is more than what you see: what can sequencing effects tell us about inductive category learning?

Inductive category learning takes place across time. As such, it is not surprising that the sequence in which information is studied has an impact on what is learned and how efficient learning is. In this paper we review research on different learning sequences and how they impact learning. We analyze different aspects of interleaved (frequent alternation between categories during study) and blocked study (infrequent alternation between categories during study) that might explain how and when one sequence of study results in improved learning. While these different sequences of study differ in the amount of temporal spacing and temporal juxtaposition between items of different categories, these aspects do not seem to account for the majority of the results available in the literature. However, differences in the type of category being studied and the duration of the retention interval between study and test may play an important role. We conclude that there is no single aspect that is able to account for all the evidence available. Understanding learning as a process of sequential comparisons in time and how different sequences fundamentally alter the statistics of this experience offers a promising framework for understanding sequencing effects in category learning. We use this framework to present novel predictions and hypotheses for future research on sequencing effects in inductive category learning.

Inductive category learning takes place by studying several examples of novel concepts (for instance, several images of a species of snakes). There are several factors that influence the effectiveness of inductive category learning. Amply studied factors include the level of supervision (e.g., Kalish et al., 2011), the delay between seeing an example and receiving information about its category membership (e.g., Maddox et al., 2003; Smith et al., 2014), or the type of categories being learned (e.g., Ashby et al., 1998). In this review we will focus on another important component of inductive category learning: its temporal dynamics. Category learning takes place across time, by studying a series of examples rather than getting all category-relevant information at once, as would be the case with a definition of the category. As such, it is not surprising that the way in which information is organized, i.e., the sequence of events, can have a deep impact on what we learn (e.g., Goldstone, 1996; Schyns and Rodet, 1997), as well as how well we learn it (e.g., Kornell and Bjork, 2008; Wahlheim et al., 2011). There are many theoretically and practically important issues related to sequencing during category learning. Of these, the focus of this review is on the effects of interleaved vs. blocked sequencing. This focus is justified by the current spate of studies focused on this question, and its ease of implementation in classroom and online tutoring contexts.
Analyzing how the sequence of events during category learning shapes learning has obvious relevance for improving learning in different contexts, the classroom being probably one of the most societally impactful ones (Dunlosky et al., 2013). More broadly, we would like to argue that looking at the impact that sequencing factors have on learning has the potential to inform the theoretical views of how learning takes place over time, expanding our understanding of the learning process and its dynamic properties.

What Makes a Study Sequence Better for Learning?
The presentation of examples in inductive category learning can be organized in several different ways, such as by level of difficulty (Hull, 1920; Lee et al., 1988; Spiering and Ashby, 2008), variability (Elio and Anderson, 1984; Sandhofer and Doumas, 2008), or similarity relations (Elio and Anderson, 1981; Mathy and Feldman, 2009). Here, we focus on sequences differing not on the properties of the items being shown but on the degree of alternation between categories across successive presentations. For instance, imagine that learners were given the task of learning categories A, B and C, each composed of several items (A1, A2, B1, and so on). Items of each category can be presented in separate blocks (e.g., A1, A2, A3, A4, B1, B2, B3, B4, C1, C2, C3, C4), resulting in a study sequence with infrequent category alternation known as blocked study. On the other extreme of this continuum are sequences with frequent category alternation, known as interleaved study (e.g., A1, B1, C1, A2, B2, C2, A3, B3, C3, A4, B4, C4).
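The two example sequences above can be generated programmatically. The following is only a minimal sketch: the function names and the `alternation_rate` measure (the fraction of successive trials on which the category changes) are illustrative assumptions, not a metric taken from the studies reviewed here.

```python
def blocked_sequence(categories, items_per_category):
    """All items of one category are shown before moving to the next category."""
    return [f"{c}{i}" for c in categories for i in range(1, items_per_category + 1)]


def interleaved_sequence(categories, items_per_category):
    """Categories alternate on every trial: A1, B1, C1, A2, B2, C2, ..."""
    return [f"{c}{i}" for i in range(1, items_per_category + 1) for c in categories]


def alternation_rate(sequence):
    """Fraction of successive pairs whose category label (first character) differs.

    Hypothetical summary measure, used here only to place a sequence on the
    blocked-interleaved continuum.
    """
    switches = sum(a[0] != b[0] for a, b in zip(sequence, sequence[1:]))
    return switches / (len(sequence) - 1)


blocked = blocked_sequence("ABC", 4)
interleaved = interleaved_sequence("ABC", 4)
print(blocked)
print(interleaved)
print(alternation_rate(blocked), alternation_rate(interleaved))
```

Under this measure a purely blocked sequence switches category only at block boundaries, whereas a purely interleaved sequence switches on every trial; intermediate sequences fall between these two extremes.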
Most research comparing these two study sequences and asking the question of which results in better learning concludes that interleaved study is the most beneficial (e.g., Kornell and Bjork, 2008; Kang and Pashler, 2012; Wahlheim et al., 2012; Zulkiply et al., 2012). For example, Kornell and Bjork (2008) presented learners with several paintings from 12 different artists in either of these two sequences and showed that categorization accuracy for novel paintings in a subsequent transfer task was better following interleaved study (Experiments 1A and 1B). Moreover, following interleaved study participants were also better at determining whether a new painting was painted by a previously studied vs. a new artist (Experiment 2). These results have been shown for different kinds of stimuli and different types of tasks with diverse populations (Taylor and Rohrer, 2010; Wahlheim et al., 2011; Kang and Pashler, 2012; Zulkiply et al., 2012; Birnbaum et al., 2013; Li et al., 2013; Zulkiply and Burt, 2013; Carvalho and Goldstone, 2014b). There is, however, also evidence that blocked study can result in improved or equally effective learning (Kurtz and Hovland, 1956; Goldstone, 1996; Carvalho and Goldstone, 2011,b, 2015; Birnbaum et al., 2013; Carpenter and Mueller, 2013; Zulkiply and Burt, 2013; Rawson et al., 2014), which raises the question of what changes with different learning sequences and how these factors affect learning.
Interleaved and blocked sequences of study differ in several aspects that might contribute to differential learning. For instance, they differ in the amount of time between repetitions of the same category, amount of alternation between categories, and the amount of repetition of parts of successively presented stimuli. The importance of each of these factors and their combinations is not well understood.
One important feature of an interleaved sequence of study is the temporal delay between repetitions of the same category (akin to the "spacing effect" in memory, Ebbinghaus, 1913). Another important factor is the close juxtaposition of stimuli of different categories (akin to the intermixed/blocked effect in discrimination learning, Dwyer et al., 2004). A third factor is the overall "spread" of exemplar presentations throughout the entire learning task (and the associated difference in retention intervals), i.e., in the interleaved condition all categories are equally represented at all points during study, which is not the case in a purely blocked sequence. There is currently no general consensus over which of these properties is the basis for the interleaved study advantage when it is observed. This results in different theoretical proposals based on different cognitive mechanisms with different empirical predictions. Moreover, there are additional contextual factors that have been identified as having a possible modulating effect on the relative benefit of interleaved over blocked study. These factors include the structure of the categories at study (for instance the degree of similarity shared among different categories, Zulkiply and Burt, 2013;Carvalho and Goldstone, 2014b) and the type of task (Rawson et al., 2014;Carvalho and Goldstone, 2015).
An important theoretical and empirical question, thus, is: what makes a specific study sequence beneficial for learning? The objective of this paper is to answer this fundamental question by reviewing the literature examining the effect of interleaved vs. blocked study sequences on category learning. This review will focus on four key factors in the interleaved study advantage for category learning: temporal spacing, temporal juxtaposition, category structure, and retention interval. For theoretical purposes, and to allow a better contrast between the independent benefits of time and space for category learning and the interleaved advantage, in the development of this paper we will use the term interleaved study to refer to alternation of categories in opposition to blocked study which is the repetition of categories. Spaced study refers to sequences in which repetitions of specific items are spaced over time rather than massed, i.e., verbatim repetitions presented in immediate succession.

Temporal Spacing
The temporal delay between repeated presentations of items has a known effect on retention rates. Overall, if item X is presented several times during study, sequences that include a temporal delay between repetitions of X (spaced study) will result in better memory for that item compared to sequences in which the same number of repetitions of X are presented in immediate succession (massed study). This effect has been repeatedly demonstrated in memory tasks using paired associates and cued recall paradigms (Glenberg, 1976; Glenberg and Lehmann, 1980; Pashler et al., 2007; Cepeda et al., 2009; Delaney et al., 2010).
A related question is whether larger spacing is better than smaller spacing intervals. Larger spacing lags should improve retention because larger spacing increases the difficulty of repeated tests and this increased retrieval difficulty should increase long-term retention (Pyc and Rawson, 2009). Research looking at whether larger temporal lag between repetitions results in improved recall has demonstrated an overall memory benefit with longer spacing intervals (Bloom and Shuell, 1981;Carpenter and DeLosh, 2005;Logan and Balota, 2008;Rickard et al., 2008;Karpicke and Roediger, 2009;Karpicke and Bauernschmidt, 2011). Moreover, recent reviews of the literature indicate that the benefits of increasing the temporal spacing during study depend on the length of the retention interval (Donovan and Radosevich, 1999;Janiszewski et al., 2003;Cepeda et al., 2006). Cepeda et al. (2008) compared a set of temporal lags during study in the context of different retention intervals and noted that when retention interval increases the optimal temporal spacing during study increases as well.
There are several reasons why spacing might benefit long-term retention. One possibility is the increased encoding variability that comes with spacing. When repetitions are spaced in time they are more likely to occur in the context of different items, different emotional/attentional states and different learning states (Glenberg, 1976). This encoding variability may, in turn, result in a higher likelihood of a match between study and test conditions or a memory representation that does not include a particular, narrow context and is, thus, more accessible in the absence of that context (Smith et al., 1978).
Another possibility is that massed presentation increases item familiarity: the second encounter with a massed item often feels more familiar than a second encounter with a spaced item (Dellarosa and Bourne, 1985). Because of this sense of familiarity during massed study, attention to repeated presentations of the same item will be decreased, which will result in less efficient encoding and poorer memory.
Finally, spacing might benefit long-term retention because of increased recall difficulty. Every time a repetition of an item is presented the participant might recall the previous encounter. This recall process is easy in massed study conditions but more effortful with increased temporal lags. Because practice retrieving items from memory improves learning of those items, the more effortful this retrieval (i.e., the longer the delay), the greater is the predicted memory advantage (Bjork and Allen, 1970;Cuddy and Jacoby, 1982;Krug et al., 1990).
The benefits of interleaving different categories have been explained in terms of the benefit of spacing of repetitions (Kornell and Bjork, 2008; Wahlheim et al., 2011). When compared to blocked study, interleaving categories results in more temporal delay between repetitions of the same category. Thus, one possible reason why interleaving is often beneficial for category learning is that it increases temporal spacing. Increased temporal delays between repetitions of the same category allow for more forgetting of previous encounters with that category and a more effortful recall of the category properties, which would result in improved encoding of the exemplars and improved test performance, both for memory as well as categorization tasks. Moreover, interleaved study would also result in the same category being studied in the temporal context of a larger number of different categories and across different temporal stages (e.g., beginning, middle and end of the study sessions), resulting in greater encoding variability.
Before we can embrace the possibility that interleaved study benefits learning because of its temporal spacing nature, there is one aspect that needs to be taken into account: it is possible that repeating categories (as is the case in interleaved vs. blocked study sequences) and repeating items (as in spaced vs. massed study sequences) recruit different processes. When the same item is presented a second time there is no variation in its properties. However, when a novel item of a repeated category is presented, it is possible that everything other than the category assignment is different. In this sense, when a category is repeated, the learner's recollection of the previous encounter of the category will often not match the current presentation exactly, unlike what is the case with exact repetitions. This variability may be beneficial because it results in an iterative recall of several exemplars with each novel category encounter (Murray, 1983;Ross, 1984;Ross and Kennedy, 1990;Benjamin and Tullis, 2010;Wahlheim et al., 2014) or in the abstraction of commonalities between repetitions while forgetting the differences (Vlach, 2014), but there is also evidence that similar representations across repetitions result in better memory (Xue et al., 2010). In any case, this iterative recollection process is highly taxing on memory processes, and the resulting memory benefit might be reduced compared to exact item repetitions.
In blocked study sequences this iterative recall process would be less effortful because the item had just been presented when the category is repeated while in interleaved study sequences the greater time lag would result in more effortful recollection of previous items of the same category and better learning (Kornell and Bjork, 2008;Kornell et al., 2010). This conceptualization of the benefit of interleaved study being the result of spacing makes two empirical predictions, based on previous results from studies with item repetitions: (1) the more items are presented in each category, the more effortful recall is and thus learning would be better and (2) the more categories are presented the larger the lag between repetitions of the same category when interleaved, resulting also in more productively effortful recall. This would mean that interleaved study would be more beneficial in situations that include a greater number of categories and items in each category.
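Prediction (2) can be made concrete by measuring the lag between successive items of the same category. The sketch below rests on an assumption: lag is counted in trial positions, whereas the studies discussed concern elapsed time, so this is only a proxy. All function names are hypothetical.

```python
def make_sequence(n_categories, n_items, interleave):
    """Build a purely interleaved or purely blocked sequence of labelled items."""
    labels = [chr(ord("A") + k) for k in range(n_categories)]  # A, B, C, ...
    if interleave:
        return [f"{c}{i}" for i in range(1, n_items + 1) for c in labels]
    return [f"{c}{i}" for c in labels for i in range(1, n_items + 1)]


def mean_category_lag(sequence):
    """Mean number of trial positions between successive items of the same category."""
    last_seen = {}
    lags = []
    for position, item in enumerate(sequence):
        category = item[0]  # the category is the leading letter of the label
        if category in last_seen:
            lags.append(position - last_seen[category])
        last_seen[category] = position
    return sum(lags) / len(lags)


for n_categories in (3, 6, 12):
    inter = mean_category_lag(make_sequence(n_categories, 4, interleave=True))
    block = mean_category_lag(make_sequence(n_categories, 4, interleave=False))
    print(n_categories, inter, block)
```

In a purely interleaved sequence the lag grows with the number of categories, while in a purely blocked sequence it stays at one trial regardless of how many categories are studied. This covers only the lag component of the argument; it says nothing about how effortful recall actually is.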
Although these hypotheses have not yet been subjected to systematic empirical test, the current evidence from different studies seems to be contrary to the predictions of a theory envisaging larger temporal spacing as the key factor for improved learning with interleaved study. Some studies comparing blocked and interleaved study in the context of category learning have shown interleaved benefits with as few as four categories with only four items presented in each category (e.g., Rohrer and Taylor, 2007, exp. 2; Taylor and Rohrer, 2010) and as many as twelve categories with ten items each (Zulkiply and Burt, 2013). A large number of studies with similar results (better performance after interleaved study than blocked study) include six categories or more with six exemplars or more in each (e.g., Kornell and Bjork, 2008; Kornell et al., 2010; Higgins and Ross, 2011; Wahlheim et al., 2011; Birnbaum et al., 2013). However, there is also evidence for a benefit of blocked study in studies with twelve categories with ten items each (Zulkiply and Burt, 2013), and as few as two categories with two items each (Higgins and Ross, 2011). Table 1 in Supplementary Materials presents a summary of a survey of 51 studies comparing interleaved and blocked study and how many categories and items in each category were used (for the 36 studies using stimuli organized in categories). For each study a repetition ratio was calculated by dividing the number of items in each category by the total number of categories (if a category contains fewer items, the category will be repeated less frequently). The median repetition ratio for studies showing a benefit of interleaved study is 1 (range = [0.5, 8]) compared to a median repetition ratio of 1.3 (range = [0.5, 4]) for studies showing a benefit of blocked study.
Even though this analysis does not take into account other factors such as the number of repetitions of individual items within each category or the type of test task (memory, problem solving or categorization, for instance) and includes only a small number of studies, it suggests that temporal spacing, as measured by the number of items and categories studied, does not have a large impact over whether interleaving or blocking are more beneficial.
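The repetition ratio described above is a simple quotient, and its computation can be sketched as follows. The three designs listed are the category and item counts mentioned in the text and are used purely as an illustration; they are not the survey data from Table 1, and the resulting median is not the survey median. The function name is a hypothetical choice.

```python
from statistics import median


def repetition_ratio(items_per_category, n_categories):
    """Number of items in each category divided by the total number of categories."""
    return items_per_category / n_categories


# Illustrative designs only, given as (items per category, number of categories).
# These are NOT the 36 surveyed studies.
example_designs = [
    (4, 4),    # four categories with four items each
    (10, 12),  # twelve categories with ten items each
    (2, 2),    # two categories with two items each
]

ratios = [repetition_ratio(items, cats) for items, cats in example_designs]
print(ratios)
print(median(ratios))
```

A ratio above 1 indicates more items per category than categories, so each category recurs often relative to the number of competing categories; a ratio below 1 indicates the opposite.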
Another possible issue with this formulation of temporal spacing benefits in interleaved study is that, in studies of categorization, learners do not usually recall all of the properties of an individual exemplar, but rather have a biased representation toward some of its features. Research using eye-tracking technology shows that, with the progression of a category learning task, participants progressively attend less to irrelevant properties of the objects and more to the relevant ones (Rehder and Hoffman, 2005;Blair et al., 2009), which would result in a biased representation of the stimuli and, thus, a memory encoding that departs systematically from a faithful representation of the exemplars (Sloutsky and Fisher, 2004;Sloutsky, 2012, 2013). It is unlikely that when presented with a new exemplar of an old category, participants try to recollect all the features of all of the exemplars seen in that category.
There is, however, some evidence that blocked study might result in decreased encoding of immediate repetitions of the same category. Wahlheim et al. (2011) analyzed memory performance for studied items at test based on which position in the study sequence the item had been studied. The results show a decreasing function for blocked study with learners more accurately classifying earlier items into the correct category than later ones. For interleaved study, however, there was no difference in categorization performance across study positions. These results indicate that memory/attentional processes are in play in the relative advantage of interleaved study.
Additionally, Kang and Pashler (2012; see also Mitchell et al., 2008) directly tested the possibility that the benefit of interleaved study is related to greater temporal spacing between repetitions of the same category. In this study, the authors contrasted learners' test performance following a spaced study sequence (in which repetitions of each category were spaced in time but not interleaved, similar to blocked study with added temporal spacing between repetitions) with a blocked study condition and an interleaved condition. The results show that interleaved study results in the best test performance. Moreover, blocked and spaced study result in equivalent test performances. Birnbaum et al. (2013) expanded on these results by comparing test performance in a categorization task following study organized by immediate succession of the same category (blocked contiguous study) with a study condition in which there was an intervening, unrelated, task between each repetition of the same category (blocked spaced study). The results from these experiments showed a benefit of blocked spaced study over blocked contiguous study, while both study sequences were less efficient than an interleaved contiguous version of the sequence (similar to Kang and Pashler, 2012). Interestingly, adding an unrelated task between successive presentations of different categories (interleaved spaced study) resulted in worse performance than simple interleaved study and no different than performance following blocked spaced study (for similar results using a discrimination task see Mitchell et al., 2008). The evidence from these studies fits in well with the proposal that there might be some decrease in attentional resources with successive repetitions of the same category which temporal spacing helps break, possibly by introducing a new task that reduces feelings of familiarity.
However, temporal spacing, by itself, cannot be the underlying mechanism behind the benefit of interleaved study and is in fact detrimental for category learning with an interleaved study sequence.
In sum, the evidence for temporal spacing as the one factor influencing the advantage of interleaved study is tenuous, although some temporal spacing benefits cannot be ignored for a complete conceptualization of the sequencing effects during category learning. Furthermore, even though blocked study may decrease attentional processing in some situations, it still benefits learning in other situations.

Temporal Juxtaposition
In addition to the possible benefits of temporal spacing, there is another factor that has a substantial impact on learning: temporal juxtaposition. Interleaved and blocked study sequences differ in how closely in time items from the same vs. different categories are experienced. In interleaved study, items from different categories occur temporally closer together than items from the same category while in a blocked study sequence the reverse is true.
Another example of temporal juxtaposition having an influence is the finding that simultaneous presentation of two objects of different categories results in improved discrimination between them when compared to either interleaved or blocked study (MacCaslin, 1954; Williams and Ackerman, 1971; Oakes and Ribar, 2005; Mundy et al., 2007, 2008; Kang and Pashler, 2012; Carvalho and Goldstone, 2014b). For instance, Mundy et al. (2007, Experiment 3) presented learners with pairs of similar morphed faces in a pre-exposure procedure. Each pair could be presented successively in an interleaved fashion or simultaneously. Later, participants completed a same/different task that included the pairs previously presented. Regardless of whether test discrimination was done successively or simultaneously, discriminations were better following simultaneous presentation compared to successive presentation in two ways: (1) two stimuli from the same category are presented at the same time and (2) pairs of the same category are presented on consecutive trials, similar to the blocked sequence of study (for similar results with category learning see Kang and Pashler, 2012; Birnbaum et al., 2013).
In fact, Mundy et al. (2007) proposed that simultaneous presentation is more informative precisely because it reduces the memory constraints in comparing and contrasting to-be-discriminated stimuli. In a follow-up experiment the authors noted that increasing the number of trials in the successive sequence (interleaved study) reduces the numeric difference in discrimination performance following interleaved and simultaneous presentation of items from two different categories. This is consistent with the notion that simultaneous presentation increases the informational value of each trial, so that a larger number of trials in the successive sequence is equivalent to a smaller number of simultaneous trials.
One prominent proposal is that interleaved study differs from blocked study in the way common and differentiating parts of the categories are presented and the same can be said for simultaneous vs. successive presentations (Hall, 1991;McLaren and Mackintosh, 2002;Mundy et al., 2007). In an interleaved sequence, the time lag between repetitions of the common features between items of the same category is considerably larger than between repetitions of discriminating features (those which distinguish between the categories). This would lead to differences in habituation levels to each part of the stimuli and better discrimination. In a blocked sequence there is no difference in time lag between repetitions of these two parts. Simultaneous presentation results in an intermediate amount of time between successive fixations on the discriminating features. Perhaps these temporal dynamics result in even more habituation to common features, which increases the attentional weighting toward discriminating features and improves discrimination (for similar proposal in the context of category learning see Goldstone, 1996;Kang and Pashler, 2012).
However, before we can fully embrace this possibility, there are two factors that need to be taken into account: (1) there are situations in which successive presentations are more beneficial than simultaneous and (2) in other situations, blocked study results in better performance than interleaved study.
Regarding the possibility that simultaneous presentation is not always better, Lipsitt (1961) showed that when children are asked to discriminate between three similar lights by pressing a different key for each, simultaneous presentation of similar lights (red, pink, and blue) resulted in improved discrimination compared to successive presentation. However, this pattern was reversed for high discriminability stimuli: successive presentation of less similar lights (red, green, and blue) resulted in improved discrimination. Similar results have been shown with children using a delayed match-to-sample procedure (Samuels, 1969), as well as adults discriminating between groups of shapes (Loess and Duncan, 1952), and non-human animals (MacCaslin, 1954). A similar interaction in the opposite direction has also been found with children using a different measure of performance (Williams and Ackerman, 1971).
It is conceivable that in both cases, the advantage results from unequal habituation to discriminating and common features of the stimuli, or a process of contrast that increases the saliency of the discriminating features. In the case of high discriminability stimuli (low similarity categories), less habituation would take place for most of the properties of the stimuli with every new trial (because most of the stimulus would be different in every trial), whereas for low discriminability stimuli (high similarity categories), habituation would occur to a larger degree for the common features. In this sense, perhaps a greater temporal lag between repetitions would help in the case of high discriminability stimuli because it increases memory consolidation, which might be important given the greater complexity of encoding all the features of a stimulus, an idea we will come back to in the last sections of this paper.
However, none of these experiments included a blocked successive sequence of study. Can the habituation hypothesis be adapted to account for data showing a learning advantage with the use of a blocked study sequence? Whitman and Garner (1963) had adult participants learn two categories organized by the relational structure of geometrical objects in a figure.
The results showed that participants achieved criterion more quickly when stimuli from the same category were presented grouped. Similarly, Carpenter and Mueller (2013) demonstrated that following blocked study of several French words that instantiate the same pronunciation rule, participants were better at identifying that pronunciation rule in novel words in a subsequent multiple-choice test phase. It is hard to imagine how the habituation hypothesis could account for these results. If the same rule was presented over several trials in close succession, participants should have habituated to it and thus it should have been less available for use during test. In much the same way, if during blocked study in Whitman and Garner's (1963) experiments, participants had habituated to the common relation between successive stimuli of the same category, that relation would not have been available and it should have taken longer to achieve criterion than in an interleaved study sequence.
Another possibility, proposed by Hammer et al. (2008), is that the information available during simultaneous presentation of items from the same category differs quantitatively and qualitatively from the information available if the two stimuli presented simultaneously are from different categories. Hammer et al. (2008) have proposed that simultaneous presentation of items from different categories is most informative when the similarity between the items is high, while the reverse is true for simultaneous comparisons of items of the same category. Specifically, the authors propose that simultaneous study of same category pairs is to some extent always informative because it allows one to infer what properties are not relevant for categorization by determining features that vary across the pairs and further deduce what properties are likely to be relevant. Simultaneous presentation of objects from different categories, on the other hand, may allow inferring which features might be relevant for differentiating objects from different categories, but it is not effective for decisively learning the permitted variability within a category. Moreover, the advantage of simultaneous study of different categories decreases with increasing number of differences between categories, requiring the selection of ideal pairs to allow the isolation of relevant properties across trials (Hammer et al., 2008, 2009a). Evidence for this account comes from experiments with adults and children showing that there is a high level of individual differences in learning proficiency when adults are presented with only few objects from different categories during learning, even if the pairs are selected to be maximally informative (Hammer et al., 2008, 2009b). Moreover, young children (6-10 years) show similar proficiency to adults when learning a complex rule only if presented with pairs of two items from the same category (Hammer et al., 2009a).
Hammer and colleagues' proposal suggests that while blocked study (similar to simultaneous presentation of same-category items) will foster category learning across a wide range of tasks and developmental ages, interleaved study would require the selection of highly similar exemplars to be effective. In fact, presentation of arbitrary different-category examples that may differ in features relevant for differentiating between categories as well as features that are irrelevant is likely to hinder learning.
In sum, there is some evidence that temporal juxtaposition might play an important role in the benefits of specific study sequences. This evidence led to the proposal of the habituation and discriminative contrast hypotheses. However, these theories are unable to account for all the existing evidence. They cannot account for the learning benefit of blocked study, and accounting for why successive is sometimes better than simultaneous presentation requires resorting to a third variable.

Category Structure
When discussing the importance of temporal juxtaposition there was a variable that stood out as an important factor for whether a study sequence resulted in improved learning or not: category structure. In fact, category structure has been proposed as a key factor modulating sequencing effects in learning. For instance, based on an experiment showing an advantage of blocked study of two categories, Goldstone (1996) proposed that this advantage might be related to the relatively high discriminability between the stimuli of the same category (low within-category similarity). The author proposed that blocked study of the categories would allow participants to notice the subtle but critical within-category similarities that were necessary to learn these high discriminability categories. In agreement with this theory, Carvalho and Goldstone (2014b) demonstrated that by changing only the type of categories presented, participants could show improved learning following interleaved or blocked study. More precisely, interleaved study resulted in better performance for low discriminability categories (categories in which all the stimuli were highly similar, both within and between categories), whereas blocked study resulted in better performance for high discriminability categories (in which all the stimuli were dissimilar, both within and between categories). Similar results have been shown using different types of categories as well (Zulkiply and Burt, 2013).
Although comparing the category structure across different types of stimuli and tasks is unavoidably imprecise, a qualitative survey of the types of categories in the studies surveyed (see Table 1 in Supplementary Materials) is consistent with this proposal. All studies showing an interleaved advantage use categories that can be considered to be low discriminability categories, with a large number of shared properties and few discriminating features. Studies showing a blocked study advantage, on the other hand, often involve high discriminability categories in which the objects can be easily discriminated but learning the category rule requires identifying subtle common features.
In sum, category structure seems to have an important modulating effect on which study sequence results in improved learning, both for interleaved vs. blocked study and for successive vs. simultaneous study.

Retention Interval
Another factor that has been demonstrated to play a role in the advantage of interleaved over blocked sequences of study is the time lag between study and test (i.e., the retention interval). It is possible that the benefits of contextual interference promoted by interleaved study are more marked after some retention interval (Shea and Morgan, 1979), although in some cases the benefits also fade away with longer delays (Ste-Marie et al., 2004). Carvalho and Goldstone (2014a) recently tested this possibility in the context of different category structures. The results showed an interaction between interleaved and blocked study and the type of category studied (similar to previous results mentioned above). However, this interaction was not modulated by the temporal delay between the end of study and the test.
Conversely, there is some evidence of increased benefit of interleaved study with increased delays in naturalistic settings. For instance, Rohrer et al. (2014b) created an intervention for a middle-school math class that included blocked or interleaved study of four types of problems followed by a review session of all the problems studied and a test, either 1 or 30 days after the end of the review session. The results show an overall benefit for interleaved study as well as an increase in the numerical benefit of interleaved study with the increase in retention interval. It is possible, however, that the benefit of retention interval seen here is orthogonal to the relative benefits of different study sequences. It is possible, for example, that greater delays between the end of study and test promote better performance for all sequences of study, the difference being that one characteristic of interleaved study is that it includes a more even distribution of the problems across the entire learning sequence. Rohrer et al. (2014a) report partial evidence for this account in an experiment comparing interleaved and blocked study of mathematical problems in a naturalistic setting that did not include a review session before test. The results show that the benefit of interleaved over blocked study is smaller for materials studied in earlier blocks, that is, for materials for which the period between last study and test was the longest for blocked study, and increases monotonically with decreasing retention intervals between end of blocked study and test. These results indicate that while retention interval may play a role, it does not do so exclusively for interleaved sequences.
There is, however, a lack of research exploring the benefits of blocked study using long retention intervals. At first sight this might suggest that whatever blocked advantage is found, it is a short-term one. To optimize learning one needs to take into consideration not only short-term gains but also long-term gains, and if blocking is found to be beneficial only at short delays, it is perhaps not generally useful in many learning situations. However, regardless of retention interval, the number of studies showing a blocked advantage is low and, as we saw in the previous sections, related to the type of stimuli used.
Why would interleaved study potentiate long-term retention of information to a greater degree than blocked study? One possibility is that learners acquire the information equally well in both the blocked and interleaved sequences, but the contextual interference that accompanies interleaving, introduced by changing topics or tasks frequently, results in long-term retention of this learning. Nonetheless, it is usually the case that during category acquisition, learners' performance is better for blocked compared to interleaved study (but see Lee, 2012). This may indicate that participants do learn more in the blocked study sequence but that this learning is more transitory, because less cognitive effort is required. Vlach et al. (2012) presented what can be considered a direct test of this proposal. The authors taught 2-year-old children eight different categories organized around shape, each containing four similar exemplars varying in other properties (color, texture, and size). Different groups of children learned the categories either by studying all the exemplars simultaneously, individually blocked by category, or spaced (similar to the blocked condition but a play time was introduced after each naming trial). No interleaved condition was present in these experiments. Children were tested (1) immediately after learning each category (i.e., after learning the first category, a test session for that category would take place, before teaching the next category), and (2) 15 min later. Simultaneous presentation resulted in the best generalization performance for immediate tests. Interestingly, 15 min later, only children in the spaced condition were able to generalize the categories learned above chance level. In fact, performance in the spaced condition group did not seem to change across the two test times, while it decreased considerably for both blocked and simultaneous presentations.
This experiment seems to indicate that inserting some contextual interference (in this case by playing between learning trials) resulted in improved long-term benefits. However, there are some procedural details that are worth taking into account in explaining these results. Immediate testing required children to remember only one category (the one they had just studied), while delayed test required children to discriminate between eight categories (unlike for immediate testing, in the delayed test condition all the tests were presented after study of all categories was completed). Accordingly, it is possible that spacing is advantageous for judgments requiring category discriminations, rather than delayed tests per se.
There is, however, another possibility that does not involve contextual interference effects. Perhaps performance during study is related not to how much is being learned but to ease of responding (it is easier to always give the same response than to change it on every trial). Furthermore, to the extent that learners are able to identify the relevant components of the task or the categories being learned, long-term retention should be improved, regardless of sequence of study. This would predict that (1) long-term retention is a function of being successful at a learning task and (2) if studies using highly discriminable categories, for instance, had employed longer retention intervals, a benefit for blocked study would have been seen as well. Nonetheless, as of now, this remains an open question.

Implications for Category Learning
From the previous section one thing is certain: There is not one single factor that seems to explain the advantage in inductive category learning for one sequence over another. In fact, there might not be one single factor influencing study sequencing differences but several, acting in conjunction to shape learning. Additionally, most of the frameworks proposed thus far have envisaged different processes acting when study is interleaved compared to when study is blocked. For instance, the "desirable difficulties" framework (Bjork, 1994) proposes that these desirable difficulties are more present when interleaving and the habituation/discriminative contrast hypotheses (Mundy et al., 2008;Dwyer et al., 2011;Kang and Pashler, 2012;Rohrer, 2012) envision between-category comparison as the principal way to learn categories. However, it is possible to conceptualize a single learning process that would result in study sequencing differences naturally. What this means is that the effect of different sequences on learning efficacy might not be due to the sequence per se but the effect that sequencing has on a general-purpose learning process.
Consistent with this view, we have proposed the attentional bias framework (Carvalho and Goldstone, 2014a,b, 2015). One of the fundamental assumptions of this proposal is that during any inductive category learning, learners focus their attention on and encode mostly differences between objects of different categories and similarities among objects of the same category, albeit not necessarily to the same degree. The two main ways for concept learning to proceed are by identifying within-category similarities or between-category differences (Goldstone, 1996). While some accounts of category learning assume that learning must proceed by learning to attend to the features that discriminate among the categories being acquired (i.e., the between-category differences), a claim of the attentional bias framework is that categorization can also proceed by developing a positive, stand-alone characterization of a category that highlights the within-category similarities among examples (see also Hammer et al., 2008).
Another assumption is that during category learning, participants weight information acquired in the previous trial more heavily than information acquired longer ago. This assumption is also based on empirical results in the category learning literature. For example, Jones and Sieck (2003) demonstrated the existence of what the authors termed a "recency effect" in category learning: participants are more likely to categorize a novel stimulus into the same category as the previous one if they are similar (see also Stewart et al., 2002). The basic idea put forward by this research is that recent categorization events play a stronger role in a novel categorization decision than do older events, and categorization decisions are not based on a veridical analysis of the distribution of exemplars across time.
Taking into consideration these assumptions, one can hypothesize category learning as a continuous stream of category decisions in which attention is successively directed toward relevant similarities/differences by successive comparisons between the current trial and the recollection from the previous one(s). On each learning trial, the learner evaluates similarities and differences between the current stimulus and the recollection they have of the previous one(s), as well as the correct category assignment of the previous exemplar and the current one. If the previous object is similar to the current one and they belong to the same category, attention will be directed toward similarities. However, if they belong to different categories, attention will be directed toward differences. In this way, across time attention will be more and more biased toward relevant within-category similarities and between-category differences. This will in turn affect category representation, which will affect category encoding and recollection. With each new trial, the relevant properties will be progressively better encoded while irrelevant ones will be poorly or not encoded at all (see Figure 1 for a schematic representation of this proposal across different category structures).
When categories are studied interleaved, transitions between objects of different categories are highly frequent, which will result in attending to differences between categories in most trials by the process described above (potentially also encoding within-category similarities that are repeated close in time). In the same way, when categories are studied blocked, the likelihood of a within-category transition is high, which will increase attention toward within-category similarities by the same process. This, in itself, cannot explain why interleaved study is more beneficial under some situations than others, of course. To do that, we need to also take into account the appropriateness of each of these two category-learning processes (identifying between-category differences and identifying within-category similarities) for the task at hand.
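As a minimal illustration of how the two sequences differ in their transition statistics, the Python sketch below builds a strictly blocked and a strictly alternating sequence for two categories and counts within- and between-category transitions. The function names and the choice of four exemplars per category are assumptions made for this illustration only and are not taken from any of the studies reviewed.

```python
def blocked_sequence(categories, n_per_category):
    """All exemplars of one category, followed by all exemplars of the next."""
    return [c for c in categories for _ in range(n_per_category)]


def interleaved_sequence(categories, n_per_category):
    """Strict alternation between categories on every trial (an idealized case)."""
    return [c for _ in range(n_per_category) for c in categories]


def transition_counts(sequence):
    """Return (within, between): counts of same- and different-category
    transitions between successive trials."""
    within = sum(a == b for a, b in zip(sequence, sequence[1:]))
    between = len(sequence) - 1 - within
    return within, between


blocked = blocked_sequence(["A", "B"], 4)          # A A A A B B B B
interleaved = interleaved_sequence(["A", "B"], 4)  # A B A B A B A B

print(transition_counts(blocked))      # (6, 1)
print(transition_counts(interleaved))  # (0, 7)
```

Interleaved schedules that are randomized rather than strictly alternating would contain some within-category transitions as well, so the contrast between the two sequences is one of degree.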
What this framework posits is that in situations that require learning differences between categories, interleaved study will accelerate learning by promoting encoding of exactly these properties of the objects. On the contrary, for situations that require learning similarities within categories, blocked study will accelerate learning by promoting encoding of these similarities. It is important here to note that what the sequence of study is doing, according to this framework, is changing the relative frequency/statistics of different temporally proximate similarities and differences. This, in turn, affects the normal learning process resulting in differential encoding of stimuli properties.
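The comparison process described above can be sketched as a toy simulation. This is not a formal model from the literature; it is a deliberately crude illustration under stated assumptions: objects are sets of feature labels (following the letter notation of Figure 1), only the immediately preceding trial is compared, every comparison adds a fixed, hypothetical increment `rate` to the attended features, and the idiosyncratic features `p`, `q`, `r`, and `s` are invented for the example.

```python
from collections import defaultdict


def attention_after_study(trials, rate=1.0):
    """Accumulate attention over successive pairwise comparisons.

    trials: list of (features, category) pairs; features is a set of labels.
    rate: hypothetical attention increment per comparison (an assumption).
    """
    attention = defaultdict(float)
    for (prev_feats, prev_cat), (cur_feats, cur_cat) in zip(trials, trials[1:]):
        if prev_cat == cur_cat:
            attended = prev_feats & cur_feats  # within-category similarities
        else:
            attended = prev_feats ^ cur_feats  # between-category differences
        for feature in attended:
            attention[feature] += rate
    return dict(attention)


def diagnostic_share(trials, diagnostic=frozenset({"A", "B"})):
    """Proportion of accumulated attention placed on category-defining features."""
    attention = attention_after_study(trials)
    total = sum(attention.values())
    return sum(w for f, w in attention.items() if f in diagnostic) / total


# Low discriminability: both categories share feature X.
a1, a2 = ({"A", "X"}, "A"), ({"A", "X"}, "A")
b1, b2 = ({"B", "X"}, "B"), ({"B", "X"}, "B")
low_blocked, low_interleaved = [a1, a2, b1, b2], [a1, b1, a2, b2]

# High discriminability: exemplars differ from one another and categories share no feature.
c1, c2 = ({"A", "p"}, "A"), ({"A", "q"}, "A")
d1, d2 = ({"B", "r"}, "B"), ({"B", "s"}, "B")
high_blocked, high_interleaved = [c1, c2, d1, d2], [c1, d1, c2, d2]

print(round(diagnostic_share(low_blocked), 2))       # 0.67
print(round(diagnostic_share(low_interleaved), 2))   # 1.0
print(round(diagnostic_share(high_blocked), 2))      # 0.67
print(round(diagnostic_share(high_interleaved), 2))  # 0.5
```

Under these assumptions, interleaving concentrates attention on the category-defining features when the categories share features, whereas blocking does so when exemplars are dissimilar, because between-category comparisons then also highlight idiosyncratic, irrelevant differences.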
What is the evidence for this framework? For one, the framework is consistent with the research showing that interleaved study improves learning of low discriminability categories (where finding differences is key) while blocked study improves learning of low similarity categories (where finding similarities is key) (Zulkiply and Burt, 2013; Carvalho and Goldstone, 2014b). It is also consistent with most of the research showing benefits of interleaved study for low discriminability categories (e.g., Kornell and Bjork, 2008; Taylor and Rohrer, 2010; Kang and Pashler, 2012), and blocked study benefits for high discriminability categories (e.g., Kurtz and Hovland, 1956; Goldstone, 1996; Carpenter and Mueller, 2013). This might also explain why simultaneous presentation benefits learning of low discriminability categories but not high discriminability categories (e.g., Loess and Duncan, 1952), if one construes simultaneity as a limiting case of extreme temporal proximity.
However, the implications of this framework go beyond the properties of the categories. Any situation that changes the relative importance of differences between categories vs. similarities within categories should show similar results. Evidence for this comes from research showing that interleaved study benefits active learning situations while blocked study benefits passive learning situations (Carvalho and Goldstone, 2015). A similar interaction between the type of study sequence and whether examples of different categories are presented with a definition or not has also been found (Rawson et al., 2014). In this study, when learners were given a definition of each concept and studied examples of the different concepts, blocked study resulted in better later classification of new examples. However, when no definition was given along with the examples, interleaved study resulted in better classification of new examples during test (Rawson et al., 2014). Importantly, changes in the relative importance of differences between categories vs. similarities within categories are not limited to different encoding situations but can be created by different testing situations as well. For example, Carvalho and Albuquerque (2012) demonstrated that interleaved study resulted in better performance in a test task that required discrimination between pairs of stimuli but not in a test task that did not require knowing these discriminative properties (see also de Zilva and Mitchell, 2012).
FIGURE 1 | Schematic representation of the "attentional bias framework" for sequential order effects in category learning. Objects are represented by pairs of letters, where each letter refers to a feature. There are two categories; one characterized by the presence of Feature A and the other Feature B, while the category exemplars have other features as well. The top panel represents categorization of two low discriminability categories (both categories share the X feature), while the bottom panel represents categorization of two high discriminability categories in that the two categories do not share any feature between them. The size of each letter reflects the attention paid to it.
What happens in situations where the statistics of the sequence do not target the properties relevant for the particular learning situation at hand? One possibility is that learners will try to memorize individual exemplars instead of noticing these properties (for a detailed description of this possibility see Carvalho and Goldstone, 2014a). Evidence of this comes from studies showing that changing the sequence to include factors known to improve memory results in learning benefits in situations that do not target the relevant properties in the learning situation at hand (e.g., blocked study of low discriminability categories) but deters learning in situations that target such properties (e.g., interleaved study of low discriminability categories) (Kang and Pashler, 2012; Birnbaum et al., 2013).

Predictions and Hypotheses for Future Research
The theory proposed here makes three main predictions: (1) whether a study sequence is beneficial for learning or not depends on the locus of categorization difficulty or the test situation, (2) biasing the object comparison underlying category learning toward the difficulties involved in learning the specific set of categories results in improved learning, and (3) long-term representations of the categories are biased toward the properties more attended to during study, whether those benefit learning or not.
Some of these predictions have already been tested empirically. For instance, as mentioned before, interleaved study benefits learning low discriminability categories while blocked study is beneficial when learning high discriminability categories. These findings are in agreement with predictions (1) and (2). However, there is currently no direct evidence for prediction (3). Future research is needed to test whether the long-term benefits of interleaved study compared to blocked study are related to poor encoding in the absence of interference or, as proposed here, to the match between what was learned during study (the encoded representation of the categories) and the properties of the test phase. Perhaps following blocked study learners have a better memory representation of the similarities within categories, which would allow them to use these features more efficiently if they later become relevant (which is not the case in the studies conducted so far).
This possibility is particularly relevant for considering the consequences of blocked vs. interleaved presentations for transfer to new categorizations. If the representations for two simultaneously acquired categories emphasize their discriminating features, as would be expected with interleaved study, then transfer to a situation with partially new categories might be poor. If a math student learns to discriminate between quadrilaterals that are rhombuses vs. those that are not, the feature "four equal length sides" becomes discriminatively critical. This could lead to poor transfer if the student must subsequently learn to distinguish parallelograms from quadrilaterals that are not parallelograms because this feature is no longer discriminative for the new categorization. By comparison, category learning that emphasizes all of the features shared by category members, as expected from blocked study, might be expected to lead to better transfer. The feature "two pairs of parallel lines" is common to all rhombuses, and would likely be extracted from blocked study, even though it does not serve to discriminate rhombuses from other parallelograms. The general prediction is that if one is uncertain about what future categorizations are needed, blocked study may be a safer, less presumptive approach. In any case, understanding what properties of the stimuli are more efficiently encoded with different sequences of study would allow for general predictions about learning beyond how it benefits specific situations tested.
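The rhombus example can be made concrete with a small Python sketch. It assumes, purely for illustration, that interleaved study leaves a representation containing only the features that discriminated the categories during learning, that blocked study leaves a representation containing all features shared by category members, and that features can be listed as simple labels; the variable and function names are hypothetical.

```python
# Features common to every rhombus (an illustrative, non-exhaustive list).
rhombus_shared = {
    "four sides",
    "four equal length sides",
    "two pairs of parallel lines",
}

# Feature separating rhombuses from other quadrilaterals (first task).
first_task_discriminating = {"four equal length sides"}

# Feature separating parallelograms from other quadrilaterals (new task).
new_task_discriminating = {"two pairs of parallel lines"}

# Assumed representations after each study sequence.
interleaved_representation = rhombus_shared & first_task_discriminating
blocked_representation = set(rhombus_shared)


def transferable(representation, needed):
    """Encoded features that remain useful for the new categorization."""
    return representation & needed


print(transferable(interleaved_representation, new_task_discriminating))
# set()
print(transferable(blocked_representation, new_task_discriminating))
# {'two pairs of parallel lines'}
```

Under these assumptions, only the blocked representation contains the feature needed for the new categorization, which is the sense in which blocked study is described above as the less presumptive approach.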
Moreover, the relative benefits of interleaved and blocked study as envisioned here are dependent on a myriad of other variables, which have not been tested yet. For instance, comparing successive objects is likely to be affected by the overall variability found in the category space, and the type of category space used, for instance whether rule-based or information-integration categories are being studied (Ashby et al., 1998). Another relevant factor is the number of verbatim repetitions vs. category repetitions during category learning. Perhaps the benefits of interleaved study can be offset by increasing the number of new exemplars vs. exact exemplar repetitions during study because novel objects require longer inspection and encoding effort than repetitions. Alternatively, it is possible that the more varied and different the stimuli in a category are, the larger the benefit of blocked study would be.
Additionally, identifying between-category differences as proposed here requires one to be able to notice (and ignore) within-category similarities. Perhaps the optimal way to learn a category is to start by blocking each category, identifying relevant similarities, and then transitioning to interleaved study, using the identified similarities to bootstrap the identification of differences (Gentner and Markman, 1994). This is going to be particularly beneficial with novel categories or with young children because the space of categorization is unknown and blocked study allows for the identification of relevant category groups, which will support identifying differences later on.
Finally, the research presented here raises a broader point about category learning as a process across time and the importance of considering sequential factors in theories of category learning as well as formal models of category learning. Most of the current models of categorization assume that learners have access to the entire category space when making novel category decisions (Nosofsky, 1986; Kruschke, 1992; Ashby et al., 1998; Love et al., 2004), and the process through which differential encoding of stimulus features takes place across learning is not fully understood. One of the important steps for a complete understanding of how category learning takes place across time is the development of complete formal models that include temporal dynamics such as the sequencing factors reviewed here, capable of generating new hypotheses and testable predictions (see for example Stewart et al., 2002). It will also be important to extend the efforts in employing this research to applied situations, for example in schools. Taking advantage of formal models and clear quantitative predictions would allow suggestions for improved learning beyond crude one-solution-fits-all recommendations. Understanding the learning situation as a whole allows one to suggest not a single learning strategy but rather provides the flexibility of knowing how to choose between different learning strategies.