Multiple Reversal Olfactory Learning in Honeybees

Theo Mota 1,2 and Martin Giurfa 1,2 *
1 Centre de Recherches sur la Cognition Animale, Université de Toulouse III - Paul Sabatier, Toulouse, France
2 Centre de Recherches sur la Cognition Animale, Centre National de la Recherche Scientifique, Université Toulouse III - Paul Sabatier, Toulouse, France

In multiple reversal learning, animals trained to discriminate a reinforced from a non-reinforced stimulus are subjected to several successive reversals of stimulus contingencies (e.g. A+ vs. B−, A− vs. B+, A+ vs. B−). This protocol is useful to determine whether or not animals “learn to learn” and solve successive discriminations faster (or with fewer errors) with increasing reversal experience. Here we used the olfactory conditioning of the proboscis extension reflex to study how honeybees Apis mellifera perform in a multiple reversal task. Our experiment comprised four consecutive differential conditioning phases involving the same odors (A+ vs. B− to A− vs. B+ to A+ vs. B− to A− vs. B+). We show that bees in which the weight of reinforced and non-reinforced stimuli was similar mastered the multiple olfactory reversals. Bees that failed the task exhibited asymmetric responses to reinforced and non-reinforced stimuli and were thus unable to rapidly reverse stimulus contingencies. Efficient reversers did not improve their successive discriminations but rather tended to generalize their choice to both odors at the end of conditioning. As a consequence, both discrimination and reversal efficiency decreased along experimental phases. This result invalidates a learning-to-learn effect and indicates that bees do not respond only to the current stimulus contingencies but rather combine these with an average of past experiences with the same stimuli.

Introduction
Adapting to a changing environment requires constant evaluation of action outcomes. Reversal learning (Pavlov, 1927) is an example of how animals can deal with changing environments. In this paradigm, a subject is first trained to discriminate a rewarded stimulus A+ (where A stands for the stimulus and + for the presence of reward) from a non-rewarded stimulus B− (where − stands for the absence of reward). Once the discrimination is mastered, the contingencies are reversed (A− vs. B+) so that the subject has to learn to reverse its response to A and B. Reversals tend to be difficult as there are negative transfer effects; e.g., the individual tends to persist in responding to the stimulus that was originally reinforced. Eventually, however, this tendency becomes weaker, and the response to the alternative stimulus becomes more frequent until it is consistently evoked.
The capacity of animals to solve reversal learning tasks has been extensively studied using different conditioning procedures (for review, see Davey, 1989). In multiple reversal learning, successive reversals are performed using the same stimuli (e.g. A+ vs. B−, A− vs. B+, A+ vs. B−). A question underlying this protocol is whether or not animals solve successive discriminations faster (or with fewer errors) with increasing reversal experience. Indeed, a possible outcome of this kind of experiment is that, after extended reversal training, some animals are able to make the next reversal in the sequence faster or in fewer trials. They behave as if they have mastered the abstract concept of alternation or of regular sequence. However, another outcome is also possible if the animal applies a purely associative strategy that averages reinforcements and absences of reinforcement obtained for A and B over trials. If successive conditioning phases are even in number, animals end up responding equally to A and B, thus exhibiting an apparent lack of discrimination. This performance results from the fact that A and B acquired the same associative strength across trials.
The honeybee, Apis mellifera, constitutes an excellent model for studying the strategies implemented by a relatively simple and yet cognitively sophisticated brain (Menzel and Giurfa, 2001; Giurfa, 2007) to solve multiple reversal learning. In a natural context, honeybees are constant pollinators that remain faithful to a single floral species as long as it provides a profitable nectar and/or pollen reward. The basis for such constancy is the fact that bees learn the floral features (colors, odors, etc.) associated with reward (Menzel, 1985; Giurfa, 2007). Changes in food source profitability occur rapidly so that bees have to quickly switch to another floral species to ensure efficient foraging. This scenario could therefore promote fast solving of multiple reversal learning and eventually mastering the concept of alternation. On the other hand, it may also be efficient to solve this ecological problem by averaging positive and negative experiences over time, thus deciding whether or not it is timely to switch to another species. In the laboratory, appetitive learning in honeybees is studied using a Pavlovian conditioning protocol, the olfactory conditioning of the proboscis extension reflex (PER) (Takeda, 1961; Bitterman et al., 1983). In this protocol, a hungry bee that is harnessed and whose antennae are touched with sucrose solution exhibits a PER to reach out and suck the sucrose. Odors delivered to the antennae do not release such a reflex in naive animals. If an odor is presented immediately before sucrose solution (forward pairing), an association is formed that enables the odor to release PER in a following test. Thus, the odor can be viewed as the conditioned stimulus (CS) and the sucrose solution as the unconditioned stimulus (US).
Differential conditioning with two odors, one rewarded and the other not, is also possible in this framework (Bitterman et al., 1983) and thus offers the opportunity to study reversal learning performance in honeybees (Ben-Shahar et al., 2000; Hosler et al., 2000; Ferguson et al., 2001; Komischke et al., 2002). Here we used the olfactory conditioning of PER to study how bees perform in a multiple reversal paradigm. Our experiment comprised four consecutive differential conditioning phases involving the same odors, i.e., a first phase of differential conditioning (A+ vs. B−) and three subsequent phases of reversal (A− vs. B+ → A+ vs. B− → A− vs. B+). We asked whether bees would improve their discrimination performance with successive reversals or whether they would generalize their choice to both odors at the end of conditioning as a consequence of equating their associative strengths.
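The averaging account mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the strength of each odor is the plain running proportion of rewarded presentations across all phases; this toy model and the function name `running_average_strengths` are hypothetical and are not the analysis used in this study.

```python
def running_average_strengths(phases, trials_per_phase=5):
    """Return, after each phase, the proportion of rewarded presentations
    experienced so far with odors A and B (toy averaging model, assumed)."""
    rewarded = {"A": 0, "B": 0}
    presented = {"A": 0, "B": 0}
    history = []
    for cs_plus in phases:  # cs_plus names the odor rewarded in this phase
        for odor in ("A", "B"):
            presented[odor] += trials_per_phase
            if odor == cs_plus:
                rewarded[odor] += trials_per_phase
        history.append({o: rewarded[o] / presented[o] for o in ("A", "B")})
    return history


# Four phases as in the experiment: A+ vs. B-, A- vs. B+, A+ vs. B-, A- vs. B+
history = running_average_strengths(["A", "B", "A", "B"])
for number, strengths in enumerate(history, start=1):
    print(number, strengths)
```

Under this assumption the two odors reach identical strengths after every even-numbered phase, which is the situation in which a purely averaging animal would respond equally to A and B.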

Subjects
Free-flying honeybee foragers, Apis mellifera, were caught at the entrance of an outdoor hive situated close to the laboratory building. Bees were placed in small glass vials and cooled on ice until they ceased their movements. The bees were then individually harnessed in small metal tubes so that they could only move their antennae and mouthparts, including the proboscis. Harnessed bees were kept in the dark at high humidity for 2 h. Fifteen minutes before starting the experiment, each subject was checked for intact PER by lightly touching one antenna with a toothpick soaked with 30% (weight/weight) sucrose solution without subsequent feeding. Extension of the proboscis beyond a virtual line between the open mandibles was counted as PER (unconditioned response). Animals that did not show the reflex (<5%) were discarded.

Unconditioned and conditioned stimuli
The US was 30% (w/w) sucrose solution delivered to the antennae and mouth parts for 3 s. As the bees' ingestion rate for sucrose solution is 1 μl/s (Núñez, 1966), in each reinforced trial, bees received approximately 2-3 μl of sucrose solution. The CSs were the odorants 2-hexanol and 2-octanone (Sigma-Aldrich, Lyon, France), which are well learned and discriminated by the bees in olfactory PER conditioning (Guerrieri et al., 2005). Four microliters of pure odorant were applied onto a fresh strip of filter paper. The paper strip was then placed into a 1-ml plastic syringe and mounted in an odor-supplying device. When the bee was placed in front of the device, it received a gentle, constant flow of clean air provided by a standard aquarium pump. Computer-driven solenoid valves (Lee Company) controlled airflow delivery. During periods of odorant delivery, the airflow was shunted through a syringe containing the odorant. Each CS presentation lasted 4 s. An exhaust system was mounted behind the bees to remove odor-laden air.

Conditioning procedure
Bees were trained along four consecutive differential conditioning phases. In the first phase, bees were presented with an A+ vs. B− discrimination. In the second phase, the contingencies were reversed so that they had to learn an A− vs. B+ discrimination. In the third phase, the contingencies were again reversed and bees had to discriminate A+ vs. B−. Finally, in the fourth phase a last reversal was proposed so that bees had to discriminate A− vs. B+.
Thus, bees experienced two contingency inversions between phases: A+ → A− and B− → B+ from the first to the second phase, A− → A+ and B+ → B− from the second to the third phase, and A+ → A− and B− → B+ from the third to the fourth phase.
Within each phase, the reinforced and the non-reinforced odorant were each presented five times (5 CS+ vs. 5 CS−) in a pseudorandomized sequence. At most two reinforced or two non-reinforced trials succeeded each other within a conditioning phase. This experimental sequence was also varied from one day to the next. In all cases the intertrial interval (interval between two consecutive CS presentations, within or between phases) was 10 min. Thus, each conditioning phase lasted 90 min and the complete experiment, comprising four conditioning phases also separated by 10 min, lasted 6 h 30 min. Two independent groups of bees were trained along these four phases in order to balance 2-hexanol and 2-octanone as odorants A and B (see Table 1).
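The sequence constraint and the timing arithmetic above can be sketched as follows. This is a minimal illustration, not the authors' software: rejection sampling is one assumed way of obtaining such a sequence, and the names `pseudorandom_sequence`, `longest_run` and `duration_min` are hypothetical.

```python
import random


def longest_run(seq):
    """Length of the longest run of identical consecutive trial types."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def pseudorandom_sequence(n_per_type=5, max_run=2, rng=None):
    """Shuffle 5 CS+ and 5 CS- trials until no trial type occurs more
    than max_run times in a row (rejection sampling, an assumption)."""
    rng = rng or random.Random()
    while True:
        seq = ["CS+"] * n_per_type + ["CS-"] * n_per_type
        rng.shuffle(seq)
        if longest_run(seq) <= max_run:
            return seq


def duration_min(n_trials, iti_min=10):
    """Time from the first to the last CS presentation, in minutes."""
    return (n_trials - 1) * iti_min


sequence = pseudorandom_sequence(rng=random.Random(1))
phase_minutes = duration_min(10)            # 9 intervals of 10 min
experiment_hours = duration_min(40) / 60    # 39 intervals of 10 min
print(sequence, phase_minutes, experiment_hours)
```

Ten trials separated by nine 10-min intervals give the 90 min per phase, and forty trials separated by thirty-nine intervals give the 6 h 30 min stated for the complete experiment.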

Conditioning trials
The onset and offset of each trial as well as of CS and US delivery were controlled and signaled by a computer that was programmed to emit tones of different frequencies for each event. Each trial lasted 60 s. At the beginning of each trial the subject was placed in front of the odor-supplying device for 30 s to allow familiarization with the training situation. Thereafter the CS was presented for 4 s. In reinforced trials, the US onset occurred 3 s after CS onset. Both antennae were lightly touched with a toothpick soaked with the sucrose solution and, after proboscis extension, the bee was allowed to feed for 3 s. Therefore, the interstimulus interval was 3 s and the overlap between CS and US was 1 s. After US delivery, the bee was left in the setup until 60 s had elapsed and was then returned to its resting position. Non-reinforced trials consisted of CS presentations without US and also lasted 60 s.
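The timeline of a reinforced trial follows from the durations given above and can be checked with a few lines of arithmetic. The sketch assumes that feeding starts exactly at US onset, which is a simplification; the variable names are illustrative.

```python
TRIAL_S = 60            # total trial duration
FAMILIARIZATION_S = 30  # time in the setup before CS onset
CS_S = 4                # odor presentation
ISI_S = 3               # CS onset to US onset
US_S = 3                # feeding time (assumed to start at US onset)

cs_on = FAMILIARIZATION_S
cs_off = cs_on + CS_S
us_on = cs_on + ISI_S
us_off = us_on + US_S

# Overlap is the time during which both odor and sucrose are present.
overlap_s = max(0, min(cs_off, us_off) - max(cs_on, us_on))
rest_after_us_s = TRIAL_S - us_off

print(cs_on, cs_off, us_on, us_off, overlap_s, rest_after_us_s)
```

With these values the CS runs from 30 to 34 s, the US from 33 to 36 s, and the two overlap for 1 s, in agreement with the text.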

Response measurement
We recorded whether or not a bee extended its proboscis within 3 s after onset of the odor (CS). Responses in this interval could not be elicited directly by the US so that we measured conditioned responses to the odorants. Multiple responses during a CS were counted as a single PER. After completing the experiments, all animals were again checked for PER. If an animal did not respond (<5%) it was discarded.
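The scoring rule can be sketched as a small function. The sketch assumes that extension onsets are available as times in seconds relative to CS onset and that the 3-s window excludes its upper bound (the moment of US onset); both points, and the names `score_per` and `percent_per`, are illustrative assumptions.

```python
def score_per(extension_times_s, window_s=3.0):
    """Return 1 if at least one proboscis extension starts within the
    window after CS onset, else 0. Several extensions count once."""
    return int(any(0 <= t < window_s for t in extension_times_s))


def percent_per(responses):
    """Percentage of bees (or trials) scored as showing a conditioned PER."""
    return 100.0 * sum(responses) / len(responses)


# Hypothetical observations for four bees in one trial
scores = [score_per(times) for times in ([0.8, 1.9], [3.4], [], [2.5])]
print(scores, percent_per(scores))
```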

Statistical analysis
We measured the percentage of conditioned responses (% PER) in reinforced and non-reinforced trials. Repeated-measures analysis of variance (ANOVA) was used for between-group and within-group comparisons. Although parametric ANOVA is usually not appropriate for dichotomous data such as those of the PER, Monte Carlo studies have shown that it is permissible to use ANOVA for a dichotomous dependent variable under certain conditions (Lunney, 1970), which are met by our data: equal cell frequencies and at least 40 degrees of freedom of the error term. To provide a quantitative account of reversals we computed for each bee an excitatory reversal score (∆e) as the difference in responses to the CS+ between the fifth and the first trial of a reversal phase (∆e = CS+(trial 5) − CS+(trial 1)), and an inhibitory reversal score (∆i) as the difference in responses to the CS− between the first and the fifth trial of a reversal phase (∆i = CS−(trial 1) − CS−(trial 5)). The Wilcoxon test was used to compare excitatory and inhibitory reversal scores. Repeated-measures ANOVA was used to compare ∆i and ∆e values between conditioning phases. A further index was computed for each bee to quantify the amount of discrimination reached at the end of each conditioning phase. This discrimination index (Di) was calculated as the response to the CS+ minus the response to the CS− in the last trial (Di = CS+(trial 5) − CS−(trial 5)). Repeated-measures ANOVA was used to compare Di values between conditioning phases. The alpha level was set to 0.05 (two-tailed) for all analyses.

Results
Two independent groups of bees were trained along four consecutive differential conditioning phases involving two odorants, 2-hexanol and 2-octanone, and three reversals. In order to balance odor contingencies, Group 1 (n = 57 bees) was trained to discriminate 2-hexanol as odor A from 2-octanone as odor B, while Group 2 (n = 54 bees) was trained to discriminate 2-octanone as odor A from 2-hexanol as odor B (see Table 1). We first compared the performance of both groups along conditioning phases. Within each phase, there were no significant differences between Groups 1 and 2 as shown by a 2 × 2 × 5 (group × stimulus A/B × trial) repeated-measures ANOVA (Table 2). Thus data from both groups could be pooled. Figure 1 shows the pooled performance of bees in our experiment (n = 111 bees). In the 1st phase (A+ vs. B−), bees successfully learned the discrimination. A 2 × 5 (stimulus A/B × trial) repeated-measures ANOVA yielded significant stimulus (F(1,109) = 157.87; P < 0.0001) and trial (F(4,436) = 82.13; P < 0.0001) effects as well as a significant interaction effect (F(4,436) = 80.04; P < 0.0001), showing that responses to the odors followed significantly different trends across trials depending on their association with sucrose reward. In the 2nd phase, bees successfully mastered the first reversal as shown by the significant stimulus (F(1,109) = 6.37, P < 0.05) and trial (F(4,436) = 10.16; P < 0.0001) effects. Inversion of conditioned responses occurred in the 4th trial, thus yielding a significant stimulus × trial interaction (F(4,436) = 76.21, P < 0.0001). In the 3rd phase, bees again successfully reversed their conditioned responses to odors as shown by the significant stimulus × trial interaction (F(4,436) = 46.44, P < 0.0001). In this case, conditioned responses were reversed in the 3rd trial. Stimulus and trial effects were, however, not significant (stimulus effect: F(1,109) = 0.98, NS; trial effect: F(4,436) = 1.97, NS), probably because both curves were symmetrical, thus leading to a canceling effect for trial and stimulus.
Finally, in the last phase, a situation similar to that of the 3rd phase was found. Bees successfully reversed their conditioned responses as shown by the highly significant stimulus × trial interaction (F(4,436) = 32.86, P < 0.0001). In this case inversion of conditioned responses was visible on the 4th trial. As in the 3rd phase, stimulus and trial effects were not significant (stimulus effect: F(1,109) = 0.44, NS; trial effect: F(4,436) = 0.61, NS). Thus, bees mastered the original discrimination and the three consecutive reversals. However, Figure 1 shows that effective discrimination decreased along successive conditioning phases. Indeed, the differentiation achieved at the end of each phase decreased along the four phases.

Figure 1 | Conditioned responses during multiple reversal learning in honeybees.
Proboscis extension responses (% PER) to odors A and B during four consecutive differential conditioning phases. Bees experienced two contingency inversions between phases: A+ → A− and B− → B+ from the first to the second phase, A− → A+ and B+ → B− from the second to the third phase, and A+ → A− and B− → B+ from the third to the fourth phase. n = 111 bees.

To provide a quantitative analysis of this effect, we computed for each phase a reversal score. Reversal discrimination learning is successful if there is an increase of conditioned responses to the CS+, based on its excitatory properties acquired through association with sucrose reward, and a decrease in responding to the CS−, based on its inhibitory properties related to the absence of reward. The excitatory component of reversal (∆e) can be quantified as the difference in responses to the CS+ between the fifth and the first trial of a reversal phase (∆e = CS+(trial 5) − CS+(trial 1)). The inhibitory component (∆i) can be quantified as the difference in responses to the CS− between the first and the fifth trial of a reversal phase (∆i = CS−(trial 1) − CS−(trial 5)). Both scores were computed for each bee and reversal phase (2nd, 3rd and 4th phases). Figure 2 shows the average ∆e and ∆i scores obtained (n = 111). In the 2nd phase, in which bees experienced the first reversal, the mean excitatory score ∆e was significantly higher than the mean inhibitory score ∆i (∆e = 0.60; ∆i = 0.34; Wilcoxon test: Z = 7.11, P < 0.0001). This result indicates that after achieving the first olfactory discrimination (1st phase), bees were better at increasing responses to the formerly non-rewarded odor than at extinguishing responses to the formerly rewarded odor. In the 3rd phase, excitatory and inhibitory scores were the same (∆e = 0.32; ∆i = 0.32; Wilcoxon test: Z = 0, NS), thus confirming the symmetric performance. Finally, in the 4th phase, excitatory and inhibitory scores were also equivalent (∆e = 0.23; ∆i = 0.21; Wilcoxon test: Z = 0.45, NS). Excitatory and inhibitory scores significantly decreased along consecutive reversal phases (∆e: F(1,220) = 20.41, P < 0.0001; ∆i: F(1,220) = 3.17, P < 0.05).
The excitatory score of the 2nd phase was significantly higher than those of the 3rd and 4th phases (Tukey test: P < 0.0001 in both cases), which did not differ from each other. Similarly, the inhibitory score of the 2nd phase was significantly higher than that of the 4th phase (P < 0.05) but not of the 3rd phase. Inhibitory scores of the 3rd and 4th phase did not differ significantly.

Figure 2 | [...] phases (2nd, 3rd, and 4th conditioning phases). ∆e was calculated as the difference in responses to the CS+ between the fifth and the first trial of a reversal phase (∆e = CS+(trial 5) − CS+(trial 1)); ∆i was the difference in responses to the CS− between the first and the fifth trial of a reversal phase (∆i = CS−(trial 1) − CS−(trial 5)). Statistical comparisons of excitatory scores between phases are indicated by letters (e.g. a, b). Comparisons of inhibitory scores between phases are indicated by letters with prime (e.g. a′, b′). Asterisks indicate significant difference between excitatory and inhibitory scores within a phase. n = 111 bees.

Thus, multiple olfactory reversals lead to a progressive decrease in the bees' ability to reverse the reinforcement contingencies. As a consequence, differentiation levels reached at the end of each conditioning phase also decreased. Figure 3 shows the values of a differentiation index (Di) computed for each bee based on its responses in the fifth trial of each conditioning phase. This index was calculated as the response to the CS+ minus the response to the CS− in the last trial (Di = CS+(trial 5) − CS−(trial 5)). A comparison between the Di values calculated for each phase showed a significant decrease of differentiation from the 1st to the 4th phase of the [...]) showed that the Di of the 1st phase was significantly higher than those of the other three phases (1st vs. 2nd phase: P < 0.001; 1st vs. 3rd phase: P < 0.01; 1st vs. 4th phase: P < 0.0001), while the Di values of the 2nd and the 3rd phase did not differ significantly. The difference between the Di values of the 3rd and 4th phase was marginally non-significant (P = 0.055). Thus, although bees managed to reverse the learned contingencies along three reversal phases, their success progressively decreased and odorant discrimination was achieved with increasing difficulty. Figure 1 shows the global responses of the entire population of bees tested. As such, it may mask differences in individual strategies applied to solve multiple reversal learning. In order to evaluate the success of an individual in multiple reversal learning, two elemental conditions have to be met: (a) the bee has to master the first olfactory discrimination (1st conditioning phase) because asking about reversal learning is meaningless if the very first learning task was not achieved; (b) the bee also has to succeed in the first reversal (2nd phase) because only then can further reversals be studied. Taking this into account, we classified bees in three categories: (1) bees that were not able to solve the very first discrimination (i.e., discrimination of the 1st phase; n = 35 bees); (2) bees that mastered the very first discrimination, but were unable to solve the subsequent reversal discrimination of the 2nd phase (n = 42 bees); (3) bees that solved the discriminations of the 1st and the 2nd phase (n = 34 bees).
The 1st category represents bees that did not meet condition (a) (see above); the 2nd category represents bees that met condition (a) but not condition (b); finally, the 3rd category represents bees that met conditions (a) and (b), which were, therefore, those for which the question of success in further reversal learning was pertinent. The criterion used to define success in solving each phase was the presence of a dual correct response in the last (fifth) trial, i.e., PER to the CS+ and absence of PER to the CS−. Figure 4 shows the performance of the three categories of bees. By definition, bees of the 1st category did not master the original discrimination (A+ vs. B−) of the 1st phase and this effect was not limited to the fifth trial (Figure 4A: F(4,136) = 1.14, NS). The 2nd category, which by definition mastered the discrimination of the 1st phase (Figure 4B: F(4,164) = 80.46, P < 0.0001), was however unable to master the first reversal task in the 2nd phase. Although these bees responded differently to the odors (F(4,164) = 24.36, P < 0.0001), they seemed unable to reverse their response to the formerly rewarded (now non-rewarded) odor A (odor A × trial ANOVA: F(4,164) = 1.69, NS). They nevertheless varied their responses to the formerly non-rewarded (now rewarded) odor B (odor B × trial ANOVA: F(4,164) = 20.92, P < 0.0001). Bees of the 3rd category (Figure 4C) were successful in solving the discriminations of the 1st (F(4,132) = 63.52, P < 0.0001) and the 2nd phases (F(4,132) = 60.86, P < 0.0001). It is, therefore, possible to analyze in this group whether or not solving a first reversal (2nd phase) improves reversal efficiency in the subsequent reversals (3rd and 4th phases).

Figure 4 | [...] Proboscis extension responses (% PER) to odors A and B during four consecutive differential conditioning phases. Categories were defined by determining individual success in solving the 1st and the 2nd conditioning phases. The criterion used to define success in solving each phase was the presence of a dual correct response in the last (fifth) trial, i.e., PER to the CS+ and absence of PER to the CS−. (A) First category (n = 35 bees) included individuals that were not able to solve the very first discrimination of the 1st phase (A+ vs. B−). (B) Second category (n = 42 bees) included individuals that mastered the very first discrimination, but were unable to solve the subsequent reversal discrimination of the 2nd phase (A− vs. B+). (C) Third category (n = 34 bees) included individuals that solved the discriminations of the 1st and the 2nd phase, for which, therefore, the question of success in further reversal learning (3rd and 4th phases) was pertinent.

To answer this question, for all three categories we computed excitatory (∆e) and inhibitory (∆i) scores for each reversal phase. Figure 5 shows the mean ∆e and ∆i scores calculated for each category. Even if bees of the 1st category were not able to solve the first discrimination task during the 1st phase, some individuals were able to discriminate odors during the 2nd phase (A− vs. B+), and solved reversal tasks during the 3rd and 4th phases (Figure 4A). Their mean excitatory score ∆e (Figure 5A) was significantly higher than their mean inhibitory score ∆i in the 2nd phase (∆e = 0.43; ∆i = 0.09; Wilcoxon test: Z = 2.64, P < 0.01). Although ∆e values were also higher than ∆i values in the 3rd and 4th phases, this difference was not significant (3rd phase: ∆e = 0.41, ∆i = 0.24, Z = 1.52, NS; 4th phase: ∆e = 0.28, ∆i = 0.17, Z = 1.12, NS). Excitatory ∆e and inhibitory ∆i scores (Figure 5A) did not vary significantly between phases as shown by a (score × phase) repeated-measures ANOVA (∆e: F(2,68) = 0.48, NS; ∆i: F(2,68) = 1.48, NS). These results underline what seems to be a characteristic feature of these bees: after the first conditioning phase, where no learning was visible, they were more responsive to rewarded than to non-rewarded stimuli (see 2nd, 3rd, and 4th phases in Figure 4A), thus generating asymmetric curves for both kinds of stimuli. This asymmetry, which is particularly visible in the 2nd phase (first reversal), could be seen, however, as a consequence of category sorting. Given that bees of the 1st category were, by definition, those not mastering the original discrimination (A+ vs. B−) of the 1st phase, one can argue that inhibitory learning in the 2nd phase is necessarily low because bees start from a low PER level due to the lack of excitatory learning in the 1st phase.
In the case of bees of the 2nd category (Figure 4B), mastering reversal tasks was impossible because these bees were unable to reverse their original (1st phase) responses to the rewarded odor A even if they reversed their original (1st phase) responses to the non-rewarded odor B (Figures 4B and 5B). Thus, in the 2nd phase, their inhibitory score was close to 0 (∆i = 0.05) but their excitatory score was, on the contrary, positive (∆e = 0.48), and the difference between scores was significant (Wilcoxon test: Z = 3.36, P < 0.001), thus showing that the absence of reversal was strongly associated with the lack of extinction of the formerly rewarded odor A and not with the capacity to reverse the learning about the formerly non-rewarded odor B (Figure 5B). The reversal being impossible in the 2nd phase, the 3rd phase prolonged this situation as the original, non-reversed learning (A+ vs. B−) was again reinforced. The excitatory score in the 3rd phase was, therefore, close to 0 (∆e = 0.04) as bees could not improve their already high responsiveness to the rewarded odor (Figure 5B). The inhibitory score in this 3rd phase was, nevertheless, larger (∆i = 0.26), and the difference between ∆e and ∆i was again significant (Wilcoxon test: Z = 2.21, P < 0.05), showing again that these bees could eventually reverse their conditioned responses to an odorant that was only partially learned (odor B in the 2nd phase). Finally, in the 4th phase, bees were again unable to reverse the A+ vs. B− discrimination reinforced in the 3rd phase. Their excitatory and inhibitory scores were equivalent (∆e = 0.21; ∆i = 0.29; Wilcoxon test: Z = 0.68, NS), thus showing a delayed and low tendency to start modulating their responses to A and B only in the last phase of the experiment (Figure 5B; see also Figure 4B). A (score × phase) repeated-measures ANOVA showed significant changes both in excitatory ∆e and inhibitory ∆i scores along phases (∆e: F(2,82) = 11.98, P < 0.0001; ∆i: F(2,82) = 4.72, P < 0.001). The excitatory score of the 2nd phase was significantly higher than those of the 3rd and 4th phases (Tukey test: 2nd × 3rd phase, P < 0.001; 2nd × 4th phase, P < 0.01), which did not differ from each other. At the same time, the inhibitory score of the 2nd phase was significantly lower than that of the 3rd and 4th phases (Tukey test: P < 0.01 in both cases). Inhibitory scores of the 3rd and 4th phase did not differ significantly. These results clearly reflect the strong influence of negative transfer effects in the 2nd phase.
Finally, bees of the 3rd category (Figure 4C), which successfully mastered the original learning (A+ vs. B−) and the first reversal (A− vs. B+), allowed us to analyze whether further reversals were improved by these achievements. Unlike in the other two categories, both excitatory and inhibitory scores (2nd phase: ∆e = 0.94, ∆i = 1.00; 3rd phase: ∆e = 0.59, ∆i = 0.50; 4th phase: ∆e = 0.21, ∆i = 0.14) were equivalent within each reversal phase (Figure 5C; Wilcoxon test; 2nd phase: Z = 0.00, NS; 3rd phase: Z = 0.80, NS; 4th phase: Z = 0.63, NS). Thus, the capacity of bees to extinguish responses to the formerly rewarded odor was the same as their capacity to increase responses to the formerly non-rewarded odor in all reversal phases (Figure 5C). As in the global analysis, ∆e and ∆i values of 3rd-category bees significantly decreased along consecutive reversal phases (∆e: F(2,66) = 32.04, P < 0.0001; ∆i: F(2,66) = 50.41, P < 0.0001). All possible comparisons between ∆e or ∆i scores corresponding to two different phases yielded significant differences (Tukey test: ∆e, P < 0.001 in all cases; ∆i, P < 0.001 in all cases). Thus, the analysis of the 3rd category, which included bees that were actually effective in solving olfactory reversals, shows that a progressive decrease in the ability to reverse reinforcement contingencies occurred along successive reversal phases.
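The per-bee scores and the category criterion used throughout the Results can be sketched as follows. The sketch assumes that each bee's responses in a phase are stored as two lists of five binary values (CS+ trials and CS− trials); this data layout, the function names and the example bees are hypothetical and serve only to illustrate the definitions.

```python
def excitatory_score(cs_plus):
    """Delta-e: response to the CS+ in trial 5 minus trial 1."""
    return cs_plus[-1] - cs_plus[0]


def inhibitory_score(cs_minus):
    """Delta-i: response to the CS- in trial 1 minus trial 5."""
    return cs_minus[0] - cs_minus[-1]


def discrimination_index(cs_plus, cs_minus):
    """Di: response to the CS+ minus response to the CS- in trial 5."""
    return cs_plus[-1] - cs_minus[-1]


def solved(cs_plus, cs_minus):
    """Dual correct response in the fifth trial: PER to CS+, none to CS-."""
    return cs_plus[-1] == 1 and cs_minus[-1] == 0


def category(phase1, phase2):
    """1: failed the 1st phase; 2: solved the 1st but not the 2nd phase;
    3: solved both. Each phase is a (cs_plus, cs_minus) pair."""
    if not solved(*phase1):
        return 1
    if not solved(*phase2):
        return 2
    return 3


# Hypothetical bees. In the 2nd phase the CS+ is odor B and the CS- is odor A.
phase1 = ([0, 1, 1, 1, 1], [0, 0, 0, 0, 0])
reverser_phase2 = ([0, 0, 1, 1, 1], [1, 1, 1, 0, 0])
persister_phase2 = ([0, 0, 1, 1, 1], [1, 1, 1, 1, 1])

print(category(phase1, reverser_phase2), category(phase1, persister_phase2))
```

The second hypothetical bee shows the asymmetric profile described for the 2nd category: an excitatory score of 1 together with an inhibitory score of 0, so that the dual criterion is not met even though responding to the newly rewarded odor was acquired.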

Discussion
The present work shows that bees can master multiple olfactory reversals involving the same two odorants. In doing this, they do not improve their successive discrimination performances but rather tend to generalize their choice to both odors at the end of conditioning so that both discrimination levels and reversal efficiency (measured through excitatory and inhibitory scores) decreased along experimental phases. This result invalidates the hypothesis of a learning-to-learn effect, in which case a significant improvement of reversal efficiency should be evident in successive reversal phases.

Figure 5 | Average excitatory (∆e) and inhibitory (∆i) reversal learning scores (+ SE) computed for the three categories of bees, for the three reversal phases (2nd, 3rd, and 4th conditioning phases). (A) First category (n = 35 bees) included individuals that were not able to solve the very first discrimination of the 1st phase (A+ vs. B−). (B) Second category (n = 42 bees) included individuals that mastered the very first discrimination, but were unable to solve the subsequent reversal discrimination of the 2nd phase (A− vs. B+). (C) Third category (n = 34 bees) included individuals that solved the discriminations of the 1st and the 2nd phase, for which, therefore, the question of success in further reversal learning (3rd and 4th phases) was pertinent. Statistical comparisons of excitatory scores between phases are indicated by letters (e.g., a, b). Comparisons of inhibitory scores between phases are indicated by letters with prime (e.g. a′, b′). Asterisks indicate significant difference between excitatory and inhibitory scores within a phase.
Previous work on olfactory reversal learning in honeybees suggested that a learning-to-learn effect may account for the performance of honeybees trained to solve successive olfactory differential conditioning tasks involving different overlapping pairs of odorants (Komischke et al., 2002). Bees that had experienced three previous reversals were better than bees with no previous reversal experience in solving the final reversal task (Komischke et al., 2002). Although we did not find such an effect, the results of Komischke et al. (2002) cannot be directly compared with those of our study. Indeed, while we only used two odorants (A, B) whose valences were simultaneously reversed from phase to phase, Komischke et al. (2002) used four odorants (A, B, C, D), and of the two that had to be discriminated within a phase, the valence of only one was reversed at a time, thus reducing the ambiguity of the problem (e.g. A+ vs. B−, B+ vs. C−, C+ vs. D−, D+ vs. A−). As discussed by Komischke et al. (2002), configural learning may have accounted for the bees' performance in their experiment. When odor pairs are different (AB, BC, CD, DA), bees can learn each odor pair in terms of a unique configuration in which the specific odor combination determines the appropriate choice. For instance, bees may learn that in the context of B, A is the rewarded odor, in the context of C, B is rewarded, in the context of D, C is rewarded, etc. Although bees may use this form of non-elemental processing when solving olfactory discriminations (Deisig et al., 2001, 2002, 2003; Komischke et al., 2003), it cannot help in solving multiple reversals involving just two odorants, in which the outcome of a given configural unit AB changes from phase to phase.
Bees that could reverse their response to odors A and B along the consecutive phases of our experiment tended to generalize their response to both odors after extensive training. It seems, therefore, that they determined their response to a given odorant not only based on its actual contingency, but also taking into account previous experiences with it. Averaging positive and negative experiences over time would yield the progressive decrease in reversal and discrimination observed in our work, which becomes evident at the end of the 4th phase. This result shows that actual, novel experiences do not erase previous memories but are rather integrated into an updating process that allows reevaluation of the associative strength of a stimulus at any encounter. This result is consistent with analyses of memory dynamics in honeybees foraging on a patch of artificial feeders providing different rewards (Greggers and Menzel, 1993). It was shown that in these experimental circumstances, honeybee decisions are controlled by both short-term memories initiated by the reward just experienced and specific long-term memories of individual feeders within the patch. In our case, updating previous memories derived from a conditioning phase (e.g. A+, B−) with short-term memories from a subsequent reversal phase (e.g. A−, B+) may lead progressively to equivalent associative strengths for both odorants. Further reversals may enhance this effect, thus resulting in a random choice between both stimuli.
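The averaging account above can be illustrated with a minimal sketch. It assumes, as an extreme simplification that the study does not claim, that the associative strength of each odor is simply the running mean of all outcomes ever experienced with that odor. The function name, the 0/1 coding of outcomes, and the number of trials per phase are hypothetical choices made only for illustration.

```python
from fractions import Fraction


def running_average_strengths(n_phases=4, trials_per_phase=5):
    """Return [(V_A, V_B), ...] at the end of each conditioning phase.

    Hypothetical model: V of an odor is the mean of all outcomes
    (1 = sucrose, 0 = no sucrose) experienced with that odor so far.
    Phases alternate A+ vs. B- (1st, 3rd) and A- vs. B+ (2nd, 4th).
    """
    rewarded_a = 0  # rewarded presentations of odor A so far
    rewarded_b = 0  # rewarded presentations of odor B so far
    trials = 0      # presentations of each odor so far
    history = []
    for phase in range(n_phases):
        a_is_rewarded = (phase % 2 == 0)
        trials += trials_per_phase
        if a_is_rewarded:
            rewarded_a += trials_per_phase
        else:
            rewarded_b += trials_per_phase
        history.append((Fraction(rewarded_a, trials), Fraction(rewarded_b, trials)))
    return history


for phase, (v_a, v_b) in enumerate(running_average_strengths(), start=1):
    print(f"phase {phase}: V_A = {v_a}, V_B = {v_b}, |V_A - V_B| = {abs(v_a - v_b)}")
```

Under this assumption the strengths of A and B are 1 and 0 after the 1st phase, 1/2 and 1/2 after the 2nd, 2/3 and 1/3 after the 3rd, and 1/2 and 1/2 after the 4th, so the difference after p phases never exceeds 1/p. Pure averaging is clearly too strong, since a significant discrimination was still observed after three reversals; giving more weight to recent trials would be one way to soften it.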
Comparable results were obtained by Menzel (1969), who studied multiple reversal learning in free-flying honeybees trained with two colors, orange and blue. Using a differential conditioning protocol, Menzel (1969) trained honeybees to land five times on one of these colors to get sucrose reward and not on the alternative color, which was non-rewarded. Once the first discrimination was mastered, the color contingencies were reversed as in our experiment. After three reversals, both colors were equally chosen at the end of the training procedure. Discrimination recovered only after bees were kept locked up in the hive for a day.
Our results differ in part from those of Menzel (1969) because after three reversals, we still observed a significant discrimination between the two trained odorants even if differentiation decreased and bees tended to respond equally to both odors. Though this difference may be due to different learning dynamics and accuracy in the case of color and olfactory cues and/or to the fact that our bees were restrained in the laboratory while they freely flew in Menzel's (1969) experiments, we cannot exclude that adding further reversal phases results in full generalization and equivalent choice levels for both odorants in our experiments. A more important distinction between Menzel's (1969) work and our study is the demonstration provided in our case that not all the bees are equivalent in terms of the strategies they implement when confronted with a multiple reversal learning problem. An analysis of excitatory and inhibitory scores associated with the responses generated by the CS+ and the CS−, respectively, showed that bees differed in the weight assigned to these two components. Efficient reversers exhibited comparable excitatory and inhibitory scores within each conditioning phase (Figure 5C), thus showing that they can equally invert their responsiveness toward excitatory and inhibitory stimuli. On the contrary, less-efficient reversers were characterized by an asymmetric weight between excitatory and inhibitory components (Figures 5A,B), which accentuated responses to one (either the CS+ or the CS−) of the stimuli that had to be discriminated. As a consequence, multiple reversal was partial (Figure 4A) or did not take place (Figure 4B) in these bees. The fact that bees of the same hive differed dramatically in the way they evaluate the CS+ and the CS−, and thus in the way they change their response to them, may be related to their different sensitivities to appetitive and aversive stimuli (Page et al., 1998; Roussel et al., 2009; see Page et al., 2006 for review).
It has been suggested that appetitive and aversive behavioral syndromes coexist in a honeybee hive (Roussel et al., 2009). In other words, while some bees exhibit a biased responsiveness to appetitive stimuli (including sucrose and other sensory cues related to the foraging context), other bees exhibit biased responsiveness to aversive stimuli. These interindividual differences, which may determine different excitatory and inhibitory scores, may underlie the different performances observed in our multiple reversal experiment. This hypothesis can be easily tested by measuring in individual bees their responsiveness thresholds to appetitive sucrose solution of different concentrations (Page et al., 1998) and to aversive stimulation with electric shocks of different voltages (Roussel et al., 2009), measuring in each case the appropriate response, PER and sting extension reflex (SER), respectively. In this framework, we predict that bees having comparable sensitivity to appetitive and aversive stimuli will be efficient reversers.
Focusing on the olfactory circuit is necessary to understand the neural basis of multiple reversal learning as studied in our work. The olfactory pathway (CS pathway) has been well described in honeybees: axons of olfactory receptor neurons located on each antenna project to the antennal lobes where they synapse with approximately 4000 local interneurons and 800 projection neurons. Each antennal lobe is made of 166 glomeruli, which are the contact sites of these different neuron classes. Projection neurons convey the processed information via two principal tracts to higher brain structures, the mushroom bodies and the lateral horn. Mushroom bodies have been traditionally related to learning and memory phenomena (Menzel, 1999; Giurfa, 2007). Specifically, it has been suggested that mushroom bodies are required for solving problems of higher complexity but not necessarily for elemental problems (Giurfa, 2003; Komischke et al., 2005; Devaud et al., 2007; Giurfa, 2007). Devaud et al. (2007) focused on simple olfactory reversal learning in honeybees and showed that reversible blocking of mushroom body signaling via a local injection of procaine impaired olfactory reversal (e.g. bees having learned to discriminate A+ from B− were unable to reverse to A− vs. B+); however, further differential conditioning with two additional odors was left intact (e.g., bees having learned to discriminate A+ from B− could learn to discriminate C+ from D−). This led to the suggestion that mushroom body activity may be required to solve conflicts between contradictory CS-US associations (Devaud et al., 2007). If mushroom bodies are required for single reversal learning, it seems logical to suggest that their participation is of fundamental importance for the multiple reversal learning studied in our work, as it involves the sequential processing of consecutive contradictory information about associations between CS and US. Obviously, if mushroom body blocking through local injection of procaine impedes the reversal of a learned discrimination, we expect it to also affect further reversals.
Reversal learning could be the appropriate tool to elucidate the control of neural plasticity in the olfactory circuit. Recent experiments have shown that following olfactory learning and the formation of a long-term olfactory memory (3 days after conditioning), structural changes are visible at the level of the antennal lobe, where some glomeruli increase their volume in an odor-specific manner (i.e., depending on the odor conditioned; Hourcade et al., 2009). These changes may be due to an increase in synaptic branching for certain glomeruli, resulting from selective gene expression and protein synthesis following long-term memory formation. However, this mechanism has to be subjected to forms of cellular control, as bees learning several flower species throughout their life as foragers cannot undergo continuous increases in glomerular volumes within the limits of their head capsule. One possibility is that switching to another floral species implies a concomitant decrease in those glomeruli that increased previously as a consequence of a first associative experience, together with an increase in the glomeruli that are pertinent for the novel species exploited. This hypothesis could be tested using reversal learning protocols. In this case, specific glomerular increases are expected for the first conditioning phase in the case of odor A+ but not for B− (Hourcade et al., 2009); however, the critical question is what happens to these glomeruli when A+ is reversed to A− and B− is reversed to B+. This experiment could be done to understand the neural mechanisms underlying reversal plasticity in the olfactory domain.
In an ecological context, honeybee foragers should be prone to reverse efficiently information learned about food sources. Honeybees are flower constant and exploit, therefore, the same floral species as long as it provides profitable nectar and/or pollen reward (Grant, 1950; Chittka et al., 1999). In temperate biotopes, which are characteristic of European bees, different flower species follow predictable, well-defined flowering periods. In this context, worker bees must deal with fast changes in pollen or nectar resources and should be prepared to adapt their foraging behavior to changes in stimulus-reward contingencies. Indeed, when food-source profitability changes, the ability of workers to rapidly switch to another food source will maximize colony productivity. One can, therefore, argue that reversal learning is an important component of colony fitness. However, strictly speaking, the protocol of multiple reversal learning conducted in our work would be hardly conceivable in a natural context. Indeed, in temperate biotopes, where flowering species replace one another, the scenario of two flower species A and B that would alternately change their nectar/pollen reward multiple times is unrealistic. This may explain the progressive decrease in discriminative performance exhibited by the efficient reverser bees, which at the end tended to generalize between both odorants.
This argument does not exclude the possibility that in a natural scenario, bees do indeed "learn-to-learn," i.e., learn to perform better, when switching between species that follow each other in successive flowering periods. In this case, ambiguity would be reduced, thus favoring reversal strategies. In other words, rather than concluding that a learning-to-learn effect does not exist in honeybees, we should state that the particular learning conditions imposed by the natural environment or the experimenter may either mask or reveal the "learning-to-learn" effect. This conclusion is supported by experiments on reversal learning in bumblebees (Chittka, 1998). In these experiments, bumblebees were trained to collect sucrose solution in a small T-maze so that they had to choose the right arm when the entrance was marked blue and the left arm when it was yellow. After a second reversal, bees chose directions randomly for several hundred trials, thus showing interference between the information learned in the first training and in the reversal, consistent with some of our findings. However, a single bumblebee trained with multiple reversals showed a performance that could be interpreted as a "learning-to-learn" effect; this bee displayed a poor performance until, after more than seven reversals, it detected extremely quickly that a reversal had taken place, thus dramatically improving its choices (Chittka, 1998). Although this example is based on a single individual and must therefore be interpreted cautiously, it suggests that an extensive training schedule may reveal the "learning-to-learn" effect.
Note, however, that in our case, an extensive training schedule would not have the same effect given the important difference between the T-maze experiments with freely-moving bumblebees and our experiments with restrained honeybees. The latter, contrary to bumblebees, do not return to the hive to unload the sucrose reward that is provided to them during the training. As a consequence, feeding sucrose reward during hundreds of trials is not possible because the honeybee's crop has a limited capacity of 60 μl (Núñez, 1966, 1982), and when this capacity is reached and bees are satiated, they no longer exhibit the appetitive PER, thus preventing the experiment from continuing. We cannot exclude, nevertheless, that in a free-flying bee protocol of multireversal learning, similar to that used by Menzel (1969) but with an increasing number of reversals and trials, bees would be able to improve their reversal performance as the bumblebee did in the experiments of Chittka (1998).
The observation that an increase in the number of trials may lead to the emergence of the "learning-to-learn" effect is consistent with the so-called "overlearning reversal effect" (Menzel, 1969). This effect, which determines that in a dual-choice situation reversal to the other alternative is increasingly favored with increasing number of trials, is interpreted either as a loss of US strength or a loss of attention to the conditioned stimuli as a result of a general decrease in motivation (Rescorla and Wagner, 1972). In experiments with free-flying bees (Menzel, 1969) or walking bumblebees (Chittka, 1998), the general motivation of the bees does not change throughout the series of learning trials because, otherwise, they would not come back to the feeding site on their own. This might indicate a decrease in the associative strength in predicted US presentations as a mechanism to explain the switch to the alternative stimulus in a dual-choice situation.
The comparison between our experiments and those using freely moving animals, which in turn may also differ depending on variables such as number of reversals and/or number of trials per training phase, reveals that the strategy employed by the bees to respond to the problem that is posed to them depends greatly on the design of the experiment and the conditioned stimuli used. The limitation of PER conditioning for questions on multiple reversal learning derives from the harnessing situation and the fact that bees are not allowed to unload the reward successively delivered to them, thus affecting appetitive motivation if hundreds of trials are required to uncover a "learning-to-learn" effect. From that point of view, controlled experiments using visual stimuli and free-flying bees are appealing; the experimenter need only have the persistence to test bees over much longer periods than those already used (Menzel, 1969), which have already proved insufficient to uncover such an effect, if any.