A Neurogenetic Dissociation between Punishment-, Reward-, and Relief-Learning in Drosophila

What is particularly worth remembering about a traumatic experience is what brought it about, and what made it cease. For example, fruit flies avoid an odor which during training had preceded electric shock punishment; on the other hand, if the odor had followed shock during training, it is later on approached as a signal for the relieving end of shock. We provide a neurogenetic analysis of such relief learning. Blocking, using UAS-shibire ts1, the output from a particular set of dopaminergic neurons defined by the TH-Gal4 driver partially impaired punishment learning, but left relief learning intact. Thus, with respect to these particular neurons, relief learning differs from punishment learning. Targeting another set of dopaminergic/serotonergic neurons defined by the DDC-Gal4 driver, on the other hand, affected neither punishment nor relief learning. As for the octopaminergic system, the tbh M18 mutation, compromising octopamine biosynthesis, partially impaired sugar-reward learning, but not relief learning. Thus, with respect to this particular mutation, relief learning and reward learning are dissociated. Finally, blocking output from the set of octopaminergic/tyraminergic neurons defined by the TDC2-Gal4 driver affected neither reward nor relief learning. We conclude that, with respect to the genetic tools used, relief learning is neurogenetically dissociated from both punishment and reward learning. This may be a message relevant also for analyses of relief learning in other experimental systems, including humans.

To do so, the fruit fly offers a fortunate possibility for fine-grained behavioral analyses, combined with a small, experimentally accessible brain. Once trained with odor-electric shock pairings, fruit flies avoid this odor as a signal for punishment (Tully and Quinn, 1985); training with a reversed timing of events, that is first shock and then the odor, on the other hand, results in approach toward this odor as a predictor for relief (in adults: Tanimoto et al., 2004; Yarali et al., 2008, 2009; Murakami et al., 2010; in larvae: Khurana et al., 2009). Presenting an odor together with a sugar reward establishes conditioned approach, too (Tempel et al., 1983).
Punishment and reward learning are well-studied, including how the respective kinds of reinforcement are signaled. Shock activates a set of fruit fly dopaminergic neurons (Riemensperger et al., 2005), defined by the TH-Gal4 driver; blocking the output from these neurons impairs punishment learning, but not reward learning (in adults: Schwaerzel et al., 2003; Aso et al., 2010; in larvae: Honjo and Furukubo-Tokunaga, 2009; Selcho et al., 2009; regarding the former larval study, Gerber and Stocker (2007) filed caveats which may challenge the associative nature of the paradigm used). Also, loss of function of the dopamine receptor DAMB selectively impairs punishment rather than reward learning in fruit fly larvae. Accordingly, in the cricket and the honey bee as well, punishment rather than reward learning is impaired by dopamine receptor antagonists (Unoki et al., 2005, 2006; Vergoz et al., 2007). Finally, activating a set of dopaminergic neurons, defined by the TH-Gal4 driver in adult (Claridge-Chang et al., 2009; Aso et al., 2010) and reportedly also in larval (Schroll et al., 2006)

Introduction
Having no idea as to what will happen next is not only bewildering, but can also be dangerous. This is why animals learn about the predictors for upcoming events. For example, a stimulus that had preceded a traumatic event can be learned as a predictor for this event and is later on avoided. Such predictive learning qualitatively depends on the relative timing of events: a stimulus that occurred once a traumatic event had subsided later on supports opposite behavioral tendencies, such as approach, as it signals what may be called relief (Solomon and Corbit, 1974; Wagner, 1981) or safety (Sutton and Barto, 1990; Chang et al., 2003). Such opposing memories about the beginning and end of traumatic experiences are common to distant phyla (e.g., dog: Moskovitch and LoLordo, 1968; rabbit: Plotkin and Oakley, 1975; rat: Maier et al., 1976; snail: Britton and Farley, 1999; adult fruit fly: Tanimoto et al., 2004; Yarali et al., 2008, 2009; Murakami et al., 2010; larval fruit fly: Khurana et al., 2009), including man (Andreatta et al., 2010). This timing-dependency may reflect a universal adaptation to what one may call the "causal texture" of the world, such that whatever precedes X is likely to be the cause of X, and whatever follows X may be responsible for X's disappearance (Dickinson, 2001). Correspondingly, pleasant experiences, too, support opposing kinds of memory for stimuli that respectively precede and follow them (e.g., pigeon: Hearst, 1988; honeybee: Hellstern et al., 1998). Thus, to fully appreciate the behavioral consequences of affective experiences, it is necessary to study the mnemonic effects of their beginning and their end.
substitutes for punishment during training. Altogether, these results point to dopamine, as covered by the applied genetic tools, being necessary and sufficient to signal punishment.

As for reward signaling, this reinforcing role seems to be fulfilled by octopamine. In the honeybee, activity of a sugar responsive octopaminergic neuron "VUMmx1," innervating the olfactory pathway, is sufficient to substitute for the rewarding, but not the reflex-releasing, effects of sugar during training (Hammer, 1993), as does injecting octopamine at various sites along the olfactory pathway (Hammer and Menzel, 1998). In turn, interfering with the honey bee or cricket octopamine receptors impairs reward learning, but leaves punishment learning intact (Farooqui et al., 2003; Unoki et al., 2005, 2006; Vergoz et al., 2007). Accordingly, in the fruit fly, compromising octopamine biosynthesis via the tbh M18 mutation impairs reward learning, but not punishment learning (Schwaerzel et al., 2003; Sitaraman et al., 2010). Finally, in larval fruit flies, the output from a particular set of octopaminergic/tyraminergic neurons, defined by the TDC2-Gal4 driver, seems to be required selectively for reward learning (see Honjo and Furukubo-Tokunaga, 2009, but see above); in turn, activating these neurons reportedly substitutes for the reward during training (Schroll et al., 2006).

These findings together suggest a double dissociation between the roles of dopamine and octopamine in signaling punishment and reward, respectively. This double dissociation however may need qualification, as the function of the fruit fly dopamine receptor dDA1 turns out to be required for both kinds of learning (in adults: Kim et al., 2007; in larvae: Selcho et al., 2009). The picture becomes more complicated with the additional role of dopaminergic neurons in signaling the state of hunger, which is a determinant for the behavioral expression of the sugar-reward memory in adult fruit flies (Krashes et al., 2009; in other insects, too, octopamine and dopamine affect the behavioral expression of memory, Farooqui et al., 2003; Mizunami et al., 2009; also in crabs: Kaczer and Maldonado, 2009). Finally, in a fruit fly operant place learning paradigm, where high temperature acts as punishment and preferred temperature as potential reward, neither dopamine nor octopamine signaling seems to be critical (Sitaraman et al., 2008, 2010). Thus, the scope of what octopamine and dopamine do for punishment and reward learning, memory, and retrieval remains open, including (except for the seminal case of the VUMmx1 neuron in the bee, Hammer, 1993, and a recent study on dopaminergic signaling in the fly, Aso et al., 2010) the assignment of these putative roles to specific amine-releasing and receiving neurons and the receptors involved, as well as the utility of the genetic tools available. Here, we ask about the neurogenetic bases of relief learning, comparing its underpinnings to those of punishment and reward learning.

We used shibire ts1 for temperature-controlled, reversible blockage of synaptic output (Kitamoto, 2001). shibire ts1 expression was directed to different sets of neurons by crossing the males of the respective Gal4 strains (Table 1) to females of a UAS-shibire ts1 strain (Kitamoto, 2001; first and third chromosomes); thus the offspring were heterozygous for both the Gal4-driver and UAS-shibire ts1. We refer to these flies with the name of the Gal4-driver together with "shi ts1" (e.g., "TH/shi ts1"). To obtain proper genetic controls, we crossed each of the UAS-shibire ts1 or the Gal4-driver strains to white 1118 flies, thus obtaining flies heterozygous either for the Gal4-driver or for UAS-shibire ts1. We refer to these as, e.g., "TH/+" and "shi ts1/+," respectively.

To approximate the patterns of Gal4 expression, we used the respective drivers (Table 1) to express the UAS-controlled transgene mCD8GFP, which encodes a green fluorescent protein (GFP) that inserts into cellular membranes. To do this, we crossed males from each driver strain to females of a UAS-mCD8GFP strain (Lee and Luo, 1999; second chromosome) and stained the brains of the progeny against the Synapsin protein to visualize the neuropils and against GFP to approximate the pattern of Gal4 expression. Note however that the pattern of GFP-immunoreactivity does not necessarily reflect which neurons would be targeted had another effector, e.g., shibire ts1, been expressed using the same Gal4 driver (Ito et al., 2003): first, UAS-mCD8GFP and UAS-shibire ts1 may support different levels and patterns of background expression without any Gal4; this background expression then adds up with the driven expression when the Gal4 is present. Second, the level of mCD8GFP expression sufficient for immunohistochemical detection may well be different

As odorants, 90 μl benzaldehyde (BA), 340 μl 3-octanol (OCT), 340 μl 4-methylcyclohexanol (MCH), 340 μl n-amyl acetate (AM), and 340 μl isoamyl acetate (IAA) (all from Fluka, Steinheim, Germany) were applied in 1 cm-deep Teflon containers of 5, 14, 14, 14, and 14 mm diameters, respectively. For the experiments in Figures 6A,B

For punishment learning (Figure 1A), flies received six training trials. Each trial started by loading the flies into the experimental setup (0:00 min). From 4:00 min on, the control odor was presented for 15 s. Then, from 7:15 min on, the to-be-learned odor was presented also for 15 s.
From 7:30 min on, electric shock was applied as four pulses of 100 V; each pulse was 1.2 s long and was followed by the next with an onset-to-onset interval of 5 s. Thus the to-be-learned odor preceded shock with an onset-to-onset interval of 15 s. The control odor, on the other hand, preceded the shock by an onset-to-onset interval of 210 s, which does not result in a measurable association between the two (Tanimoto et al., 2004; Yarali et al., 2008, loc. cit. Figures 1D and 2F; Yarali et al., 2009, loc. cit. Figure 1B). For relief learning (Figure 1B), keeping all other parameters unchanged, we reversed the relative timing of events: that is, the to-be-learned odor was presented from 8:10 min on, thus following shock with an onset-to-onset interval of 40 s. At 12:00 min, flies were transferred out of the setup into food vials, where they stayed for 16 min until the next trial. At the end of the sixth training trial, after the usual 16 min break, flies were loaded back into the setup. After a 5 min accommodation period, they were transferred to the choice point of a T-maze, where they could escape toward either the control odor or the learned odor. After 2 min, the arms of the maze were closed and flies on each side were counted. A preference index (PREF) was calculated as:

PREF = (#Learned odor − #Control odor) × 100 / #Total (1)

# indicates the number of flies found in the respective maze-arm. Two groups of flies were trained and tested in parallel (Figure 1D). For one of these, e.g., 3-octanol (OCT) was the control odor and BA was to be learned; the second group was trained reciprocally. PREFs from the two reciprocal measurements were then averaged to obtain a final learning index (LI):

LI = (PREF_BA + PREF_OCT) / 2 (2)

Subscripts of PREF indicate the learned odor in the respective training. Positive LIs indicate conditioned approach to the learned odor; negative values reflect conditioned avoidance.

Reward learning (Figure 1C) used two training trials. Each trial started by loading the flies into the setup (0:00 min).
One minute later, flies were transferred to a tube lined with a filter paper which had been soaked the previous day with 2 ml of 2 M sucrose solution and then left to dry overnight. This tube was scented with the to-be-learned odor. After 45 s, the to-be-learned odor was removed, and after 15 additional seconds flies were taken out of the tube. At the end of a 1 min waiting period, they were transferred into another tube lined with a filter paper which had been soaked with pure water and then dried. This second tube was scented with the control

from the level of shibire ts1 expression sufficient to block neuronal output; thus potentially, not all neurons that are visualized by immunohistochemistry may be affected by shibire ts1 or vice versa.
To test for an effect of an octopamine biosynthesis deficiency, we used the mutant strain tbh M18 (Monastirioti et al., 1996; also see Schwaerzel et al., 2003; Saraswati et al., 2004; Scholz, 2005; Brembs et al., 2007; Certel et al., 2007; Hardie et al., 2007; Sitaraman et al., 2010). These flies have reduced or no octopamine (Monastirioti et al., 1996), due to the deficiency of the tyramine β-hydroxylase enzyme, which catalyzes the last step of octopamine biosynthesis (Figure 2). Since the original tbh M18 strain (Monastirioti et al., 1996) contains an additional mutation in the white gene, we instead used a recombinant strain with a wild-type white + allele, which was generated by Schwaerzel et al. (2003). As genetic control, we used a non-recombinant strain with wild-type tbh + and white + alleles, which was generated in parallel; we refer to this strain simply as "Control."

Immunohistochemistry
Brains were dissected in saline and fixed for 2 h in 4% formaldehyde with PBST as solvent (phosphate-buffered saline containing 0.3% Triton X-100). After a 1.5 h incubation in blocking solution (3% normal goat serum [Jackson Immuno Research Laboratories Inc., West Grove, PA, USA] in PBST), brains were incubated overnight with the monoclonal anti-Synapsin mouse antibody SYNORF1, diluted 1:20 in PBST (Klagges et al., 1996) and polyclonal anti-GFP rabbit antibody, diluted 1:2000 in PBST (Invitrogen Molecular Probes, Eugene, OR, USA). These primary antibodies were detected after an overnight incubation with Cy3 goat anti-mouse Ig, diluted 1:250 in PBST (Jackson Immuno Research Laboratories Inc., West Grove, PA, USA) and Alexa488 goat anti-rabbit Ig, diluted 1:1000 in PBST (Invitrogen Molecular Probes, Eugene, OR, USA). All incubation steps were followed by multiple PBST washes. Incubations with antibodies were done at 4°C; all other steps were performed at room temperature. Finally, brains were mounted in Vectashield mounting medium (Vector Laboratories Inc., Burlingame, CA, USA) and examined under a confocal microscope (Leica SP1, Leica, Wetzlar, Germany).

Behavioral assays
Flies were collected from fresh food vials and kept for 1-4 days at 18°C and 60-70% relative humidity before experiments. For reward learning as well as for the punishment learning experiments shown in Figures 6B,B′, flies were instead starved overnight for 18-20 h at 25°C and 60-70% relative humidity in vials equipped with a moist tissue paper and a moist filter paper. Those experiments that did not use shibire ts1 were performed at 22-25°C and 75-85% relative humidity. For inducing the effect of shibire ts1 , flies were first exposed to 34-36°C and 60-70% relative humidity for 30 min; then the experiment took place under these same conditions, which are referred to as "@ high temperature." The condition referred to as "@ low temperature" in turn involved exposing the flies to 20-23°C and 75-85% relative humidity for 30 min; then the experiment followed also under these conditions.
The experimental setup was in principle as described by Tully and Quinn (1985) and Schwaerzel et al. (2003). Flies were trained and tested in groups of 100-150. Training took place under dim red light, which does not allow flies to see; tests were performed in complete darkness.

Yarali and Gerber
Neurogenetics of relief learning

FIGURE 1 | Training. For punishment training (A), flies received two odors and pulses of electric shock. A control odor was presented long before shock; a to-be-learned odor preceded shock with an onset-to-onset interval of 15 s. For relief training (B), while all other parameters were unchanged, the to-be-learned odor followed shock with an onset-to-onset interval of 40 s. For reward training (C), flies were successively exposed to a to-be-learned odor in the presence of sugar and then to a control odor without any sugar. Although not shown here, in half of the cases, reward training started with the control odor instead of the to-be-learned odor and sugar. For each kind of training, we used a reciprocal design (D): two groups were trained in parallel; for one of these, e.g., 3-octanol (OCT) was the control odor and benzaldehyde (BA) was to be learned; the other group was trained reciprocally. Each group was then given the choice between the two odors. Based on the flies' distribution, preference indices (PREF) were calculated. Based on the two reciprocal PREF values, we calculated a learning index (LI). The situation is sketched for punishment learning, but also applies to relief and reward learning.
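As a quick consistency check on the training schedule, the onset-to-onset intervals can be recomputed from the trial clock times given in the Methods (control odor at 4:00 min, to-be-learned odor at 7:15 min for punishment training or 8:10 min for relief training, shock at 7:30 min). The following Python sketch is illustrative only; the helper name `to_seconds` and the dictionary layout are assumptions of this example, not part of the original protocol.

```python
def to_seconds(clock: str) -> int:
    """Convert a 'min:sec' trial clock time into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


# Shock onset within a training trial, as stated in the Methods.
SHOCK_ONSET = to_seconds("7:30")

# Odor onsets within a training trial, as stated in the Methods.
odor_onsets = {
    "control odor": "4:00",
    "to-be-learned odor (punishment training)": "7:15",
    "to-be-learned odor (relief training)": "8:10",
}

# Negative interval: odor onset precedes shock onset; positive: odor follows.
intervals = {
    name: to_seconds(clock) - SHOCK_ONSET for name, clock in odor_onsets.items()
}

for name, interval in intervals.items():
    print(f"{name}: {interval:+d} s relative to shock onset")
```

Under these assumptions the control odor precedes shock by 210 s, the to-be-learned odor precedes shock by 15 s in punishment training, and it follows shock by 40 s in relief training, in line with the intervals quoted in the text.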
temperature, as shibire ts1 was benign, TH/shi ts1 flies performed comparably to the genetic controls in punishment learning (Figure 4A @ low temperature: Kruskal-Wallis test: H = 2.06, d.f. = 2, P = 0.36). Importantly, blocking output from TH-Gal4 neurons, a treatment which did impair punishment learning, left relief learning intact: with training and test at high temperature, we found relief learning scores of TH/shi ts1 flies to be indistinguishable from the genetic controls (Figure 4B @ high temperature: Kruskal-Wallis test: H = 0.10, d.f. = 2, P = 0.96). Accordingly pooling the data, we found conditioned approach (Figure 4B @ high temperature: one-sample sign test for the pooled data set: P < 0.05). One might argue that the generally low relief learning scores may not allow detecting a possible partial impairment due to neurogenetic intervention. This however does not apply to Figure 4B, as relief learning in the TH/shi ts1 flies does not even tend to be inferior to the genetic controls (similarly, see Figures 5B, 6C, and 7B). We note that punishment and relief learning procedures differ only with respect to the timing of the to-be-learned odor during training; otherwise they entail the same handling and stimulus-exposure. Therefore, intact relief learning in the TH/shi ts1 flies (Figure 4B) excludes sensory and/or motor problems as potential cause for the impairment in punishment learning (Figure 4A, left).
Next, we used an independent driver, DDC-Gal4 (Li et al., 2000; Table 1; Figures 2 and 3B), to express UAS-shibire ts1 in a set of dopaminergic/serotonergic neurons. Blocking the output from these neurons left punishment learning unaffected: when trained and tested at high temperature, DDC/shi ts1 flies showed learning scores comparable to the genetic controls (Figure 5A @ high temperature: Kruskal-Wallis test: H = 2.14, d.f. = 2, P = 0.34). Thus pooling the scores across genotypes, we observed conditioned avoidance (Figure 5A @ high temperature: one-sample sign test for the pooled data set: P < 0.05). This lack of effect on punishment learning may be caused by (i) the DDC-Gal4 driver not covering all dopaminergic neurons; (ii) incomplete overlap with those dopaminergic neurons targeted by the TH-Gal4 (Sitaraman et al., 2008; Claridge-Chang et al., 2009; Mao and Davis, 2009; see the

odor. After 45 s, the control odor was removed and 15 s later, flies were taken out of this second tube. The next trial started immediately. This transfer between the two kinds of tube during training should prevent the learning of an association between the control odor and the sugar. For half of the cases, training trials started with the to-be-learned odor and sugar; in the other half, the control odor was given precedence. Once the training was completed, after a 3 min waiting period, flies were transferred to the choice point of a T-maze between the control odor and the learned odor. After 2 min, the arms of the maze were closed, flies on each side were counted and a preference index (PREF) was calculated according to Eq. 1. As detailed above (also see Figure 1D), two groups were trained reciprocally and the LI was calculated based on their PREF values according to Eq. 2.
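The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: it assumes that PREF is the difference between the number of flies in the learned-odor arm and the control-odor arm, expressed in percent of all counted flies, and that LI is the mean of the two reciprocal PREF values, as the text describes ("averaged"). The function names and the fly counts are hypothetical.

```python
def preference_index(n_learned: int, n_control: int) -> float:
    """PREF in percent; positive means more flies chose the learned odor.

    Assumption: the total is the sum of the flies counted in the two arms.
    """
    total = n_learned + n_control
    if total == 0:
        raise ValueError("no flies counted")
    return (n_learned - n_control) * 100 / total


def learning_index(pref_a: float, pref_b: float) -> float:
    """LI: mean of the PREF values from the two reciprocally trained groups."""
    return (pref_a + pref_b) / 2


# Hypothetical counts, for illustration only (not data from the study).
pref_ba = preference_index(n_learned=40, n_control=60)   # group with BA to be learned
pref_oct = preference_index(n_learned=45, n_control=55)  # group with OCT to be learned
li = learning_index(pref_ba, pref_oct)

print(pref_ba, pref_oct, li)
```

With these made-up counts both groups avoid their learned odor, so both PREF values and the resulting LI are negative, which corresponds to conditioned avoidance in the convention used here.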
Finally, a modified punishment training procedure (not shown in Figure 1) imitated the reward learning as in Figure 1C, but sugar presentation was replaced by 12 pulses of 100 V electric shock, each lasting 1.2 s and separated by an onset-to-onset interval of 5 s.

Statistics
All data were analyzed using non-parametric statistics and are reported as box plots, showing the median as the midline, the 10th and 90th percentiles as whiskers, and the 25th and 75th percentiles as box boundaries. For comparing scores of individual groups to 0, we used one-sample sign tests. Mann-Whitney U-tests and Kruskal-Wallis tests were used for pair-wise and global between-group comparisons, respectively. When multiple tests of one kind were performed within a single experiment, we adjusted the experiment-wide error-rate to 5% by Bonferroni correction: we divided the critical P-value of 0.05 by the number of tests. One-sample sign tests were done using a web-based tool (http://www.fon.hum.uva.nl/Service/Statistics/Sign_Test.html). All other statistical analyses were performed with the software Statistica (Statsoft, Tulsa, OK, USA). Sample sizes are reported in the figure legends.
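To make the combination of one-sample sign test and Bonferroni correction concrete, the following Python sketch implements an exact two-sided sign test against 0 and compares its result to a corrected threshold. It is only an illustration under stated assumptions: whether the web-based tool used in the study reports one- or two-sided P-values is not specified in the text, the function names are hypothetical, and the scores are invented.

```python
from math import comb


def sign_test_p(values, reference=0.0):
    """Exact two-sided one-sample sign test against `reference`.

    Values equal to the reference are dropped, as is conventional.
    """
    n_pos = sum(v > reference for v in values)
    n_neg = sum(v < reference for v in values)
    n = n_pos + n_neg
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    # Probability of a split at least this uneven under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni_threshold(alpha, n_tests):
    """Critical P-value after Bonferroni correction for `n_tests` tests."""
    return alpha / n_tests


# Hypothetical learning indices of one genotype (N = 8), illustration only.
scores = [-35, -28, -41, -19, -30, -22, -38, -25]
p = sign_test_p(scores)
threshold = bonferroni_threshold(0.05, 3)  # three genotypes tested against 0

print(f"P = {p:.4f}, corrected threshold = {threshold:.4f}, "
      f"significant: {p < threshold}")
```

In this sketch, eight scores that all share the same sign give P = 2/256, which stays below the corrected threshold of 0.05/3; a single score of opposite sign would already push P above that threshold at this sample size.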

Blocking output from two different sets of dopaminergic neurons
First, we compared relief learning to punishment learning in terms of the roles of dopaminergic neurons. We confirmed that blocking the output from a particular set of dopaminergic neurons, using the temperature-sensitive UAS-shibire ts1 in combination with the TH-Gal4 driver (Table 1; Figures 2 and 3A), impairs punishment learning: when trained and tested at high temperature, TH/shi ts1 flies showed less negative learning scores than the genetic controls (Figure 4A @ high temperature: Kruskal-Wallis test: H = 11.44, d.f. = 2, P < 0.05). This impairment in punishment learning, however, was clearly partial in the TH/shi ts1 flies (Figure 4A @ high temperature: one-sample sign tests: P < 0.05/3 for each genotype), as was the case in previous studies (Schwaerzel et al., 2003; Aso et al., 2010). This residual learning ability may be due to incomplete coverage of dopaminergic neurons by the TH-Gal4 driver (Sitaraman et al., 2008; Claridge-Chang et al., 2009; Mao and Davis, 2009; see the Discussion for details) and/or to an incomplete block of neuronal output by shibire ts1. At low

FIGURE 2 | Biosynthesis of dopamine, tyramine, octopamine, and serotonin. DDC, dopa decarboxylase; TβH, tyramine β-hydroxylase; TDC, tyrosine decarboxylase; TH, tyrosine hydroxylase; TPH, tryptophan hydroxylase. Modified from Monastirioti (1999).


FIGURE 3 | Approximated patterns of Gal4 expression by the used drivers. We drove the expression of a membrane-bound green fluorescent protein (mCD8GFP) using three different Gal4 drivers. Patterns of GFP-immunoreactivity (green) should approximate the respective patterns of Gal4-expression; Synapsin-immunoreactivity (magenta) shows the organization of the neuropils. We display projections of frontal optical sections of 0.9 μm, each. In each row, the leftmost panel shows the anterior-most projection; in each panel, dorsal is to the top. When driven by TH-Gal4 (A), GFP was expressed in neurons that innervate the mushroom body vertical lobes and peduncles (left and middle panels) as well as the fan-shaped body (middle panel) and the protocerebral bridge (right panel). We found no innervation of the antennal lobes or the mushroom body calyces (but see Mao and Davis, 2009). Under the control of the DDC-Gal4 driver (B), GFP was expressed in neurons that innervate the subesophageal ganglion (left and middle panels) as well as the horizontal lobes of the mushroom body (right; see also the inset). Neurons that express GFP, driven by TDC2-Gal4 (C), innervated the antennal lobes (left panel), mushroom body γ-lobes and their spurs (left panel, inset), the subesophageal ganglion (left and middle panels), the areas surrounding the esophagus (middle panel), and the mushroom body calyces (right panel; see also the inset).

www.frontiersin.org
December 2010 | Volume 4 | Article 189

FIGURE 4 | Targeting a set of dopaminergic neurons, using the TH-Gal4 driver. We expressed shibire ts1 in the set of dopaminergic neurons defined by the TH-Gal4 driver. Punishment learning was partially impaired at high temperature (A, left), but not at low temperature (A, right). Contrarily, relief learning remained unaffected even at high temperature (B). *P < 0.05 and NS: P > 0.05 while comparing between genotypes. While comparing scores of each genotype to 0, *P < 0.05/3, to keep the experiment-wide error-rate at 5% (i.e., Bonferroni correction). Sample sizes were N = 8, each in (A) and 13, each in (B). Box plots show the median as the midline; 25 and 75% as the box boundaries and 10 and 90% as whiskers.
Discussion for details), (iii) incomplete block of synaptic output by shibire ts1 ; (iv) a dominant-negative effect of DDC-Gal4, which is non-additive with the effect of shibire ts1 expression in these neurons (see below).
In any case, we probed for an effect of blocking output from the DDC-Gal4 neurons on relief learning and found none: after training and test at high temperature, learning scores were not different between genotypes (Figure 5B @ high temperature: Kruskal-Wallis test: H = 1.24, d.f. = 2, P = 0.54). We thus pooled the data and found weak yet significant conditioned approach (Figure 5B @ high temperature: one-sample sign test for the pooled data set: P < 0.05). We note that the DDC/+ flies tended to show less pronounced punishment and relief learning when compared to the TH/+ flies (compare Figure 4 versus Figure 5) as well as when compared to the shi ts1/+ flies (Figure 5). In the case of punishment learning, as we used a Kruskal-Wallis test across all three experimental groups, this effect of the DDC-Gal4 driver construct may have obscured an actual effect of blocking the output from DDC-Gal4-targeted neurons (compare shi ts1/+ to DDC/shi ts1 in Figure 5A). For relief learning, however, no corresponding trend is noted (compare

inconclusive (Figure 5). We would like to stress that this does not at all exclude a role for the dopaminergic system in relief learning, given that first, in neither experiment did we cover all dopaminergic neurons at once, and second, as a general concern, blockage of neuronal output by shibire ts1 may well be incomplete (see the Discussion for details).

Compromising octopamine biosynthesis
Next, we compared relief learning to reward learning in terms of the role of octopamine. We first confirmed that compromising octopamine biosynthesis via the tbh M18 mutation in the key enzyme tyramine β-hydroxylase (Monastirioti et al., 1996; Figure 2) impairs reward learning: after odor-sugar training, using the odors 3-octanol (OCT) and 4-methylcyclohexanol (MCH), the tbh M18 mutant showed significantly less conditioned approach than the genetic Control (Figure 6A: U-test: U = 544.00, P < 0.05). Residual reward learning ability was however detectable in the tbh M18 mutant (Figure 6A: one-sample sign tests: P < 0.05/2 for each genotype). This contrasts with the report of Sitaraman et al. (2010), who had shown a complete loss of reward learning using the same odors; the discrepancy may be due to the different genetic backgrounds used in the two studies (i.e., the present study uses the strains from Schwaerzel et al., 2003, whereas Sitaraman et al., 2010, use those from Certel et al., 2007). Schwaerzel et al. (2003) found no reward learning ability in the tbh M18 mutant, using the odors ethyl acetate and isoamyl acetate (IAA); indeed, using n-amyl acetate (AM) and IAA as odors, we also found a complete loss of reward learning in the tbh M18 mutant (Figure 6A′: U-test: U = 33.00, P < 0.05; one-sample sign tests: P < 0.05/2 for Control, and P = 0.58 for the tbh M18 mutant). Surprisingly however, when the odors OCT and benzaldehyde (BA) were used, tbh M18 mutant flies showed fully intact reward learning (Figure 6A′′: U-test: U = 204.50, P = 0.27; one-sample sign test for the pooled data set: P < 0.05). This lack of effect in Figure 6A′′ should not be due to the relatively low learning indices of the Control flies, since in Figure 6A, we could detect even a partial effect of the tbh M18 mutation despite such low Control scores.
Note that using the present two-odor reciprocal training design (Figure 1D), the contribution of each odor to the LI, and hence the question whether the tbh M18 mutation affects learning about any one given odor but not the other, remains unresolved. We can however conclude that the reward learning impairment of the tbh M18 mutant can be partial, complete, or absent, depending on the combination of odors used and likely also on the genetic background; this suggests residual octopaminergic function and/or an octopamine-independent compensatory mechanism (see the Discussion for details).
To test for an effect of the tbh M18 mutation on punishment learning, we used a modified training, which entailed the same pre-starvation, handling, and stimulus-exposure as reward learning, except that the sugar presentation was replaced by shock pulses. In such modified punishment learning, the tbh M18 mutant performed comparably to the genetic Control, using either the odors OCT and MCH (Figure 6B: U-test: U = 47.00, P = 0.15; one-sample sign test for the pooled data set: P < 0.05) or AM and IAA (Figure 6B′: U-test: U = 38.00, P = 0.82; one-sample sign test for the pooled data set: P < 0.05). Thus, confirming Schwaerzel et al. (2003), we can conclude that reward and punishment learning are dissociated in terms

shi ts1/+ to DDC/shi ts1 in Figure 5B). In any case, with respect to the role of the neurons defined by DDC-Gal4, our results do not offer an argument to dissociate punishment from relief learning.
To summarize, concerning the neurons defined by TH-Gal4, we found a clear dissociation between punishment and relief learning (Figure 4), while for the DDC-Gal4 neurons the situation remains

FIGURE 6 | Compromising octopamine biosynthesis using the TβH mutant. We used the tbh M18 mutant, which has reduced or no octopamine. When the odors 3-octanol (OCT) and 4-methylcyclohexanol (MCH) were used, reward learning was partially impaired (A). Using the odors n-amyl acetate (AM) and isoamyl acetate (IAA) revealed complete lack of reward learning in the tbh M18 mutant (A′). When the odors OCT and benzaldehyde (BA) were used, the tbh M18 mutant was intact in reward learning (A′′). A modified punishment learning procedure, which was identical to reward learning, except that the sugar presentation was replaced by shock pulses, revealed no impairment in the tbh M18 mutant, when either the odors OCT and MCH (B) or AM and IAA (B′) were used. Finally, under those conditions for which reward learning of the tbh M18 mutant was partially impaired, i.e., using the odors OCT and MCH, relief learning remained unaffected (C). For this experiment, the odors AM and IAA were not used, as these do not support relief learning (Yarali et al., 2008, loc. cit. Figure 5D). *P < 0.05, NS: P > 0.05, while comparing between genotypes. While comparing scores of each genotype to 0, *P < 0.05/2, NS: P > 0.05/2 (i.e., Bonferroni correction). Sample sizes were from left to right N = 40, 39 in (A), 11, 13 in (A′), 23, 22 in (A′′), 12, 12 in (B), 9, 9 in (B′), and 20, 20 in (C). Box plots are as detailed in Figure 4.
of the effect of the tbh M18 mutation. In addition, normal performance of the tbh M18 mutant in this modified punishment learning makes deficiencies in odor perception or motor control unlikely as causes for the reward learning impairment (Figures 6A,A′).
In order to test for an effect of the tbh M18 mutation on relief learning, we used the odors OCT and MCH, because the odors AM and IAA do not support relief learning (Yarali et al., 2008, loc. cit. Figure 5D). Under conditions for which the tbh M18 mutant did show a reward learning impairment, however partial (i.e., using the odors OCT and MCH), relief learning ability remained unaffected: learning scores were statistically indistinguishable between genotypes (Figure 6C: U-test: U = 168.00, P = 0.40), with no apparent trend for lower scores in the tbh M18 mutant. We thus pooled the data and found weak yet significant conditioned approach (Figure 6C: one-sample sign test for the pooled data set: P < 0.05).
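The pooled comparison against zero can be illustrated with a minimal sketch of a one-sample sign test. It assumes the exact two-sided binomial form of the test; the function names and the learning scores below are hypothetical and are not data from this study. The `n_comparisons` argument mirrors the Bonferroni-corrected threshold (P < 0.05/2) that applies when each genotype is compared to zero separately.

```python
from math import comb


def sign_test(scores, reference=0.0):
    """Exact two-sided one-sample sign test against `reference`.

    Returns (n_above, n_below, p_value). Scores equal to the
    reference carry no sign and are dropped.
    """
    above = sum(1 for s in scores if s > reference)
    below = sum(1 for s in scores if s < reference)
    n = above + below
    if n == 0:
        return above, below, 1.0
    k = min(above, below)
    # Chance of a split at least this uneven if both signs are equally likely,
    # doubled to make the test two-sided and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)


def is_significant(p_value, alpha=0.05, n_comparisons=1):
    """Bonferroni rule: compare p to alpha divided by the number of comparisons."""
    return p_value < alpha / n_comparisons


# Hypothetical pooled learning scores (illustration only, not study data).
pooled = [0.12, 0.05, -0.03, 0.20, 0.08, 0.15, -0.01, 0.10, 0.07, 0.04, 0.09, 0.11]
above, below, p = sign_test(pooled)
print(above, below, round(p, 4), is_significant(p), is_significant(p, n_comparisons=2))
```

With these made-up scores, the same P value passes the uncorrected threshold used for a pooled data set but not the threshold for two separate per-genotype comparisons, which is why pooling is only done after the genotypes have been found indistinguishable.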

Blocking the output from a set of octopaminergic/tyraminergic neurons
As an additional, independent approach to the octopaminergic system, we blocked the output from a set of octopaminergic/tyraminergic neurons, using UAS-shibire ts1, in combination with the TDC2-Gal4 driver (Cole et al., 2005; Table 1; Figures 2 and 3C). We first tested for an effect on reward learning: when trained and tested at high temperature, TDC2/shi ts1 flies performed comparably to the genetic controls (Figure 7A @ high temperature: Kruskal-Wallis test: H = 3.03, d.f. = 2, P = 0.22). Accordingly, pooling the learning scores across genotypes, we found conditioned approach (Figure 7A @ high temperature: one-sample sign test for the pooled data set: P < 0.05). This lack of effect on reward learning may be because the TDC2-Gal4 driver does not target all octopaminergic neurons (Busch et al., 2009; see the Discussion for details) and/or because the output from the targeted neurons is not completely blocked by shibire ts1. Nevertheless, we probed for an effect on relief learning and found none: after training and test at high temperature, learning scores were statistically indistinguishable between genotypes (Figure 7B @ high temperature: Kruskal-Wallis test: H = 2.43, d.f. = 2, P = 0.30). Accordingly, pooling the data, we found conditioned approach (Figure 7B @ high temperature: one-sample sign test for the pooled data set: P < 0.05). To summarize, while reward and relief learning are apparently dissociated when considering the tbh M18 mutant, we can draw no distinction between these two kinds of learning in terms of the role of the neurons covered by the TDC2-Gal4 driver. Again, this does not rule out a role for the octopaminergic system in relief learning, as these conclusions refer only to the specific genetic manipulations used.
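As a plausibility check on the two Kruskal-Wallis results reported above, the H values can be converted to P values by hand. The sketch assumes that H is referred to a chi-square distribution with 2 d.f., which is the usual large-sample approximation; whether the original analysis used this approximation or an exact method is not stated here. For 2 d.f. the upper tail probability has the closed form exp(-H/2). The function name is hypothetical.

```python
from math import exp


def chi2_upper_tail_df2(h):
    """Upper tail probability of a chi-square variable with 2 d.f.

    For 2 d.f. the chi-square distribution is exponential with mean 2,
    so P(X > h) = exp(-h / 2).
    """
    return exp(-h / 2.0)


# H values as reported for Figure 7A and Figure 7B at high temperature.
for label, h in (("Figure 7A", 3.03), ("Figure 7B", 2.43)):
    print(label, "H =", h, "approx. P =", round(chi2_upper_tail_df2(h), 2))
```

Under this assumption, both values round to the P values given in the text (0.22 and 0.30).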

Discussion
We compared relief learning to both punishment learning and reward learning, focusing on the involvement of aminergic modulation by dopamine and octopamine.
As previously reported (Schwaerzel et al., 2003; Aso et al., 2010), directing the expression of UAS-shibire ts1 to a particular set of dopaminergic neurons defined by the TH-Gal4 driver partially impaired punishment learning (Figure 4A). Relief learning, however, was left intact (Figure 4B). Expressing UAS-shibire ts1 with another driver, DDC-Gal4, on the other hand affected neither punishment nor relief learning (Figure 5).
All dopaminergic neuron clusters in the fly brain are targeted by the TH-Gal4 driver; some clusters, however, are covered only partially, e.g., 80-90% of the anterior medial "PAM cluster" neurons are left out (Sitaraman et al., 2008; Claridge-Chang et al., 2009; Mao and Davis, 2009). In contrast, the DDC-Gal4 driver, along with serotonergic neurons, likely targets most of the PAM cluster dopaminergic neurons, while possibly leaving out dopaminergic neurons in other clusters (Sitaraman et al., 2008; Figure 3B). In a mixed classical-operant olfactory punishment learning task, Claridge-Chang et al. (2009) found no impairment upon blocking the activity of most PAM cluster neurons with an inwardly rectifying K+ channel (UAS-kir2.1), driven by HL9-Gal4. Although relying on both a different Gal4 driver and a different effector, this result is in agreement with the intact punishment learning we found when expressing UAS-shibire ts1 with the DDC-Gal4 driver (Figure 5A). Thus, as far as short-term punishment learning is concerned, there is so far no evidence for a role for the PAM cluster neurons (for middle-term punishment learning, see Aso et al., 2010). Nevertheless, targeting the remaining dopaminergic neuron clusters by the TH-Gal4 driver only partially impairs punishment learning (Schwaerzel et al., 2003; Aso et al., 2010; Figure 4A). Conceivably, the TH-Gal4 driver may leave out a few dopaminergic neurons in clusters other than PAM; these may then carry a punishment signal, redundant to that carried by the TH-Gal4-targeted neurons. This scenario would readily accommodate Schroll et al.'s (2006) report that activity of the TH-Gal4-targeted neurons in larval fruit flies substitutes for punishment. The intact relief learning upon expressing UAS-shibire ts1 with TH-Gal4 can also be explained by this scenario.
Alternatively, the level of shibire ts1 expression driven by TH-Gal4 may fall short of effectively blocking the neuronal output required for relief learning, and/or an additional, shibire ts1-resistant neurotransmission mechanism may be employed in relief learning. Further, if punishment were to be signaled by a shock-induced increase in the activity of the TH-Gal4 neurons and relief were to be signaled by a decrease in their activity below the baseline at the shock offset, incomplete blockage of output from these neurons could partially impair punishment learning, while leaving relief learning intact. In the face of these caveats, we find it too early to exclude any role of dopamine or of the TH-Gal4 neurons. What then is a safe minimal conclusion? Given that while punishment learning is partially impaired (Figure 4A) relief learning does not even tend to be impaired (Figure 4B), these two kinds of learning do differ in terms of whether and which role the TH-Gal4-covered neurons play. This does dissociate punishment and relief learning in terms of their underlying mechanisms.
Turning to the octopaminergic system, we confirmed Schwaerzel et al. (2003) in that the tbh M18 mutant with compromised octopamine biosynthesis is impaired in reward learning (Figures 6A,A′), but not in punishment learning (Figures 6B,B′). The effect on reward learning was, however, conditional on the kinds of odor used (Figures 6A,A′,A′′). Under the conditions that significantly impaired reward learning, we found relief learning intact (Figure 6C). Although the tbh M18 mutant we used revealed no octopamine content in immunohistochemical and high pressure liquid chromatography (HPLC) analyses (Monastirioti et al., 1996), it may retain an amount of octopamine below the detection thresholds of these methods but sufficient to signal reward and/or relief. Furthermore, HPLC analysis reveals a ∼10-fold increase in the amount of the octopamine precursor tyramine in this mutant (Monastirioti et al., 1996); this excessive tyramine may compensate for the lack of octopamine (Uzzan and Dudai, 1982).
As an additional approach, we blocked the output from a set of octopaminergic/tyraminergic neurons, expressing UAS-shibire ts1 with the TDC2-Gal4 driver; this impaired neither reward nor relief learning (Figure 7). The TDC2-Gal4 driver targets, along with tyraminergic neurons, octopaminergic neurons in three paired and one unpaired neuron clusters (Busch et al., 2009). Among these, the unpaired "VM cluster" harbors octopaminergic neurons innervating on the one hand the subesophageal ganglion (SOG), and on the other hand the antennal lobes, mushroom bodies, and the lateral horn (Busch et al., 2009); such connectivity would enable signaling gustatory reward onto the olfactory pathway. Indeed, in the honey bee, activation of a single octopaminergic neuron, VUMmx1, with such an innervation pattern, is sufficient to carry the reward signal for olfactory learning (Hammer, 1993). Surprisingly, however, although all octopaminergic neurons in the VM cluster are targeted by TDC2-Gal4 (Busch et al., 2009), using this driver with UAS-shibire ts1, we found reward learning intact (Figure 7A). This may be because the level of UAS-shibire ts1 expression falls short of completely blocking the neuronal output. Alternatively, given that activation of the TDC2-Gal4-targeted neurons in fruit fly larvae reportedly substitutes for reward (Schroll et al., 2006), the VM cluster neurons may indeed carry a reward signal, but other octopaminergic neurons outside this cluster, left out by the TDC2-Gal4 driver (Busch et al., 2009), may redundantly do so.
Either kind of argument could also explain the lack of effect on relief learning (Figure 7B). Thus, although we find no evidence for a role for the octopaminergic system in relief learning, we refrain from excluding such a role. Still, given that the tbh M18 mutation affects reward learning, but not relief learning, these two forms of learning are to some extent dissociated in their genetic requirements.
Obviously, the question whether dopaminergic and octopaminergic systems are involved in relief learning remains open. Follow-up studies should extend our neurogenetic approach to further tools. For example, dopamine biosynthesis can be specifically compromised in the fly nervous system using a tyrosine hydroxylase mutant in combination with a hypoderm-specific rescue construct (Hirsh et al., 2010). Also, for two different dopamine receptors, DAMB and dDA-1, loss of function mutations are available (Kim et al., 2007; Selcho et al., 2009). Notably, by means of the dDA-1 receptor loss of function mutant, the role of the dopaminergic system in reward learning was revealed (Kim et al., 2007; Selcho et al., 2009), which had been overlooked with the tools used in the present study. In addition, a pharmacological approach would be useful. Antagonists for the vertebrate D1 and D2 receptors have been successfully used in the fruit fly (Yellman et al., 1997; Seugnet et al., 2008) and other insects (Unoki et al., 2005, 2006; Vergoz et al., 2007) (regarding the octopamine receptors: Unoki et al., 2005, 2006; Vergoz et al., 2007). Such a pharmacological approach could be extended to other aminergic, as well as peptidergic systems and could also test for the effects of human psychotherapeuticals. The results of such studies may then guide subsequent analyses at the cellular level.
To summarize, while this study has shed no light on how relief learning works, it did show that relief learning works in a way neurogenetically different from both punishment learning and reward learning, likely at the level of the roles of aminergic neurons. Interestingly, at this level also punishment and reward learning are dissociated. However, all three kinds of learning also share genetic commonalities, for example with respect to the role of the synapsin gene, likely critical for neuronal plasticity (Godenschwege et al., 2004; Michels et al., 2005; Knapek et al., 2010; T. Niewalda, Universität Würzburg, personal communication). Thus, punishment-, relief-, and reward-learning may conceivably rely on common molecular mechanisms of memory trace formation, which however are triggered by experimentally dissociable reinforcement signals, and/or operate in distinct neuronal circuits. This may be a message relevant also for analyses of relief learning in other experimental systems, including rodent (Rogan et al., 2005), monkey (Tobler et al., 2003; Belova et al., 2007; Matsumoto and Hikosaka, 2009), and man (Seymour et al., 2005; Andreatta et al., 2010).

Acknowledgments
The continuous support of the members of the Würzburg group, especially of M. Heisenberg, K. Oechsener and H. Kaderschabek, is gratefully acknowledged. Special thanks to T. Niewalda, Y. Aso and M. Appel for comments on the manuscript. The authors are grateful to E. Münch for the generous support to Ayse Yarali during the startup phase of her PhD. This work was supported by the Deutsche Forschungsgemeinschaft (DFG) via CRC-TR 58 Fear, Anxiety, Anxiety Disorders, and a Heisenberg Fellowship (to Bertram Gerber); Ayse Yarali was supported by the Boehringer Ingelheim Fond. Dedicated to our respective daughters.