The role of the striatum in social behavior

Where and how does the brain code reward during social behavior? Almost all elements of the brain's reward circuit are modulated during social behavior. The striatum in particular is activated by rewards in social situations. However, its role in social behavior is still poorly understood. Here, we review its participation in the social behaviors of species ranging from voles to humans. Human fMRI experiments show that the striatum is reliably active in relation to others' rewards, to reward inequity and also while learning about social agents. Social contact and rearing conditions have long-lasting effects on behavior, striatal anatomy and physiology in rodents and primates. The striatum also plays a critical role in pair-bond formation and maintenance in monogamous voles. We review recent findings from single neuron recordings showing that the striatum contains cells that link own reward to self or others' actions. These signals might be used to solve the agency-credit assignment problem: the question of whose action was responsible for the reward. Activity in the striatum has been hypothesized to integrate actions with rewards. The picture that emerges from this review is that the striatum is a general-purpose subcortical region capable of integrating social information into its coding of social action and reward.


INTRODUCTION
The striatum is necessary for voluntary motor control. Research on its role in movement planning and execution uncovered its participation in cognition and reward processes. Rigorous experimental control has traditionally required studying this neuronal circuit in socially isolated subjects. However, actions, rewards and cognition also occur in the company of conspecifics, in a social context. Social behaviors, those behaviors that occur in a social context, place an extra demand on cognition because others' behaviors are difficult to predict and they affect our own behavior. Therefore, to understand the properties of the striatum it is important to study it while the organism engages in social behavior. Recent studies have implicated this brain structure in a range of social behaviors. Among these studies, we found that the striatum contains neurons that signal the social action that will result in own reward. We place these new findings within the context of the known role of this area in movement and reward coding. The question that guides the review is: "does the striatum serve a social function?" We conclude that the striatum is a general-purpose subcortical region capable of integrating and reflecting social information into its better known non-social functions.

ANATOMY AND NEUROPHYSIOLOGY OF THE STRIATUM
The striatum is the input module to the basal ganglia, a neuronal circuit necessary for voluntary movement control (Hikosaka et al., 2000). The striatum is composed of three nuclei: caudate, putamen, and ventral striatum. The latter contains the nucleus accumbens (NAcc). The caudate and putamen/ventral striatum are separated by the internal capsule, a white matter tract between the cerebral cortex and the brainstem.
The striatum has two main efferent pathways. The direct pathway is formed by the axons of medium spiny neurons (MSNs) expressing D1 receptors, which mainly project to GABAergic neurons in the substantia nigra pars reticulata (SNr) (Parent et al., 1984; Gerfen et al., 1990; Kawaguchi et al., 1990; Chuhma et al., 2011). MSNs that express D2 receptors mostly target the external segment of the globus pallidus (GPe) and form the indirect pathway (Parent et al., 1984; Gerfen et al., 1990; Kawaguchi et al., 1990; Chuhma et al., 2011). GABAergic neurons in GPe project to SNr and the internal segment of the globus pallidus (GPi) (Parent and Hazrati, 1995; Wilson, 1998). The SNr and GPi are the output nuclei of the basal ganglia.
The striatum contains many other cell types besides MSNs, including cholinergic and fast-firing GABAergic interneurons (Tepper and Bolam, 2004). Cholinergic interneuron activity is related to reward-predicting stimuli, reward and punishment (Apicella et al., 1991b; Ravel et al., 2003). These firing properties suggest that these neurons may play a role in learning (Schulz and Reynolds, 2013). Fast-firing interneurons are also involved in reward prediction error coding (Stalnaker et al., 2012). However, for brevity we will limit this review to MSNs and refer to them as striatal neurons. Functionally, striatal neurons show motor and reward responses (Hikosaka et al., 2000). Functional and anatomical evidence led to the hypothesis that striatal activity forms a "limbic-motor" interface (Mogenson et al., 1980). Neurons in the striatum integrate information about expected reward with motor information to guide behavior (Hollerman et al., 1998; Hikosaka et al., 2000; Schultz, 2000; Schultz and Dickinson, 2000; Goldstein et al., 2012). We review MSN neurophysiological responses to action and reward in the next section.

STRIATUM NEUROPHYSIOLOGY: ACTION AND REWARD
The striatum contains neuronal activity related to movements, rewards and the conjunction of both movement and reward. Striatal neurons show activity related to the preparation, initiation and execution of movements (Hollerman et al., 2000). These neurons are also active before overt goal-directed movements (Schultz and Romo, 1988; Romo et al., 1992; Figure 2A). Some of these neurons are exclusively active during self-initiated movements, whilst other neurons are only active during instructed trials, and some others do not discriminate between self-initiated and instructed movements. In addition, striatal neurons show reward-related activity. Neuronal activity in the striatum is modulated by reward expectation independent of the movement necessary to obtain it (Hikosaka et al., 1989b; Apicella et al., 1991a, 1992; Schultz et al., 1992). Striatal neurons that discharge after reward delivery do so in two main modes: phasic or tonic. Phasic responses usually have short latencies (<50 ms) and are relatively short-lived (median duration: 500 ms) (Apicella et al., 1991b; Hollerman et al., 1998; Lau and Glimcher, 2007; Figure 2B). By contrast, tonic responses have longer latencies and can last as long as the intertrial interval, i.e., up to 3 s (Apicella et al., 1991b; Hollerman et al., 1998; Histed et al., 2009). Furthermore, some striatal neurons code which action is associated with reward and which action is not (Hollerman et al., 1998; Kawagoe et al., 1998; Figure 2C). This coding is independent of the stimuli indicating the action required to obtain reward. Reward-predicting cues modulate the activity of caudate neurons (Kawagoe et al., 1998; Lauwereyns et al., 2002). After saccade execution, up to 50% of neurons encode only the action, while around 20% of recorded neurons encode whether the action was rewarded or not and close to 40% of neurons are modulated by both movement and reward (Kobayashi et al., 2006; Lau and Glimcher, 2007).
Together, these data suggest that striatal neuron responses are modulated by action and reward. These responses are not limited to the moment of movement or reward receipt; rather, they are also present during cue presentation and reward expectation.
Most striatal neurons that respond during task performance show higher activity when a reward is expected compared to when no reward is expected (Hollerman et al., 1998). However, there are also neurons that are active preferentially after the monkey is instructed to not move to obtain reward (Hollerman et al., 1998). These data suggest that striatal neurons flexibly encode the type of action that will produce reward.
An action-value neuron tracks the value of one action, independent of the action actually performed. By tracking the values of different candidate actions and comparing them, an organism can decide to exploit the most valuable action or to explore the value of other actions. Samejima et al. (2005) were the first group to show that striatal neurons code action-value (Figure 2D). Neuronal activity tracked over time the value of performing one action regardless of the animal's choice. Later, Lau and Glimcher (2008) trained macaques to perform a matching task. In this task rewards are distributed probabilistically between two options, and subjects match the relative frequency with which they choose each action to its relative rate of reward (Herrnstein, 1961). This task makes it possible to investigate the presence of action-value and chosen-value (i.e., value of the chosen action) neurons. Indeed, Lau and Glimcher found that caudate neurons code both action-value and chosen-value. These signals can inform decision-making mechanisms.
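The distinction between action-value and chosen-value can be made concrete with a small reinforcement-learning sketch. The example below is a minimal illustration, assuming a Q-learning style (Rescorla-Wagner) update with a hypothetical learning rate `alpha` and an invented sequence of choices and outcomes; it is not the model fitted by Samejima et al. (2005) or by Lau and Glimcher (2008). Each action keeps its own value (action-value), whereas chosen-value is simply the value of whichever action is selected on a given trial.

```python
# Minimal sketch (assumption): Rescorla-Wagner / Q-learning update of action values.
# The learning rate and the trial sequence are hypothetical, for illustration only.


def update_action_values(q_values, choice, reward, alpha=0.2):
    """Return updated action values and the reward prediction error for one trial.

    Only the chosen action's value changes; the unchosen value is carried over,
    so each action-value is defined independently of the current choice.
    """
    updated = dict(q_values)
    prediction_error = reward - updated[choice]
    updated[choice] += alpha * prediction_error
    return updated, prediction_error


# Hypothetical trials: (chosen action, outcome), with 1 = reward and 0 = no reward.
trials = [("left", 1), ("left", 0), ("right", 1), ("left", 1), ("right", 1)]

q = {"left": 0.5, "right": 0.5}
for choice, reward in trials:
    chosen_value = q[choice]  # chosen-value: value of the action selected on this trial
    q, delta = update_action_values(q, choice, reward)
    print(f"{choice:>5}  chosen-value={chosen_value:.3f}  "
          f"Q(left)={q['left']:.3f}  Q(right)={q['right']:.3f}  delta={delta:+.3f}")
```

During the first two (left) trials Q(right) stays at its starting value, so a neuron following Q(right) would behave as an action-value signal, while a neuron following `chosen_value` would behave as a chosen-value signal.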

FIGURE 2 | Action and reward coding by striatal neurons. (A)
Example striatal neuron active before movement (go) and silent before no-movement (no-go). Based on Schultz and Romo (1988), reproduced with permission. (B) Example striatal neurons coding reward. The first row depicts a neuron with phasic activity after juice reward delivery, independent of the action performed to obtain reward. The second row depicts a neuron with tonic activity after juice reward delivery. The third row shows a neuron with tonic activity after no reward is delivered. Based on Hollerman et al. (1998), reproduced with permission. (C) Example caudate neuron coding the conjunction of action and reward. This neuron is active during the presentation of a cue indicating the saccade necessary to complete the trial if the trial will be rewarded (rewarded direction is highlighted by a bull's-eye). R, right; U, up; L, left; D, down. Polar plots show the average response for each cue and direction. Based on Kawagoe et al. (1998), reproduced with permission. (D) (Top) Depiction of the probability of larger rewards associated with left or right actions in each condition block. Colored numbers refer to the probabilities associated with left and right actions. (Bottom) Example striatal neuron coding right action value. Based on Samejima et al. (2005), reproduced with permission.
In conclusion, the striatum contains neuronal activity related to movements, rewards and the conjunction of both movement and reward. These neuronal representations serve many functions, such as goal-directed movements and decision making.

SOCIAL REWARD
Rewards are events or objects that elicit learning, elicit approach behavior and produce positive emotions (Schultz, 2004). Social rewards are like any other rewards, with the particularity that they occur in a social context. We propose a simple classification of social rewards using two axes: who acts and who receives the reward. For example, observing others is a social reward (Anderson, 1998; Deaner et al., 2005) in which the individual both acts (observes) and receives the reward (the social stimulus). Pro-social behavior refers to a preference to increase the welfare of others (Fehr and Camerer, 2007). Depending on individual social preferences, these choices can be rewarding by themselves, e.g., in charitable giving (Harbaugh et al., 2007). Vicarious reward refers to the situation in which observing someone else receive a reward is rewarding in itself (Mobbs et al., 2009). Finally, in several social rewards the recipient is the individual and the actor is someone else. Examples of others' actions that are rewarding include praise and pleasant touch (Francis et al., 1999; Olausson et al., 2002; Rolls et al., 2008; Korn et al., 2012). Building a desired reputation is also considered a social reward; critically, reputation depends on others' perception of the individual, not on the individual's perception of herself (Izuma et al., 2008; Izuma, 2012). Receiving gifts, or social actions that result in own reward, can also be considered other-generated social rewards. Social inclusion can be considered a social reward and facilitates learning (Eger et al., 2013). Although this classification might further our understanding of the neuronal underpinnings of social rewards, further experiments are needed to validate its use.

Observing others
Fuelling a brain entails a huge cost, and the ratio of brain size to body size is larger in primates than in any other order in the animal kingdom (Laughlin and Sejnowski, 2003; Dunbar and Shultz, 2007). The huge cost of fuelling a large brain raises the question of what benefit such large brains provide. Byrne and Whiten suggest that only a costly primate brain can deal with the complexity of primate social living, the so-called social brain hypothesis (Dunbar and Shultz, 2007). The primate brain has many specializations for acquiring information about conspecifics. Neurons in the ventral visual pathway respond selectively to biological motion, gaze direction, body parts and faces (Perrett et al., 1984, 1985a; Gross, 1992; Oram and Perrett, 1996; Tsao et al., 2006). Social information arrives through all senses. For example, the superior temporal polysensory area contains neurons that selectively respond to conspecific calls (Perrodin et al., 2011), and local field potentials in the temporal lobe are modulated by face or call familiarity (Báez-Mendoza and Hoffman, 2009). In macaques, gray matter volume correlates with the size of the individual's troop in mid superior temporal sulcus, inferotemporal cortex, rostral superior temporal sulcus and amygdala (all areas involved in perceiving individuals), as well as in rostral PFC (Sallet et al., 2011). These findings suggest that the brain has specialized structures dealing with the acquisition and representation of information about conspecifics.
If the brain has specialized structures for the acquisition and representation of information about conspecifics, then acquiring this information must be valuable for the individual. In a clever paradigm, Deaner and colleagues measured the value of gaining access to pictures of conspecifics (Deaner et al., 2005). They pitted a constant amount of juice against a variable amount of juice plus the opportunity to observe the picture of a conspecific. The monkeys made their choices depending on the amount of juice offered along with the picture. If the monkey chose a smaller amount of juice plus the opportunity to watch an image, this indicated that the monkey valued watching the image at least as much as the difference between the offered juice volumes. For example, a monkey that likes watching a high-ranking monkey will choose to watch the image and receive 0.8 ml of juice over receiving 1 ml of juice alone. When the monkey chose with equal probability between the two alternatives, the difference in offered juice volume is the subjective value of observing the image, the so-called point of subjective equivalence. Researchers using this method can measure the subjective value of varying juice magnitudes (fluid value) and that of social images (image value). Another advantage of this method is that it facilitates the comparison of different goods (Glimcher, 2010), e.g., observing female perinea or a subordinate male face. Using this method, Deaner and colleagues reported that male monkeys highly valued looking at dominant monkeys and at the perinea of female monkeys compared to looking at subordinate monkeys or a non-salient visual stimulus (Deaner et al., 2005). Neuronal activity during this task has been measured in different brain regions. Activity in the lateral intraparietal area (LIP) correlates with both image value and fluid value when the monkeys choose to look at the image (Klein et al., 2008).
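The point of subjective equivalence can be estimated from choice data in several ways. The sketch below uses simple linear interpolation of the choice curve, an assumption made for brevity; fitting a psychometric (e.g., logistic) function is a common alternative, and we do not claim this is how Deaner et al. (2005) computed it. The choice proportions are invented for illustration. Here `juice_forgone` is the hypothetical amount of juice (in ml) the monkey gives up by choosing the image option, and the PSE is the value at which the image option is chosen on half of the trials.

```python
# Minimal sketch (assumption): estimate the point of subjective equivalence (PSE)
# by linear interpolation of the proportion of image choices. The data below are
# hypothetical, not values from Deaner et al. (2005).


def point_of_subjective_equivalence(juice_forgone, p_choose_image):
    """Return the juice difference (ml) at which the image option is chosen 50% of the time.

    juice_forgone: juice (ml) given up by choosing the image option in each condition.
    p_choose_image: proportion of trials on which the image option was chosen.
    Returns None if the tested range does not bracket 50%.
    """
    points = sorted(zip(juice_forgone, p_choose_image))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 == 0.5:
            return x0
        if (p0 - 0.5) * (p1 - 0.5) < 0:  # 50% lies strictly between these two conditions
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    if points and points[-1][1] == 0.5:
        return points[-1][0]
    return None


# Hypothetical session: the more juice the image option costs, the less often it is chosen.
juice_forgone = [0.0, 0.05, 0.10, 0.15, 0.20]
p_choose_image = [0.90, 0.70, 0.55, 0.35, 0.20]

pse = point_of_subjective_equivalence(juice_forgone, p_choose_image)
print(f"PSE = {pse:.4f} ml")  # value of viewing the image, in juice units
```

With these made-up data the image is worth a little over 0.1 ml of juice; a negative PSE would mean the monkey has to be paid extra juice to choose the image.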
Orbitofrontal cortex (OFC) neurons showed distinct coding of reward magnitude or image value, but not both (Watson and Platt, 2012). Thus, these results suggest that OFC neurons do not code reward in a single currency (e.g., in juice volume), but rather as different variables, as shown before (O'Neill and Schultz, 2010). Intriguingly, the animals in this study strongly preferred looking at pictures of subordinates, a finding at odds with the strong preferences for dominant faces previously reported in the same paradigm (Deaner and Platt, 2003; Deaner et al., 2005; Shepherd et al., 2006; Klein et al., 2008); this discrepancy suggests that the encoding of social reward reflects subjective preferences.
Neurons in the anterior striatum showed an interesting response pattern in the same paradigm (Klein and Platt, 2013). The large majority of reward-responsive neurons were selective for reward type. These neurons also showed a regional pattern: those in the caudate were more strongly modulated by social reward, whereas putamen neurons were more strongly modulated by liquid reward. This pattern could alternatively be explained by simple saccade-direction coding, because caudate neurons are tuned for saccade direction, particularly for contralateral saccades (Hikosaka et al., 1989a).
Humans also value observing other humans, and among different targets we highly value observing our romantic partners and mothers (Zeki, 2000, 2004; Aron, 2005; Acevedo et al., 2012). Observing pictures of a partner elicits higher blood-oxygen-level-dependent (BOLD) activity in caudate/putamen and VTA, along with cingulate and insular cortex, compared to viewing pictures of friends matched to the partners for age, gender and length of friendship (Figure 3, green squares). This effect is present both when the relationship is recent (Aron, 2005) and when it has been long established (Acevedo et al., 2012). These BOLD responses are a neural correlate of the value of observing a loved one.
In summary, acquiring social information, in particular looking at conspecifics, is valuable for the individual (Deaner et al., 2005). The primate temporal lobe contains regions whose function includes the processing of social information (Tsao et al., 2006; Perrodin et al., 2011). Both social information and value converge in the striatum, opening the possibility of social reward coding in this brain region, as shown by Klein and Platt (2013).

Other social rewards
A positive reputation is a social reward as it can elicit learning, approach behavior and positive emotions. This is particularly evident in indirect reciprocity: a donor who helps a recipient in public might receive in the future a donation from someone who has observed the donor's "altruistic" behavior (Nowak, 2006). Obtaining a good reputation from others increases BOLD activity in the human striatum (Izuma et al., 2008; Korn et al., 2012) (Figure 3, red squares), but not in individuals diagnosed with autism (Izuma et al., 2011). This difference is likely due to insensitivity to social rewards in autism (Dawson et al., 1998; Schultz, 2005).

FIGURE 3 | fMRI studies of social behaviors in which the striatum is active. Peak activation coordinates in the striatum from the fMRI studies cited in this review, color-coded for each section as illustrated in the legend. Studies using a region-of-interest analysis strategy were not included in this image. These striatal responses are compatible with a general activation in response to social behaviors, including social rewards. A functional subdivision according to types of social rewards must await further experiments. Studies aggregated in "Other social rewards": (Rilling et al., 2002; Moll et al., 2006; Izuma et al., 2008; Mobbs et al., 2009; Acevedo et al., 2012; Fareri et al., 2012; Korn et al., 2012). Studies clustered in "Observing others": (Zeki, 2000, 2004; Aron, 2005; Acevedo et al., 2012). Studies in "Learning about others": (Delgado et al., 2005; King-Casas et al., 2005; Baumgartner et al., 2008; Burke et al., 2010; Phan et al., 2010; Xiang et al., 2012; Fouragnan et al., 2013). Studies in "Reward inequity": (Moll et al., 2006; Fliessbach et al., 2007; Hsu et al., 2008; Tricomi et al., 2010).
Other social rewards that also increase BOLD activity in the striatum include charitable donations (Moll et al., 2006;Harbaugh et al., 2007) and observing someone else succeed (Mobbs et al., 2009). Vicarious reward is also modulated by the closeness of the recipient: there is higher striatal BOLD activity when sharing a monetary gain with close friends compared to sharing with strangers, and sharing with the latter is associated with higher activations compared to when the "recipient" is a computer (Fareri et al., 2012). This social vs. non-social effect has also been observed when cooperating with a human partner vs. cooperating with a computer (Rilling et al., 2002). The peak activations from studies cited in this section are illustrated with red squares in Figure 3. Taken together, these data suggest that social rewards are associated with BOLD activity in the striatum and can be modulated by the social context.

LEARNING ABOUT SOCIAL AGENTS
Social life is rife with opportunities to learn about others. For example, we learn to trust or mistrust other people. The trust game is an economic game that measures how trust is built between two individuals. In the trust game, the investor receives an initial endowment that she can choose to invest in a trustee; the trustee receives three times the investment and decides how much of the gains to return to the investor. When this game is played iteratively, the investor learns to trust (or mistrust) the trustee and vice versa. Thus, both players develop a model of the other's reputation (King-Casas et al., 2005). To build a trust model, investors use previous behavior to predict future behavior. If there is a deviation from what is predicted (a reward prediction error), the model is updated. Activity in dorsal striatum mirrored prediction errors during the repayment phase (Figure 3, yellow squares; King-Casas et al., 2005). When a trustee returned more than what an investor expected, the investor reciprocated by increasing her investment. During the investment phase, activity increased in the middle cingulate cortex of the investor and in the ACC of the trustee. Activity in both areas correlated with activity in the trustee's caudate; most importantly, the peak of these correlations shifted from the repayment epoch to the investment epoch (King-Casas et al., 2005). These results suggest that building a model of someone else's reputation engages a reinforcement learning algorithm that uses prediction errors, and that these prediction errors are reflected in striatal BOLD activity.
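The payoff structure of the trust game and the prediction-error update described above can be sketched as follows. This is a minimal illustration, assuming that the investor's trust model is a single number (the expected fraction of the tripled investment that the trustee returns) updated with a delta rule; the learning rate, endowment and repayment sequence are hypothetical and are not taken from King-Casas et al. (2005). The starting value of `expected_return` stands in for whatever prior the investor brings to the game.

```python
# Minimal sketch (assumption): trust-game payoffs plus a delta-rule trust model.
# Endowment, learning rate and the trustee's repayments are hypothetical.

MULTIPLIER = 3  # the trustee receives three times the investment


def trust_game_round(endowment, investment, fraction_returned):
    """Return (investor_payoff, trustee_payoff) for a single round."""
    if not 0 <= investment <= endowment:
        raise ValueError("investment must lie between 0 and the endowment")
    if not 0.0 <= fraction_returned <= 1.0:
        raise ValueError("fraction_returned must lie in [0, 1]")
    pot = MULTIPLIER * investment
    repayment = fraction_returned * pot
    return endowment - investment + repayment, pot - repayment


def update_trust(expected_return, observed_return, learning_rate=0.3):
    """Update the expected repayment fraction with a reward prediction error."""
    prediction_error = observed_return - expected_return
    return expected_return + learning_rate * prediction_error, prediction_error


expected_return = 0.5  # hypothetical prior expectation about the trustee
for observed in [0.5, 0.2, 0.0]:  # a trustee who gradually stops reciprocating
    investor, trustee = trust_game_round(20, 10, observed)
    expected_return, delta = update_trust(expected_return, observed)
    print(f"payoffs=({investor:.1f}, {trustee:.1f})  PE={delta:+.2f}  "
          f"expected return={expected_return:.3f}")
```

As the trustee stops returning money, negative prediction errors pull the expected return down; a larger `learning_rate` would make this model abandon trust faster.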
Prior information about someone's trustworthiness sets the initial state of the trust model. This initial bias can be overruled by observing someone's willingness to reciprocate trust (Figure 3, yellow squares; Delgado et al., 2005;Phan et al., 2010;Fouragnan et al., 2013). Prior information diminishes the magnitude of the reward prediction error signal in the striatum during the repayment phase (Fouragnan et al., 2013). Following advice to solve a task (a type of prior information) generates an outcome-bonus in a version of the Iowa gambling task (Biele et al., 2011). These studies suggest that prior information not only sets the initial state of the trust model, but it has a long lasting effect on its computation.
Depth-of-thought refers to a person's inference about someone else's intention and to how many iterations of this inference they perform (Dixit and Skeath, 2004). Players in the trust game solve the game with different levels of depth-of-thought (Xiang et al., 2012). If the investor makes no inference about the trustee's intention to reciprocate, then a prediction error occurs when the trustee does not reciprocate trust. This prediction error is reflected in increased striatal activity (Figure 3, yellow squares; Xiang et al., 2012). If the investor infers that he plays this game against a trustee who infers what he will offer, then the prediction error occurs when the investor submits his investment to the trustee; again, the striatum reflects this prediction error (Xiang et al., 2012). Thus, the computation of prediction errors during the trust game depends on depth-of-thought.
Oxytocin, a neuropeptide, also modifies how we update the trust model. Intranasal administration of this neuropeptide increases the rate of trust decisions compared to placebo, even after repeated violations of trust (Kosfeld et al., 2005). Correspondingly, people that received oxytocin showed a smaller negative prediction error signal in the striatum after repeated violations of trust (Baumgartner et al., 2008). Although the distribution of oxytocin receptors in the human brain is unknown, one possible locus where oxytocin modifies trust is in the striatum (see section "Involvement of the Striatum in Pair-Bond Formation and Maintenance" below).
Social life is also rife with opportunities to learn from others. Observational learning is another social cognitive process that can be modeled with reinforcement learning. Burke and colleagues hypothesized that observational learning is composed of two prediction errors: an action observation prediction error and an outcome observation prediction error (Burke et al., 2010). In their task, two individuals took turns learning which of two decks of cards provided a better outcome. To disentangle individual learning from imitation learning and observational learning, the individuals performed the task in three conditions: the other's actions and outcomes were private, only the other's outcome was visible, or both the partner's action and outcome were observable. Burke and colleagues found a correlate of the action observation prediction error in dorsolateral prefrontal cortex (DLPFC) and of the outcome observation prediction error in ventromedial prefrontal cortex (VMPFC) and ventral striatum (Figure 3, yellow squares). Specifically, VMPFC activity correlated positively, and ventral striatum activity negatively, with the outcome observation prediction error (Burke et al., 2010). Thus, they found neural correlates of observational learning in frontal cortex and ventral striatum.
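The two prediction errors proposed by Burke et al. (2010) can be illustrated with a simplified observer model. The sketch below is our own simplification under stated assumptions, not their fitted model: the observer keeps a value for each deck, updated from the partner's outcomes (outcome observation prediction error), and a predicted probability that the partner picks each deck, updated from the partner's choices (action observation prediction error). The deck names and learning rates are hypothetical.

```python
# Simplified sketch (assumption): an observer learning from a partner with two
# prediction errors. Deck labels and learning rates are hypothetical.


def observe_partner(values, choice_probs, partner_choice, partner_outcome,
                    alpha_outcome=0.3, alpha_action=0.2):
    """Update deck values and predicted partner choice probabilities after one observed trial.

    Assumes exactly two decks. Returns (new_values, new_choice_probs, outcome_pe, action_pe).
    """
    # Outcome observation prediction error: partner's outcome vs. observer's value of that deck.
    outcome_pe = partner_outcome - values[partner_choice]
    new_values = dict(values)
    new_values[partner_choice] += alpha_outcome * outcome_pe

    # Action observation prediction error: observed choice (1) vs. predicted probability of it.
    action_pe = 1.0 - choice_probs[partner_choice]
    new_probs = dict(choice_probs)
    new_probs[partner_choice] += alpha_action * action_pe
    # With two decks, the other deck's probability is the complement.
    other = next(deck for deck in new_probs if deck != partner_choice)
    new_probs[other] = 1.0 - new_probs[partner_choice]
    return new_values, new_probs, outcome_pe, action_pe


values = {"deck_a": 0.5, "deck_b": 0.5}
choice_probs = {"deck_a": 0.5, "deck_b": 0.5}
values, choice_probs, outcome_pe, action_pe = observe_partner(
    values, choice_probs, partner_choice="deck_a", partner_outcome=1.0)
print(values, choice_probs, outcome_pe, action_pe)
```

Keeping the two updates separate mirrors the dissociation described above: one error term concerns what the partner did, the other concerns what the partner obtained.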
In conclusion, the neuronal mechanism of learning to trust someone else, or of learning from someone else, is based on a reinforcement learning algorithm. This algorithm makes predictions about others' behavior, and prediction errors are used to update the model. The type of predictions depends on depth-of-thought, and prior information modifies the rate at which the model is updated. These learning signals are reflected in changes in BOLD activity in the striatum.

INEQUITY AND FAIRNESS CONSIDERATIONS
Inequity arises from an asymmetric distribution of resources between two or more conspecifics. Classic economics assumes that agents always intend to maximize their own benefit regardless of others' well-being (Von Neumann and Morgenstern, 1947). However, an unequal distribution of resources can have a negative impact on the utility and subjective value of an object (Loewenstein et al., 1989; Fehr and Schmidt, 1999). The disutility from an unequal outcome depends on who obtains more resources. When the agent receives more than the conspecific, we speak of advantageous inequity. Conversely, when the agent receives less than the conspecific, we speak of disadvantageous inequity.
Interestingly, humans sometimes choose to lower their own payoff so that inequity is smaller, a so-called pro-social behavior. For example, when people donate money to charity they diminish their own wealth so that others can be better off (Harbaugh et al., 2007). Disadvantageous inequity, having less than others, can also have a negative effect on behavior. For example, progressive taxation is designed to reduce income inequality by implementing higher taxes on higher earners (Wilkinson and Pickett, 2010). An influential hypothesis of how people react to inequity (Fehr and Schmidt, 1999) posits that unequal payoffs are aversive, and therefore agents try to minimize them. This theory has its roots in the idea that one can estimate social utility functions that specify the level of satisfaction as a function of the outcomes to self and other (Loewenstein et al., 1989). Other theories in which social utility functions help to explain human preferences that deviate from pure self-interest maximization include "Equity, Reciprocity, and Competition" by Bolton and Ockenfels (Bolton and Ockenfels, 2000) and "Fairness" by Rabin (Rabin, 1993).
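The two-player form of the Fehr and Schmidt (1999) utility function makes the distinction between disadvantageous and advantageous inequity explicit: U_i = x_i - alpha * max(x_j - x_i, 0) - beta * max(x_i - x_j, 0), where alpha weights the disutility of having less than the other player and beta the disutility of having more (the original model assumes beta <= alpha and 0 <= beta < 1). The sketch below implements this formula and, as an illustrative assumption, searches integer gifts for the utility-maximizing dictator; the parameter values and function names are hypothetical.

```python
# Sketch: two-player Fehr-Schmidt (1999) inequity-averse utility.
# alpha: aversion to disadvantageous inequity; beta: aversion to advantageous inequity.
# The parameter values used below are hypothetical.


def fehr_schmidt_utility(own, other, alpha, beta):
    """Utility of receiving `own` when the other player receives `other`."""
    disadvantageous = max(other - own, 0.0)  # the other player has more
    advantageous = max(own - other, 0.0)     # the self has more
    return own - alpha * disadvantageous - beta * advantageous


def best_dictator_gift(endowment, alpha, beta):
    """Integer gift (0..endowment) that maximizes the dictator's Fehr-Schmidt utility."""
    return max(range(endowment + 1),
               key=lambda gift: fehr_schmidt_utility(endowment - gift, gift, alpha, beta))


print(fehr_schmidt_utility(3, 7, alpha=1.0, beta=0.5))  # disadvantageous inequity
print(best_dictator_gift(10, alpha=1.0, beta=0.6))      # strong advantageous aversion
print(best_dictator_gift(10, alpha=1.0, beta=0.3))      # weak advantageous aversion
```

In this linear model the dictator gives half of the endowment when beta > 0.5 and nothing when beta < 0.5, so it captures why some dictators give while others keep everything, but on its own it does not produce intermediate gifts.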
One experimental task commonly used to measure advantageous inequity aversion is the dictator game (Forsythe et al., 1994). In this task the person playing as dictator receives an initial financial endowment and decides how much of it to give to a receiver. The neoclassical assumption of rational behavior predicts that dictators will give away nothing; however, dictators usually give away between 5 and 25% of their initial endowment (Forsythe et al., 1994). It is assumed that the proportion of money given to the receiver is a measure of the disutility for the dictator of having more than the other (Gibbons, 1992; Camerer et al., 2004). To measure disadvantageous inequity aversion, scientists use the ultimatum game (Güth et al., 1982). In this game the proposer receives an endowment and proposes a split to the responder, just as in the dictator game. The responder then either accepts the split or rejects it, in which case neither player receives any money. Neoclassical economic models predict that the responder will accept any split that gives him more than nothing. However, responders tend to accept only splits in which they obtain more than 30% of the initial endowment (Güth et al., 1982). The responder's minimum acceptable offer is the percentage of the initial endowment that he accepts 50% of the time (Camerer et al., 2004). This last parameter increases with the degree of disadvantageous inequity aversion.
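Under the Fehr and Schmidt (1999) model, the link between the responder's disadvantageous inequity aversion alpha and a minimum acceptable share can be derived directly. As a sketch, assume the endowment is normalized to 1, the responder is offered a share s < 1/2, and rejection leaves both players with 0 (utility 0). Accepting is then worthwhile only if its utility is non-negative:

```latex
U_{\text{accept}}(s) = s - \alpha\,[(1 - s) - s] = s - \alpha(1 - 2s) \ge 0
\quad\Longleftrightarrow\quad
s \ge s^{*} = \frac{\alpha}{1 + 2\alpha}
```

The threshold s* increases monotonically with alpha (e.g., alpha = 1 gives s* = 1/3, close to the 30% figure above) and approaches 1/2 as alpha grows large. This model-based threshold is a theoretical counterpart of, not the same quantity as, the empirically measured 50%-acceptance point.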
When subjects play the dictator game as dictators, the ventral striatum is active when they decide to donate money to a charity (Moll et al., 2006; Harbaugh et al., 2007) and when they enact a decision on how to distribute a good between two charitable possibilities (Hsu et al., 2008). The relative wealth of the donor and the receiver also matters to how the brain responds to these decisions. After one of two volunteers was made better-off than the other, the worse-off volunteers rated receiving money as much more appealing than their better-off counterparts did (Tricomi et al., 2010). Accordingly, in the worse-off volunteers the ventral striatum and VMPFC showed higher activity during transfers to self than to the other. Better-off volunteers rated transfers to the other as more appealing than transfers to themselves. Ventral striatum and VMPFC reflected this preference: both brain regions showed higher activity during transfers to the other than to self (Tricomi et al., 2010). In a related experiment, Fliessbach and colleagues paid pairs of volunteers in different ratios for correctly completing a simple task while they were in an MRI scanner (Fliessbach et al., 2007).
Ventral striatum activity was positively correlated with the relative payoff (the ratio of own to the other's payoff), regardless of the actual personal monetary payoff. Furthermore, striatal activity was lowest during own errors and highest during the partner's errors. Such social comparison effects have been confirmed: e.g., activity in ventral striatum is higher after winning a lottery in public vs. winning the same amount in private (Bault et al., 2011). The peak activations from the fMRI studies cited in this section are illustrated in Figure 3 with pink squares. Thus, these data suggest that the striatum reflects the difference between own and others' rewards.

AGENCY CODING IN STRIATAL NEURONS
Reciprocal social interactions provide the opportunity to increase fitness through repeated exchanges with a particular individual, although one of their by-products is reward inequality. For these interactions to be successful, several mental processes need to take place (Axelrod and Hamilton, 1981): both participants need to identify their partner, assign agency for the current outcome, decide how to act depending on the series of events and keep a tally of the recent exchanges. Without partner identification, reciprocity is virtually impossible (unless all interactions take place within a uniform population) (Dawkins, 2006). Without a memory trace of the outcomes of recent exchanges, participants might find themselves locked into a "one-way street" exchange. Agency assignment allows the individual to assign credit (or blame) for a shared outcome (Wolpert et al., 2003; Tomlin et al., 2006). With precise agency assignment stored in the memory of recent exchanges, individuals can avoid free riders (Dawkins, 2006). Therefore, agency assignment is a trait that might have been favored by evolution in social animals.
Another way to frame the problem of agency assignment is as the "social" extension of the credit-assignment problem (Figure 4A). Let us review what the credit-assignment problem is. For an action to be reinforced, it needs to be singled out from the various actions made between the operant and the reinforcer. The organism needs to assign credit to the operant, and not assign credit to (or subtract credit from) other, non-contingent actions (Sutton and Barto, 1998). In reinforcement learning, this is done with eligibility traces, decaying memories of past actions that determine how strongly the weight of each action is updated when reward arrives (Sutton and Barto, 1998). The agency credit assignment problem applies when more than one actor can generate a reward (Tomlin et al., 2006). Thus, paraphrasing Sutton and Barto (1998), the agency credit assignment problem can be stated as: how do you distribute credit for success among the many actors that may have been involved in producing it?
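The eligibility-trace idea extends naturally to several actors if each trace is indexed by (actor, action) instead of by action alone. The Python sketch below is a simplification of TD(λ)-style accumulating traces in which the raw reward stands in for the prediction error; the function names, decay and learning rate are hypothetical choices for illustration, not the model used in the reviewed experiments.

```python
from collections import defaultdict

def update_with_traces(events, reward, decay=0.5, alpha=0.1, weights=None):
    """Reward-weighted update with accumulating eligibility traces.

    events: (actor, action) pairs in temporal order; reward arrives after the last one.
    decay:  per-step trace decay (the gamma * lambda product in TD(lambda)).
    Simplification: the raw reward is used in place of a reward prediction error.
    """
    weights = defaultdict(float) if weights is None else weights
    traces = defaultdict(float)
    for actor, action in events:
        for key in traces:                  # older memories fade at every step...
            traces[key] *= decay
        traces[(actor, action)] += 1.0      # ...and the newest action is tagged
    for key, eligibility in traces.items():
        weights[key] += alpha * reward * eligibility
    return weights

def credit_by_actor(weights):
    """Sum the credit assigned to each actor across all of its actions."""
    totals = defaultdict(float)
    for (actor, _action), w in weights.items():
        totals[actor] += w
    return dict(totals)

# A hypothetical conspecific's turn that ends in own reward.
events = [("self", "hold_key"), ("conspecific", "release_key"), ("conspecific", "touch_target")]
weights = update_with_traces(events, reward=1.0)
print(credit_by_actor(weights))
```

In this toy sequence most of the credit for own reward lands on the conspecific (0.15 vs. 0.025 for self), because its actions are the most recent and therefore carry the strongest traces.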
The striatum is well-suited for integrating social action (an action made in a social context) and reward, given its anatomical connections and known role in action and reward coding. To investigate the interaction of social action and reward, we recorded the activity of striatal neurons while an animal performed a reward-giving task with a conspecific (Báez-Mendoza et al., 2013). The reward-giving task extends the paradigm described by Hollerman et al. (1998) to encompass several social dimensions. In the original paradigm, the activity of striatal neurons was tested for relationships to movement vs. no movement and reward vs. no reward. In our task, we tested whether striatal neuron activity was related to own vs. conspecific's movement and to own and/or conspecific's reward. During the experiment, two monkeys sat opposite each other across a table with a touchscreen. The animals took turns performing the following task: the actor held a resting key with its right arm; the computer presented two simultaneous cues, one for each animal, predicting reward (circle) or no reward (square) (Figure 4B); a blue go signal then appeared, and the actor moved its arm to touch it (Figure 4B). After a brief delay, the computer delivered reward to the actor and then to the conspecific. By varying reward presence and absence for both players, and which animal performed the task, we were able to probe the neuronal correlates of agency and reward coding. This simple design allowed us to study the neuronal mechanisms of a complex cognitive process.
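The task crosses three binary factors: who acts, whether the recorded animal is rewarded, and whether the conspecific is rewarded. Assuming the factors were fully crossed (the text states that reward presence and absence were varied for both players and for both actors), a short Python sketch enumerates the resulting trial types; the dictionary keys are our own labels.

```python
from itertools import product

# Three binary factors of the reward-giving task (labels are illustrative).
ACTORS = ("own", "conspecific")
OWN_REWARD = (True, False)
CONSPECIFIC_REWARD = (True, False)

trial_types = [
    {"actor": actor, "own_reward": r_own, "conspecific_reward": r_con}
    for actor, r_own, r_con in product(ACTORS, OWN_REWARD, CONSPECIFIC_REWARD)
]

for trial in trial_types:
    print(trial)
```

Laying the design out this way makes clear why it can separate agency from reward: every combination of own and conspecific reward occurs on both the recorded animal's turns and the conspecific's turns.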
Our first concern was whether the monkeys were sensitive to the social nature of the task. Analyses of reaction times and eye fixations suggested that the monkeys were sensitive to the rewards received by themselves and by their conspecific. Importantly, the animals were less likely to move whenever it was the conspecific's turn, suggesting that they understood the turn-taking structure of the task. This is particularly relevant for agency credit assignment because during "own turns" the animal should assign credit for own reward to itself, and during "conspecific's turns" to the conspecific.
Own reward modulated the activity of striatal neurons, as previously observed (Hikosaka et al., 1989b; Apicella et al., 1991a); however, few striatal neurons responded to the conspecific's reward. Interestingly, a sub-population of neurons differentiated between social actors, with some neurons firing more strongly during one actor's turns than during the other's. Given these types of neuronal modulation, we then examined the neurons' sensitivity to whose turn it was. A large number of own-reward-coding neurons reflected the social actor: some neurons responded to own reward only when the recorded animal acted (Figure 4C), whereas a different sub-population responded to own reward when the conspecific acted (Figure 4D). We tested a series of alternative hypotheses for these data, including eye position, response inhibition, temporal discounting and reward cost; none of them satisfactorily explained the data.
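One simple way to express "own reward coding that depends on the actor" is a 2×2 interaction contrast: the reward effect on own turns minus the reward effect on the conspecific's turns. The Python sketch below computes it from condition means; the firing rates are invented to resemble the patterns described for Figure 4C and 4D, and the contrast is an illustrative choice, not the statistical procedure of the original study (which would also require a significance test).

```python
from statistics import mean

def condition_means(rates):
    """rates: dict mapping (actor, own_reward) -> list of trial firing rates (spikes/s)."""
    return {cond: mean(trials) for cond, trials in rates.items()}

def actor_by_reward_interaction(rates):
    """Reward effect on own turns minus reward effect on the conspecific's turns.

    Positive: the own-reward response is larger when the recorded animal acts.
    Negative: the own-reward response is larger when the conspecific acts.
    """
    m = condition_means(rates)
    own_turn_effect = m[("own", True)] - m[("own", False)]
    conspecific_turn_effect = m[("conspecific", True)] - m[("conspecific", False)]
    return own_turn_effect - conspecific_turn_effect

# Hypothetical neuron resembling Figure 4C (not recorded data).
example_4c = {
    ("own", True): [14, 16, 15],
    ("own", False): [5, 6, 4],
    ("conspecific", True): [6, 5, 7],
    ("conspecific", False): [5, 6, 4],
}
# Hypothetical neuron resembling Figure 4D: responds to own reward on the conspecific's turns.
example_4d = {
    ("own", True): [5, 6, 4],
    ("own", False): [5, 6, 4],
    ("conspecific", True): [14, 16, 15],
    ("conspecific", False): [5, 6, 4],
}

print(actor_by_reward_interaction(example_4c), actor_by_reward_interaction(example_4d))
```

With these made-up numbers the contrast is 9 spikes/s for the 4C-like neuron and -10 spikes/s for the 4D-like neuron; the sign, not the magnitude, is what distinguishes the two populations.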
We also found a group of neurons that reflected whose trial it was. These neurons fired more strongly during own trials than during conspecific's trials, or vice versa (conspecific > own). Because they differentiated between actors, these neurons reflected social action. To test whether they truly reflected a "social" component of the task, we measured their activity while the animal performed the task either with the conspecific or with a nonsocial juice recipient (an empty bucket). If a neuron is modulated by the social component of the task, it should stop differentiating between actors during the "bucket test." This test for social-specific coding indicated that close to 50% of social-actor-coding neurons were indeed modulated by the social environment. This is, to our knowledge, the first direct test of a neuronal correlate of social behavior in single neurons.
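The logic of the bucket test can be made explicit. The Python sketch below assumes a two-sided permutation test on trial-wise firing rates and an alpha of 0.05 (both our assumptions; the statistics used in the original study are not reproduced here) and classifies a neuron as social-specific when it separates own from partner trials with the conspecific present but not with the bucket. Function names and all firing rates are hypothetical.

```python
import random
from statistics import mean

def permutation_p(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference between the means of a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def is_social_specific(own_social, other_social, own_bucket, other_bucket, alpha=0.05):
    """True if the neuron separates actors with the conspecific but not with the bucket."""
    with_partner = permutation_p(own_social, other_social) < alpha
    with_bucket = permutation_p(own_bucket, other_bucket) < alpha
    return with_partner and not with_bucket

# Hypothetical firing rates (spikes/s) on own vs. partner trials in each session type.
own_social = [20, 22, 21, 23, 19, 24, 20, 22]
other_social = [8, 10, 9, 11, 7, 12, 9, 10]
own_bucket = [14, 15, 13, 16, 14, 15, 13, 16]
other_bucket = [15, 14, 16, 13, 15, 14, 16, 13]

print(is_social_specific(own_social, other_social, own_bucket, other_bucket))
```

A neuron that kept separating the two kinds of trial in the bucket sessions would fail this criterion, which corresponds to the roughly half of social-actor-coding neurons that were not classified as social-specific.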
These experiments showed that there are multiple signals in the striatum relevant for social interactions. The data suggest an extension of the known role of the striatum in movement and reward processing into the social domain. Several questions arise from these findings.

FIGURE 4 | Agency credit assignment cartoon and striatal neurons coding social action and own reward. (A) Once the monkey receives a banana, it needs to know which action produced the reward in order to assign credit. The action can be its own (solid lines) or someone else's (dashed lines). Many actions take place before reward is delivered; a memory of each action, or eligibility trace (brown arrows), can help solve the agency credit assignment problem. (B) Task sequence for the actor: the shape of the conditioned cue predicted the absence or presence of reward for each animal. Appearance of a subsequent blue go signal was followed by key release, stimulus touch and reward for the actor, and later for the conspecific. After the inter-trial interval (ITI), the monkeys switched roles as actor and passive partner. (C) Single striatal neuron coding own action and own reward. Note the higher neuronal activity during own action and own reward compared to own reward absence and conspecific's actions. (D) Single striatal neuron coding social action and own reward. This neuron is active during the conspecific's actions that will result in own reward, a complement to the neuron shown in (C). Monkey picture by smerikal (Flickr), reproduced with permission.

How are these signals formed? One possible mechanism is as follows. Striatal neurons receive biological motion information either directly from area STP (Oram and Perrett, 1996) or indirectly via the parietal lobe (Cavada and Goldman-Rakic, 1991), while simultaneously receiving reward-related information from dopaminergic neurons and other reward-related areas (Haber and Knutson, 2010; see also Figure 1). Converging inputs and local interactions (Chuhma et al., 2011) are well-suited to combine information about others' actions and own reward. Future experiments should test how conjoint coding of agency and reward forms in the population of striatal neurons.
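The convergence idea can be illustrated with the simplest possible coincidence detector: a threshold unit that receives one input signalling the conspecific's action and another signalling own reward. This Python sketch is a conceptual toy under stated assumptions (the weights, threshold and input names are hypothetical), not a biophysical model of striatal neurons.

```python
def conjunction_unit(other_action: float, own_reward: float,
                     w_action: float = 0.6, w_reward: float = 0.6,
                     threshold: float = 1.0) -> bool:
    """Toy threshold unit driven by two converging inputs (values in 0..1).

    other_action: hypothetical biological-motion input signalling the conspecific's action
    own_reward:   hypothetical reward-related input (e.g., dopaminergic)
    With the default weights, neither input alone (0.6) reaches threshold,
    but both together (1.2) do, so the unit responds only to the conjunction.
    """
    return w_action * other_action + w_reward * own_reward >= threshold

for action in (0, 1):
    for reward in (0, 1):
        print(f"conspecific acts={action}, own reward={reward} -> fires={conjunction_unit(action, reward)}")
```

A unit of this kind would behave like the neuron in Figure 4D: silent for either signal alone and active only when the conspecific's action is paired with own reward.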
Another issue is how these signals are used. We hypothesize that this neuronal signal may help assign, and maintain, credit to a social agent when reward is received in a social context. Solving this problem is necessary for successful interactions. The striatum might thus provide a signal for distributing credit for reward among the many actors that may have been involved in producing it. One key experiment would test the individual specificity of this signal: is the signal specific to one individual, or does it only discriminate between own actions and "other's" actions? An individual-specific signal would help discriminate good partners from poor ones.
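Why individual specificity matters can be shown with a small Python sketch. Given a hypothetical log of partners' actions and whether own reward followed, a binary own/other code pools all partners into one category and cannot rank them, whereas an individual-specific code can. Partner names, the log and the helper `reward_rate` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical log: (partner who acted, whether own reward followed).
HISTORY = [
    ("partner_A", True), ("partner_A", True), ("partner_A", False),
    ("partner_B", False), ("partner_B", False), ("partner_B", True),
]

def reward_rate(history, key):
    """Fraction of actions followed by own reward, grouped by key(partner)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [rewarded, total]
    for partner, rewarded in history:
        group = key(partner)
        counts[group][0] += int(rewarded)
        counts[group][1] += 1
    return {group: r / n for group, (r, n) in counts.items()}

# Binary code: every partner collapses into a single "other" category.
print(reward_rate(HISTORY, key=lambda partner: "other"))
# Individual-specific code: each partner keeps its own credit.
print(reward_rate(HISTORY, key=lambda partner: partner))
```

The binary code reports a single rate of 0.5, while the individual-specific code separates a more rewarding partner (2/3) from a less rewarding one (1/3), the distinction needed to choose better partners.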

SOCIAL CONTACT AND STRIATAL FUNCTION
The striatum is involved in other social behaviors besides social action, social reward and reward inequity. Social isolation and social defeat compromise the normal function of the striatum, highlighting the interplay between normal social contact and striatal function. Social isolation has long-lasting effects on behavior, neuronal anatomy and neurochemistry. For example, in macaques, social deprivation during the first year of life is associated with abnormal social behaviors including fearfulness, withdrawal, lack of play, apathy, indifference to external stimuli, deficiencies in communication, and aggression (Martin et al., 1991). Macaques reared in social deprivation show decreased numbers of caudate/putamen neurons immunoreactive for substance P, tyrosine hydroxylase (TH), leucine-enkephalin and calbindin; in contrast, the number of somatostatin interneurons does not differ from that of normally reared conspecifics. TH staining was reduced in the SNc, but neuron numbers were stable. Other subcortical regions, including the NAcc, amygdala and BNST, were unaffected (Martin et al., 1991). Further characterization of the behavioral, anatomical and neurochemical effects of social isolation has been carried out in rodents.
Social isolation has consistent behavioral effects in rodents. These include hyper-reactivity to novel environments, a reduction in pre-pulse inhibition of the acoustic startle, and an increase in aggressive behavior (reviewed by Fone and Porkess, 2008). Neuroanatomical studies of isolates' brains also describe changes in cortical and subcortical neuronal circuits. For example, after social isolation, rats showed decreased dendritic spine density in prefrontal cortex and hippocampus compared to socially-housed littermates (Silva-Gomez et al., 2003). Several reports describe differences in neurotransmitter systems (for a systematic review, see Fone and Porkess, 2008). Of particular relevance to this review, the dopaminergic system of socially isolated rats differs from that of socially-housed animals.
Although socially isolated rats show normal basal levels of extracellular dopamine (DA) in the ventral striatum (Wilkinson et al., 1994; Hall et al., 1999; Howes et al., 2000), systemic administration of d-amphetamine produces a significantly larger increase in DA release than in socially reared rats (Wilkinson et al., 1994; Hall et al., 1999). Furthermore, isolation-reared rats show increased DA turnover and greater d-amphetamine-induced hyper-locomotion (Hall et al., 1998). Injections of cocaine increase DA efflux in the ventral striatum, an effect potentiated by isolation rearing (Howes et al., 2000). Intriguingly, compared to socially-housed rats, isolates acquire operant responding for low doses of cocaine faster, but acquire it more slowly for higher doses (Howes et al., 2000). Deficits in pre-pulse inhibition of the acoustic startle in socially isolated rats are reversed by administration of the D2 receptor antagonist raclopride (Geyer et al., 1993). DA depletion in the ventral striatum by 6-hydroxydopamine also facilitates pre-pulse inhibition in socially isolated rats (Powell et al., 2003). These results suggest that basal mesolimbic DA is unaffected by social isolation; rather, the ventral striatum is "hypersensitive" to events that naturally trigger DA release.
One candidate mechanism for the hypersensitive ventral striatum of socially isolated rats is a difference in receptor levels. However, the evidence is mixed: some groups report no changes in D1 or D2 receptor density or affinity in the striatum (Bardo and Hammer, 1991; Del Arco et al., 2004), whereas others report an increase in D2 binding (Djouma et al., 2006). In monkeys, changes in housing conditions do modify the levels of D2 receptors in the striatum (Morgan et al., 2002). Specifically, after monkeys were socially housed, dominant monkeys had higher levels of striatal D2 receptors than when they had been housed individually, and higher levels than subordinates. Interestingly, subordinates self-administered more intravenous cocaine and worked harder for it than dominant monkeys (Morgan et al., 2002). This finding is further supported by a negative correlation between baseline D2 receptor levels and the rate of cocaine self-administration, and by a decrease in D2 receptor levels with chronic cocaine use (Nader et al., 2006). Thus, these results suggest that D2 receptor density can be modified by changes in the social environment.
Changes in social hierarchy produce winners and losers: individuals defeated by their conspecifics lose rank. After losing one or more encounters with a conspecific, mesostriatal transmission is modified in the defeated individual. Tidey and Miczek (1996) reported that rats that had been defeated by a conspecific showed higher concentrations of extracellular DA in the ventral striatum and prefrontal cortex, relative to baseline, during a social encounter with a dominant rat. If rats remained isolated after being defeated, the number of striatal dopamine transporter (DAT) binding sites was reduced, whereas DAT was unchanged in animals that returned to their familiar group (Isovich et al., 2001). A potential role of DAT in regulating social behavior is suggested by the finding that DAT knockout mice exhibit increased reactivity and aggression following mild social contact (Rodriguiz et al., 2004). Mice that experienced chronic social defeat avoid contact with conspecifics and show increased levels of brain-derived neurotrophic factor (BDNF) in the NAcc up to 4 weeks after the last defeat (Berton et al., 2006). BDNF potentiates DA release in the NAcc by acting at pre- and post-synaptic sites (Russo and Nestler, 2013). The major source of BDNF in the NAcc is the dopaminergic neurons of the VTA. Deleting BDNF from these cells in chronically defeated mice increases social contact, suggesting that BDNF plays a key role in maintaining the social defeat phenotype (Berton et al., 2006). These selected studies highlight that mesolimbic dopaminergic transmission is modified following acute or chronic social defeat.
In conclusion, social isolation has behavioral, anatomical and neurochemical consequences. In socially deprived macaques, the number of striatal neurons expressing several neurochemical markers is markedly reduced, whereas somatostatin interneurons are spared; in isolated rats, basal levels of extracellular DA remain unchanged. There is no consensus on whether DA receptor levels in the striatum change, but other signaling systems (BDNF) and molecular mechanisms (changes in DAT) are involved. This snapshot of studies on the relationship between social housing conditions, behavior and basal ganglia function suggests that this relationship is not simple. Nonetheless, it can be concluded that social isolation and social defeat result in changes in mesolimbic neurotransmission.

INVOLVEMENT OF THE STRIATUM IN PAIR-BOND FORMATION AND MAINTENANCE
Sex is a primary reward, and it is the basis of pair-bond formation in voles. The striatum is part of the neuronal circuitry underlying a remarkable form of pair bonding in which both partners remain monogamous. It is important to note that the role of the striatum extends beyond movement and reward. Studies on vole pair formation provide an interesting example of the interaction between social behavior and striatal function.
Two closely related species of the same genus differ in their mating system: one is monogamous and the other promiscuous. Prairie voles (Microtus ochrogaster) form life-long bonds with their first mate, remain monogamous and live in burrows with extended families; meadow voles (Microtus pennsylvanicus), in contrast, are promiscuous and often live in solitary burrows (Insel, 2010). This natural dissociation in pair formation provides an opportunity to tap into the neurobiology of social behavior.
The interplay of oxytocin, arginine-vasopressin and DA plays a pivotal role in pair formation in voles. Administration of haloperidol (a nonselective DA inverse agonist) into the NAcc of male prairie voles prevents partner preference, whilst stimulating D2-like receptors in the caudate-putamen induces partner preference in the absence of mating (Aragona et al., 2003). In contrast, activation of DA D1-like receptors prevents pair-bond formation. The mechanism is similar in females: D2-like receptor stimulation induces partner preference, whereas a D1-like agonist has no effect (Wang et al., 1999). Transfer of the vasopressin V1a receptor gene into the ventral pallidum of polygamous meadow voles is sufficient to induce pair-bond-like behavior after mating (Lim et al., 2004b). Similarly, overexpression of the oxytocin receptor in the NAcc facilitates partner preference in female prairie voles but has no effect on parental care, nor any effect in female meadow voles (Ross et al., 2009). Compared with meadow voles, prairie voles have a high density of oxytocin receptors in the NAcc and of vasopressin V1a receptors in the ventral pallidum (Insel and Shapiro, 1992; Hammock and Young, 2006). Notably, oxytocin receptors bind oxytocin and, with lower affinity, vasopressin (Gimpl and Fahrenholz, 2001). By contrast, there are no differences in the distribution of D1-like and D2-like receptors in the striatum between the two species (Lim et al., 2004a). Thus, these results suggest that the differential distribution of oxytocin and vasopressin receptors underlies the species difference in pair-bond formation. In conclusion, pair-bond formation is modulated by the interaction of oxytocin, vasopressin and DA in NAcc neurons, as well as by the distribution of oxytocin and vasopressin V1a receptors.
The role of oxytocin and vasopressin in social recognition is further supported by the absence of habituation to conspecifics in oxytocin knockout and V1a-R knockout mice (Ferguson et al., 2000; Bielsky et al., 2004). Oxytocin knockout mice "recover" social habituation after infusion of oxytocin agonists into the medial amygdala (Ferguson et al., 2001). Similarly, local infusion of V1a-R antagonists into the lateral septum of rats inhibits habituation to conspecifics (Everts and Koolhaas, 1999). Thus, both oxytocin and vasopressin regulate social recognition.

The endogenous opioid system is another neuronal mechanism that may play a role in pair-bond formation. Mu-opioid receptor (MOR) activation modulates partner preference in female prairie voles (Burkett et al., 2011). Because MOR density differs across striatal subregions, this effect is probably mediated by specific subregions (Resendez et al., 2013). MORs within the dorsal striatum appear to mediate partner preference formation through effects on mating behavior, whereas receptors in the NAcc appear to mediate pair-bond formation through the positive hedonics associated with mating (Resendez et al., 2013). Interestingly, monogamous voles show higher MOR density in the forebrain, including the caudate-putamen and NAcc, than closely related polygamous voles (Inoue et al., 2013; but see Insel and Shapiro, 1992). Thus, interspecies differences in opioid receptor density, together with the pharmacological effects described above, suggest a role of opioids in social attachment.
A relevant question is how and where these neurotransmitter systems interact. In the rat NAcc core, neurons expressing D1-like receptors co-express prodynorphin, whereas D2-like-expressing cells co-express proenkephalin (Curran and Watson, 1995). An electron microscopy study indicates that about half of the neurons in the rat dorsolateral striatum co-express D2 receptors and MORs (Ambrose et al., 2004). These anatomical studies show that dopamine and opioid signals can converge on single striatal cells, and they support the possibility that oxytocin, vasopressin and D2-like receptors are likewise present in single striatal cells; their interactions remain to be investigated.
Little is known about pair-bond formation in primates. However, marmosets, a monogamous New World monkey, show oxytocin receptor labeling in the NAcc, among other subcortical structures (Schorscher-Petcu et al., 2009), whereas rhesus macaques, a polygamous Old World monkey, show labeling for this receptor only in the hypothalamus and the nucleus basalis of Meynert (Freeman et al., 2012). Titi monkeys, another monogamous species, exhibit small but significant changes in glucose uptake in the NAcc and ventral pallidum 48 h after mating (Bales et al., 2007).
Whereas much has been learned about pair-bond formation, the neuronal mechanisms of pair-bond maintenance are only starting to be investigated. For example, monogamous male voles show a significant increase in D1-like receptors in the NAcc after pair-bond formation, and D1-like receptor antagonists diminish aggressive behavior toward female strangers, a behavioral marker of pair-bond formation. This points to what is probably the most exciting open question in the field: what are the neuronal mechanisms of pair-bond maintenance?
The striatum might also play a role in mothers' recognition of their offspring. The pregnancy hormones progesterone and oestrogen prime the brain for the synthesis of oxytocin and its receptor (Keverne and Curley, 2004). Olfaction is the primary sense for maternal offspring recognition in mammals. Oxytocin receptor expression increases in central olfactory projections and in the NAcc during pregnancy (Keverne and Curley, 2004).
Overall, these studies suggest a mechanism for pair-bond formation in voles. The hypothetical mechanism centers on the striatum's capacity to facilitate the association between olfactory social cues and reward. A potential mate's pheromones reach the vomeronasal organ (VNO), which in turn transmits information about the individual to the extended amygdala; the central amygdala then relays this information to the striatum. VNO lesions in female voles disrupt pair formation (Curtis et al., 2001), a finding that supports this hypothetical mechanism. However, other brain areas may also play a role in pair-bond formation. For example, there are marked differences in the distribution of dopamine, oxytocin and vasopressin receptors in the medial prefrontal cortex of monogamous and promiscuous voles (Smeltzer et al., 2006). As noted by Wang and Young (Lim et al., 2004b; Young and Wang, 2004), the cellular mechanism might be the co-activation of D2-expressing accumbal neurons by vasopressin and/or oxytocin. In this scheme, oxytocin is released by the hypothalamus, odor information is transmitted from the central amygdala, and DA is released by dopaminergic neurons in the VTA. Striatal neurons are well-suited for detecting the conjunction of sensorimotor information and reward. In pair-bond formation, the role of the striatum, particularly the NAcc, would thus be to facilitate the association of social cues with reward and thereby promote reproductive success.
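The co-activation idea can be written as a generic three-factor learning rule, in which a synaptic weight changes only when presynaptic input, postsynaptic activity and a neuromodulatory signal coincide. The Python sketch below is a textbook-style illustration, not a model proposed in the cited studies; the function name, learning rate, weight ceiling, and the mapping of `pre` to partner odor and `dopamine` to mating-related DA release are all assumptions. A further gating factor for oxytocin or vasopressin could be multiplied in the same way; it is omitted here for brevity.

```python
def three_factor_update(w, pre, post, dopamine, lr=0.1, w_max=1.0):
    """One step of a neuromodulated Hebbian (three-factor) learning rule.

    pre:      activity of the input carrying the social cue (e.g., partner odor), 0..1
    post:     activity of the striatal neuron, 0..1
    dopamine: reward-related signal gating plasticity, 0..1
    The weight grows only when all three factors are present; it is capped at w_max.
    """
    return min(w_max, w + lr * pre * post * dopamine)

# Repeated pairings of partner odor with a DA signal (e.g., during mating).
w = 0.0
for _ in range(5):
    w = three_factor_update(w, pre=1.0, post=1.0, dopamine=1.0)

# The same odor exposure without the DA signal.
w_no_da = 0.0
for _ in range(5):
    w_no_da = three_factor_update(w_no_da, pre=1.0, post=1.0, dopamine=0.0)

print(round(w, 2), w_no_da)
```

In this toy setting the odor-to-striatum weight grows to 0.5 only when odor exposure coincides with DA, and stays at zero otherwise, which mirrors the proposal that the striatum links a specific partner's cues to reward.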

CONCLUSIONS
Based on the studies reviewed here, we conclude that the striatum plays a role in computations that take place during social behavior. These computations revolve around social actions and social rewards. fMRI and neurophysiology studies show that neural activity in the striatum is modulated by social rewards and by learning in a social context (Figure 3). By learning in a social context, we mean learning about others' preferences, about a new mate, or about others' actions that lead to own reward, as well as updating predictions about others' preferences. We have shown that neuronal activity in the striatum is also modulated by social actions and, critically, by the conjunction of social action and own reward (Figure 4). The computations performed by the striatum are critical for successful social interactions. A breakdown in social interactions leads to compromised striatal function, which highlights the interplay between this neuronal circuit and social behavior.

Frontiers in Neuroscience | Decision Neuroscience December 2013 | Volume 7 | Article 233 | 10
Overall, these observations suggest that the striatum does not have a particular "social" specialization; rather, its neurons are capable of flexibly incorporating social information into their computations. It is therefore reasonable to describe the striatum as containing a general-purpose neuronal mechanism that associates actions or events with reward. Importantly, this mechanism can also associate (or reflect) others' actions with the rewards they lead to. Rewards are coded in the activity of striatal neurons, and because social rewards are a sub-class of rewards, they are processed in the striatum. A functional subdivision of the striatum based on different types of social behavior, however, must await further experimentation.
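A minimal illustration of such a general-purpose mechanism is a single error-driven association rule applied identically to a nonsocial event and to another individual's action. The Python sketch below uses a single-cue simplification of the Rescorla-Wagner rule (each trial presents one cue, so no summed prediction is needed); the cue names, learning rate and trial sequence are hypothetical, and the rule is chosen for illustration rather than fitted in any of the reviewed studies.

```python
def rescorla_wagner(trials, alpha=0.3, initial=0.0):
    """Single-cue Rescorla-Wagner (delta-rule) learning of cue-reward associations.

    trials: sequence of (cue, reward) pairs, reward in [0, 1].
    Each cue's value V moves toward the observed reward by alpha * (reward - V).
    """
    values = {}
    for cue, reward in trials:
        v = values.get(cue, initial)
        values[cue] = v + alpha * (reward - v)
    return values

# Hypothetical cues: one nonsocial event and one social action, learned by the same rule.
TRIALS = [("light_on", 1.0), ("conspecific_touch", 1.0)] * 10 + [("conspecific_touch", 0.0)] * 3
values = rescorla_wagner(TRIALS)
print(values)
```

Both cues acquire the same value after ten rewarded pairings (1 - 0.7^10, about 0.97), and the social cue then extinguishes when the conspecific's action stops predicting reward; nothing in the rule distinguishes social from nonsocial inputs.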
In conclusion, the striatum plays a role in the computations underlying social behavior.