Does Feedback-Related Brain Response during Reinforcement Learning Predict Socio-motivational (In-)dependence in Adolescence?

This multi-methodological study applied functional magnetic resonance imaging to investigate neural activation in a group of adolescent students (N = 88) during a probabilistic reinforcement learning task. We related patterns of emerging brain activity and individual learning rates to socio-motivational (in-)dependence manifested in four different motivation types (MTs): (1) peer-dependent MT, (2) teacher-dependent MT, (3) peer-and-teacher-dependent MT, (4) peer-and-teacher-independent MT. A multinomial regression analysis revealed that the individual learning rate predicts students’ membership to the independent MT, or the peer-and-teacher-dependent MT. Additionally, the striatum, a brain region associated with behavioral adaptation and flexibility, showed increased learning-related activation in students with motivational independence. Moreover, the prefrontal cortex, which is involved in behavioral control, was more active in students of the peer-and-teacher-dependent MT. Overall, this study offers new insights into the interplay of motivation and learning with (1) a focus on inter-individual differences in the role of peers and teachers as source of students’ individual motivation and (2) its potential neurobiological basis.


Inter-individual Differences in Students' Scholastic Motivation
Studies in the field of educational psychology focus on the social school environment, which is mainly determined through relations with peers and teachers providing essential motivation (Harter, 1996). Indeed, several studies have shown that peers and teachers can play an important role for students' scholastic motivation (Wentzel, 2009a,b, 2010), both individually and through a whole classroom approach (Pianta et al., 2003). This is particularly interesting during adolescence, when most students' scholastic motivation declines (Harter, 1996; Zusho and Pintrich, 2001; Watt, 2004) due to changes in their environment or within themselves (Eccles et al., 1998; Wigfield and Eccles, 2001). Nevertheless, Wigfield and Eccles (2001) contend that some students do not necessarily reduce motivation (Deci and Ryan, 2002), suggesting that there are inter-individual differences in students' motivation patterns based on environmental as well as developmental aspects. Previous person-oriented research (Raufelder et al., 2013c) investigated inter-individual differences in adolescents' perception of peers and teachers as environmental sources of motivation, and proposed four distinct motivation types (MTs), which we discuss in more detail below.

Four Different Motivation Types (MTs) and the Concept of Socio-motivational (In-)dependence
Raufelder et al. (2013b) differentiated four different MTs based on the importance of peers and teachers as sources of scholastic motivation. Applying a latent class analysis (LCA) in 1088 adolescent students from Germany, they identified (1) teacher-dependent MT, (2) peer-dependent MT, (3) peer-and-teacher-dependent MT and (4) peer-and-teacher-independent MT (Raufelder et al., 2013b). Students of the teacher-dependent MT (1) receive most of their scholastic motivation from teachers. Further qualitative interviews revealed that these students are also affected by teachers' own motivation as well as the support and feedback they perceive from the teacher. Likewise, students of the peer-dependent MT (2) are mostly driven by their peers, whereas students of the peer-and-teacher-dependent MT (3) perceive both peers and teachers as sources of motivation. Finally, the motivation of students in the peer-and-teacher-independent MT category (4) remains largely unaffected by the motivation, learning behavior, or perceived support of their peers and teachers. The four MTs have been validated by another LCA in a sample of adolescent students in Montréal, Canada. Based on these findings, Raufelder (2014) formulated the concept of socio-motivational (in-)dependence: individuals whose motivation is affected by others' motivation, learning behavior or perceived support are considered to be socio-motivationally dependent. In the school context, motivation can be affected by peers' motivation, learning behavior or social support and/or by teachers' perceived support and motivation (see Wentzel, 2009a,b; Raufelder et al., 2015). In turn, when motivation remains largely unaffected by others, socio-motivational independence is assumed.
A subsequent longitudinal latent transition analysis (LTA) confirmed the four types of socio-motivational (in-)dependence by investigating intra-individual changes from early to middle adolescence (Jagenow et al., 2015). While slight turnovers were observed between the three types of socio-motivational dependence from early to middle adolescence, the socio-motivationally independent category showed the highest probability (0.68) of remaining stable.

Reinforcement Learning, Motivation, and Brain Activity
Since the beginning of the 20th century, researchers have studied how motivation affects learning and vice versa (Mehring and Colson, 1990), as both concepts are reportedly related (Rao, 2003; Crandell and Robinson, 2007). As most students' scholastic motivation declines with the onset of adolescence (Harter, 1996; Zusho and Pintrich, 2001; Watt, 2004), most students' learning performance and academic achievement also tend to decrease (Alspaugh, 1998; Barber and Olsen, 2004; Vedder-Weiss and Fortus, 2012). One prominent theory on learning is reinforcement theory, which is based on the early work of Thorndike (1929) and Skinner (1938). Their theory of operant conditioning posits that individuals learn according to the outcomes of their actions. Specifically, if an outcome, for example good learning performance at school, is reinforced through reward (e.g., praise and feedback from peers and/or teachers), then the corresponding behavior (e.g., good learning performance) is likely to be repeated and vice versa. This means that students' motivation to learn is directly increased through reward. In turn, punishment can work to suppress certain kinds of undesirable behavior (see Ittel et al., 2013). Both the concept of socio-motivational (in-)dependence and reinforcement theory focus on the interactions between an individual and his or her environment and constitute a promising approach to examine individual differences in motivation and learning patterns. Moreover, linking these theories to neuroscience offers the possibility of elucidating individual characteristics beyond mere behavioral aspects. Since learning from reward and punishment also forms the foundation of social learning, investigating the neurobiological correlates of these learning mechanisms may help to explain why some individuals depend on the reinforcing aspects of scholastic motivators (peers and teachers), while others do not.
Moreover, in neuroscientific experimental settings, the detailed learning experience (i.e., trial-by-trial learning) within the brain can be investigated directly. Here, the combination of functional magnetic resonance imaging (fMRI) and reinforcement learning paradigms such as reversal learning tasks (Cools et al., 2002) constitutes a well-established research method to investigate simple operant learning and allows researchers to consider potential inter-individual differences on a neural level during a basic learning task. Reversal learning assesses an individual's ability to develop advantageous learning behavior by using feedback on performance. Standard 2-choice reversal learning tasks present participants with two potential responses with different reinforcement contingencies. Through multiple trials, participants learn to choose the stimulus associated with a higher reward by referring to their performance feedback. Next, the reinforcement contingency is altered without warning participants, who are thus surprised to discover that their previously reinforced response does not yield a reward anymore, cueing them to switch to the alternative response. As soon as one response is no longer being rewarded, the alternative response always becomes the better choice (see D'Cruz et al., 2011). Using such tasks, brain regions that are crucially involved in signaling learning parameters such as the prediction error (PE) have been identified (O'Doherty et al., 2003; Schlagenhauf et al., 2013). The PE is defined as the difference between an expected outcome and the actual outcome (Sutton and Barto, 1998). These PEs have been found to be encoded in the subcortical striatum (O'Doherty et al., 2003, 2004; D'Ardenne et al., 2008), as well as in prefrontal and parietal areas (O'Doherty et al., 2003; Tobler et al., 2006).
The striatum, which comprises the putamen and the caudate nucleus, is known to be involved in basic forms of feedback processing and flexible learning processes (Beeler et al., 2014). The prefrontal cortex (PFC), on the other hand, is involved in more controlled social feedback learning processes (see Guyer et al., 2012) and higher order control functions (Wood and Grafman, 2003; van Schouwenburg et al., 2010; Wolfensteller and Ruge, 2012).
Students' motivation to learn is influenced by basic processes such as operant learning. Investigating these processes in experimental neuroscientific studies may help to explain why some individuals depend on reinforcing aspects of scholastic motivators and others do not. Understanding these neural processes may constitute the basis for the understanding of aberrant behavior in the school context (e.g., conduct disorders).

Research Aims and Hypotheses
Here, we used computational modeling to estimate PE values based on the behavioral data of each individual. This approach also provides meaningful parameters that quantify different aspects of learning behavior, such as the individually estimated learning rate. The learning rate describes how strongly single feedback events influence future choice behavior, i.e., it shows whether an individual adjusts his or her behavior quickly according to the received feedback or whether his or her expectations are more stable and only influenced through feedback over a longer period of time. Since individuals with socio-motivational dependence adapt their motivation and learning to peers and/or teachers, an association of individual learning rate values with the probability of being socio-motivationally dependent rather than socio-motivationally independent seems plausible.
Furthermore, we hypothesize that PE related activation measured using fMRI during the reinforcement learning task is associated with the individual socio-motivational type. However, since this is the first study to link students' MT, learning rate and brain activation during a reinforcement learning task, our approach was exploratory.
The main goal of the present study was to explore the interplay of reinforcement learning and socio-motivational (in-)dependence in adolescent students by: (1) investigating whether students' individual learning rates in a reversal learning task can predict their respective MT, and (2) investigating whether PE-related activation in the PFC and the striatum (brain regions that are known to be crucially involved in reinforcement learning) predicts the probability of belonging to specific MTs.

Participants and Procedure
A subsample of 88 mentally and physically healthy (as confirmed by a semi-structured interview assessing psychiatric health care utilization and psychiatric family history) adolescents (M age = 15.03; SD = 0.51; 44 girls) from 9th grade in secondary schools in the German federal state of Brandenburg was selected from a larger sample of a former quantitative study (N = 1088; M age = 13.7 years; SD = 0.53) to participate in an fMRI study on reinforcement learning by using a reversal-learning task. The 88 participants were chosen according to their high probability (>0.85) of being either (1) a peer-dependent MT (n = 20; girls = 12), (2) a teacher-dependent MT (n = 17; girls = 10), (3) a peer-and-teacher-dependent MT (n = 24; girls = 11), or (4) a peer-and-teacher-independent MT (n = 24; girls = 12). This probability was based on ratings on the scales "Peers as positive motivators" (PPMs) and "Teacher as positive motivators" (TPMs), which formed the empirical basis of our preliminary LCA (see Raufelder et al., 2013b). In other words, the sample was highly representative of each MT.
The fMRI sessions were held between June and December 2012. Prior to their fMRI session each student within the current study's subsample answered questions (among others) about their perception of PPMs and TPMs for a second time, which formed the empirical basis of the typology in the present sample. Since we were particularly interested in students' own views and perceptions of their socio-motivational relationships with teachers and peers, the questionnaires used for this study were based on self-reports. Three of the 88 participants needed to be excluded: one due to missing data, one due to excessive head movements (more than 3 mm translation or 3° rotation) and one due to neurological abnormalities. The remaining 85 participants were included in the following analysis.
Prior to the fMRI study, participants were thoroughly screened for MRI exclusion criteria (e.g., non-removable ferromagnetic material). Both the participant and one parent or custodian provided their informed, written consent. Participants were free of drug use as well as any medication potentially affecting brain responses. According to the Edinburgh Handedness Inventory (Oldfield, 1971) 81 participants were right handed and four were left-handed. The study was performed in accordance with the latest version of the Declaration of Helsinki, and approved by the ethics committee of the German Psychological Society.

Teachers as Positive Motivators
This subscale was taken from the Relationship and Motivation (REMO) scale (Raufelder et al., 2013a). TPM consists of six items that showed a reliability of α = 0.81 in the current sample. Students were asked to answer statements such as "I will make more effort in a subject when I think that the teacher believes in me" or "When a teacher helps me, I try to do well in the subject" on a 4-point Likert scale from 1 (strongly disagree) to 4 (strongly agree).

Peers as Positive Motivators
This subscale is also part of the REMO Scale (Raufelder et al., 2013a) and consists of nine items (e.g., "When my friends learn, I am also motivated to learn" or "My friends and I motivate each other to make an effort at school"). Responses were collected using a four-point Likert scale from 1 (strongly disagree) to 4 (strongly agree) (α = 0.86).

Probabilistic Reversal Learning Task
During the fMRI session, participants performed 300 trials of a probabilistic reversal learning task. In this reinforcement learning paradigm, participants can win money by choosing one of two different stimuli, one of which has a higher probability of yielding a reward. These probabilities reverse during the experiment, so participants have to keep learning and adjusting their choice behavior over the course of the experiment. One of the stimuli was associated with an 80% probability of a monetary reward and a 20% probability of a loss. Inverse probabilities were assigned to the other stimulus. Each trial consisted of three phases: presentation of the two stimuli (maximum 1.5 s), selection of one stimulus by the participant (1.5 s − RT), and feedback (1 s, see Figure 1). During the jittered, exponentially distributed inter-trial interval (1-6.5 s) a fixation cross appeared. Then, two symbols were randomly assigned to the left and right-hand side of the screen. Participants had to select one of them by pressing a button within the presentation time window; otherwise the message "Too slow!" (in German "Zu langsam!") appeared and the experiment proceeded with the next trial. After the button press, a blue frame highlighted the selected target and the feedback (positive or negative) appeared: (1) positive feedback was displayed with an image of a 10 Euro-cent coin accompanied by the message "Win! +10Cent" (in German "Gewonnen! +10Cent"), (2) negative feedback was displayed with a crossed-out 10 Euro-cent image accompanied by the text "Loss! -10Cent" (in German "Verloren! -10Cent"). After achieving the criterion of five correct answers (i.e., choosing the stimulus with the currently higher chance to win) within the last six trials (sliding window), the chance of a reversal of the probability distribution became 20% for the following trial.
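The trial and reversal logic described above can be illustrated with a minimal simulation sketch. It is not the authors' task code: the function names, the `choose` callback, the treatment of the criterion as "at least five of the last six", and the reset of the sliding window after a reversal are assumptions made for illustration, and missed ("Too slow!") trials are ignored.

```python
import random


def simulate_reversal_task(choose, n_trials=300, p_good=0.8, p_reversal=0.2, seed=0):
    """Simulate a two-choice probabilistic reversal task.

    `choose(history)` is a hypothetical callback returning 0 or 1.
    Returns the list of (choice, outcome) pairs and the number of reversals.
    """
    rng = random.Random(seed)
    good = rng.randrange(2)   # index of the currently better stimulus
    recent_correct = []       # sliding window over the last six trials
    history = []              # outcome is +1 (win) or -1 (loss)
    n_reversals = 0
    for _ in range(n_trials):
        choice = choose(history)
        p_win = p_good if choice == good else 1.0 - p_good
        outcome = 1 if rng.random() < p_win else -1
        history.append((choice, outcome))
        recent_correct = (recent_correct + [choice == good])[-6:]
        # Assumed reading of the criterion: at least five correct in the last six.
        if len(recent_correct) == 6 and sum(recent_correct) >= 5:
            if rng.random() < p_reversal:
                good = 1 - good
                recent_correct = []   # assumption: window restarts after a reversal
                n_reversals += 1
    return history, n_reversals


def win_stay_lose_shift(history):
    """Toy agent: repeat the last choice after a win, switch after a loss."""
    if not history:
        return 0
    last_choice, last_outcome = history[-1]
    return last_choice if last_outcome == 1 else 1 - last_choice


history, n_reversals = simulate_reversal_task(win_stay_lose_shift)
```

The win-stay/lose-shift agent is only a placeholder; any function mapping the trial history to a choice can be passed in.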
Before performing the task in the scanner, students were introduced to the task with the help of a PowerPoint presentation and given a short training session without reversals, in order to familiarize them with the probabilistic nature of the experiment. They were informed about the possibility of changes in reward contingencies during the main experiment and that they would receive the money they won during the task. Participants were instructed to maximize their reward. After performing the task with reversals in the fMRI scanner, participants received their total monetary gain (maximum 8 €).

Behavioral Data Analysis of Reversal Learning Task
Matlab 2010b (The MathWorks, Natick, MA, USA) and SPSS 19 (SPSS, Inc., Chicago, IL, USA) were used to analyze the behavioral data generated by the reversal learning task. Each participant's behavioral performance was determined by the proportion of his or her "correct" responses, i.e., choosing the symbol with the currently higher probability to be rewarded.

Reinforcement Learning Algorithm
A reinforcement learning model was used to estimate learning parameters that describe individual behavior in the reversal task, and to generate single-trial PEs as regressors for the analysis of fMRI data. In detail, a modified Q-learning algorithm was used in which five free parameters are estimated for each participant to best capture the student's observed choice behavior. This algorithm updates an expected value (Q-value) based on the outcome of previous trials (Sutton and Barto, 1998). At each trial t, the Q-value of the chosen option c was adjusted according to the feedback received. The individual learning rate α determines how quickly expectations change with respect to the current PE δ, which is defined as the difference between the expected and the actual reinforcement R_{c,t}. R_{c,t} comprises two separate free parameters: instead of coding reward and punishment as 1 and −1, respectively, the values were allowed to vary individually for reward and punishment (Schlagenhauf et al., 2014).
In addition, the model estimates the degree to which a participant updates values for the unselected response Q_{u,t}. Since the reversal learning task included inversely correlated reward probabilities, double update models are best suited to explain the observed behavior (Glascher et al., 2009; Hauser et al., 2014; Schlagenhauf et al., 2014). Because we were particularly interested in examining inter-individual differences in behavior (i.e., learning), the extent to which a participant utilized a double update strategy was weighted by the free parameter κ. To optimize the model fit, another free parameter Q_i was included that specified the initial Q-values for one option (a bias to initially choose one stimulus over another; Schlagenhauf et al., 2014). The probability of choices based on the model-derived values was estimated by means of a softmax function. The softmax equation calculates p_a(t), the likelihood of a subject choosing action a over b in trial t, which is assumed to increase with the expected value of this option.
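As a rough illustration of the model described in this section, the following sketch implements one trial of a double-update Q-learning step and a two-option softmax. The exact equations are not reproduced in this text, so the update forms, the sign convention for the unchosen option, and all function and parameter names (`update_q`, `r_reward`, `r_punish`) are assumptions based on the verbal description. The softmax has no separate temperature parameter here, on the assumption that the two free reinforcement magnitudes play that role.

```python
import math


def softmax_choice_prob(q_a, q_b):
    """Probability of choosing option a over option b (two-option softmax)."""
    return 1.0 / (1.0 + math.exp(-(q_a - q_b)))


def update_q(q, choice, outcome, alpha, kappa, r_reward, r_punish):
    """One double-update step; returns (new_q, delta).

    q        : list with the two current Q-values
    choice   : index (0 or 1) of the chosen option
    outcome  : +1 for a win, -1 for a loss
    alpha    : learning rate
    kappa    : weight of the update for the unchosen option
    r_reward, r_punish : assumed positive magnitudes replacing +1 / -1
    """
    reinforcement = r_reward if outcome > 0 else -r_punish
    delta = reinforcement - q[choice]                 # prediction error
    new_q = list(q)
    new_q[choice] = q[choice] + alpha * delta         # chosen option
    unchosen = 1 - choice
    # Assumed form: unchosen option moves in the opposite direction, scaled by kappa.
    new_q[unchosen] = q[unchosen] - kappa * alpha * delta
    return new_q, delta


new_q, delta = update_q([0.0, 0.0], choice=0, outcome=1,
                        alpha=0.5, kappa=1.0, r_reward=1.0, r_punish=1.0)
```

With a high `alpha`, a single feedback event moves the Q-values (and hence the next choice probability) strongly; with a low `alpha`, expectations change only gradually over many trials.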
FIGURE 1 | Probabilistic reversal learning task: one trial consists of stimulus presentation with a response time window (1.5 s), feedback (1 s) and an inter-trial interval of 1.5-6.5 s introducing an exponentially distributed jitter. Choosing the currently 'good' stimulus leads to a reward with a higher probability (80%) than the alternative option (20%). After achieving the criterion of five correct answers out of the last six trials, the chance of a reversal of the probability distribution becomes 20% for the following trials.
The set of five free parameters was fitted individually for each participant by applying expectation-maximization with empirical priors, and the model evidence was approximated by integrating out the free parameters over the likelihood by sampling from the prior distribution (Huys et al., 2011, 2012). The choice behavior of all but one participant was explained better by the model than by chance (i.e., based on the likelihood of the observed data given the parameters). One participant's performance was more than three standard deviations below the mean and thus that participant was excluded from further model-based analysis. As such, time-series based on individual parameters can be regressed against imaging data in a meaningful way.
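To make the idea of fitting and of comparing against chance concrete, here is a deliberately simplified sketch. It uses plain maximum likelihood with a coarse grid over the learning rate, which is a stand-in and not the expectation-maximization procedure with empirical priors used in the study. The update equations and all names are assumptions for illustration.

```python
import math


def negative_log_likelihood(choices, outcomes, alpha, kappa, r_reward, r_punish, q_init):
    """Negative log-likelihood of observed choices under an assumed double-update model."""
    q = [q_init, 0.0]   # assumption: Q_i sets the starting value of option 0 only
    nll = 0.0
    for choice, outcome in zip(choices, outcomes):
        p_choice = 1.0 / (1.0 + math.exp(-(q[choice] - q[1 - choice])))
        nll -= math.log(max(p_choice, 1e-12))
        delta = (r_reward if outcome > 0 else -r_punish) - q[choice]
        q[choice] += alpha * delta
        q[1 - choice] -= kappa * alpha * delta
    return nll


def chance_nll(n_trials):
    """Negative log-likelihood of random choice between two options."""
    return n_trials * math.log(2.0)


def fit_alpha_by_grid(choices, outcomes, grid=None, **fixed):
    """Return the learning rate on the grid with the lowest negative log-likelihood."""
    grid = grid or [i / 20 for i in range(1, 20)]
    return min(grid, key=lambda a: negative_log_likelihood(
        choices, outcomes, alpha=a, **fixed))


# Toy data (hypothetical): choices of option 0/1 and outcomes of +1/-1.
choices = [0, 0, 1, 0, 0, 1, 1, 1]
outcomes = [1, 1, -1, 1, -1, 1, 1, 1]
fixed = dict(kappa=1.0, r_reward=1.0, r_punish=1.0, q_init=0.0)
best_alpha = fit_alpha_by_grid(choices, outcomes, **fixed)
better_than_chance = (
    negative_log_likelihood(choices, outcomes, alpha=best_alpha, **fixed)
    < chance_nll(len(choices))
)
```

A participant whose best-fitting negative log-likelihood is not below `chance_nll` would be one whose choices the model cannot explain better than random responding.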

fMRI Data Analyses
fMRI image processing and data analyses were performed using the Statistical Parametric Mapping software package (SPM8, Wellcome Trust Centre for Neuroimaging, London, UK; http://www.fil.ion.ucl.ac.uk/spm) in Matlab R2010b (The MathWorks, Natick, MA, USA). Initially, the following image preprocessing steps were conducted: correction for differences in slice time acquisition and motion including unwarping, coregistration of the mean EPI with the anatomical image, spatial normalization, and segmentation into tissue classes of the T1 image using the unified segmentation approach (Ashburner and Friston, 2005). EPIs were spatially smoothed with an isotropic Gaussian kernel of 6 mm full width at half maximum. The general linear model approach used by default in SPM8 was applied. Data analysis was performed following an event-related approach. On the single-subject level, a regressor was used to model reactions to feedback and the PEs derived from the learning model were added as a parametric modulator. In order to account for movement-associated variance, the six rigid body movement parameters and their first temporal derivative as well as a regressor marking scans with more than 1 mm scan-to-scan movement were included in the model as additional regressors of no interest (Iglesias et al., 2013). Individual PE contrast images were taken to a random effects group-level analysis (one-sample t-test for within group analysis). All results are reported using family wise error (FWE) correction for the whole brain (p < 0.05). For statistical analysis, the blood-oxygen-level dependent (BOLD) parameter estimates (reflecting changes in the concentrations of oxy- and deoxyhemoglobin within the brain and thus indirectly indicating changes in neural activity) were extracted from clusters that were significantly activated [regions of interest (ROI)].
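The idea of a PE parametric modulator can be sketched as follows: feedback onsets are represented as events whose height is the mean-centred PE of that trial, and the resulting series is convolved with a hemodynamic response function (HRF). This is a conceptual sketch, not SPM8's implementation. The double-gamma HRF parameters, the mean-centring step, the repetition time `tr`, and all function names are assumptions chosen for illustration.

```python
import math


def canonical_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF evaluated at time t in seconds (assumed common defaults)."""
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak) - ratio * gamma_pdf(t, undershoot)


def pe_regressor(onsets, prediction_errors, n_scans, tr, hrf_length=32.0):
    """Feedback-locked events scaled by mean-centred PEs, convolved with the HRF.

    onsets            : feedback onset times in seconds
    prediction_errors : one model-derived PE per feedback event
    n_scans, tr       : number of volumes and repetition time (hypothetical values)
    """
    mean_pe = sum(prediction_errors) / len(prediction_errors)
    centred = [pe - mean_pe for pe in prediction_errors]
    regressor = []
    for scan in range(n_scans):
        t_scan = scan * tr
        value = sum(
            weight * canonical_hrf(t_scan - onset)
            for onset, weight in zip(onsets, centred)
            if 0.0 < t_scan - onset <= hrf_length
        )
        regressor.append(value)
    return regressor


# Hypothetical example: two feedback events with opposite PEs.
regressor = pe_regressor(onsets=[2.0, 10.0], prediction_errors=[0.5, -0.5],
                         n_scans=20, tr=2.0)
```

Voxels whose BOLD signal follows this regressor are those whose feedback response scales with the trial-wise PE, which is what the PE contrast images capture.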
Based on previous studies and to reduce the number of tested areas we focused on the PFC and striatum (Cools et al., 2002;D'Cruz et al., 2011;Schlagenhauf et al., 2013). Extracted values from functional clusters were averaged. Based on studies showing differential activation of striatal subregions during reinforcement learning (O'Doherty et al., 2004), especially in adolescents (Cohen et al., 2010), we additionally extracted parameter estimates from striatal subregions (limbic, sensorimotor and associative striatum; Martinez et al., 2003) to analyze these in more detail.

Multinomial Logistic Regression Analysis
The three-step approach of latent class (LC) regression introduced by Vermunt (2010) was used to predict socio-motivational (in-)dependence: in the first step, the LC model is built on students' replies to PPM and TPM before the fMRI session, although the regression and the LC model are combined into one model. In the second step, similar to LC analysis without covariates, individuals are assigned to the LCs based on their prior MT membership probabilities obtained from step one. The estimated mean allocation probabilities for participants in the current study are above 0.96. However, assigning individuals to single MT categories may generate misclassification errors, since membership probabilities are not always exactly one. In Vermunt's (2010) three-step approach, the MT categories allocated in step two serve as a single response variable with known measurement error probabilities. Finally, in the third step, a multinomial logistic regression analysis (MLRA) is conducted using the MT category assignment from step two as the observed dependent variable. Thereby, in contrast to MANOVA, the misclassification errors in the LCs are taken into account. The strength and advantages of Vermunt's three-step approach have been successfully demonstrated (Vermunt, 2010; Asparouhov and Muthén, 2012).
The LC model was estimated using three parcels of the original six items (three parcels consisting of two items) of the REMO subscale TPM, and the original nine items (three parcels consisting of three items) of the REMO subscale PPM, which students filled out immediately before their fMRI session. To ensure that all measurement information enters the structural equations, random parcel building is often used in psychological research (Nasser-Abu and Wisenbaker, 2006). Although some authors argue that parceling is inappropriate in confirmatory factor analysis models (Marsh et al., 2013), others (e.g., Little et al., 2002; Nasser-Abu and Wisenbaker, 2006) underline the advantages of parcel construction due to preventing potential variance sharing and spurious correlations. To predict the extracted LCs, the independent variables deployed in the logistic regression were the learning rate α as well as the extracted values from the fMRI analysis from the ROIs PFC, caudate/thalamus and putamen (left and right hemisphere separately). All analyses were carried out in Mplus 7 (Muthén, 1998-2010) with the MLR estimator, which is recommended for standardized questionnaire-based analyses with small and medium sized samples (N < 100; Wang and Wang, 2012), although the sample size is relatively large for fMRI standards. To account for missing data, the models were estimated using full information maximum likelihood in Mplus. Table 1 shows the model fit results for LCA with 2-5 classes. Judging from the Akaike Information Criteria (AIC) and Bayesian Information Criteria (BIC; lowest value), the 4-class solution reveals the best fit for our data. In addition, the Bootstrap Likelihood Ratio Tests (BLRT) indicate that the 5-class solution is not superior to the 4-class solution model (see Table 1).
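For readers unfamiliar with the information criteria mentioned above, the following sketch shows how AIC and BIC are computed from a model's log-likelihood and how the lowest value selects a class solution. The log-likelihoods and parameter counts below are hypothetical placeholders, not the values from Table 1, and the function names are chosen for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def best_by_bic(candidates, n_obs):
    """Return the number of classes with the lowest BIC.

    `candidates` maps number of classes -> (log_likelihood, n_params).
    """
    return min(candidates, key=lambda k: bic(*candidates[k], n_obs))


# Hypothetical fit results for 2-5 latent classes (NOT the study's values).
candidates = {2: (-520.0, 13), 3: (-495.0, 20), 4: (-460.0, 27), 5: (-455.0, 34)}
selected = best_by_bic(candidates, n_obs=85)
```

Both criteria penalize additional parameters, so a solution with more classes is preferred only if it improves the log-likelihood enough to offset the penalty.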

Latent Class Analysis
The LCA replicated the four MTs (see Figure 2): (1) teacher-dependent MT, (2) peer-dependent MT, (3) teacher-and-peer-dependent MT and (4) teacher-and-peer-independent MT. Membership in this 4-class solution was as follows: 22.9% teacher-dependent MT, 26.9% peer-dependent MT, 24.2% peer-and-teacher-dependent MT, and 26.0% peer-and-teacher-independent MT. Table 2 shows the results of the MLRA, which tested the learning rate α as well as the extracted values from the PE-related BOLD signal of two brain regions that were selected a priori based on previous research and due to their involvement in learning processes (PFC and striatum). In addition, to test the association with striatal activation in more detail, the activation of different striatal subregions (associative, limbic, and sensorimotor striatum) during the reversal task was tested as a predictor of the four different MTs. By default, Mplus estimates so-called logit regression values (a transformed form of probability; B-values), since nominally scaled variables (class variables) have no unit. To facilitate interpretation, odds ratios (ORs) have been estimated, which are presented in Table 2 in second position. Since an OR of 1 indicates no change in odds, values above 1 increase the probability, whereas values below 1 reduce the probability of membership in one LC (Szumilas, 2010).
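The relation between the logit coefficients (B) and the ORs is simply exponentiation, OR = exp(B). The short sketch below applies this to three B-values reported in the Results; the function name is chosen for illustration, and small deviations from tabulated ORs can arise when B is reported in rounded form.

```python
import math


def odds_ratio(logit_coefficient):
    """Convert a logit regression coefficient B into an odds ratio."""
    return math.exp(logit_coefficient)


for b in (-7.71, -1.50, 6.80):
    print(f"B = {b:+.2f} -> OR = {odds_ratio(b):.2f}")
```

A negative B therefore corresponds to an OR between 0 and 1, and a positive B to an OR above 1.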

Learning as Predictor of Four Different MTs
The four MTs did not differ in their performance measured as percent correct responses, i.e., choosing the symbol with the currently higher probability to be rewarded (F = 0.13, p = 0.94). The computational model provided individual parameters as a result of fitting the model to each participant's observed behavior. As depicted in Table 2, the individual learning rate α distinguishes between the probabilities of being an independent MT rather than a peer-and-teacher-dependent MT (B = −7.71, OR = 0.00, p < 0.05): a high learning rate, i.e., a high α value, indicates that a participant tends to be more strongly influenced by the most recent feedback. Such behavior is associated with membership in the peer-and-teacher-dependent MT rather than the independent MT. Members of the latter category show lower learning rates, indicating slower updating of expectations.

Functional Activation of Learning Regions as Predictors of Four Different MTs
We found PE-related activation in the expected regions, the PFC, the striatum, and in parietal areas, as has been described in previous studies (see Figure 3). Activity in the PFC was located in the left inferior and middle frontal gyrus and in the right middle frontal gyrus. The striatum was found to be significantly activated in two separate clusters in each hemisphere, one lateral cluster (putamen) and one dorsal cluster (tail of caudate and anterior nucleus of thalamus).
For the further analysis we used values extracted from regions activated during the reversal learning task in our sample, thereby ensuring that these regions are indeed significantly involved in feedback processing during the task. For an additional analysis focusing on the striatum, a region which receives specific interest regarding PE processing, we used functional masks (Martinez et al., 2003) to extract activation estimates from striatal subregions.
The BOLD signal parameter estimates of two regions (right PFC and left associative striatum, a subregion of the striatum) that are associated with PE encoding in the reversal task were identified as predictors of the four MTs: our first analysis revealed that PE-related activation in the right PFC discriminates between the probability of two categories: the activity level in the right PFC predicted the probability of being a peer-and-teacher-dependent MT (B = −1.50, OR = 0.22, p < 0.05) rather than an independent MT. Second, as the results of a separate multinomial regression with the different striatal subareas show, the PE-related activation in the left associative striatum discriminates between the probabilities of membership in three categories: the more the activity in the left associative striatum covaried with PEs, the higher the probability of being an independent MT rather than a teacher-dependent MT (B = 7.90, OR = 2697.84, p < 0.05) or peer-and-teacher-dependent MT (B = 6.80, OR = 897.85, p < 0.05). Please see Table 2 for more details.

DISCUSSION
The main goal of the current person-oriented, multi-methodological study was to shed light onto individual differences in the interplay between learning and socio-motivational (in-)dependence in adolescence. In particular, reinforcement learning rates as well as activity levels in brain regions that are associated with reinforcement learning (striatum, PFC) were tested as predictors of four different MTs based on the concept of socio-motivational (in-)dependence.
With regard to our first research aim, students' learning rate indeed functions as a predictor discriminating different MTs, in particular between the peer-and-teacher-dependent MT and the peer-and-teacher-independent MT. In detail, a short-term learning rate, characterized by a high α value, is associated with the peer-and-teacher-dependent MT but not with the independent MT. Note that neither a low nor a high learning rate is beneficial per se in this probabilistic learning task, where an immediate reaction to negative feedback as well as a too slow adjustment of behavior might not increase rewarded events. Thus, students of the peer-and-teacher-dependent MT are more likely to immediately adjust their behavior according to the feedback they receive during the reinforcement learning task. Possibly, this is caused by increased feedback sensitivity in basic learning mechanisms, which could function as a basis for their socio-motivational dependence, i.e., high feedback sensitivity (recognition and approval) toward peers and/or teachers. In turn, the independent MT seems to be less reactive to feedback (reinforcement) during learning as well as in terms of motivation.
However, the learning rate neither discriminated between the three socio-motivational types (peer-dependent MT, teacher-dependent MT, peer-and-teacher-dependent MT), nor between the peer-dependent MT and the teacher-dependent MT compared with the peer-and-teacher-independent MT. Future studies with diverse samples are warranted and may provide deeper insights into these individual differences.
Regarding our second research aim, learning-related neural activation in two brain regions (PFC and a subregion of the striatum) discriminated between the probabilities of being socio-motivationally independent as opposed to being socio-motivationally dependent. In detail, more PE-related activity in the right PFC predicted the probability of being a peer-and-teacher-dependent MT rather than an independent MT. This finding supports previous research that found the PFC to be associated with more controlled and feedback-related learning processes (Cools et al., 2002; van Schouwenburg et al., 2010; Wolfensteller and Ruge, 2012). In this way, it also supports previous findings on differences in learning behavior typically found for different MTs. In fact, the peer-and-teacher-dependent MT is characterized by a strong feedback orientation toward both peers and teachers and by well-adjusted and controlled behavior in school (Raufelder et al., 2015). Thus, socio-motivational dependence might be associated with a tendency to follow directions and adapt to feedback in general.
FIGURE 3 | Functional magnetic resonance imaging during reversal learning. BOLD signal covaried with prediction errors in a network including the PFC, the parietal cortex, the striatum (putamen and caudate/thalamus-cluster) and the cerebellum (FWE-corrected on the whole brain level, Z-values > 4.8, p-values < 0.05).
In the additional analysis of subregions within the striatum, we observed that the higher the PE-related activity in the left associative striatum, the higher the probability of being an independent MT rather than a teacher-dependent MT or a peer-and-teacher-dependent MT. During the fMRI experiment, we measured PE-related BOLD signals in the striatum. It can be speculated that individuals from the socio-motivationally independent MT group rely more strongly on their own assumptions about their actions and the associated consequences. Such stronger reliance on, or trust in, one's own estimates of the state of the environment can computationally be understood as higher precision of the PE, which might, perhaps through top-down processing, lead to an increase in the measured PE signals in the striatum (Hebart et al., 2016).
Overall, these findings support the concept of socio-motivational (in-)dependence by providing evidence that it is associated with different learning patterns on (a) a behavioral as well as (b) a neural level. Following Jensen's (1998) advice of "teaching with the brain in mind," our findings suggest that teachers should be aware that students' individual motivation style [i.e., socio-motivational (in-)dependence] is associated with a specific feedback-related brain response during reinforcement learning. Linking these findings to educational practice in order to better support adolescent students and accommodate their individual learning and motivation preferences, students with socio-motivational dependence may benefit from feedback and concrete directions, whereas students with socio-motivational independence may need a more autonomous learning environment with fewer instructions and less feedback from teachers and peers. In particular, students with socio-motivational independence might not benefit optimally from the traditional school system, which is dominated by learning in classroom settings and by strong teacher involvement and feedback. As a strength, they show higher behavioral flexibility, which needs to be better supported in class by promoting autonomy-supportive or student-centered teaching behaviors (Soenens and Vansteenkiste, 2005; Roth et al., 2007; Radel et al., 2010).

Strengths, Limitations, and Future Directions
Following the person-oriented approach, the present interdisciplinary and multi-methodological study extends existing research on individual differences in the interplay of learning and motivation, specifically the role of reinforcement learning as a predictor of socio-motivational (in-)dependence in adolescence. Compared to current fMRI standards, we studied a relatively large, non-clinical adolescent sample, which provides valid information about neurobiological processes during reinforcement learning, taking into account individual differences in socio-motivational patterns in healthy adolescents. The fMRI design provides important insights into adolescents' brain activity while learning, bridging disciplinary boundaries by combining neuroscientific results with educational psychology in a multi-methodological way. In other words, there are individual differences not only in students' motivation patterns, but also in their brain activity while learning.
However, some methodological limitations need to be considered when interpreting the current findings. Firstly, the results are limited to German adolescents within the age range of 13-16 years, and we are aware that findings may differ for students within another age range, or for students from other countries or different ethnic groups. Secondly, considering the relatively novel field of fMRI-based research on adolescents' motivational and learning processes, our study is exploratory in nature. Future replication studies are warranted to validate our results and to broaden knowledge regarding potential dysfunctions within motivational aspects, learning, and its neural basis. Thirdly, one might criticize the use of self-report data. However, studies on motivation that were based on both teacher and student self-reports have reported relative disparity in the information provided by multiple informants (see Skinner and Belmont, 1993). Wentzel et al. (2010) argue that teachers tend to provide invalid information about their students' perceptions of their own behavior. Since the present study focused on students' perception of teachers and peers as motivators, a self-report approach is warranted. Moreover, the use of self-report measures has been validated as an appropriate method in psychological research (Chan, 2009). Furthermore, the present study combines self-report data with experimental data, thereby minimizing the weaknesses of each method while maximizing their strengths by combining disparate yet complementary approaches (Raufelder et al., 2012). The reinforcement learning task might not compare directly to the situation in a real classroom setting. However, operant reinforcement learning is one of the most basic forms of learning (see Thorndike, 1929; Skinner, 1938; Ittel et al., 2013) and therefore also constitutes the basis for more complex learning processes in educational settings.
The reversal learning paradigm used in the current study is a well-established operationalization of reinforcement learning and is suitable for simultaneous measurement of functional imaging data (Cools et al., 2002). However, we did not examine learning based on social reinforcement but learning based on more basic secondary reinforcement (money). Based on our findings, future studies are warranted to operationalize this more complex type of (social) reinforcement. Finally, future studies following a person-oriented approach are encouraged to expand existing research on differential motivation and learning patterns (Corpus and Wormington, 2014; Korpershoek et al., 2015) by including neural components. Despite these limitations, this person-oriented study provides deeper insights into the interplay of individual differences in learning and motivational processes in adolescents by using an innovative, multi-methodological, and interdisciplinary research design.

AUTHOR CONTRIBUTIONS
DR did the statistical analyses and wrote the main part of the paper. RB did the fMRI analyses of the reversal task and helped write the paper. LR, SG, RL, TG, and AB were mainly involved in the fMRI procedures, including conceptualization, data collection, and pre-analyses. They additionally helped correct the manuscript.

FUNDING
The research reported in this article was supported by a grant from The Volkswagen Foundation (Schumpeter Fellowship, II/84 452). The authors would like to thank the principals, teachers, parents, and students for their cooperation in making these studies possible. We also thank Eva Flemming and Patricia Pelz for helping with the fMRI data acquisition.