Learning From Loss After Risk: Dissociating Reward Pursuit and Reward Valuation in a Naturalistic Foraging Task

A fundamental feature of addiction is continued use despite high-cost losses. One possible driver of this feature is a dissociation between reward pursuit and reward valuation. To test for this dissociation, we employed a foraging paradigm with real-time delays and video rewards. Subjects made stay/skip choices on risky and non-risky offers; risky losses were operationalized as receipt of the longer delay after accepting a risky deal. We found that reward likability following risky losses predicted reward pursuit (i.e., subsequent choices), while there was no effect on reward valuation or reward pursuit in the absence of such losses. Individuals with high trait externalizing, who may be vulnerable to addiction, showed a dissociation between these phenomena: they liked videos more after risky losses but showed no decrease in choosing to stay on subsequent risky offers. This suggests that the inability to learn from mistakes is a potential component of risk for addiction.


INTRODUCTION
Many choices, like starting a new relationship or accepting a job out of state, involve some level of risk that can be expressed as a win or loss relative to baseline (1). Such decisions can lead to negative affective experiences, particularly if an individual chooses to take a risk and then receives an unfavorable outcome (2). While some individuals learn to make choices that minimize future negative outcomes (3,4), the inability to learn from such losses may be integral to certain externalizing psychopathologies like addiction (5,6). In this study, we examined relations between risky losses and externalizing tendencies by modifying a newly established human foraging paradigm (the Web-Surf Task) (7).
An earlier version of the Web-Surf Task was based on a rodent neuroeconomic task (Restaurant Row) (8). These parallel tasks entailed serial stay/skip choices regarding offers of real-time delays and primary rewards (food from four feeder sites in Restaurant Row, video clips from four galleries in the Web-Surf Task). On each encounter in the Web-Surf Task, the subject was informed of a required delay before the reward would be delivered, indicated by a download bar and numeric text instruction. The subject could either accept the deal and stay through the delay for the reward, or skip the deal and try his or her luck at the next reward site (video gallery). Reward kind (genre of video) remained constant at each gallery. Subjects had a limited time to spend on the task, thus creating delay-related trade-offs between galleries. Delay was random (selected uniformly from 1 to 30 s) on each offer encounter.
In our earlier work, we observed comparable decision valuation processes across species using these analogous tasks (9). Each subject revealed different, but reliable, delay-dependent preferences (i.e., thresholds) for each restaurant/gallery, taking delays below that threshold and skipping delays above. We also observed a high correspondence between choices and consummatory responses among humans (delay thresholds related to video enjoyment ratings), and between choices and stated preferences (delay thresholds related to rankings of video galleries assessed at the end of the task) (7).
Our initial work using the original Web-Surf Task bridged cross-species models of decision-making while also demonstrating the task's capacity to parse different valuation processes (7). A critical next step is to understand whether foraging task parameters predict meaningful individual differences, like those observed on the externalizing psychopathology spectrum (including addiction). We were motivated to use the Web-Surf Task to assess externalizing tendencies for two reasons. First, the rodent analogue (Restaurant Row) has been used to assess the effects of different substances (i.e., cocaine and morphine) on deliberation and post-decisional commitment (6), highlighting the value of this paradigm for understanding substance use disorders. Second, recent theories suggest that foraging models of decision-making are a promising approach for studying addiction, as these tasks measure how a subject allocates scarce resources (e.g., time) when searching for valuable goods (e.g., food, drug) (10). For instance, drug users can be conceptualized as foraging for resources in a patchy environment, e.g., smokers looking for the cheapest cigarettes (11).
To better assess for behavioral markers of addiction vulnerabilities using the Web-Surf Task, we added a risk component to the task, given accumulating evidence that risky decisions represent a vulnerability for substance use disorder (12). We then characterized risky outcomes according to prospect theory (13), which raises the possibility that subjects might reframe their enjoyment with regard to post-decisional outcomes. That is, they might reframe the outcome of an incurred risk (e.g., a win or loss) relative to the mid-point of the option, independent of whether the choice was the right option to take given the information at the time. For instance, the act of losing on a risky decision may impact video enjoyment regardless of whether their choice to stay and wait for that video was consistent with the offer's value.
Our overarching goal for the current study was to test whether an experiential foraging task can measure addiction-relevant behaviors, following from theories that conceptualize risky substance use within foraging models (14). More specifically, we aimed to determine 1) whether subjects showed differential responses to risky losses with respect to their enjoyment of reward and acceptance of subsequent risky deals, and 2) whether individual differences in response to risky losses predicted variation in trait-level externalizing, a risk factor for substance use disorders (15)(16)(17). We expected bad outcomes to reduce one's likelihood of accepting subsequent risky offers and for this pattern to be reversed among high-externalizing subjects (suggesting continued risk-taking despite negative outcomes).

Subjects
One hundred five undergraduate students (81% female, average age 20.2 years) from the University of Minnesota completed the current study and received compensation in the form of extra credit towards psychology courses. We targeted a sample size of around 100 subjects for our individual differences analyses (i.e., relations with externalizing scores), given an a priori power analysis indicating the need for 84 subjects to have 80% power for detecting a moderate effect size of r = 0.3 when employing a 0.05 criterion for statistical significance (based on a meta-analysis indicating small to moderate effect sizes for risk-taking and externalizing trait correlations) (18). The racial/ethnic breakdown of the sample was as follows: 63% Caucasian, 26% Asian, 4% Black/African American, 3% Hispanic, 1% American Indian/Alaskan Native, 1% Native Hawaiian/Pacific Islander, 2% other. The University of Minnesota Institutional Review Board approved the study procedures, and all subjects provided written informed consent.
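The sample-size figure above can be approximated with a short calculation. The sketch below is only an illustration: it assumes a two-sided test and the Fisher z approximation, neither of which is stated in the text, and the function name is hypothetical. Under these assumptions the result lands within about one subject of the 84 reported; exact methods can differ slightly.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a correlation r (two-sided test).

    Uses the Fisher z approximation (an assumption, not the authors'
    stated method): n = ((z_alpha + z_beta) / atanh(r))**2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


# Moderate effect size, 0.05 criterion, 80% power, as described in the text
n_required = n_for_correlation(0.3)
```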

Experimental Design
In the risk variant of the Web-Surf Task (Figure 1A), subjects had 40 min to travel between galleries that provided video rewards from the four galleries described in Abram et al. (7): kittens, dance, landscapes, and bike accidents. As in the original Web-Surf and Restaurant Row tasks, subjects had a fixed amount of time to forage; this means that subjects should have made economically maximizing decisions and stayed when the subjective value of an offer exceeded its cost.
Subjects encountered serial offers that presented a set of possible delays (Figure 1B, C): on entry into a gallery, the subject was shown a gallery icon, a textual representation of the offer, a pair of web-page like delay bars showing the maximum and minimum delays that could be received on that trial (possible delays ranged from 3 to 30 s), and the option to wait through the delay for a video from that gallery or move on. If the subject chose to wait, the actual delay was revealed, the delay counted down, and a 4 s video was shown; the subject then rated the video from 1 to 4 as an indicator of how much he or she liked it (4 = highest). Enjoyment ratings were made with key presses, and the task did not proceed until subjects input a rating (thus, there were no missing ratings). Importantly, in this version of the task, punishment was inescapable: subjects were locked in after making a stay choice (after which the delay began to count down). After each trial (regardless of the choice to stay or skip), the subject had to perform a short "travel" task, which entailed clicking the numbers 1 to 4 (presented in a darker shade of gray) as they randomly appeared around the screen (shown in a lighter gray). This travel task produced a cost to leaving an offer before getting to the next offer and was analogous to the travel time required as rats move between feeders during Restaurant Row.
Risky and non-risky trials were intermixed. Risk level was reflected by the variance of an offer and was either 0 (non-risky) or greater than 0 (risky, see Figure 1). Risky trials consisted of an offer with a range of delays (e.g., 5, 10, or 15 s), and each offer varied according to the set of possible delays and spread between the shortest and longest delay. (We did not allow for non-integer mid values in the risky trials, e.g., "Video in 5, 5.5, or 6 secs…" could not occur.) Critically, for risky trials, the true delay was only revealed if the subject elected to stay. Subjects were not informed of the probabilities associated with receipt of the different delays on risky trials. In comparison, non-risky trials presented offers with three identical delays, e.g., "Video in 7, 7, or 7 secs…" We further classified risky trials as good or bad based on their outcome: receipt of the low delay on a risky trial was a "good" outcome, while receipt of the high delay was a "bad" outcome. (Following the framing effects from prospect theory, our definitions derive from an offer's outcome type but not value, meaning that a bad outcome could have a delay below one's threshold.) We were particularly interested in situations where the subject accepted a risky offer and received the bad outcome, i.e., the subject took a risk and "lost." We contrasted these trials with a control condition, in which the subject accepted a non-risky offer of equivalent value, and with situations characterized by relief, where the subject received the good outcome on a risk trial, i.e., the subject took a risk and "won." Importantly, the decision to stay or skip the offer on a non-risky trial, in which the true offer delay is known, can be assumed to be economically valid (i.e., correctly judged, not a mistake).
All subjects first underwent a training phase that entailed eight practice trials (two cycles through all four galleries, presented in the same order as the main task). After completion of the training phase, the subject had the opportunity to ask questions of the examiner before advancing to the main test phase.

FIGURE 1 | Panel label: "Choice (Non-Risk)"; example offer text: "Video in 15 secs...". (B, C) Flow diagram illustrates sequencing between risky (B) and non-risky (C) trials. For a risky trial, the true delay was only revealed if the subject stayed. If they instead skipped, they advanced directly to the travel task before encountering the next offer. The travel task entailed clicking the numbers 1-4 as they appeared around the screen (traveling required five random number selections).
The Externalizing Spectrum Inventory (ESI; 19) captures a range of traits and behaviors associated with the externalizing spectrum of psychopathology, including general disinhibition processes (e.g., theft, irresponsibility), substance use/abuse, and callous aggression. 1 Total ESI scores were acquired by summing across all items in the inventory (20) and then applying a log-transformation to improve normality.
To assess whether behavior on the risk variant of the Web-Surf Task was related specifically to substance abuse tendencies versus externalizing behavior more broadly, we computed the three ESI subfactors: general disinhibition (which captures impulsivity and irresponsibility), substance abuse (which captures recreational and problematic substance use), and callous aggression (which captures physical/relational aggression and lack of empathy) (21,24). Lastly, we computed three subscales from the substance abuse subfactor that measure problems associated with substance use: alcohol problems, marijuana problems, and drug problems; here, our aim was to further explore whether task behaviors predicted substance-related consequences or harms. Examples of questions in these subscales are: "My drinking led to problems at home," "I've broken the law to get money for drugs," and "At times, marijuana has been more important to me than work, friends, or school." Because many subjects were non-responders on the problem scales, we encountered a zero-inflation problem. We thus isolated subjects who endorsed at least one item on the subscale, as individuals already experiencing negative consequences (evidence of behavioral disinhibition) are at greater risk for developing an alcohol or substance use disorder (25); 22 subjects (21%) were retained for the alcohol problem subscale analyses, versus 18 subjects (17%) for the marijuana problem subscale analyses, and 19 subjects (18%) for the drug problem subscale analyses.

Specialized Procedures
Heaviside step function: a piecewise function denoted H(x), where H(x) = 0 for x < 0, H(x) = ½ when x = 0, and H(x) = 1 for x > 0. This function captures the point at which a signal switches from 0 to 1. We used this function to identify the point at which subjects reliably began to skip offers (which we refer to as delay thresholds; see below for details). We used a Heaviside step function as an alternative to the logistic fit function described in Abram et al. (7), as the Heaviside approach is better equipped to handle extreme cases (i.e., when a subject stayed or skipped all offers in a gallery). In such instances, the Heaviside step function produces a reasonable value (e.g., the minimal or maximal delay offered), whereas the logistic function can produce values approaching infinity.
Subject-specific delay thresholds were computed separately for each trial using a leave-one-out approach; this yielded four thresholds, one per gallery. Thresholds were indicative of revealed preferences, reflecting the delay time at which a subject reliably began to skip offers for a particular gallery. To obtain the threshold for trial i, we fit a Heaviside step function to all trials in gallery x excluding trial i. This produced a vector of thresholds with length equal to the number of trials in gallery x. Importantly, thresholds were computed using the mid value of each offer for risky trials only. Non-risky trials were then assigned a threshold equal to the mean of the threshold vector for the respective gallery.
1 Missing self-report data for 1 subject.
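The leave-one-out threshold procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fitting criterion (minimizing the summed absolute difference between observed stay choices and H(threshold − delay)), the restriction of candidate thresholds to the delays actually offered, and the averaging of tied best candidates are all assumptions, and the function names are hypothetical.

```python
def heaviside(x):
    """H(x) = 0 for x < 0, 1/2 for x == 0, 1 for x > 0."""
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return 0.5


def fit_threshold(delays, stays):
    """Fit a step-function threshold to one gallery's trials.

    delays: list of offered delays (mid value of the offer), in seconds
    stays:  list of 1 (stayed) / 0 (skipped), same length as delays
    Predicted stay = H(threshold - delay). Assumed criterion: pick the
    offered delay that minimizes total absolute error; average any ties.
    """
    candidates = sorted(set(delays))

    def loss(theta):
        return sum(abs(s - heaviside(theta - d)) for d, s in zip(delays, stays))

    scored = [(loss(c), c) for c in candidates]
    best = min(score for score, _ in scored)
    tied = [c for score, c in scored if score == best]
    return sum(tied) / len(tied)


def leave_one_out_thresholds(delays, stays):
    """Threshold for trial i is fit on all other trials in the gallery."""
    thresholds = []
    for i in range(len(delays)):
        rest_delays = delays[:i] + delays[i + 1:]
        rest_stays = stays[:i] + stays[i + 1:]
        thresholds.append(fit_threshold(rest_delays, rest_stays))
    return thresholds
```

With this criterion, a subject who stays on every offer in a gallery receives the maximal offered delay as the threshold, and a subject who skips every offer receives the minimal one, which matches the behavior described for extreme cases.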
Expected value for non-risky trials (with a given delay): defined as the difference between the gallery-specific threshold and the offered delay. Expected value for risky trials: calculated as the average expected value of the three delays, assuming an equal likelihood for each delay (low, mid, high; see Figure 1C). For simplicity, we assumed a linear difference. Values ranged from −27 to 27, with a value of 0 meaning that the delay offer was equivalent to the revealed threshold.
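As a worked illustration of the expected-value definition, the sketch below treats every offer as three delays (identical for non-risky trials) and averages the threshold-minus-delay differences with equal weights. The function name and the example threshold of 12 s are hypothetical.

```python
def expected_value(threshold, delays):
    """Mean of (threshold - delay) over the delays in an offer.

    For a non-risky offer the three delays are identical, so this
    reduces to threshold - delay. Equal likelihood of each delay is
    assumed, as in the definition above.
    """
    return sum(threshold - d for d in delays) / len(delays)


# Hypothetical gallery threshold of 12 s
ev_non_risky = expected_value(12, [7, 7, 7])    # threshold minus the known delay
ev_risky = expected_value(12, [5, 10, 15])      # average over low, mid, high
```

For an evenly spaced risky offer such as 5, 10, or 15 s, the equal-weight average equals the threshold minus the mid delay. The extremes of ±27 correspond to a threshold and a delay at opposite ends of the 3 to 30 s range.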
Mixed-effects models: We used linear mixed-effects models to assess for group-level effects; all reported models include original p-values as well as false discovery rate (FDR)-adjusted p-values using Benjamini and Hochberg's FDR control algorithm (26). We fit models using the MCMCglmm package in R (27), which uses Markov chain Monte Carlo techniques (see below), and used lmer and lsmeans, which provided nearly identical estimates, for plotting (28,29). The tilde (~) in all regression models can be read as "is modeled as a function of" (30).
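For readers unfamiliar with the Benjamini and Hochberg adjustment, a minimal sketch of the standard step-up procedure is given below. It is written in Python for illustration only (the analyses themselves were run in R), and the function name and example p-values are made up rather than taken from the study.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg FDR-adjusted p-values, in input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), and a
    running minimum from the largest rank downward enforces that the
    adjusted values are monotone in the raw p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative (made-up) p-values from four model terms
p_adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```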
Markov chain Monte Carlo (MCMC) techniques: an approach that uses random sampling to approximate the posterior distribution of a variable of interest within a probabilistic space.

Validity Analyses
We evaluated the external and face validity of the risk variant of the Web-Surf Task using methods described in Abram et al. (7). For each subject, for each gallery, we averaged the vector of delay thresholds produced using the leave-one-out method described above; this yielded four thresholds per subject. We measured external validity by correlating delay thresholds with stated preferences (i.e., average gallery ratings and posttest gallery rankings) and obtained two validity correlations per subject.

Group-Level Choice, Rating, and Reaction Time Models
Our primary choice/rating models evaluated the impact of framing (i.e., good/bad outcome) on risk seeking (i.e., subsequent choices) and reward valuation (i.e., immediate video enjoyment ratings).
The primary choice model evaluated whether the type of outcome on the previous trial influenced subsequent risk seeking or aversion. This model included choice at the current trial as the dependent variable, actual value received and outcome type at the previous trial as fixed-effect independent variables, and subject as a random effect: [Choice t ~ actual value t-1 + outcome type t-1 + (1|subject)]. This model included risky trials where the subject stayed and also received a risky offer at the next trial.
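The trial-selection rule for this model can be sketched as a small data-preparation step. The sketch assumes each trial is a dictionary with hypothetical field names (`risky`, `stayed`, `actual_value`, `outcome`) and that trials are listed in the order encountered by one subject; it only builds the rows that would enter the model and does not fit it.

```python
def lagged_choice_rows(trials):
    """Pair each trial with its predecessor for the sequential choice model.

    A pair is kept only when the previous trial was a risky offer the
    subject accepted, and the current trial is also a risky offer.
    Field names are illustrative assumptions.
    """
    rows = []
    for prev, curr in zip(trials, trials[1:]):
        if prev["risky"] and prev["stayed"] and curr["risky"]:
            rows.append({
                "choice": int(curr["stayed"]),             # dependent variable at t
                "prev_actual_value": prev["actual_value"],  # actual value at t-1
                "prev_outcome": prev["outcome"],            # outcome type at t-1
            })
    return rows


# Made-up three-trial sequence for one subject
example_trials = [
    {"risky": True, "stayed": True, "actual_value": -3, "outcome": "bad"},
    {"risky": True, "stayed": False, "actual_value": None, "outcome": None},
    {"risky": False, "stayed": True, "actual_value": 4, "outcome": "non-risky"},
]
rows = lagged_choice_rows(example_trials)
```

In this made-up sequence only the first pair qualifies, because the second trial was skipped and therefore cannot serve as a previous-trial outcome.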
The primary rating model assessed the impact of framing effects on immediate reward valuation and included mean-centered rating as the dependent variable (i.e., centered to the average of the respective gallery), actual value and outcome type at the previous trial as fixed-effect independent variables, and subject as a random effect: [Rating t ~ actual value t + outcome type t + (1|subject)]. This model included risky trials for which the subject stayed.
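Mean-centering by gallery can be illustrated with a few lines of code. The sketch assumes one subject's ratings supplied as (gallery, rating) pairs; the function name and example ratings are hypothetical.

```python
from collections import defaultdict


def mean_center_by_gallery(ratings):
    """Subtract each gallery's mean rating from the ratings in that gallery.

    ratings: list of (gallery, rating) pairs for one subject.
    Returns a list of (gallery, centered_rating) pairs in the same order.
    """
    by_gallery = defaultdict(list)
    for gallery, rating in ratings:
        by_gallery[gallery].append(rating)
    means = {g: sum(vals) / len(vals) for g, vals in by_gallery.items()}
    return [(g, r - means[g]) for g, r in ratings]


# Made-up ratings on the task's 1-4 scale
centered = mean_center_by_gallery(
    [("kittens", 4), ("kittens", 2), ("dance", 1), ("dance", 3)]
)
```

A centered rating above zero then marks a video liked more than is typical for that gallery, regardless of how much the subject likes the gallery overall.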
Lastly, we computed a secondary group-level model to examine direct relations between risk seeking/aversion and reward valuation, while considering the effects of framing and risk. In particular, we were interested in whether affective responses interacted with actual value or offer type when predicting subsequent decisions (building off the prior choice model detailed above). This model included choice at the current trial as the dependent variable; actual value, mean-centered rating, and outcome type of the previous trial, and two interaction terms as fixed-effect independent variables; and subject as a random effect: [Choice t ~ actual value t-1 + rating t-1 + outcome type t-1 + actual value t-1:rating t-1 + actual value t-1:outcome type t-1 + (1|subject)]. In this model, outcome type coded good outcomes, bad outcomes, and non-risky offers; this metric then reflected the framing and risk manipulations.
To assess whether bad outcomes influenced the speed at which subjects made subsequent decisions, we tested a supplemental reaction time model that included logged reaction times as the dependent variable, actual value received and outcome type at the previous trial as fixed-effect independent variables, and subject as a random effect: [log(reaction time t) ~ actual value t-1 + outcome type t-1 + (1|subject)].

Global Risk-Aversion Trend and Control Models
We also constructed a set of models to investigate global trends in risk seeking/aversion and reward valuation, i.e., to address the possibility that any trial-by-trial effects were better explained by cross-session effects. The global risk-aversion model included choice as the dependent variable; number of videos viewed (i.e., consumed up to trial t), expected value, a risky/non-risky categorical indicator, and a video consumption × risky/non-risky interaction term as the fixed-effect independent variables; and subject as a random effect: [Choice t ~ number videos consumed t + expected value t + risky/non-risky t + number videos consumed t:risky/non-risky t + (1|subject)]. All trials were included in the choice model.
The global risk-aversion rating model was structurally equivalent to the first but included mean-centered ratings as the dependent variable: [Rating t ~ number videos consumed t + expected value t + risky/non-risky t + number videos consumed t:risky/non-risky t + (1|subject)]. Only stay trials were included in the rating model, as subjects only rated videos during stay trials.
Based on the results of the global trend models above, we constructed a control model to assess whether any trial-by-trial effects were better explained by other risk-aversion patterns. Within this model, we controlled for global risk-aversion trends (number of videos consumed), as well as categorical (high, low, mid) and continuous (0-30 s) risk dimensions. Our intention was to determine if cross-session declines in accepting risky deals and/or the general tendency to prefer offers with lower risk, i.e., a more narrow offer window, could better account for the sequential choice effects seen. The control model was structured as follows: [Choice t ~ actual value t-1 + outcome type t-1 + number videos consumed t + risk t + (1|subject)].

Subject-Specific Choice and Rating Models
To examine individual differences, we fit subject-specific models based on the main choice and rating group-level models. For the subject-specific choice models, we included choice at the current trial as the dependent variable and actual value and outcome type of the prior trial as the independent variables: [Choice t ~ actual value t-1 + outcome type t-1 ]. We extracted the unstandardized outcome-type coefficient that reflected the subject's likelihood to stay following receipt of the good versus bad outcome, with higher values indicating an increased tendency to stay after receiving the bad outcome.
For the subject-specific rating models, we included mean-centered ratings as the dependent variable and actual value and outcome type of the prior trial as independent variables: [Rating t ~ actual value t-1 + outcome type t-1 ]. We again extracted the unstandardized outcome-type coefficient for good versus bad outcomes, with higher coefficients reflecting better ratings for the bad versus good outcome.
We correlated the subject-specific coefficients with trait-level externalizing, using robust partial correlation methods to reduce the influence of outliers and control for age, sex, and ethnicity. We included the age and sex demographic covariates based on prior research linking these variables with self-report and behavioral impulsivity measures (31), and more broadly with externalizing tendencies (32)(33)(34). We also included race/ethnicity, as substance use trajectories through young adulthood may differ by this factor (35). Our primary partial correlations related the two subject-specific coefficients with total ESI scores (distributions shown in Figure 7), and follow-up partial correlations assessed for associations with the substance abuse subfactor and subscales.

Delay-Discounting Comparison Models
Given the extensive literature using traditional binary choice tasks to evaluate externalizing and impulsivity (36)(37)(38), we tested whether metrics from a computerized monetary delay- and probability-discounting paradigm better explained individual differences in externalizing. 2 This entailed subjects making a series of binary choices between hypothetical monetary rewards of different reward magnitudes associated with different temporal delays (e.g., "Would you prefer $5 now or $10 in two weeks?") or probabilities (e.g., "Would you prefer $5 for sure or $10 with a 75% chance?"). Offers ranged from 50 cents to $10. The task lasted approximately 10 min.
A discounting rate (or k-value) was computed for the delay and probability trials separately using a hyperbolic function (39), yielding two k-values per subject. Higher k-values reflect more rapid discounting of delayed rewards and have been linked with impulsivity and addiction (40). For each subject, we checked for nonsystematic data using criteria outlined by Johnson and Bickel (41), and an R² value was calculated to determine how well the data points fit the hyperbolic function. 3 The median R² was 0.86 and 0.91 for the delay- and probability-discounting rates (i.e., logged k-values), respectively. Logged parameter distributions of k from the delay-discounting experiment showed median = −5.26 days⁻¹, SD = 2.05, and from the probability experiment showed median = 0.28% chance⁻¹, SD = 0.84. These results are comparable to those reported in a large sample of healthy adults (31) and suggest that, for our sample, a $10 reward would be generally worth $9.52 after a 10-day delay or $8.75 when equated with a 90% chance.
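The worked dollar values above can be roughly reproduced from the median logged parameters. The sketch below rests on several assumptions that the text does not confirm: that the logs are natural logs, that delay discounting follows V = A / (1 + kD), and that probability discounting uses the same hyperbolic form with odds against, (1 − p) / p, in place of delay. Function names are hypothetical. Under these assumptions the results come out near $9.5 and $8.7, close to but not identical with the reported $9.52 and $8.75.

```python
import math


def delay_discounted_value(amount, delay_days, log_k):
    """Hyperbolic delay discounting, V = A / (1 + k * D).

    log_k is assumed to be a natural log of k (in days^-1).
    """
    k = math.exp(log_k)
    return amount / (1 + k * delay_days)


def probability_discounted_value(amount, p_win, log_h):
    """Hyperbolic probability discounting, V = A / (1 + h * theta).

    theta is the odds against winning, (1 - p) / p; this form and the
    natural-log scale of log_h are assumptions.
    """
    h = math.exp(log_h)
    odds_against = (1 - p_win) / p_win
    return amount / (1 + h * odds_against)


# Median logged parameters reported in the text
value_after_delay = delay_discounted_value(10.0, 10, -5.26)
value_at_90_percent = probability_discounted_value(10.0, 0.90, 0.28)
```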

Subjects Were Willing to Wait for Videos and Showed Individual Preferences
Subjects performed similarly on this task to what was seen in the original Web-Surf Task (7). As shown in Figure 2D and E, subjects showed reliable thresholds that were generally correlated with ratings (median r = 0.66) and with rankings (median r = 0.60). The decision curves of both risky and non-risky decisions depict the expected sigmoid shape, where subjects typically skipped low-valued offers (i.e., expected value < 0) and stayed for high-valued offers (i.e., expected value > 0).

Loss After Risk Influences Choice and Reward Valuation
To address questions of how subjects responded to loss after risk, we examined how risky outcomes impacted decision behaviors and video ratings. Here, a given delay was framed as good, bad, or in-between (mid) depending on its placement within an offer on a risky trial. Note that the true delay was known at the outset of the non-risky trials but was only revealed after the decision to stay on risky trials. Our primary choice model shows that, when controlling for actual value, subjects were less likely to accept a successive risky offer if they previously received a bad outcome than if they had previously received a good outcome (p-adj = 0.01; Figure 3A; Table 1a). Subjects were also slower to make decisions following receipt of the bad outcome (Figure 4; Table 2), suggestive of post-error slowing in response to risky losses.
FIGURE 3 | (B, C) Interactions between previous outcome type and actual value when predicting choices on subsequent risky offers. Black represents the control condition (equivalently valued non-risk offers). Subjects became risk-averse following risky losses of low value, versus risk seeking after risky losses of high value (whereas no associations between value and choice were detected for the relief and control conditions). (D) Mean-centered likability ratings following the receipt of the good, bad, and mid outcomes on the current risk trial. Subjects rated videos that followed bad outcomes more highly than those that followed good outcomes. (E, F) Interactions between previous outcome type and actual value when predicting immediate likability ratings (mean-centered). After a risky loss, subjects tended to rate videos that followed a low-value offer worse than those that followed a high-value offer; the inverse pattern was found for videos linked to good outcomes. A similar pattern emerged when comparing bad outcomes and control trials. Error bars indicate within-subject standard errors. *p < 0.05; **p < 0.01; ***p < 0.001.
Follow-up models clarified these sequential choice effects using subsets of trials matched by the actual value of the previous trial. The first subset included trials for which subjects stayed and received the good or bad outcome on a risky trial and encountered risk on the following trial. The second subset included trials for which subjects stayed and received the bad outcome or stayed on a non-risky trial and encountered risk on the subsequent trial. Trials were matched on a subject-by-subject basis and then combined for the group analysis. Because each subject's contributing trials only included a portion of the possible values, we included actual value as a nested variable in the following model: [Choice t ~ actual value t-1 + outcome type t-1 + actual value t-1 :outcome type t-1 + (actual value t-1 |subject)].
We included the interaction term to test whether framing effects differentially shaped value-by-choice sequencing effects.
For the subset that matched bad- with good-outcome trials, we observed a significant outcome-by-value interaction (p-adj = 0.02; Table 1b); further analyses revealed that the negative framing of the previous outcome impacted relations between value of the previous trial and choice on the current trial (β = 0.014, CI = [0.004, 0.023], p = 0.004; Figure 3B). That is, subjects became risk-averse after receiving a bad offer of lower value and risk seeking after a bad offer of higher value. In contrast, we did not detect an association between the previous trial's value and successive choice after receipt of a good outcome (β = 0.002, CI = [−0.007, 0.011], p = 0.64). We identified a similar (but trend-level) effect for the subset that matched bad outcome with equivalent non-risk offers (outcome-by-value interaction, p-adj = 0.08; Table 1c); follow-up analyses indicated a positive association between value and choice following receipt of a bad outcome (β = 0.012, CI = [0.004, 0.020], p = 0.012; Figure 3C), versus no association for non-risky decisions (β = 0.001, CI = [−0.007, 0.009], p = 0.78). Together, these results suggest that receipt of negatively framed outcomes (or losses), in particular, changed subsequent reward pursuit and decision-making.
But to what extent do losses after risk impact the liking of a reward? Experiments have suggested that subjects take expended costs into account when making valuations (42,43). To address this question, we tested the impact of framing on ratings. The primary rating model showed the opposite pattern to the primary choice model: subjects rated videos that followed a bad outcome more highly than those that followed a good outcome (p-adj = 0.05; Figure 3D-F; Table 1d). We clarified these rating effects using follow-up matched-trial models that compared ratings that followed good versus bad outcomes and ratings that followed bad outcomes versus non-risky offers.
We then fit the following model: [Rating t ~ actual value t-1 + outcome type t-1 + actual value t-

Global Trends Impacted Choices But Not Ratings
We found that subjects were less likely to accept a risky offer versus a non-risky offer as they consumed more videos (significant number of consumed videos × risk interaction, p-adj = 0.004; Figure 5A; Table 3a); that is, subjects became more risk-averse across the session. This interaction remained significant if the consumption variable was replaced with the number of good outcomes or bad outcomes, suggesting that this effect was not solely driven by accumulated negative experiences (but rather, risky rewards became progressively less effective in eliciting reward seeking with ongoing exposure). In comparison, we did not observe a consumption history × risk interaction (p-adj = 0.67) for the rating model (Figure 5B; Table 3b), suggesting that ratings were less impacted by these factors.

Sequential Choice Effects Remained When Accounting for Global Trends
Based on the evidence that subjects grew risk-averse across the session, we built a control model to test whether the global risk-aversion trends (noted above) confounded trial-by-trial framing effects. The control model indicated that trial-by-trial choice effects were not better explained by consumption history (i.e., number of videos consumed) or risk level (i.e., spread of delays on a risky offer; Table 4).

Is the Effect Simply Due to Seeking Gains and Avoiding Losses?
The analyses above showed that the effect of risky trials on subsequent choices depended on the unexpected costs of the trial: a bad outcome meant spending more time than expected and was therefore a loss (worse than expected), while a good outcome meant spending less time than expected and was therefore a gain (better than expected). To test whether this was a general property of unexpected gains and losses, we turned to variability in the ratings within each gallery. While all the videos within a gallery were similar (e.g., cute videos of kittens), each individual video was different. Thus, subjects had an expectation of video quality based on their gallery preferences, but observed a specific video on completion of the delay that might have been better or worse than the average. This produced variability in the post-video ratings: for example, seeing a video rated worse than average was effectively a loss, while a video rated better than average was effectively a gain. In general, we can consider video ratings themselves as a measure of gain/loss.
FIGURE 5 | Global risk trends. (A) Subjects became more risk-averse as the task progressed. (B) Subjects' likability ratings decreased over time but did not differ between risky and non-risky offers. Error bars represent within-subject standard errors. ***p < 0.001.
We used a secondary model to test whether video ratings directly guided future choices under the different conditions of interest (Figure 6; Table 5). We found trend-level interactions between outcome and rating (p-adj = 0.06, p-adj = 0.09): following risky losses, relatively lower ratings predicted risk aversion, whereas relatively higher ratings yielded risk-seeking behaviors (β = 0.040, CI = [0.005, 0.071], p = 0.02). We did not detect associations between video ratings and subsequent choice following good outcomes (β = 0.009, CI = [-0.028, 0.039], p = 0.63) or non-risky trials (β = 0.011, CI = [-0.017, 0.043], p = 0.51). Thus, ratings produced changes in risk seeking only in the context of bad outcomes on risky trials (akin to a win-stay/lose-shift strategy) (44). This implies that there was something different about risky losses that went beyond the mere experience of a less enjoyable reward (since a good outcome of the risky trial leading to a poorly rated video was still a loss but did not impact subsequent reward pursuit). We note that these effects remained when accounting for the number of prior videos consumed.

Failure to Learn from Loss After Risk Correlated With Externalizing Traits
To explore the importance of personality traits to risky decision-making, we investigated whether individuals scoring high on the Externalizing Spectrum Inventory (ESI; 19), which measures a range of impulsive, substance use, and aggressive behaviors, were less influenced by risk when making choices. Figures 7A and 7B show the distribution of observed ESI values. Many subjects who typically score high on externalizing inventories, such as chronic smokers and individuals at risk for addiction, have been seen to be less influenced by risk when making choices (45,46). We examined whether such individuals exhibited similar risk-induced effects on reward valuation (i.e., video ratings). Informed by the group-level model, we computed a parameter that compared a subject's likelihood of accepting a risky offer after receipt of a good versus bad outcome on the prior trial. Individuals scoring high on the ESI showed an inverse pattern to that observed at the group level (partial r = 0.25, p = 0.008; Figure 7C); these individuals were more likely to accept a risky offer after having just received a bad outcome, signifying a potential deficiency in learning from risky losses. In contrast, the association between outcome type and ratings was unrelated to ESI scores (partial r = 0.04, p = 0.47; Figure 7D).⁴ Together, these results indicate that these externalizing traits affected individual differences in reward pursuit but not reward valuation.

FIGURE 6 | Interaction between previous outcome type and rating when predicting choices on subsequent risky offers. Following receipt of the bad outcome, subjects were more risk-averse after lower-rated videos and more risk seeking after higher-rated videos; no association was detected for the other conditions; ratings are mean-centered. Error bars represent within-subject standard errors. Panel annotations: rating × bad vs. good interaction, p = .04; rating × bad vs. no-risk interaction, p = .07; bad, p = .02; good, p = .63; no risk, p = .51.
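The subject-level parameter described above can be illustrated with a deliberately simplified stand-in: the difference between a subject's acceptance rate on risky offers after a bad prior outcome and after a good prior outcome. The text indicates that the actual parameter was informed by the group-level model, so this difference of proportions is a swapped-in approximation, and the function name, data layout, and sign convention (positive = more risk seeking after bad outcomes) are assumptions.

```python
def post_outcome_risk_index(trials):
    """Acceptance rate after bad outcomes minus rate after good outcomes.

    `trials` is a list of (prev_outcome, accepted) pairs for risky
    offers that followed a risky trial, where prev_outcome is "good"
    or "bad" and accepted is True/False. The layout is hypothetical.
    Returns None if either condition has no trials.
    """
    def accept_rate(outcome):
        choices = [accepted for prev, accepted in trials if prev == outcome]
        if not choices:
            return None
        return sum(choices) / len(choices)

    bad_rate = accept_rate("bad")
    good_rate = accept_rate("good")
    if bad_rate is None or good_rate is None:
        return None
    # Positive values: more willing to accept risk after a bad outcome.
    return bad_rate - good_rate

# Placeholder choices for one hypothetical subject.
example_trials = [("bad", True), ("bad", False), ("good", True), ("good", True)]
index = post_outcome_risk_index(example_trials)
```

An index computed this way for each subject could then be related to ESI scores, with the caveat that a model-derived coefficient would additionally adjust for offer characteristics.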
Based on the group-level results above, we used follow-up partial correlations to probe whether reward pursuit was related to the broader substance abuse subfactor (versus general disinhibition and callous aggression), as well as its underlying problem subscales (i.e., alcohol problems, marijuana problems, drug problems). We computed one-tailed robust correlations (i.e., assuming more risk seeking after bad outcomes) and report original and FDR-adjusted p-values that account for the six follow-up correlations.

⁴ We excluded one subject with a coefficient less than 4 standard deviations below the mean.
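For readers unfamiliar with FDR adjustment across a family of tests, the sketch below shows the Benjamini-Hochberg step-up procedure. The text does not state which FDR method was used, so Benjamini-Hochberg is an assumption here, and the six p-values are placeholders rather than results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Six placeholder p-values standing in for six follow-up correlations.
placeholder_p = [0.01, 0.04, 0.03, 0.20, 0.50, 0.02]
placeholder_adjusted = benjamini_hochberg(placeholder_p)
```

Each adjusted value is at least as large as its raw p-value, which is why an effect can be significant before adjustment and trend-level afterward.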

Discounting Rates Did Not Explain the Effects of Externalizing Traits on Reward Pursuit
We computed a series of follow-up robust partial correlations to compare Web-Surf Task-derived metrics with those from a traditional discounting task. The first two correlations predicted total ESI scores from the log-transformed delay and probability k-values, while controlling for age, sex, and ethnicity.
Here we found that discounting rates did not significantly predict externalizing (delay k-value: partial r = 0.08, p = 0.46; probability k-value: partial r = 0.02, p = 0.88). We then checked whether k-values were related specifically to the substance abuse subfactor, given null associations with the total score and our interest in addiction liability. Similarly, k-values were unrelated to substance abuse (delay k-value: partial r = 0.05, p = 0.68; probability k-value: partial r = −0.07, p = 0.57). Lastly, we tested whether the subject-level coefficient from the Web-Surf Task that indicated sequencing responses following receipt of a good versus bad outcome still predicted ESI scores, after controlling for the two k-values and additional covariates. Importantly, the Web-Surf Task parameter capturing reward pursuit following risk still predicted ESI scores, even when accounting for the two k-values (partial r = 0.25, p = 0.03).
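The logic of a partial correlation (the association between two variables after removing what each shares with a set of covariates) can be sketched with the standard recursive formula. This sketch uses ordinary Pearson correlations, not the robust estimator reported in the text, and the variable names and toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Ordinary Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, covariates):
    """Correlation of x and y controlling for a list of covariates.

    Applies the recursive formula, removing one covariate at a time.
    """
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data: x and y are related only through the shared covariate z.
z = [-1, -1, -1, -1, 1, 1, 1, 1]
e1 = [-1, -1, 1, 1, -1, -1, 1, 1]
e2 = [-1, 1, -1, 1, -1, 1, -1, 1]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]
raw_r = pearson(x, y)
adjusted_r = partial_corr(x, y, [z])
```

In the toy data the raw correlation is driven entirely by the shared covariate, so it vanishes once that covariate is controlled; the reported Web-Surf result is the opposite case, where the association persisted after the k-values were controlled.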

DISCUSSION
We assessed the effects of wins and losses on reward valuation and reward pursuit in a new risk variant of the Web-Surf Task. We found that receipt of the bad outcome on a risky gamble influenced both reward valuation and reward pursuit, but in opposite directions; that is, bad outcomes after risk led to reduced reward pursuit and higher reward valuation. Follow-up analyses showed that offer value impacted these effects, whereby low-value risky losses led to risk aversion and lower-than-normal reward valuations, while high-value losses led to risk seeking and higher-than-normal reward valuations. Subjects were also slower to make decisions after bad outcomes, which points to a post-error slowing effect. There was no impact on willingness to take risks following wins after risk situations (better than expected) or after non-risky control trials. Thus, there was something unique about situations in which subjects lost after deciding to take a risk that increased plasticity in risk-seeking behaviors. Importantly, we also found that trait-level externalizing, particularly substance use tendencies, tracked whether these situations influenced future decisions. Externalizing behaviors were not better explained by performance on a traditional discounting task, highlighting the value of foraging behaviors in capturing substance use disorder vulnerabilities.
In line with our hypotheses and prospect theory (13), the framing of an offer relative to the mid-point (versus its absolute value) impacted subsequent reward pursuit and reward valuation. That is, whether an outcome was good or bad relative to the midpoint influenced a subject's performance regardless of whether he or she took the correct action, as determined by comparing risky and non-risky offers of equivalent value, where non-risky outcomes did not influence performance. Framing effects were also not better explained by global trends in risk aversion. Global trend analyses showed that subjects accepted fewer risky deals as the session progressed, which could suggest that subjects became more sensitive to punishment over time and/or that they experienced reward satiety from ongoing reward exposure. Regardless, the tendency to turn down risky deals following bad outcomes remained when accounting for global risk-aversion trends, highlighting the impact of framing effects on risky choices above and beyond other influences.
Choosing to accept a risky deal and finding oneself in the bad outcome, i.e., with a longer delay than expected, may also be seen as a regret-inducing situation. Constructs of regret suggest that regret occurs at the intersection of agency and mistake (47,48), where a subject recognizes that an alternate choice (counterfactual) would have led to a better outcome (49). Counterfactually, the subject could have "just skipped it" if only they had known they were going to get the bad deal. A similar phenomenon has been found in mice running the Restaurant Row task, in which mice show regret-related behaviors after accepting a deal and then quitting out of it, but not after spending the same amount of time deliberating over the offer before skipping it (50).
The finding of slowed reaction times after risky losses is consistent with observations in humans of post-error slowing (51-53) but contrasts with findings that rats and mice respond more quickly to the next trial after making a mistake of their own agency (8,50). There remain several differences between these tasks: 1) the human task presented here included chance and risk, while the rodent tasks were deterministic; 2) humans had brief pre-training, while rodents had months of training; and 3) humans were working for luxury items (videos), while rodents were working for their basic necessities (food intake for the day). And because rodents had a fixed amount of time to consume their meal, there was potentially more impetus to move quickly and consume more food before time ran out. Of course, it is also possible that there could be a species difference in how humans and rodents respond to these tasks, e.g., cross-species divergences in self-evaluation processes following loss could contribute to the observed reaction time differences, although given the similarities recently seen in their response to deliberation and sunk costs (9), this may be less likely. Whether this post-error response inconsistency arises from cross-species differences in response to regret or unique task attributes remains unknown and will have to be left for future study. One possibility is that "regret" is more complicated and that there are differences between realizing that you made a mistake in a situation in which you had all the necessary information to make a better decision versus taking a risk only to find that the answer is not what you hoped for.
Our analyses also revealed that risky losses had an opposite impact on reward valuation, whereby subjects liked videos that followed a bad (long-delay) outcome more than those following a good (short-delay) outcome on risky trials, though we note that the effects of reward valuation were less robust than those for reward pursuit and should be interpreted with caution. These reward valuation results are consistent with economic observations that humans rate outcomes higher when they have spent more on them (54). This suggests that subjects have a backwards-looking view when rating videos that is consistent with explanations of sunk-cost effects seen in human and nonhuman subjects (9,55,56) and with economic explanations for the effect of anticipation on subsequent reward valuation (57). A desire to reduce cognitive dissonance, an aversive mental state that occurs when there is a discrepancy between behavior and attitude (58), could also explain higher ratings following bad outcomes. That is, subjects may have been trying to alter their attitude as a means to reduce psychological discomfort (59).
A key result from this study is that individuals exhibiting greater externalizing disorder vulnerability were more likely to accept a risky offer after receipt of a bad outcome. Critically, our findings were strongest for the substance abuse subfactor, and largely, the alcohol problem subscale, which could reflect the nature of an undergraduate sample. This risky decision-externalizing association is consistent with notions that addiction involves continued reward pursuit despite negative outcomes (60), and could reflect an inability to learn from mistakes (61). These results also speak to dimensional models of psychopathology, given that behavior is correlated with externalizing problems even in the absence of clinical diagnoses.
Compared to reward pursuit, we saw no relation between externalizing and reward evaluation following regret, suggesting that externalizing may have different associations with different facets of the decision process. One hypothesis is that high externalizers do not show differentiation in reward valuation because of a tendency to respond in a socially conforming manner. For instance, prior research suggests that striatal dopamine availability is a common link between the tendency to "fake good," i.e., respond in a socially desirable way (62,63), and impulsivity (64). It is then possible that high-externalizing subjects may conform to the socially expected pattern when evaluating rewards. Similarly, externalizing problem behavior is highly related to cognitive distortions, which is an umbrella term that includes the rationalization (or neutralization) of deviant behavior (65). Here, high-externalizing subjects may rationalize their bad decisions with positive ratings. Future research could directly test these theories by including scales that measure socially desirable responding (e.g., the Marlowe-Crowne Social Desirability Scale; 66) or pre-conscious rationalization (65).
Our data could be explained in part by differences in temporal attention, whereby reward valuation is done by looking backwards, while changes in reward pursuit are done by looking forwards. This leads to a key question of whether these two processes are linked. We found them linked in typical subjects, but our individual-differences analyses revealed that these effects occur through separable processes: more externalizing individuals showed comparable effects of risk on reward valuation but did not subsequently modulate their reward pursuit following regret. In fact, Figure 4 suggests that people scoring high on the ESI may even show the opposite effect, becoming risk seeking after regret-inducing instances. These results are consistent with application of the temporal attention hypothesis to delay discounting, in which a preference for immediate rewards among individuals with addiction is due to a narrowing of temporal attention (67); perhaps high-externalizing subjects have a narrowed attention window that leaves valuation of recent consummatory experiences intact but reduces their capacity to evaluate distal outcomes.
As noted above, externalizing tendencies were not associated with performance on a traditional discounting task. This result diverges from established links between substance abuse and discounting (68,69). One possible explanation is that steeper discounting is more strongly tied to current substance abuse versus a liability towards substance abuse. For instance, while steeper discounting rates are observed in chronic nicotine users, discounting rates have been shown to normalize among ex-smokers (70,71). Gowin et al. (69) observed similar results, where individuals with current alcohol use disorder (AUD) had steeper discounting rates than healthy controls, but individuals with past AUD showed no difference from controls. The fact that our sample includes both individuals with a substance use/abuse history and individuals who are prone to substance use may have reduced our likelihood of capturing such a link. This explanation is in line with Isen et al. (72), who found that hypothetical delay-discounting behaviors did not predict latent trait-externalizing tendencies as similarly measured with the ESI. This again suggests that there may be weaker relationships between discounting behaviors and externalizing liability.

Limitations
A recognized limitation of the current study is the use of an undergraduate sample that was not specifically recruited based on substance use history. However, the fact that we still detected foraging-substance use relations suggests that the task is sensitive to behaviors that are likely present even at the lower end of the externalizing spectrum; this study also provides a set of foundational findings that can be tested in a confirmatory manner to clarify whether reward pursuit during foraging similarly tracks recreational and problematic substance use in the broader community and among individuals with varying levels of usage. Another limitation is the lack of consumption or craving measurements, as these factors could moderate the observed effects. We also acknowledge that the consequences of a risky loss on the Web-Surf Task are small relative to real-life consequences like filing for bankruptcy, losing transportation options following a DUI, or being imprisoned; but if we find substance use associations when the stakes are low, we might expect greater effects as substance use becomes more chronic and/or problematic.

Conclusions
Our results suggest a dissociation among individuals with greater substance use disorder vulnerability: costly experiences served to enhance reward value but did not impact subsequent reward pursuit following regret. Taken together, a blunted sense of regret may result in an overvaluation of risky losses that in turn drives the continued pursuit of risky endeavors. Future work will assess the impact of risky losses while foraging in clinical samples.