Dissociable mechanisms of speed-accuracy tradeoff during visual perceptual learning are revealed by a hierarchical drift-diffusion model

Two phenomena are commonly observed in decision-making. First, there is a speed-accuracy tradeoff (SAT) such that decisions are slower and more accurate when instructions emphasize accuracy over speed, and vice versa. Second, decision performance improves with practice, as a task is learnt. The SAT and learning effects have been explained under a well-established evidence-accumulation framework for decision-making, which suggests that evidence supporting each choice is accumulated over time, and a decision is committed to when the accumulated evidence reaches a decision boundary. This framework suggests that changing the decision boundary creates the tradeoff between decision speed and accuracy, while increasing the rate of accumulation leads to more accurate and faster decisions after learning. However, recent studies challenged the view that SAT and learning are associated with changes in distinct, single decision parameters. Further, the influence of speed-accuracy instructions over the course of learning remains largely unknown. Here, we used a hierarchical drift-diffusion model to examine the SAT during learning of a coherent motion discrimination task across multiple training sessions, and a transfer test session. The influence of speed-accuracy instructions was robust over training and generalized across untrained stimulus features. Emphasizing decision accuracy rather than speed was associated with increased boundary separation, drift rate and non-decision time at the beginning of training. However, after training, an emphasis on decision accuracy was only associated with increased boundary separation. In addition, faster and more accurate decisions after learning were due to a gradual decrease in boundary separation and an increase in drift rate. The results suggest that speed-accuracy instructions and learning differentially shape decision-making processes at different time scales.


INTRODUCTION
When making choices under time and resource constraints, more accurate decisions are often achievable at the cost of longer time, while faster responses are more error-prone. This phenomenon of speed-accuracy tradeoff (SAT) is ubiquitous across species and tasks (Schouten and Bekker, 1967; Wickelgren, 1977; Chittka et al., 2009), from collective foraging behavior in insects (Chittka et al., 2003; Franks et al., 2003; Marshall et al., 2006) to simple perceptual decisions in mammals (Uchida and Mainen, 2003; Heitz and Schall, 2012), and to complex strategic judgments in humans (Beersma et al., 2003).
Most studies on the SAT compare behavioral performance under instructions of speed or accuracy emphasis. Humans can effectively trade accuracy for speed when instructed to respond as fast as possible, or vice versa when instructed to respond accurately. A change between speed and accuracy instructions can rapidly switch one's behavior between short blocks of trials (Ratcliff and Rouder, 1998; Mulder et al., 2013) or even between two single trials (Forstmann et al., 2008; Ivanoff et al., 2008), suggesting that such instruction-induced SAT is embodied in the decision-making process. This is consistent with recent findings that the SAT in sensory-motor tasks is associated with neural activity in areas involved in perceptual decisions and cognitive control, such as the (pre-)supplementary motor area, the frontal eye field, the anterior cingulate cortex, the striatum, and the dorsolateral prefrontal cortex (Forstmann et al., 2008; Ivanoff et al., 2008; Van Veen et al., 2008; Wylie et al., 2009; Blumen et al., 2011; Heitz and Schall, 2012).
While decisions can be rapidly adjusted in response to speed-accuracy instructions, they are also largely influenced by training and practice over a much longer time frame. It is well established that prolonged practice gradually improves task performance, resulting in higher accuracy and faster responses (Logan, 1992; Heathcote et al., 2000). Similar to the SAT, the effect of perceptual learning is observed across species (Trobalon et al., 1992; Li et al., 2004) and sensory modalities (Fahle and Poggio, 2002), but there are clear distinctions between the two. For simple visual perceptual decisions, performance improvement through perceptual learning is usually specific to stimuli similar to those used in training, and does not fully generalize to other stimuli when the tasks are difficult (Ahissar and Hochstein, 1997; Green and Bavelier, 2003). Practice on more complex tasks, however, may improve performance in other tasks (Green and Bavelier, 2003). Unlike the SAT, the perceptual learning process can be automatic, without conscious insight into the task. For example, motion discrimination improves when participants are exposed to subliminal motion stimuli while performing a motion-irrelevant task (Watanabe et al., 2001). The specificity, generalizability, and implicit nature of perceptual learning indicate changes in early sensory processing as well as top-down influences during the learning process (Gilbert et al., 2001; Furmanski et al., 2004; Yang and Maunsell, 2004; Fahle, 2005; Bao et al., 2010).
The cognitive processes underpinning SAT and perceptual learning have previously been investigated by using the drift-diffusion model (DDM) (Stone, 1960; Ratcliff, 1978). The DDM belongs to a large family of decision-making models, namely sequential sampling models (Wald, 1947; Lehmann, 1959; Stone, 1960; Link, 1975; Link and Heath, 1975; Townsend and Ashby, 1983; Luce, 1986; Ratcliff and Smith, 2004; Smith and Ratcliff, 2004; Bogacz et al., 2006).
These models assume that information supporting decisions is represented by a stream of noisy observations over time, and conceptualize decision-making as an information accumulation process: momentary evidence is accumulated over time, which reduces the noise in the evidence and thereby facilitates more accurate decisions. Sequential sampling models have proven successful in providing a close fit to response accuracy and response time (RT) distributions (e.g., Ratcliff and Rouder, 1998), and are consistent with the identification of putative neural accumulators in the cortex from neurophysiological (Kim and Shadlen, 1999; Shadlen and Newsome, 2001; Roitman and Shadlen, 2002; Schall, 2002; Mazurek et al., 2003; Huk and Shadlen, 2005; Hanks et al., 2006; Gold and Shadlen, 2007) and neuroimaging studies (Ploran et al., 2007; Heekeren et al., 2008; Ho et al., 2009; Kayser et al., 2010; Zhang et al., 2012).
The DDM is one of the most prominent sequential sampling models for two-choice decisions. It has been applied to a number of perceptual and cognitive tasks, including memory retrieval (Ratcliff, 1978), lexical decisions (Wagenmakers et al., 2008), visual discrimination (Ratcliff, 2002; Palmer et al., 2005), and categorization (Nosofsky and Palmeri, 1997). The model implies a single accumulator integrating the sample evidence according to a stochastic diffusion process, until the accumulated evidence reaches one of the two decision boundaries, corresponding to the two choice alternatives. As such, the model decomposes behavioral data into four parameters mapped onto latent psychological processes (Figure 1): boundary separation a for response caution, drift rate v for speed of accumulation, starting point z for a priori response bias, and non-decision time T_er for stimulus encoding and response execution latencies (Wagenmakers, 2009). Trial-to-trial variability in model parameters can be included to improve the model fits to experimental data (Laming, 1968; Ratcliff, 1978; Ratcliff et al., 1999; Ratcliff and Tuerlinckx, 2002).
FIGURE 1 | Examples of trajectories of the drift-diffusion model. Two decision boundaries (0 and a) represent the "leftward" and "rightward" decisions in the motion discrimination task. The drift rate v represents mean sensory evidence per unit of time. The magnitude of v is determined by the quality of the evidence. A positive v (as shown in the figure) indicates that the upper boundary is the correct choice. The diffusion process starts at a starting point between the two boundaries (denoted as a proportion of a by z) and continues until the accumulated evidence reaches one of the two boundaries. If the correct boundary is hit (blue sample paths), the model makes a correct decision. Because of noise, the model may sometimes hit the incorrect boundary (red sample path). The predicted response time (RT) is the sum of the duration of the diffusion process and the non-decision time T_er.
Behavioral changes in SAT and perceptual learning can be explained by different parameter changes in the DDM. The SAT can be simply quantified by the separation of the two decision boundaries. When response speed is emphasized, the distance between decision boundaries is decreased. This reduces the amount of accumulated evidence prior to a decision (i.e., faster RT) and increases the chance of hitting the wrong decision boundary (i.e., lower accuracy). When accuracy is emphasized, the distance between decision boundaries is increased and the model predicts slower RT and higher accuracy, because more evidence needs to be accumulated prior to a decision. It has indeed been shown that emphasizing decision speed or accuracy leads to changes in the boundary separation (Ratcliff and Rouder, 2000). A few recent studies have also applied the DDM to perceptual learning and identified two separate learning mechanisms (Dutilh et al., 2009, 2011; Petrov et al., 2011). First, training and practice are associated with an increase in the drift rate, leading to higher accuracy and faster RT (Dutilh et al., 2009; Wagenmakers, 2009). The drift rate change is consistent with most learning theories, which hold that the quality of sensory processing improves during training (Ahissar and Hochstein, 2004). Second, perceptual learning has been shown to decrease the non-decision time, which may be due to an increase in familiarity with the stimuli and task after training (Dutilh et al., 2009, 2011; Petrov et al., 2011).
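The qualitative predictions above can be illustrated with the standard closed-form expressions for a simplified DDM. The following is a minimal sketch, assuming an unbiased starting point (z = 0.5), no trial-to-trial variability, and a within-trial noise scale of s = 1; the function name and the parameter values are hypothetical and are not estimates from this study.

```python
import math

def ddm_predictions(a, v, s=1.0, t_er=0.0):
    """Closed-form accuracy and mean RT of a simplified DDM.

    Assumes the process starts midway between the boundaries (0 and a)
    and that no parameter varies from trial to trial.
    """
    if v == 0:
        # Limiting case of zero drift: chance accuracy, pure diffusion time.
        return 0.5, a ** 2 / (4.0 * s ** 2) + t_er
    accuracy = 1.0 / (1.0 + math.exp(-a * v / s ** 2))
    mean_decision_time = (a / (2.0 * v)) * math.tanh(a * v / (2.0 * s ** 2))
    return accuracy, mean_decision_time + t_er

# Illustrative (hypothetical) parameter values.
acc_narrow, rt_narrow = ddm_predictions(a=1.0, v=1.0)   # speed emphasis
acc_wide, rt_wide = ddm_predictions(a=2.0, v=1.0)       # accuracy emphasis
acc_learned, rt_learned = ddm_predictions(a=1.0, v=2.0) # higher drift rate
```

Under these assumptions, widening the boundary separation raises both accuracy and mean RT, whereas raising the drift rate raises accuracy while shortening mean RT.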
However, two important issues remain unsolved. First, although previous research proposed that emphasizing speed or accuracy influences only the boundary separation (Ratcliff and Rouder, 1998; Wagenmakers et al., 2008), recent studies showed that speed-accuracy instructions affect two other model parameters: drift rate (Vandekerckhove et al., 2011; Rae et al., in press) and non-decision time (Osman et al., 2000; Rinkenauer et al., 2004; Voss et al., 2004; Mulder et al., 2010, 2013). Therefore, it is necessary to examine whether other model parameters are indeed affected by speed emphasis or accuracy emphasis instructions.
Second, previous studies of the SAT and perceptual learning have been largely independent, partly because of the different time scales on which the two effects operate. However, since speed-accuracy instructions and learning can affect the same decision parameters, it is necessary to study these two different task conditions in a single experiment. Here we test the intriguing hypothesis that the SAT can be efficiently manipulated over the course of learning a new task. One might establish a stable tradeoff between speed and accuracy throughout learning, according to the task instructions. Alternatively, the effects of speed-accuracy instructions in a new task may be different from those in the same task after substantial practice.
The current study examined changes in decision performance and underlying cognitive mechanisms when SAT was manipulated throughout the course of learning. During multiple training sessions, participants learned to perform a coherent motion discrimination task under speed or accuracy emphasis (Figure 2A). Speed-accuracy instructions efficiently modulated participants' behavior between short blocks of trials across all sessions, and training gradually improved performance specific to the trained directions. By fitting the DDM using a Bayesian parameter estimation approach, we quantified the influence of speed-accuracy instructions and learning on the model parameters. Emphasizing decision accuracy rather than speed was related to increased boundary separation, drift rate and non-decision time at the beginning of training. In contrast, the emphasis on accuracy was only related to increased boundary separation after training. Furthermore, faster and more accurate decisions after learning were mainly due to a decrease in boundary separation and an increase in drift rate. Our results demonstrate that decision-making processes are differentially influenced by speed-accuracy instructions and training at different time scales and different stages of learning.

PARTICIPANTS
Six adults (four females) between the ages of 21 and 35 years (mean age, 25.50 years) participated in the experiment. All participants were right-handed with normal hearing and normal or corrected-to-normal vision, and none reported a history of significant neurological or psychiatric illness. None had previous experience with the task. All participants signed a written informed consent form before starting the experiment. The study was approved by the Cambridge Psychology Research Ethics Committee.

APPARATUS
The experiment was conducted in a darkened testing room. Each participant's head rested in a chinrest to stabilize the head position and control viewing distance. A computer (Dell Optiplex 745) controlled stimulus delivery and recorded behavioral responses. Visual stimuli were presented on a 21-inch CRT monitor (Dell P1130) with a resolution of 1024 by 768 pixels and a refresh rate of 85 Hz, located 47.50 cm in front of the participants. Participants' responses were collected from a two-button response box. The experiment was written in Matlab 7.8 (The MathWorks, Natick, USA) and used the Psychophysics Toolbox 3 extensions (Brainard, 1997).
FIGURE 2 | Behavioral paradigm. (A) Structure of a single trial in the accuracy condition. A fixation point was presented for 1000 ms. The random dot kinematogram was then presented for a maximum of 2400 ms, during which participants made a binary decision on whether the coherent motion direction was leftward or rightward by pressing one of the two response buttons. For a correct response, a smiley face was presented for 500 ms and 50 points were credited. For an incorrect response, a sad face was presented and 20 points were lost, together with auditory feedback. The payoff in the speed condition was slightly different (see section Task and Procedure for more details). The intertrial interval (ITI) was randomized between 1200 and 1600 ms. (B) Training procedure across six sessions. In the first five sessions, half of the participants trained at two directions (30° and 210°), and the other half trained at two different directions (150° and 330°). In the sixth session, all participants performed the task at two new directions that were not presented in their first five sessions (i.e., untrained directions).

STIMULI
The stimuli were random-dot kinematograms displayed within a central invisible circular aperture (12° diameter) on a black background (100% contrast). Dot density was 16.53 dots per deg² per s and the minimum distance between any two dots in each frame was 0.48°. Each dot was white and subtended a visual angle of 0.12° at the screen center. The motion stimulus was formed by interleaving three uncorrelated sequences of dot positions at a rate of 85 frames/s, similar to stimuli described elsewhere (Britten et al., 1993; Shadlen and Newsome, 2001; Roitman and Shadlen, 2002; Pilly and Seitz, 2009). To introduce coherent motion information, in each frame a fixed proportion (10.71%) of the dots was replotted at an appropriate spatial displacement in the direction of motion (10°/s velocity), relative to their positions three frames earlier, and the rest of the dots were replotted at random locations within the aperture. For example, three uncorrelated sets of dots were plotted in the first three frames. A proportion of dots (i.e., the signal dots) in frame 1 moved in frame 4 with spatial displacements, and then a proportion of dots in frame 2 moved in frame 5, and so on. Signal dots that moved outside the aperture were wrapped around from the opposite direction of motion to conserve dot density and avoid attention cues along edges. The coherent dot motion in each trial was in one of four non-cardinal directions (30°, 150°, 210°, and 330°).
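The interleaving scheme described above can be sketched as follows. This is an illustrative reconstruction in Python, not the Matlab/Psychophysics Toolbox code used in the experiment. It assumes that the number of dots per frame follows from the stated density, aperture area and frame rate, it ignores the minimum inter-dot distance, and it uses a simplified wrap-around rule (a dot leaving the aperture re-enters on the opposite side at the same overshoot distance). All function and constant names are hypothetical.

```python
import numpy as np

APERTURE_RADIUS = 6.0   # deg (12 deg diameter)
FRAME_RATE = 85.0       # frames per second
SPEED = 10.0            # deg/s
COHERENCE = 0.1071      # proportion of signal dots
N_SETS = 3              # interleaved, uncorrelated dot sequences
DENSITY = 16.53         # dots per deg^2 per s

def dots_per_frame():
    """Assumed conversion of dot density into a per-frame dot count."""
    area = np.pi * APERTURE_RADIUS ** 2
    return int(round(DENSITY * area / FRAME_RATE))

def random_positions(n, rng):
    """Uniformly distributed positions inside the circular aperture."""
    r = APERTURE_RADIUS * np.sqrt(rng.random(n))
    theta = 2.0 * np.pi * rng.random(n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

def update_set(pos, direction_deg, rng):
    """Replot one dot set relative to its positions three frames earlier."""
    n = len(pos)
    n_signal = int(round(COHERENCE * n))
    signal = rng.permutation(n)[:n_signal]
    new = random_positions(n, rng)               # noise dots: random replot
    step = SPEED * N_SETS / FRAME_RATE           # displacement over 3 frames
    ang = np.deg2rad(direction_deg)
    new[signal] = pos[signal] + step * np.array([np.cos(ang), np.sin(ang)])
    radius = np.hypot(new[:, 0], new[:, 1])
    out = radius > APERTURE_RADIUS
    # Simplified wrap-around: re-enter from the opposite side of the aperture.
    new[out] *= -((2.0 * APERTURE_RADIUS - radius[out]) / radius[out])[:, None]
    return new

def generate_frames(n_frames, direction_deg, seed=0):
    """Return a list of (n_dots, 2) arrays, one per displayed frame."""
    rng = np.random.default_rng(seed)
    n = dots_per_frame()
    sets = [random_positions(n, rng) for _ in range(N_SETS)]
    frames = []
    for f in range(n_frames):
        k = f % N_SETS
        if f >= N_SETS:
            sets[k] = update_set(sets[k], direction_deg, rng)
        frames.append(sets[k].copy())
    return frames
```

For example, `generate_frames(85, 30.0)` would produce one second of stimulus frames with coherent motion toward 30°.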

TASK AND PROCEDURE
All participants completed six behavioral sessions conducted on different days. Participants performed a two-alternative forced-choice task in all sessions, deciding whether the coherent motion direction of the random-dot stimulus was leftward (toward 150° or 210°) or rightward (toward 30° or 330°) (Figure 2A). Participants responded by pressing the left button (for leftward decisions) or the right button (for rightward decisions) on the response box with their right index and middle fingers. In the first five sessions, the random-dot stimulus was always presented at two possible directions along a line (e.g., 30° and 210°), which are referred to as the trained directions. In the sixth session, the stimulus was only presented at the other two directions (e.g., 150° and 330°), which are referred to as the untrained directions. One-half of the participants were trained at the 30° and 210° directions and the other half were trained at the 150° and 330° directions in their first five sessions (Figure 2B).
Each experimental session comprised 672 trials, which were divided into 12 blocks of 56 trials. Each block had 50% leftward motion trials and 50% rightward motion trials in a randomized order. Participants took short breaks between blocks. The speed-accuracy manipulation was introduced at the block level: each session comprised 6 accuracy blocks and 6 speed blocks. The first block of each session was always an accuracy block, and the order of the accuracy/speed instructions in the rest of the blocks was randomized across sessions and participants. At the beginning of an accuracy block, the text instruction "Be accurate this time" was presented on the screen in blue (RGB = 5, 137, 255), indicating that the participants should respond as accurately as possible. At the beginning of a speed block, the text instruction "Be fast this time" was presented in red (RGB = 255, 2, 2), indicating that the participants should respond as fast as possible. To ensure participants could easily identify the task instructions during the experiment, a text cue was presented at the top center of the screen throughout each block: "ACC" in blue (RGB = 5, 137, 255) for accuracy blocks, and "SPD" in red (RGB = 255, 2, 2) for speed blocks. Before the first and the 29th trials of each block, four parallel gray lines (RGB = 100, 100, 100; 0.05° thick, 4° apart) were presented within the circular aperture for 2000 ms, indicating the two possible motion directions in the current block (30° and 210°, or 150° and 330°). Before the first session, each participant was familiarized with the task during a short practice run comprising 16 trials for the accuracy condition and 16 trials for the speed condition, during which the proportion of coherently moving dots was set at a high level of 80%.
Each trial began with the presentation of a fixation point (0.12° diameter) at the center of the screen, which was illuminated for 1000 ms, followed by the random-dot stimulus onset. The stimulus was presented for a maximum of 2400 ms, during which the participants were instructed to perform the motion discrimination task under accuracy or speed emphasis. The random-dot stimulus disappeared as soon as a response was made, or the maximum duration was reached. The RT on each trial was measured from the stimulus onset until the participant made a response. Feedback was given 100 ms after the stimulus offset, followed by an intertrial interval randomized between 1200 and 1600 ms (Figure 2A).
To help the participants engage in the task and effectively adjust their decision processes to the speed-accuracy instructions, three types of feedback were given in the form of text, auditory beeps (tone with frequency of 600 Hz and duration of 0.15 s), and bonus points (see Petrov et al., 2011; Mulder et al., 2013 for similar multi-session designs using bonus points). If the participant failed to respond within 2200 ms or responded within 100 ms, a red warning message "Too slow!" or "Too fast!" was presented for a prolonged period (1500 ms) together with a beep, and the participant lost 50 points. In the accuracy condition, if the participant made a correct response, a smiley face was presented for 500 ms and 50 bonus points were credited. For an incorrect response, a sad face was presented for 500 ms, a beep was given, and the participant lost 20 points. In the speed condition, when the participant failed to respond within a time limit, a red text "Too slow!" and a beep were given and the participant lost 20 points. No further feedback about the accuracy of the participant's response was given (i.e., they would also lose 20 points for a correct but overtime response). For each session and each participant, the time limit for the speed condition was defined as the 40% quantile of the RT distribution from the participant's first accuracy block in that specific session (see Mulder et al., 2013 for another way of defining a participant-specific time limit). If the participant's response was within the time limit, the same type of feedback was given for correct and incorrect responses as in the accuracy condition, but the participant would only lose 10 points for an incorrect response (i.e., a smaller penalty for errors when speeded responses were instructed). Participants started with zero bonus points at the beginning of each session and the cumulative bonus points were displayed at the bottom of the screen throughout the session.
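The payoff rules above can be summarized in a short sketch. This is an illustration rather than the experiment code: the function names are hypothetical, the treatment of RTs exactly equal to 100 or 2200 ms is an assumption, and the 40% quantile is computed with NumPy's default linear interpolation, which may differ from the method actually used.

```python
import numpy as np

def speed_time_limit(first_accuracy_block_rts_ms):
    """Time limit for speed blocks: 40% quantile of the RTs (in ms)
    from the first accuracy block of the session."""
    return float(np.quantile(first_accuracy_block_rts_ms, 0.4))

def score_trial(condition, rt_ms, correct, time_limit_ms=None):
    """Return the change in bonus points for a single trial.

    condition: "accuracy" or "speed"; rt_ms is None if no response was made.
    """
    # Missed or anticipatory responses are penalized in both conditions.
    if rt_ms is None or rt_ms > 2200 or rt_ms < 100:
        return -50
    if condition == "accuracy":
        return 50 if correct else -20
    if condition == "speed":
        if rt_ms > time_limit_ms:
            return -20                 # overtime, regardless of correctness
        return 50 if correct else -10  # smaller penalty for fast errors
    raise ValueError("condition must be 'accuracy' or 'speed'")
```

For instance, a correct response at 700 ms in a speed block with a 600 ms limit would cost 20 points, whereas the same response in an accuracy block would earn 50 points.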

DATA PROCESSING AND ANALYSIS
To eliminate fast guesses, trials with RT faster than 100 ms were removed from further analysis. Trials without a valid response within 2200 ms after the random-dot stimulus onset were also removed. The discarded trials only accounted for 0.3% of all trials. Decision accuracies (proportion of correct responses) and mean RTs from each session were entered into two separate repeated-measures ANOVAs for group analyses, with task conditions (accuracy and speed instructions) and sessions as factors.
Randomization tests were used to examine the statistical significance at the single-subject level (Edgington, 1995; Coolican, 2009). For example, to test whether a single participant had different RT between the speed and accuracy conditions, we first estimated the mean RT separately from each block in each session of the participant, resulting in RT samples from 36 speed blocks and 36 accuracy blocks. The observed RT difference between the two task conditions was quantified by the sample t-value (mean difference between the data from the speed emphasis and accuracy emphasis conditions divided by the standard error of the difference). If the null hypothesis is true, there is no difference between task conditions, and the samples are exchangeable between conditions. We therefore generated a null distribution of the test statistic from 100,000 permutations, with the condition labels randomly shuffled in each permutation. The permutation p-value was then calculated as the proportion of the randomized samples whose test statistic exceeded the observed test statistic. The same randomization procedure was applied to test the learning effects between sessions (Table 1).
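A minimal sketch of such a randomization test is given below. It assumes an unpooled (Welch-style) standard error and a two-sided comparison of absolute t-values, neither of which is specified in the text, and the block-wise RT values in the usage example are synthetic.

```python
import numpy as np

def t_value(x, y):
    """Mean difference divided by the standard error of the difference."""
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return (x.mean() - y.mean()) / se

def permutation_test(x, y, n_perm=100_000, seed=0):
    """Return the observed t-value and its permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = t_value(x, y)
    pooled = np.concatenate([x, y])
    n_extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)   # shuffle condition labels
        t = t_value(shuffled[:len(x)], shuffled[len(x):])
        if abs(t) >= abs(observed):
            n_extreme += 1
    return observed, n_extreme / n_perm

# Synthetic block-wise mean RTs (s): 36 accuracy blocks and 36 speed blocks.
rng = np.random.default_rng(1)
rt_accuracy = rng.normal(0.9, 0.1, 36)
rt_speed = rng.normal(0.6, 0.1, 36)
t_obs, p_perm = permutation_test(rt_accuracy, rt_speed, n_perm=2000)
```

The usage example runs 2000 permutations to keep it fast; the analysis described above used 100,000.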

HIERARCHICAL DRIFT-DIFFUSION MODEL
A full version of the DDM was fitted to each participant's accuracy and RT distribution. The model consists of seven parameters (Wagenmakers, 2009). (1) Boundary separation a (a > 0). (2) Mean drift rate v. (3) Mean response bias z as a proportion of boundary separation (0 < z < 1), which gives the starting point of the diffusion process relative to the two boundaries (z * a). Thus, values of z > 0.5 indicate an a priori bias toward the upper boundary (right button press) and values of z < 0.5 indicate a bias toward the lower boundary (left button press). (4) Mean non-decision time T_er. (5) Normally distributed trial-by-trial variability in drift rate s_v. (6) Uniformly distributed trial-by-trial variability in response bias s_z. (7) Uniformly distributed trial-by-trial variability in non-decision time s_t. The model predicts a binary choice according to whether the upper or the lower boundary is reached, and predicts the observed RT as the sum of the decision time (i.e., the latency for the accumulator to reach one of the boundaries) and the non-decision time.
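To make the roles of the seven parameters concrete, a single trial of the full model can be simulated with a simple Euler scheme. The sketch below is an illustration only, not the likelihood-based fitting procedure used in this study. It assumes a within-trial noise scale of 1, a fixed time step, and uniform variability ranges of width s_z and s_t centered on z and T_er; the parameter values in the usage example are hypothetical.

```python
import numpy as np

def simulate_trial(a, v, z, t_er, s_v=0.0, s_z=0.0, s_t=0.0,
                   dt=0.001, noise=1.0, max_t=5.0, rng=None):
    """Simulate one DDM trial; return (choice, rt).

    choice is 1 for the upper boundary, 0 for the lower boundary,
    and None if no boundary is reached within max_t seconds.
    """
    if rng is None:
        rng = np.random.default_rng()
    drift = rng.normal(v, s_v)                        # trial-specific drift rate
    start = z + rng.uniform(-s_z / 2.0, s_z / 2.0)    # trial-specific bias
    ndt = t_er + rng.uniform(-s_t / 2.0, s_t / 2.0)   # trial-specific T_er
    x = start * a                                     # starting point z * a
    t = 0.0
    while 0.0 < x < a and t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    if x >= a:
        choice = 1
    elif x <= 0.0:
        choice = 0
    else:
        choice = None
    return choice, t + ndt

# Hypothetical parameter values, for illustration only.
rng = np.random.default_rng(0)
trials = [simulate_trial(a=1.5, v=1.5, z=0.5, t_er=0.3, rng=rng)
          for _ in range(300)]
upper_rate = np.mean([choice == 1 for choice, _ in trials])
```

With a positive drift rate, most simulated trials terminate at the upper boundary, and every simulated RT exceeds the non-decision time.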
We used the hierarchical drift-diffusion model toolbox to fit the data (Wiecki et al., 2013). The hierarchical extension of the DDM assumes that the model parameters for individual participants are random samples drawn from group-level distributions, and uses Bayesian statistical methods to simultaneously estimate all parameters at both the group level and the individual-subject level (Vandekerckhove et al., 2011). The Bayesian approach for parameter estimation has two advantages. First, the Bayesian approach is more robust in recovering model parameters when less data is available (Matzke et al., 2013; Wiecki et al., 2013). Second, Bayesian estimation generates joint posterior distributions of all model parameters, given the observed experimental data. The posterior parameter distribution provides not only a point estimate, but also the uncertainty of the estimate, and can be straightforwardly applied for Bayesian inference (Gelman et al., 2004). For example, let P_{Post|Data}(a_accuracy) and P_{Post|Data}(a_speed) be the marginal posteriors for the boundary separation from the accuracy and speed conditions. To test whether the boundary separation in the accuracy condition is larger than that in the speed condition, we can directly calculate the probability that the difference between the two parameters is larger than zero, P_{Post|Data}(a_accuracy - a_speed > 0), from the posterior distributions; a high probability indicates strong evidence in favor of the tested hypothesis.
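Given posterior samples of two parameters, this probability is simply the proportion of samples in which the difference is positive. The sketch below assumes that the two sample arrays come from the same MCMC iterations, so that they can be differenced element-wise; the function name is hypothetical and the samples in the usage example are synthetic stand-ins for posterior draws.

```python
import numpy as np

def prob_greater(samples_x, samples_y):
    """Proportion of paired posterior samples for which x - y > 0."""
    diff = np.asarray(samples_x, dtype=float) - np.asarray(samples_y, dtype=float)
    return float(np.mean(diff > 0))

# Synthetic stand-ins for posterior samples of boundary separation.
rng = np.random.default_rng(0)
a_accuracy = rng.normal(2.0, 0.1, 10_000)
a_speed = rng.normal(1.5, 0.1, 10_000)
p_post = prob_greater(a_accuracy, a_speed)
```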
Performance differences between speed-accuracy conditions and between sessions suggest changes in one or more model parameters across task conditions and sessions. We therefore examined seven variants of the DDM with different parameter constraints between the two task conditions. The seven models differed on whether the boundary separation a, the drift rate v, the non-decision time T_er, or a combination of the three parameters varied between the accuracy and speed conditions (Figure 4). In all the models, the four key parameters (a, v, T_er, and z) were allowed to vary between sessions and were estimated at both the individual-subject level and the group level. The trial-by-trial variability parameters (s_v, s_t, and s_z) were shared between sessions and were estimated only at the group level, because it has been shown that the DDM with variability parameters fixed across multiple sessions provided a better explanation of the data (Liu and Watanabe, 2012). Similar to previous studies, the response bias parameter was set to vary between sessions but was invariant between task conditions (Mulder et al., 2013).
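The seven variants correspond to the non-empty subsets of the three parameters that may vary between task conditions. A short sketch of this enumeration follows; the parameter labels are hypothetical shorthand, and the ordering produced here is not claimed to match the model numbering in Figure 4.

```python
from itertools import combinations

PARAMS = ("a", "v", "t_er")

def model_variants():
    """List every non-empty subset of parameters free to vary by condition."""
    variants = []
    for k in range(len(PARAMS), 0, -1):
        for free in combinations(PARAMS, k):
            # True: varies between speed and accuracy; False: invariant.
            variants.append({p: (p in free) for p in PARAMS})
    return variants

variants = model_variants()
```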
For each model, we generated 15,000 samples from the joint posterior distribution of all model parameters by using Markov chain Monte Carlo methods (Gamerman and Lopes, 2006) and discarded the first 5000 samples as burn-in (see Wiecki et al., 2013 for a more detailed description of the procedure). The convergence of the Markov chains was assessed using the Geweke statistic (Gelman and Rubin, 1992). Parameter estimates in all models had converged after 15,000 samples.
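The idea behind this convergence check can be sketched as follows. This is a simplified version that compares the mean of an early segment of the retained chain with the mean of a late segment using naive sample variances. The standard Geweke diagnostic instead uses spectral density estimates to account for autocorrelation, so this sketch assumes approximately independent draws and is not the toolbox implementation. The 10% and 50% segment sizes are the conventional choices, and the chain in the usage example is synthetic.

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score: early-segment mean vs late-segment mean."""
    chain = np.asarray(chain, dtype=float)
    n = len(chain)
    head = chain[: int(first * n)]
    tail = chain[int((1.0 - last) * n):]
    se = np.sqrt(head.var(ddof=1) / len(head) + tail.var(ddof=1) / len(tail))
    return (head.mean() - tail.mean()) / se

# Synthetic chain: 15,000 draws, first 5000 discarded as burn-in.
rng = np.random.default_rng(1)
samples = rng.normal(size=15_000)
kept = samples[5000:]
z_stationary = geweke_z(kept)

# A chain with a slow trend should produce a large absolute z-score.
drifting = kept + np.linspace(0.0, 5.0, len(kept))
z_drifting = geweke_z(drifting)
```

Absolute z-scores well within about 2 are usually taken as consistent with convergence, whereas large values indicate that the early and late parts of the chain differ.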
During the first five training sessions, behavioral performance at the trained directions gradually improved, as shown by a significant linear increase of accuracy [F(1, 5) = 102.07, p < 0.0001, partial η² = 0.95] and a linear decrease of RT [F(1, 5) = 53.37, p < 0.001, partial η² = 0.91] over training. To examine whether the behavioral improvement at the trained directions generalized to other directions, we compared participants' performance between the 5th session (i.e., the last session at the trained directions) and the 6th session (i.e., untrained directions after training). The learning effect on decision accuracy was specific to individual participants' trained directions, as the accuracy was significantly lower at the untrained directions than the trained directions [F(1, 5) = 73.56, p < 0.0001, partial η² = …].
TABLE 1 | Notes: The SAT effects compared the behavioral performance between accuracy and speed conditions across all sessions. The learning effects compared the performance between sessions 1 and 5. The learning generalization effects compared the accuracy and RT between sessions 5 and 6 (i.e., performance at the untrained directions). Differences between conditions were quantified by sample t-values. Each p-value was obtained from 100,000 permutations of data samples (see section Data Processing and Analysis for details).
These results indicate strong group effects of speed-accuracy instructions and learning in perceptual decisions. Since the experiment collected a substantial amount of data from individual participants, it is informative to further examine whether each individual's performance is consistent with the group effects above (Coolican, 2009; Barnett et al., 2012). We therefore conducted single-subject randomization tests (Bulté and Onghena, 2008; see section Data Processing and Analysis for details), estimating the main effects of task instructions across all sessions, the effect of learning, and generalization between trained and untrained directions for each participant (Table 1). Four participants had significantly higher decision accuracy and slower RT across sessions when instructed to trade speed for accuracy, with a trend effect in accuracy in two participants (S01 and S02 in Table 1). After training, significant improvements in both accuracy and RT were observed in five out of six participants; the remaining participant (S03) had faster RT but no significant accuracy change after training. Four participants had significantly lower accuracies at the untrained directions than the trained directions after training. These analyses suggested that the single-subject data are largely consistent with the group inferences.

HIERARCHICAL DRIFT-DIFFUSION MODEL FOR SPEED-ACCURACY TRADEOFF AND LEARNING
To examine which model parameters account for the effects of speed-accuracy instructions during learning, we considered seven variants of the hierarchical DDM, varying systematically in constraints on whether three model parameters (a, v, and T_er) were invariant or varied across the task conditions. We used a Bayesian parameter estimation procedure to draw samples from the joint posterior distributions of all the parameters in the hierarchical DDM (Vandekerckhove et al., 2011; Wiecki et al., 2013). The posterior samples represent parameter estimates and their uncertainties after having observed the data (i.e., response and RT distributions) (Gelman et al., 2004). Model fits were assessed by comparing each model's deviance information criterion (DIC) value (Spiegelhalter et al., 2002), which penalizes additional free model parameters.
FIGURE 4 | The deviance information criterion (DIC) value differences between the seven variants of the drift-diffusion model and the best model. The models differ on whether the boundary separation a, mean drift rate v, and mean non-decision time T_er can vary between the speed and accuracy conditions. The model structures are shown below the figure. The black square indicates that the corresponding parameter can vary between the speed emphasis and accuracy emphasis conditions, and the white square indicates that the parameter is invariant between the two task conditions. The best model with the minimum DIC value had variable a, v, and T_er (model 1, DIC = 9474.03).
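For reference, the DIC of Spiegelhalter et al. (2002) is the posterior mean deviance plus an effective number of parameters, pD, where pD is the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters. The sketch below computes these quantities from deviance values and ranks models by their DIC difference from the best model. The function names are hypothetical and the deviance values in the usage example are made up; only the two DIC values in the ranking example are taken from the results reported in this section.

```python
import numpy as np

def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, pD) from posterior deviance samples."""
    d_bar = float(np.mean(deviance_samples))
    p_d = d_bar - deviance_at_posterior_mean   # effective number of parameters
    return d_bar + p_d, p_d

def rank_models(dic_by_model):
    """Sort models by DIC difference relative to the best (lowest) DIC."""
    best = min(dic_by_model.values())
    return sorted(((name, value - best) for name, value in dic_by_model.items()),
                  key=lambda item: item[1])

# Made-up deviance values, for illustration only.
example_dic, example_pd = dic([100.0, 102.0, 104.0], 98.0)

# DIC of the best model (model 1) and of the second best model (model 3),
# the latter being 10.37 larger than the former.
ranking = rank_models({"model 1": 9474.03, "model 3": 9474.03 + 10.37})
```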
The best model (the one with the lowest DIC value) for describing the data across task conditions, sessions and participants allowed the boundary separation a, mean drift rate v, and mean non-decision time T er all to vary between speed and accuracy conditions (model 1 in Figure 4). The second best model allowed a and T er to vary but held v invariant between SAT conditions, and had a DIC value 10.37 larger than the best model (model 3 in Figure 4). The model in which only v varied while a and T er were invariant (model 6 in Figure 4) provided the worst fit among the seven models. Thus, changes in the mean drift rate alone are unlikely to account for the observed speed-accuracy effects. In later analyses, we focused on the best model with the minimum DIC value (footnote 1).
To evaluate the overall model fit, we generated posterior model predictions of the best model by simulating the same amount of predicted data as observed in the experiment using posterior estimates of the model parameters. There was very good agreement between the observed data and the model predictions across conditions and sessions (Figure 5).
Footnote 1. Conventionally, a DIC difference of more than 10 indicates that the evidence supporting the best model is substantial (Burnham and Anderson, 2002). Because the difference in DIC values between the best and the second best model is close to this criterion, we repeated the same analysis of parameter estimates as in section Hierarchical Drift-Diffusion Model Analyses for the second best model. The parameter changes between task conditions and sessions remained significant in the second best model.

HIERARCHICAL DRIFT-DIFFUSION MODEL ANALYSES
The hierarchical DDM incorporates parameter estimates (a, v, T er , and z) at the individual-subject level and population estimates of these parameters at the group level (Wiecki et al., 2013). We used two complementary approaches to determine the effects of speed-accuracy instructions and learning on the model parameters. First, for each parameter at the individual-subject level, the mean of its posterior distribution was used as a point estimate for group analysis. Second, for each group-level parameter, the mean and the standard deviation of its posterior distribution were used to quantify group-level measures and estimation uncertainties (Figure 6). We also used the group-level posteriors to compare two parameters using Bayesian methods (Lindley, 1965; Berger and Bayarri, 2004; Kruschke, 2010; see section Data Processing and Analysis for details). For simplicity, below we use p to refer to the classical frequentist p-value from ANOVA, and P P|D to refer to the proportion of the posterior samples supporting the tested hypothesis at the group level.

Boundary separation
Figure 6A shows the posterior mean and standard deviation of the boundary separation for each task condition and session. The boundary separation was significantly larger in the accuracy conditions than in the speed conditions [F (1, 5) = 16.21, p < 0.01, partial η 2 = 0.76, P P|D = 0.95]. Post-hoc tests showed significant differences between SAT conditions in all sessions (p < 0.05, Wilcoxon signed ranks test, P P|D > 0.93). The interaction between SAT condition and session was not significant [F (5, 25) = 0.34, p = 0.89, partial η 2 = 0.06], suggesting a similar extent of the speed-accuracy effect on boundary separation across sessions.
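The posterior proportion P P|D can be read as the fraction of posterior draws in which one parameter exceeds the other. The sketch below is one plausible way to compute it, assuming paired draws from the same sampling iterations; the exact procedure is described in the Data Processing and Analysis section and may differ. The function name and the simulated group-level draws are hypothetical.

```python
import random

def posterior_proportion(samples_a, samples_b):
    """Fraction of paired posterior draws in which parameter A exceeds parameter B."""
    if len(samples_a) != len(samples_b):
        raise ValueError("expected the same number of draws for both parameters")
    greater = sum(1 for a, b in zip(samples_a, samples_b) if a > b)
    return greater / len(samples_a)

# Hypothetical group-level posterior draws of boundary separation (not real data).
rng = random.Random(2)
a_accuracy = [rng.gauss(2.0, 0.2) for _ in range(5000)]
a_speed = [rng.gauss(1.5, 0.2) for _ in range(5000)]
proportion = posterior_proportion(a_accuracy, a_speed)
print(f"P(a_accuracy > a_speed | data) is approximately {proportion:.3f}")
```

A value near 1 indicates that almost all of the posterior mass supports the hypothesis that the first parameter is larger, whereas a value near 0.5 indicates no clear difference.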

Drift rate
The mean drift rate (Figure 6B) did not significantly differ between SAT conditions across all sessions [F (1, 5) = 2.93, p = 0.15, partial η 2 = 0.37, P P|D = 0.76], consistent with our model comparison result that the mean drift rate is not the main factor in explaining the effects of speed-accuracy instructions. Interestingly, there was a marginal interaction effect between task conditions and sessions before and after training (sessions 1 and 5) [F (5, 25) = 6.14, p = 0.06, partial η 2 = 0.55], which was mainly driven by the higher mean drift rate in the accuracy condition than in the speed condition in the first session (p < 0.05, Wilcoxon signed ranks test, P P|D = 0.86).
FIGURE 5 | Posterior predictive data distributions for the task conditions and sessions from the best fit model. The distributions along the positive x-axis indicate correct response times, and the distributions along the negative x-axis indicate error response times. Each panel shows the normalized histograms of the observed data (bar plots) and the model prediction (black lines). The area under the curve along the positive x-axis therefore corresponds to the observed and predicted proportion correct. To generate model predictions, for each participant and each model parameter, we drew 500 sampled values from that participant's joint posterior distribution of the model parameters, which gave 500 posterior parameter sets. Each sampled parameter set was then used to simulate the same amount of model-predicted data as observed in the experiment. The simulated RT distributions of correct and error trials were then averaged across the parameter sets as posterior model predictions. Data from individual participants are pooled together.
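The simulation step described in the Figure 5 caption can be sketched roughly as follows. This is a minimal Euler approximation of the diffusion process with an unbiased starting point and no across-trial variability, not the routine used in the study; the function names, the time step, the noise scaling s = 1, and the example parameter sets are assumptions made for illustration.

```python
import math
import random

def simulate_ddm_trial(a, v, t_er, rng, z=0.5, dt=0.001, s=1.0, max_t=10.0):
    """Simulate one trial; return (rt_seconds, correct), upper boundary = correct.

    a: boundary separation, v: drift rate, t_er: non-decision time,
    z: starting point as a proportion of a, s: within-trial noise SD.
    """
    x = z * a
    t = 0.0
    step_sd = s * math.sqrt(dt)
    while 0.0 < x < a and t < max_t:
        x += v * dt + rng.gauss(0.0, step_sd)
        t += dt
    # A trial that reaches max_t without crossing the upper boundary counts as an error.
    return t_er + t, x >= a

def posterior_predictive(param_sets, n_trials, seed=0):
    """Simulate n_trials per sampled parameter set and pool the signed RTs.

    Correct RTs are returned as positive values and error RTs as negative
    values, mirroring the axis convention of the figure.
    """
    rng = random.Random(seed)
    signed_rts = []
    for a, v, t_er in param_sets:
        for _ in range(n_trials):
            rt, correct = simulate_ddm_trial(a, v, t_er, rng)
            signed_rts.append(rt if correct else -rt)
    return signed_rts

# Two hypothetical posterior parameter sets (a, v, t_er); the study drew 500 per participant.
param_sets = [(1.5, 1.0, 0.30), (1.6, 1.1, 0.28)]
signed_rts = posterior_predictive(param_sets, n_trials=200)
prop_correct = sum(rt > 0 for rt in signed_rts) / len(signed_rts)
print(f"predicted proportion correct: {prop_correct:.2f}")
```

Histograms of the positive and negative values of `signed_rts` can then be overlaid on the observed correct and error RT distributions to judge the fit.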
The main effect of session on the mean drift rate was significant [F (5, 25) = 118.50, p < 0.00001, partial η 2 = 0.96], with a linear increase in the first five sessions at the trained directions [F (1, 5) = 350.98, p < 0.00001, partial η 2 ]. The drift rate at the untrained directions was lower than that at the trained directions after learning [F (1, 5) = 217.53, p < 0.00001, partial η 2 = 0.98, P P|D ≈ 1], consistent with the observed data that improvements in accuracy did not transfer to the untrained directions after learning.

Non-decision time
The non-decision time (Figure 6C) was larger in the accuracy condition than in the speed condition [F (1, 5) = 8.21, p < 0.05, partial η 2 = 0.62, P P|D = 0.89]. Pairwise comparisons within each session indicated that the effects of speed-accuracy instructions were significant in the first three sessions (p < 0.05, Wilcoxon signed ranks test, P P|D > 0.91) but not in the last three sessions (p > 0.08, Wilcoxon signed ranks test, P P|D < 0.80). No significant effect of session was observed [F (5, 25) = 1.57, p = 0.21, partial η 2 = 0.24], but there was an interaction between task conditions and sessions before and after training [F (1, 5) = 6.83, p < 0.05, partial η 2 = 0.58]. These results suggest that the speed-accuracy instructions affect the non-decision time to a larger extent at the beginning of training.

Response bias
The posterior estimates of the response bias were close to 0.5 in all sessions (Figure 6D), and a repeated-measures ANOVA showed no effect of session [F (5, 25) = 0.78, p = 0.58, partial η 2 = 0.13]. Therefore, there was no significant bias toward either of the two responses and no change in bias across sessions.

DISCUSSION
This study examined how two widely observed phenomena, SAT and perceptual learning, differentially shape decision-making processes over different timescales and stages of learning. Speed emphasis or accuracy emphasis, in a coherent motion discrimination task, rapidly modulated participants' behavior between short blocks of trials (fast and error-prone or slow and accurate). This tradeoff between speed and accuracy was consistent throughout training and generalized between trained and untrained directions. The model analysis suggested that accuracy emphasis, compared with speed emphasis, not only increases the total amount of evidence required to render a decision (i.e., boundary separation), but also increases the quality of the evidence being accumulated (i.e., drift rate) and the latencies of stimulus encoding and motor preparation (i.e., non-decision time). Importantly, the effect of speed-accuracy instructions on boundary separation was significant across multiple sessions, but the effect on drift rate and non-decision time was significant only at the beginning of training.
One common assumption is that speed-accuracy instructions influence only the boundary separation. This selective influence assumption has largely been supported by the ability of a constrained DDM, in which only the boundaries vary, to adequately fit behavioral data under SAT manipulations (Ratcliff and Rouder, 1998; Wagenmakers et al., 2008). However, such an approach cannot rule out a possible influence of speed-accuracy instructions on other model parameters. Recent studies have considered more flexible models and identified speed-accuracy effects on drift rate and non-decision time. By reanalyzing the data from Ratcliff and Rouder (1998), Vandekerckhove et al. (2011) suggested that the SAT is better described by changes in both drift rate and boundary separation than by changes in boundary alone, with larger drift rate and boundary separation under accuracy emphasis. Similarly, Rae et al. (in press) reported that a constrained model with invariant drift rate between speed emphasis and accuracy emphasis conditions would underpredict the observed difference in decision accuracy between the SAT conditions, which we also noticed in simulations of the inferior model (Model 3 in Figure 4). Rae et al. (in press) also reported a larger drift rate change between speed-accuracy instructions in more difficult tasks than in easier tasks. Interestingly, this is consistent with our result of a significant drift rate change only in the first session, because the same task is relatively difficult for participants at the beginning of their training. Furthermore, studies using the DDM with variable non-decision time between speed-accuracy conditions suggested decreased non-decision time when response speed is emphasized (Voss et al., 2004; Mulder et al., 2010, 2013). Therefore, emphasizing speed or accuracy affects multiple processes, not only the total amount of evidence needed for making a decision.
We found different effects of speed-accuracy instructions on the model parameters over the course of learning. For a difficult and unfamiliar task, emphasizing accuracy resulted in increased boundary separation, drift rate, and non-decision time. Once the participants had learned the task after substantial training, the effect of speed-accuracy instructions was evident only on boundary separation. These findings confirmed a substantial role of boundary separation in the response to speed-accuracy instructions (Ratcliff and Rouder, 1998; Wagenmakers et al., 2008; Starns and Ratcliff, 2014), a role that persisted throughout learning and generalized between trained and untrained stimulus features. The influence of speed-accuracy instructions on the other two DDM parameters is not intuitive because, unlike boundary separation, a change in drift rate or non-decision time cannot by itself produce the inverse relationship between decision error and RT observed in SAT: increasing drift rate results in fewer decision errors but shorter RT, and increasing non-decision time results in longer RT but no change in accuracy.
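The qualitative effect of each parameter on accuracy and mean RT can be checked with the standard closed-form expressions for a simplified DDM. The sketch below assumes an unbiased starting point (z = a/2), no across-trial variability, positive drift, and within-trial noise s = 1; these simplifications and the parameter values are assumptions for illustration and do not correspond to the fitted model.

```python
import math

def ddm_predictions(a, v, t_er, s=1.0):
    """Closed-form accuracy and mean RT for an unbiased DDM (z = a/2, v > 0)."""
    p_correct = 1.0 / (1.0 + math.exp(-a * v / s ** 2))
    mean_decision_time = (a / (2.0 * v)) * math.tanh(a * v / (2.0 * s ** 2))
    return p_correct, t_er + mean_decision_time

# Hypothetical parameter values, changing one parameter at a time.
baseline = ddm_predictions(a=1.5, v=1.0, t_er=0.3)
wider_boundary = ddm_predictions(a=2.0, v=1.0, t_er=0.3)
higher_drift = ddm_predictions(a=1.5, v=1.5, t_er=0.3)
longer_t_er = ddm_predictions(a=1.5, v=1.0, t_er=0.4)

for label, (acc, rt) in [
    ("baseline", baseline),
    ("wider boundary", wider_boundary),
    ("higher drift rate", higher_drift),
    ("longer non-decision time", longer_t_er),
]:
    print(f"{label:26s} accuracy = {acc:.3f}, mean RT = {rt:.3f} s")
```

Under these assumptions, only the wider boundary makes responses both slower and more accurate; the higher drift rate makes them faster and more accurate, and the longer non-decision time shifts mean RT without changing accuracy.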
Nevertheless, several possible hypotheses may explain why learning influences the drift rate and non-decision time in response to speed-accuracy instructions. First, Rae et al. (in press) proposed that the quality of information extracted from the environment improves over the course of a single decision, and that the rate of this improvement is identical in the speed and accuracy emphasis conditions. Since RT is shorter when response speed is emphasized, the drift rate estimated from the speed condition is largely based on the quality of information extracted early after stimulus onset, which would be systematically lower than the information quality later in a trial (i.e., as in the accuracy condition). Second, drift rate has been linked to the allocation of attention to the task (Schmiedek et al., 2007). It is possible that speed-accuracy instructions affect the balance of attentional resources allocated between the decision process and other cognitive processes. For example, speed emphasis may facilitate the monitoring of elapsed time within a trial, which limits the attentional resources available for extracting information for decision-making. Third, Rinkenauer et al. (2004) examined the SAT effects on lateralized readiness potentials (Leuthold et al., 1996; Eimer, 1998; Masaki et al., 2004) and observed decreased intervals between response-locked lateralized readiness potential onset and motor responses under speed emphasis (see Osman et al., 2000 for similar results). Since lateralized readiness potential intervals reflect the duration of motor processes after a decision has been made, the electrophysiological findings suggest that speed-accuracy instructions act on both decision and post-decision processes. This further supports our finding of decreased non-decision time under speed emphasis, because response execution is often considered an important component described by non-decision time in the DDM.
However, it is not immediately clear why the SAT effects on drift rate and non-decision time are more evident at the beginning of training. An active account is that participants change their decision strategy after they become proficient in the procedure and the task (e.g., Adini et al., 2004). In other words, participants may learn to integrate information across larger periods of the stimulus presentation, decreasing the time spent on processes outside of decision-making and hence improving performance. Alternatively, in a more passive account, because the task becomes much easier after training, there is only limited room to improve accuracy and RT, which in turn limits the influence of speed-accuracy instructions on the model parameters other than boundary separation. Future investigations of how learning underpins the SAT at various task difficulty levels are necessary.
Our results demonstrated distinct perceptual learning mechanisms with different properties. As expected, training with feedback led to gradual improvements in decision accuracy and speed. The learning effect on accuracy was specific to the trained directions (Liu and Weinshall, 2000), but the improvement in RT partially generalized to untrained directions after training. Unlike most previous perceptual learning studies, which have focused only on decision accuracy and ignored decision speed (e.g., Fahle and Poggio, 2002; Dosher and Lu, 2007), we used the DDM to provide a mechanistic interpretation of both accuracy and speed improvements during learning (see Dutilh et al., 2009, 2011; Petrov et al., 2011; Liu and Watanabe, 2012 for similar approaches). Drift rate increased over training and the increase was specific to the trained directions, compatible with the theory that sensory processing is enhanced after learning (Karni and Sagi, 1991; Gilbert et al., 2001). This is also consistent with neurophysiological evidence that improved behavioral performance over training is accompanied by changes in sensory-driven responses of neurons in areas associated with perceptual decisions (Law and Gold, 2008). Boundary separation decreased over training and did not significantly differ between trained and untrained directions after training. Therefore, after substantial training on two motion directions, less accumulated evidence is required to discriminate coherent motion between two novel directions, even though the quality of information extracted from novel stimuli (e.g., drift rate for untrained directions) is lower. These findings further confirm previous studies showing learning effects on drift rate and boundary separation (Petrov et al., 2011; Liu and Watanabe, 2012).
The current study highlighted the benefits of using Bayesian methods to implement the DDM with the recently proposed hierarchical extension (Vandekerckhove et al., 2011;Wiecki et al., 2013). The hierarchical DDM is powerful in recovering model parameters with limited observed data (e.g., Jahfari et al., 2013). This feature is particularly important for the current study, because data from different training sessions need to be considered separately. One major advantage of using Bayesian methods for parameter estimation is the practicality of the obtained posterior parameter distributions. As we demonstrated in the current study, the posterior distributions can either be used to provide point estimates for classical frequentist inference, or can be directly used for Bayesian inference at both individual and group levels.
Two issues require further consideration. First, the drift-diffusion model is only one exemplar of a large family of sequential sampling models (Smith and Ratcliff, 2004; Bogacz et al., 2006; Zhang, 2012), and there are also simplified accumulator models omitting the noise in momentary evidence (Heathcote, 2005, 2008). These models mainly differ in how evidence supporting different alternatives is accumulated over time. It is of theoretical interest to explore whether our findings depend on the specific structure of the models we used. For example, one recent study showed a similar influence of speed-accuracy instructions on model parameters in the DDM and in an accumulator model (Rae et al., in press). Second, we used a combination of bonuses and warning messages to help participants engage in the task, which is similar to early studies using a payoff matrix with a criterion time (Fitts, 1966; Pachella and Pew, 1968). This design has proven effective in modulating behavior (Dutilh et al., 2009; Petrov et al., 2011). However, it is possible that participants would adopt a different decision strategy if the feedback or payoff were changed (e.g., the ratio of correct and error bonuses, see Simen et al., 2006, 2009; Bogacz et al., 2010; Balci et al., 2011).
In summary, we showed that the influence of speed-accuracy instructions cannot be attributed solely to a change in decision boundary; it also involves changes in other parameters relevant to the decision-making process, and it depends on the stage of learning. Future research on this topic should therefore take into account the complexity of individuals' responses to speed-accuracy instructions.