Rational Decision-Making in Inhibitory Control

An important aspect of cognitive flexibility is inhibitory control, the ability to dynamically modify or cancel planned actions in response to changes in the sensory environment or task demands. We formulate a probabilistic, rational decision-making framework for inhibitory control in the stop signal paradigm. Our model posits that subjects maintain a Bayes-optimal, continually updated representation of sensory inputs, and repeatedly assess the relative value of stopping and going on a fine temporal scale, in order to make an optimal decision on when and whether to go on each trial. We further posit that they implement this continual evaluation with respect to a global objective function capturing the various rewards and penalties associated with different behavioral outcomes, such as speed and accuracy, or the relative costs of stop errors and go errors. We demonstrate that our rational decision-making model naturally gives rise to basic behavioral characteristics consistently observed for this paradigm, as well as more subtle effects due to contextual factors such as reward contingencies or motivational factors. Furthermore, we show that the classical race model can be seen as a computationally simpler, perhaps neurally plausible, approximation to optimal decision-making. This conceptual link allows us to predict how the parameters of the race model, such as the stopping latency, should change with task parameters and individual experiences/ability.


Introduction
Humans and animals often need to choose among actions with uncertain consequences, and to modify those choices according to ongoing sensory information and changing task demands. The requisite ability to dynamically modify or cancel planned actions is termed inhibitory control, considered a fundamental component of flexible cognitive control (Barkley, 1997; Nigg, 2000). In this paper, we examine optimal inhibitory control in the context of the widely studied stop signal paradigm (Logan and Cowan, 1984), where the subject's go response on a primary task, such as a two-alternative forced choice discrimination task, is interrupted by a stop signal on some trials. Subjects are instructed to withhold the go response on stop trials: a successful response cancelation is a stop success (SS), whereas a response is considered a stop error (SE, see Figure 1). Typically, SE rate increases as the presentation time of the stop signal is delayed with respect to the go stimulus onset, in a characteristic pattern known as the inhibition function (e.g., Figure 5A, adapted from Emeric et al., 2007). More subtly, SE reaction time (RT) tends to be faster than go RT (e.g., Figure 5C, adapted from Emeric et al., 2007).
The classical model for the stop signal task is the race model (Logan and Cowan, 1984), where behavioral outcome on each trial is conjectured to arise from the competition of two independent processes: go and stop. In this model, a stop trial results in error if the go process finishes before the stop process (Figure 2A). Thus, the average stopping latency, called stop signal reaction time (SSRT) by the race model, determines how successfully the observer can interrupt the go process. Since the stop process and its outcomes are unobservable, SSRT is estimated from the observed go RT distribution and the error rate on stop trials at a given stop signal delay (SSD). Specifically, if SSRT is constant, and the go and stop processes [...]

[...] (e.g., Verbruggen and Logan, 2009b) can be seen as a simpler approximation to optimal decision-making, whereby parameters of the race model, such as the SSRT, must vary with task parameters in a systematic way to maintain the best approximation to optimal decision-making. Thus, race model-like behavior, including the well-studied SSRT, can be understood as emergent properties of rational decision-making. Altogether, our results suggest that cognitive control plays a critical role in stopping behavior, and the brain implements optimal or near-optimal decision-making, possibly via a race-model-like process, in an adaptive and context-dependent manner.

Methods
Our computational model consists of two main components: (1) a monitoring process, which models sensory inference and learning about the identity, prevalence, and timing of the stimuli as hierarchical Bayesian inference, and (2) a decision process, formalized in terms of stochastic control theory, that translates the current expectations based on sensory evidence into a choice of action. In our model, the available actions at any given time include the two possible go responses and waiting one more time step. Repeated selection of the wait option results in a stop response on a given trial.
Consistent with typical experimental design, the model assumes that subjects are given a deadline for responding, and that they must withhold the go response for the same amount of time to indicate a stop response.

The monitoring process: Bayesian statistical inference
The monitoring process in our model tracks sensory information about the go and stop stimuli during each trial, integrating it with prior beliefs about the distribution of go stimulus identity, and the prevalence and timing of the stop signal. Figure 3 shows the generative model for sensory evidence in the task. The model assumes that subjects believe there are two hidden variables, corresponding to the identity of the go stimulus ($d \in \{0,1\}$), and whether or not the current trial is a stop trial ($s = 1$ for stop, $s = 0$ for go). Priors over $d$ and $s$ reflect experimental parameters: $P(d = 1) = 0.5$, $P(s = 1) = r = 0.25$, where $r$ is the prior probability that a trial is a stop trial. Conditioned on $d$, a stream of independent and identically distributed (iid) inputs is generated on each trial, $x_1, \ldots, x_t, \ldots$, where $t$ indexes time within a trial from go signal onset, and the likelihood functions are $p(x_t \mid d = 0) = f_0(x_t)$ and $p(x_t \mid d = 1) = f_1(x_t)$. Without loss of generality, we assume $f_0$ and $f_1$ to be Bernoulli distributions with rate parameters $q_d$ and $1 - q_d$, respectively. The dynamic variable $z_t$ denotes the presence/absence of the stop signal: if the stop signal appears at time $\theta$, then $z_1 = \ldots = z_{\theta-1} = 0$ and $z_\theta = z_{\theta+1} = \ldots = 1$. On a go trial, $s = 0$, the stop signal of course never appears, $P(\theta = \infty) = 1$. On a stop trial, $s = 1$, we assume that the onset of the stop signal has a constant hazard rate, i.e., $\theta$ is generated from a geometric distribution: $p(\theta \mid s = 1) = (1 - \lambda)^{\theta - 1}\lambda$. Conditioned on $z_t$, there is a second, conditionally independent, stream of observations associated with the stop signal: $p(y_t \mid z_t = 0) = g_0(y_t)$, and $p(y_t \mid z_t = 1) = g_1(y_t)$. Again, we assume for simplicity that $g_0$ and $g_1$ are Bernoulli distributions with rate parameters $q_s$ and $1 - q_s$, respectively.
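To make the generative model concrete, the following is a minimal sampling sketch rather than the authors' simulation code. The numerical values of `lam`, `q_d`, `q_s`, and `n_steps` are illustrative assumptions (the text fixes only the stop-trial prior at 0.25 and the go-identity prior at 0.5), and the function name `sample_trial` is hypothetical.

```python
import random

def sample_trial(r=0.25, lam=0.1, q_d=0.45, q_s=0.1, n_steps=50, rng=None):
    """Draw one trial from the generative model (parameter values are assumed)."""
    rng = rng or random.Random()
    d = 1 if rng.random() < 0.5 else 0   # go stimulus identity, P(d = 1) = 0.5
    s = 1 if rng.random() < r else 0     # stop trial indicator, P(s = 1) = r
    theta = None                         # None stands for "stop signal never appears"
    if s == 1:
        # Geometric onset: P(theta = u | s = 1) = (1 - lam)^(u - 1) * lam
        theta = 1
        while rng.random() >= lam:
            theta += 1
    # Bernoulli likelihoods: P(x = 1 | d = 0) = q_d and P(x = 1 | d = 1) = 1 - q_d
    p_x1 = q_d if d == 0 else 1.0 - q_d
    xs, ys, zs = [], [], []
    for t in range(1, n_steps + 1):
        z = 1 if (theta is not None and t >= theta) else 0
        # P(y = 1 | z = 0) = q_s and P(y = 1 | z = 1) = 1 - q_s
        p_y1 = q_s if z == 0 else 1.0 - q_s
        xs.append(1 if rng.random() < p_x1 else 0)
        ys.append(1 if rng.random() < p_y1 else 0)
        zs.append(z)
    return {"d": d, "s": s, "theta": theta, "x": xs, "y": ys, "z": zs}
```

Calling `sample_trial(rng=random.Random(0))` returns one trial's hidden variables together with its two observation streams; on go trials `theta` is `None` and `z` stays at 0 throughout.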
The counterpart to the generative model is the recognition model, which specifies statistically optimal reverse-inference of the hidden variables based on the continual stream of sensory inputs. In the stop signal task, this means computing the posterior probability about go stimulus identity, $p_d^t := P(d = 1 \mid \mathbf{x}_t)$, where $\mathbf{x}_t = \{x_1, \ldots, x_t\}$ denotes all the sensory inputs associated with the go stimulus, and [...]

Figure 1. (A) On a majority of trials (go trials), a central fixation dot is followed by one of two targets requiring a saccade to the indicated location. (B) On stop trials, the target presentation is followed after a short delay (SSD) by reappearance of the fixation point. A saccade on a stop trial is a stop error (SE), and a successfully canceled movement is a stop success (SS). Figure adapted from Hanes and Schall (1995).

Figure 2. (A) The race model (Logan and Cowan, 1984) proposes that the finishing times of two independent (go and stop) processes determine trial outcome: stop or go, depending on which finishes first. SSD + SSRT (stop signal reaction time) specifies the finishing time of the stop process, and determines what fraction of go trials will finish earlier and therefore result in a stop error. (B) An implementation of the race model using a drift-diffusion process, similar to Verbruggen and Logan (2009b). The go process consists of a constant drift rate corrupted by additive Wiener (Gaussian, white) noise on each time step, a temporal offset (also known as the non-decision time), and a threshold for evoking the go response. The stop process is assumed to initiate at a time SSD after the go stimulus, and to take a time of SSRT (assumed to be fixed here) to reach the threshold. Whichever process finishes first determines trial outcome: go or stop.

Shenoy and Yu
Rational decision-making in inhibitory control Frontiers in Human Neuroscience www.frontiersin.org

where $P(s = 1 \mid \theta > t, \mathbf{y}_t) = P(s = 1 \mid \theta > t)$ again does not depend on past observations.

The belief state at time $t$, the vector $\mathbf{b}_t = (p_d^t, p_s^t)$, represents all the information the ideal observer has about the stimulus properties on the current trial. Figure 4A illustrates the behavior of this inference procedure, averaged across trials, for different types of trials. The evolution of the beliefs corresponding to the identity of the go stimulus ($p_d$), and whether the trial is a stop trial ($p_s$), are shown on trials without a stop signal (GO), as well as successful (SS) and error (SE) stop trials; in all examples, true $d = 1$. Over time, $p_d$ increases in all three kinds of trials as sensory evidence about the go stimulus accumulates. On the other hand, $p_s$ shows an initial rise due to prior expectation and then either decays to 0 on GO trials, or rises toward 1 on stop trials. Individual trajectories are stochastic due to noise in the sensory inputs. This stochasticity induces a go response on some stop trials and not others: stop error trials (non-canceled trials) are those on which the go stimulus belief state happens to be rising fast, and the stop signal is processed slower than average. Successful stop trials show the opposite trend.

The decision process: optimal stochastic control
Based on the belief state, subjects have to make a decision at each moment about whether to go now or wait at least one more time point in case this is a stop trial; and if they wait, they need to make the same decision again using one more data point. To model this decision process, we again assume an ideal observer implementing a Bayes-optimal decision policy. To say what is optimal, we need to specify a loss function that captures the reward structure of the task, against which the decision policy can be optimized. We assume there is a time cost of $c$ per unit time on each trial to capture the opportunity cost of not responding quickly. Consistent with experimental design, we also assume a deadline $D$ for responding on go trials and for determining that a subject has withheld a go response long enough on a stop trial. In addition, there is a penalty $c_s$ for choosing to respond on a stop signal trial, and a unit cost for making an error on a go trial (by choosing the wrong discrimination response or exceeding the deadline for responding). Because only the relative costs matter in the optimization, we can normalize the coefficients associated with all the costs such that one of them is unit cost. Let $\tau$ denote the trial termination time, so that $\tau = D$ if no go response is made before the deadline, and $\tau < D$ if a response is made. On each trial, the policy $\pi$ produces a response time $\tau \le D$, as well as a binary response $\delta \in \{0,1\}$ if a go response is made ($\tau < D$). The loss function is:

$$l(\tau, \delta; d, s, D) = c\,\tau + c_s\,\mathbf{1}\{\tau < D,\ s = 1\} + \mathbf{1}\{\tau < D,\ \delta \neq d,\ s = 0\} + \mathbf{1}\{\tau = D,\ s = 0\}$$

Inference about the stop signal is slightly more complicated due to the hidden dynamics in $z_t$ (going from signal-absent to signal-present at a stochastic onset time). We first compute $p_z^t$, the posterior probability that the stop signal has already appeared.
where $p_z^0 = 0$ (the stop signal never occurs at the same time as the go signal), and $h(t)$ is the conditional probability that the stop signal will appear in the next instant given it has not appeared already, where, recall, $r$ is the prior probability of a stop trial. $h(t)$ does not depend on the observations, since given that the stop signal has not yet appeared, whether it will appear in the next instant does not depend on previous observations. In the stop signal task, a stop trial is typically considered a stop trial even if the subject makes the go response before the stop signal. Following this experimental convention, we need to compute the posterior probability that the current trial is a stop trial, denoted $p_s^t := P(s = 1 \mid \mathbf{y}_t)$, which depends both on the current belief about the presence of the stop signal, and the expectation that it will appear in the future:

$$p_s^t = p_z^t + (1 - p_z^t)\, P(s = 1 \mid \theta > t, \mathbf{y}_t)$$
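The recursive inference described in this section can be sketched in a few lines. This is a reconstruction under stated assumptions, not the paper's own equations: the hazard is taken as the probability that onset occurs at step t given that it has not occurred by step t − 1, marginalizing over go trials, and the default parameter values and function names are hypothetical.

```python
def likelihood(obs, q, hidden):
    """Bernoulli likelihood with P(obs = 1 | hidden = 0) = q, P(obs = 1 | hidden = 1) = 1 - q."""
    p_one = q if hidden == 0 else 1.0 - q
    return p_one if obs == 1 else 1.0 - p_one

def hazard(t, r, lam):
    """Assumed convention: P(onset = t | onset > t - 1), including go trials (no onset)."""
    not_yet = r * (1.0 - lam) ** (t - 1) + (1.0 - r)
    return r * lam * (1.0 - lam) ** (t - 1) / not_yet

def update_beliefs(p_d, p_z, t, x, y, r=0.25, lam=0.1, q_d=0.45, q_s=0.1):
    """One filtering step at time t >= 1; returns the updated (p_d, p_z, p_s)."""
    # Go stimulus identity: Bayes' rule with iid observations.
    num = p_d * likelihood(x, q_d, 1)
    p_d_new = num / (num + (1.0 - p_d) * likelihood(x, q_d, 0))
    # Stop signal presence: predict with the hazard, then weigh by the new observation.
    h = hazard(t, r, lam)
    present = (p_z + (1.0 - p_z) * h) * likelihood(y, q_s, 1)
    absent = (1.0 - p_z) * (1.0 - h) * likelihood(y, q_s, 0)
    p_z_new = present / (present + absent)
    # Stop trial probability: signal already present, or still expected to appear later.
    later = r * (1.0 - lam) ** t / (r * (1.0 - lam) ** t + (1.0 - r))
    p_s_new = p_z_new + (1.0 - p_z_new) * later
    return p_d_new, p_z_new, p_s_new
```

Starting from `p_d = 0.5` and `p_z = 0.0` and calling `update_beliefs` once per time step yields belief trajectories of the kind the text describes: `p_s` decays when the stop-signal stream stays quiet and rises toward 1 once it becomes active.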

[...] or d = 0, depending on whether $p_d^t$ is greater or smaller than 0.5, respectively. The dependence of $Q_w^t$ on $V^{t+1}$ allows us to recursively compute the value function backward in time. Given $V^{t+1}$ and $\mathbf{b}_t$, we can compute $\langle V^{t+1} \rangle$ by summing over the uncertainty about the next observations $x_{t+1}$, $y_{t+1}$, since the belief state $\mathbf{b}_{t+1}$ is a deterministic function of $\mathbf{b}_t$ and the observations (Eqs 1 and 4).
The initial condition of the value function can be computed exactly at the deadline since there is only one outcome (the subject is no longer allowed to go or stop). We can then compute the value function and the corresponding optimal decision policy backward in time from $t = D - 1$ to $t = 1$. In our simulations, we do so numerically by discretizing the probability space for $p_s^t$ into 1000 bins; $p_d^t$ is represented exactly using its sufficient statistics.
Minimizing $L_\pi$ over the policy space directly can seem computationally daunting, since there is no obvious parameterization of the policy space; however, Bellman's dynamic programming principle (Bellman, 1952) provides an iterative relationship in terms of the value function $V^t(\mathbf{b}_t)$ (defined in terms of costs here), where $\mathbf{b}_t$ is the belief state, and $a$ ranges over all possible actions.
In our model, the action (decision) space consists of {go, wait}, with the corresponding expected costs (also known as Q-factors) $Q_g^t$ and $Q_w^t$, respectively. Note that our model produces a "stop" response on a trial by repeatedly deciding to wait rather than go until the deadline is reached.
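As a rough illustration of how the costs described in this section combine, the sketch below computes the expected cost of going and the terminal cost at the deadline. These expressions are derived from the verbal description of the cost terms and are assumptions rather than the paper's stated equations; the default values of `c` and `c_s`, the function names, and the tie-breaking rule are hypothetical. The wait cost is not computed here, since it requires the full backward recursion over belief states.

```python
def go_cost(t, p_d, p_s, c=0.002, c_s=0.4):
    """Expected cost of responding at time t with the more probable alternative.

    Time cost so far, plus the stop error penalty weighted by the stop trial
    probability, plus the chance of a wrong discrimination on a go trial.
    """
    p_wrong = min(p_d, 1.0 - p_d)
    return c * t + c_s * p_s + (1.0 - p_s) * p_wrong

def deadline_cost(D, p_s, c=0.002):
    """Expected cost at the deadline: time cost plus a unit miss penalty on go trials."""
    return c * D + (1.0 - p_s)

def choose_action(q_go, q_wait):
    """Pick the action with the smaller expected cost (ties resolved as 'wait')."""
    return "go" if q_go < q_wait else "wait"
```

For example, `go_cost(10, 0.9, 0.2, c=0.01, c_s=0.5)` adds a time cost of 0.1, an expected stop error cost of 0.1, and an expected discrimination error cost of 0.08.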
The optimal decision policy chooses the action corresponding to the smaller Q-factor, and the value function is the smaller of the Q-factors $Q_g^t$ and $Q_w^t$. Note that the go action results in either d = 1 [...]

[...] is $t + t_0$, where $t_0$ is a temporal offset parameter denoting non-decision time (Ratcliff, 1978; Bogacz et al., 2006). For $b > 0$, the response is correct if $h$ is first crossed, and incorrect if $-h$ is first crossed; vice versa if $b < 0$. We note that there is redundancy in the parameterization, such that $b$, $s$, and $h$ can all be scaled by the same constant and remain an identical process; we can therefore fix $s = 1$ without loss of generality. Thus, the go process has three free parameters: $b$, $h$, and $t_0$. The stop process, as typically modeled in the literature (Logan and Cowan, 1984), is assumed to have a fixed finishing time of SSD + SSRT. Since SSD is given by experimental design, SSRT is the only free parameter for the stop process. On stop trials, if $t + t_0 <$ SSD + SSRT, an error response occurs at $t + t_0$; otherwise, it is a correct stop trial and $t$ is assumed to take on the value $\infty$.
Our goal is to find a diffusion model approximation to optimal decision-making. To do this, we compare the joint distribution of RT and choice based on simulation of the optimal model, $p(t,d)$, and that from a parametrization of the race model, $p(t,d \mid U)$. We look for settings of the diffusion model parameters $U$ (rate, offset, threshold, and SSRT) that minimize the KL divergence between the two distributions:

$$\mathrm{KL}\big(p(t,d)\,\|\,p(t,d \mid U)\big) = \left\langle \log \frac{p(t,d)}{p(t,d \mid U)} \right\rangle_{p(t,d)} \approx \frac{1}{n} \sum_{i=1}^{n} \log \frac{p(t_i,d_i)}{p(t_i,d_i \mid U)}$$

where the approximation comes from the fact that the expectation of the log likelihood ratio $\langle \log p(t,d)/p(t,d \mid U) \rangle$ under the distribution $p(t,d)$ can be approximated by a finite sum based on samples from $p(t,d)$; the approximation becomes exact as $n \to \infty$. We note the interesting observation that minimizing the KL divergence is identical to minimizing the coding cost of the samples from the true (optimal) distribution by the approximate diffusion model distributions; it is also identical to maximum likelihood estimation of parameters of the approximate model as a function of the samples generated from the true (optimal) model. To evaluate the sum, we generate $n = 10{,}000$ samples from the optimal model, where the probability of each trial being a stop trial is $r$, and the probability of the go stimulus being each of the two alternatives is 0.5, both given by the actual experimental design in question. We can compute $p(t,d \mid U)$ exactly, up to a discretization of values of $k(t)$, by convolving $p(k(t))$ with standard-normal noise and removing probability mass beyond both thresholds at the next timestep, to get $p(k(t + 1))$. This gives $p(t,d \mid U)$ on go trials. On stop trials, we truncate the distribution at SSD + SSRT, which then gives us both the error stop trial RT distribution, as well as the error rate, for each SSD. We then sum [...]

Figure 4B illustrates how these action costs evolve over the course of a trial, averaged across trials of different types as before: go (GO) trials, stop error (SE) trials, and successful stop (SS) trials.
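The equivalence between minimizing the sample-based KL divergence and maximizing likelihood can be checked on a toy problem. The sketch below is illustrative only: the three-outcome distribution stands in for the joint distribution of RT and choice, and the candidate distributions, outcome labels, and function names are all hypothetical.

```python
import math
import random

def kl_estimate(samples, log_p, log_q):
    """Monte Carlo estimate of KL(p || q) from samples drawn from p."""
    return sum(log_p(x) - log_q(x) for x in samples) / len(samples)

def mean_log_lik(samples, log_q):
    """Average log-likelihood of the samples under the approximate model q."""
    return sum(log_q(x) for x in samples) / len(samples)

def make_log_pmf(probs):
    """Wrap a dict of outcome probabilities as a log-probability function."""
    return lambda outcome: math.log(probs[outcome])

# Toy stand-in for the optimal model's distribution over trial outcomes.
true_probs = {"fast": 0.5, "slow": 0.3, "stop": 0.2}
# Hypothetical candidate approximations, playing the role of parameter settings.
candidates = {
    "A": {"fast": 0.2, "slow": 0.3, "stop": 0.5},
    "B": {"fast": 0.45, "slow": 0.35, "stop": 0.2},
    "C": {"fast": 0.34, "slow": 0.33, "stop": 0.33},
}

rng = random.Random(1)
samples = rng.choices(list(true_probs), weights=list(true_probs.values()), k=10000)
log_p = make_log_pmf(true_probs)

best_by_kl = min(
    candidates, key=lambda k: kl_estimate(samples, log_p, make_log_pmf(candidates[k]))
)
best_by_ll = max(
    candidates, key=lambda k: mean_log_lik(samples, make_log_pmf(candidates[k]))
)
```

Because the term involving the true distribution does not depend on the candidate, both criteria rank the candidates identically, which is why the fit can be carried out by summing log-likelihoods alone.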
Since the probability associated with the (correct) go stimulus identity increases with accumulating sensory evidence, the cost of going drops, eventually crossing the cost of waiting and triggering a go response. On stop trials, the onset of the stop signal initiates an increase in the cost of going, when the cost of a stop error is factored in. In error stop trials, the go cost (Q g t ) crosses the wait cost (Q w t ) before the stop stimulus is fully processed. In successful stop trials, the go cost never dips below the wait cost. The RT histograms for go and error stop trials illustrate that, although the average go cost trajectories do not cross the average wait cost, every individual trajectory crosses over at some point on each trial.

Approximating optimal decision-making by a race model
We make the relationship between optimal decision-making and race-like behavior concrete by considering a specific implementation of the race model. One reason for examining this connection is that race-like processes may serve as a neural implementation of behavior in the stopping task (Hanes et al., 1998; Pare and Hanes, 2003; Boucher et al., 2007). In particular, we examine a diffusion model implementation which has long been used to model reaction times (Stone, 1960; Laming, 1968; Ratcliff, 1978; Luce, 1986; Hanes and Schall, 1996; Gold and Shadlen, 2002; Mazurek et al., 2003; Bogacz et al., 2006). Variants of the drift diffusion model have also been applied specifically to the stop signal task (Hanes and Carpenter, 1999; Verbruggen and Logan, 2009b).
Our implementation is illustrated in Figure 2B, where the go process consists of a constant drift rate with a starting point or offset, and additive, cumulative Gaussian noise on each time step of a trial. The stop process is modeled as a fixed-latency process with a corresponding latency parameter (SSRT). Although we could easily consider a stochastic stop process with its attendant rate and threshold, we specifically wish to model SSRT as measured in practice, i.e., by using a constant-SSRT assumption (Logan and Cowan, 1984). Finally, go responses are initiated by the process crossing a threshold, unless it is at a time exceeding SSD + SSRT, which is the finishing time of the stop process; in the latter case, no response is produced. For each condition in the reward manipulation task (Section 3.2), we select values for these four parameters, rate, offset, threshold, and SSRT, in order to best approximate the cumulative RT and stop error distributions of the optimal model. The basic drift-diffusion model has the following form:

$$dk = b\,dt + e\,dW$$

where the rate parameter $b$ determines the direction (positive or negative) and speed of the average movement of the dynamic parameter $k$, and $dW$ denotes Wiener noise, such that $e\,dW$ is normally distributed with mean 0 and variance $e^2\,dt$. In practice, we simulate this process by discretizing time and approximating it with a random walk:

$$k(t + 1) = k(t) + b + w$$

with drift parameter $b$, and normally distributed noise $w \sim N(0, s^2)$. We assume that a decision is made when $k(t)$ first crosses the threshold $h$ or $-h$, whichever first. The simulated RT for a trial [...]

The race model explains these results as well, utilizing a similar proximate explanation: later initiation of the stop process allows more go trials to "escape," giving rise to the form of the inhibition function; stochasticity in the go process allows the go process to sometimes escape the stop process, and those that do happen to escape have shorter finishing times (Logan and Cowan, 1984).
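The discretized random walk and the fixed-latency stop process described in this section can be simulated directly. The following is a minimal sketch under stated assumptions, not the fitting code used in the paper: the walk is assumed to start at zero, the noise scale is fixed at 1, `max_steps` is a safety cap added for the example, and the function names are hypothetical.

```python
import random

def simulate_go_process(b, h, t0, rng, s=1.0, max_steps=10000):
    """Random walk k(t+1) = k(t) + b + w with w ~ N(0, s^2), started at k = 0.

    Returns (RT, crossed_upper), where RT is the first-passage step plus the
    non-decision offset t0, or (None, None) if no threshold is crossed.
    """
    k = 0.0
    for t in range(1, max_steps + 1):
        k += b + rng.gauss(0.0, s)
        if k >= h:
            return t + t0, True
        if k <= -h:
            return t + t0, False
    return None, None

def simulate_trial(b, h, t0, ssrt, rng, ssd=None):
    """Race between the go walk and a stop process finishing at ssd + ssrt.

    ssd=None denotes a go trial. Returns (RT, crossed_upper) when a response
    escapes, or (None, None) when the response is withheld.
    """
    rt, upper = simulate_go_process(b, h, t0, rng)
    if rt is None or (ssd is not None and rt >= ssd + ssrt):
        return None, None
    return rt, upper
```

Running `simulate_trial` over many trials at several SSD values gives an inhibition function (the fraction of trials returning a response at each SSD) and the RT distribution of the escaped responses.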
A critical difference is that by focusing on the finishing times of the stop and go processes, but not their underlying computational import, the race model cannot predict a priori the effect of changes in experimental constraints on stopping behavior. We elaborate further on this contrast by considering the effect of reward manipulations on stopping behavior. Leotti and Wager (2009) showed that subjects can be biased toward stopping or going by experimentally manipulating the relative penalties associated with go and stop errors. Their experiments associated a reward with fast go response times and a penalty with stop errors, and manipulated these values in an iterative fashion to induce a particular degree of bias in each subject, as measured by the fraction of stop errors committed. Figures 6A,B show that as subjects are biased toward stopping, they make fewer stop errors and have slower go responses. Since our model explicitly parametrizes the relative costs of go and stop errors ($c_s$ in Eq. 6), we can easily simulate such a manipulation by setting $c_s$ to a higher or lower value in Eq. 6. The new cost function induces different statistics in the trajectories of the action costs as in Figure 4B. In particular, making $c_s$ larger makes the expected go cost higher, as the same probability of a stop trial leads to a greater stop error cost. This has the effect of slowing the initial downward trajectory of the go cost curve, and speeding its repulsion away from the wait cost if the stop signal is later introduced; the overall effect is to slow down go reaction times and lower stop error frequency. Simulation results from the optimal model (Figures 6D,E, filled) confirm these intuitions and are similar to subjects' actual behavior (top row).

Influence of reward structure on stopping
Also shown in Figures 6C,F is the measure of stopping latency (SSRT) assumed by the race model, for human behavior and for the optimal model. Since the race model's conjectured stop process is not observable, the SSRT must be inferred from the go RT distribution and the stop error distribution. In particular, if going and stopping are assumed independent, and the SSRT is approximated as constant, then the difference between the midpoints of the RT and stop error cumulative distribution functions is an estimate of the SSRT (Logan and Cowan, 1984). Note, however, that when this estimation process is applied to human data in the experiment, the SSRT changes with reward manipulation (Figure 6C), and therefore cannot be used in isolation as a subject-specific index of inhibition. Although SSRT is not an explicit component of our framework, the same procedure outlined above can be used to estimate it for our model simulations, yielding the very same trend (Figure 6F, filled). The close match with human behavior suggests that SSRT is an emergent property of the interaction between going and stopping, and variations in SSRT are directly explained by optimal adjustments to the tradeoff between them.

[...] the log likelihood of each sample $(t_i, d_i)$ under $p(t,d \mid U)$ over all 10,000 samples. We do so for each setting of the diffusion model parameters $U$, and use Matlab's fmincon function to find the best-fitting parameters.

Stopping behavior as a natural consequence of rational decision-making
In the stop signal task, subjects typically make more stop errors when the stop signal delay (SSD) is longer, and response times on stop error trials are on average faster than go RTs (Logan and Cowan, 1984; Hanes and Schall, 1995; also see Verbruggen and Logan, 2009a for a recent review). Figure 5 shows how this behavior arises naturally as a consequence of rational decision-making in the task. Data from human subjects performing a saccade version of the stop signal task (Emeric et al., 2007; Figures 5A,C), and from model simulations (Figures 5B,D), show the same characteristics: error rate increases as SSD increases, and RTs on stop error trials are on average faster than go trials.
Intuitively, the later the stop signal, the more likely that the cost of going has already dropped below the cost of waiting before the stop signal information can be factored in (see Figure 4B), leading to the increasing SE curve or inhibition function shown here. Faster RT on SE trials is an outcome of stochasticity in the processing of the go and stop signals; as shown in Figures 4A,B, stop error trials are those in which the go stimulus is processed faster than average (and the stop stimulus slower than average). This difference gives rise to the observed faster RT, illustrated by the histograms in Figure 4B.

Race model and SSRT as emergent properties of optimal behavior
We examine the relationship between the race model and optimal behavior by fitting a diffusion model implementation of the race model to output from the optimal model (Figure 2B; see Section 2 for details). We examined how parameters of the best-fitting diffusion model vary as the reward structure of the task is manipulated (i.e., $c_s$ takes on different values). The best-fitting parameters are shown in Figure 7, and indicate that the SSRT parameter indeed has to be adjusted in a manner consistent with our optimal model's predictions, as well as the experimental data (Leotti and Wager, 2009). The fit also shows that the rate and threshold do not vary substantially. However, the offset parameter (non-decision time) increases with increasing stop error cost; this is consistent with later response times, without apparent informational gain, as $c_s$ increases. In general, the best-fitting race model for each $c_s$ behaves very similarly to the optimal model (Figures 6D-F, unfilled). Figure 8 shows the race model fits resulting from this optimization procedure, with Figures 8A,B showing the reaction time distribution of GO and stop error trials, as well as the cumulative SE distributions from the optimal model. Figures 8C,D show that the resulting race model fit is able to approximate the RT distributions and the stop error distribution functions qualitatively well, as a result of the optimization procedure selecting the appropriate race model parameters for each condition.
In summary, optimal decision-making may be implemented by a suitably parameterized race-diffusion model, suggesting one possible neural mechanism for behavior in the task. Furthermore, with an explicit procedure for fitting the race model to the optimal model, we can predict a priori how the race model parameters, such as SSRT, should change under different experimental manipulations, since the optimal model encodes the experimental parameters in a principled manner and gives precise predictions of associated behavioral changes.

[Figure caption, partially recovered] [...] unaffected (A, B). Changes in SSRT are similar to those in optimal model and experimental data (Figure 5). Each bar denotes average of 10 simulated "sessions," each session consisting of 10,000 trials. Error bar = SEM.

[Figure caption, partially recovered] Results based on 10,000 simulated trials from the optimal model, and also from the corresponding best-fitting diffusion race model.

Discussion
We presented a rational decision-making framework for inhibitory control in the stop signal task. Our framework optimizes sensory processing and action choice relative to a quantitative, global behavioral objective function that explicitly takes into account the various costs associated with go errors, stop errors, and response delay. We show that classical behavioral results in the stop signal task are natural consequences of rational decision-making. Moreover, the model can quantitatively predict the influence of subtle manipulations of task parameters, such as reward contingencies (Leotti and Wager, 2009), on stopping behavior. Our results suggest, therefore, that cognitive processing in the task is a continual, intertwined choice between go and wait (stop), under the influence of multiple cognitive factors in a computationally optimal manner.
We also examined the relationship between the race model and the rational decision-making model. The two models are motivated by fundamentally different levels of analysis, corresponding to algorithmic and computational models in Marr's (1982) levels of analysis. Despite its elegant simplicity and ability to explain a number of classical behavioral results, the descriptive nature of the race model precludes an a priori prediction of how behavior should change in order to accommodate various cognitive goals and task constraints. On the other hand, the optimal model requires complex computations unlikely to be directly implemented by the brain. Even if subjects' behavior is similar to model predictions, the brain may well implement a simpler approximation to the optimal algorithm. Recent studies suggest that the activity of neurons in the frontal eye fields (FEF; Hanes et al., 1998) and superior colliculus (Pare and Hanes, 2003) of monkeys could be implementing a version of the race model (Hanes et al., 1998; Boucher et al., 2007; Wong-Lin et al., 2009). Specifically, movement and fixation neurons in the FEF show responses that diverge on go and correct stop trials, indicating that they may encode computations leading to the execution or cancelation of movement. If the race model is an appropriate description of these neural activities, however, we showed that the race model (and its diffusion model elaboration) will need its parameters, such as SSRT, carefully adjusted in different task conditions in order to best approximate the optimal model, and in order to account for experimental data (Emeric et al., 2007; Leotti and Wager, 2009). Our framework can therefore guide the search for, and provide a computational understanding of, the neural mechanisms underlying stopping behavior. For example, we conjecture that FEF neurons represent and track the relative values of various available actions such as going, waiting, and cancelation.
In our model, the RT distribution is the outcome of an adaptively optimal policy acting on accumulated noisy sensory evidence, in light of the global objective function. Notably, the optimal policy is deterministic given a particular sequence of sensory inputs, so that stochasticity in response latency is entirely driven by stochasticity in sensory inputs, which determine RT variance and all other higher-order moments in the RT distribution. A related but distinct framework (Daunizeau et al., 2010) considers the restricted space of non-adaptive policies where a fixed stopping time is chosen at the outset of the trial, based on minimizing the expected cost for the chosen stopping time. It is non-adaptive in the sense that it chooses a mean stop time without considering the actual sequence of sensory inputs observed, and assumes variability around that mean to arise independently from a non-sensory origin. However, substantial experimental data suggest that simple perceptual decisions involve accumulation of evidence up to a bound, related to a specific confidence level in the probability space (and therefore dependent on the actual sequence of noisy inputs observed), rather than up to a chosen stopping time (see e.g., Gold and Shadlen, 2007 for a review).
Moreover, from a theoretical perspective, optimal policies for the type of stopping problems, including the stop signal task considered here as well as the simpler two-alternative forced choice tasks (e.g., Gold and Shadlen, 2007), are known to live within the adaptive policy space, and not in the very restrictive sub-class of non-adaptive policies (Wald and Wolfowitz, 1948; Chow et al., 1971). In particular, adaptive policies can better accommodate moment-by-moment changes in perceived sensory information (Kiani et al., 2008).
We note here that the original race model is agnostic with respect to the source of stochasticity in reaction times, taking it as the consequence of some inherent stochasticity in the unspecified go and stop processes. However, the race model can be implemented using a drift-diffusion model to make explicit the role of sensory noise in decision-making, as we demonstrate in our simulations.
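To make the diffusion implementation of the race model concrete, the following is a minimal sketch, assuming independent go and stop processes that each drift toward a fixed threshold under Gaussian noise, with the stop process starting at the stop signal delay (SSD). The function names and all parameter values (drift rates, noise level, threshold, time step) are hypothetical choices for illustration; they are not the fitted values used in our simulations.

```python
import numpy as np


def first_passage_times(rng, n_trials, drift, noise, threshold, dt, t_max):
    """First time each simulated diffusion path reaches `threshold`.

    Paths that never reach the threshold within `t_max` get np.inf.
    """
    n_steps = int(round(t_max / dt))
    # Euler discretization: increment = drift * dt + noise * sqrt(dt) * N(0, 1)
    steps = drift * dt + noise * np.sqrt(dt) * rng.standard_normal((n_trials, n_steps))
    paths = np.cumsum(steps, axis=1)
    crossed = paths >= threshold
    times = (crossed.argmax(axis=1) + 1) * dt   # time of first crossing
    times[~crossed.any(axis=1)] = np.inf        # no crossing within t_max
    return times


def simulate_stop_trials(ssd, n_trials=2000, go_drift=3.0, stop_drift=8.0,
                         noise=1.0, threshold=1.0, dt=0.001, t_max=1.5, seed=0):
    """Simulate stop trials of an independent diffusion race (hypothetical parameters).

    Returns (stop error rate, mean stop-error RT, mean go-process finishing time).
    """
    rng = np.random.default_rng(seed)
    go = first_passage_times(rng, n_trials, go_drift, noise, threshold, dt, t_max)
    # The stop process only starts accumulating once the stop signal appears.
    stop = ssd + first_passage_times(rng, n_trials, stop_drift, noise, threshold, dt, t_max)
    stop_error = go < stop                      # go process wins the race
    se_rt = go[stop_error].mean() if stop_error.any() else float("nan")
    go_rt = go[np.isfinite(go)].mean()
    return stop_error.mean(), se_rt, go_rt


if __name__ == "__main__":
    # Inhibition function: stop error rate as a function of SSD (in seconds).
    for ssd in (0.1, 0.2, 0.3, 0.4):
        rate, se_rt, go_rt = simulate_stop_trials(ssd)
        print(f"SSD={ssd:.1f}s  P(stop error)={rate:.2f}  "
              f"SE RT={se_rt:.3f}s  go RT={go_rt:.3f}s")
```

In this sketch the stochasticity in finishing times comes entirely from the noise term in the accumulation, which is the sense in which a diffusion implementation makes the role of sensory noise explicit. Under these assumptions the stop error rate rises with SSD and stop-error responses are faster on average than the full go finishing-time distribution, because only the fast tail of the go process escapes inhibition.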
In our model, a stop decision is implemented as a sequence of wait actions. Neurophysiological evidence from monkeys (Hanes et al., 1998) and humans (Aron et al., 2007a,b) suggests that successfully stopped actions may involve increased activity in certain neural populations such as the fixation neurons of the FEF, or cortical regions such as the inferior frontal gyrus and subthalamic nucleus implicated by human imaging studies. Studies in humans involving fMRI and tractography data suggest that the inferior frontal gyrus may implement a stop action via a hyperdirect pathway to the subthalamic nucleus (Aron et al., 2007a,b). One important and planned line of inquiry for our work is to consider a rational model with an explicit stop action, in order to better account for what is known about the neurophysiology of stopping.
The stop signal task is traditionally thought of as probing behavioral inhibition, whereas other tasks such as the Stroop and Eriksen tasks (Stroop, 1935; Eriksen and Eriksen, 1974) are thought to engage cognitive inhibition (see, e.g., Nigg, 2000, for a taxonomy). In contrast to this view, the close correspondence between our rational decision-making model and human behavior in the task demonstrates the influence of multiple cognitive factors on stopping behavior. Our previous work also showed that behavior in the Eriksen task (Yu et al., 2009) can arise from Bayesian statistical inference in a bounded rational manner (Simon, 1956). An interesting challenge is to explore how performance measures from these various inhibitory control tasks relate to each other within individuals, both empirically and from a computational perspective (Friedman and Miyake, 2004).
One major aim of our work is to understand how stopping ability and SSRT arise from various cognitive factors. Our work shows that SSRT arises from a number of contributing elements: reward/penalty sensitivity, sensory processing rate, and top-down expectations such as that of stop signal frequency. Thus, SSRT should not be viewed as a unique, invariant measure of stopping ability for each subject, but rather as an emergent property of the dynamic, context-dependent comparison between going and stopping. This more nuanced view of stopping ability and SSRT may aid in the careful analysis of impaired stopping ability, e.g., longer measured SSRTs, in a number of psychiatric and neurological conditions, such as substance abuse (Nigg et al., 2006), attention-deficit hyperactivity disorder (Alderson et al., 2007), schizophrenia (Badcock et al., 2002), obsessive-compulsive disorder (Menzies et al., 2007), Parkinson's disease (Gauggel et al., 2004), Alzheimer's disease (Amieva et al., 2002), and so on. It is unlikely that these various conditions share an identical set of underlying neural and cognitive deficits. In our framework, almost all model parameters, such as the fraction of stop trials, the SSD distribution, stop error cost, and go response deadline, are set directly by the experimental design. The only exceptions are parameters representing the sensory noise corrupting go stimulus and stop signal processing. These sensory parameters may be one important source of inter-subject differences. However, it is also likely that in practice, individuals have different estimates for the other parameter values on any given trial in an experiment, given their prior biases, memory capacity, individual experiences, and learning rates. Since our model makes explicit the dependence of subject behavior on subtle differences in these subject-specific parameters, these parameters can be inferred from behavioral data directly via model-fitting.
In the future, we plan to use model-fitting techniques, in conjunction with calibration experiments for independent estimation of behavioral biases, to study individual and group differences in inhibitory control.

Acknowledgments
We thank Jaime Ide, Chiang-shan Li, Gordon Logan, Martin Paulus, Rajesh Rao, Jeffrey Schall, Veit Stuphorn, and Nick Yeung for stimulating discussions and helpful suggestions. Funding was partly from CAN CTA (ARL/DCS).