Sequential sampling model for multiattribute choice alternatives with random attention time and processing order

A sequential sampling model for multiattribute binary choice options, called the multiattribute attention switching (MAAS) model, assumes a separate sampling process for each attribute. During the deliberation process, attention switches from one attribute consideration to the next. The order in which attributes are considered, as well as for how long each attribute is considered (the attention time), influences the predicted choice probabilities and choice response times. Several probability distributions for the attention time with different variances are investigated. Depending on the time and order schedule, the model predicts a rich choice probability/choice response time pattern, including preference reversals and fast errors. Furthermore, the difference between finite and infinite decision horizons for the attribute considered last is investigated. For the former case the model predicts a probability p_0 > 0 of not deciding within the available time. The underlying stochastic process for each attribute is an Ornstein-Uhlenbeck process approximated by a discrete birth-death process. All results also hold for the widely applied Wiener process.


INTRODUCTION
Sequential sampling models are powerful models to account simultaneously for choice probabilities and choice response times. They have become the dominant approach to modeling decision processes in cognitive science. Their application spans a variety of psychological tasks, from basic perceptual decisions to complex preferential choice tasks. Early on they were applied to identification and discrimination tasks (e.g., Edwards, 1965; Laming, 1968; Pike, 1973; Link and Heath, 1975; Heath, 1981; Ashby, 1983); memory retrieval (e.g., Stone, 1960; Ratcliff, 1978; Van Zandt et al., 2000); and classification (e.g., general recognition theory, Ashby, 2000; exemplar-based random walk models of classification, Nosofsky and Palmeri, 1997) to account for speed-accuracy data. They have also been used for preferential decision tasks (e.g., decision field theory (DFT), Busemeyer and Townsend, 1993; multiattribute dynamic decision model, Diederich, 1997; Diederich and Busemeyer, 1999) to account for choice response times and choice probabilities interpreted as preference strength; for judgment and confidence ratings (Pleskac and Busemeyer, 2010); and to account for selling prices, certainty equivalents, and preference reversal phenomena (Busemeyer and Goldstein, 1992; Johnson and Busemeyer, 2005). More recently, they have been applied to combining perceptual decision making and payoffs (Diederich and Busemeyer, 2006; Diederich, 2008; Rorie et al., 2010; Gao et al., 2011). Furthermore, these models have been closely linked to measures from neuroscience like multi-cell electrode recordings (e.g., Ditterich, 2006; Gold and Shadlen, 2007; Churchland et al., 2008).
Sequential sampling models assume that (1) stimulus and choice alternative characteristics can be mapped onto a hypothetical numerical value representing the instantaneous level of evidence (activation, information, or preference; the wording often depends on the context), (2) some random fluctuation of this value over time occurs, (3) this evidence is accumulated over time, and (4) a final choice is made as soon as the evidence reaches a threshold. Therefore, sequential sampling can be described as a stochastic process. Two quantities are of foremost interest: (1) the probability that the process eventually reaches one of the thresholds or boundaries for the first time (the criterion to initiate a response), i.e., the first passage probability; (2) the time it takes for the process to reach one of the boundaries for the first time, i.e., the first passage time. The former quantity is related to the observed relative frequencies, the latter usually to the observed mean choice response times or the observed choice response time distribution.
Two classes of sequential sampling models have been predominantly used in psychology: Random walk/diffusion models and accumulator/counter models. The former are typically applied to a binary choice task, so that evidence for one choice alternative is at the same time evidence against the other. A decision is made as soon as the process reaches one of two preset criteria. In the latter, an accumulator/counter is established for each choice alternative separately, and evidence is accumulated in parallel. A decision is made as soon as one counter wins the race to reach one preset criterion. The accumulators/counters may or may not be independent. In the following we focus on random walk/diffusion models. For a review of both diffusion models and counter models see Ratcliff and Smith (2004).
To be more precise and to introduce notation, let X(t) denote the accumulation process. For a binary choice, say between choice options A and B (Figure 1), the models assume that the decision process begins with an initial state of evidence X(0). This initial state may either favor option A (X(0) > 0) or option B (X(0) < 0) or may be neutral with respect to A or B (X(0) = 0). Upon presentation of the choice options, the decision maker sequentially samples information from the stimulus display over time, retrieves information from memory, or forms preferences, depending on the context. The small increments of evidence sampled at any moment in time are such that they either favor option A (dX(t) > 0) or option B (dX(t) < 0). The evidence is accumulated from one moment in time to the next by summing the current state with the new increment:

    X(t + h) = X(t) + μ(x, t) h + σ(x, t) [W(t + h) − W(t)].

The factor μ(x, t) is called the drift rate and describes the expected value of increments per unit time. The factor σ(x, t) in front of the increments W(t + h) − W(t) of a standard Wiener process W(t) is called the diffusion rate, and relates to the variance of the increments. This process continues until the magnitude of the cumulative evidence exceeds a threshold criterion, θ. The process stops and option A is chosen as soon as the accumulated evidence reaches a criterion value for choosing A (here, X(t) = θ_A > 0), or it stops and option B is chosen as soon as the accumulated evidence reaches a criterion value for choosing B (here, X(t) = θ_B < 0). The probability of choosing A over B is determined by the accumulation process reaching the threshold for A before reaching the threshold for B. The criterion is assumed to be set by the decision maker prior to the decision task.
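To make the accumulation rule concrete, the following Monte Carlo sketch simulates single trials of a Wiener process with constant drift between two thresholds. This is an illustration only, not the implementation used in the paper; the function name, the Euler step tau, and the default parameters are our own choices.

```python
import random

def simulate_trial(mu, sigma, theta_a, theta_b, tau=0.01, max_steps=100000, rng=None):
    """Simulate one accumulation trial: X grows by mu*tau plus Gaussian noise
    per step until it crosses theta_a (choose A) or theta_b (choose B).
    Returns (choice, decision time); choice is None if no threshold is hit."""
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    for _ in range(max_steps):
        x += mu * tau + sigma * (tau ** 0.5) * rng.gauss(0.0, 1.0)
        t += tau
        if x >= theta_a:
            return 'A', t
        if x <= theta_b:
            return 'B', t
    return None, t
```

With a positive drift the process reaches the upper threshold on most trials, and the recorded times form the predicted response time distribution for that choice.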
FIGURE 1 | The trajectories symbolize the accumulation process for three different trials. In one trial (red) the process is absorbed at the boundary for making an A response. In another trial (blue) the process is absorbed at the boundary for making a B response. For the third trial (black) the accumulation process still evolves and no response is yet initiated.
The Wiener process with drift, lately called drift-diffusion model in the psychological literature (Bogacz et al., 2006), is the most widely applied model. Different versions reflect additional assumptions for specific psychological domains. Ratcliff (1978) proposed a diffusion model for memory retrieval that is used for various psychological decision tasks. It is based on the work by Laming (1968) and Link and Heath (1975) and assumes variability in the starting point (i.e., X(0) follows a uniform distribution), and the drift rate μ = μ(t) of the Wiener process is normally distributed (cf. Laming). The residual time, i.e., the time other than the decision time, such as stimulus encoding and motor response, is assumed to be uniformly distributed and added to the decision time, i.e., response time equals the decision time plus a residual (non-decision) time. For a recent overview with applications see Voss et al. (2013). Other approaches include the Ornstein-Uhlenbeck model that linearly accumulates evidence with decay (Busemeyer and Townsend, 1993;Diederich, 1997), and the leaky competing accumulator model (Usher and McClelland, 2001) that non-linearly accumulates evidence with decay.
Common to almost all of these approaches is the assumption that a single integrated source of evidence generates the evidence during the deliberation process leading to a decision. In particular, the integrated source may be based on multiple features or attributes, but all of these features or attributes are assumed to be combined and integrated into a single source of evidence, and this single source is used throughout the decision process until a final decision is reached. Diederich (e.g., Diederich, 1995, 1997, 2003, 2008), however, assumed a separate process for each attribute.¹ The decision maker switches attention from one attribute to the next during the time course of one trial. For instance, in a crossmodal task (visual, auditory, tactile), Diederich (1995) assumed serial processing controlled by stimulus input at given stimulus onset asynchronies (SOA). That is, the order of attributes, here a light, followed by a tone, followed by a tactile vibration, as well as the point in time when a new attribute was added, here the tone presented at t_1 (t_1 ms after the light onset) and the tactile vibration at t_2 (t_2 ms after the light onset), was determined externally by the experimental setup. In the following we will call an attention switch at predetermined, fixed times and with a predefined attribute order a deterministic time and order schedule. Often, however, neither the processing order of attributes nor the point in time when the decision maker switches attention from one attribute to the next are known or can be inferred from the experimental setup. For those cases, Diederich (1997) proposed a specific model in which attention switches from one attribute to the next with some probability. This is an instance of a random time and order schedule, which will be investigated more systematically in the present study.
The purpose of this paper is to present a unified treatment of sequential sampling models for both deterministic and random time and order schedules. To do so we start with deriving expressions for mean choice response times and choice probabilities for a deterministic time and order schedule before we show how they extend to random time and order schedules, including Poisson, binomial, geometric, and uniform distributions for the attention time devoted to each attribute in the sequence before attention switches to the next randomly or deterministically chosen attribute. We will provide first numerical evidence on the influence of various properties of a schedule on the predictions for mean choice response times and choice probabilities.

PRELIMINARIES
The model applies to any finite number of attributes that the decision maker may consider, i.e., k = 1, . . . , K. For convenience we first describe the process for one attribute. As the underlying information process for each attribute we assume an Ornstein-Uhlenbeck process X(t) defined by

    dX(t) = [δ_k − γ_k X(t)] dt + σ_k dW(t),   (1)

where W(t) is a standard Wiener process. The parameters δ_k, γ_k, and σ_k are characteristics of the k-th attribute. The attribute characteristics may affect the quality of the extracted evidence for choosing A over B, and this quality of evidence determines the drift rate δ_k. That is, the better an attribute discriminates between A and B, the larger is δ_k. The parameter γ_k, which induces a change of the drift rate depending on the current state in the state space, is often connected to memory processes (e.g., primacy and recency effects), conflict situations (e.g., approach-avoidance), or similarities between choice alternatives. Thus, together the effective drift δ_k − γ_k X(t) determines the direction and the velocity of the process when considering the k-th attribute at time t. Note that setting γ_k to 0 results in a Wiener process with drift. That is, all the analysis we perform in the following is also valid for the Wiener process with drift. The diffusion coefficient σ_k indicates the variance of the increments of the process; for simplicity, we will set σ_k = σ for all k.
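The role of the effective drift δ_k − γ_k X(t) can be illustrated with a noise-free Euler iteration: without the Wiener noise term, the state relaxes toward the equilibrium level δ_k/γ_k, and the drift turns negative once X(t) exceeds that level. This is an illustrative sketch with our own function names, not code from the paper.

```python
def effective_drift(delta_k, gamma_k, x):
    """Direction and velocity of the process while attribute k is attended."""
    return delta_k - gamma_k * x

def ou_mean_path(delta_k, gamma_k, x0=0.0, tau=0.0625, steps=4000):
    """Noise-free Euler iteration of dX = (delta_k - gamma_k * X) dt.
    The iterate relaxes toward the equilibrium level delta_k / gamma_k."""
    x = x0
    for _ in range(steps):
        x += effective_drift(delta_k, gamma_k, x) * tau
    return x
```

For instance, with δ = 0.2 and γ = 0.03 the effective drift at X = 10 is already negative, which is why an Ornstein-Uhlenbeck process can behave very differently from a Wiener process with the same δ.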

MATRIX APPROACH
Stochastic processes such as the above X(t) can be approximated by a discrete time, finite state space Markov chain. We use the matrix approach since it is simple to implement, sufficient for determining the entities of interest, i.e., choice probabilities and choice response times, and flexible enough to account for non-stationary and non-linear properties one may wish to include in the decision making process in the future. The continuous state space [−θ_B, θ_A] is replaced by a finite grid of evidence states separated by Δ, the step size of change in evidence. To achieve convergence in the limit, the discretization parameters (Δ for state space, τ for time) are tied to each other by the relation Δ = σ√τ.
The attribute-related matrices P_k, k = 1, . . . , K, are given in their canonical form by

    P_k = | I     0   |
          | R_k   Q_k |   (2)

As Δ → 0 (or, equivalently, τ → 0), the decision probabilities and mean choice response times obtained from the Markov chain model converge to the values obtained from the underlying continuous process X(t). The identity matrix I corresponds to the two absorbing states (−m_B Δ and m_A Δ) associated with the two decision thresholds, one for each choice alternative; the matrix Q_k contains the transient probabilities, corresponding to the evidence updating process, and the matrix R_k contains the one-step transition probabilities from the transient to the absorbing states. In particular, the first column vector of the matrix R_k (denoted by R_B,k) contains the transient probabilities for reaching alternative B, while the second, R_A,k, contains the ones for alternative A. For details and derivations see Diederich (1997) and Diederich and Busemeyer (2003).
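A minimal sketch of this construction, assuming the common birth-death transition probabilities ½(1 ± (δ_k − γ_k x)√τ/σ) for one step up or down the grid; with σ = 1, θ = 10, Δ = 1/4 this yields 79 transient plus 2 absorbing states (m = 81, as in the simulations below). The helper names are our own.

```python
import numpy as np

def build_matrices(delta, gamma, sigma=1.0, theta=10.0, step=0.25):
    """Birth-death chain approximating the OU process on [-theta, theta].
    Returns Q (transient-to-transient) and R (transient-to-absorbing,
    columns ordered [B, A] as in the canonical form)."""
    tau = (step / sigma) ** 2                 # ties step = sigma * sqrt(tau)
    n = int(round(2 * theta / step)) - 1      # number of transient states
    grid = -theta + step * np.arange(1, n + 1)
    Q = np.zeros((n, n))
    R = np.zeros((n, 2))
    for i, x in enumerate(grid):
        p_up = 0.5 * (1.0 + (delta - gamma * x) * np.sqrt(tau) / sigma)
        p_up = min(max(p_up, 0.0), 1.0)       # keep probabilities valid on coarse grids
        if i + 1 < n:
            Q[i, i + 1] = p_up
        else:
            R[i, 1] = p_up                    # one step up into theta_A
        if i > 0:
            Q[i, i - 1] = 1.0 - p_up
        else:
            R[i, 0] = 1.0 - p_up              # one step down into -theta_B
    return Q, R

def absorption_probabilities(Q, R, start_index):
    """Rows of (I - Q)^(-1) R hold the eventual choice probabilities [B, A]."""
    return np.linalg.solve(np.eye(len(Q)) - Q, R)[start_index]
```

For a process starting at the neutral state (grid index 39 here), zero drift yields choice probabilities of one half each, while a positive drift shifts mass toward alternative A.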

TIME AND ORDER SCHEDULE
For K attributes, each one to be considered for some specific time in some specific order, it is convenient to introduce a formal schedule of both time and order. A finite time and order schedule consists of a set of L consecutive time intervals {[t_{l−1}, t_l]}, l = 1, . . . , L, and the attribute sequence {k_l ∈ {1, . . . , K}}, l = 1, . . . , L, which specifies that during the time interval [t_{l−1}, t_l] the k_l-th attribute is considered. At switching time t_l, l = 1, . . . , L − 1, attention switches from attribute k_l to attribute k_{l+1}. Depending on the situation, the final time t_L may be set finite (then the decision process may also finish without deciding for one of the alternatives) or infinite. Consequently, the process X(t) determined by such a schedule is a piecewise Ornstein-Uhlenbeck process: on each interval [t_{l−1}, t_l] of the finite partition, the process is determined by (1) with k = k_l. Figure 2 shows an example with three

Frontiers in Human Neuroscience www.frontiersin.org September 2014 | Volume 8 | Article 697 | 3

FIGURE 2 | A piecewise Ornstein-Uhlenbeck process with three different attributes. The attribute order is (1, 2, 1, 3); attribute 1 is considered twice in the sequence of attribute consideration. Switching attention from one attribute to the next occurs at fixed times t_1, t_2, and t_3.
The trajectories reflect the accumulation process for two different trials. The black solid lines indicate the effective drift of the process.
different attributes (K = 3) and a deterministic time and order schedule of length L = 4 with switching times t_l independent of the trajectories, and attribute order (1, 2, 1, 3), i.e., k_1 = 1, k_2 = 2, k_3 = 1, k_4 = 3 (note that the first attribute is reconsidered once). For fixed Δ resp. τ, the m × m transition probability matrix P_n containing the transition probabilities p_{ii′} := P(X_{n+1} = i′ | X_n = i) for the n-th step of the discrete-time random walk depends on the currently considered attribute defined by the time and order schedule, i.e., we set P_n = P_{k_l} if n = n_{l−1}, . . . , n_l − 1, where n_0 = 0 and τ n_l ≈ t_l for l = 1, . . . , L (if t_L = ∞, we formally set n_L = ∞).
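The mapping from step number n to the currently scheduled attribute can be sketched as a small lookup helper; the cumulative-step representation and the names are our own.

```python
def attribute_at_step(cum_steps, attribute_order, n):
    """Return the attribute k_l in effect at discrete step n.
    cum_steps holds the cumulative switch indices n_1 <= n_2 <= ... <= n_L;
    attribute_order holds the corresponding attributes k_1, ..., k_L."""
    for n_l, k_l in zip(cum_steps, attribute_order):
        if n < n_l:
            return k_l
    return attribute_order[-1]   # beyond n_L: last attribute stays in effect
```

For the schedule of Figure 2 with order (1, 2, 1, 3), the helper returns attribute 1 during the first segment, attribute 2 during the second, and so on.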

CHOICE PROBABILITIES AND MEAN CHOICE RESPONSE TIMES
In this section we derive the choice probabilities and mean choice response times for various time and order schedules. For simplicity we assume an unbiased process, i.e., X(0) = 0, and symmetric decision thresholds, i.e., θ_A = −θ_B. Since the diffusion coefficient is a scaling parameter, it will be set to σ = 1 for all attributes throughout. We start with the deterministic time and order schedule.

DETERMINISTIC TIME AND ORDER SCHEDULE
The evidence accumulation process for attribute k_1, which is considered first, evolves until time t_1 when the second attribute k_2 comes into consideration, triggering a change in the accumulation process. This attribute in turn is considered until time t_2 when a third attribute k_3 is considered, and so forth until a decision is initiated (or t_L is reached). Let the random variables T_A and T_B denote the finite time when the process reaches a decision threshold θ_A or −θ_B, stops, and a decision response for A or B is initiated. With the switching times t_l replaced by integers n_l ≈ t_l/τ, the choice probability Pr[choose A] = Pr(T_A < ∞) is then approximated by the value p_A obtained from the discrete random walk model as

    p_A = Z Σ_{l=1..L} [ Π_{j=1..l−1} Q_{k_j}^{n_j − n_{j−1}} ] [ Σ_{i=1..n_l − n_{l−1}} Q_{k_l}^{i−1} ] R_{A,k_l},   (3)

where Z is the probability distribution for the initial state X(0). For instance, for an unbiased process, Z would be a coordinate vector with probability 1 at state 0 halfway between the decision thresholds. The remaining vectors and matrices are those defined in (2). The evidence accumulation process for a successive attribute starts with the final evidence state of the previous attribute. Note that Z Q_{k_1}^{n_1}, Z Q_{k_1}^{n_1} Q_{k_2}^{n_2 − n_1}, . . . are defective distributions, i.e., the entries of these vectors do not sum up to 1, for the states of the random walk at discrete times n_1, . . . , n_{L−1}. Further note that the stochastic process is time-inhomogeneous (Diederich, 1992, 1995).
Similarly, the mean response time for choosing alternative A is approximated as

    E(T_A) ≈ (τ / p_A) Z Σ_{l=1..L} [ Π_{j=1..l−1} Q_{k_j}^{n_j − n_{j−1}} ] [ Σ_{i=1..n_l − n_{l−1}} (n_{l−1} + i) Q_{k_l}^{i−1} ] R_{A,k_l}.   (4)

The probability and the mean response time for choosing alternative B can be determined similarly. Note that p_0 := 1 − (p_A + p_B), the probability of not making a decision until the final time t_L, is strictly positive if t_L < ∞. As shown in Diederich (1997), these formulas can be further compactified. We will do this below for the general case of deterministic and random schedules by deriving an efficient recursion for their evaluation.
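These quantities can be evaluated without forming the matrix products explicitly, by propagating the defective state distribution one step at a time and accumulating the absorbed probability mass. A sketch under our own naming conventions, not the paper's code:

```python
import numpy as np

def schedule_probabilities(q_list, r_list, steps, z0, tau=1.0):
    """Choice probabilities and mean response times for a deterministic
    time and order schedule.  q_list/r_list hold the per-segment matrices
    Q_{k_l}, R_{k_l}; steps holds the number of discrete steps per segment.
    Returns (p_A, p_B, mean_T_A, mean_T_B, p_0)."""
    z = np.asarray(z0, dtype=float).copy()
    p = np.zeros(2)                       # accumulated [p_B, p_A]
    t = np.zeros(2)                       # accumulated time-weighted mass
    n = 0
    for Q, R, n_steps in zip(q_list, r_list, steps):
        for _ in range(n_steps):
            n += 1
            absorbed = z @ R              # mass absorbed at step n: [B, A]
            p += absorbed
            t += n * tau * absorbed
            z = z @ Q                     # remaining defective distribution
    p0 = z.sum()                          # probability of not deciding by t_L
    mean_rt = t / np.maximum(p, 1e-300)
    return p[1], p[0], mean_rt[1], mean_rt[0], p0
```

On a tiny symmetric walk (transient states {−1, 0, 1}, absorbing at ±2, started at 0) this reproduces the textbook values: equal choice probabilities and a conditional mean absorption time of four steps.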

RANDOM TIME AND ORDER SCHEDULE
The above derivation of formulas for choice probabilities and mean response times for a deterministic time and order schedule has counterparts for random schedules, which we describe next in three steps.

Random time schedule
We assume that the number of discrete time steps during which attention is paid to the k-th attribute is a discrete random variable, denoted by T_at, with a given distribution. In principle, this distribution may change its type and may have different parameters, such as the expected value, depending on the attribute and the attribute order {k_l}, l = 1, . . . , L. This can be used to model time pressure and other temporal effects. However, we often assume one and the same distribution type for attention times across all attributes, and allow for different parameters only.
For instance, the geometric distribution (as implicitly considered in Diederich, 1997) is given by Pr(T_at = n) = (1 − r)^{n−1} r, n = 1, 2, . . . , and characterized by a single parameter r > 0, with expectation 1/r and variance (1 − r)/r²; the uniform distribution on {N − M, . . . , N + M} is defined as Pr(T_at = n) = 1/(2M + 1), with expectation N and variance M(M + 1)/3.
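Sampling such attention times is straightforward. A sketch, assuming the parameterization used here (rate r for the geometric case; mean N and half-width M for the uniform case); the function names are our own:

```python
import random

def sample_geometric_attention(r, rng):
    """Pr(T_at = n) = (1-r)^(n-1) * r for n = 1, 2, ...; expectation 1/r."""
    n = 1
    while rng.random() >= r:   # each failure extends the attention span by one step
        n += 1
    return n

def sample_uniform_attention(N, M, rng):
    """Uniform on {N-M, ..., N+M}: expectation N, variance M(M+1)/3."""
    return rng.randint(N - M, N + M)
```

Varying M between roughly √N and N − 1 moves the uniform case from nearly deterministic attention times to highly variable ones, which is the comparison drawn in the simulations below.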

Constructing random time and order schedules
We create a random time and order schedule of length L in two steps: First, given an initial distribution of k_1 ∈ {1, . . . , K}, we create the attribute sequence {k_l}, l = 2, . . . , L, using a non-stationary Markov chain model with transition probability matrices D^(l), l = 1, . . . , L − 1. In a second step, for each l = 1, . . . , L, the attention time T_at^(l) = n_l − n_{l−1} is created by the discrete random variable responsible for the attention time paid to the k_l-th attribute; choices are independent for the different l. Consequently, t_l − t_{l−1} ≈ τ T_at^(l) is the real attention time paid to the k_l-th attribute. We note that semi-random schedules, where the sequence {k_l} is given deterministically, and only the T_at^(l) are determined as in the second step outlined above, are covered if we choose the D^(l) such that d^(l)_{k_l, k_{l+1}} = 1. To understand the recursive computation of choice probabilities and mean response times in this more general case, we first consider the special cases L = 1, 2, and illustrate the derivation for some distribution types of the random variable T_at generating attention times by providing concrete formulas. In general, the distribution for T_at is given by its probability mass function (pmf)

    Pr(T_at = n) = p_{n,k}, n = 0, 1, . . . ,   (5)

and cumulative distribution function (cdf) Pr(T_at ≤ n) = f_{n,k} := Σ_{i=0..n} p_{i,k}, n = 0, 1, . . . .
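The two-step construction can be sketched as follows, with attributes indexed from 0 and the attention-time sampler passed in as a function; all names are our own, and choosing D with a single 1 per row reproduces a semi-random schedule.

```python
import random

def random_schedule(L, d_matrix, init_dist, sample_time, rng):
    """Draw an attribute order {k_l} of length L from a Markov chain with
    transition matrix d_matrix and initial distribution init_dist, and an
    attention time (in discrete steps) for each slot via sample_time(k, rng)."""
    k = rng.choices(range(len(init_dist)), weights=init_dist)[0]
    order, times = [k], [sample_time(k, rng)]
    for _ in range(1, L):
        k = rng.choices(range(len(d_matrix)), weights=d_matrix[k])[0]
        order.append(k)
        times.append(sample_time(k, rng))
    return order, times
```

With a deterministic 0 → 1 → 2 transition matrix and a constant attention-time sampler, the generator degenerates to a deterministic time and order schedule, as expected.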
We start with L = 1, and drop the index l from the notation introduced in the previous subsection. Since the probability of choosing alternative A at the i-th step is given by Z Q_k^{i−1} R_{A,k}, i = 1, . . . , T_at, and T_at is a random variable distributed according to (5), we get

    p_{A,k} = Σ_{i=1..∞} Pr(T_at ≥ i) Z Q_k^{i−1} R_{A,k} = Z [ Σ_{i=1..∞} (1 − f_{i−1,k}) Q_k^{i−1} ] R_{A,k}.

A similar formula holds for p_{B,k}. To avoid repetition, introduce the row vector p_{AB,k} := [p_{B,k}, p_{A,k}]; then

    p_{AB,k} = Z V_k, with V_k := Σ_{i=1..∞} (1 − f_{i−1,k}) Q_k^{i−1} R_k.   (6)

The (m − 2) × 2 matrix V_k depends on the attribute and its parameters via Q_k, R_k, and on the chosen attention time distribution via the cdf (f_{n,k}). For the discussed concrete attention time distributions these matrices may be precomputed; in some cases closed-form expressions can be found, e.g., for the geometric distribution with parameter r = r_k we have

    V_k = (I − (1 − r_k) Q_k)^{−1} R_k.

Next we discuss choice probabilities for the case L = 2, assuming for simplicity that the attention time distribution is the same for all attributes. To save on indices, denote k_1 ≡ k′, k_2 ≡ k, and D^(1) ≡ D (this matrix is responsible for the random choice of k given any k′). Then the decision probability vector p_{AB,k′,k} for reaching alternatives B or A with attribute order (k′, k) has two parts: the probabilities of having decided while still considering the k′-th attribute (i.e., T_A/τ ≤ T′_at, where T′_at is the randomly generated attention time for the first attribute k′), plus the probabilities that T′_at < T_A/τ ≤ T′_at + T_at, where T_at is the randomly (and independently) generated attention time for the second attribute k. On top of this, k itself is randomly chosen according to the entries in the k′-th row of D. Thus, for each fixed k_1 = k′ and n_1 = T′_at, according to (6) the probabilities for reaching a decision after n_1 are given by

    Z Q_{k′}^{n_1} V_k.

Thus, for L = 2, the choice probabilities (under the assumption that k_1 = k′ is fixed) can be obtained as

    p_{AB,k′,k} = Z V_{k′} + Σ_{k=1..K} d_{k′k} Z B_{k′} V_k, with B_k := Σ_{n=0..∞} p_{n,k} Q_k^n,   (7)

where the B_k are (m − 2) × (m − 2) matrices depending on the attribute and attention time distribution type.
For example, for the geometric distribution this simplifies to

    B_k = r_k Q_k (I − (1 − r_k) Q_k)^{−1}.

To treat all initial attributes at once, denote by Z̄ the K × 1 array with each entry equal to the initial distribution Z (and by Z̄′ its transpose, a 1 × K array with entries Z), by B the K × K diagonal array with the B_k on the diagonal (similarly for C defined later), by I the K × K diagonal array with identity matrices I of the appropriate size on the diagonal, by V the K × 1 array with the V_k as entries (similarly for W defined later), and by p_AB the K × 2 matrix whose rows are the choice probabilities [p_B, p_A]|_{k_1 = k′} defined before in the case L = 2. Then the above result for L = 2 can be compactly written as

    p_AB = Z̄′ (I + BD) V.   (8)

Note that the product BD of the array B with the matrix D is interpreted as the K × K array with d_{k′k} B_{k′} as entry in row k′ and column k. Moreover, by iterating (8), one arrives at the formula for arbitrary L:

    p_AB = Z̄′ (I + BD + (BD)² + · · · + (BD)^{L−1}) V.   (9)

Formulas for mean response times can be derived similarly. Indeed, for L = 1, denote by ET_{A,k} the mean response time for reaching alternative A when considering the k-th attribute for a random time T_at distributed according to (5). Then

    ET_{A,k} = τ et_{A,k}/p_{A,k}, with et_{A,k} := Z [ Σ_{i=1..∞} i (1 − f_{i−1,k}) Q_k^{i−1} ] R_{A,k}.   (10)

Similarly for ET_{B,k} and et_{B,k}. Thus, similar to (6), we can write

    et_{AB,k} := [et_{B,k}, et_{A,k}] = Z W_k, with W_k := Σ_{i=1..∞} i (1 − f_{i−1,k}) Q_k^{i−1} R_k.   (11)

The matrices W_k can be precomputed to any accuracy at essentially the same cost as the V_k. For particular distributions, the formulas can be turned into closed-form expressions. Next, let us look at L = 2. By using similar notation and arguments as for choice probabilities, the quantities et_{A,k′,k}, et_{B,k′,k} have a part before and a part after T′_at. This, together with (10), (11), gives

    et_{AB,k′,k} = Z W_{k′} + Σ_{k=1..K} d_{k′k} ( Z C_{k′} V_k + Z B_{k′} W_k ), with C_k := Σ_{n=0..∞} n p_{n,k} Q_k^n.   (12)

Thus, with et_AB denoting (analogously to p_AB) the K × 2 matrix of these time-weighted quantities, the counterpart of (8) is

    et_AB = Z̄′ (W + CDV + BDW).   (13)

From here, combining with (8), a joint recursion for computing p_AB and et_AB results:

    V^(1) = V, W^(1) = W;
    V^(l+1) = V + BD V^(l), W^(l+1) = W + CD V^(l) + BD W^(l), l = 1, . . . , L − 1;
    p_AB = Z̄′ V^(L), et_AB = Z̄′ W^(L).   (14)

We conclude this section with a few remarks. In Diederich (1997), under the name MADD/pp, a slightly different presentation of random schedules is given for the special case of geometrically distributed attention times.
It is not hard to see that (with the notation r_ij used in the K = 3 example presented in Section 4.2 of Diederich, 1997) our model is equivalent to MADD/pp as L → ∞, if we set r_k = 1 − r_kk for the parameters r of the geometrically distributed T_at, k = 1, 2, 3, and d_{kk} = 0, d_{kk′} = r_{kk′}/(1 − r_{kk}) for k′ ≠ k, for the entries of the matrix D = D^(l), l ≥ 1. The advantage of the MADD/pp model is that it provides closed-form formulas for the case L = ∞, a possibility that we did not pursue here for other types of attention time distributions.
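For geometrically distributed attention times the defining series for V_k and B_k can be summed in closed form, since 1 − f_{i−1,k} = (1 − r_k)^{i−1}. A sketch with our own function names; Q and R stand for any substochastic transient and absorption matrices:

```python
import numpy as np

def v_geometric(Q, R, r):
    """V_k = sum_{i>=1} (1-r)^(i-1) Q^(i-1) R = (I - (1-r) Q)^(-1) R."""
    return np.linalg.solve(np.eye(len(Q)) - (1.0 - r) * Q, R)

def b_geometric(Q, r):
    """B_k = sum_{n>=1} (1-r)^(n-1) r Q^n = r Q (I - (1-r) Q)^(-1)."""
    return r * Q @ np.linalg.inv(np.eye(len(Q)) - (1.0 - r) * Q)
```

The closed forms agree with truncated partial sums of the series, which is a convenient sanity check when implementing the recursion.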
In previous sequential decision models with finite L (Diederich, 1997), the last attribute was always considered infinitely long (infinite decision horizon) to avoid the situation of no decision, i.e., p_0 > 0. This can be incorporated into the current model by modifying the definition of the matrices V_k, W_k corresponding to the last interval [t_{L−1}, ∞) to

    V_k = (I − Q_k)^{−1} R_k, W_k = (I − Q_k)^{−2} R_k,

and modifying the recursion (14) slightly. Alternatively, one can artificially change the parameters of the attention time distribution for l = L such that its expected value is sufficiently large, making p_0 practically negligible. Since infinite decision horizons do not seem to adequately reflect the situation of a real decision process or laboratory experiment, it might be interesting to work under the scenarios with fixed, finite t_L that we described in this paper.
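One consistent way to realize the infinite-horizon modification is to replace the truncated series defining V_k and W_k by their full sums, which have closed forms; this is our reading of the construction, not code from the paper.

```python
import numpy as np

def v_infinite(Q, R):
    """Infinite decision horizon: V_k = (I - Q_k)^(-1) R_k gives the
    eventual absorption probabilities, so no mass p_0 is left over."""
    return np.linalg.solve(np.eye(len(Q)) - Q, R)

def w_infinite(Q, R):
    """W_k = sum_{i>=1} i Q_k^(i-1) R_k = (I - Q_k)^(-2) R_k, the
    time-weighted absorption mass used for mean response times."""
    inv = np.linalg.inv(np.eye(len(Q)) - Q)
    return inv @ inv @ R
```

On the tiny symmetric walk with transient states {−1, 0, 1} and absorbing states at ±2, the ratio of the W entry to the V entry recovers the known conditional mean absorption time of four steps from the middle state.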

SIMULATIONS
We present some simulations that demonstrate the predictive power of the proposed model. We focus on features that have not been considered in Diederich (1997) for the deterministic case. Throughout this section we fix certain parameters, such as σ = 1, θ_A = −θ_B = 10, Δ = 1/4, τ = 1/16 (this implies a state space size of m = 81), and always start at the neutral position X(0) = 0 between choice alternatives A and B.
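For reference, the fixed discretization can be written down directly; this is a plain restatement of the parameter choices above.

```python
# Fixed parameters of the simulation section.
sigma = 1.0
theta_A = 10.0                            # theta_B = -10 by symmetry
tau = 1.0 / 16.0
step = sigma * tau ** 0.5                 # Delta = sigma * sqrt(tau) = 1/4
m = int(round(2 * theta_A / step)) + 1    # grid states incl. the two absorbing ones
```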

IMPACT OF ATTENTION TIME DISTRIBUTIONS
First, we show how different assumptions on the randomness of the attention time T_at (i.e., the time spent on considering a certain attribute) influence choice probabilities and mean response times. In the first example, we assume just two attributes with parameters δ_1 = 0.2, γ_1 = 0.03, δ_2 = 0.04, γ_2 = 0.003; both attributes favor alternative A, the first one more strongly than the second one.² The attributes are considered only once (L = 2), with order k_1 = 1, k_2 = 2. The first attribute is considered for time t_1 = τ n_1, where n_1 is a random variable T_at as described above with given expectation N. For the second attribute we compare two situations: (1) we assume an infinitely long decision horizon t_2 = ∞, and (2) we determine a finite time horizon t_2 = τ n_2 by choosing n_2 = n_1 + T_at, which is also T_at-distributed with the same expected value N. These two situations are depicted in Figures 4, 5. The graphs show choice probabilities and mean response times as functions of the expectation τ E(T_at) of the real attention times. Lines of different color represent different distributions. Distributions with a small variance, such as the Poisson distribution, the binomial distribution, and the uniform distribution with M ≈ √N, produce results indistinguishable from the deterministic case. This holds for all tested situations shown below.

²Note that looking only at the numerical values of the drift parameter δ_1 = 0.2 and the decision criterion θ_A = 10, and assuming that the attention times t_1 for the first attribute are large enough, would suggest mean response times in the range T_A ≈ 50 (and very small p_B). However, γ_1 = 0.03 leads to a negative effective drift δ_1 − γ_1 X(t) if X(t) comes close to θ_A, and the mean response times become much longer. This also demonstrates the effect of the parameter γ_k, and a difference between Ornstein-Uhlenbeck and Wiener process based models.
This means that small uncertainties in attention time spans do not influence the observable choice frequencies and mean response times. However, as the variance of the attention times grows, we see quantitative and qualitative changes. Compared to the deterministic attention time situation, the geometric distribution differs most, and the uniform distributions with M = N/2 = 150 (Unif.1) and M = N − 1 = 299 (Unif.2) are intermediate. Moreover, there is, expectedly, a big difference for small mean attention times between finite and infinite decision horizons. Most importantly, for the former case the model predicts a probability p_0 > 0 of not deciding within the available time t_2. We claim that for many situations, where an infinite time horizon does not represent reality well enough, our finite schedule model might be more appealing. This aspect will be pursued in further research.

FIGURE | The attribute considered first for a random time t_1 strongly favors alternative A, followed by a second attribute which only weakly favors A but is considered indefinitely. Note that graphs for distribution types with small variance are almost indistinguishable from the graph corresponding to deterministically fixed t_1 (variance 0) and therefore are omitted here.

In the second example, an attribute favoring alternative B is considered first, followed by an attribute more strongly favoring A (δ_2 = 0.2, γ_2 = 0.03). As expected, the results now look different; however, the main conclusions from the previous example concerning the influence of the randomness type for attention times and the differences for finite vs. infinite time horizons remain the same. Most importantly here, the model predicts a preference reversal (i.e., choice probabilities moving from below 0.5 to above 0.5) as a function of attention time when one attribute is in favor of choosing alternative A and the other in favor of choosing alternative B. Parameter studies, as in Diederich (1997), will be pursued further elsewhere.

To complete the picture, we show a three-attribute example (K = 3) in Figure 8. The chosen attribute parameters are now δ_1 = 0.04, γ_1 = 0.003, δ_2 = −0.1, γ_2 = 0, δ_3 = 0.2, γ_3 = 0.03, i.e., a sequence of attributes weakly in favor of A, in favor of B, and strongly in favor of A. Attention times for the first two attributes are chosen independently from each other but with the same distribution with fixed mean value; the last attribute is considered indefinitely.

DEPENDENCE ON ATTRIBUTE ORDER
The proposed sequential decision model is sensitive to the order in which the attributes are considered. If, in the aforementioned second two-attribute example, we consider the attribute in favor of A first, and then the attribute in favor of B, we get very different patterns, as shown in Figure 9 compared to Figure 6. A similar effect holds for the above K = 3 example. In Figure 10, the attribute in favor of B is now the last one; the graphs should be compared with Figure 8. One interesting pattern can be observed. If the evidence for choosing one alternative decreases over the sequence of attribute consideration, then the model predicts faster choice response times for the more frequently chosen alternative, a typical pattern observed in response time analysis. However, if the evidence increases over the sequence of attribute consideration, then the model predicts faster choice response times for the less frequently chosen alternative, which has been called a fast error, as shown in Figure 11.

So far, all examples shown use a fixed, deterministic attribute order with no repetitions (semi-random schedule, L = K). The evaluation of fully random time and order schedules requires larger L, and will be presented elsewhere.

CONCLUDING REMARKS
The proposed multiattribute attention switching (MAAS) model can predict a very complex choice probability/(mean) choice response time pattern. It may appear too flexible to be testable. However, this is not the case. If two attributes both favor one alternative, A say, and the first attribute that is considered provides more evidence for choosing A than the second (δ_1 > δ_2), then the model always predicts shorter response times for the more frequently chosen alternative, here A, regardless of the assumed underlying attention time distribution. If the order of processing these attributes is reversed, i.e., the attribute that favors alternative A less is considered first (δ_2 > δ_1), then the model always predicts faster response times for the less frequently chosen alternative, i.e., fast errors. Other sequential sampling models account for such patterns by assuming variability in drift rates, i.e., a statistical means where the drift rate itself is a random variable. It is difficult experimentally to disentangle the variability stemming from the stochastic process itself and the variability from the distribution of different drift rates. As Jones and Dzhafarov (2013) pointed out, the predictions of various sequential sampling models rest upon the assumptions made about the assumed probability distributions. This is not the case here. The model is falsifiable without assuming specific distributions.

FIGURE | An attribute weakly favoring alternative A is considered first for a random time t_1, followed by a second attribute favoring B considered for a random time t_2 − t_1, while the last attribute (strongly favoring A) is considered indefinitely. The random attention times t_1 and t_2 − t_1 for the first two attributes are independently chosen from the same distribution. We show graphs of choice probabilities and mean response times as functions of the expected attention time E(t_1) = E(t_2 − t_1) = 10 . . . 500 for different distribution types. Again, small variance distributions yield almost identical results.
Rather than relying on statistical mechanisms to produce an observed response pattern, we rely on assumptions about cognitive processes such as attention switching and salience. The specific attention time distribution used for an application may be related to the experimental paradigm. For instance, when tracking eye movements, the sequence of attribute consideration and the switching times are directly observable, and a deterministic schedule or a uniform distribution with a small variance is advisable. When all attributes are shown simultaneously, as in complex objects, and attention may shift at any moment in time, a geometric distribution or a uniform distribution with a large variance may describe the situation better. Testing the model rigorously will be pursued in the future.