Automatic Generation of Randomized Trial Sequences for Priming Experiments

In most psychological experiments, a randomized presentation of successive displays is crucial for the validity of the results. For some paradigms, this is not a trivial issue because trials are interdependent, e.g., priming paradigms. We present a software that automatically generates optimized trial sequences for (negative-) priming experiments. Our implementation is based on an optimization heuristic known as genetic algorithms that allows for an intuitive interpretation due to its similarity to natural evolution. The program features a graphical user interface that allows the user to generate trial sequences and to interactively improve them. The software is based on freely available software and is released under the GNU General Public License.


IntroductIon
In almost all psychological research that applies an experimental strategy, randomization of stimuli, participants and/or experimental conditions is essential for the validity of the obtained results. Many paradigms are simple enough that an on-line randomization can be applied, i.e., the experimental software can determine the stimuli to be shown by itself (e.g., by randomly choosing a stimulus from a given set). Some experimental setups, however, feature complex inter-trial dependencies such that proper randomization is more difficult. Consider for example negative priming (NP) experiments (e.g., Tipper, 1985). In these paradigms, the experimental condition of a trial depends not only on the stimuli presented in the trial (probe) but also on the preceding trial (prime). This dependency between trial i and trial i − 1 makes proper randomization difficult (because the condition in trial i + 1, in turn, depends on the stimuli in trial i). This paper presents a software that was designed to generate randomized stimulus-sequences for (negative-) priming experiments based on a heuristic optimization method known as genetic algorithms (GAs).
For illustration, we present a standard NP study (Schrobsdorff et al., 2007) which we will use as a reference throughout the manuscript to clarify the exposition of our methodology. In their study, Schrobsdorff et al. (2007) presented pictograms of everyday objects (ball, book, bench, boat, bed) in two different colors and required their subjects to voice the label of the green stimulus while ignoring the red stimulus (see Figure 2). Repetition priming and NP are realized by repeating one or both (partial vs. full repetition) of the objects from prime to probe either in the same (positive priming) or reversed colors (NP). The basic result is that reaction times and error rates are reduced for identical repetitions while they are increased for distractor-to-target repetitions (in relation to a control condition where no stimuli are repeated). 1998) which further question the practical applicability of blocked trial-presentation. Last but not least, the experimental efficiency is reduced by half of what could be achieved by continuous presentation. As a consequence, the time required for the experimental session increases, potentially causing mental fatigue which is associated with loss of cognitive control (Lorist et al., 2005) which in turn appears to be a prerequisite for NP to occur (de Fockert et al., 2010).
Continuous presentation schemes in which trial i is the probe for trial i − 1 and the prime for trial i + 1 (e.g., Kramer and Strayer, 2001;Titz et al., 2008;Behrendt et al., 2010) circumvent these problems but also make randomization much harder. This is due to the fact that the trials are not independent anymore: Changing the stimuli in display i changes the experimental condition of trial i and trial i + 1. This dependency makes a randomized presentation of stimuli over the complete sequence very difficult when factors such as the number of presented trials for each priming condition are to be balanced across the experiment. A pseudo-randomized trialpresentation is often used where the stimuli are selected based on a list that fulfills the desired properties. Generating such lists is not trivial because of the cross-trial dependencies. In addition, regarding trial-sequence generation as an optimization problem, there may not exist a perfect solution such that the "optimal" sequence will necessarily violate some of the experimental constraints.
Furthermore, care must be taken that the sequences do not induce the use of strategies by the participants. For example, in NP tasks, subjects could use the prime distractor to predict the probe target (May et al., 1995) if they were aware of the experimental manipulation. The presentation of "random" (in the sense of unpredictable) trial sequences is therefore necessary. The literature on implicit sequence learning suggests that participants are able to exploit regularity in trial sequences for responding (Stadler, 1992;Stadler and Neely, 1997;Boyer et al., 2005) even if this regularity is quite subtle as, e.g., when the sequence is generated by an abstract grammar (Visser et al., 2009). To ensure "unpredictability," many experimentalists control the presented trial-sequence in a way that some structural criteria are fulfilled. For example, it is desirable that the trials do not come in predictable patterns (e.g., multiple consecutive instances of the same experimental condition) and that stimuli appear an equal number of times. This is a tricky aspect, though: In an information-theoretic sense, any additional constraint (e.g., avoiding consecutive trials of the same experimental condition) on stimulus-sequences will reduce the entropy (the theoretical unpredictability) of the process that generated them (for an introduction to information theory, see Cover and Thomas, 2006). However, humans are more affected by local regularities and it is therefore desirable to avoid local structure in the sequences (see literature on implicit sequence learning, e.g., Stadler, 1992). The criteria applied in the design of the trial sequences are only loosely defined because it is unclear what regularities have to be avoided in order to ensure unpredictability. This poses a difficulty to computer-aided trial-sequence generation as all constraints must be formally specified. In addition, any automated optimization technique must allow for deviations from the optimum and be flexible enough to adapt to different experimental needs.
In this paper, we present a computer program that makes use of a global optimization heuristic known as genetic algorithms (GAs) (Goldberg, 1989) to generate and optimize stimulus-sequences for priming experiments. As the name suggests, GAs are inspired by natu-ral evolution and the parameters have therefore a natural equivalent in evolutionary biology. This fact allows for an intuitive understanding of how parameter changes will affect the results which is very useful when working with the optimization algorithm. Our program allows to generate trial sequences for a variety of priming tasks. All priming paradigms that include two dimensions on which stimuli are distinguishable are supported. This includes in particular naming and categorization tasks in semantic or identity priming paradigms but also other tasks, such as priming in visual search (Kristjánsson and Driver, 2008).
In the following, we will shortly outline the theory of GAs. Furthermore, we will present the program and explain how it can be used to generate trial sequences for specific experiments. Finally, we present examples of results acquired using the program, compare it to an on-line randomization approach and discuss potential extensions.

GenetIc alGorIthMs
Genetic algorithms are search heuristics that can be used to approximate a globally optimal solution to a problem. GAs have been successfully applied to many real-life problems in fields as different as economics, biology, and computer science (Goldberg, 1989). The approach is inspired by biological evolution: It mimics concepts like inheritance, mutation, natural selection, and recombination (Beasley et al., 1993a,b). Basically, the algorithm generates random solutions to the problem and improves them by means of evolutionary strategies. The operation of the algorithm therefore requires a way to generate solutions from the space of all possible solutions S to the optimization problem and a fitness function F:S → [0,1] that assigns a real number (fitness value) to each member of S.
The performance of the GA depends more on the choice of an efficient encoding scheme and the fitness function than on the settings of other more peripheral parameters. Therefore, a first problem is to find a suitable representation of valid solutions. In allusion to the biological equivalent, such a representation is called a genome, g ∈ S. The classical version of the GA requires a binary coding of the input (Goldberg, 1989): For example, if any solution can be coded as a sequence of arbitrary integer numbers, these can be represented as binary strings following the standard decimal to binary conversion. However, it is often beneficial to use a representation that is specifically tailored to the search-space and does not allow "invalid" solutions to be coded. We therefore implement a genome designed to encode exactly the valid stimulus-sequences but not more (see Section 2.2.2).
The second and most important step to apply a GA to an optimization problem is to design the fitness or objective function F. This function takes an instance of the set of all possible solutions and evaluates it in terms of how well it solves the problem, i.e., it assigns a score, the "fitness," to it. The design of this function is crucial for the performance of the algorithm as it is the only means for the algorithm to determine the efficiency of a generated solution, thereby governing the result of the optimization completely.
There are several variants of GAs but the general behavior is as follows: A random initial population P 0 = {g 1 ,…,g N } of N possible solutions is generated and a fitness value is calculated for each genome. Then, a reproductive cycle is started in which two instances from P 0 are sampled according to their fitness which are used to generate two offspring (i.e., members of the next population P 1 ) by recombination. With probability p mut , each genome from indexed by integer numbers ranging from 1 to n stim . Experimental conditions are defined by which stimuli repeat from one trial to the next. All possible repetitions of one or both stimuli are supported as shown in Table 1.
In our reference study (Schrobsdorff et al., 2007), the set of stimuli comprises the n stim = 5 objects ball, book, bench, boat, and bed. Each of these stimuli can be the target (when it is presented in green) or the distractor (red). The study implemented priming conditions with both partial (DT, TT) and full repetitions (DTTD, DDTT), see Figure 2. The software was designed with such identity NP experiments in mind but is also applicable to other priming paradigms. In fact, all priming paradigms that include two dimensions on which stimuli are distinguishable are supported. This includes in particular naming and categorization tasks in semantic or identity priming paradigms but also other tasks, such as priming in visual search (Kristjánsson and Driver, 2008).
It is also possible to use the software for generating trial sequences for semantic or affective priming tasks, even though a bit of additional work is required. Consider for example the study by Damian (2000) in which priming between same-category stimuli (vehicles, tools, animals, furniture, and clothing) was investigated. Priming was investigated by considering, e.g., whether subsequent presentation of two different tools or vehicles primed each other. In this case the categories, not the actual instances, would correspond to stimuli in our software. The mapping from category to specific object would have to occur after the sequence has been generated and is out of the scope of the software. However, this is relatively straight-forward using a spreadsheet software to replace category-labels with instances (though care must be taken to randomize the number of occurrences of each stimulus as well). We will discuss potential extensions of our software in the general discussion.

Coding and genetic operators
Currently only sequences for two distinct types of stimuli ("targets" and "distractors") can be realized. A trial-sequence "genome" g = (t,d), thus consists of two sequences (t i ,d i ) ∈ {1,…,n stim } × {1,…,n stim } with i ∈ {1,…,n trials } each of which codes for one of the n stim possible stimuli 1 . Thus, the genome g is a sequence of numbers indicating which stimulus is presented as target and distractor in each trial and S is a population undergoes a mutation (i.e., one bit of the genome is flipped, see Figure 1B). There are several possible schemes for recombination (Beasley et al., 1993a), the most classical being crossover where parent genomes are split at a random location and recombined with probability p cross (see Figure 1A). The natural selection is represented by the fitness-biased selection of genomes for recombination from the parent population. This step is repeated until the children-population is again of size N. The parent population is then dropped and the whole procedure iterated with the new population as parent population, thereby generating a sequence of populations P P P 0 1 → → →  n . The iterations end when either a maximum number of generations has been reached or the mean fitness of the population saturates (i.e., the algorithm converges).
Maybe an image is helpful to illustrate these rather technical explanations: Picture a population of rabbits living on a desolate island. The island holds enough resources to support only N individual rabbits (this is the population size parameter). There are some dangers on the island (high waves, predator animals) such that evolutionary pressure (the fitness function) encourages the mating and reproduction of fitter individuals (the reproductive cycle). During reproduction, the rabbit's genomes are recombined (cross-over) and are subject to random mutations such that child rabbits may differ considerably from their parents. Because the island can hold only N rabbits, for each child rabbit, one of the adult ones must perish. No child can perish before all adults are gone and after all adult rabbits have died, the new population begins reproducing. This corresponds to one iteration of the GA and continues until a specified number of generations have lived on the island. With such a picture in mind, it is easy to attach meaning to the parameters of the optimization algorithm and to predict how a parameter change might influence the results.

applIcatIon to trIal-sequence GeneratIon
In order to apply GAs to the problem of trial-sequence generation, a coding scheme must be devised allowing to encode trial sequences in a way that the genetic operators cross-over and mutation can be applied such that they yield valid solutions. Furthermore, a fitness function needs to be specified that encodes the experimental requirements and that makes the sometimes implicitly given requirements and assumptions explicit.

Scope of the approach
We formalize the experimental setup as follows: There are two different "types" of stimuli (i.e., target and distractor) and a number of stimulus "identities" (e.g., specific words or pictures) which are (ii) priming conditions appear a desired number of times [e.g., each condition was to appear 80 times in Schrobsdorff et al. (2007) study], (iii) only three consecutive trials of the same condition are allowed, (iv) all objects appear as target an equal number of times (e.g., there have to be as many target "benches" as target "books"), (v) all objects appear as distractor an equal number of times (e.g., there have to be as many distractor "benches" as distractor "books").
The choice of these criteria reflects an attempt to find the minimal number of criteria such that the software produces sequences that do not show any obvious structure. Additional criteria (e.g., the distribution of objects per priming condition) are usually fulfilled when a large number of trials is generated. The implementation of more or alternative criteria is straight-forward, but requires programming skills because the source-code of the application needs to be adapted.
Only desired priming conditions. This criterion is implemented as the number of trials of desired experimental conditions divided by the number of all trials, Desired distribution of priming conditions. The user can specify a desired distribution r i giving the relative number of times each of the priming conditions i ∈ Θ: = {control, DT, TT, TD, DD, DDTT, DTTD} should occur in the generated sequence. To calculate this criterion, we have to compare the empirical distribution e i = n i /n trials (where n i is the number of trials of priming condition i, i.e., e i is the relative frequency of trials of priming condition i) against the desired distribution r i . The L 1 distance is a suitable measure, since the maximum of d L 1 is 2 (if Σ i e i = 1 and Σ i r i = 1) and the criteria can therefore be calculated to lie within the range [0,1] such that a weighted sum of the criteria can be interpreted. The criterion can then be expressed as the set of all possible trial sequences. In our reference study, a typical genome looks like that in Table 2. We define the genetic operators cross-over C : S × S → S (defining sexual reproduction by combining two genomes) and mutation M : S → S (defining a mutational change of the genome) such that they must produce valid solutions. The cross-over operator randomly determines a trial at which to split the genome and concatenates the parent genomes such that the child is identical to the "father" up to the split point and identical to the "mother" thereafter. The mutation operator simply replaces each stimulus (either in t or in d) independently with a random element from {1,…,n stim } with probability p mut (i.e., the probability that there is at least one mutation in a genome is 2n trials p mut and is typically very small).

Fitness function
We chose a fitness function F:S → [0,1] for trial-sequence generation that evaluates a solution in terms of a weighted sum of a number of n crit separate criteria C i :S → [0,1] that are rescaled according to a power-law with exponent k The weights ω i ∈ [0,1] are restricted to sum to 1 and can be adjusted by the user to emphasize the importance of some of the criteria. The scaling exponent k can be chosen to give more emphasis to low scores when the overall convergence behavior is not satisfactory (i.e., the algorithm converges at low fitness values, e.g., 0.8).
The following n crit = 5 criteria were implemented in our software: (i) only desired priming conditions are realized (e.g., in the reference study, only control, DT, TD, DTTD, and DDTT trials are presented),  Table 1 for the abbreviations). Mutation randomly changes one of the labels (e.g., "bed" in trial 1 to "bench") and cross-over concatenates two partial tables split at a randomly chosen trial.

Tutorial -suggested workflow
Getting the best result out of the optimization can be tricky when the conditions posed by the experimenter are hard to fulfill. That means, depending on the actual requirements, the optimization problem might be very difficult or the optimal solution might not be good enough because there is no trial-sequence that can come close to satisfying the constraints. Our experience in working with the program resulted in a typical workflow: First, the number of stimuli and the choice of experimental conditions are set along with the length of the trial sequence and the desired distribution of the stimulus-repetition conditions. Then, a first run with the default parameters is executed to check the general level of the solution's fitness. If the results are unsatisfactory because the sequences do not fulfill the constraints, the GA settings should be manipulated first (i.e., larger population size, more populations) and the resulting increase in performance be evaluated. If the algorithm is observed to converge (when the curve approaches an asymptote) and the performance is still suboptimal, it is necessary to increase the number of trials and/or the number of objects in the sequence since a good solution does not seem to exist. Finally, once a good sequence has been generated, it can be manually tweaked to further increase the fitness. In the following, we give a detailed account of this procedure.
(i) Setting experimental constraints The first step is obviously to specify the requirements of the planned experiment. The parameters are collected in the "Setup"-tab of the program: Number of stimuli in the experiment, number of trials and the stimulus-repetition conditions along with the desired frequency of the conditions. Depending on the choice of these parameters, the difficulty of the optimization problem is going to vary: It is for example comparatively easy to realize an overrepresentation of control trials because they leave the algorithm the freedom to choose four different stimuli without constraints. In contrast, if many full repetition (DDTT) or reversed repetition (DTTD) trials are desired the constraints may be severe and, in fact, not possible to fulfill completely. (ii) Running the optimization It is suggested that an initial run using the default parameter settings is performed. The default parameters have been carefully selected to be successful for a number of requirements. The main parameter is the "number of generations": It directly determines how many iterations are run and should be the first to be increased. In the next steps the results of the algorithm need to be validated and adjusted to yield optimal results. (iii) Checking algorithm convergence The "Convergence"-tab presents a plot of the population scores as a function of population index similar to Figure 4. The user should verify that the score has settled at a high level and does not continue to grow. If the scores have not yet converged, the number of generations should be increased and the algorithm rerun. The default number of generations was set to an intermediate value (5000 generations) in order to provide a fast initial result. The program's runtime will increase proportionally to the Only three consecutive trials of same condition. All trials that lie in a consecutive sequence of more than three trials of the same condition are counted as n consecutive . The criterion evaluates to C n n 3 1 ( ) . g = − consecutive trials (4) Uniform distribution of stimuli over target/distractor. These criteria are evaluated similar to the second one by calculating the L 1 distance between a uniform distribution and the empirical distribution. The empirical distribution is the relative frequency each stimulus appears as target/distractor, thus  (6) where #S is the number of elements (cardinality) of set S. After the separate criteria C 1 to C 5 are calculated, they are combined into an overall fitness-score F according to Eq. 1.

software IMpleMentatIon
The program is available as source-code and pre-compiled binaries for a variety of platforms (Microsoft Windows, Mac OS X, Linux 32/64 bit) from the project webpage (Ihrke, 2011) and is released under the GNU General Public License 2 (Free Software Foundation, 1991). The software comes with a graphical user interface (see Figure 3) allowing to vary all important parameters. Documentation and installation instructions are available from the project webpage and from within the application ("Help"-tab).
There are two separate tabs that are designated to set the experimental constraints and the optimization parameters for the GA, respectively. After providing the experimental constraints, the user can hit the "Go" button and the GA will begin the optimization. After the GA has converged, the best sequence found during the optimization is presented in the "Result"-tab along with descriptive statistics. This sequence can be saved as a comma-separated list and, e.g., imported into a spreadsheet program (import via commaseparated values, CSV) or read by the experimental software. On the project's webpage (Ihrke, 2011), we provide example code for several commonly used presentation environments that read our file-format, thus making the generated sequence accessible to the presentation routine. Furthermore, a plot of the algorithm's convergence is provided, showing the fitness-score of the worst, average, and best individual over successive populations. This allows the user to directly assess how well the algorithm converged and to judge the quality of the final sequence. Also, the generated sequence can be modified in place and re-evaluated. Finally, a configuration can be saved to file and re-loaded into the application.
(v) In case of good convergence but poor results: Increase degrees of freedom Occasionally, if experimental parameters are unfortunately selected, the GA may converge at a suboptimal level. In this case, increasing the number of iterations will not solve the problem as the algorithm is stuck in a local maximum. There are a couple of parameters that can be tweaked to provide the algorithm with more flexibility. However, the described steps may also result in a much larger number of necessary iterations and can even prevent the GA from converging at all. Different steps should be taken, depending on the distribution of the criterion scores: • One score low, all others high Sometimes, a good global score can be achieved by choosing a "loser" criterion score that will take on low values while all others can achieve better ones. For example, if a large number of full repetitions (DDTT) are desired, the algorithm could number of generations. If the initial run was fast enough, the number of generations can safely be doubled or tripled (resulting in a run twice or three times as long) to see whether additional value can be gained by longer convergence. Note, that the number of iterations required to solve the problem is dependant on the number of trials in the genome: The longer the trial-sequence, the more iterations have to be run to find a good solution (since the searchspace is much larger). (iv) Check best genome The "Result"-tab provides an editable table showing the best sequence encountered during optimization along with a graphical summary of the properties of the sequence: There are histograms showing the distribution of the stimuli as target and as distractor and a histogram for the distribution of experimental conditions. The desired distribution is depicted in dark gray for easy orientation (Figure 3). Finally, the partial scores defined in Section 2.2.3 are listed, giving an indication of the quality of the sequence.

results
The performance of the GA depends on the mix of experimental conditions chosen by the user and on the number of stimuli and trials (as well as the algorithm's parameters). Typically, when the maximal fitness is unsatisfactory (i.e., the criteria are not fully satisfied), an increase of the number of stimuli or trials will provide enough degrees of freedom for the algorithm to find a more satisfactory solution. For standard settings realized in many studies, the overall score of the best sequence is between 90 and 100% (see Section 3.1). An important aspect when using an optimization strategy such as GAs is the computational complexity and hence the time required to solve the optimization problem. The runtime of the software depends critically on the number of iterations, the population size, and the number of trials that are to be generated (length of the genome) 3 . It is also possible that unfeasible parameter settings will yield impractical running times. In our applications, the software generally finished calculation in a reasonable amount of time on standard desktop-computers and laptops (10 s up to several minutes for large number of iterations). If impractical running times are encountered, the user is encouraged to use the program's plot-panel to evaluate how well the algorithm converges for a low number of generations and increase that number slowly until a good trade-off between runtime and quality of the result is found.

exaMples
The examples presented here are available as settings-files that can be loaded directly into the software to reproduce the discussed results.

Example 1: standard setup
As a first example, we generate a trial sequence for a priming experiment implementing only negative (DT) and positive priming (TT) conditions in addition to control conditions. In order to control for a bias caused by an overrepresentation of trials including any repeating stimulus, we generate 50% control and 25% of each of the priming conditions. We use 10 different stimuli and realize 400 trials. Without changing any of the choose to disregard the "only three consecutive same-condition-trials" criterion and produce a large number of consecutive full repetitions. As a remedy, a larger value of k (the power-law-scaling exponent) can be used. This will increase the penalty given to low values of the partial scores. Increasing the weight for the loser-criterion might help as well but can also lead to a different loser being chosen by the algorithm. • All scores low When the algorithm converges at a low level for all partial scores, several remedies can be tried: At first the simple GA should be examined (instead of the default steady-state GA). This algorithm is more flexible and convergence will be slower. If the convergence is still good but at a low value, the "elitism" checkbox can be unchecked increasing variability even more. Additionally, larger populations or larger probabilities of mutation and cross-over can help in this situation (also in combination with a steady-state algorithm). The algorithm should be rerun several times to check whether the convergence is reproducible. (vi) Manual tuning Finally, the software provides facilities to change and re-evaluate the sequence according to the discussed criteria. The first few best sequences encountered during the run are put into the table in the "Result"-tab (they can be cycled through using the arrow-buttons). It is possible to manipulate items in the sequence and generate a new score on-the-fly which is very convenient for manipulating a close-to-optimal sequence. Such fine-tuning may be desired, when the algorithm included unwanted experimental conditions (which may happen because of the trade-off over the criteria in Eq. 1).
Note that it might in general be more practical to generate shorter segments of trial sequences (e.g., one for each experimental block). This is because (i) their generation is easier to control and (ii) they can be combined in randomized order in the experiment without taking care of the trials on the boundary between two blocks. In addition, the final full sequence will satisfy the constraints also locally. However, it may be more difficult to fulfill the distribution constraints for short sequences.  The runtime depends linearly on the three parameters: Doubling the number of generations/trials or the population size will approximately double the runtime as well. and generated distribution of priming conditions are easily remedied by hand (in this case, by converting two DD and one TD trials into DDTT trials).

Example 3: Ihrke et al. (2011)
In a recent study, we conducted an NP experiment in which we implemented four different priming conditions, DT, TT, TD, and DD ) that were to occur an equal number of times. There were 840 trials and 5 different stimuli. The results after 10000 iterations are optimal (F = 1, Figure 6).

coMparIson wIth on-lIne randoMIzatIon
While it is more difficult to perform an on-line randomization for priming experiments than it is for paradigms without prime-probe relations, it is still possible when explicitly accounting for the intertrial dependencies. To formulate a subroutine that can return the next stimuli given the preceding ones, it is convenient to use the terminology of Markov-chains. In these stochastic processes, the probability to be in a state at a given point in time depends only on the previous state. When associating the states with the presented stimuli, it is possible to create an algorithm for generating the next display which will converge to an optimal distribution of conditions and target/distractor stimuli in the limit of large number of trials. The details of this Markov-chain are described in the Appendix.
parameters except the number of generations (the "runtime"), the algorithm returns almost perfect results (all scores close to 1, see Figure 4): The stimulus objects are uniformly distributed over target/distractor and the conditions appear in appropriate relations.

Example 2: Schrobsdorff et al. (2007)
The reference study presented above and illustrated in Figure 2 (Schrobsdorff et al., 2007) implemented both partial and full repetitions (DT, TT, DDTT, and DTTD) besides the control condition. Each condition was presented 80 times such that there was a total number of 400 trials (excluding a practicing phase). Five different stimuli were used.
The requirements for this study are rather hard to fulfill, because the full repetitions (DTTD and DDTT) pose restrictions on the stimulus-sequence. A run with the default parameters running for 10000 generations did not succeed in producing the desired distribution of priming conditions (control conditions were overrepresented and DDTT conditions were underrepresented). By giving more weight to criterion (ii) however (ω 2 = 0.32, ω 1 = ω 3 = ω 4 = ω 5 = 0.17), the algorithm was successfully guided toward the near-optimal solution shown in Figure 5. Target and distractor stimuli are perfectly uniformly distributed and the few remaining discrepancies between required  similarity to biological evolution which lets the meaning of the parameters become transparent. A tool for manual tweaking of the generated sequences is provided as well. The software is suitable for priming tasks that vary from trial to trial on two independent dimensions. The program is hosted as an open-source project and is expected to evolve with the needs of experimentalists into a more general tool for trial-sequence generation. The software was designed to be extensible such that new requirements can be integrated in future versions with minimal programming effort.

future dIrectIons
We seek to extend the functionality of our program in several ways. Currently, if any experimental variation in addition to the priming conditions is desired, it must be added "by hand" to the generated trialsequence using, e.g., a spreadsheet software. For example, recent studies have focused on response-repetitions in addition to the stimulus-repetitions covered by our software (e.g., Rothermund et al., 2005;Mayr et al., 2011). Randomization of the response-sequence in addition to the trial-sequence therefore includes additional inter-trial dependencies which must be uniformly distributed as well. We opt for the inclusion of such additional restrictions in future versions of our software.
Another important step is to increase the number of paradigms to which the program can be applied. Consider, for example, n-back tasks (e.g., Schmiedek, 2009): to support this task it is necessary to model inter-trial dependencies between trial i and trial i − n instead of only prime-probe dependencies. Our goal is to gradually approach a generality that will allow to model any such experimental requirements. This requires, however, a much more flexible formalism allowing the individual researcher to adapt the program to his needs. Currently, to adapt the software to different paradigms, programming skills are required: The fitness function as well as the stimulusdependencies must be explicitly specified in the source-code. Later versions of our software are to include the possibility to specify any inter-trial dependencies on a variable number of dimensions.
Finally, we want to implement a measure for the randomness of a trial-sequence in order to avoid predictability in the recorded sequences. Potential candidates for such a measure are the Approximate Entropy (Pincus, 1991;Pincus and Kalman, 1997) or a derivate (e.g., Sample Entropy, Richman and Moorman, 2000) or the calculation of an entropy-measure for a minimal finite-state automaton fitted to the trial sequence (Cleeremans et al., 1989). However, there are several problems defining "predictability" by human subjects: First and foremost it is unclear what kind of sequential structure can be learned and used by human subjects. Studies on implicit sequence learning show that rather complex and stochastic structure can be exploited. Furthermore, due to the multivariate character of most trial sequences, it is not obvious whether only the sequential structure of each dimension or also cross-dependencies can be predicted. These considerations must be taken into account when designing a "measure of predictability" for trial sequences.

acknowledGMents
The first author acknowledges financial support by the Göttingen Graduate School for Neurosciences and Molecular Biosciences (GGNB). This work was supported by BMBF grant numbers 01GQ1005B and 01GQ0432. Discussions with Hecke Schrobsdorff, J. Michael Herrmann, Floran Schmiedek, and Marcus Hasselhorn are gratefully acknowledged.
In Table 3, we present a comparison of stimulus-sequences generated using the GA and the on-line randomization for the three examples from the preceding section. The GA performs better in all cases. Because of the stochasticity of the on-line randomization, it can be expected that the overall quality of the method converges to optimal values with growing number of trials. In Figure 7, we present the score of the generated sequences using both the GA and the on-line randomization as a function of length (number of trials). While the GA finds an optimal solution independent of the number of trials in the sequence, the on-line solution approaches good results only for a large number of trials. The remaining gap between the two curves is a result of criterion C 3 in the calculation of the fitness-scores: Because the number of consecutive trials of the same experimental condition is punished in the GA but not in the on-line algorithm, the fitness of the on-line solution will always be lower than the GA-solution.

dIscussIon
Priming paradigms are widely applied in behavioral research and properly randomized trial sequences are an important basis for any successful priming experiment. Because the cross-trial dependency makes the use of on-line randomization more difficult and can even pose problems in manually designing randomized trial sequences, we developed a software that automatically generates suitable trial sequences based on a GA optimization strategy. GAs are argued to be particularly intuitive for behavioral researchers due to their  with s i ∈ Ω are determined by the required distribution of priming conditions π i , i ∈ Θ with Σ i π i = 1. As before, this distribution is specified by the experimenter. Depending on which priming condition is chosen, the stimuli in the next display are restricted: In full repetition or full reversal trials, the stimuli will be the same as in the current trial (so there is only one possible transition). For all conditions with a single repetition, the number of possible stimuli for the probe is n stim − 2, because one stimuli is fixed (the repeated one) and the other one can be any stimulus except the current target or distractor. In order to achieve a uniform distribution over the stimuli, the probability for the priming conditions in the probe is equally divided among all possible stimuli that realize this condition, such that A ij = p k = π k /N k for all i,j that result in priming condition k ∈ Θ. The number of possible transitions N k per priming condition k is given in Table A1. For example, if the relative frequency of DT conditions in the trial sequence is to be π DT = 0.2, then A ij = 0.2/(n stim − 2) for all i,j that will result in a DT condition. This will ensure that stimuli will be uniformly distributed in the limit of large number of trials. The process is illustrated in Figure A1.
A script written in the R environment for statistical computation (R Development Core Team, 2010) that implements this idea is available from the website (Ihrke, 2011).

appendIx on-lIne randoMIzatIon for prIMInG experIMents
Since in priming experiments the dependency is only between prime and probe, it is possible to use an on-line randomization approach based on Markov-chains (for an introduction to Markov-chains, see Papoulis and Pillai, 2002). A Markov-chain is a stochastic process X t that is characterized by the Markov-property: The probability distribution for trial t depends only on the previous outcome, P X s X s X s P X s X s The chain is currently in state (1,2) shown in blue (target 1 and distractor 2) and moves to a new stimulus combination with the transition probabilities A ij as defined in the text.