Precise Quantification of Behavioral Individuality From 80 Million Decisions Across 183,000 Flies

Individual animals behave differently from each other. This variability is a component of personality and arises even when genetics and environment are held constant. Discovering the biological mechanisms underlying behavioral variability depends on efficiently measuring individual behavioral bias, a requirement that is facilitated by automated, high-throughput experiments. We compiled a large data set of individual locomotor behavior measures, acquired from over 183,000 fruit flies walking in Y-shaped mazes. With this data set we first conducted a “computational ethology natural history” study to quantify the distribution of individual behavioral biases with unprecedented precision and examine correlations between behavioral measures with high power. We discovered a slight, but highly significant, left-bias in spontaneous locomotor decision-making. We then used the data to evaluate standing hypotheses about biological mechanisms affecting behavioral variability, specifically: the neuromodulator serotonin and its precursor transporter, heterogametic sex, and temperature. We found a variety of significant effects associated with each of these mechanisms that were behavior-dependent. This indicates that the relationship between biological mechanisms and behavioral variability may be highly context dependent. Going forward, automation of behavioral experiments will likely be essential in teasing out the complex causality of individuality.


INTRODUCTION
Individual animals exhibit idiosyncratic behavior, even when their genetics and rearing environment are held constant. This variability is termed intragenotypic variability (Stamps et al., 2013) and likely arises in part due to stochastic effects during development (Vogt, 2015; Honegger and de Bivort, 2018), which, in a quantitative genetic framework, are classified as microenvironmental plasticity (Morgante et al., 2015). Intragenotypic variability in animal behavior is likely a major component of animal personality, an ecologically and evolutionarily important dimension of variation (Freund et al., 2013; Bierbach et al., 2017). A single genotype giving rise to a broad distribution of random phenotypes may constitute an adaptive evolutionary strategy, termed "bet-hedging," to increase the probability that for any fluctuation in the environment, some individuals will be fit, increasing the odds that a lineage never goes extinct (Hopper, 1999). While bet-hedging has strong theoretical foundations, in the context of animal behavior it has limited evidence [but see Kain et al. (2015) and Akhund-Zade et al. (2020)]. A challenge in studying bet-hedging is that behavioral variability is difficult to measure; larger sample sizes are needed to precisely estimate the variance of a trait, compared to the mean. This is largely because the former requires sampling phenotypes in the tail of a distribution, which are rare by definition.
Increasing behavioral assay throughput via automation is an effective way to attain the sample sizes needed to study variability. This can be achieved through miniaturization and parallelization of imaging platforms in a lab context (Kain et al., 2012; Churgin et al., 2017; Pantoja et al., 2017; Stern et al., 2017; Barlow et al., 2021). While the up-scaling of experiments is easiest with small, lab-adapted animals, such approaches do work with species beyond the common genetic models (Crall et al., 2016; Bierbach et al., 2017; Ulrich et al., 2018). Gains in data throughput can be achieved with the help of robots that automate animal handling (Alisch et al., 2018), move cameras between arenas (Alisch et al., 2018; Crall et al., 2018) or track a single animal over long periods of time (Johnson et al., 2020). Automation of analysis is also essential, and innovations in animal centroid tracking (Panadeiro et al., 2021), body-part tracking using neural networks (Hausmann et al., 2021) and behavioral classification (Kabra et al., 2013; Berman et al., 2014; Todd et al., 2017) constitute a rich tool set for rapidly extracting behavioral measures from digital data sets.
High-throughput, automated behavioral assays have been used to investigate the variability of Drosophila behavior (Mollá-Albaladejo and Sánchez-Alcañiz, 2021; Mueller et al., 2021; Steymans et al., 2021; Werkhoven et al., 2021). The species' deep genetic toolkit facilitates the study of proximate mechanisms controlling variability such as neurotransmitters (Kain et al., 2012; Honegger et al., 2020), neural circuits (Skutt-Kakaria et al., 2019; Honegger et al., 2020; Linneweber et al., 2020), genes (Kain et al., 2012; Ayroles et al., 2015; Wu et al., 2018), environmental variation (Akhund-Zade et al., 2019), and social effects (Alisch et al., 2018; Versace et al., 2020). Of these studies, the three that have assayed the greatest number of individuals (Ayroles et al., 2015; Buchanan et al., 2015; Skutt-Kakaria et al., 2019) all employed a common behavioral assay: spontaneous locomotion in Y-shaped mazes. As flies walk freely in these arenas, they make a left-vs-right choice every time they cross through the center of the maze. Individual flies make hundreds of such choices per hour. This yields a large data set per individual, which, in combination with a high throughput of individuals, makes this assay particularly amenable to the study of variability. Beyond the number of left-right choices made and their average handedness, the Y-maze assay also produces behavioral measures related to the higher-order organization of turn sequences and their timing (Ayroles et al., 2015).
Individual left-vs-right turning bias is correlated with counterclockwise-vs-clockwise bias in open arenas, indicating that the behavioral measures in this assay are not entirely geometry-dependent. Humans may exhibit a comparable form of locomotor bias in the curvature of their trajectories when trying to walk straight without visual feedback (Souman et al., 2009). The left-right symmetry of this assay evokes the phenomenon of fluctuating asymmetry, in which individual variation in the extent of morphological asymmetry is used as a measure of developmental stability (Van Valen, 1962; Debat et al., 2011). Indeed, both left-vs-right turn bias in Y-mazes and morphological traits examined for fluctuating asymmetry tend to have average values (typically close to left-right symmetry) that are robust across genotypes and selection (Pélabon et al., 2006; Ayroles et al., 2015).
Here, we took advantage of the high precision and throughput of the Y-maze assay to characterize the distribution of individual behaviors and their variability along different experimental axes. We collected nearly all the data from Y-maze experiments conducted by lab members since this assay was devised in 2010. In descriptive analyses, we characterized the distribution of individual Y-maze behavioral measures, and their correlations, with unprecedented precision. In hypothesis-driven analyses, we examined the effects on variability of manipulations of serotonergic signaling, the gene white [previously shown to affect phototactic variability; Kain et al. (2012)], sex, and temperature. On the whole, these analyses reinforce the finding that genotype and the choice of behavioral measure itself have consistently large effects on measures of variability (Akhund-Zade et al., 2019), though some environmental manipulations can have large effects in a behavior-dependent fashion.

RESULTS
We collected experimental records from hundreds of experiments examining the Y-maze behavior of 183,496 individual flies (Figure 1). In total, these flies made 79.8 M left-right choices. Four behavioral measures were recorded for each fly (Ayroles et al., 2015): turn bias (fraction of turns to the right), number of turns, turn switchiness, and turn clumpiness. Turn switchiness is a measure of the degree to which flies alternate between left and right turns, normalized by their turn bias. A fly making exactly as many left (right) followed by right (left) turns as expected in a binomial model has a switchiness value of 1. Lower switchiness indicates fewer LR/RL turn sequences, and, conversely, longer streaks of L or R turns. The fourth measure, turn clumpiness, captures the non-uniformity of turn timing, i.e., the extent to which flies made choices in bursts. We changed the formula for clumpiness midway through the data collection period [compare Buchanan et al. (2015) and Werkhoven et al. (2021)], making this measure hard to compare across experiments; therefore we excluded it from further analysis. In addition to behavioral data, the record for each fly also included metadata about the experimental circumstances, including (Table 1): the fly's genotype, experimental conditions, temperature during behavior, age of the fly, the experimenter who recorded the behavioral data, the ID# of the array of arenas ("tray") in which it behaved, the ID# of the imaging box in which it behaved, the date, the number of arenas in its tray, the software used to record its behavior, the software used to produce its behavior measures, and its sex. The proportions of all flies for five of these metadata categories are shown in Figure 1B.
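As a rough illustration of the turn bias and switchiness definitions, the sketch below computes both from one fly's sequence of turns. The normalization used here (observed LR/RL transitions divided by the 2·b·(1−b)·(n−1) transitions expected if turns were independent draws with right-turn probability b) is an assumption chosen to match the verbal description; it is not necessarily the exact formula in the analysis code, and the function names are hypothetical.

```python
def turn_bias(turns):
    """Fraction of turns to the right; `turns` is a string of 'L'/'R'."""
    return turns.count("R") / len(turns)


def switchiness(turns):
    """Observed L->R and R->L transitions relative to a binomial expectation.

    Assumed normalization (not taken from the paper's code):
    expected switches = 2 * b * (1 - b) * (n - 1), with b the turn bias.
    """
    b = turn_bias(turns)
    n_pairs = len(turns) - 1
    observed = sum(a != c for a, c in zip(turns, turns[1:]))
    expected = 2 * b * (1 - b) * n_pairs
    if expected == 0:
        return float("nan")  # undefined for a fly that only ever turns one way
    return observed / expected


# Two hypothetical flies with identical turn bias but different sequence structure.
alternator = "LR" * 50          # strict alternation
streaky = "L" * 50 + "R" * 50   # two long streaks
print(turn_bias(alternator), switchiness(alternator))
print(turn_bias(streaky), switchiness(streaky))
```

Both hypothetical flies have a turn bias of 0.5, so the difference in switchiness reflects only how their turns are ordered: strict alternation scores above 1 and long streaks score well below 1.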
The size of our data set allows some of the most precise estimation of behavioral distributions across individuals to date. We computed kernel density estimates of the distributions of turn bias, number of turns and turn switchiness (Figures 2A,D,G). The distributions of all measures are essentially unimodal, with the distribution of handedness appearing roughly Gaussian (Figure 2A). However, it deviates from a Gaussian in a number of ways: it is denser at its mode and in the tails corresponding to strong turning biases around 0.1 and 0.9. This is reflected in a kurtosis greater than three (Figure 2B; see below). The empirical distribution of handedness is technically trimodal, with small peaks corresponding to flies with biases very close to 0 and 1. Most flies in these peaks performed fewer than 50 turns, indicating that these peaks may be the consequence of undersampling within these individuals.
To assess the precision of measures quantifying these distributions we looked at the distribution of estimates (under bootstrapping) of the mean, standard deviation, skewness and kurtosis of the behavioral distributions (Figures 2B,E,H). These were generally quite narrow, indicating precise estimation, and generally broader for the higher-order statistics. This was expected, as the higher-order statistics raise deviations to higher powers, which renders them more sensitive to sampling error. However, their precision did not always decrease monotonically with the order of the statistic (Figure 2H). To extend this analysis, we computed the standardized moments of each distribution, up to the 20th moment, for each behavioral measure (Figures 2C,F,I). To our surprise, the data provided robust estimates even of the 20th moment of turn bias and turn switchiness. This was true even in 10-fold subsamples of the turn switchiness data, but was not the case for number of turns (Figure 2F) or odd moments of the turn bias data (Figure 2C). This indicates that the reliability of estimates of high-order distribution statistics depends on the underlying distribution, not just the sample size.
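The bootstrap procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis code: the Gaussian sample stands in for real turn biases, the function names are hypothetical, and the number of resamples is kept small so the example runs quickly.

```python
import math
import random
import statistics


def standardized_moment(xs, k):
    """k-th standardized moment: mean(((x - mean) / sd) ** k)."""
    mu = statistics.fmean(xs)
    sd = math.sqrt(statistics.fmean((x - mu) ** 2 for x in xs))
    return statistics.fmean(((x - mu) / sd) ** k for x in xs)


def bootstrap_moment(xs, k, n_boot=200, seed=0):
    """Estimates of the k-th standardized moment from resamples with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    return [standardized_moment(rng.choices(xs, k=n), k) for _ in range(n_boot)]


# Synthetic stand-in for individual turn biases (NOT the real data).
rng = random.Random(1)
biases = [rng.gauss(0.5, 0.1) for _ in range(2000)]

for k in (3, 4, 8):
    estimates = bootstrap_moment(biases, k, n_boot=100)
    print(k, statistics.fmean(estimates), statistics.stdev(estimates))
```

The spread of the bootstrap estimates for each k plays the role of the estimate distributions in Figure 2; because deviations are raised to the k-th power, a handful of extreme individuals can dominate the higher moments.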
In our studies of turn bias in Y-mazes (Ayroles et al., 2015; Buchanan et al., 2015; Akhund-Zade et al., 2019; Werkhoven et al., 2021), we operated under the assumption that the mean turn bias was 0.5 in all genotypes. For example, this assumption was the basis of a decision to not model the interaction of genetic variation for the mean and variability of turn bias in Ayroles et al. (2015). On close examination of this measure in our new data set, we found evidence that the mean turn bias may not be 0.5 (Figure 3). The mean of turn bias in the grand data set was 0.496 (Figure 3A), indicating a slight left bias to Y-maze turn choices. This slight left bias was also present in the distributions of genotype, sex and experimenter mean turn biases (Figures 3B-D), suggesting that the apparent left bias in the grand mean is not likely attributable to imbalance among the metadata covariates. Indeed, a linear model with 11 metadata variables as predictors (all but date, which renders the model rank deficient) and 636 coefficients has a turn bias intercept of 0.485 (SE 0.0099). The apparent effect of experimenter (Figure 3D) was not strongly supported by the above model (lowest p-value = 0.04 across 10 experimenters), nor by a model with only genotype and experimental condition as the other predictors (lowest p-value = 0.11). In contrast, 47/569 genotypes have significant effects (p < 0.05) in a linear model where genotype is the sole predictor of turn bias (Figure 3E). This is a significant enrichment over the roughly 28 genotypes expected by chance at this threshold, and supports the conclusion that the average turn bias is under biological control.
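One way to gauge the enrichment of significant genotypes is an exact binomial tail probability, sketched below. The text does not state which test was used, so this is only an illustrative check; it assumes the 569 genotype tests are independent and each has a 5% false-positive rate, and the function name is hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_genotypes = 569    # genotypes in the model (from the text)
n_significant = 47   # genotypes with p < 0.05 (from the text)
alpha = 0.05         # per-genotype significance threshold

expected = n_genotypes * alpha
p_enrich = binom_upper_tail(n_significant, n_genotypes, alpha)
print(f"expected by chance: {expected:.1f}, observed: {n_significant}")
print(f"P(X >= {n_significant}) = {p_enrich:.2g}")
```

Under these assumptions the observed count sits well above the chance expectation; correlated tests or a shared reference level in the linear model would change the exact probability.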
Table 1 (partial).
expCond: String indicating the experimental conditions. Listed conditions:
  Flies reared on agar media supplemented with 10 mM 5-HTP
  Flies reared on agar media supplemented with 25 mM 5-HTP
  Flies reared on agar media supplemented with 50 mM 5-HTP
  Flies reared on cornmeal-dextrose media supplemented with 10 mM 5-HTP
  Flies reared on potato media supplemented with 10 mM 5-HTP
  Flies reared on potato media supplemented with 25 mM 5-HTP
  Flies reared on potato media supplemented with 50 mM 5-HTP
  Flies reared on cornmeal-dextrose media supplemented with 10 mM aMW
  Flies reared on control agar media
  Flies reared on control agar media supplemented with 15 mg/mL ascorbic acid
  Flies reared on potato media supplemented with 10 mM aMW
  Flies reared on potato media supplemented with 20 mM aMW
  Flies reared on potato media supplemented with 25 mM aMW
  Flies reared on potato media supplemented with 50 mM aMW
  Flies reared on control potato media
  Flies reared on control potato media supplemented with 15 mg/mL ascorbic acid
  Flies subjected to heat-shock at day 10 of development (Ayroles et al., 2015)
  Flies subjected to heat-shock at day 14 of development (Ayroles et al., 2015)
  Flies subjected to heat-shock at day 1 of development (Ayroles et al., 2015)
  Flies subjected to heat-shock at day 3 of development (Ayroles et al., 2015)
  Flies subjected to heat-shock at day 4 of development (Ayroles et al., 2015)
  Flies subjected to heat-shock at day 5 of development (Ayroles et al., 2015)
  Flies subjected to heat-shock at day 6 of development (Ayroles et al., 2015)
  Flies subjected to heat-shock at day 7 of development (Ayroles et al., 2015)
  Flies subjected to heat-shock at day 8 of development (Ayroles et al., 2015)
  Flies subjected to heat-shock at day 9 of development (Ayroles et al., 2015)
  Flies reared in darkness
  Flies subjected to heat-shock post-eclosion, prior to behavioral assay
  Flies reared in incubators at 18 °C
  Flies reared in incubators at 20 °C
  Flies reared in incubators at 23 °C
  Flies reared in incubators at 25 °C
  Flies reared in incubators at 30 °C
  Flies are the progeny of single parents selected for turn biases
  Flies reared in high intensity enrichment population cage
Fly's sex. "Both" indicates that both males and females were used in this experimental group, in unspecified proportion.
eyeColor: State of the white genetic locus. See Figure 5. + indicates wild type, − null, and m mini-white alleles.

Since our behavioral data was multidimensional (turn bias, number of turns and turn switchiness were measured for each fly), we were also able to investigate the joint distributions and correlations of these measures. We first tested whether there might be a correlation between turn bias and number of turns, specifically a negative correlation arising from higher sampling error in estimating turn bias for flies making fewer turns. Counter to this prediction, we observed a slight positive correlation (r = 0.036; p = 4 × 10^−52). Incidentally, we noticed the effects of the discreteness of number of turns as a measure, and the resulting limited values that turn bias can take on, as a fractal-like (Trifonov et al., 2011) structure in the scatter plot of absolute turn bias vs. number of turns (Figure 4A). Next, we examined the joint distribution of turn switchiness and number of turns (Figure 4B). This two-dimensional distribution had two conspicuous features: an uncorrelated mode containing the vast majority of the flies, and a smaller mode exhibiting a negative linear relationship between turn switchiness and number of turns. The flies in this second mode were nearly all reared on potato flake media [which was sometimes supplemented with drugs targeting the neurotransmitter serotonin; Dierick and Greenspan (2007), Kain et al. (2012), and Krams et al. (2021)]. Of these flies, approximately 296 flies were reared on media including the serotonin inhibitor aMW, 429 were reared on the serotonin precursor 5-HTP, and 942 were reared on control media.
Notably, being reared on potato food was not a guarantee that a fly fell in this part of the distribution; the vast majority of flies in such rearing conditions fell in the predominant uncorrelated mode of the joint distribution along with flies fed on standard cornmeal-dextrose media.
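The sampling-error argument behind the predicted negative correlation between turn bias and number of turns can be made concrete with a small simulation. The sketch below is not part of the original analysis: it assumes every simulated fly has a true right-turn probability of exactly 0.5, so that any departure of its observed bias from 0.5 is pure sampling error, and the function name and parameter values are hypothetical.

```python
import random
import statistics


def simulated_abs_bias(n_turns, n_flies=2000, true_bias=0.5, seed=0):
    """Mean |observed turn bias - 0.5| across flies sharing one true bias."""
    rng = random.Random(seed)
    deviations = []
    for _ in range(n_flies):
        rights = sum(rng.random() < true_bias for _ in range(n_turns))
        deviations.append(abs(rights / n_turns - 0.5))
    return statistics.fmean(deviations)


# Flies making fewer turns appear more biased purely through sampling error.
for n in (20, 100, 500):
    print(n, round(simulated_abs_bias(n), 4))
```

In this null simulation the apparent bias magnitude shrinks as the number of turns grows, which is the negative relationship that sampling error alone would produce. A fly making n turns can also only take on n + 1 distinct turn bias values, which is the discreteness behind the fractal-like structure noted for Figure 4A.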
Finally, we used the Y-maze data set to revisit several previously examined hypotheses about the proximate mechanisms regulating behavioral variability. We first asked whether the distribution of turn bias variability across genotypes seen in Ayroles et al. (2015) was consistent with that of the other genotypes present in our data set. The lines examined in that paper come from the Drosophila Genome Reference Panel [DGRP; Mackay et al. (2012)], a collection of inbred lines established from the natural population of flies in Raleigh, NC, USA. The remaining 339 genotypes in our data set come from a variety of sources, mostly lab stocks, and include 165 lines expressing the transgenic driver Gal4 (Brand and Perrimon, 1993) in neural circuit elements (Jennett et al., 2012). Thus, these genotypes do not represent a sample from a natural population. The distribution of their genotype-wise variability in turn bias was largely similar to that observed in DGRP lines (Figure 5A), with genotypes exhibiting coefficients of variation in handedness ranging from less than 0.2 to more than 0.4.
Neuromodulation may have a special role in the control of behavioral variability (Maloney, 2021), e.g., phototaxis (Kain et al., 2012; Krams et al., 2021) and olfactory preference (Honegger et al., 2020). We conducted experiments to see if serotonin modulation controls variability of locomotor behaviors in the Y-maze. Specifically, we measured the variability of turn bias, number of turns and turn switchiness in DGRP lines which were treated with aMW (a serotonin synthesis inhibitor), 5-HTP (a biosynthetic precursor of serotonin) (Dierick and Greenspan, 2007) or their respective control media. These treatments generally had small effects on behavioral variability across genotypes (ranging from a 10% decrease to a 7% increase), with the exception of the effect of 5-HTP on variability in the number of turns, which, in two versions of the experiment, increased variability by 16 and 25% (Figure 5B). Overall, these results imply that although serotonin levels can affect the variability of turn number, there is not a strong effect that is consistent across behaviors.
We previously determined that the effect of serotonin on phototactic variability was dependent on the gene white, which encodes a transmembrane transporter that imports the serotonin precursor tryptophan into neurons. We scored the flies in our Y-maze data set for their white genotype, which could range from wild type to homozygous null, with intermediate conditions of (likely) partial rescue by the expression of the "mini-white" allele at non-endogenous transgenic insertion sites (Klemenz et al., 1987). Lines with homozygous null alleles at the endogenous white locus exhibited higher variability in number of turns, with the exception of lines that were also heterozygous for mini-white at a transgenic locus. The molecular function of White suggests that its disruption should produce a behavioral phenotype like serotonin synthesis inhibition, which had no effect in our pharmacological manipulations (whereas feeding flies serotonin precursor increased variability, like white disruption). Genetic disruption of white was associated with small reductions in variability in turn bias and turn switchiness (Figure 5C), consistent with the small decreases seen in the aMW pharmacological experiments (Figure 5B). Overall, we found some agreement in the effects of serotonin pharmacological experiments and white disruption, but not perfect agreement, suggestive of behavior-dependent complexity in the relationship between white, serotonin, and variability.
It has been hypothesized that individuals of the heterogametic sex will exhibit greater trait variability due to noise in gene compensation (James, 1973), though a recent meta-analysis found no significant sex-bias in the variances of 218 mouse traits (Zajitschek et al., 2020). We fit linear models to Levene-transformed turn bias, number of turns, and turn switchiness data, with genotype and sex as predictors, to test for the effect of sex on behavioral variability. Relative to females, males had variability that was 6.8% lower in turn bias (p < 0.001), 7.5% higher in number of turns (p < 0.001), and 1.8% higher in turn switchiness (n.s.).
Lastly, we examined the effect of temperature during behavioral testing, with the hypothesis that flies would exhibit higher variability at high temperature (32-33 °C) than at room temperature (22-23 °C). This would be consistent with a mechanism in which heat pushes neural circuits out of the range in which physiological buffering keeps circuits operating similarly despite latent developmental and genetic variability (Tang et al., 2012; Rinberg et al., 2013). We examined this specifically for genotypes that had paired experiments at low and high temperature, and did not express any temperature-sensitive effectors. We found that high temperature had no effect on turn bias variability, but significantly decreased number of turns variability and turn switchiness variability by 37 and 32% respectively (Figure 5D). Temperature does affect the mean number of turns, typically increasing it by making flies more active. Our analysis controlled for this by assessing mean-normalized variability (the coefficient of variation: σ/µ).
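The two variability summaries used in these analyses can be sketched as below. This is a minimal illustration with hypothetical numbers, not the analysis code. Centering the Levene transform on the group median (the Brown-Forsythe variant) is an assumption, since the text does not say whether the mean or the median was used, and the function names are hypothetical.

```python
import statistics


def coefficient_of_variation(xs):
    """Mean-normalized variability: standard deviation divided by the mean."""
    return statistics.stdev(xs) / statistics.fmean(xs)


def levene_transform(xs):
    """Absolute deviation of each value from its group's median.

    Median centering is an assumption here; a mean-centered version
    would replace statistics.median with statistics.fmean.
    """
    center = statistics.median(xs)
    return [abs(x - center) for x in xs]


# Hypothetical numbers of turns: the second group is twice as active
# but has the same spread relative to its mean.
low_activity = [80, 100, 120]
high_activity = [160, 200, 240]
print(coefficient_of_variation(low_activity), coefficient_of_variation(high_activity))
print(levene_transform(low_activity))
```

Dividing by the mean keeps a shift in overall activity, such as flies walking more at high temperature, from registering as a change in variability. The Levene-transformed values, computed within each group, are what a linear model with genotype and sex as predictors would then be fit to.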
Overall, our analyses of the effects of potential proximate mechanisms controlling variability revealed a complex picture with (often small) effects of serotonergic regulation, white genotype, sex and temperature. For all of these manipulations, the direction of effect on variability was behavior-dependent.

Figure 5 (caption, partial). (B) ... and turn switchiness (right) for DGRP genotypes in six pharmacological experimental conditions targeting serotonin. Each point is a genotype in a particular experimental condition. Lines pair genotypes across a drug medium and its associated control medium. Numbers at top indicate the effect size from control to drug treatment. Bold effect sizes are statistically significant and colored by the direction of their effect (red = lower variability; cyan = higher). *p < 0.05; **p < 0.01; ***p < 0.001. n = 157 genotypes comprising 38,316 flies. (C) Violin plot of estimation distributions for the variability of turn bias (magenta), number of turns (gold) and turn switchiness (turquoise) vs. genotype of the white gene. + indicates wild type, +mw.hs the "mini-white" allele typically used to mark a transgenic insertion, and − a null allele [typically w1118; Hazelrigg et al. (1984)]. white genotypes are ranked in estimated order of expression disruption. The site of w+mw.hs insertion varied by line; the semi-colon notation in the panel label indicates that this site might be on a different chromosome than the endogenous w locus. n = 85, 551,1,863,75,866,1,484,14,888

DISCUSSION
We gathered Y-maze data collected by lab members back to the origination of this assay 11 years ago. This large data set comprised the behavioral measures of over 180,000 individual flies that made a total of nearly 80 million left-right choices. With it, we were able to estimate the distribution of three measures of individual behavior with unprecedented precision, even out to the 20th standardized statistical moment (Figure 2).
In exploratory analyses, we noticed two surprising patterns: (1) a discrete change in the relationship between turn bias magnitude and turn switchiness in a subset of animals that had been reared on potato flake media used for pharmacological experiments, and (2) that flies appear to have a slight left bias in their Y-maze choices. Finally, we used our data set to test several hypotheses pertaining to proximate control of variability in behavior, finding significant behavior-dependent effects of drugs targeting serotonin, mutation of the white gene (which encodes a transporter that imports serotonin precursor), sex and temperature. Compared to the effects of genotype and the choice of behavior measure, the effects of these manipulations were generally small and context-dependent, underscoring the complexity of relationships between axes of biological regulation and behavioral variability.
Admittedly, a motivation for this study was the desire to explore a very large data set reflecting the work over many years of many lab colleagues. In that spirit, it is fun to think about how throughput might be expanded another order of magnitude in the coming years. One possibility is robotic fly-handling (Alisch et al., 2018), which has yet to be deployed at scale in support of a large screen. Another possibility is tracking flies using capacitive sensors (Itskov et al., 2014) instead of with cameras. This would remove the need for long optical axes that force our behavior boxes to be tall, allowing a dense, vertical packing of arenas within a minimal bench footprint.
While increasing throughput through further automation is an appealing possibility, and perhaps essential for certain classes of experiments (like experimental selection for variability, which would require testing thousands of individual flies per generation for a year or more), it is not without conceptual consequences. One of these is how to assess small effects that are extremely statistically significant due to large sample sizes. Two examples from this study are the apparent slight left turn bias (Figure 3) and the significant positive correlation between turn bias magnitude and number of turns (Figure 4A). A turn bias of 0.496 compared to an expected value of 0.5 is indeed a small discrepancy, but it might nevertheless be biologically significant given the consistent failure of artificial selection experiments to evolve directional asymmetry in a variety of fly morphological characters (Carter et al., 2009). Another aspect of working with large data sets is that sampling error is likely to be small compared to inadvertent biases in the data [Meng, 2018; see Bradley et al. (2021) for an important example]. That is, accuracy is unlikely to improve with further observations, but instead with the harder work of addressing systematic miscalibration, misunderstandings of what is being measured, or structure in the data leading to effects like Simpson's paradox. A way forward among these challenges may be to conduct experiments and analyses under a variety of biological conditions, increasing the odds that inferences generalize across contexts (Voelkl et al., 2020), an approach that would also be boosted by throughput and automation.
With caveats of big data in mind, we want to consider possible errors that might explain the apparent slight (but highly significant) left mean turn bias. All experimenters who conducted these experiments are right-handed. It is formally possible that chiral manipulation during the experimental setup imparted a slight chirality to turning in the Y-maze, though we cannot think of a convincing mechanism by which this would happen. We also cannot think of mechanisms by which small, inevitable asymmetries in our behavioral rigs would impart a consistent left bias to behaviors measured across several generations of rigs and tracking software versions. Arguments in favor of the apparent left turn bias being real are previous reports of small mean asymmetries in wing size and shape (Klingenberg et al., 1998), possible indirect effects of conspicuously asymmetrical anatomical features like the gut, or the contribution of the Asymmetric Body, a small neuropil abutting the premotor Central Complex that is consistently larger in the right hemisphere (Wolff and Rubin, 2018).
While we found that our data set allowed the precise estimation of the distribution of individual behavioral scores, we also saw that the stability of higher-order moment estimates depended strongly on the behavioral distribution in question (Figure 2). Thus, there is not necessarily a simple rule for how large a sample is needed to estimate higher order statistics of its distribution. In the joint distribution of turn bias magnitude and turn switchiness, we observed two distinct modes between these measures, and, to our surprise, found that most of the points falling in the rarer mode came from experiments where flies were reared on potato flake food (Figure 4B). These flies comprised a relatively small subset of multiple experiments, in both control and drug conditions, from many genotypes. Thus, rearing on potato media is the best explanatory variable we could find for this mode of variation. We previously observed that acutely switching flies from cornmeal-dextrose media to potato media increased their variability in odor preference (Honegger et al., 2020). Perhaps this perturbation also alters the correlation structure (Lea et al., 2019), in a subset of flies, between turn bias and turn switchiness. Since these measures may relate to the paths animals take through natural environments, a food-dependent change in turning might alter foraging statistics, perhaps adaptively.
Finally, we used this large data set to examine hypotheses about proximate mechanisms controlling variability. We found many significant effects, such as 5-HTP or disruption of the white locus increasing variability in number of turns, disruption of white decreasing variability of turn bias and turn switchiness, males exhibiting slightly lower variability in turn bias but higher variability in number of turns, and conducting experiments at high temperatures lowering variability in number of turns and turn switchiness (Figure 5). We expected temperature to increase variability per results in the crab stomatogastric ganglion (Tang et al., 2012;Rinberg et al., 2013), but our high temperature experiments did not push the flies to their critical thermal limits (Kellermann et al., 2012). Thus, perhaps even higher temperature manipulations might result in consistent increases in variability across behaviors.
Our variability results indicate a complex, behavior-dependent relationship between many biological mechanisms and behavioral variability, which likely parallels the complexity of mechanisms controlling the means of behavioral traits. Experimental automation, and the high throughput it permits, made these and other findings on behavioral individuality feasible. However, individual projects drawing on tens of thousands of flies have already identified genetic (Ayroles et al., 2015) and neural circuit regulators of variability as well as complex gene x environment x behavior interactions affecting variability (Akhund-Zade et al., 2019; Versace et al., 2020). Inferences that were uniquely possible with data from hundreds of thousands of flies include the slight left-bias in turning and precise estimation of high statistical moments of behavioral distributions. The enduring scientific value of such results remains to be seen. Regardless, further automation of behavioral assays will speed up both large and small scale projects and, more importantly, liberate researchers from mindless, repetitive behavioral assays.

Data and Analysis Code
All behavioral measures and metadata values, along with the code underlying analyses, are available at http://lab.debivort.org/precise-quantification-of-behavioral-individuality/ and https://zenodo.org/record/5784716.

Assays Over Time
Since the locomotor handedness Y-maze assay was developed, there have been several changes to the experimental protocol. While we are confident that the data collected through these iterations are comparable, these changes potentially represent confounding variables for the grand analysis presented here. The structure of each fly's assay is represented in our raw data table by several entries (see Table 1 for definitions): expTemp, trayID, boxID, arrayFormat, acquisition, and analysis. We found no significant effects of these variables on the means or variabilities of the behavioral measures analyzed in this study.

Typical Fly Handling
Unless otherwise indicated (via the expCond variable), the default culture conditions were cornmeal-dextrose media containing tegosept (Lewis, 1960) and incubation on the bench or in incubators at 21-25 °C with 12/12 h light cycles. Our source of media was Scientiis, LLC (Baltimore, MD, United States), product ID: BuzzGro, until 2013, at which point we switched to media produced by the Harvard Fly Food core facility. The recipes are nominally the same between these sources. Flies were generally anesthetized under CO2 to load them into Y-mazes, though a small portion of flies were anesthetized by ice or loaded without anesthetization. Flies were given a period of 15-30 min of acclimation to the Y-mazes after loading before data collection began.

Pharmacological Experiments
Experimental flies receiving drug treatments were reared from egg-laying in drug-supplemented media (or control media). Drug media are indicated in the expCond metadata variable (see Table 1). To supplement media, drug was added to distilled, deionized water, which was then added to dry potato flake media, or drug was added directly to agar media liquified momentarily in a microwave oven. To attain the final concentrations of aMW, the following masses were used per 60 mL media vial: 10 mM = 131 mg/60 mL; 25 mM = 327 mg/60 mL; 50 mM = 655 mg/60 mL. For 5-HTP, the following masses were used: 10 mM = 132 mg/60 mL; 25 mM = 330 mg/60 mL; 50 mM = 661 mg/60 mL. Fifteen milligrams of ascorbic acid was added to each 60 mL media vial as an anti-oxidant in 5-HTP treated groups and their controls. The two 5-HTP experiments presented in Figure 5 were conducted on potato media and cornmeal-dextrose media (#2) but are otherwise identical. To control for the average dose of experimental flies, prior to drug experiments we measured the average number of progeny to eclose following a 24 h parental egg-laying session, on cornmeal-dextrose media, for each of the DGRP lines (Akhund-Zade et al., 2020). The number of parental animals for drug experiments was adjusted proportionally, line-by-line, to target an identical number of progeny on the drug media for each line.
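The relationship between target concentration and drug mass per vial can be checked with a short calculation. The sketch below is illustrative only: the molecular weights (218.25 g/mol for aMW, taken here to be alpha-methyl-tryptophan, and 220.22 g/mol for 5-HTP) are assumptions not stated in the text, and the function name is hypothetical. Under these assumptions the calculation reproduces the aMW masses and the 25 mM and 50 mM 5-HTP masses, and gives about 132 mg for 10 mM 5-HTP.

```python
# Mass of drug per media vial for a target molar concentration.
# The molecular weights are assumptions; they are not given in the text.
MW_G_PER_MOL = {
    "aMW": 218.25,    # alpha-methyl-tryptophan (assumed identity and value)
    "5-HTP": 220.22,  # 5-hydroxytryptophan (assumed value)
}


def mg_per_vial(conc_mM, mw_g_per_mol, volume_mL=60.0):
    """Return milligrams of drug needed for volume_mL of media at conc_mM."""
    mmol = conc_mM * volume_mL / 1000.0  # (mmol/L) x L = mmol
    return mmol * mw_g_per_mol           # g/mol is numerically mg/mmol


for drug, mw in MW_G_PER_MOL.items():
    for conc in (10, 25, 50):
        print(f"{drug} {conc} mM: {mg_per_vial(conc, mw):.0f} mg / 60 mL")
```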

Behavioral Assay
Data were collected in Y-shaped mazes arrayed in trays (Alisch et al., 2018; Werkhoven et al., 2019) and imaged in enclosed behavioral boxes under diffuse white LED illumination typically provided by custom LED boards (Knema LLC, Shreveport, LA, United States). The number of Y-mazes per tray varied, as indicated by the arrayFormat variable. Individual Y-mazes had 3-fold rotational symmetry, and ended in circular "cul-de-sacs" where the fly could turn around before making a subsequent choice. Trays were fabricated from three layers of acrylic, making up the floor (clear), walls (black) and a lid-holding layer (black). The surface of the floor layer was roughened to encourage flies to walk on it, using a random orbital sander and 200-grit sandpaper until 2013 and a sand-blaster thereafter. Lids over each maze were cut from clear acrylic. All acrylic parts were cut to shape by a laser engraver. Schematics for trays and imaging boxes are available at https://github.com/de-Bivort-Lab/dblab-schematics/tree/master/Ymaze. Trays were imaged in opaque enclosures constructed from aluminum extrusion and laser-cut acrylic panels (https://github.com/de-Bivort-Lab/dblab-schematics/tree/master/Behavioral%20Box). A variety of USB digital cameras (often made by PointGrey) with resolution exceeding 1 MP were used to capture video of behaving flies for real-time tracking at 6-30 Hz. The default assay length was 2 h. Fly centroids were computed in real time using background subtraction implemented in a variety of custom software environments coded in LabView or MATLAB. The centroid tracking software used in recent experiments was MARGO.
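As an illustration of centroid estimation by background subtraction, the following minimal sketch operates on a single grayscale frame stored as nested lists. It is not the LabView, MATLAB, or MARGO implementation used to acquire the data; the function name, the threshold value, and the toy image are all assumptions chosen for clarity.

```python
def centroid_by_background_subtraction(frame, background, threshold=30):
    """Return the (row, col) centroid of pixels that differ from the
    background by more than `threshold`, or None if no pixel does.

    `frame` and `background` are equally sized 2D lists of gray values.
    The threshold of 30 gray levels is an arbitrary illustrative choice.
    """
    row_sum = col_sum = count = 0
    for r, (frame_row, bg_row) in enumerate(zip(frame, background)):
        for c, (pixel, bg_pixel) in enumerate(zip(frame_row, bg_row)):
            if abs(pixel - bg_pixel) > threshold:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None  # no foreground detected in this frame
    return row_sum / count, col_sum / count


# Toy example: a dark 2 x 2 "fly" on a uniform bright background.
background = [[200] * 5 for _ in range(5)]
frame = [row[:] for row in background]
for r, c in ((1, 2), (1, 3), (2, 2), (2, 3)):
    frame[r][c] = 40

print(centroid_by_background_subtraction(frame, background))
```

In a real-time setting the same computation would be restricted to the region of interest of each maze so that one centroid is returned per fly per frame.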

Statistics and Analysis
Analysis was conducted in MATLAB 2017b (The Mathworks, Natick, MA, United States) using custom functions. 95% confidence intervals were estimated by bootstrapping, as ± twice the standard deviation of values across bootstrap replicates. For the analysis of the effect of temperature on variability (Figure 5D), the 23 °C groups include experiments conducted at 22 °C and the 33 °C groups include experiments conducted at 32 °C. Genotypes were only included in the temperature analysis if they had data recorded at both temperatures and did not express any thermogenetic constructs. Thus, most genotypes in this analysis were controls for thermogenetic experiments or wild type lines. Significance in the serotonin pharmacological and temperature experiments was assessed by paired t-tests, and all reported p-values are nominal.
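The bootstrap interval described above (the point estimate ± twice the standard deviation of the statistic across bootstrap replicates) can be sketched as follows. This is a Python illustration under stated assumptions, not the MATLAB code used for the analysis: the function name, the number of replicates, the use of the sample standard deviation as the variability statistic, and the synthetic turn-bias values are all hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, statistic=statistics.stdev, n_boot=1000, seed=0):
    """Return (lower, estimate, upper), where the interval half-width is
    twice the standard deviation of `statistic` across bootstrap replicates.

    n_boot and seed are illustrative defaults, not values from the text.
    """
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_boot):
        # Resample individuals with replacement, keeping the sample size.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    estimate = statistic(values)
    half_width = 2 * statistics.stdev(replicates)
    return estimate - half_width, estimate, estimate + half_width


# Synthetic per-fly turn biases (fraction of right turns), for illustration.
data_rng = random.Random(1)
turn_bias = [data_rng.betavariate(5, 5) for _ in range(200)]

lower, estimate, upper = bootstrap_ci(turn_bias)
print(f"variability = {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Passing a different `statistic` (for example `statistics.mean`) yields the corresponding interval for that measure with the same resampling scheme.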

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
BB oversaw the project and conducted data analysis. All authors collected the data and edited the manuscript.