Why Do Durations in Musical Rhythms Conform to Small Integer Ratios?

One curious aspect of human timing is the organization of rhythmic patterns in small integer ratios. Behavioral and neural research has shown that adjacent time intervals in rhythms tend to be perceived and reproduced as approximate fractions of small numbers (e.g., 3/2). Recent work on iterated learning and reproduction further supports this: given a randomly timed drum pattern to reproduce, participants subconsciously transform it toward small integer ratios. The mechanisms accounting for this “attractor” phenomenon are little understood, but might be explained by combining two theoretical frameworks from psychophysics. The scalar expectancy theory describes time interval perception and reproduction in terms of Weber's law: just detectable durational differences equal a constant fraction of the reference duration. The notion of categorical perception emphasizes the tendency to perceive time intervals in categories, i.e., “short” vs. “long.” In this piece, we put forward the hypothesis that the integer-ratio bias in rhythm perception and production might arise from the interaction of the scalar property of timing with the categorical perception of time intervals, and that neurally it can plausibly be related to oscillatory activity. We support our integrative approach with mathematical derivations to formalize assumptions and provide testable predictions. We present equations to calculate durational ratios by: (i) parameterizing the relationship between durational categories, (ii) assuming a scalar timing constant, and (iii) specifying one (of K) category of ratios. Our derivations provide the basis for future computational, behavioral, and neurophysiological work to test our model.

INTEGER RATIOS AND MUSICAL RHYTHM
What are small integer ratios, and what makes integer-ratio rhythms special? A ratio between two inter-onset intervals (IOIs) is the quotient of two, usually adjacent, durations. Integer ratios can be written as a fraction of two integers: 1.5 equals 15/10 or 3/2, whereas √2, for instance, cannot be written as such a fraction. An integer ratio is small if it can be written as one small integer divided by another, e.g., 2/3 but not 23/51 (Pikovsky et al., 2003; Strogatz, 2003).
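The notion of a "small" integer ratio can be made operational in a few lines. The sketch below is only an illustration: the function names, the denominator bound of 4, and the 2% relative tolerance are assumptions introduced here, not values taken from the literature cited above.

```python
from fractions import Fraction


def nearest_small_ratio(value, max_denominator=4):
    """Closest fraction to `value` whose denominator is at most `max_denominator`."""
    return Fraction(value).limit_denominator(max_denominator)


def is_near_small_ratio(value, max_denominator=4, tolerance=0.02):
    """True if `value` lies within a relative `tolerance` of such a fraction.

    Both thresholds are hypothetical; how close is "close enough" is an
    empirical question.
    """
    candidate = nearest_small_ratio(value, max_denominator)
    return abs(value - float(candidate)) <= tolerance * value


if __name__ == "__main__":
    for ratio in (1.5, 2 ** 0.5, 23 / 51):
        print(ratio, nearest_small_ratio(ratio), is_near_small_ratio(ratio))
```

Only the denominator is bounded here; for ratios in the range typical of adjacent IOIs, this also keeps the numerator small.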
A rhythm, by definition as used here, is a pattern of durations (London, 2004, p. 4) characterized by the succession of event onsets over time, in other words a series of IOIs.
Auditory rhythms with small integer ratios between IOIs are common in the world's music (Toussaint, 2013; Savage et al., 2015). Psychological and neural research suggests that small integer-ratio rhythms allow a more accurate internal representation (Essens, 1986; Sakai et al., 1999), improved deviance detection (Jones and Yee, 1997; Large and Jones, 1999), enhanced memory (Deutsch, 1986; Palmer and Krumhansl, 1990) and reproduction (Essens, 1986), and better synchronization (Patel et al., 2005). The distortion of near-integer ratios toward integer ones (or their harmonics) reported in behavioral (Fraisse, 1982) and neurophysiological studies (Motz et al., 2013) further supports the idea of small ratios acting as "attractors" (Gupta and Chen, 2016). This idea has recently received support from studies of iterated learning and reproduction. When humans reproduce an initially randomly-timed rhythmic sequence, and this process is repeated in a cascade fashion within one or across several individuals, the sequence is subconsciously reshaped to be composed of IOIs related by small integer ratios (Figure 1A; cf. Polak et al., 2016; Ravignani et al., 2016, 2018; Jacoby and McDermott, 2017).
Why do rhythms (i.e., patterns of durations) tend to exhibit small integer ratios? Why are humans drawn to rhythms with such a peculiar mathematical property, in both perception and production? Does this property reflect a special quirk of music perception and/or motor sequencing, or could it be explained by domain-general aspects of cognition? Can we explore these alternatives through mathematical formalism? Here, we explore mathematically the possibility that the human bias toward small integer ratios may be explained by a combination of scalar expectancy and categorical perception.
We begin by outlining the relevant classical frameworks for human timing, and go on to summarize the evidence in support of the small-integer ratio bias in rhythm perception. We then present our proposal linking the frameworks to the bias through mathematical formalisms. Specifically, we draw on the scalar property of time interval estimation to formulate a simple model of categorical perception that may result in an integer ratio bias (Figure 1), and link this to neural oscillations. We conclude by briefly discussing the merits and limitations of our model and outlining future goals.

PSYCHOPHYSICAL AND OSCILLATORY APPROACHES
Two major theoretical approaches, among several, have been suggested to account for the mechanisms behind human timing (Wing and Kristofferson, 1973a,b; Getty, 1975; Meck, 1996; Church, 1999; Grondin, 2001, 2010; Mauk and Buonomano, 2004; Karmarkar and Buonomano, 2007; Ivry and Schlerf, 2008; Allman et al., 2014; Merker, 2014). The most influential and empirically tested psychoacoustic model is the "scalar expectancy theory" (Wearden, 1991; Allman and Meck, 2011). Psychophysical research shows that human timing often follows Weber's law (Bizo et al., 2006): the error for an interval duration being timed is proportional to the duration of that interval. One perception-based formulation states that the ratio between the just-noticeable difference (JND) and the duration of a reference stimulus is constant across stimulus length (Grondin, 2001). In another formulation, the coefficient of variation (standard deviation divided by mean) in estimating durations is constant across durations (Figure 1D; Gibbon, 1977).
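The perception-based formulation of Weber's law can be illustrated with a short sketch. The function names and the Weber fraction of 0.05 are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from the studies cited above.

```python
def jnd_ms(reference_ms, weber_fraction):
    """Just-noticeable difference under Weber's law: a constant fraction of the reference."""
    return weber_fraction * reference_ms


def discriminable(reference_ms, comparison_ms, weber_fraction):
    """True if the two durations differ by at least one JND of the reference."""
    return abs(comparison_ms - reference_ms) >= jnd_ms(reference_ms, weber_fraction)


if __name__ == "__main__":
    weber_fraction = 0.05  # hypothetical value, for illustration only
    for reference in (200.0, 1000.0):
        print(reference, jnd_ms(reference, weber_fraction),
              discriminable(reference, reference + 20.0, weber_fraction))
```

Under this assumed fraction, the same 20 ms change exceeds the JND for a 200 ms reference but not for a 1,000 ms reference, which is the scalar property in its simplest form.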
Another relevant approach to timing mechanisms comes from neuroscience and physics. It suggests that neural oscillations entrain (or even "resonate") with the periodicity of external stimuli at multiple time-scales (Buzsaki, 2006; Large, 2008; Arnal and Giraud, 2012; Gupta, 2014; Aubanel et al., 2016; Celma-Miralles et al., 2016). Specifically, it states that phase and frequency of neural oscillations entrain with the phase and frequency of external events at multiple metrical levels. For instance, processing a metronome beat will induce low-frequency oscillations and/or power fluctuations in high-frequency oscillations following the periodicity of the beat, plus its multiples or divisors. Critically, the stability of the connection between two or more active neural oscillations, i.e., the "resistance" to external perturbations, depends on the ratio of their periods (e.g., 1:1, 2:1, 2:3). Small integer ratios typically confer greater stability. This may explain the perceptual advantage for integer-ratio stimuli over more complex metrical patterns (Large and Kolen, 1995). Other frameworks state that specific neurons or neural channels are tuned to particular durational intervals or tempi (Merchant et al., 2013; Bartolo et al., 2014).

ITERATED DRUMMING EXPERIMENTS: SMALL INTEGER RATIOS AS COGNITIVE ATTRACTORS
Recent behavioral research investigated human priors for durations in rhythmic patterns (Ravignani et al., 2016, 2018; Jacoby and McDermott, 2017). Participants were given drumming sequences to reproduce to the best of their ability. The patterns produced were presented to the same or a new participant in an iterative procedure. Strikingly, although "first-generation" participants were given completely random patterns, "last-generation" participants produced rhythms exhibiting small integer ratios, in line with previous work on, e.g., bimanual tapping (Peper et al., 1991, 1995a; Peper and Beek, 1998).
Specifically, participants were presented with sequences of IOIs sampled from a uniform distribution U (e.g., Figure 1B). As the patterns were transmitted through "chains of reproductions" (Ravignani et al., 2016, 2018; Jacoby and McDermott, 2017), distribution U converged toward a distribution D: a human observer's posterior distribution of IOIs (e.g., Figure 1A). This distribution is multimodal, and the modes are related by small integer ratios, a universal property of human musical cultures (Ravignani et al., 2016; Jacoby and McDermott, 2017).
Here we aim to explain the distribution D via established psychophysical principles, none of which explicitly entail small-integer ratios. In other words, is the integer ratio bias a perceptual primitive in itself, or might it arise from the interaction of more fundamental primitives? Jacoby and McDermott (2017) related a theoretically hypothesized prior with built-in integer ratios to an empirically estimated prior, showing that these were aligned.
Here, we investigate whether it is possible to derive a prior with similar properties by not building in the integer-ratio, but by combining empirically founded principles of timing with a minimum of assumptions (and room for refinement by future testing).

FIGURE 1 | [...] are simulated; they were randomly sampled from several normal distributions, with total sample size as in (A). (F) Schematic representation of potential parameters linking scalar timing and small integer ratios. Panel (F) was produced without simulated or experimental data. Notice how the x-coordinate of the intersection point between the two Gaussians can be parameterized as $\mu_1 + s c^u_1 \mu_1$ (first Gaussian) and $\mu_2 - s c^l_2 \mu_2$ (second Gaussian). For more than two Gaussians, the intersection can be parameterized as $\mu_k + s c^u_k \mu_k$ (first Gaussian) and $\mu_{k+1} - s c^l_{k+1} \mu_{k+1}$ (second Gaussian). This parameterization is used in the derivations below.

PROBABILISTIC INFERENCE FOR INTERVAL RATIO CATEGORIES
Our concrete question is: Under which conditions will a distribution G show small-integer ratios, without having built these ratios into our model? Without any assumptions, distribution G would equal the uniform IOI distribution U in expectation. In other words, which findings about the basic mechanisms of rhythm perception and production allow us to turn U into G? Below, we make four assumptions based on psychophysical evidence and drastically reduce the number of free parameters in the model with little loss of generality. We begin by elaborating on previous formalizations to make relevant assumptions explicit and comparable.

ASSUMPTION 1: CATEGORICAL TIMING
An n-event rhythm defines a sequence of IOIs $d = (d_1, \ldots, d_{n-1})$ and of ratios $r = (r_1, \ldots, r_{n-2})$, such that each ratio $r_i$ relates the two adjacent IOIs $d_i$ and $d_{i+1}$. Perception of a rhythm $r$ induces a representation $z = (z_1, \ldots, z_{n-2})$, with a strong tendency to categorize. The vector $z$ is a sequence of a small number of unique phenomenal interval-ratio categories that represent the observed data $r$. More specifically, the notation $z_i = k$ indicates that interval ratio $r_i$ is attributed to phenomenal category $k$ (Ravignani et al., 2018). Whilst not used explicitly in our calculations, $z$ formalizes the first key assumption: the processing of rhythmic sequences recruits a categorical interpretation of time intervals from a continuous stream of events (Clarke, 1987; Schulze, 1989; Desain and Honing, 2003). Behavioral evidence shows that human motor timing is also categorical: participants tapping produce IOI distributions with distinct peaks reflecting underlying durational categories (Collyer et al., 1994). This suggests that the distribution G can be approximated as a multimodal mixture of normal distributions (Figure 1C), rather than a uniform distribution (Figure 1B). A small number of durational categories naturally results in a small number of ratio categories. For the perception of a rhythmic sequence as a whole, we would argue that the perceived durations are transformed toward forming small ratios, as supported by iterated drumming experiments (Jacoby and McDermott, 2017), "ideally" into integer multiples of the smallest unit. Whilst categorical timing may appear to be a simplifying psychological concept (Schulze, 1989; Drake and Bertrand, 2001; Desain and Honing, 2003; ten Hoopen et al., 2006) based on behavioral observations, it may not be that far off neural reality. The notion of durational categories relates to basic durational tuning properties of premotor neurons recorded in non-human primates (Merchant et al., 2013).
For instance, categories can be mapped to interval tuning in the premotor neurons of monkeys performing a synchronization continuation task (Merchant et al., 2013). Here, the distribution of preferred intervals could be viewed as a prior, although this distribution is multimodal, rather than bimodal as in Merchant et al. (2013). In addition, human neuroimaging work showed specific activation patterns for the perceptual processing of integer interval ratios (Sakai et al., 1999). Moreover, sequences of small integer ratios may induce a metrical beat by the hierarchical organization of periodicity at two or more levels, i.e., the occurrence of an accent at a small integer multiple of the shortest time unit at the next higher level. Metrical structure is thus a higher, multi-level demonstration of the psychological prior toward small-integer ratios that affords accurate reproduction. Moreover, the perceptual timing of rhythms with such a metrical beat is more accurate, their subjective percept "catchier," and their recognition more robust against temporal scaling, i.e., speeding up or slowing down the tempo, as the pattern is processed as one coherent whole rather than a series of time intervals, in contrast to rhythms that feature small integer ratios but no metrical beat (Grube and Griffiths, 2009).

ASSUMPTION 2: BAYESIAN INFERENCE OVER GAUSSIAN CATEGORIES
A general assumption in rhythm research is that perceptual timing can be described as a process combining prior beliefs with sensory input. One way to capture this mathematically is to model time perception as Bayesian inference (Jazayeri and Shadlen, 2010; Cicchini et al., 2012; Merchant et al., 2013; Pérez and Merchant, 2018). Whilst our analysis relies on the nature of the prior rather than how it is deployed during perceptual interpretation, taking a Bayesian viewpoint is useful. It lets us express a prior distribution as an inductive bias (Thompson et al., 2016) and has been successfully applied in previous models of time interval estimation (e.g., Jazayeri and Shadlen, 2010; Cicchini et al., 2012). Employing Bayesian inference, we can characterize participant behavior as attributing a categorical representation to interval ratio $r_i$ according to the distribution $p(z_i = k \mid r_i) \propto p(r_i \mid z_i = k)\, p(z_i = k)$. Our focus is the prior distribution over categories, $p(z_i = k)$, equivalently G. Alternatively, it would be possible to model learners' assumptions about a likelihood distribution as a source of bias (e.g., Jazayeri and Shadlen, 2010; Cicchini et al., 2012). Jacoby and McDermott (2017) recently modeled n-interval rhythms as single points in the (n−1)-dimensional simplex, and formulated a multivariate-mixture prior over this space, assuming Gaussian models to underlie each of the mixtures. Namely, they formulated a multivariate $p(z)$ directly. Our approach to the prior is closely related. Like Jacoby and McDermott (2017), we express the prior as a mixture of Gaussian components. However, our formulation treats an n-interval rhythm as a set of n−1 independent samples from a univariate multimodal distribution, rather than a single multivariate sample. The two approaches essentially represent minor variants of the model for covariance of interval ratio categories.
The assumption that the distribution p(z) has a Gaussian form should be tested in future work, but is in line with existing work and a fair first approximation.
We write the prior as a K-component Gaussian mixture of interval ratio categories, and the data likelihood as i.i.d. Gaussian underlying these categories, such that the marginal distribution of interval ratios has the form:

$$p(r_i) = \sum_{k=1}^{K} \phi_k \, \mathcal{N}(r_i; \mu_k, \sigma_k^2) \tag{1}$$

Here, the prior assigns to each Gaussian $k = 1, \ldots, K$ a weight in the mixture, $\phi_k$, which determines its relative prominence as a category; a category mean $\mu_k$, which specifies the expected interval ratio underlying this category; and a category standard deviation $\sigma_k$. The assumption we make is that weights are constant: $\phi_k = K^{-1}$ (corresponding to an equal number of observations in the Gaussians in Figures 1C-E). Whilst we hope to examine this assumption empirically in the future, we proceed under the most neutral assumption: no interval-ratio category is privileged.
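As a minimal sketch of this assumption, the code below evaluates an equal-weight Gaussian mixture over interval ratios and the resulting posterior over categories, $p(z_i = k \mid r_i)$. The function names and the example means and standard deviations are hypothetical and serve only to show the mechanics; they are not fitted to data.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(r, means, sds):
    """Marginal density of ratio r under an equal-weight Gaussian mixture."""
    weight = 1.0 / len(means)  # phi_k = 1/K: no category is privileged
    return sum(weight * normal_pdf(r, m, s) for m, s in zip(means, sds))


def category_posterior(r, means, sds):
    """Posterior p(z = k | r), proportional to p(r | z = k) * p(z = k)."""
    weight = 1.0 / len(means)
    joint = [weight * normal_pdf(r, m, s) for m, s in zip(means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    means = [0.5, 1.0, 2.0]   # hypothetical category means
    sds = [0.05, 0.1, 0.2]    # hypothetical category standard deviations
    print(mixture_pdf(1.1, means, sds))
    print(category_posterior(1.1, means, sds))
```

With these illustrative values, an observed ratio of 1.1 is attributed almost entirely to the category centered on 1.0, which is the categorization step the model relies on.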

ASSUMPTION 3: A SMALL NUMBER OF SUB-SECOND CATEGORIES
Assuming that our indexing of categories under the prior is strictly ordered by the category means, such that $\mu_j < \mu_k \Leftrightarrow j < k$, we can immediately express our second empirical constraint on distribution G: only few categories exist (Desain and Honing, 2003; Motz et al., 2013; Ravignani et al., 2016, 2018). K is naturally limited by our approach to only model components for small integer ratios, and these are limited in number. Furthermore, we bound the range of category means $\mu_k$ from 200 ms (London, 2004, p. 35) to 1,000 ms (Shaffer, 1983; Desain and Honing, 2003; Buhusi and Meck, 2005). This constraint limits K to the largest number of categories such that no category mean exceeds 1,000 ms:

$$K = \max\{k : \mu_k \le 1000\ \text{ms}\} \tag{2}$$

ASSUMPTION 4: SCALAR TIMING
So far, our assumptions constrain neither category means $\mu_k$ nor standard deviations $\sigma_k$. Our final, perhaps most central assumption is that timing exhibits scalar properties in the sub-second time range considered here (Gibbon, 1977; Matell and Meck, 2000). Scalar timing drastically reduces the number of free parameters describing distribution G, by expressing category variances as a function of category means. The standard deviation of each category $\sigma_k$ equals the mean $\mu_k$ multiplied by a constant, dimensionless factor s (Figure 1E):

$$\sigma_k = s \mu_k \tag{3}$$

Previous empirical reports estimated s to approximate 0.025 (Friberg and Sundberg, 1995; Madison and Merker, 2004).
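The scalar property can be checked in simulation: if the standard deviation grows in proportion to the mean, the coefficient of variation stays constant across durations. The sketch below assumes Gaussian timing noise and uses s = 0.025 as reported in the text; the function names, sample size, and seed are illustrative choices.

```python
import random
import statistics


def simulate_reproductions(mean_ms, s, n, rng):
    """Draw n timed intervals whose standard deviation is s times their mean."""
    return [rng.gauss(mean_ms, s * mean_ms) for _ in range(n)]


def coefficient_of_variation(samples):
    """Sample standard deviation divided by sample mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    for mean_ms in (250.0, 500.0, 1000.0):
        samples = simulate_reproductions(mean_ms, 0.025, 5000, rng)
        print(mean_ms, round(coefficient_of_variation(samples), 4))
```

Each printed coefficient of variation should lie close to the assumed s regardless of the target duration, which is what reduces the free parameters of G to the category means plus one constant.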

LINKING CATEGORICAL PERCEPTION AND SCALAR TIMING: HOW CLOSE CAN WE GET TO INTEGER RATIO INTERVALS?
All four assumptions are empirically based and independent of each other. Now, G can be further characterized by the degree of overlap between the Gaussians composing the mixture. To formalize this, we assume that each category k intersects its adjacent neighbors k−1 and k+1 at distances of $c^l_k$ and $c^u_k$ standard deviations $\sigma_k$ below and above its mean $\mu_k$, respectively (Figure 1F). The parameters $c^u_k$ and $c^l_{k+1}$ thus parameterize the overlap between categories k and k+1: they express how many standard deviations away from its mean $\mu_k$ the cluster k intersects the cluster k+1, and how many standard deviations away from its mean $\mu_{k+1}$ the cluster k+1 intersects the cluster k (Figure 1F shows an example for k = 1, 2).
Combining this idea of a parameterized overlap with scalar properties, each cluster k extends from $\mu_k - s c^l_k \mu_k$ to $\mu_k + s c^u_k \mu_k$. Under these assumptions, the distance between the means of two adjacent distributions (Figure 1F) can be written as

$$\mu_{k+1} - \mu_k = s c^u_k \mu_k + s c^l_{k+1} \mu_{k+1} \tag{4}$$

and their ratio as

$$r_k = \frac{\mu_{k+1}}{\mu_k}. \tag{5}$$

Substituting (5) into (4) provides

$$r_k \mu_k - \mu_k = s c^u_k \mu_k + s c^l_{k+1} r_k \mu_k, \tag{6}$$

which can be simplified and rewritten as

$$r_k = \frac{1 + s c^u_k}{1 - s c^l_{k+1}}. \tag{7}$$

Equation (7) requires, to be well-defined, that its right side is positive, namely

$$c^l_{k+1} < \frac{1}{s}. \tag{8}$$

Operationally, the category means following from the constraints on G can be calculated using the recursion equation:

$$\mu_{k+1} = \mu_k \, \frac{1 + s c^u_k}{1 - s c^l_{k+1}}. \tag{9}$$

The constraints structure the space of component Gaussians in the prior such that, by specifying $\mu_1$, we can compute $\mu_k$ for all k ≤ K using Equation (9) (Figure 1E). These quantitative tools enable the formulation of several questions. Given our post-hoc knowledge that the prior is characterized by categories centered at small integer ratios, do the constraints we laid out structure the prior such that integer-ratio clusters are predicted by setting $\mu_1$ to the smallest possible integer ratio?
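The recursion can be sketched in code. The block below assumes that the ratio between adjacent means takes the form $(1 + s c^u_k)/(1 - s c^l_{k+1})$ and, as a simplification introduced here, that the overlap parameters are the same for every k. The function names and the example values of c are hypothetical; the 1,000 ms ceiling follows the bound on category means stated earlier.

```python
def ratio_between_adjacent_means(s, c_upper, c_lower_next):
    """Ratio mu_{k+1} / mu_k implied by scalar timing and the overlap parameters."""
    denominator = 1.0 - s * c_lower_next
    if denominator <= 0.0:
        raise ValueError("requires s * c_lower_next < 1")
    return (1.0 + s * c_upper) / denominator


def category_means(mu_1, s, c_upper, c_lower, max_mean=1000.0):
    """Iterate the recursion from mu_1 until the next mean would exceed max_mean.

    Assumes constant overlap parameters across categories (a simplification).
    """
    r = ratio_between_adjacent_means(s, c_upper, c_lower)
    if r <= 1.0:
        raise ValueError("means must increase; use positive overlap parameters")
    means = []
    mu = mu_1
    while mu <= max_mean:
        means.append(mu)
        mu *= r
    return means


if __name__ == "__main__":
    # s from the text; c values are arbitrary illustrations
    for mu in category_means(200.0, 0.025, c_upper=10.0, c_lower=6.0):
        print(round(mu, 1))
```

The length of the returned list plays the role of K: it is the number of categories that fit below the ceiling for a given starting mean and overlap.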
An alternative approach might be to assume that one ratio is, e.g., ½, and ask whether our equations imply small integer ratios for the remaining cluster centers. More generally, do the constraints laid out impose an integer ratio structure on the prior without assuming an integer ratio for any of the clusters, simply by setting $c_k$ in a certain way?

FIGURE 2 | Schematic representation of the perspective introduced by this paper. Black solid-line boxes represent empirically supported assumptions. "Bayesian inference" is outlined in gray to indicate that it is used here as a working assumption and conceptual framework, rather than an empirically supported assumption on cognitive processes (Shi et al., 2013). "Neural oscillations" are dashed because they represent an observed neural process whose connection with the other behavioral concepts has not been proven (yet). The quantitative parameters are: category means $\mu_i$, a scalar constant s, and $c_i$, which is the abbreviation of $c^l_i$ and $c^u_i$, parameterizing the overlap between categories. The proposed way of representing rhythmic structure depends, among other factors, on the constancy of $r_k$ (see main text). A deviation from this constancy would result in larger integer ratios, with the deviation accumulating over the categories when iterating Equation (9). Empirical work (e.g., Ravignani et al., 2016; Jacoby and McDermott, 2017) has tried to operationalize the connection between the "mathematical perfection" of integer ratios and their empirical counterpart in a number of alternative ways. This perspective does not address how and when a real number is perceived as an integer ratio, leaving this as an empirical question for psychophysics research. In general, large integer ratios, and even irrational-number ratios, can be perceived as small integer ratios if close enough to one. For instance, $2^{7/12} \approx 1.498307$ is irrational (Coxeter, 1968) but close to 3/2. Virtually all pianos today employ this irrational ratio (≈1.498307) in their equal-tempered tuning, which is "close enough" for human hearing to the integer ratio 3:2. At the same time, the "catchiness" of a rhythm also depends on small deviations from the integer ratios. For instance, delayed occurrences of expected beats, even at varying levels of deviation from the underlying rhythms (together with the compensatory temporary speed-ups), are perceived as interesting, while a strictly regular rhythm will quickly appear dull.

HOW DO $c^u_k$ AND $c^l_k$ RELATE TO $\mu_k$?
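The x-coordinates of the intersection point, expressed as $\mu_k + s c^u_k \mu_k$ (for Gaussian k) and $\mu_{k+1} - s c^l_{k+1} \mu_{k+1}$ (for Gaussian k+1), can be substituted in the respective Gaussian probability density functions, which are then equated to impose the condition of intersection on the y-axis (Figure 1F):

$$\frac{1}{s \mu_k \sqrt{2\pi}} \exp\left(-\frac{(c^u_k)^2}{2}\right) = \frac{1}{s \mu_{k+1} \sqrt{2\pi}} \exp\left(-\frac{(c^l_{k+1})^2}{2}\right), \tag{10}$$

which simplifies as:

$$(c^u_k)^2 - (c^l_{k+1})^2 = 2 \ln \frac{\mu_{k+1}}{\mu_k}. \tag{11}$$

Equation (11) means that the difference of squares between c's is proportional to the logarithm of the ratio of the two means.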
As the right side of Equation (11) is always strictly positive, $c^u_k$ can never equal $c^l_{k+1}$. While this does not constitute a mathematical contradiction with our formulation (still leaving an infinite number of mathematically possible c's), it is admittedly difficult to interpret psychophysically.
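One way to see what the two constraints jointly imply is to fix a target ratio r and the scalar constant s, and solve for the overlap parameters. The sketch below assumes the ratio relation $r = (1 + s c^u_k)/(1 - s c^l_{k+1})$ and the difference-of-squares relation $(c^u_k)^2 - (c^l_{k+1})^2 = 2 \ln r$, which is our reading of the two equations; the function name and the closed-form rearrangement are introduced here for illustration.

```python
import math


def overlap_parameters(r, s):
    """Solve the ratio and intersection constraints jointly for (c_upper, c_lower).

    Substituting c_upper = a - r * c_lower (with a = (r - 1) / s) into
    c_upper**2 - c_lower**2 = 2 * ln(r) gives a quadratic in c_lower.
    """
    if r <= 1.0 or s <= 0.0:
        raise ValueError("expects r > 1 and s > 0")
    a = (r - 1.0) / s
    root = math.sqrt(a * a + 2.0 * (r * r - 1.0) * math.log(r))
    # The smaller root is the one that keeps c_upper positive.
    c_lower = (a * r - root) / (r * r - 1.0)
    c_upper = a - r * c_lower
    return c_upper, c_lower


if __name__ == "__main__":
    s = 0.025
    for r in (1.5, 2.0, 3.0):
        c_upper, c_lower = overlap_parameters(r, s)
        print(r, round(c_upper, 3), round(c_lower, 3))
```

For r = 2 and s = 0.025, this yields overlap parameters of about 13.4 and 13.3 standard deviations, with the upper one slightly larger, as the strict positivity of the logarithm requires. Whether values of this size are psychophysically plausible is exactly the kind of question the formalism leaves open.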

SUGGESTED EXPERIMENTS: MODELING AND PSYCHOPHYSICS
Equations (7, 9) support a potential link between scalar timing and integer ratios, as they include the integer ratios $r_k$ and the scalar constant s (Figure 2). These generative formulas can be implemented in computational simulations to explore the shape of the parameter space. Given specific values for parameters s, $c^u_k$ and $c^l_k$, the equations will return a unique set of ratios: are these small integer ratios? Likewise, given one single integer ratio $\mu_1$, all other $\mu_k$ are determined by Equation (9): which values of $\mu_1$ result in r being integer ratios and s, $c^u_k$ and $c^l_k$ being psychophysically plausible values?
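A first simulation of this kind could scan a grid of overlap parameters and report which combinations produce a ratio close to a small-integer fraction. The sketch below is one possible setup, not a prescribed procedure: it uses only the ratio relation $r = (1 + s c^u)/(1 - s c^l)$, ignores the intersection constraint, and the function names, denominator bound, and tolerance are assumptions.

```python
from fractions import Fraction


def ratio_from_parameters(s, c_upper, c_lower_next):
    """Ratio of adjacent category means for given scalar constant and overlaps."""
    return (1.0 + s * c_upper) / (1.0 - s * c_lower_next)


def scan_overlap_grid(s, c_values, max_denominator=3, tolerance=0.01):
    """List (c_upper, c_lower, r, fraction) where r is near a small ratio above 1."""
    hits = []
    for c_upper in c_values:
        for c_lower in c_values:
            if s * c_lower >= 1.0:
                continue  # ratio would not be positive and finite
            r = ratio_from_parameters(s, c_upper, c_lower)
            nearest = Fraction(r).limit_denominator(max_denominator)
            # Skip the trivial 1:1 case; keep r within a relative tolerance.
            if nearest > 1 and abs(r - float(nearest)) <= tolerance * r:
                hits.append((c_upper, c_lower, r, nearest))
    return hits


if __name__ == "__main__":
    grid = [float(c) for c in range(0, 21, 2)]  # hypothetical range of overlaps
    for c_upper, c_lower, r, nearest in scan_overlap_grid(0.025, grid):
        print(c_upper, c_lower, round(r, 4), nearest)
```

A fuller exploration would add the intersection constraint as a filter and replace the grid with empirically estimated ranges for the overlap parameters.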
The perspective we offer here creates the basis for expanding not only into theoretical but also empirical work on s, $c^u_k$ and $c^l_k$. Experimental research can advance this approach by estimating s, $c^u_k$ and $c^l_k$ via Equation (7) or (11). Here, we treated the parameter s as an a priori known, one-valued constant (s = 0.025). To improve the model further, the variance of s might be estimated by replications of previous psychophysical experiments such as those by Friberg and Sundberg (1995) and Madison and Merker (2004). Values for $c^u_k$ and $c^l_k$ can be estimated from experiments testing the perception (and misattribution) of durational categories.

LIMITATIONS, DISCUSSION, AND CONCLUSIONS
We explore quantitative links between scalar timing and the human bias toward small integer ratios. The arguments we provide reduce the explanatory space to a few hypotheses. One possibility is that integer ratios are not a human cognitive primitive, but rather a simple by-product of other cognitive constraints, and their interaction.
Alternatively, the scalar timing framework might not be the most suitable one to explain the integer-ratio phenomenon of human rhythm. If one adopts oscillatory frameworks, integer ratios might simply arise from the oscillatory properties of brain activity, and so can scalar properties and categorical perception. Small integer ratios in particular would just reflect epiphenomena of harmonics of one oscillator or the interaction between two or more oscillators (Collyer et al., 1994; Strogatz, 2003; Buzsaki, 2006; Gupta, 2014; Merker, 2014; Gupta and Chen, 2016). Neural resonance to musical rhythm (Large, 2008), interval tuning (Merchant et al., 2013; Bartolo et al., 2014), and population clocks (Crowe et al., 2014; Gouvêa et al., 2015; Bakhurin et al., 2016; Merchant and Averbeck, 2017) present alternative timing mechanisms, documented by in-vivo recordings of neural populations and compatible with the observed small integer bias.
In any case, scalar timing and oscillatory theories are simplifications, i.e., approximate descriptions derived from confined experimental set-ups. Neurally and behaviorally, the dissociation or compatibility between scalar timing and oscillatory theories is more complex than it may appear in higher level cognitive theories, and only detailed neural models will enable us to define the actual underlying mechanisms.

AUTHOR CONTRIBUTIONS
AR and BT conceived the idea and performed the mathematical derivations. All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

FUNDING
AR was supported by funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 665501 with the Research Foundation Flanders (FWO) (Pegasus 2 Marie Curie fellowship 12N5517N awarded to AR). AR and BT were also supported by a visiting fellowship in Language Evolution from the Max Planck Society and ERC grant 283435 ABACUS (awarded to Bart de Boer).