A Dynamical Systems Approach to Spectral Music: Modeling the Role of Roughness and Inharmonicity in Perception of Musical Tension

Tension-resolution patterns seem to play a dominant role in shaping our emotional experience of music. In traditional Western music, these patterns are mainly expressed through harmony and melody. However, many contemporary musical compositions employ sound materials lacking any perceivable pitch structure, rendering the two compositional devices useless. Still, composers like Tristan Murail or Gérard Grisey manage to implement the patterns by manipulating spectral attributes like roughness and inharmonicity. However, in order to understand the music of theirs and the other proponents of the so-called “spectral music,” one has to eschew traditional categories like pitch, harmony, and tonality in favor of a lower-level, more general representation of sound—which, unfortunately, music-psychological research has been reluctant to do. In the present study, motivated by recent advances in music-theoretical and neuroscientific research into a the highly related phenomenon of dissonance, we propose a neurodynamical model of musical tension based on a spectral representation of sound which reproduces existing empirical results on spectral correlates of tension. By virtue of being neurodynamical, the proposed model is generative in the sense that it can simulate responses to arbitrary sounds.


INTRODUCTION
Music gives rise to some of the strongest emotional experiences in our lives. Even though the first surviving theoretical treatments of the power of music to move the soul were written in the fifth century B.C. [1], the origin of this power still largely remains a mystery. However, both musicological and music-psychological evidence seems to converge on the theory that music arouses emotions by a sophisticated play of tension-resolution patterns [2]. For instance, many authors describe the musical language of Richard Wagner (1813-1883) as "the language of longing" [2]; it may not be merely a coincidence that Wagner's common practice was to introduce a dissonant chord, making the listener "long for" a more consonant chord to "resolve" the dissonance, and then keep the listener in tension by delaying the resolution or slap him right away with another dissonance [ [2], p. 334-339].
Creating and resolving tension is an easy task for composers who follow the nineteenth century Western tradition (as, e.g., most "mainstream" composers of film music do); any standard textbook on harmony and voice leading provides them with plenty of recipes [e.g., [3]] tested by centuries of musical practice and decades of psychological research [e.g., [4] and references therein]. However, over the course of the twentieth century, many composers enriched their palette with sounds possessing neither definite pitch (precluding melody) nor perceivable voice structure (precluding harmony), thus venturing out into territories about which traditional theory has nothing to say but hic sunt leones [5]. Still, while tantalizing their audience with a sound palette ranging from pure tones to the most atrocious noises, they seek control over how their music is experienced by the listener as much as their more conservative colleagues do [6].
Devoid of any perceivable pitch structure, the ferocious sound materials contemporary art music is at times so fond of can only be conceived in terms of loudness and timbre. This forces any composer seeking a full control over these "beasts" to dive from the lofty heights of venerable musical abstractions like pitch, harmony, and tonality to the cold depths of spectral representations of sound. However, beauty emerges even from such depths; by careful manipulation of roughness and inharmonicity composers like Tristan Murail or Gérard Grisey "tense" their audience no less than Richard Wagner by his mastery of tonal harmony; indeed, the term "spectral music, " used when referring to the music style pioneered by the former two composers [7], does not tell the whole story.
As usual, music-psychological research somewhat lagged behind compositional practice; loud music has been shown to be perceived as more tense than soft music [8]; likewise, roughness (that is perception of rapid beating due to interference of close frequencies) seems to be positively correlated with tension [9,10]. A recent study assessed the effect of specific timbre attributes on the perception of tension [11], confirming particularly the role of roughness, inharmonicity (deviation of the constituent frequencies from integer multiples of a fundamental tone) and spectral flatness. However, the functional forms and mechanisms through which such stimuli aspects combine to give rise to perceived tension are still unclear.
For standard Western musical intervals, roughness is a principal source of perceived dissonance in musical material, which thus gives rationale to mathematical models of musical dissonance [12]. In Stolzenburg [13], a mathematical model of dissonance has been proposed and shown to correlate highly with empirical psychophysical data. The core idea of the model can be illustrated on a simple example: Consider an interval of perfect fifth, the most consonant interval after the octave, which, in the standard Western tuning, corresponds to the distance of seven semitones. Hence, denoting f 1 and f 2 the fundamental frequencies of the tones spanning the interval, 12 , e.g., ′ = [2,3]. Then, the dissonance of the interval will be 2, the minimum of ′ . Likewise, for an interval of major sixth, considered less consonant than P5, with frequencies [f 1 , f 2 ] = [f 1 , f 1 2 9 12 ], we have ′ = [3,5], the dissonance being 3 in this case. Finally, for a dissonant interval of minor seventh with frequencies [f 1 , f 2 ] = [f 1 , f 1 2 9 12 ], ′ = [4,7] and the dissonance will be 4. In general, the dissonance of any vector of frequencies, f , approximated as ′ ∈ Z n ≥1 , is assumed to be proportional to the minimum element of ′ . Note that the dissonance of an interval estimated this way does not change if we include harmonics (integer multiples) of the constituent frequencies. However, it does change if one uses a different rational approximation of the frequency ratio; incidentally, all standard Western intervals except for the octave are characterized by an irrational frequency ratio. In Stolzenburg [13], this inconvenience is dealt with by averaging over several alternative approximations.
The quantity above, called relative periodicity [see Definition 6 in [13], p. 17], is equivalent to obtaining the period of the fastest oscillation having the frequencies in question as its harmonics, in particular the period assessed in cycles of the lowest frequency in question. Interestingly, this oscillation has been experimentally observed to be represented in the auditory brainstem response to the intervals listed above [14], with the representation being particularly faithful for relatively more consonant intervals.
Motivated by the latter observation, we put forward a neurodynamical model of tension which is in line with the basic concepts of pitch perception of complex sounds and reproduces the results concerning the effect of roughness and inharmonicity reported in Farbood and Price [11] and, at the same time, provides a dynamical interpretation of relative periodicity [13]. In this regard we follow suit of existing studies which apply the dynamical systems theory to composition [15] and analysis [16,17] of music.

METHODS
Everyone interested in neurodynamical modeling faces the same basic dilemma: which model to use? For modeling perception of music, the most common choices are the leaky integrate-andfire (LIF) model [18][19][20][21] and a canonical model for gradientfrequency networks of Wilson-Cowan-type neural oscillators [22][23][24][25]. Still, neuroimaging methods are far from giving us an assurance that among the myriad possible models one of these is the "correct" one. Hence, to improve our chances, instead of adhering to a particular model right from the beginning, we take a whole class of models as our point of departure in the hope that the class is wide enough to include a good approximation to the actual biological system. More precisely, we proceed by derivation of a normal form to which any of the class members can be transformed through a continuous near-identity change of variables and parameters and (possibly) a time scaling. Then, analysis of the entire class effectively reduces to analysis of the normal form [26].
The pioneering work of the latter approach in our field is Large et al. [22]; the model proposed therein can even be fit to auditory brainstem responses to musical intervals [24,27]. Consequently, one could argue that we already have a neurodynamical model of tension at hand. However, to prove analytically that the latter is indeed a decent model of tension, we would first have to simplify the model substantially for the specific purpose, adopting thus a similar strategy as in previous applications of the model [23,28,29]. Moreover, the class of systems of which the latter is a canonical model [22,27] is not the class of systems we are interested in, as shown below.
Our choice of model class is motivated by the observation that the auditory system is sensitive to the periodicity of the signal (see section 1). A possible explanation for the observation is that the system comprises an array of oscillatory "detectors" with external auditory signal input; these can be viewed (and indeed physically seem to be) ordered tonotopically with respect to their eigenfrequency. Within this framework, eliciting a sustained oscillation in one of the oscillators represents detection of the corresponding period in the input signal. Or, on a continuous scale, the more sustained an oscillation is, the more "'confident" is the auditory system that the input signal exhibits the corresponding period. In order for a stimulus to elicit an oscillation in a model belonging to our class of choice, it needs to destabilize the (originally stable) quiescent state; if this shift in stability is relatively small or intermittent, the oscillation will have small or fluctuating amplitude.
To avoid introducing unnecessary complexity, we start building our model class of interest by considering the simplest possible model of an oscillator: where x 1 , y 1 ∈ R andẋ 1 ,ẏ 1 are time derivatives. In matrix form: Here, x 1 and y 1 could be interpreted as the amount of local inhibitory and excitatory synaptic activity, respectively, but the particular physiological interpretation of the variables is not important for our discussion. In line with the scenario outlined above, we want the oscillator to transit from a quiescent state (say, [x 1 , y 1 ] = [0, 0]) to an oscillation when subject to an input having the oscillator's period (1 in this case). By definition, the spectrum of such an input consists of frequencies which are a subset of 1, 2, . . . , M. As we later aim to introduce the possibility of modeling the effect of inputs with frequencies deviating from perfect multiples of the base frequency, we allow already in the basic model for repeated frequency components of the input, yielding the input frequency vector [ω 1 , ω 2 , . . . , ω n ] = = ], where M is the highest frequency contained in the input. Representing the i-th frequency component of the input with frequency ω i as [x i+1 (t), y i+1 (t)] ∈ R 2 , we extend the oscillator with forcing by the input: , y 2 (t), x 3 (t), y 3 (t), . . . , x n+1 (t), y n+1 (t)) g(x 1 , y 1 , x 2 (t), y 2 (t), x 3 (t), y 3 (t), . . . , x n+1 (t), y n+1 (t)) , where f (·), g(·) are smooth real functions. Two more steps are needed in order to make the model class amenable to derivation of a normal form. First, we need to rewrite this non-autonomous system as an autonomous one. This is straightforward, since [x i+1 (t), y i+1 (t)], being frequency components of a signal, are harmonic oscillations. Let x = [x 1 , y 1 , x 2 , y 2 , . . . , x n+1 , y n+1 ]: Second, we expand the f (·) and g(·) functions around the origin: where f d (x), g d (x) are homogeneous polynomials of degree d and O(x 4 ) is a polynomial of degree 4 or more. Before delving into the actual derivation, a few remarks are in place. First, the idea of modeling the auditory system as a tonotopically arranged array (or rather a series of arrays) of oscillators is in fact not new [23,24,27]. Further, since the lowest frequency of signal with harmonic spectrum corresponds to its perceived pitch, Equation (1) is, by design, a pitch detector. Then, instantiating Equation (1) with a range of eigenfrequencies can be thought of as matching a template to the signal; consequently, our model can be considered a neurodynamical implementation of template matching postulated as a possible mechanism underlying pitch perception [30].
As the first step of the derivation, we diagonalize the linear part of Equation (1) using the following two matrices: where ı is the imaginary unit. Denoting the linear part of Equation (1) as A, the diagonalized matrix reads: The diagonalization defines a change of variables: After this change, Equation (1) reads: where and · is complex conjugate. From here, with a slight abuse of notation, we drop for simplicity the subscript of g ζ (ζ ) which is a function in the complex vector space corresponding to g(x) from section 2, that is obtained through the change of variables applied, i.e., ; likewise for f (ζ ). Our subject of study will be the normal form of the system comprising Equations (3) and (4). When in normal form, f (ζ ) in Equation (3) will only contain monomials of the form where p = [p 1 , p 2 , . . . , p n ], q = [q 1 , q 2 , . . . , q n ], and [r, s, p, q] is a nonnegative integer solution to the equation Analogously, the exponents in g(ζ ) solve (see Equation 2). Since f (ζ ) and g(ζ ) contain neither constant nor linear terms (see section 2), For conciseness, from now on, whenever v = [r, s, p, q] satisfies the above inequality, we write v ∈ S when it solves Equation (6) and v ∈ S when it solves Equation (7 To broaden the class of systems covered by our normal form, we unfold Equation (3) using small parameters α ∈ R and β ∈ R n :ż This way, in addition to the models with Taylor expansion around the origin of the form (Equation 1), our class now includes models whose Taylor expansion around the origin has the forṁ The corresponding normal form then readṡ Intuitively, the α parameter makes it possible to change the stability of the origin (see section 2.1) whereas the β k parameters allow the model to resonate when the inputs are not exactly its harmonics. In fact, it might happen that two different inputs approximate the same harmonic. That is, their frequencies equal (1 + β i )ω i and (1 + β j )ω j , respectively, with ω i = ω j . This is the reason why we allowed for duplicate frequencies in the input frequency vector (see the beginning of section 2). It might appear that Equation (9) only models period detection in a full harmonic spectrum (up to the n-th harmonic). However, it can be easily adapted to a more general stimulus with frequencies H harm ⊂ by removing the dependence on those z j and w j for which j / ∈ H harm from the equation forż 1 anḋ w 1 (assuming the dimension of the system is large enough to accommodate any stimulus of practical interest). Since the righthand sides of the equations are polynomials, it suffices to zero-out the coefficients of those terms containing nonzero powers of the offending z j or w j . This will turn out to be useful when extending the model to an array by making scaled copies of Equation (9); while changing the eigenfrequency by scaling the left-hand side, we can zero-out coefficients as needed to reflect the changing relation between the eigenfrequency and the inputs.
It might be interesting to compare Equation (9) to [[22], Equation 15] reproduced below in a form that facilitates the comparison [see also [27], Equation A.1], which is also a normal form for an oscillator coupled to a set of sinusoidal inputs: where In Equation (10), the sum runs over all such vectors [r, s, p, q] for which either r = s + 1, s ∈ Z >0 , p = q = 0, or r = 0, s ∈ Z ≥0 , p, q ∈ Z n ≥0 , r + s + |p| + |q| > 0. In contrast, the sum in Equation (9) only runs over the nonnegative solutions to Equation (6), whose coefficients are determined by input frequencies. Hence, whereas Equation (10) applies to inputs of arbitrary frequencies, Equation (9) requires that the frequencies are close to the harmonics of the oscillator's eigenfrequency. On the other hand, in contrast to Equation (9), Equation (10) presupposes a particular form of the coefficients (see Equation 11). Consequently, Equation (9) covers neither a subset, nor a superset of the models covered by Equation (10).

Model Analysis
In this subsection, we analyse the normal form (Equation 9) derived in section 2. More precisely, we study the stability of the origin (z 1 = w 1 = 0). The choice of the origin as the focus of this section is motivated by our previous (arbitrary) choice of the origin as a "quiescent state" of the oscillator (see the beginning of section 2). The reason why we treat its stability in such a detail here is that it determines whether, e.g., the oscillator stays quiet (its period was not detected in the input signal) or oscillates (its period was detected in the signal)-see the beginning of section 2.
As the first step of the analysis, note that all solutions to Equation (9) are of the form ϕ2+(1+β2)ω2t) , . . . , ρ n e ı(ϕn+(1+βn)ωnt) ] ϕ2+(1+β2)ω2t) , . . . , ρ n e −ı(ϕn+(1+βn)ωnt) ] . (12) Consequently, using the simplified notation = [1, 1, . . . , 1, 2, 2, . . . , 2, . . . , N, N, . . . , N] and β = [β 1 , β 2 , . . . , β n ] as above, and introducing ρ = [ρ 1 , ρ 2 , . . . , ρ n ], ϕ = [0, ϕ 2 , . . . , ϕ n ], and β = • β = [ω 1 β 1 , . . . , ω n β n ], we can drop equations forż 2 ,ż 3 , . . . ,ż n+1 andẇ 2 ,ẇ 3 , . . . ,ẇ n+1 from Equation (9) and writė Introducing new coordinates relative to a rotating frame of reference e ıt , u = z 1 e −ıt (14) v = w 1 e ıt , and new parameters, Equation (13) As we will see in section 3, under a rather generic restriction on Equations (16) and (17), the stability of u = v = 0 crucially depends on the relative periodicity and the inharmonicity of the input, as formalized in previous studies, and hence its perceived tension (see section 1). Namely, we require that Equation (16) contains no linear terms in v and, symmetrically, Equation (17) has no linear terms in u. For the remaining linear terms, Equation (6) reduces to Assuming the origin is a fixed point of the system of Equations (16) and (17), its stability is determined by the Jacobian of the system evaluated at the origin: where In particular, the fixed point solution at the origin is stable, if all the eigenvalues of the Jacobian have negative real parts; while it is unstable if at least one eigenvalue of the Jacobian has positive real part. Apparently, without input (ρ = 0), the stability is solely determined by the matrix A, particularly the unfolding parameter α. For positive α, the fixed point at origin is unstable, while for negative alpha, the fixed point at origin is stable. Note that whereas the original class of models (section 2) only covers systems in which the origin is marginally stable (α = 0), the "unfolded" class (Equation 8) encompasses the entire spectrum of stability of the origin. With input, one can view the Jacobian as the matrix A perturbed by a time-dependent term consisting of a sum of oscillators with amplitudes depending exponentially on p + q (with base ρ) and frequencies pq . Thus, if we consider the neural auditory system as spontaneously possessing stable fixed point for a given pitch-detector, i.e., its α < 0, only inputs with high amplitude ρ and/or spectral content giving rise to suitable solutions [1, 0, p, q] ∈ S with small value of (p + q) can perturb the matrix A sufficiently for the fixed point to lose stability at least transiently (note the complicated periodic behavior of the perturbation on the right-hand side), and the pitch-detector show input-modulated oscillatory behavior. An example of such scenario is presented later in section 3.
Let us now in detail assess which monomials appear on the right-hand side of the reduced equations. Note that all solutions to Equation (18) correspond to non-negative integer linear combinations of a finite set of minimal solutions, i.e., they have the structure: where M is a matrix with the i-th row, denoted [m i , n i ], equal to the i-th minimum solution to Equation (18) [31]. Consequently (see Equation 15), As noted above, we model stimulation with H harm ⊂ by zeroing-out those monomials containing nonzero powers of z j or w j for which j / ∈ H harm . Consequently, B kM will be nonzero if and only if

RESULTS
In this section, we show how the system of Equations (16) and (17) reacts to stimulation with complex tones varying in relative periodicity and inharmonicity. For the specific case of complex tones consisting of two harmonics, analytical treatment is feasible as we are basically dealing with an interval comprising two pure tones. Let the frequency ratio of the two harmonics be approximated as i : j, H harm = [i, j], where i ≤ j. Table 1 summarizes the rows of M ′ together with the corresponding frequencies of the complex exponentials in Equation (21) (i.e., ±(m j − n j ) β ) and the exponents of ρ (i.e., m j + n j ) for this case. Note that both the frequencies (in absolute value) and the exponents grow monotonically with i, which is precisely the relative periodicity of the interval (see section 1). Hence, as long Quantities appearing in Equation (21) for the special case of stimulation with an interval (that is, as B kM does not grow superexponentially with i, the amplitudes of the complex exponentials (i.e., ρ m j +n j in Equation 21) increase with decreasing relative periodicity of the interval. Further, it can be shown that the frequencies above also grow (in absolute value) with the inharmonicity of the interval, the other factor in perception of musical tension considered here. Let f 1 and f 2 denote the lower and the higher frequency of the interval, respectively, that is, (see Equation 12). Additionally, let The inharmonicity of the interval [f 1 , f 2 ] with respect to the fundamental frequency f 0 is defined as its weighted Manhattan distance to the interval [if 0 , jf 0 ], comprising the i-th and the j-th harmonic of f 0 . The distance is weighted by the squared signal amplitudes and normalized by f 0 and the sum of the squared signal amplitudes [11]. In our particular case, the inharmonicity I ij f 1 f 2 is equal to Indeed, the frequencies grow (in absolute value) with the inharmonicity of the interval. Consequently, noting that A governs the stability of the fixed point solution at origin without input (ρ = 0), we conclude that, under the above assumption on B kM , pure-tone intervals with lower relative periodicity and lower inharmonicity (i.e., those perceived as less tense) cause a higher-amplitude and slower fluctuation of the driven system eigenvalues around those of A than those with higher relative periodicity and inharmonicity (perceived as more tense). Note that there is an ambiguity of approximation represented by a choice of i and j in Equation (22). Further, Equations (16) and (17) are far from being a global model of perception of tension even in complex tones consisting of just two harmonics; they are local in the sense that they only model perception of a particular tone with respect to a particular approximation. Last but not least, we still have to demonstrate that the above fluctuations of stability translate to features of the oscillatory dynamics in a meaningful way. We address these issues now when considering the general case of stimulation with a complex tone consisting of more than two harmonics. To this end, we construct an array of models like Equation (16) differing in eigenfrequency. Here, each eigenfrequency represents a choice of f 0 in Equation (22) so that the entire array essentially works as a pitch detector. Time traces from simulations of such an array are depicted in Figures 1, 2.
The equations for the array were derived by applying the above restriction on linear terms to Equation (9), writing-out the inputs (Equations 12, 16, 17) and using Equations (6) and (18), then truncating the higher-order terms, and, finally, setting and scaling the time for convenience by the eigenfrequency, f k , which yields a parametrically-forced normal form for supercritical Andronov-Hopf bifurcation: (24) Here, (ω j ) i signifies a vector with ω j at the i-th position and zero otherwise. and β are set as = [ω 1 , ω 2 , . . . , ω 54 ] = [1, 1, . . . , 1, 2, 2, . . . , 2, . . . , 6, 6, . . . , 6] β = 24TET 1 − 1 (see Equations 1,8), where each element of 24TET approximates the corresponding element of as a power of 2 1 24 (the 24tone equal-tempered tuning); and 24TET are aligned in such a way that ω 5 = ω 24TET, 5 = 1 and hence β 5 = 0. The oscillators (Equation 24) receive connections from a bank of input units-linear oscillators with eigenfrequencies spanning from B♭ 0 to B 4 in quarter-tone steps. In accordance with Equation (8), each oscillator (Equation 24) is only connected to input units with frequencies (f k ω i β i in Equation 8, after scaling by f k ) approximating its harmonics (in the above tuning) and, additionally, to frequencies up to 4 quarter-tones below and above these. In other words, it does not have fixed homogeneous connectivity input strength from all input units, but rather receives (weighted) input only from input units with frequencies close to its first six approximate harmonics; the connectivity of each oscillator is thus effectively defined by a connectivity pattern or kernel consisting of six unimodal elementary Gaussian kernels (k(l) = e −0.5l 2 ; l ∈ {−4, . . . , 4} and k(l) = 0 otherwise) centered at the harmonics. See visualization of the connectivity kernel in Figure 3. Moreover, only the connections emanating from the input units whose frequencies are included in the stimulus are set to have nonzero amplitude in the respective simulation. Note that by fixing a set of eigenfrequencies (corresponding to different choices of f 0 in Equation 22) and input units and restricting the connectivity to (near-)harmonics, there remains no ambiguity in approximation of the input; each oscillator, as long as the input falls within the reach of its connectivity kernel, approximates the input in its own, unique, way.
All simulations were run from initial conditions z 1k (t 0 = 0) = 0.001 , with α = −0.001 , a parameter setting corresponding to the (almost loss of) stability (without input) of the fixed point z 1k = 0. Three alternative inputs were applied, whose spectra can be seen in Figure 4. The first corresponds to harmonic input with the C tone at its base plus its first five harmonics (tones with integer multiple frequencies of the base tone). The second input results from a transformation of the first which increases inharmonicity while the third is a result of a transformation which increases roughness. As can be seen from Figures 1, 2, both transformations seem to increase fluctuation of stability of the origin, as predicted by our analysis pertaining to two-frequency stimulation. This results in an increase in amplitude modulation across the oscillator array and a corresponding decrease in peak amplitude (see Figure 5). In other words, an increase in perceived musical tension seems to be related to an increase in fluctuation of stability of the origin which manifests itself as an absence of a stable dominant amplitude peak. These preliminary observations are largely confirmed by computing the minimum and the maximum of each oscillator's amplitude trace (see Figure 6). Consequently, we put forward the absence of a stable dominant amplitude peak as a hallmark of perceived musical tension in our model. ). The first "blob" represents connections which stem from input units with eigenfrequencies ranging from B♭ 0 to D 1 and target the oscillator's "slots" corresponding to ρ 1,1 , ρ 1,2 , . . . , ρ 1,9 in Equation (24); likewise, the second blob connects B♭ 1 , . . . , D 2 to ρ 1,10 , ρ 1,11 , . . . , ρ 1,18 etc. The connectivity of any other oscillator is obtained by shifting this kernel so that the center of the first blob is aligned with the oscillator's eigenfrequency.

DISCUSSION
We propose the absence of a stable unambiguous pitch detection modeled as the absence of a pronounced amplitude peak in an array of oscillators to be a correlate of timbre-induced musical tension. In the class of oscillators we chose for populating the array, the amplitude of the limit cycle is determined by the stability of the origin; if the stability switches between a stable and an unstable regime fast enough, the amplitude doesn't have enough time to grow. We show that the frequency and magnitude of this switching depends on inharmonicity and roughness of the input to the oscillator. Imagine such an oscillator is actually present in the brain; when subject to a tense (inharmonic and/or rough) stimulus, it will remain almost silent, leading to an "unclear, " "unstable, " "difficult to memorize" etc. percept (see Figures 1, 2D,F). In contrast, a less tense stimulus would result in a "clear", "stable", "easy to memorize" etc. percept (see Figures 1, 2B). Of course, neurophysiological and neuroimaging evidence shows the what we have in our head is not a single oscillator, but rather an entire bank of them; we show in our simulations that the results of our analysis of single oscillator generalize to an array of them in the sense that the average oscillation amplitude across the array is lower for tense than less tense stimuli (see Figure 6).
Of course, tension is clearly not a one-dimensional phenomenon and different aspects of it could be related to different aspects of the underlying neurodynamics. For instance, in a nonlinear model like the one proposed here, loudness of the input is going to affect both the general amplitude of the oscillations and their temporal fluctuations-in a frequencydependent manner, as our example simulations for two loudness levels suggest. We consider disentangling these not necessarily orthogonal dimensions of tension as a natural extension of the currently proposed modeling framework.
We have proposed a neurodynamical model of musical tension (see Equation 24) which reproduces existing empirical results on timbral correlates of tension, is consistent with neuroimaging findings [14] in that consonant stimuli compared to dissonant stimuli elicit more sustained periodic neuronal activity of higher amplitude, and due to its generative nature can provide prediction of perceived tension of an arbitrary sound input. More precisely, we have demonstrated that both inharmonicity and roughness make the spectrum of the simulated signal flatter and more variable (wider range over time) (see Figures 1, 2, 6). Note that while [14] quantified the periodicity by the amplitude of the autocorrelation peak of the signal spectra, we rather proposed the absence of a temporally persistent, pronounced amplitude peak in the spectrum of the elicited neural activity as a possible correlate of tension-a related indicator that is also present in the results presented in [14]. One might even speculate, based on the similarity of the spectrum to the major key profile [32], that the same principles underlie perception of tonality.
Considering the simulation results reported above in more detail, we note that the overall increase in fluctuation of stability of the origin for the inharmonic and the rough input as compared to the harmonic one can be explained based on the analytical insights into the dynamics of a single oscillator obtained earlier.
More precisely, the nearly-harmonic relations in the inharmonic and the rough input introduce oscillating terms into most of the oscillators' coupling functions; the increase of amplitude modulation is, in turn, accounted for by the fact that the amplitude of the stable limit cycle of Equation (24) is determined by the stability of the origin. The decrease of the peak amplitude is, for the inharmonic input, probably due to the connectivity; there are no exact harmonic relations in the input and hence no oscillator can align its connectivity kernel optimally with the input (see Figure 3). For the rough input, it might be a consequence of scaling down the amplitudes of its frequency components to keep the overall loudness equal to that of the other inputs which consist of fewer harmonics (see Figure 4).
As for our general approach, a few comments are in order. First, for the sake of simplicity, we chose a subclass of multiple centers [a generalization of double centers; see [26]] as our family of models. It might be an interesting avenue for future research to determine whether there are other families of models in which relative periodicity and inharmonicity of the input plays such an important role.
Also for the sake of simplicity, we only considered relative periodicity and inharmonicity of pure-tone dyads. For general sounds, we would be dealing with the set of nonnegative solutions to a general linear Diophantine equation (Equation 18). To the best of our knowledge, the structure of the set (its minimum generators) can only be determined algorithmically [e.g., [31]]. This makes analytical insights virtually impossible in the general case.
Further, concerning the phenomenon wherein loud music is perceived as more tense than soft music [8], we argue that, replacing the bank of input units with a model of cochlea, the effective input generated by a loud harmonic spectrum would be very similar to the rough input used in the simulations reported FIGURE 4 | Spectrum of the harmonic (A), inharmonic (B), and rough (C) input used in the simulations. The construction loosely follows the procedure from [11]. We chose ǫ = 3 for the soft and ǫ = 5.4 for the loud inputs.   Figure 2 (B) with the first 20 s dropped to attenuate the effect of the same initial conditions. Incidentally, note the similarities between Figure 6 and the major key profile from Krumhansl and Kessler [32].
here. More precisely, we expect the loud harmonic spectrum to displace not only those segments of the basilar membrane whose eigenfrequencies match the harmonics, but also the adjacent segments (see Figure 4). This way, the effect of loudness would be accounted for by a combination of cochlear physiology and sensitivity of our model to roughness.
Finally, even though the choice of spectral representation was motivated by our interest in contemporary art music, especially the so-called "spectral music, " the model presented here is applicable to any kind of music; indeed, even music composed with traditional categories in mind ends up being rendered as sound which can be fed into our model.
To conclude, mapping perception to neurodynamics is hard. However, from time to time, a favorable constellation of research sheds light on the underlying physiology. The fruitful concept of relative periodicity [13] suggests that roughness, as one of the perceptual "dimensions" of timbre contributing to tension, might originate in (neural) resonance. Indeed, in this study, we have shown that the dynamics of stability of the origin in a wide class of periodically forced nonlinear oscillators crucially depends on the relative periodicity of the input and, additionally, on its inharmonicity. Since roughness and inharmonicity are principal constituents of perceived tension, we have effectively put forward a possible neurodynamical explanation of musical tension. Moreover, for a particular model belonging to the above class, we have demonstrated by simulations that tense inputs result in an absence of a persistent dominant peak in the spectrum of the time series generated by the model.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.