EDITORIAL article

Front. Signal Process.

Sec. Audio and Acoustic Signal Processing

Volume 5 - 2025 | doi: 10.3389/frsip.2025.1715792

This article is part of the Research Topic: Sound Synthesis through Physical Modeling

Editorial: Sound Synthesis through Physical Modeling

Provisionally accepted
  • 1 IRCAM, Paris, France
  • 2 Sciences et Technologies de la Musique et du Son, Paris, France
  • 3 Queen's University Belfast, School of Electronics, Electrical Engineering and Computer Science, Belfast, United Kingdom
  • 4 Università degli Studi di Bologna, Dipartimento di Ingegneria Industriale, Bologna, Italy

The final, formatted version of the article will be published soon.

Physical modeling synthesis aims to simulate sound by solving the equations that govern the behaviour of an instrument or acoustic system. This paradigm stands apart from signal-based methods by directly encoding the cause-and-effect relationships of sound production. Its appeal lies in the fidelity of transients, the direct mapping from physical parameters to sound, and the capacity to expose how instruments work. Over the past fifty years, models have grown from simple linear caricatures to treatments of strongly nonlinear, coupled systems, frictional contacts, nonlinear audio effects, nonlinear air flows, and more. That realism comes at the cost of greater numerical hurdles and, consequently, the need for schemes that remain stable and efficient.

More recently, new tools have entered the picture. Machine learning is being used to tune or hybridise physical models, yielding differentiable systems that can learn while staying tied to physics. Energy-based formalisms such as port-Hamiltonian models provide a systematic way to enforce passivity in power-exchanging systems. At the same time, hardware (GPUs, FPGAs, modern CPUs) has opened up real-time simulation at scales that were out of reach only a decade ago.

This special issue sits in that landscape. The contributions span multi-physics and nonlinear modelling, energy-consistent algorithms, machine learning assistance, and hardware acceleration, all aimed at sharpening accuracy and stability while keeping models playable in real time. The six articles in this Research Topic cover a diverse range of musical instruments and methodological approaches, yet they are unified by an emphasis on physically grounded models and system-level rigour. Below, we summarise each article in turn and discuss its context within the broader field.

Riccardo Simionato and colleagues present a hybrid modeling approach that fuses deep learning with traditional DSP to emulate piano sounds in a differentiable framework [1]. Focusing on single piano notes, their method learns to synthesise the quasi-harmonic spectrum using physics-derived parametric formulas whose parameters are optimised from recorded samples. By embedding known piano acoustics (such as the inharmonicity of strings and the decay envelopes of partials) into a neural network training process, the model achieves a high degree of realism while remaining lightweight and fully interpretable. Notably, the learned synthesiser reproduces each note's partial frequencies (stretched by string stiffness) and amplitude dynamics across different key velocities. The authors report that the model generalises accurately across the piano's range, successfully capturing the inharmonicity and level of each partial. Remaining challenges include some loss of accuracy for very high-frequency partials at loud dynamics, likely due to limited training data in those regimes. Importantly, the architecture is modular and amenable to real-time use: it operates with low latency and modest computational load, making it suitable for interactive digital piano applications. This work exemplifies the trend of differentiable physical models, in which neural networks are guided by physical insight; here, the result is a novel piano model that combines data-driven and physics-based synthesis. It underlines how machine learning can enhance physical modeling by automating parameter tuning and system identification, all while preserving the intuitive control and explainability of classic models.

Champ Darabundit and Gary Scavone propose a discrete port-Hamiltonian system (PHS) approach to model a single-reed woodwind instrument (such as a clarinet) in a modular, energy-consistent way [2].
In traditional woodwind synthesis, the nonlinear reed excitation and the acoustic resonator (bore and toneholes) have often been modeled separately with methods like digital waveguides or finite-difference schemes. This article instead formulates each major component (the reed, the air column, and an open/closed tonehole) as interconnected subsystems in the port-Hamiltonian framework. By doing so, the authors ensure that energy flow between components is explicitly tracked and conserved, conferring numerical stability and physical interpretability (power balance) on the complete model. The paper makes three main contributions: a linearly implicit integration scheme for the beating reed dynamics, which uses an energy quadratization method to handle the nonlinear collision force (Hunt-Crossley model) coupled with nonlinear airflow; a symplectic discretisation of the bore's 1D wave dynamics that aligns with known finite-difference time-domain results; and a new low-frequency effective model for tonehole impedance, including a switching mechanism to simulate tonehole closure and opening. These elements are assembled such that the composite simulation remains passive and stable by construction. The benefit of the PHS approach is clearly demonstrated: once the components are cast in this form, they can be connected via power-conserving ports, and the overall instrument inherits guaranteed stability (no numerical energy gain) regardless of strong nonlinearity at the reed or rapid state changes at toneholes. Darabundit and Scavone validate their model on a virtual clarinet, showing that it can reproduce sustained oscillations and note transitions without instability.

Romain Michon and co-authors tackle the question of real-time performance for large-scale physical models by comparing CPU, GPU, and FPGA implementations of a modal reverberation algorithm [3].
Modal synthesis, which represents an object's vibration as a sum of many independent damped harmonic oscillators, is an attractive technique for artificial reverberation and resonator effects due to its physical interpretability and inherently parallel structure. However, simulating thousands of modes at audio rate is computationally intensive. This study provides a timely examination of how modern computing platforms fare in this task. The authors implement a high-order modal reverb (a plate reverb with thousands of modes) on three platforms: a multi-core CPU (with vectorisation and threading optimisations), a general-purpose GPU (using hundreds of parallel threads), and an FPGA (using custom hardware pipelines), each carefully optimised to exploit the architecture's strengths. They then measure the maximum achievable modal complexity (number of modes), processing latency, and resource usage in various real-time scenarios. The results reveal a nuanced trade-off: GPUs excel at scalability, comfortably handling the largest number of modes thanks to their massive parallelism, whereas FPGAs deliver the lowest processing latency, making them ideal for time-critical applications. Meanwhile, modern CPUs, benefiting from increasing core counts and SIMD vector units, show surprisingly strong performance for moderate polyphony, approaching the throughput of specialised hardware for mid-sized problems. These findings underscore that no single processor is "best" for all cases; instead, the choice depends on the specific requirements (e.g. maximum reverb length vs. latency tolerance vs. development flexibility). Michon et al. conclude by discussing each platform's practical role in audio DSP: GPUs for heavy parallel workloads in studio or cloud settings, FPGAs for ultra-low-latency embedded systems, and CPUs for general-purpose use where moderate parallelism suffices.

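To make the "sum of independent oscillators" structure concrete, the following is a minimal sketch of a modal filter bank in plain Python. It is not the implementation benchmarked in [3]: the function names, the use of a two-pole resonator per mode, the T60-based decay parameterisation, and the three example modes are all assumptions chosen for illustration.

```python
import math

def make_modes(freqs_hz, t60_s, gains, fs):
    """Precompute two-pole resonator coefficients, one tuple per mode.

    Hypothetical helper: each mode is described by a frequency (Hz),
    a T60 decay time (s) and an input gain.
    """
    modes = []
    for f, t60, g in zip(freqs_hz, t60_s, gains):
        # Pole radius such that the envelope drops by 60 dB after t60 seconds.
        r = math.exp(-math.log(1000.0) / (t60 * fs))
        w = 2.0 * math.pi * f / fs
        modes.append((2.0 * r * math.cos(w), -r * r, g))
    return modes

def modal_filter(x, modes):
    """Filter x through a parallel bank of independent resonators."""
    y = [0.0] * len(x)
    # No mode depends on any other, so this outer loop is the part that
    # maps naturally onto SIMD lanes, GPU threads, or FPGA pipelines.
    for a1, a2, g in modes:
        y1 = y2 = 0.0
        for n, xn in enumerate(x):
            yn = a1 * y1 + a2 * y2 + g * xn
            y[n] += yn
            y2, y1 = y1, yn
    return y

# Usage: impulse response of a toy three-mode resonator (illustrative values).
fs = 48000
modes = make_modes([220.0, 447.0, 683.0], [1.2, 0.8, 0.5], [1.0, 0.6, 0.3], fs)
impulse = [1.0] + [0.0] * (fs // 10 - 1)
ir = modal_filter(impulse, modes)
```

The work per output sample grows linearly with the number of modes, which is why a reverb with thousands of modes becomes a question of how well a platform exploits the independence of the outer loop.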
Ewa Matusiak, Vasileios Chatziioannou, and Maarten van Walstijn present a rigorous study of the frictional interaction between a bowed string and the bow hair [4]. Their work takes the elasto-plastic (E-P) friction law, valued for its ability to describe sticking, pre-sliding, and slip but prone to spurious energy injection under naïve discretisation, and subjects it to detailed energy analysis. By reformulating the bristle damping term, they obtain a version of the model that is unconditionally passive, i.e. incapable of injecting net energy regardless of parameter choice. Building on this, they derive a finite-difference scheme whose discrete energy mirrors the continuous balance, ensuring stability even in demanding transient regimes. A further theoretical advance, rarely achieved for friction models, is a proof of existence and uniqueness for the "bowed-mass" case. This result dispenses with the ad-hoc remedies (caps on bow force, Friedlander's selection rules, or regularised Stribeck curves) often used to avoid Painlevé-type paradoxes and associated velocity jumps.

With this foundation in place, the authors embed the refined law into a complete bowed-string setting. Here, the bow is treated as a ribbon of finite width interacting with several adjacent string elements, its compliance is represented explicitly, and the string's torsional motion is included alongside transverse vibration. These aspects, while known from earlier studies, are necessary for a credible rendering of how a bow distributes force and how torsion moderates slip behaviour. Numerical experiments span both lumped and distributed formulations, illustrating transients and steady Helmholtz motion. Where the original Dupont model may lose passivity or admit multiple solutions, the refined version remains stable, passive, and free of non-physical solution branches, reproducing measured slip amplitudes and spectral content without recourse to artificial constraints.

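The idea of a scheme "whose discrete energy mirrors the continuous balance" can be illustrated on a much simpler system than the elasto-plastic friction law. The sketch below, which is an assumption-laden toy and not the scheme derived in [4], applies the implicit midpoint rule to a linear mass-spring-damper with mass m, stiffness k and damping c (all hypothetical values). For this system the discrete energy obeys H_new - H_old = -dt * c * v_mid**2 exactly, so it can never grow, whereas forward Euler gains energy even with c = 0.

```python
def energy(x, v, m, k):
    """Stored energy: kinetic plus potential."""
    return 0.5 * m * v * v + 0.5 * k * x * x

def midpoint_step(x, v, m, k, c, dt):
    """Implicit midpoint rule, solved in closed form for the linear case.

    Discretises  dx/dt = v,  m dv/dt = -k x - c v  using time-averaged
    x and v on the right-hand side.
    """
    a = 0.5 * dt
    v_new = ((m - a * a * k - a * c) * v - 2.0 * a * k * x) / (m + a * a * k + a * c)
    x_new = x + a * (v_new + v)
    return x_new, v_new

def euler_step(x, v, m, k, c, dt):
    """Forward Euler, shown only as the naive baseline."""
    return x + dt * v, v + dt * (-k * x - c * v) / m

def simulate(step, n_steps, x, v, m, k, c, dt):
    for _ in range(n_steps):
        x, v = step(x, v, m, k, c, dt)
    return x, v

# Usage: lossless oscillator (c = 0), illustrative parameter values.
m, k, dt = 0.01, 4.0e4, 1.0 / 48000.0
x0, v0 = 1.0e-3, 0.0
h0 = energy(x0, v0, m, k)
h_mid = energy(*simulate(midpoint_step, 1000, x0, v0, m, k, 0.0, dt), m, k)
h_eul = energy(*simulate(euler_step, 1000, x0, v0, m, k, 0.0, dt), m, k)
print("midpoint energy ratio:", h_mid / h0)  # stays at 1 up to rounding
print("Euler energy ratio:   ", h_eul / h0)  # grows every step
```

For the lossless case, forward Euler multiplies the energy by (1 + dt**2 * k / m) at every step, which is the kind of spurious energy injection that energy-based analysis is designed to rule out.
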
Thomas Risse, Thomas Hélie, and Fabrice Silva extend physical modeling to voice synthesis with a reduced-order yet physically grounded model of the human vocal apparatus formulated in a port-Hamiltonian (pH) framework [5]. A fluid-structure interaction governs the sound production mechanism: airflow from the lungs excites oscillations of the vocal folds, which modulate the flow into acoustic pressure waves in the vocal tract. Modeling voiced speech thus entails coupling a compressible airflow with vibrating tissue and a time-varying acoustic cavity. Risse et al. propose a quasi-1D distributed model for the glottal airflow and vocal tract, capturing effects such as cross-sectional area variation and air compressibility, and couple this with lumped representations of vocal fold dynamics and tract wall vibration. Each component is cast as a pH or energy-conserving subsystem, ensuring that when interconnected, power exchanges between flow, tissue, and acoustic radiation remain balanced, with no spurious energy gain or loss. A dedicated regularisation method is introduced to handle glottal closure (when the folds collide and airflow is momentarily cut off) in a numerically robust way. The continuous formulation is then discretised using structure-preserving methods, combining finite-volume schemes in space with an energy-consistent time integrator. This yields simulations that preserve the passivity of the full self-oscillating system while enabling real-time execution for the vocal tract. Numerical experiments explore both linear and nonlinear behaviour: frequency responses of static vocal-tract configurations (to verify formant placement), dynamic vowel transitions, and finally phonation, where the model generates self-sustained oscillations and synthesises vowel sequences with co-articulation effects. 
By combining theoretical rigour, computational efficiency, and stability, this work advances physically based voice synthesis beyond either unwieldy finite-element models or simplified low-order approximations prone to instability.

Marco Comunità, Christian Steinmetz, and Joshua Reiss examine how far differentiable learning can take the emulation of nonlinear audio effects [6]. They map out the terrain of black-, grey-, and white-box strategies, then focus on the first two, comparing a wide set of architectures on guitar amplifiers, overdrive, distortion, fuzz, and compression. On the black-box side, they test LSTMs, temporal and gated convolutional networks, and structured state-space models; on the grey-box side, they propose block-oriented designs for compressors and for drive/fuzz circuits, using differentiable controllers to capture time-varying behaviour such as bias shift in fuzz pedals. A major practical contribution is the ToneTwist AFx dataset: forty analog and digital devices, recorded as dry/wet pairs over varied sources and playing styles, released with code and training scripts. Extensive experiments (objective metrics and listening tests) show that no single method dominates across all devices, but they highlight trade-offs: recurrent nets excel at strongly dynamic effects, convolutional nets at static nonlinearities. At the same time, grey-box models achieve good accuracy with far fewer parameters. The paper gives a clear picture of current capabilities and points to hybrid approaches and better training protocols as the next step for universal, differentiable models of audio effects.

The articles gathered here show how far physics-based sound synthesis has come, and how diverse its concerns have become. A first theme is the drive for physical accuracy with numerical robustness. Whether through port-Hamiltonian formulations or detailed treatments of nonlinear friction, the models respect energy or passivity constraints while remaining stable.
This marks a shift from early virtual instruments, which often traded rigour for simplicity, toward systems that achieve both fidelity and reliability, helped by modern computing power.

A second thread is the use of machine learning as an ally rather than a rival: differentiable methods are being used to calibrate or extend physical models, or to learn sub-blocks (resonators, nonlinear mappings) under physical constraints. Virtual-analog modelling, in particular, has seen a surge of black- and grey-box applications, marking a significant shift from traditional DSP techniques. Work on modal reverberation, such as large plate and room simulations, demonstrates how increased computing power through GPUs, FPGAs, and multicore CPUs has enabled the simulation of systems once out of reach.

There is also an apparent concern for real-time performance and playability. Many contributions focus on achieving low latency and efficient implementation, utilising model-order reduction, hardware parallelism, or explicit schemes where stability permits. Physical models become most compelling when they can be performed, and efficiency often sharpens rather than dilutes their essence.

Keywords: Physical modeling, Real-time performance, hardware-efficient computation, machine-learning–assisted modeling, multi-physics and nonlinear systems, energy-consistent numerical schemes, Nonlinear systems, energy-consistent algorithms

Received: 29 Sep 2025; Accepted: 07 Oct 2025.

Copyright: © 2025 Hélie, Van Walstijn and Ducceschi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Michele Ducceschi, michele.ducceschi@unibo.it

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.