AUTHOR=Donhauser Jonas, Tur Bogac, Döllinger Michael

TITLE=Neural network-based estimation of biomechanical vocal fold parameters

JOURNAL=Frontiers in Physiology

VOLUME=Volume 15 - 2024

YEAR=2024

URL=https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2024.1282574

DOI=10.3389/fphys.2024.1282574

ISSN=1664-042X

ABSTRACT=Vocal Fold (VF) vibrations are the primary source of human phonation. High-Speed Video endoscopy (HSV) enables the computation of descriptive VF parameters for assessing the physiological properties of laryngeal dynamics, i.e., the vibration of the vocal folds. However, the underlying biomechanical factors responsible for physiological and disordered vocal fold vibrations cannot be accessed this way. In contrast, physically based numerical VF models reveal insights into the organ's oscillations that remain inaccessible through endoscopy. To estimate biomechanical properties, previous research has fitted subglottal-pressure-driven Mass-Spring-Damper systems to the HSV-recorded VF trajectories as an inverse problem, by global optimization of the numerical model. A Neural Network trained on the numerical model may be used as a substitute for the computationally expensive optimization, yielding a fast-evaluating surrogate of the biomechanical inverse problem. This paper proposes a Convolutional Recurrent Neural Network (CRNN) based architecture trained on regression of a physiologically based biomechanical Six-Mass Model (6MM). To compare with previous research, prediction of the underlying biomechanical factor "subglottal pressure" was tested against 288 ex-vivo porcine HSV recordings. The contributions of this work are twofold: First, the presented CRNN with the 6MM handles multiple trajectories along the vocal folds, which allows for investigation of local changes in VF characteristics. Second, the network was trained on synthetic data to reproduce further important biomechanical model parameters such as VF mass and stiffness. Unlike in previous work, the presented network is therefore a complete surrogate of the inverse problem, which allows explicit computation of the fitted model using our approach.
The presented approach achieves a best-case Mean Absolute Error (MAE) of 133 Pa (13.9%) in subglottal pressure prediction with 76.6% correlation on experimental data and a re-estimated fundamental frequency MAE of 15.9 Hz (9.9%). Detailed training analysis revealed subglottal pressure as the most learnable parameter. With its physiologically based model design and advances in fast parameter prediction, this work is a next step in biomechanical VF model fitting and the estimation of laryngeal kinematics.