Your new experience awaits. Try the new design now and help us make it even better

ORIGINAL RESEARCH article

Front. Med., 08 December 2025

Sec. Geriatric Medicine

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1719174

This article is part of the Research TopicDementia and Long-Term CareView all articles

Development and validation of a screening model for dysphagia in the elderly based on acoustic features


Hongdan Song&#x;Hongdan SongDan Li&#x;Dan LiTao LiuTao LiuWei LuoWei LuoXu DongXu DongXiaoyan Jin
&#x;Xiaoyan Jin*‡Shaomei Shang
&#x;Shaomei Shang*‡
  • School of Nursing, Peking University, Beijing, China

Background: Dysphagia is a prevalent and serious condition among the elderly, yet scalable screening tools are lacking. This study aimed to develop and validate an automated machine learning model based on acoustic features for screening dysphagia risk in the elderly.

Methods: Adhering to TRIPOD guidelines, we conducted a study in three stages: variable screening, model construction, and evaluation. Audio data (voice, cough, swallow) were collected from the elderly in nursing homes. A modeling dataset (Beijing area, n = 419) was used to screen key features via LASSO regression. Models were built using Logistic Regression, Random Forest, SVM, and XGBoost, with performance evaluated on an internal test set. The best-performing model was subsequently validated on an external dataset (Shijiazhuang area, n = 216).

Results: The XGBoost model demonstrated superior performance, with an area under the curve (AUC) of 0.86 in internal validation and an AUC of 0.71 in external validation, showing good discrimination, calibration, and clinical utility.

Conclusion: The acoustic feature-based XGBoost model serves as an effective and automated tool for screening dysphagia risk in the elderly. It has the potential to assist healthcare professionals in identifying high-risk individuals for early intervention, thereby improving clinical outcomes.

1 Introduction

Dysphagia, also known as swallowing disorders or deglutition disorders, refers to a condition where the structure and/or function of organs such as the lower jaw, lips, tongue, soft palate, pharynx, and esophagus are impaired (1). The symptoms of being unable to safely and effectively transport food into the stomach can cause serious complications such as malnutrition, dehydration, and aspiration pneumonia (1), and significantly increase the psychological burden of patients (2, 3) and the pressure on the medical system (36). The prevalence rate of dysphagia among the elderly worldwide is as high as 48.1% (7). In China, the prevalence of this disease shows significant differences: it is 13.9% among the elderly in the community, rises to 26.4% in elderly care institutions, while it is as high as 57.7% among the hospitalized elderly (8, 9). Dysphagia screening refers to the initial determination of the existence and severity of dysphagia by identifying a series of related symptoms and signs, which is helpful for the early identification of related problems (10). Numerous studies at home and abroad, expert consensus and clinical guidelines have shown that implementing early assessment and intervention for the population at risk of dysphagia can effectively prevent related complications, control the progression of dysphagia, reduce mortality, and significantly improve prognosis (1115). Therefore, early dysphagia screening is the key to reducing the burden of dysphagia (10).

The change of swallowing acoustic signal can directly reflect the abnormal swallowing function. Normal swallowing produces a characteristic acoustic signature, typically comprising three distinct components corresponding to specific physiological activities: the initial sound related to laryngeal elevation and epiglottic movement, the main click associated with the opening of the upper esophageal sphincter, and the final sound linked to the separation of the tongue from the pharyngeal wall and the descent of the larynx (16). Specifically, when these acoustic features are abnormal, it indicates the corresponding physiological dysfunction. Prolonged duration may signify a delayed swallowing reflex or inefficient pharyngeal contraction, while abnormal signal properties (e.g., skewness, kurtosis) can indicate irregular bolus flow or the presence of pharyngeal residue (1721). Furthermore, the presence of pathological adventitious sounds, such as post-swallow wet voice or cough, is a critical acoustic marker of impaired airway protection and potential aspiration, often resulting from incomplete laryngeal closure or weakened pharyngeal motility (2224). Therefore, as a simple and non-invasive method, swallowing acoustic signal analysis can provide key physiological information reflecting the dynamic process of swallowing and is of significant value in the screening of dysphagia.

In recent years, swallowing acoustic analysis technology has become a research hotspot in the screening of dysphagia due to its non-invasiveness, portability and dynamic monitoring ability (17, 20). The acoustic analysis model based on machine learning has demonstrated high recognition performance in small-sample studies, confirming the representational potential of acoustic features for swallowing function (2527). However, at present, there is still a lack of a simple, safe, effective and user-friendly screening tool (10, 28). Studies show that many elderly people in communities and elderly care institutions do not routinely undergo dysphagia screening (29), and the lack of effective tools for swallowing function assessment is the main reason (30, 31). The “European Society for Dysphagia - EU Geriatrics White Paper” (10) proposes that the ideal dysphagia screening tool should be suitable for use by caregivers and primary health care providers, and have the characteristics of simplicity, safety, rapidity, accuracy and economy. Due to the degenerative changes of the swallowing organs and the high incidence of latent aspiration in the elderly population, there is an urgent need for a screening tool that is applicable to various environments and has higher safety and accuracy in order to carry out effective intervention and management.

This study constructed a prediction model based on the swallowing acoustic data of the elderly in Beijing, China, and screened out the optimal model through internal verification. Further external verification was conducted using the independent dataset in Shijiazhuang area, confirming that the model has good generalization ability. The research results show that this model demonstrates high clinical practical value among the elderly population with different risk levels. It can provide reliable decision support for clinical risk screening and remote home care, which is significant for the early identification and intervention of dysphagia in the elderly.

2 Materials and methods

This study consists of three consecutive steps: ① Screening of identification variables; ② Construction of the recognition model; ③ Evaluation of the recognition model (32, 33). Refer to Figure 1 for a detailed description of the study design.

FIGURE 1
Flowchart depicting a process for generating and validating a model using acoustic data. The process includes dataset integration from the Beijing Dysphagia Acoustic Database, data preprocessing, model generation using techniques like logistic regression and random forest, and model performance evaluation using tools like ROC curves. It incorporates a validation stage with external data to ensure model robustness. Steps are interconnected with arrows, showcasing data cleaning, feature selection, dataset splitting, and feedback for model adjustment to enhance performance.

Figure 1. Model construction and validation process.

2.1 Study participants

This study took the elderly in nursing homes as the research subjects. During the two time periods from September 2023 to February 2024 and from October 2024 to November 2024, research subjects were recruited from a total of 13 elderly care institutions in Beijing and Shijiazhuang, respectively according to the inclusion and exclusion standards.

Inclusion criteria: ①Age ≥ 60 years old; ② The condition is stable and the patient can eat orally. ③ Voluntarily participate in the research and sign the informed consent form. Exclusion criteria: ① Cognitive or behavioral disorders affect the execution of instructions; ② Acute stage/terminal stage of the disease; ③ Those who are unable to cooperate with the assessment due to other serious physical or mental illnesses.

Based on previous studies, the sample size of this research was determined by taking into account the following factors:

1. Calculation based on effect size: the sample size is calculated (34) as n=(Zα/2+Zβ)2×kf2. To ensure adequate power (90%) for detecting significant differences in acoustic features across groups (dysphagia vs. non-dysphagia; and across the three audio modalities) with a small effect size (f = 0.1), a minimum of 7,288 audio samples was required, accounting for a 15% attrition rate due to quality control.

2. Calculation based on machine learning requirements: adhering to the widely accepted heuristic for binary classification models, we ensured that the number of outcome events would be at least 10 times the number of candidate predictor variables (n = 23), requiring a minimum of 230 audio samples with a positive outcome.

3. Translation to participant number: the most stringent requirement came from the effect size calculation (7,288 samples). Given that each participant would contribute a minimum of 12 audio samples, the required number of participants was 608.

2.2 Outcome variables and assessment tools

The binary outcome variable was defined based on the Water Swallow Test (WST), a practical and widely used screening tool that, while imperfect, provides a standardized reference for initial risk assessment. Audio data from participants with WST grades 3–5 were labeled as “high risk for dysphagia.”

2.3 Identify variables and evaluation tools

Based on literature evidence and prior validation, this study selected 23 discriminative acoustic features across four domains as model inputs: (1) five time-domain features (duration, mean amplitude, amplitude SD, skewness, kurtosis); (2) 10 frequency-domain features (fundamental frequency, formants F1-F3, peak/center frequencies, bandwidth, mean/SD harmonic-to-noise ratio, spectral centroid); (3) three energy features (total energy, average/peak power); and (4) five non-linear features (amplitude/frequency perturbation, fractal dimension, aggregation/average entropy).

2.4 Acoustic data acquisition

All audio was recorded on-site in isolated rooms within the nursing homes to ensure acoustic fidelity. We used a contact microphone (C411L, AKG) affixed inferior to the cricoid cartilage (35, 36) (Figure 2) and connected to a digital recorder (Portacapture X8, TASCAM) configured at a 44.1 kHz sampling rate and 24-bit depth. Participants were positioned in a standardized seated posture with their backs fully supported against the chair backrest, feet flat on the floor, and knees flexed at approximately 90 degrees to ensure stability. The head and neck were maintained in a neutral position throughout recording, and participants were instructed to avoid head movements to maintain consistent microphone placement and signal integrity. The recording protocol included: (1) A 10-s baseline ambient noise capture; (2) A series of predefined vocal, swallowing, and cough tasks performed sequentially (detailed in Supplementary Table 1); (3) A concluding 10-s ambient recording.

FIGURE 2
Illustration of an individual wearing a dark top, with a medical device secured under a bandage on their neck. A wire emerges from the bandage, which the person is indicating with their finger. No identifiable facial features are included in the illustration.

Figure 2. Microphone placement and securement.

2.5 Dataset construction

The modeling dataset comprised 4,965 high-quality audio samples from 419 participants after quality control (see Supplementary Figure 1 for the detailed construction process). For external validation, a refined dataset of 2,957 samples from 216 subjects was prospectively collected using identical protocols (Supplementary Figure 2).

2.6 Model predictors

From the initial 23 extracted acoustic features in the training set, LASSO regression (37, 38) identified 12 discriminative variables for elderly dysphagia recognition: (1) two time-domain features (audio duration, skewness); (2) four frequency-domain features (center frequency, formants F1–F2, frequency bandwidth); (3) two energy-related features (total energy, peak power); and (4) four non-linear features (amplitude perturbation, waveform fractal dimension, aggregation entropy, average entropy).

2.7 Statistics

The determination of an adequate minimum sample size for developing a multivariate predictive model hinges on ensuring a robust representation of individuals and outcome events in relation to predictor parameters (39). The combined sample size from Beijing’s eight nursing homes and Shijiazhuang’s five nursing homes provided ample support for utilizing 12 predictive variables in both modeling and validation phases. Specifically, there were 419 cases (4,965 audio samples) in the modeling dataset, encompassing 104 (24.82%) patients who experienced dysphagia; while external validation involved 216 cases (2,957 audio samples), encompassing 62 (28.0%) patients who experienced dysphagia. Statistical analyses were performed according to the data characteristics. Normally distributed continuous variables were expressed as mean ± standard deviation and compared using Student’s t-tests. Non-normally distributed variables were reported as median (interquartile range) with between-group comparisons analyzed by Mann-Whitney U-tests. Categorical data were presented as counts and percentages, with chi-square tests used for group comparisons. Statistical significance was defined as p < 0.05 for all analyses. All the statistical analyses of this study were conducted in the Python3.12 software environment.

2.8 Model development and comparison

The modeling dataset was randomly split into a training set (80%) and an internal test set (20%) for model development and initial validation. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data. Using the 12 predictor variables selected by LASSO regression, we constructed and compared four distinct machine learning algorithms: Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost. Model performance was evaluated based on discrimination (assessed by the area under the receiver operating characteristic curve, AUC), calibration, and clinical utility (using decision curve analysis). The optimal model was subsequently interpreted using SHapley Additive exPlanations (SHAP) to quantify feature contributions and was finally validated on the independent external validation set.

All analyses were conducted in Python 3.12 using standard libraries (scikit-learn, XGBoost). Model hyperparameters were optimized via cross-validation (detailed configurations are provided in Supplementary Table 2).

2.9 Ethics

This study received ethical approval (IRB00001052-23160). Written informed consent was obtained from both institutional authorities (participating nursing homes and community health service stations) and individual participants prior to data collection. All collected survey data were maintained with strict confidentiality, and the entire research process complied with ethical requirements.

3 Results

3.1 Characteristics of the model development set

A total of 419 participants were included in the model development cohort. Based on the Water Swallow Test, 104 participants (24.82%) were identified as having dysphagia (WST grade ≥ 3). Comparative analysis revealed that the dysphagia and non-dysphagia groups showed statistically significant differences (P < 0.05) in key clinical characteristics, including the number of medication types, history of dysphagia-related diseases, BMI, nutritional status, daily activity ability, and MMSE scores. Detailed demographic and clinical characteristics for the entire cohort and by group are provided in Supplementary Table 3.

The 4,965 high-quality audio samples formed the basis for feature extraction. A total of 23 acoustic features were extracted from each sample, which were then combined with the outcome variable to create the final modeling dataset. A detailed description of the extracted acoustic features is provided in Supplementary Table 4.

3.2 Characteristics of the external validation set

The external validation cohort consisted of 216 participants. Among them, 62 (28.70%) were classified as having dysphagia (WST grade ≥ 3), which is comparable to the prevalence in the development cohort. The basic characteristics of this cohort are summarized in Supplementary Table 5.

The external validation set consisted of 2,957 validated audio samples obtained from 216 participants. Complete data specifications and categorical distributions are detailed in Supplementary Table 6.

3.3 Variable screening

LASSO regression identified 12 predictive acoustic features from the initial set of 23. The selected features encompassed temporal, spectral, energy, and non-linear domains, including audio duration, formants F1 and F2, and fractal dimension (a complete list is provided in see section “2.6 Model predictors”).

Analysis of the feature selection process revealed that formant F1, fractal dimension, and aggregation entropy were the most robust predictors, as they maintained non-zero coefficients across the strongest regularization penalties (Supplementary Figure 3). The optimal regularization parameter was determined via 10-fold cross-validation (Supplementary Figure 4).

This refined set of 12 features was used for all subsequent model development, following appropriate data balancing.

3.4 Interpretation of the model

In the training set, the XGBoost model exhibited the highest accuracy at 0.78, significantly outperforming both Random Forest (0.70) and SVM (0.73) (P < 0.05). In terms of AUC value comparison, the XGBoost model achieved a value of 0.86 (95% CI: 0.81∼0.91), significantly outperforming other models (P < 0.05). And the XGBoost model demonstrated optimal sensitivity (0.80), correctly identifying 80% of elderly dysphagia cases, while maintaining superior specificity (0.76). The probability calibration curve of the XGBoost model is closest to the ideal calibration line and shows good calibration effects in different probability intervals. In terms of clinical efficacy, the XGBoost model shows the highest net benefit in the decision curve analysis, especially in the medium and high threshold range (0.3–1.0), and can provide the maximum benefit for clinical decision-making (Figures 35). The comprehensive comparison of all performance metrics is presented in Supplementary Table 7.

FIGURE 3
ROC curves comparing Logistic Regression, Random Forest, SVM, and XGBoost for classification performance. XGBoost shows the highest performance with an AUC of 0.86, followed by SVM (0.80), Random Forest (0.79), and Logistic Regression (0.57). The plot includes a baseline for reference.

Figure 3. ROC curves of the four models.

FIGURE 4
Four probability calibration curves are shown comparing model predictions to perfect calibration. (a) Logistic Regression: curve slightly below the diagonal. (b) Random Forest: curve below the diagonal. (c) SVM: curve crossing the diagonal. (d) XGBoost: curve approaches diagonal more closely. Each plot shows mean predicted probability versus fraction of positives.

Figure 4. Probability calibration curves for dysphagia recognition models, including Logistic Regression (a), Random Forest (b), SVM (c), and XGBoost (d).

FIGURE 5
Four decision curve analysis graphs comparing different models are shown. (a) Logistic Regression, (b) Random Forest, (c) SVM, and (d) XGBoost. Each graph plots net benefit against threshold probability, highlighting areas under the curve in blue, with lines for “Treat all” and “Treat none” scenarios.

Figure 5. Decision curve analysis for dysphagia recognition models, including Logistic Regression (a), Random Forest (b), SVM (c), and XGBoost (d).

3.5 Internal and external validation of the final model

Internal validation of the XGBoost model on the hold-out test set demonstrated an AUC of 0.86 (95% CI: 0.81–0.91), with an accuracy of 0.78, a sensitivity of 0.80, and a specificity of 0.76.

External validation revealed that the model’s performance when applied directly to the independent cohort was initially limited (AUC = 0.52). Following hyperparameter optimization on the external validation set, the model’s discrimination improved to an AUC of 0.71 (95% CI: 0.67–0.75) and showed good calibration.

The ROC curves, calibration plots, and feature importance rankings for the final model in both internal and external validations are provided in Supplementary Figures 58.

4 Discussion

This study has developed a ML model based on EMR data from multiple center hospitals, demonstrating strong discriminative ability and clinical utility in identifying dysphagia in the elderly. The model’s performance has been externally validated across regions. Compared with traditional assessment methods such as the Water Swallow Test, it has the following advantages: (1) Objective acoustic analysis can reduce the deviation of human assessment; (2) It is easy to operate and suitable for community screening; (3) Combined with intelligent devices, it is expected to achieve remote monitoring and provide emergency early warnings such as aspiration and asphyxia. The model demonstrates significant clinical value by effectively assisting healthcare providers in identifying high-risk dysphagia patients, particularly in resource-limited settings. Its implementation facilitates early intervention, reduces complication rates, and advances timely diagnosis and treatment for elderly populations in China.

This study presents a novel approach to dysphagia screening by developing a machine learning model based on multimodal acoustic features specifically for community-dwelling the elderly. Unlike previous studies that often relied on single-modality analysis or focused on neurologically impaired populations, our model leverages the integration of voice, cough, and swallow sounds to capture a more comprehensive picture of swallowing physiology in a broader geriatric population. The rigorous external validation across different geographic regions further underscores the model’s generalizability and practical application potential for large-scale community screening.

It is widely acknowledged that ML models offer superior predictive performance compared to traditional linear models (40). This advantage enables the construction of a relatively robust model from complex data (41). ML excels in handling heterogeneous multidimensional data, as demonstrated by the inclusion of 12 predictive variables in this study, such as the audio duration and skewness, which exhibited high heterogeneity.

Recent studies have focused on the accuracy of identifying dysphagia. Donohue et al. (42) used a linear hybrid model and a machine learning classifier to distinguish the swallowing ability of healthy individuals and patients with neurodegenerative diseases based on 170 swallowing data from 20 patients with neurodegenerative diseases and 171 swallowing data from 51 healthy adults. Although they achieved a relatively high accuracy rate (99%), the sample size was limited and there was a lack of cross-population validation. Steele et al. (43) developed an LDA model based on samples from 305 patients with oropharyngeal dysphagia from seven medical centers in the United States. During the swallowing of liquid barium, VFSS and biaxial acceleration signals were simultaneously collected, achieving an average AUC of 81.5%. The advantage of this model lies in the use of multi-center data, which enhanced the representativeness of the model. This study, through large-sample cross-regional external validation, not only surpassed the research of Donohue et al. (42) in sample size, but also outperformed the results of Steele et al. (43) in classification performance (AUC = 0.86). These findings not only confirm the theoretical validity of the model, but also highlight its reliability and universality in clinical application through strict cross-regional validation, effectively compensating for the methodological limitations of existing studies that emphasize model performance over practical application value.

We developed and externally validated a screening model for dysphagia in the elderly using 12 acoustic features derived from participants across 13 nursing homes in Beijing and Shijiazhuang. Ultimately, the XGBoost algorithm yielded a model with superior accuracy (0.78), AUC (0.86), and clinical net benefit performs the best, especially in the medium and high threshold range (0.3–1.0). Similarly, this model demonstrated excellent performance on both internal and external validation sets.

Furthermore, in this study, by integrating multimodal acoustic characteristics, while maintaining a comparable accuracy rate to the single-modal study, it maintained a relatively high sensitivity (0.84) and specificity (0.79), achieving a balance between sensitivity and specificity. At the same time, the misdiagnosis rate (false positive) could be controlled below 21%, and the missed diagnosis rate (false negative) could be controlled within 16%.

A critical consideration is the fundamental advantage of this acoustic model over simpler, established screening tools like the EAT-10 questionnaire (44). While the EAT-10 relies on subjective patient self-reporting, which can be unreliable in older adults with cognitive impairment or lack of insight, our model provides a fully objective and quantitative assessment. It does not depend on patient comprehension or subjective symptom reporting. The acoustic analysis automates the screening process, reducing reliance on clinical intuition and potential bias in human-administered swallowing trials. Therefore, its primary advantage lies in its potential to offer a standardized, objective, and automatable first-line screening method, particularly valuable in settings with high patient volume or limited specialist availability, and for populations where traditional self-reporting tools are less effective.

This study has several limitations that should be acknowledged. The primary limitation is the use of the Water Swallow Test (WST) as the reference standard instead of instrumental assessments like videofluoroscopy (VFSS) or endoscopy (FEES). This pragmatic choice was necessitated by the large-scale, community-based design of our study. Consequently, our model should be interpreted as a screening tool for dysphagia risk, not a definitive diagnostic instrument. The model’s performance is inherently bounded by the accuracy of the WST, particularly in detecting silent aspiration. Future research must validate this model against VFSS/FEES in a clinical setting to establish its diagnostic accuracy and ability to detect silent aspiration.

Additionally, at the data level, collecting multiple audio samples from each participant poses a risk of reducing data independence. Future larger-scale studies can adopt mixed-effects models for in-depth analysis. At the hardware level, the current equipment may cause discomfort during prolonged use due to rigid materials, and wired connections limit patient mobility. The lack of integrated remote, real-time data transmission also curtails its potential for telemedicine. Future iterations should focus on developing ergonomic, wireless devices to enhance user comfort and enable continuous monitoring.

In conclusion, we have developed a recognition model for dysphagia in the elderly based on acoustic features, which has clinical advantages and high promotion value. This is of great significance for improving the quality of life of patients, reducing medical costs caused by late diagnosis, and achieving early intervention and optimized management of dysphagia.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Institutional Review Board of Peking University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

DL: Conceptualization, Data curation, Investigation, Software, Writing – original draft, Writing – review & editing. HS: Conceptualization, Data curation, Investigation, Software, Writing – original draft, Writing – review & editing. TL: Investigation, Writing – review & editing. WL: Investigation, Writing – review & editing. XD: Data curation, Formal analysis, Writing – review & editing. XJ: Formal analysis, Project administration, Writing – review & editing. SS: Conceptualization, Methodology, Project administration, Resources, Supervision, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2025.1719174/full#supplementary-material

References

1. van der Maarel-Wierink C, Vanobbergen J, Bronkhorst E, Schols J, de Baat C. Meta-analysis of dysphagia and aspiration pneumonia in frail elders. J Dent Res. (2011) 90:1398–404. doi: 10.1177/0022034511422909

PubMed Abstract | Crossref Full Text | Google Scholar

2. Doruk C, Mocchetti V, Rives H, Christos P, Rameau A. Correlations between anxiety and/or depression diagnoses and dysphagia severity. Laryngoscope. (2024) 134:2115–20. doi: 10.1002/lary.31164

PubMed Abstract | Crossref Full Text | Google Scholar

3. Ekberg O, Hamdy S, Woisard V, Wuttge-Hannig A, Ortega P. Social and psychological burden of dysphagia: its impact on diagnosis and treatment. Dysphagia. (2002) 17:139–46. doi: 10.1007/s00455-001-0113-5

PubMed Abstract | Crossref Full Text | Google Scholar

4. Panebianco M, Marchese-Ragona R, Masiero S, Restivo D. Dysphagia in neurological diseases: a literature review. Neurol Sci. (2020) 41:3067–73. doi: 10.1007/s10072-020-04495-2

PubMed Abstract | Crossref Full Text | Google Scholar

5. Oliveira A, Friche A, Salomão M, Bougo G, Vicente L. Predictive factors for oropharyngeal dysphagia after prolonged orotracheal intubation. Braz J Otorhinolaryngol. (2018) 84:722–8. doi: 10.1016/j.bjorl.2017.08.010

PubMed Abstract | Crossref Full Text | Google Scholar

6. Carrión S, Cabré M, Monteis R, Roca M, Palomera E, Serra-Prat M, et al. Oropharyngeal dysphagia is a prevalent risk factor for malnutrition in a cohort of older patients admitted with an acute disease to a general hospital. Clin Nutr. (2015) 34:436–42. doi: 10.1016/j.clnu.2014.04.014

PubMed Abstract | Crossref Full Text | Google Scholar

7. Rajati F, Ahmadi N, Naghibzadeh Z, Kazeminia M. The global prevalence of oropharyngeal dysphagia in different populations: a systematic review and meta-analysis. J Transl Med. (2022) 20:175. doi: 10.1186/s12967-022-03380-0

PubMed Abstract | Crossref Full Text | Google Scholar

8. Li C, Zhang M, Dou Z, Wen H, An L. Prevalence of dysphagia in China: an epidemiology survey of 6102 participants. Chinese J Phys Med Rehabil. (2017) 39:937–43. doi: 10.3760/cma.j.issn.0254-1424.2017.12.014

Crossref Full Text | Google Scholar

9. Chen Y, Fei R, Gao Y, Zhang Q. Application of swallowing function assessment in hospitalized elderly patients. Chinese J General Pract. (2015) 13:1880–2. doi: 10.16766/j.cnki.issn.1674-4152.2015.11.022

Crossref Full Text | Google Scholar

10. Baijens L, Clavé P, Cras P, Ekberg O, Forster A, Kolb G, et al. European society for swallowing disorders - European union geriatric medicine society white paper: oropharyngeal dysphagia as a geriatric syndrome. Clin Interv Aging. (2016) 11:1403–28. doi: 10.2147/CIA.S107750

PubMed Abstract | Crossref Full Text | Google Scholar

11. Carnaby G, Hankey G, Pizzi J. Behavioural intervention for dysphagia in acute stroke: a randomised controlled trial. Lancet Neurol. (2006) 5:31–7. doi: 10.1016/S1474-4422(05)70252-0

PubMed Abstract | Crossref Full Text | Google Scholar

12. Dennis M, Lewis S, Warlow C. Effect of timing and method of enteral tube feeding for dysphagic stroke patients (FOOD): a multicentre randomised controlled trial. Lancet. (2005) 365:764–72. doi: 10.1016/S0140-6736(05)17983-5

PubMed Abstract | Crossref Full Text | Google Scholar

13. Palli C, Fandler S, Doppelhofer K, Niederkorn K, Enzinger C, Vetta C, et al. Early dysphagia screening by trained nurses reduces pneumonia rate in stroke patients: a clinical intervention study. Stroke. (2017) 48:2583–5. doi: 10.1161/STROKEAHA.117.018157

PubMed Abstract | Crossref Full Text | Google Scholar

14. Hines S, Kynoch K, Munday J. Nursing interventions for identifying and managing acute dysphagia are effective for improving patient outcomes: a systematic review update. J Neurosci Nurs. (2016) 48:215–23. doi: 10.1097/JNN.0000000000000200

PubMed Abstract | Crossref Full Text | Google Scholar

15. Chinese Dysphagia Rehabilitation Assessment and Treatment Expert Consensus Group. Chinese expert consensus on dysphagia assessment and treatment (2017 edition) Part 1: Assessment. Chinese J Phys Med Rehabil. (2017) 39:881–92. doi: 10.3760/cma.j.issn.0254-1424.2017.12.001

Crossref Full Text | Google Scholar

16. Morinière S, Boiron M, Alison D, Makris P, Beutter P. Origin of the sound components during pharyngeal swallowing in normal subjects. Dysphagia. (2008) 23:267–73. doi: 10.1007/s00455-007-9134-z

PubMed Abstract | Crossref Full Text | Google Scholar

17. Santamato A, Panza F, Solfrizzi V, Russo A, Frisardi V, Megna M, et al. Acoustic analysis of swallowing sounds: a new technique for assessing dysphagia. J Rehabil Med. (2009) 41:639–45. doi: 10.2340/16501977-0384

PubMed Abstract | Crossref Full Text | Google Scholar

18. Zenner P, Losinski D, Mills R. Using cervical auscultation in the clinical dysphagia examination in long-term care. Dysphagia. (1995) 10:27–31. doi: 10.1007/BF00261276

PubMed Abstract | Crossref Full Text | Google Scholar

19. Stroud A, Lawrie B, Wiles C. Inter- and intra-rater reliability of cervical auscultation to detect aspiration in patients with dysphagia. Clin Rehabil. (2002) 16:640–5. doi: 10.1191/0269215502cr533oa

PubMed Abstract | Crossref Full Text | Google Scholar

20. Borr C, Hielscher-Fastabend M, Lücking A. Reliability and validity of cervical auscultation. Dysphagia. (2007) 22:225–34. doi: 10.1007/s00455-007-9078-3

PubMed Abstract | Crossref Full Text | Google Scholar

21. Leslie P, Drinnan M, Finn P, Ford G, Wilson J. Reliability and validity of cervical auscultation: a controlled comparison using videofluoroscopy. Dysphagia. (2004) 19:231–40. doi: 10.1007/BF02638588

Crossref Full Text | Google Scholar

22. Takahashi K. Cervical auscultation-clinical tool for detecting dysphagia. J Showa Univer Dental Soc. (2005) 25:167. doi: 10.11516/dentalmedres1981.25.167

Crossref Full Text | Google Scholar

23. Rubesin S. Oral and pharyngeal dysphagia. Gastroenterol Clin North Am. (1995) 24:331–52. doi: 10.1016/S0889-8553(21)00196-5

Crossref Full Text | Google Scholar

24. Groves-Wright K, Boyce S, Kelchner L. Perception of wet vocal quality in identifying penetration/aspiration during swallowing. J Speech Lang Hear Res. (2010) 53:620–32. doi: 10.1044/1092-4388(2009/08-0246)

PubMed Abstract | Crossref Full Text | Google Scholar

25. Aboofazeli M, Moussavi Z. Analysis of swallowing sounds using hidden Markov models. Med Biol Eng Comput. (2008) 46:307–14. doi: 10.1007/s11517-007-0285-8

PubMed Abstract | Crossref Full Text | Google Scholar

26. Dudik J, Coyle J, El-Jaroudi A, Mao Z, Sun M, Sejdić E. Deep learning for classification of normal swallows in adults. Neurocomputing. (2018) 285:1–9. doi: 10.1016/j.neucom.2017.12.059

PubMed Abstract | Crossref Full Text | Google Scholar

27. Aboofazeli M, Moussavi Z. Automated extraction of swallowing sounds using a wavelet-based filter. Conf Proc IEEE Eng Med Biol Soc. (2006) 2006:5607–10. doi: 10.1109/IEMBS.2006.259353

PubMed Abstract | Crossref Full Text | Google Scholar

28. Wirth R, Dziewas R, Beck A, Clavé P, Hamdy S, Heppner H, et al. Oropharyngeal dysphagia in older persons - from pathophysiology to adequate intervention: a review and summary of an international expert meeting. Clin Interv Aging. (2016) 11:189–208. doi: 10.2147/CIA.S97481

PubMed Abstract | Crossref Full Text | Google Scholar

29. Lam P, Bailey E, Steel C. Exploring dysphagia assessment and management in canadian primary care: a clinical practice survey. Can J Diet Pract Res. (2025) 86:84–9. doi: 10.3148/cjdpr-2025-001

PubMed Abstract | Crossref Full Text | Google Scholar

30. Imaizumi M, Suzuki T, Ikeda M, Matsuzuka T, Goto A, Omori K. Implementing a flexible endoscopic evaluation of swallowing at elderly care facilities to reveal characteristics of elderly subjects who screened positive for a swallowing disorder. Auris Nasus Larynx. (2020) 47:602–8. doi: 10.1016/j.anl.2020.02.004

PubMed Abstract | Crossref Full Text | Google Scholar

31. Tsang K, Lau E, Shazra M, Eyres R, Hansjee D, Smithard DG. A new simple screening tool-4QT: can it identify those with swallowing problems? A pilot study. Geriatrics. (2020) 5:11. doi: 10.3390/geriatrics5010011

PubMed Abstract | Crossref Full Text | Google Scholar

32. Moons K, Kengne A, Woodward M, Royston P, Vergouwe Y, Altman D, et al. Risk prediction models: i. Development, internal validation, and assessing the incremental value of a new (bio)marker. Heart. (2012) 98:683–90. doi: 10.1136/heartjnl-2011-301246

PubMed Abstract | Crossref Full Text | Google Scholar

33. Moons K, Kengne A, Grobbee D, Royston P, Vergouwe Y, Altman D, et al. Risk prediction models: ii. External validation, model updating, and impact assessment. Heart. (2012) 98:691–8. doi: 10.1136/heartjnl-2011-301247

PubMed Abstract | Crossref Full Text | Google Scholar

34. Liu X. Statistical Power Analysis for the Social and Behavioral Sciences: Basic and Advanced Techniques. Milton Park: Routledge (2013).

Google Scholar

35. Cichero J, Murdoch B. Detection of swallowing sounds: methodology revisited. Dysphagia. (2002) 17:40–9. doi: 10.1007/s00455-001-0100-x

PubMed Abstract | Crossref Full Text | Google Scholar

36. Takahashi K, Groher M, Michi K. Methodology for detecting swallowing sounds. Dysphagia. (1994) 9:54–62. doi: 10.1007/BF00262760

PubMed Abstract | Crossref Full Text | Google Scholar

37. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Stat Methodol. (1996) 58:267–88. doi: 10.1111/j.2517-6161.1996.tb02080.x

Crossref Full Text | Google Scholar

38. Ruppert D. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Milton Park: Taylor & Francis (2004).

Google Scholar

39. Riley R, Snell K, Ensor J, Burke D, Harrell F, Moons K, et al. Minimum sample size for developing a multivariable prediction model: part II - binary and time-to-event outcomes. Stat Med. (2019) 38:1276–96. doi: 10.1002/sim.7992

PubMed Abstract | Crossref Full Text | Google Scholar

40. Ota R, Yamashita F. Application of machine learning techniques to the analysis and prediction of drug pharmacokinetics. J Control Release. (2022) 352:961–9. doi: 10.1016/j.jconrel.2022.11.014

PubMed Abstract | Crossref Full Text | Google Scholar

41. Zhou S, Jv D, Meng X, Zhang J, Liu C, Wu Z, et al. Feasibility of machine learning-based modeling and prediction using multiple centers data to assess intrahepatic cholangiocarcinoma outcomes. Ann Med. (2023) 55:215–23. doi: 10.1080/07853890.2022.2160008

PubMed Abstract | Crossref Full Text | Google Scholar

42. Donohue C, Khalifa Y, Perera S, Sejdić E, Coyle JL. A preliminary investigation of whether HRCA signals can differentiate between swallows from healthy people and swallows from people with neurodegenerative diseases. Dysphagia. (2021) 36:635–43. doi: 10.1007/s00455-020-10177-0

PubMed Abstract | Crossref Full Text | Google Scholar

43. Steele C, Mukherjee R, Kortelainen J, Pölönen H, Jedwab M, Brady S, et al. Development of a non-invasive device for swallow screening in patients at risk of oropharyngeal Dysphagia: results from a prospective exploratory study. Dysphagia. (2019) 34:698–707. doi: 10.1007/s00455-018-09974-5

PubMed Abstract | Crossref Full Text | Google Scholar

44. Belafsky P, Mouadeb D, Rees C, Pryor J, Postma G, Allen J, et al. Validity and reliability of the Eating assessment tool (EAT-10). Ann Otol Rhinol Laryngol. (2008) 117:919–24. doi: 10.1177/000348940811701210

PubMed Abstract | Crossref Full Text | Google Scholar

Keywords: dysphagia, XGBoost, acoustic analysis, the elderly, screening

Citation: Song H, Li D, Liu T, Luo W, Dong X, Jin X and Shang S (2025) Development and validation of a screening model for dysphagia in the elderly based on acoustic features. Front. Med. 12:1719174. doi: 10.3389/fmed.2025.1719174

Received: 05 October 2025; Revised: 12 November 2025; Accepted: 17 November 2025;
Published: 08 December 2025.

Edited by:

Ivo Popivanov, New Bulgarian University, Bulgaria

Reviewed by:

Xiaoyan Chen, Zigong City Mental Health Center, China
Vuk Milošević, University of Niš, Serbia

Copyright © 2025 Song, Li, Liu, Luo, Dong, Jin and Shang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xiaoyan Jin, anh5Ym11QHNpbmEuY29t; Shaomei Shang, c2hhbmdzaGFvbWVpQDEyNi5jb20=

These authors have contributed equally to this work

These authors have contributed equally as corresponding authors

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.